Title: | Miscellaneous Tools for Sequence Analysis |
---|---|
Description: | It provides miscellaneous sequence analysis functions for describing episodes in individual sequences, measuring association between domains in multidimensional sequence analysis (see Piccarreta (2017) <doi:10.1177/0049124115591013>), heat maps of sequence data, Globally Interdependent Multidimensional Sequence Analysis (see Robette et al (2015) <doi:10.1177/0081175015570976>), smoothing sequences for index plots (see Piccarreta (2012) <doi:10.1177/0049124112452394>), coding sequences for Qualitative Harmonic Analysis (see Deville (1982)), measuring stress from multidimensional scaling factors (see Piccarreta and Lior (2010) <doi:10.1111/j.1467-985X.2009.00606.x>), symmetrical (or canonical) Partial Least Squares (see Bry (1996)). |
Authors: | Nicolas Robette |
Maintainer: | Nicolas Robette <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.1 |
Built: | 2025-01-22 03:57:45 UTC |
Source: | https://github.com/nicolas-robette/seqhandbook |
Computes various measures of association between dimensions of multidimensional sequence data.
assoc.domains(dlist, names, djsa)
dlist |
A list of dissimilarity matrices or dist objects (see |
names |
A character vector of the names of the dimensions of the multidimensional sequence data |
djsa |
A dissimilarity matrix or a dist object (see |
Nicolas Robette
Piccarreta R. (2017). Joint Sequence Analysis: Association and Clustering, Sociological Methods and Research, Vol. 46(2), 252-287.
library(TraMineR)
data(biofam)

## Building one channel per type of event (left, children or married)
bf <- as.matrix(biofam[, 10:25])
children <- bf==4 | bf==5 | bf==6
married <- bf==2 | bf==3 | bf==6
left <- bf==1 | bf==3 | bf==5 | bf==6

## Building sequence objects
child.seq <- seqdef(children)
marr.seq <- seqdef(married)
left.seq <- seqdef(left)

## Using Hamming distance
mcdist <- seqdistmc(channels=list(child.seq, marr.seq, left.seq), method="HAM")
child.dist <- seqdist(child.seq, method="HAM")
marr.dist <- seqdist(marr.seq, method="HAM")
left.dist <- seqdist(left.seq, method="HAM")

## Association between domains
asso <- assoc.domains(list(child.dist,marr.dist,left.dist), c('child','marr','left'), mcdist)
asso
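One simple notion of association between two domains is the correlation between their pairwise dissimilarities. The base R sketch below illustrates that idea on toy data; `pair_cor` is a hypothetical helper, and the measures actually returned by `assoc.domains` may differ.

```r
# Toy data: two domains observed on the same 6 individuals (hypothetical values).
set.seed(1)
dom_a <- matrix(rnorm(12), nrow = 6)
dom_b <- dom_a + matrix(rnorm(12, sd = 0.1), nrow = 6)  # close to dom_a by construction

d_a <- dist(dom_a)
d_b <- dist(dom_b)

# A dist object stores the lower triangle as a numeric vector,
# so the pairwise distances of two domains can be correlated directly.
# pair_cor is a hypothetical helper, not part of seqhandbook.
pair_cor <- function(d1, d2, method = "pearson") {
  stopifnot(length(d1) == length(d2))
  cor(as.vector(d1), as.vector(d2), method = method)
}

r_ab <- pair_cor(d_a, d_b)
r_ab
```

A value close to 1 indicates that individuals who are far apart in one domain also tend to be far apart in the other.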
Index plot of state sequences. Sequences are ordered according to the specified dendrogram. The dendrogram is also plotted on the side of the index plot.
seq_heatmap(seq, tree, with.missing = FALSE, ...)
seq |
a state sequence object created with the |
tree |
a dendrogram of the sequences (an object of class |
with.missing |
is there a 'missing value' state in the sequences? |
... |
additional parameters sent to |
http://joseph.larmarange.net/?Representer-un-tapis-de-sequences
if (require(TraMineR)) {
  data(mvad)
  mvad.seq <- seqdef(mvad[,17:86])
  mvad.lcs <- seqdist(mvad.seq, method = "LCS")
  mvad.hc <- hclust(as.dist(mvad.lcs), method = "ward.D2")
  seq_heatmap(mvad.seq, mvad.hc)
}
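The ordering step behind such a plot can be sketched in base R: the leaves of the dendrogram define the row order of the sequences. The data and the distance below are toy assumptions, and this is not the internal code of `seq_heatmap`.

```r
# Toy sequences: 5 individuals x 4 time points, states coded 1..3 (hypothetical data).
seqs <- matrix(c(1, 1, 2, 2,
                 3, 3, 3, 3,
                 1, 1, 1, 2,
                 3, 3, 2, 2,
                 1, 2, 2, 2), nrow = 5, byrow = TRUE)

# Hamming-like distance: number of positions at which two rows differ.
n <- nrow(seqs)
ham <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    ham[i, j] <- sum(seqs[i, ] != seqs[j, ])
  }
}

hc <- hclust(as.dist(ham), method = "average")

# hc$order is the left-to-right order of the leaves in the dendrogram;
# reordering the rows with it places similar sequences next to each other.
ordered <- seqs[hc$order, ]
ordered
```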
Recodes sequence data into the shape used for qualitative harmonic analysis.
seq2qha(seqdata, periods)
seqdata |
a sequence object (see |
periods |
numeric vector of the first positions of the periods used for recoding |
A data frame with one column per combination of period and state (i.e. number of columns = number of periods * number of states in the alphabet).
Nicolas Robette
Robette N., Thibault N. (2008). Comparing qualitative harmonic analysis and optimal matching. An exploratory study of occupational trajectories, Population-E, Vol. 64(3), 533-556.
Deville J-C. (1982). Analyse de données chronologiques qualitatives: comment analyser des calendriers ?, Annales de l’INSEE, 45, 45-104.
Deville J-C., Saporta G. (1980). Analyse harmonique qualitative, in Data analysis and informatics, E.Diday (ed.), Amsterdam, North Holland Publishing, 375-389.
library(TraMineR)
data(trajact)
seqact <- seqdef(trajact)
qha <- seq2qha(seqact, periods=c(1,3,7,12,24))
head(qha)
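The shape of the result (one column per period and state) can be illustrated with a base R sketch. It assumes that each cell holds the number of positions spent in the state during the period; the column names are hypothetical and the exact coding used by `seq2qha` may differ.

```r
# Toy sequences: 3 individuals x 6 positions, alphabet {"A", "B"} (hypothetical data).
seqs <- matrix(c("A", "A", "B", "B", "B", "A",
                 "B", "B", "B", "B", "B", "B",
                 "A", "A", "A", "A", "B", "B"), nrow = 3, byrow = TRUE)
states  <- c("A", "B")
periods <- c(1, 3, 5)                       # first position of each period
ends    <- c(periods[-1] - 1, ncol(seqs))   # last position of each period

# For every period and state, count the positions spent in that state.
qha_sketch <- do.call(cbind, lapply(seq_along(periods), function(p) {
  cols <- periods[p]:ends[p]
  out <- sapply(states, function(s) rowSums(seqs[, cols, drop = FALSE] == s))
  colnames(out) <- paste0(states, "_p", p)  # hypothetical column naming
  out
}))
qha_sketch <- as.data.frame(qha_sketch)
qha_sketch
```

With 3 periods and 2 states the sketch yields 3 * 2 = 6 columns, and each row sums to the sequence length.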
A data frame describing mothers' employment histories from age 14 to 60 and daughters' employment histories from the completion of education to 15 years later. Sequences are sampled (N = 400) from the "Biographies et entourage" survey (INED, 2001).
data("seqgimsa")
A data frame with 400 observations and 62 numeric variables. The first 15 variables (prefixed 'f') describe the daughters' employment status in a given year: 1 = education, 2 = inactivity, 3 = part-time job, 4 = full-time job. The following 47 variables (prefixed 'm') describe the mothers' employment status at a given age: 1 = self-employment, 3 = higher level or intermediate occupation, 5 = lower level occupation, 8 = inactivity, 9 = education.
data(seqgimsa)
str(seqgimsa)
Returns, for each sequence, whether it comprises at least one episode in each state.
seqi1epi(seqdata)
seqdata |
a sequence object (see |
Nicolas Robette
Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.
library(TraMineR)
data(trajact)
seqact <- seqdef(trajact)
stat <- seqi1epi(seqact)
head(stat)
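The kind of indicator described here can be sketched in base R on a plain matrix of state codes. This is an illustration on toy data with assumed column names, not the package's implementation.

```r
# Toy sequences: 3 individuals x 5 positions, states coded 1..3 (hypothetical data).
seqs <- matrix(c(1, 1, 2, 2, 1,
                 3, 3, 3, 3, 3,
                 2, 1, 2, 1, 2), nrow = 3, byrow = TRUE)
states <- 1:3

# One row per sequence, one column per state: 1 if the state is ever visited, else 0.
visited <- t(apply(seqs, 1, function(s) as.integer(states %in% s)))
colnames(visited) <- paste0("state", states)  # hypothetical column names
visited
```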
Returns, for each sequence, the first position in each state.
seqifpos(seqdata)
seqdata |
a sequence object (see |
Nicolas Robette
Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.
library(TraMineR)
data(trajact)
seqact <- seqdef(trajact)
stat <- seqifpos(seqact)
head(stat)
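A base R sketch of the same idea uses `match()`, which returns the position of the first occurrence. It assumes that a state never visited is reported as `NA`; how `seqifpos` encodes that case is not stated here.

```r
# Toy sequences: 3 individuals x 5 positions, states coded 1..3 (hypothetical data).
seqs <- matrix(c(1, 1, 2, 2, 1,
                 3, 3, 3, 3, 3,
                 2, 1, 2, 1, 2), nrow = 3, byrow = TRUE)
states <- 1:3

# match(states, s) gives, for each state, the index of its first occurrence in s
# (NA when the state never occurs: an assumption of this sketch).
first_pos <- t(apply(seqs, 1, function(s) match(states, s)))
colnames(first_pos) <- paste0("state", states)  # hypothetical column names
first_pos
```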
Returns, for each sequence, the number of episodes in each state.
seqinepi(seqdata)
seqdata |
a sequence object (see |
Nicolas Robette
Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.
library(TraMineR)
data(trajact)
seqact <- seqdef(trajact)
stat <- seqinepi(seqact)
head(stat)
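An episode is a run of consecutive positions in the same state, so episodes can be counted with run-length encoding. The base R sketch below works on a plain matrix of state codes and is only an illustration on toy data, not the package's code.

```r
# Toy sequences: 3 individuals x 5 positions, states coded 1..3 (hypothetical data).
seqs <- matrix(c(1, 1, 2, 2, 1,
                 3, 3, 3, 3, 3,
                 2, 1, 2, 1, 2), nrow = 3, byrow = TRUE)
states <- 1:3

# rle() collapses each sequence into its successive spells;
# counting the spells per state gives the number of episodes.
n_epi <- t(apply(seqs, 1, function(s) {
  runs <- rle(s)$values
  tabulate(match(runs, states), nbins = length(states))
}))
colnames(n_epi) <- paste0("state", states)  # hypothetical column names
n_epi
```

The first toy sequence (1, 1, 2, 2, 1) has two episodes in state 1 and one in state 2.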
Computes the stress measure of a multidimensional scaling solution for different numbers of dimensions of the represented space.
seqmds.stress(seqdist, mds)
seqdist |
a dissimilarity matrix or a dist object (see |
mds |
a matrix with coordinates in the represented space (dimension 1 in column 1, dimension 2 in column 2, etc.) |
A numerical vector of stress values.
Nicolas Robette
Piccarreta R., Lior O. (2010). Exploring sequences: a graphical tool based on multi-dimensional scaling, Journal of the Royal Statistical Society (Series A), Vol. 173(1), 165-184.
library(TraMineR)
data(trajact)
seqact <- seqdef(trajact)
dissim <- seqdist(seqact, method="HAM")
mds <- cmdscale(dissim, k=20, eig=TRUE)
stress <- seqmds.stress(dissim, mds)
plot(stress, type='l', xlab='number of dimensions', ylab='stress')
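A stress curve of this kind can be sketched in base R. The sketch assumes a normalized stress of the form sqrt(sum((d - d_hat)^2) / sum(d^2)), where d_hat are the distances in the reduced space; the exact formula used by `seqmds.stress` may differ.

```r
set.seed(42)
x <- matrix(rnorm(40), nrow = 10)   # 10 points in 4 dimensions (hypothetical data)
d <- dist(x)

k_max <- 4
mds <- cmdscale(d, k = k_max)       # matrix of coordinates, one column per dimension

# Stress when only the first i dimensions are kept (assumed formula, see above).
stress <- sapply(seq_len(k_max), function(i) {
  d_hat <- dist(mds[, seq_len(i), drop = FALSE])
  sqrt(sum((d - d_hat)^2) / sum(d^2))
})
stress
```

Because the toy points live in 4 dimensions, the stress drops to (numerically) zero once all 4 dimensions are kept.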
A data frame describing the matrimonial, parental and residential status from age 14 to age 35. Sequences are sampled (N = 500) from the "Biographies et entourage" survey (INED, 2001).
data("seqmsa")
A data frame with 500 observations and 66 variables. The first 22 variables (prefixed 'log') describe the residential status at a given age: 0 = not independent, 1 = independent. The next 22 variables (prefixed 'mat') describe the matrimonial status at a given age: 1 = never been in a relationship, 2 = cohabiting union, 3 = married, 4 = separated. The last 22 variables (prefixed 'nenf') describe the parental status at a given age: 0 = no child, 1 = one child, 2 = two children, 3 = three children or more.
data(seqmsa)
str(seqmsa)
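Since the three domains are distinguished by a column prefix, they can be separated by name before building one sequence object per domain. The sketch below uses a toy data frame with the same layout; the exact column names (`log14`, `mat14`, `nenf14`, ...) are an assumption.

```r
# Toy data frame mimicking the layout: 22 columns per domain, ages 14 to 35.
# The column names are assumed; check names(seqmsa) for the real ones.
ages <- 14:35
toy <- as.data.frame(matrix(0L, nrow = 3, ncol = 3 * length(ages)))
names(toy) <- c(paste0("log", ages), paste0("mat", ages), paste0("nenf", ages))

# Select the columns of each domain by prefix.
prefixes <- c(residential = "log", matrimonial = "mat", parental = "nenf")
domains <- lapply(prefixes, function(p) {
  toy[, startsWith(names(toy), p), drop = FALSE]
})
sapply(domains, ncol)
```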
Smoothing of sequence data, using for each sequence the medoid of the sequences in its neighborhood. The results can be used to get a smoothed index plot.
seqsmooth(seqdata, diss, k=20, r=NULL)
seqdata |
a sequence object (see |
diss |
a dissimilarity matrix, giving the pairwise distances between sequences. |
k |
size of the neighborhood. Default is 20. |
r |
radius of the neighborhood. If NULL (default), the radius is not used for smoothing. |
A list with the following elements:
seqdata |
a sequence object (see |
R2 |
pseudo-R2 measure of the goodness of fit of the smoothing |
S2 |
stress measure of the goodness of fit of the smoothing |
Nicolas Robette
Piccarreta R. (2012). Graphical and Smoothing Techniques for Sequence Analysis, Sociological Methods and Research, Vol. 41(2), 362-380.
library(TraMineR)
data(trajact)
seqact <- seqdef(trajact)
dissim <- seqdist(seqact, method="LCS")
mds <- cmdscale(dissim, k=1)
smoothed <- seqsmooth(seqact, dissim, k=30)$seqdata
seqIplot(smoothed, sortv=mds, xtlab=14:50, with.legend=FALSE, yaxis=FALSE, ylab=NA)
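The smoothing principle (replace each sequence by the medoid of its neighborhood) can be sketched in base R. `smooth_medoid` is a hypothetical helper using the k nearest sequences as the neighborhood; it ignores the radius `r` and the goodness-of-fit measures, and may differ from what `seqsmooth` does internally.

```r
# Toy sequences: 6 individuals x 4 positions, states coded 1/2 (hypothetical data).
seqs <- matrix(c(1, 1, 1, 1,
                 1, 1, 1, 2,
                 1, 1, 2, 2,
                 2, 2, 2, 2,
                 2, 2, 2, 1,
                 1, 2, 1, 2), nrow = 6, byrow = TRUE)

# With codes 1/2, the Manhattan distance equals the number of differing positions.
d <- as.matrix(dist(seqs, method = "manhattan"))

# Hypothetical helper: for each sequence, find its k nearest sequences (itself
# included) and keep the one with the smallest total distance to the others.
smooth_medoid <- function(seqs, d, k) {
  medoid_of <- sapply(seq_len(nrow(seqs)), function(i) {
    nb <- order(d[i, ])[seq_len(k)]
    nb[which.min(rowSums(d[nb, nb, drop = FALSE]))]
  })
  list(seqdata = seqs[medoid_of, , drop = FALSE], medoid = medoid_of)
}

sm <- smooth_medoid(seqs, d, k = 3)
sm$medoid
```

Larger values of `k` give smoother results, since more sequences share the same medoid; with `k = 1` every sequence is its own medoid and the data are unchanged.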
A data frame with sociodemographic variables for a sample of 500 interviewees from the "Biographies et entourage" survey (INED, 2001).
data("socdem")
A data frame with 500 observations on the following 9 variables.
annais
year of birth (numeric)
nbenf
number of children (factor)
nbunion
number of relationships (factor)
mereactive
whether mother was active or not (factor)
sexe
gender (factor)
PCS
occupational category (factor)
PCSpere
occupational category of the father (factor)
diplome
degree (factor)
nationalite
nationality (factor)
data(socdem)
str(socdem)
Computes symmetric (or canonical) PLS for two groups of continuous variables.
symPLS(a,b)
a |
data frame of the first group of continuous variables |
b |
data frame of the second group of continuous variables |
Nicolas Robette, Xavier Bry
Bry X. (1996). Analyses Factorielles Multiples. Paris, Economica Poche.
de Jong S., Wise B.M. and Ricker N.L. (2001). Canonical Partial Least Squares and Continuum Power Regression. Journal of Chemometrics, Vol. 15, 85–100.
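One common way to treat two groups of variables symmetrically is to take the singular value decomposition of their cross-covariance matrix: the leading singular vectors give one component per group with maximal covariance. The base R sketch below shows that technique on simulated data. It is an assumption that it resembles what `symPLS` computes; the function may use a different algorithm and return value.

```r
set.seed(7)
n <- 50
z <- rnorm(n)                                    # shared latent factor (hypothetical)
a <- data.frame(a1 = z + rnorm(n, sd = 0.3), a2 = rnorm(n))
b <- data.frame(b1 = -z + rnorm(n, sd = 0.3), b2 = rnorm(n), b3 = rnorm(n))

A <- scale(as.matrix(a))                         # centre and standardise each group
B <- scale(as.matrix(b))

# SVD of the cross-covariance matrix: the first singular vectors are the weights
# whose components have maximal covariance across the two groups.
sv <- svd(crossprod(A, B) / (n - 1))
comp_a <- drop(A %*% sv$u[, 1])                  # first component of group a
comp_b <- drop(B %*% sv$v[, 1])                  # first component of group b

cov(comp_a, comp_b)                              # equals the first singular value
```

Swapping the roles of `a` and `b` only transposes the cross-covariance matrix, so the singular values are unchanged: neither group is treated as the response.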
A data frame describing the employment status from age 14 to age 50. It is a sample of 500 interviewees from the "Biographies et entourage" survey (INED, 2001).
data("trajact")
A data frame with 500 observations and 37 variables. Each variable is numeric and describes the employment status at a given age: 1 = education, 2 = full-time job, 3 = part-time job, 4 = small jobs, 5 = inactivity, 6 = military service.
data(trajact)
str(trajact)