Package 'seqhandbook' reference manual

Title:	Miscellaneous Tools for Sequence Analysis
Description:	It provides miscellaneous sequence analysis functions for describing episodes in individual sequences, measuring association between domains in multidimensional sequence analysis (see Piccarreta (2017) <doi:10.1177/0049124115591013>), heat maps of sequence data, Globally Interdependent Multidimensional Sequence Analysis (see Robette et al (2015) <doi:10.1177/0081175015570976>), smoothing sequences for index plots (see Piccarreta (2012) <doi:10.1177/0049124112452394>), coding sequences for Qualitative Harmonic Analysis (see Deville (1982)), measuring stress from multidimensional scaling factors (see Piccarreta and Lior (2010) <doi:10.1111/j.1467-985X.2009.00606.x>), symmetrical (or canonical) Partial Least Squares (see Bry (1996)).
Authors:	Nicolas Robette
Maintainer:	Nicolas Robette <[email protected]>
License:	GPL (>= 2)
Version:	0.1.1
Built:	2025-02-21 04:16:07 UTC
Source:	https://github.com/nicolas-robette/seqhandbook

Association measures between domains in multidimensional sequence analysis

Description

Computes various measures of association between dimensions of multidimensional sequence data.

Usage

assoc.domains(dlist, names, djsa)

Arguments

`dlist`	A list of dissimilarity matrices or dist objects (see `dist`), with one element per dimension of the multidimensional sequence data
`names`	A character vector of the names of the dimensions of the multidimensional sequence data
`djsa`	A dissimilarity matrix or a dist object (see `dist`), corresponding to the distances between the multidimensional sequences
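
To give a sense of what an association between two domains can mean, the following base R sketch correlates the pairwise distances obtained from two sets of variables. It is only an illustration under assumptions: the toy data (`iris`), the helper name `dist_cor` and the choice of a Pearson correlation between distance vectors are not taken from the package, and `assoc.domains` may compute different or additional measures.

```r
# Hypothetical helper (not part of seqhandbook): correlation between the
# pairwise distances of two domains observed on the same individuals.
dist_cor <- function(d1, d2) {
  # as.dist() keeps each pair once, as.vector() flattens it
  cor(as.vector(as.dist(d1)), as.vector(as.dist(d2)))
}

# Toy "domains": sepal and petal measurements from the built-in iris data
d_sepal <- dist(iris[, c("Sepal.Length", "Sepal.Width")])
d_petal <- dist(iris[, c("Petal.Length", "Petal.Width")])

r <- dist_cor(d_sepal, d_petal)
r
```

A value close to 1 would mean that individuals who are far apart in one domain also tend to be far apart in the other.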

Author(s)

Nicolas Robette

References

Piccarreta R. (2017). Joint Sequence Analysis: Association and Clustering, Sociological Methods and Research, Vol. 46(2), 252-287.

Examples


library(TraMineR)
data(biofam)

## Building one channel per type of event (left, children or married)
bf <- as.matrix(biofam[, 10:25])
children <-  bf==4 | bf==5 | bf==6
married <- bf == 2 | bf== 3 | bf==6
left <- bf==1 | bf==3 | bf==5 | bf==6

## Building sequence objects
child.seq <- seqdef(children)
marr.seq <- seqdef(married)
left.seq <- seqdef(left)

## Using Hamming distance
mcdist <- seqdistmc(channels=list(child.seq, marr.seq, left.seq),
 	method="HAM")
child.dist <- seqdist(child.seq, method="HAM")
marr.dist <- seqdist(marr.seq, method="HAM")
left.dist <- seqdist(left.seq, method="HAM")

## Association between domains
asso <- assoc.domains(list(child.dist,marr.dist,left.dist), c('child','marr','left'), mcdist)
asso


Index plot of sequences ordered according to a dendrogram

Description

Index plot of state sequences. Sequences are ordered according to the specified dendrogram. The dendrogram is also plotted on the side of the index plot.

Usage

seq_heatmap(seq, tree, with.missing = FALSE, ...)

Arguments

`seq`	a state sequence object created with the `seqdef` function
`tree`	a dendrogram of the sequences (an object of class `hclust`, `dendrogram` or `agnes`)
`with.missing`	is there a 'missing value' state in the sequences?
`...`	additional parameters sent to `heatmap`

Source

http://joseph.larmarange.net/?Representer-un-tapis-de-sequences

Examples

if (require(TraMineR)) {
  data(mvad)
  mvad.seq <- seqdef(mvad[,17:86])
  mvad.lcs <- seqdist(mvad.seq, method = "LCS")
  mvad.hc <- hclust(as.dist(mvad.lcs), method = "ward.D2")
  seq_heatmap(mvad.seq, mvad.hc)
}

Recoding sequences for qualitative harmonic analysis

Description

Recodes sequence data into the shape used for qualitative harmonic analysis.

Usage

seq2qha(seqdata, periods)

Arguments

`seqdata`	a sequence object (see `seqdef` function).
`periods`	numeric vector of the first positions of the periods used for recoding

Value

A data frame with one column per combination of period and state (i.e. number of columns = number of periods * number of states in the alphabet).
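
The shape of such a table can be sketched in base R. This is not the package's code: it assumes that the recoding stores the time spent in each state within each period, and that each period runs from its first position to the position before the next period starts (the last one running to the end of the sequence). The helper name `qha_sketch` and the toy data are hypothetical.

```r
# Hypothetical helper (not part of seqhandbook): time spent in each state
# within each period, for a matrix with one row per sequence.
qha_sketch <- function(x, periods, states = sort(unique(as.vector(x)))) {
  ends <- c(periods[-1] - 1, ncol(x))   # last position of each period
  out <- list()
  for (p in seq_along(periods)) {
    cols <- periods[p]:ends[p]
    for (s in states) {
      # number of positions of period p spent in state s
      out[[paste0("p", p, "_s", s)]] <- rowSums(x[, cols, drop = FALSE] == s)
    }
  }
  as.data.frame(out)
}

# Two toy sequences of length 6 over the states 1 and 2
x <- rbind(c(1, 1, 2, 2, 2, 1),
           c(2, 2, 2, 1, 1, 1))
res <- qha_sketch(x, periods = c(1, 4))
res   # 2 periods * 2 states = 4 columns
```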

Author(s)

Nicolas Robette

References

Robette N., Thibault N. (2008). Comparing qualitative harmonic analysis and optimal matching. An exploratory study of occupational trajectories, Population-E, Vol. 64(3), 533-556.
Deville J-C. (1982). Analyse de données chronologiques qualitatives: comment analyser des calendriers ?, Annales de l’INSEE, 45, 45-104.
Deville J-C., Saporta G. (1980). Analyse harmonique qualitative, in Data analysis and informatics, E.Diday (ed.), Amsterdam, North Holland Publishing, 375-389.

Examples

data(trajact)
seqact <- seqdef(trajact)
qha <- seq2qha(seqact, periods=c(1,3,7,12,24))
head(qha)

Sample of mothers' and daughters' employment histories

Description

A data frame describing mothers' employment histories from age 14 to 60 and daughters' employment histories from the completion of education to 15 years later. Sequences are sampled (N = 400) from the "Biographies et entourage" survey (INED, 2001).

Usage

data("seqgimsa")

Format

A data frame with 400 observations and 62 numeric variables. The first 15 variables (prefixed 'f') describe the daughters' employment status in a given year: 1 = education, 2 = inactivity, 3 = part-time job, 4 = full-time job. The following 47 variables (prefixed 'm') describe the mothers' employment status at a given age: 1 = self-employment, 3 = higher level or intermediate occupation, 5 = lower level occupation, 8 = inactivity, 9 = education.

Examples

data(seqgimsa)
str(seqgimsa)

At least one episode in the states

Description

Indicates, for each sequence, whether it contains at least one episode in each of the states.

Usage

seqi1epi(seqdata)

Arguments

`seqdata`	a sequence object (see `seqdef` function).

Author(s)

Nicolas Robette

References

Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.

Examples

data(trajact)
seqact <- seqdef(trajact)
stat <- seqi1epi(seqact)
head(stat)

First position in each state

Description

Returns the first position in each state.
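
For a single sequence, the idea can be sketched in base R with `match`. The helper name `first_pos` and the use of `NA` for states that are never visited are assumptions made for this illustration; the manual does not say how `seqifpos` reports unvisited states.

```r
# Hypothetical helper (not part of seqhandbook): first position at which
# each state of the alphabet occurs in one sequence (NA if never visited).
first_pos <- function(x, states) {
  setNames(match(states, x), states)
}

fp <- first_pos(c(1, 1, 2, 2, 5, 2), states = c(1, 2, 3, 5))
fp
```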

Usage

seqifpos(seqdata)

Arguments

`seqdata`	a sequence object (see `seqdef` function).

Author(s)

Nicolas Robette

References

Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.

Examples

data(trajact)
seqact <- seqdef(trajact)
stat <- seqifpos(seqact)
head(stat)

Number of episodes in each state

Description

Returns, for each sequence, the number of episodes in each state.
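
Counting episodes in one sequence can be sketched in base R with `rle`. The sketch assumes that an episode is a maximal run of consecutive identical states, which the manual does not state explicitly; the helper name `n_episodes` is hypothetical.

```r
# Hypothetical helper (not part of seqhandbook): number of episodes
# (runs of consecutive identical states) in each state for one sequence.
n_episodes <- function(x, states) {
  runs <- rle(x)$values   # one value per run of identical states
  setNames(sapply(states, function(s) sum(runs == s)), states)
}

ne <- n_episodes(c(1, 1, 2, 2, 1, 5, 5, 1), states = c(1, 2, 5))
ne
```

Comparing such counts with zero would give the "at least one episode" indicator for the same sequence.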

Usage

seqinepi(seqdata)

Arguments

`seqdata`	a sequence object (see `seqdef` function).

Author(s)

Nicolas Robette

References

Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.

Examples

data(trajact)
seqact <- seqdef(trajact)
stat <- seqinepi(seqact)
head(stat)

Stress measure of multidimensional scaling factors

Description

Computes the stress measure of a multidimensional scaling solution for different numbers of dimensions of the represented space.

Usage

seqmds.stress(seqdist, mds)

Arguments

`seqdist`	a dissimilarity matrix or a dist object (see `dist`)
`mds`	a matrix with coordinates in the represented space (dimension 1 in column 1, dimension 2 in column 2, etc.)

Value

A numerical vector of stress values.
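
What a stress value per number of dimensions looks like can be sketched in base R. The formula below (square root of the squared differences between original and fitted distances, divided by the sum of squared original distances) is one common definition of stress and is an assumption here, as are the helper name `stress_k` and the toy data (`mtcars`); the package may use a different normalisation.

```r
# Toy dissimilarities and a classical MDS solution with 5 dimensions
d <- dist(scale(mtcars))
coords <- cmdscale(d, k = 5)   # matrix of coordinates, one column per dimension

# Hypothetical helper (not part of seqhandbook): stress of the solution
# restricted to its first k dimensions.
stress_k <- function(d, coords, k) {
  fitted <- dist(coords[, seq_len(k), drop = FALSE])
  sqrt(sum((as.vector(d) - as.vector(fitted))^2) / sum(as.vector(d)^2))
}

stress <- sapply(1:5, function(k) stress_k(d, coords, k))
stress
```

Plotting such a vector against the number of dimensions helps decide how many dimensions are needed to represent the dissimilarities adequately.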

Author(s)

Nicolas Robette

References

Piccarreta R., Lior O. (2010). Exploring sequences: a graphical tool based on multi-dimensional scaling, Journal of the Royal Statistical Society (Series A), Vol. 173(1), 165-184.

Examples

data(trajact)
seqact <- seqdef(trajact)
dissim <- seqdist(seqact, method="HAM")
mds <- cmdscale(dissim, k=20, eig=TRUE)
stress <- seqmds.stress(dissim, mds)
plot(stress, type='l', xlab='number of dimensions', ylab='stress')

Sample of marital, parental and residential sequences

Description

A data frame describing matrimonial, parental and residential status from age 14 to age 35. Sequences are sampled (N = 500) from the "Biographies et entourage" survey (INED, 2001).

Usage

data("seqmsa")

Format

A data frame with 500 observations and 66 variables. The first 22 variables (prefixed 'log') describe the residential status at a given age: 0 = not independent, 1 = independent. The next 22 variables (prefixed 'mat') describe the matrimonial status at a given age: 1 = never been in a relationship, 2 = cohabiting union, 3 = married, 4 = separated. The last 22 variables (prefixed 'nenf') describe the parental status at a given age: 0 = no child, 1 = one child, 2 = two children, 3 = three children or more.

Examples

data(seqmsa)
str(seqmsa)

Smoothing sequence data

Description

Smoothing of sequence data, using for each sequence the medoid of the sequences in its neighborhood. The results can be used to get a smoothed index plot.

Usage

seqsmooth(seqdata, diss, k=20, r=NULL)

Arguments

`seqdata`	a sequence object (see `seqdef` function).
`diss`	a dissimilarity matrix, giving the pairwise distances between sequences.
`k`	size of the neighborhood. Default is 20.
`r`	radius of the neighborhood. If NULL (default), the radius is not used for smoothing.

Value

A list with the following elements:

`seqdata`	a sequence object (see `seqdef` function)
`R2`	pseudo-R2 measure of the goodness of fit of the smoothing
`S2`	stress measure of the goodness of fit of the smoothing
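
The medoid-based smoothing idea can be sketched in base R as follows. This is a simplified illustration and not the package's implementation: it assumes that the neighborhood of an observation is made of its `k` nearest observations (itself included) and that the medoid is the member with the smallest sum of distances to the other members. The radius `r` and the `R2` and `S2` measures are not covered, and the helper name `smooth_medoid` and the toy data (`mtcars`) are hypothetical.

```r
# Hypothetical helper (not part of seqhandbook): for each observation,
# index of the medoid of its k nearest neighbours (itself included).
smooth_medoid <- function(diss, k) {
  diss <- as.matrix(diss)
  sapply(seq_len(nrow(diss)), function(i) {
    nb <- order(diss[i, ])[seq_len(k)]    # k nearest observations
    sub <- diss[nb, nb, drop = FALSE]     # distances within the neighbourhood
    nb[which.min(rowSums(sub))]           # member closest to all the others
  })
}

d <- dist(scale(mtcars))
idx <- smooth_medoid(d, k = 5)

# "Smoothed" data: each row is replaced by the medoid of its neighbourhood
smoothed <- mtcars[idx, ]
```

With `k = 1` each observation is its own medoid, so nothing is smoothed; larger values of `k` replace more observations by a typical neighbour.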

Author(s)

Nicolas Robette

References

Piccarreta R. (2012). Graphical and Smoothing Techniques for Sequence Analysis, Sociological Methods and Research, Vol. 41(2), 362-380.

Examples

data(trajact)
seqact <- seqdef(trajact)
dissim <- seqdist(seqact, method="LCS")
mds <- cmdscale(dissim, k=1)
smoothed <- seqsmooth(seqact, dissim, k=30)$seqdata
seqIplot(smoothed, sortv=mds, xtlab=14:50, with.legend=FALSE, yaxis=FALSE, ylab=NA)

Sample of sociodemographic variables

Description

A data frame with sociodemographic variables for a sample of 500 interviewees from the "Biographies et entourage" survey (INED, 2001).

Usage

data("socdem")

Format

A data frame with 500 observations on the following 9 variables.

annais: year of birth (numeric)
nbenf: number of children (factor)
nbunion: number of relationships (factor)
mereactive: whether mother was active or not (factor)
sexe: gender (factor)
PCS: occupational category (factor)
PCSpere: occupational category of the father (factor)
diplome: degree (factor)
nationalite: nationality (factor)

Examples

data(socdem)
str(socdem)

Symmetric (or canonical) PLS

Description

Computes symmetric (or canonical) PLS for two groups of continuous variables

Usage

symPLS(a,b)

Arguments

`a`	data frame of the first group of continuous variables
`b`	data frame of the second group of continuous variables
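
The principle behind a symmetric PLS can be sketched in base R for the first pair of components only. The sketch relies on the standard result that the weight vectors maximising the covariance between a linear combination of the first group and a linear combination of the second group are the leading singular vectors of the cross-product matrix of the centred data. It is an assumption that this matches what `symPLS` does internally; later components and the structure of the returned object are not covered, and the toy data (`iris`) are only for illustration.

```r
# Two groups of continuous variables, centred column by column
X <- scale(as.matrix(iris[, 1:2]), scale = FALSE)
Y <- scale(as.matrix(iris[, 3:4]), scale = FALSE)

# Leading singular vectors of X'Y give the first pair of weight vectors
sv <- svd(crossprod(X, Y))
u <- sv$u[, 1]
v <- sv$v[, 1]

# First pair of components (scores)
t_x <- X %*% u
t_y <- Y %*% v

# Their covariance equals the first singular value divided by (n - 1)
cov_xy <- as.numeric(cov(t_x, t_y))
cov_xy
```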

Author(s)

Nicolas Robette, Xavier Bry

References

Bry X. (1996). Analyses Factorielles Multiples. Paris, Economica Poche.
de Jong S., Wise B.M. and Ricker N.L. (2001). Canonical Partial Least Squares and Continuum Power Regression. Journal of Chemometrics, Vol. 15, 85–100.

Sample of employment histories

Description

A data frame describing the employment status from age 14 to age 50. It is a sample of 500 interviewees from the "Biographies et entourage" survey (INED, 2001).

Usage

data("trajact")

Format

A data frame with 500 observations and 37 variables. Each variable is numeric and describes the employment status at a given age: 1 = education, 2 = full-time job, 3 = part-time job, 4 = small jobs, 5 = inactivity, 6 = military service.

Examples

data(trajact)
str(trajact)
