Package 'descriptio' reference manual

Title:	Descriptive Statistical Analysis
Description:	Description of statistical associations between variables : measures of local and global association between variables (phi, Cramér V, correlations, eta-squared, Goodman and Kruskal tau, permutation tests, etc.), multiple graphical representations of the associations between variables (using 'ggplot2') and weighted statistics.
Authors:	Nicolas Robette [aut, cre]
Maintainer:	Nicolas Robette <[email protected]>
License:	GPL (>= 2)
Version:	1.4
Built:	2025-03-08 04:47:28 UTC
Source:	https://github.com/nicolas-robette/descriptio

Measures the association between a categorical variable and a continuous variable

Description

Measures the association between a categorical variable and a continuous variable

Usage

assoc.catcont(x, y, weights = NULL,
              na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
              nperm = NULL, distrib = "asympt", digits = 3)

Arguments

`x`	the categorical variable (must be a factor)
`y`	the continuous variable (must be a numeric vector)
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm.cat`	logical, indicating whether NA values in the categorical variable (i.e. x) should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variable (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variable (i.e. y) should be silently removed before the computation proceeds. Default is FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).
`digits`	integer. The number of digits (default is 3).
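
The two ways of handling missing categories described above can be illustrated with a minimal base-R sketch. It only mimics the documented behaviour and is not the package's own code; the helper name `recode_na_level` and the toy vectors are hypothetical.

```r
# Hypothetical helper: turn NA into an explicit factor level,
# mimicking the documented effect of na.rm.cat = FALSE.
recode_na_level <- function(x, na.value.cat = "NAs") {
  stopifnot(is.factor(x))
  if (anyNA(x)) {
    levels(x) <- c(levels(x), na.value.cat)  # append the new level
    x[is.na(x)] <- na.value.cat              # assign it to the missing entries
  }
  x
}

# Toy data (made up for illustration)
x <- factor(c("a", NA, "b", "a"))
y <- c(1.5, 2.0, 3.5, 4.0)

# Keep the missing values as their own category
x_kept <- recode_na_level(x)
means_kept <- tapply(y, x_kept, mean)

# Drop observations with a missing category instead (as with na.rm.cat = TRUE)
keep <- !is.na(x)
means_dropped <- tapply(y[keep], droplevels(x[keep]), mean)
```

With the first option the group means include one extra entry for the "NAs" level; with the second, the observation is simply excluded.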

Value

A list with the following elements :

`summary`	summary statistics (mean, median, etc.) of the continuous variable for each level of the categorical variable
`eta.squared`	eta-squared between the two variables
`permutation.pvalue`	p-value from a permutation (i.e. non-parametric) test of independence
`cor`	point biserial correlation between the two variables, for each level of the categorical variable
`cor.perm.pval`	permutation p-value of the correlation between the two variables, for each level of the categorical variable
`test.values`	test-values as proposed by Lebart et al (1984)
`test.values.pval`	p-values corresponding to the test-values
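
For readers unfamiliar with eta-squared, the following base-R sketch computes it from its textbook definition (between-group sum of squares divided by total sum of squares) on the built-in `iris` data. It is an unweighted illustration under that assumption, not necessarily how the package computes `eta.squared`; the function name `eta_squared` is hypothetical.

```r
# Eta-squared as between-group sum of squares over total sum of squares
eta_squared <- function(x, y) {
  stopifnot(is.factor(x), is.numeric(y))
  grand_mean  <- mean(y)
  group_means <- tapply(y, x, mean)
  group_sizes <- tapply(y, x, length)
  ss_between  <- sum(group_sizes * (group_means - grand_mean)^2)
  ss_total    <- sum((y - grand_mean)^2)
  ss_between / ss_total
}

eta2 <- eta_squared(iris$Species, iris$Sepal.Length)

# Cross-check: eta-squared equals the R-squared of a one-way linear model
r2 <- summary(lm(Sepal.Length ~ Species, data = iris))$r.squared
```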

Author(s)

Nicolas Robette

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]

Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.

Examples

data(Movies)
with(Movies, assoc.catcont(Country, Budget, nperm = 10))

Measures the groupwise association between a categorical variable and a continuous variable

Description

Measures the association between a categorical variable and a continuous variable, for each category of a group variable

Usage

assoc.catcont.by(x, y, by, weights = NULL,
                 na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
                 nperm = NULL, distrib = "asympt", digits = 3)

Arguments

`x`	factor : the categorical variable
`y`	numeric vector : the continuous variable
`by`	factor : the group variable
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm.cat`	logical, indicating whether NA values in the categorical variable (i.e. x) should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variable (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variable (i.e. y) should be silently removed before the computation proceeds. Default is FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).
`digits`	integer. The number of digits (default is 3).

Value

A list of items, one for each category of the group variable. Each item is a list with the following elements :

`summary`	summary statistics (mean, median, etc.) of the continuous variable for each level of the categorical variable
`eta.squared`	eta-squared between the two variables
`permutation.pvalue`	p-value from a permutation (i.e. non-parametric) test of independence
`cor`	point biserial correlation between the two variables, for each level of the categorical variable
`cor.perm.pval`	permutation p-value of the correlation between the two variables, for each level of the categorical variable
`test.values`	test-values as proposed by Lebart et al (1984)
`test.values.pval`	p-values corresponding to the test-values

Author(s)

Nicolas Robette

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]

Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.

Examples

data(Movies)
with(Movies, assoc.catcont.by(Country, Budget, ArtHouse, nperm = 10))

Cross-tabulation and measures of association between two categorical variables

Description

Cross-tabulation and measures of association between two categorical variables

Usage

assoc.twocat(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs",
             nperm = NULL, distrib = "asympt")

Arguments

`x`	the first categorical variable (must be a factor)
`y`	the second categorical variable (must be a factor)
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`asympt`, default) or via Monte Carlo resampling (`approx`).

Value

A list of lists with the following elements :

tables list :

`freq`	cross-tabulation frequencies
`prop`	percentages
`rprop`	row percentages
`cprop`	column percentages
`expected`	expected values

global list :

`chi.squared`	chi-squared value
`cramer.v`	Cramer's V between the two variables
`permutation.pvalue`	p-value from a permutation (i.e. non-parametric) test of independence
`global.pem`	global PEM
`GK.tau.xy`	Goodman and Kruskal tau (forward association, i.e. x is the predictor and y is the response)
`GK.tau.yx`	Goodman and Kruskal tau (backward association, i.e. y is the predictor and x is the response)

local list :

`std.residuals`	the table of standardized (i.e. Pearson) residuals.
`adj.residuals`	the table of adjusted standardized residuals.
`adj.res.pval`	the table of p-values of adjusted standardized residuals.
`odds.ratios`	the table of odds ratios.
`local.pem`	the table of local PEM
`phi`	the table of the phi coefficients for each pair of levels
`phi.perm.pval`	the table of permutation p-values for each pair of levels

gather : a data frame gathering the information above, with one row per cell of the cross-tabulation.
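
As an illustration of two of the global measures listed above, the sketch below computes Cramér's V and Goodman and Kruskal's tau from their usual textbook formulas on the built-in `HairEyeColor` data. It is unweighted and is not taken from the package source; the function name `gk_tau` is hypothetical.

```r
# Two-way table: Hair (rows) by Eye (columns), summed over Sex
tab <- margin.table(HairEyeColor, c(1, 2))
n <- sum(tab)

# Cramér's V from the Pearson chi-squared statistic
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
cramer_v <- sqrt(chi2 / (n * (min(dim(tab)) - 1)))

# Goodman and Kruskal tau, with rows as predictor and columns as response:
# proportional reduction in prediction error for the column variable
gk_tau <- function(tab) {
  p     <- tab / sum(tab)
  p_row <- rowSums(p)
  p_col <- colSums(p)
  explained <- sum(sweep(p^2, 1, p_row, "/")) - sum(p_col^2)
  explained / (1 - sum(p_col^2))
}

tau_xy <- gk_tau(tab)     # Hair predicts Eye
tau_yx <- gk_tau(t(tab))  # Eye predicts Hair
```

Unlike Cramér's V, tau is asymmetric, which is why the two directions generally give different values.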

Note

The adjusted standardized residuals are strictly equivalent to test-values for nominal variables as proposed by Lebart et al (1984).
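
The difference between standardized (Pearson) and adjusted standardized residuals can be sketched in base R as follows, using the formulas given in Agresti (2007). The two-sided normal p-value in the last line is an assumption about how such residuals are usually tested, not a statement about the package internals.

```r
tab  <- margin.table(HairEyeColor, c(1, 2))
test <- chisq.test(tab)
expected <- test$expected

# Standardized (Pearson) residuals: (observed - expected) / sqrt(expected)
pearson_res <- (tab - expected) / sqrt(expected)

# Adjusted standardized residuals: additionally divide by
# sqrt((1 - row proportion) * (1 - column proportion))
row_p <- rowSums(tab) / sum(tab)
col_p <- colSums(tab) / sum(tab)
adj_res <- pearson_res / sqrt(outer(1 - row_p, 1 - col_p))

# Assumed two-sided p-values under a standard normal reference distribution
adj_pval <- 2 * pnorm(-abs(adj_res))
```

Both sets of residuals are also returned by `chisq.test()` itself, as the `residuals` and `stdres` elements.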

Author(s)

Nicolas Robette

References

Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.

Rakotomalala R., Comprendre la taille d'effet (effect size), http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.

Examples

data(Movies)
assoc.twocat(Movies$Country, Movies$ArtHouse, nperm=100)

Groupwise cross-tabulation and measures of association between two categorical variables

Description

Cross-tabulation and measures of association between two categorical variables, for each category of a group variable

Usage

assoc.twocat.by(x, y, by, weights = NULL, na.rm = FALSE, na.value = "NAs",
                nperm = NULL, distrib = "asympt")

Arguments

`x`	factor : the first categorical variable
`y`	factor : the second categorical variable
`by`	factor : the group variable
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`asympt`, default) or via Monte Carlo resampling (`approx`).

Value

A list of items, one for each category of the group variable. Each item is a list of lists with the following elements :

tables list :

`freq`	cross-tabulation frequencies
`prop`	percentages
`rprop`	row percentages
`cprop`	column percentages
`expected`	expected values

global list :

`chi.squared`	chi-squared value
`cramer.v`	Cramer's V between the two variables
`permutation.pvalue`	p-value from a permutation (i.e. non-parametric) test of independence
`global.pem`	global PEM
`GK.tau.xy`	Goodman and Kruskal tau (forward association, i.e. x is the predictor and y is the response)
`GK.tau.yx`	Goodman and Kruskal tau (backward association, i.e. y is the predictor and x is the response)

local list :

`std.residuals`	the table of standardized (i.e. Pearson) residuals.
`adj.residuals`	the table of adjusted standardized residuals.
`adj.res.pval`	the table of p-values of adjusted standardized residuals.
`odds.ratios`	the table of odds ratios.
`local.pem`	the table of local PEM
`phi`	the table of the phi coefficients for each pair of levels
`phi.perm.pval`	the table of permutation p-values for each pair of levels

gather : a data frame gathering the information above, with one row per cell of the cross-tabulation.
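
To show what a groupwise local measure looks like, here is a base-R sketch that computes a phi coefficient for one pair of levels separately within each group. It assumes the usual definition of phi as the Pearson correlation between two indicator variables and uses the built-in `HairEyeColor` data with `Sex` as the group variable; it is not the package's implementation.

```r
# Expand the contingency table to one row per individual
counts <- as.data.frame(HairEyeColor)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("Hair", "Eye", "Sex")]

# Phi for the pair (Hair = "Blond", Eye = "Blue"): correlation of two indicators
phi_pair <- function(d) {
  cor(as.numeric(d$Hair == "Blond"), as.numeric(d$Eye == "Blue"))
}

# One value per category of the group variable
phi_by_sex <- sapply(split(people, people$Sex), phi_pair)
```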

Note

The adjusted standardized residuals are strictly equivalent to test-values for nominal variables as proposed by Lebart et al (1984).

Author(s)

Nicolas Robette

References

Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.

Rakotomalala R., Comprendre la taille d'effet (effect size), http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.

Examples

data(Movies)
assoc.twocat.by(Movies$Country, Movies$ArtHouse, Movies$Festival, nperm=100)

Measures the association between two continuous variables

Description

Measures the association between two continuous variables with Pearson, Spearman and Kendall correlations.

Usage

assoc.twocont(x, y, weights = NULL, na.rm = FALSE,
              nperm = NULL, distrib = "asympt")

Arguments

`x`	a continuous variable (must be a numeric vector)
`y`	a continuous variable (must be a numeric vector)
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).

Value

A data frame with Pearson, Spearman and Kendall correlations. The correlation value is in the first row and the p-value from a permutation (i.e. non-parametric) test of independence is in the second row.
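
The layout described above (one column per coefficient, values in the first row and permutation p-values in the second) can be reproduced by hand in base R. The sketch below uses a simple Monte Carlo permutation of `y` with a two-sided p-value; this is an assumption made for illustration and may differ from the test the package actually runs.

```r
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

methods <- c("pearson", "spearman", "kendall")

# Observed correlations
obs <- sapply(methods, function(m) cor(x, y, method = m))

# Hand-rolled permutation test: shuffle y, recompute the coefficient,
# and count how often it is at least as extreme as the observed one
set.seed(42)
nperm <- 999
perm_pval <- sapply(methods, function(m) {
  perm <- replicate(nperm, cor(x, sample(y), method = m))
  (sum(abs(perm) >= abs(obs[[m]])) + 1) / (nperm + 1)
})

result <- rbind(value = obs, permutation.pvalue = perm_pval)
```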

Author(s)

Nicolas Robette

Examples

## Hollander & Wolfe (1973), p. 187f.
## Assessment of tuna quality.  We compare the Hunter L measure of
##  lightness to the averages of consumer panel scores (recoded as
##  integer values from 1 to 6 and averaged over 80 such values) in
##  9 lots of canned tuna.
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
assoc.twocont(x,y,nperm=100)

Measures the groupwise association between two continuous variables

Description

Measures the association between two continuous variables with Pearson, Spearman and Kendall correlations, for each category of a group variable.

Usage

assoc.twocont.by(x, y, by, weights = NULL, na.rm = FALSE,
                 nperm = NULL, distrib = "asympt")

Arguments

`x`	numeric vector : a continuous variable
`y`	numeric vector : a continuous variable
`by`	factor : the group variable
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).

Value

A list of items, one for each category of the group variable. Each item is a data frame with Pearson, Spearman and Kendall correlations. The correlation value is in the first row and the p-value from a permutation (i.e. non-parametric) test of independence is in the second row.
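
The "one item per category of the group variable" structure can be sketched in base R with `split()` and `lapply()`. This only illustrates the shape of a groupwise computation (correlations without permutation tests); it is not the package's code.

```r
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
group <- factor(c("A", "B", "C", "C", "B", "A", "A", "C", "B"))

methods <- c("pearson", "spearman", "kendall")
d <- data.frame(x = x, y = y)

# Split the observations by group, then compute the three coefficients
# within each subset; the result is a named list with one entry per level
by_group <- lapply(split(d, group), function(sub) {
  sapply(methods, function(m) cor(sub$x, sub$y, method = m))
})
```

With only three observations per group, as here, the coefficients are very unstable and serve only to show the structure of the output.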

Author(s)

Nicolas Robette

Examples

## Hollander & Wolfe (1973), p. 187f.
## Assessment of tuna quality.  We compare the Hunter L measure of
##  lightness to the averages of consumer panel scores (recoded as
##  integer values from 1 to 6 and averaged over 80 such values) in
##  9 lots of canned tuna.
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
group <- factor(c("A","B","C","C","B","A","A","C","B"))
assoc.twocont.by(x,y,group,nperm=100)

Bivariate association measures between pairs of variables.

Description

Computes bivariate association measures between every pair of variables from a data frame.

Usage

assoc.xx(x, weights = NULL, correlation = "kendall",
  na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
  nperm = NULL, distrib = "asympt", dec = c(3,3))

Arguments

`x`	the data frame of variables
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`correlation`	character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default).
`na.rm.cat`	logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).
`dec`	vector of 2 integers for number of decimals. The first value is for association measures, the second for permutation p-values. Default is c(3,3).

Details

The function computes an association measure : Pearson's, Spearman's or Kendall's correlation for pairs of numeric variables, Cramer's V for pairs of factors and eta-squared for pairs numeric-factor. It can also compute the p-value of a permutation test of association for each pair of variables.
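
The rule above, which picks the measure according to the types of the two variables, can be sketched as a small dispatch function in base R. The function name `pair_measure` and the label strings it returns are hypothetical; the actual contents of the `measure` column may be worded differently.

```r
# Hypothetical dispatch: which measure applies to a given pair of variables
pair_measure <- function(a, b, correlation = "kendall") {
  if (is.numeric(a) && is.numeric(b)) {
    paste(correlation, "correlation")   # numeric / numeric
  } else if (is.factor(a) && is.factor(b)) {
    "Cramer's V"                        # factor / factor
  } else {
    "eta-squared"                       # numeric / factor
  }
}

iris2 <- iris
iris2$Versicolor <- factor(iris$Species == "versicolor")

# Every unordered pair of columns
pairs <- combn(names(iris2), 2)
measures <- apply(pairs, 2, function(p) pair_measure(iris2[[p[1]]], iris2[[p[2]]]))

overview <- data.frame(var1 = pairs[1, ], var2 = pairs[2, ], measure = measures)
```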

Value

A table with the following elements :

`measure`	: name of the association measure
`association`	: value of the association measure
`permutation.pvalue`	: p-value from the permutation test

Author(s)

Nicolas Robette

Examples

data(iris)
iris2 <- iris
iris2$Species <- factor(iris$Species == "versicolor")
assoc.xx(iris2, nperm = 10)

Bivariate association measures between a response and predictor variables.

Description

Computes bivariate association measures between a response and predictor variables (and, optionally, between every pair of predictor variables).

Usage

assoc.yx(y, x, weights = NULL, xx = TRUE, correlation = "kendall",
  na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
  nperm = NULL, distrib = "asympt", dec = c(3,3))

Arguments

`y`	the response variable
`x`	the predictor variables
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`xx`	logical. Whether the association measures should also be computed for pairs of predictor variables. Default is TRUE. With a lot of predictors, consider setting xx to FALSE (for reasons of computation time).
`correlation`	character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default).
`na.rm.cat`	logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).
`dec`	vector of 2 integers for number of decimals. The first value is for association measures, the second for permutation p-values. Default is c(3,3).

Value

A list of the following items :

`YX`	: a table with the association measures between the response and predictor variables
`XX`	: a table with the association measures between every pair of predictor variables

In each table :

`measure`	: name of the association measure
`association`	: value of the association measure
`permutation.pvalue`	: p-value from the permutation test

Author(s)

Nicolas Robette

Examples

data(iris)
iris2 <- iris
iris2$Species <- factor(iris$Species == "versicolor")
assoc.yx(iris2$Species, iris2[,1:4], nperm = 10)

Measures the association between a categorical variable and some continuous and/or categorical variables

Description

Measures the association between a categorical variable and some continuous and/or categorical variables

Usage

catdesc(y, x, weights = NULL,
        na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
        measure = "phi", limit = NULL, correlation = "kendall", robust = TRUE,
        nperm = NULL, distrib = "asympt", digits = 2)

Arguments

`y`	the categorical variable to describe (must be a factor)
`x`	a data frame with continuous and/or categorical variables
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm.cat`	logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE.
`measure`	character. The measure of local association between categories of categorical variables. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized (i.e. Pearson) residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence.
`limit`	for the relationship between y and a categorical variable, only associations greater than or equal to `limit` will be displayed. If NULL (default), they are all displayed.
`correlation`	character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default).
`robust`	logical. If TRUE (default), median and mad are used instead of mean and standard deviation.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).
`digits`	numeric. Number of digits for mean, median, standard deviation and mad. Default is 2.
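
The effect of the `robust` option can be illustrated with base R alone. The toy vector below is made up, and the use of `mad()` with its default scaling constant (1.4826) is an assumption; the package may compute the mad differently.

```r
# Toy values with one extreme observation (made up for illustration)
values <- c(1, 2, 3, 4, 100)

# Non-robust summary: strongly affected by the extreme value
classic <- c(mean = mean(values), sd = sd(values))

# Robust summary: median and median absolute deviation
robust <- c(median = median(values), mad = mad(values))
```

Here the mean is pulled up to 22 by the single extreme value, while the median stays at 3.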

Value

A list of the following items :

`variables`	associations between y and the variables in x
`bylevel`	a list with one element for each level of y

Each element in bylevel has the following items :

`categories`	a data frame with categorical variables from x and local associations
`continuous.var`	a data frame with continuous variables from x and associations measured by correlation coefficients

Note

If nperm is not NULL, permutation tests of independence are computed and the p-values from these tests are provided.

Author(s)

Nicolas Robette

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]

Examples

data(Movies)
catdesc(Movies$ArtHouse, Movies[,c("Budget","Genre","Country")])

Bivariate statistics between a categorical variable and a set of variables

Description

Computes bivariate statistics for a set of variables according to the subgroups of observations defined by a categorical variable.

Usage

cattab(x, y, weights = NULL, percent = "column",
       robust = TRUE, show.n = TRUE, show.asso = TRUE,
       digits = c(1,1), na.rm = TRUE, na.value = "NAs")

Arguments

`x`	data frame. The variables which are described in rows. They can be numerical or factors.
`y`	factor. The categorical variable which defines subgroups of observations described in columns.
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`percent`	character. Whether to compute row percentages ("row") or column percentages ("column", default).
`robust`	logical. Whether to use medians instead of means. Default is TRUE.
`show.n`	logical. Whether to display frequencies (between brackets) in addition to the percentages. Default is TRUE.
`show.asso`	logical. Whether to add a column with measures of global association (Cramer's V and eta-squared). Default is TRUE.
`digits`	vector of 2 integers. The first value sets the number of digits for percentages, the second one sets the number of digits for medians and means. Default is c(1,1). If NULL, the results are not rounded.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is TRUE. If FALSE, an additional level is added to the variables (see `na.value` argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
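
The difference between row and column percentages, and the role of weights, can be sketched with base R's `xtabs()` and `prop.table()`. The data frame below is made up for illustration, and the sketch does not reproduce the gt table that the function returns.

```r
# Toy weighted data (made up for illustration)
d <- data.frame(
  genre   = factor(c("Drama", "Comedy", "Drama", "Comedy", "Drama")),
  country = factor(c("FR", "FR", "US", "US", "US")),
  w       = c(1, 2, 1, 1, 3)
)

# Weighted cross-tabulation: cells hold sums of weights, not raw counts
tab <- xtabs(w ~ genre + country, data = d)

# Column percentages: each column sums to 100
col_pct <- 100 * prop.table(tab, margin = 2)

# Row percentages: each row sums to 100
row_pct <- 100 * prop.table(tab, margin = 1)
```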

Details

The function uses gtsummary package to build the table of statistics, and then gt package to finalize the layout. Weights are handled silently with survey package.

Besides, the function is compatible with the attribute labels assigned with labelled package : these labels are displayed automatically.

Value

An object of class gt_tbl.

Note

This function is quite similar to profiles, but displays the results in a fancier way.

Author(s)

Nicolas Robette

Examples

data(Movies)
cattab(x = Movies[, c("Genre", "ArtHouse", "Critics", "BoxOffice")],
       y = Movies$Country)

Measures the association between a continuous variable and some continuous and/or categorical variables

Description

Measures the association between a continuous variable and some continuous and/or categorical variables

Usage

condesc(y, x, weights = NULL,
        na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
        limit = NULL, correlation = "kendall", robust = TRUE,
        nperm = NULL, distrib = "asympt", digits = 2)

Arguments

`y`	the continuous variable to describe
`x`	a data frame with continuous and/or categorical variables
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm.cat`	logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE.
`limit`	for the relationship between y and a category of a categorical variable, only associations (point-biserial correlations) greater than or equal to `limit` will be displayed. If NULL (default), they are all displayed.
`correlation`	character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default).
`robust`	logical. If TRUE (default), median and mad are used instead of mean and standard deviation.
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).
`digits`	numeric. Number of digits for mean, median, standard deviation and mad. Default is 2.

Value

A list of the following items :

`variables`	associations between y and the variables in x
`categories`	a data frame with categorical variables from x and associations measured by point biserial correlation.
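
A point biserial correlation is the Pearson correlation between a continuous variable and a 0/1 indicator. The base-R sketch below computes one such correlation per level of a factor on the built-in `iris` data; it is unweighted and is given as an illustration of the definition, not as the package's implementation.

```r
y <- iris$Sepal.Length
x <- iris$Species

# One point biserial correlation per level: correlate y with the indicator
# "observation belongs to this level"
point_biserial <- sapply(levels(x), function(l) cor(y, as.numeric(x == l)))

# Cross-check for one level: the squared point biserial correlation equals
# the R-squared of a linear model on the same indicator
r2_setosa <- summary(lm(y ~ I(x == "setosa")))$r.squared
```

A negative value means that observations in that level tend to have lower values of `y` than the rest of the sample.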

Note

If nperm is not NULL, permutation tests of independence are computed and the p-values from these tests are provided.

Author(s)

Nicolas Robette

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]

Examples

data(Movies)
condesc(Movies$BoxOffice, Movies[,c("Budget","Genre","Country")])

Bivariate statistics between a continuous variable and a set of variables

Description

Computes bivariate statistics between a continuous variable and a set of variables, possibly according to a strata variable.

Usage

contab(x, y, strata = NULL, weights = NULL, robust = TRUE,
       digits = c(1,3), na.rm = TRUE, na.value = "NAs")

Arguments

`x`	data frame. The variables which are described in rows. They can be numerical or factors.
`y`	numeric vector. The continuous variable whose association with the variables in x is described.
`strata`	optional categorical variable to stratify the table by column. Default is NULL, which means no strata.
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`robust`	logical. Whether to use medians (and mads) instead of means (and standard deviations). Default is TRUE.
`digits`	vector of 2 integers. The first value sets the number of digits for medians, mads, means and standard deviations (categorical variables). The second one sets the number of digits for slopes (continuous variables). Default is c(1,3). If NULL, the results are not rounded.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is TRUE. If FALSE, an additional level is added to the categorical variables with NA values (see `na.value` argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.

Details

For categorical variables in x, the function computes :

- column 1 : the median and the mad of y for each level of the variable

- column 2 : the global association between the variable and y, measured by the eta-squared

For continuous variables in x, it computes :

- column 1 : the slope of the linear regression of y according to the variable

- column 2 : the global association between the variable and y, measured by Pearson and Spearman correlations
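
The quantities listed above can be sketched in base R. This is an unweighted illustration on the built-in iris data, under the assumption that eta-squared is the usual ratio of between-group to total sum of squares; it is not the package's own code.

```r
# Unweighted sketch (weights and strata are ignored here)
y <- iris$Sepal.Length

# Categorical variable: median and mad of y per level, then eta-squared
# (between-group sum of squares divided by total sum of squares)
g <- iris$Species
med <- tapply(y, g, median)
md  <- tapply(y, g, mad)
group_means <- tapply(y, g, mean)
ss_between  <- sum(as.vector(table(g)) * (group_means - mean(y))^2)
ss_total    <- sum((y - mean(y))^2)
eta2 <- ss_between / ss_total

# Continuous variable: slope of the linear regression of y on the variable,
# then Pearson and Spearman correlations
v <- iris$Petal.Length
slope    <- unname(coef(lm(y ~ v))[2])
pearson  <- cor(y, v, method = "pearson")
spearman <- cor(y, v, method = "spearman")
```

With a single categorical predictor, eta-squared coincides with the R-squared of the one-way linear model `lm(y ~ g)`, which gives a quick way to cross-check the value.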

Value

An object of class gt_tbl.

Author(s)

Nicolas Robette

Examples

data(Movies)
contab(x = Movies[, c("Genre", "ArtHouse", "Budget")],
       y = Movies$BoxOffice)

Pretty 2, 3 or 4-way cross-tabulations

Description

Displays pretty 2, 3 or 4-way cross-tabulations, from possibly weighted data, and with the opportunity to color the cells of the table according to a local measure of association (phi coefficients, standardized residuals or PEM).

Usage

crosstab(x, 
         y,
         xstrata = NULL,
         ystrata = NULL,
         weights = NULL,
         stat = "rprop",
         show.n = FALSE,
         show.cramer = TRUE,
         na.rm = FALSE,
         na.value = "NAs",
         digits = 1,
         sort = "none",
         color.cells = FALSE,
         measure = "phi",
         limits = c(-1, 1),
         min.asso = 0.1, 
         palette = "PRGn",
         reverse = FALSE)

Arguments

`x`	the row categorical variable
`y`	the column categorical variable
`xstrata`	optional categorical variable to stratify the table by rows. Default is NULL, which means no row strata.
`ystrata`	optional categorical variable to stratify the table by columns. Default is NULL, which means no column strata.
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`stat`	character. Whether to compute a contingency table ("freq"), percentages ("prop"), row percentages ("rprop", default) or column percentages ("cprop").
`show.n`	logical. Whether to display frequencies (between brackets) in addition to the percentages. Ignored if stat = "freq". Default is FALSE.
`show.cramer`	logical. If TRUE (default), Cramer's V measure of association is displayed beside the table.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see `na.value` argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`digits`	integer. The number of digits (default is 1). If NULL, the results are not rounded.
`sort`	character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done.
`color.cells`	logical, indicating whether the cells of the table should be colored according to local measures of association. Default is FALSE.
`measure`	character. The measure of association used to color the cells. Can be "phi" for phi coefficient (default), "std.residuals" for standardized residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence. Only used if color.cells = TRUE.
`limits`	a numeric vector of length 2 providing limits of the scale. Default is c(-1,1). Only used if color.cells = TRUE.
`min.asso`	numerical value. The cells with a local association below min.asso (in absolute value) are kept blank. Only used if color.cells = TRUE.
`palette`	The colours or colour function that values will be mapped to (see details).
`reverse`	Whether the colors (or color function) in palette should be used in reverse order. For example, if the default order of a palette goes from blue to green, then reverse = TRUE will result in the colors going from green to blue. Default is FALSE. Only used if color.cells = TRUE.

Details

The function uses the gtsummary package to build the cross-tabulation, and then the gt package to finalize the layout and color the cells. Weights are handled silently with the survey package.

The function is also compatible with the variable labels assigned with the labelled package : these labels are displayed automatically.

The palette argument can be any of the following :

1. A character vector of RGB or named colours. Examples: palette(), c("#000000", "#0000FF", "#FFFFFF"), topo.colors(10)

2. The name of an RColorBrewer palette, e.g. "BuPu" or "Greens".

3. The full name of a viridis palette: "viridis", "magma", "inferno", or "plasma".

4. A function that receives a single value between 0 and 1 and returns a colour. Examples: colorRamp(c("#000000", "#FFFFFF"), interpolate="spline").
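
As a small illustration of forms 1 and 4 above, the base R sketch below builds a colour function with `colorRamp()` and derives a plain character vector of colours from it. The variable names `ramp` and `pal` are only illustrative, and linear interpolation is used here for simplicity.

```r
# Form 4: a function mapping values in [0, 1] to colours.
# colorRamp() (grDevices) returns such a function; it yields RGB values
# in a matrix with one row per input value.
ramp <- colorRamp(c("#000000", "#FFFFFF"))
ramp(c(0, 1))

# Form 1: a plain character vector of colours, here obtained by converting
# three points of the ramp to hexadecimal codes
pal <- rgb(ramp(c(0, 0.5, 1)), maxColorValue = 255)
pal
```

Either `ramp` or `pal` could then be passed as the `palette` argument, as described in the list above.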

Value

An object of class gt_tbl.

Author(s)

Nicolas Robette

Examples

data(Movies)
# example 1
crosstab(Movies$Genre, Movies$Country)
# example 2
with(Movies, crosstab(Genre, Country, ystrata = ArtHouse, show.n = TRUE, color.cells = TRUE))

Describes Associations as in a Regression Model Analysis.

Description

Computes bivariate association measures between a response and predictor variables, producing a summary looking like a regression analysis.

Usage

darma(y, x, weights = NULL, target = 1,
      na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
      correlation = "kendall",
      nperm = NULL, distrib = "asympt", dec = c(1,3,3))

Arguments

`y`	the response variable
`x`	the predictor variables
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`target`	rank or name of the category of interest when y is categorical
`na.rm.cat`	logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE.
`correlation`	character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default).
`nperm`	numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
`distrib`	the null distribution of permutation test of independence can be approximated by its asymptotic distribution (`"asympt"`, default) or via Monte Carlo resampling (`"approx"`).
`dec`	vector of 3 integers for number of decimals. The first value is for percents or medians, the second for association measures, the third for permutation p-values. Default is c(1,3,3).

Details

The function computes association measures (phi, correlation coefficient, Kendall's correlation) between the variable of interest and the other variables. It can also compute the p-values of permutation tests.
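
The idea behind a Monte Carlo permutation test of independence can be sketched in base R as follows. This is a generic, unweighted illustration for Kendall's correlation on the built-in iris data; how the package itself runs the test is not shown here, and the names `obs`, `perm` and `pval` are only illustrative.

```r
# Monte Carlo permutation test of independence for a correlation:
# shuffle one variable, recompute the statistic, and compare with the
# observed value.
set.seed(1)
x <- iris$Sepal.Length
y <- iris$Petal.Length
nperm <- 200

obs  <- abs(cor(x, y, method = "kendall"))
perm <- replicate(nperm, abs(cor(x, sample(y), method = "kendall")))

# Two-sided p-value, counting the observed statistic among the permutations
pval <- (1 + sum(perm >= obs)) / (nperm + 1)
pval
```

A larger number of permutations gives a finer-grained p-value, at the cost of computation time.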

Value

A data frame

Author(s)

Nicolas Robette

Examples

  data(iris)
  iris2 = iris
  iris2$Species = factor(iris$Species == "versicolor")
  darma(iris2$Species, iris2[,1:4], target=2, nperm=100)

Association plot

Description

For a cross-tabulation, plots measures of local association with bars of varying height and width, using ggplot2.

Usage

ggassoc_assocplot(data, mapping, measure = "std.residuals",
                  limits = NULL, sort = "none",
                  na.rm = FALSE, na.value = "NAs",
                  colors = NULL, direction = 1, legend = "right")

Arguments

`data`	dataset to use for plot
`mapping`	aesthetics being used. x and y are required, weight can also be specified.
`measure`	character. The measure of association used to fill the rectangles. Can be "phi" for phi coefficient, "or" for odds ratios, "std.residuals" (default) for standardized (i.e. Pearson) residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence.
`limits`	a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data.
`sort`	character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`colors`	vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from `rcartocolors` package is used.
`direction`	Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed.
`legend`	the position of legend ("none", "left", "right", "bottom", "top"). If "none", no legend is displayed.

Details

The measure of local association measures how much each combination of categories of x and y is over/under-represented.

The bars vary in width according to the square root of the expected frequency. They vary in height and color shading according to the measure of association. If the measure chosen is "std.residuals" (Pearson's residuals), as in the original association plot from Cohen and Friendly, the area of the bars is proportional to the difference in observed and expected frequencies.
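
The geometry described above can be checked numerically in base R. The sketch below uses the built-in mtcars data and illustrative variable names; it only reproduces the arithmetic, not the plot.

```r
# Observed and expected counts for a small cross-tabulation
tab <- table(mtcars$cyl, mtcars$gear)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson (standardized) residuals give the bar heights,
# the square root of the expected counts gives the bar widths
height <- (tab - expected) / sqrt(expected)
width  <- sqrt(expected)

# Height times width recovers the difference observed - expected
area <- height * width
```

Because the residual is (observed - expected) / sqrt(expected), multiplying it by sqrt(expected) leaves exactly the deviation from independence, which is why the bar area carries that information.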

This function can be used as a high-level plot with ggduo and ggpairs functions of the GGally package.

Value

a ggplot object

Author(s)

Nicolas Robette

References

Cohen, A. (1980), On the graphical display of the significant components in a two-way contingency table. Communications in Statistics—Theory and Methods, 9, 1025–1041. doi:10.1080/03610928008827940.

Friendly, M. (1992), Graphical methods for categorical data. SAS User Group International Conference Proceedings, 17, 190–200. http://datavis.ca/papers/sugi/sugi17.pdf

Examples

data(Movies)
ggassoc_assocplot(data=Movies, mapping=ggplot2::aes(Country, Genre))

Bar plot of a crosstabulation inspired by Bertin

Description

For a cross-tabulation, plots bars for the conditional percentages of variable y according to variable x, using ggplot2. The general display is inspired by Bertin's plots.

Usage

ggassoc_bertin(data, mapping, prop.width = FALSE, 
sort = "none", add.gray = FALSE, add.rprop = FALSE,
na.rm = FALSE, na.value ="NAs")

Arguments

`data`	dataset to use for plot
`mapping`	aesthetics being used. x and y are required, weight can also be specified.
`prop.width`	logical. If TRUE, the width of the bars is proportional to the margin percentages of variable x.
`sort`	character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only variable x is sorted. If "y", only variable y is sorted. If "none" (default), no sorting is done.
`add.gray`	logical. If FALSE (default), only white and black are used to fill the bars. If TRUE, gray is used additionally to fill the part of the bars corresponding to margin percentages of variable y.
`add.rprop`	logical. If TRUE, row percentages are displayed on top of the bars. Default is FALSE.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.

Details

The height of the bars is proportional to the conditional frequency of variable y. The bars are filled in black if the conditional frequency is higher than the marginal frequency; otherwise they are filled in white.
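
The fill rule can be sketched in base R on the built-in mtcars data. The variable names are illustrative and weights are ignored; this only mirrors the comparison described above.

```r
tab <- table(mtcars$gear, mtcars$cyl)   # x = gear (rows), y = cyl (columns)

# Conditional frequencies of y within each category of x (bar heights)
cond <- prop.table(tab, margin = 1)

# Marginal frequencies of y, the reference for the fill colour
marg <- colSums(tab) / sum(tab)

# Black where the conditional frequency exceeds the marginal one, else white
fill <- ifelse(sweep(cond, 2, marg, ">"), "black", "white")
fill
```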

This graphical representation is inspired by the principles of Jacques Bertin and the online AMADO tool (https://paris-timemachine.huma-num.fr/amado/main.html).

Note : It does not allow faceting.

Value

a ggplot object

Author(s)

Nicolas Robette

References

J. Bertin: La graphique et le traitement graphique de l'information. Flammarion: Paris 1977.

Examples

data(Movies)
ggassoc_bertin(Movies, ggplot2::aes(x = Country, y = Genre))
ggassoc_bertin(Movies, ggplot2::aes(x = Country, y = Genre),
 sort = "both", prop.width = TRUE, add.gray = TRUE, add.rprop = TRUE)

Boxplots with violins

Description

Displays a boxplot and combines it with a violin plot, using ggplot2.

Usage

ggassoc_boxplot(data, mapping, 
na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
axes.labs = TRUE, ticks.labs = TRUE, text.size = 3,
sort = FALSE, box = TRUE, notch = FALSE, violin = TRUE)

Arguments

`data`	dataset to use for plot
`mapping`	aesthetic being used. It must specify x and y.
`na.rm.cat`	logical, indicating whether NA values in the categorical variable (i.e. x) should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variable (see na.value.cat argument).
`na.value.cat`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
`na.rm.cont`	logical, indicating whether NA values in the continuous variable (i.e. y) should be silently removed before the computation proceeds. Default is FALSE.
`axes.labs`	Whether to display the labels of the axes, i.e. the names of x and y. Default is TRUE.
`ticks.labs`	Whether to display the labels of the categories of x and y. Default is TRUE.
`text.size`	Size of the association measure. If NULL, the text is not added to the plot.
`sort`	logical. If TRUE, the levels of the categorical variable are reordered according to the conditional medians, so that boxplots are sorted. Default is FALSE.
`box`	Whether to draw boxplots. Default is TRUE.
`notch`	If FALSE (default) make a standard box plot. If TRUE, make a notched box plot. Notches are used to compare groups; if the notches of two boxes do not overlap, this suggests that the medians are significantly different.
`violin`	Whether to draw a violin plot. Default is TRUE.

Details

The eta-squared measure of global association between x and y is displayed in the upper-left corner of the plot.

This function can be used as a high-level plot with ggduo and ggpairs functions of the GGally package.

Value

a ggplot object

Author(s)

Nicolas Robette

Examples

data(Movies)
ggassoc_boxplot(Movies, mapping = ggplot2::aes(x = Critics, y = ArtHouse))

Plots counts and associations of a crosstabulation

Description

For a cross-tabulation, plots the number of observations by using rectangles with proportional areas, and the phi measures of association between the categories with a diverging gradient of colour, using ggplot2.

Usage

ggassoc_chiasmogram(data, mapping, measure = "phi",
limits = NULL, sort = "none",
na.rm = FALSE, na.value = "NAs",
colors = NULL, direction = 1)

Arguments

`data`	dataset to use for plot
`mapping`	aesthetics being used. x and y are required, weight can also be specified.
`measure`	character. The measure of association used for filling the rectangles. Can be "phi" for phi coefficient (default), "or" for odds ratios, "residuals" for Pearson residuals, "std.residuals" for standardized Pearson residuals or "pem" for local percentages of maximum deviation from independence.
`limits`	a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data.
`sort`	character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`colors`	vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from `rcartocolors` package is used.
`direction`	Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed.

Details

The height of the rectangles is proportional to the marginal frequency of the row variable ; their width is proportional to the marginal frequency of the column variable. So the area of the rectangles is proportional to the expected frequency.
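
The link between margins and expected frequencies can be verified with a few lines of base R. The sketch uses the built-in mtcars data and illustrative names, and ignores weights.

```r
tab <- table(mtcars$cyl, mtcars$gear)
n <- sum(tab)

# Heights follow the row margins, widths follow the column margins
height <- rowSums(tab) / n
width  <- colSums(tab) / n

# The area of each rectangle (height x width), multiplied by n,
# equals the expected count under independence
area <- outer(height, width)
expected <- suppressWarnings(chisq.test(tab))$expected
```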

The rectangles are filled according to a measure of local association, which measures how much each combination of categories of x and y is over/under-represented.

This function can be used as a high-level plot with ggduo and ggpairs functions of the GGally package.

Note : It does not allow faceting.

Value

a ggplot object

Author(s)

Nicolas Robette

References

Bozon Michel, Héran François. La découverte du conjoint. II. Les scènes de rencontre dans l'espace social. Population, 43(1), 1988, pp. 121-150.

Examples

data(Movies)
ggassoc_chiasmogram(data=Movies, mapping=ggplot2::aes(Genre, Country))

Proportional area plot

Description

For a cross-tabulation, plots the observed (or expected) frequencies by using rectangles with proportional areas, and the measures of local association between the categories with a diverging gradient of colour, using ggplot2.

Usage

ggassoc_crosstab(data, mapping, size = "freq", max.size =  20,
                 measure = "phi", limits = NULL, sort = "none", 
                 na.rm = FALSE, na.value = "NAs",
                 colors = NULL, direction = 1, legend = "right")

Arguments

`data`	dataset to use for plot
`mapping`	aesthetics being used. x and y are required, weight can also be specified.
`size`	character. If "freq" (default), areas are proportional to observed frequencies. If "expected", they are proportional to expected frequencies.
`max.size`	numeric value, specifying the maximum size of the squares. Default is 20.
`measure`	character. The measure of association used for filling the rectangles. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence.
`limits`	a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data.
`sort`	character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`colors`	vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from `rcartocolors` package is used.
`direction`	Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed.
`legend`	the position of legend ("none", "left", "right", "bottom", "top"). If "none", no legend is displayed.

Details

The measure of local association measures how much each combination of categories of x and y is over/under-represented.

The areas of the rectangles are proportional to observed or expected frequencies. Their color shading varies according to the measure of association.
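
As an illustration of a local measure of association, the phi coefficient of a cell can be sketched in base R as the Pearson correlation between two 0/1 indicators. This is the usual textbook definition, assumed here rather than taken from the package's code; the data (mtcars) and variable names are illustrative.

```r
x <- factor(mtcars$cyl)
y <- factor(mtcars$gear)

# Phi for a cell = Pearson correlation between the two 0/1 indicators
# "x is in this row category" and "y is in this column category"
phi <- sapply(levels(y), function(cj) {
  sapply(levels(x), function(ri) cor(as.numeric(x == ri), as.numeric(y == cj)))
})
round(phi, 2)
```

Positive values flag over-represented combinations of categories, negative values under-represented ones.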

This function can be used as a high-level plot with ggduo and ggpairs functions of the GGally package.

Value

a ggplot object

Author(s)

Nicolas Robette

Examples

data(Movies)
ggassoc_crosstab(data=Movies, mapping=ggplot2::aes(Genre, Country))

Marimekko plot

Description

For a cross-tabulation, plots a marimekko chart (also called mosaic plot), using ggplot2.

Usage

ggassoc_marimekko(data, mapping, type = "classic", 
measure = "phi", limits = NULL, 
na.rm = FALSE, na.value = "NAs",
palette = NULL, colors = NULL, direction = 1, 
linecolor = "gray60", linewidth = 0.1, 
sort = "none", legend = "right")

Arguments

`data`	dataset to use for plot
`mapping`	aesthetics being used. x and y are required, weight can also be specified.
`type`	character. If "classic" (default), a simple marimekko chart is plotted, with no use of local associations. If type is "shades", tiles are shaded according to the local associations between categories. If type is "patterns", tiles are filled with patterns, and the density of patterns is proportional to the absolute level of local association between categories.
`measure`	character. The measure of association used for filling (if type is "shades") or patterning (if type is "patterns") the tiles. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized (i.e. Pearson) residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence.
`limits`	a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data. Only used for type "shades".
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`palette`	A character vector of color codes. The number of colors should be equal or higher than the number of categories in y. If NULL (default), the "Tableau" palette from `ggthemes` package is used. Only used for types "classic" and "patterns".
`colors`	vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from `rcartocolors` package is used. Only used for type "shades".
`direction`	Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed.
`linecolor`	character. Color of the contour lines of the tiles. Default is gray60.
`linewidth`	numeric. Width of the contour lines of the tiles. Default is 0.1.
`sort`	character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done.
`legend`	the position of legend ("none", "left", "right", "bottom", "top"). If "none", no legend is displayed.

Details

The measure of local association measures how much each combination of categories of x and y is over/under-represented.

This function can be used as a high-level plot with ggduo and ggpairs functions of the GGally package.

Note : It does not allow faceting.

Value

a ggplot object

Author(s)

Nicolas Robette

References

Hartigan, J.A., and Kleiner, B. (1984), "A mosaic of television ratings". The American Statistician, 38, 32–35.

Friendly, M. (1994), "Mosaic displays for multi-way contingency tables". Journal of the American Statistical Association, 89, 190–200.

Examples

data(Movies)
ggassoc_marimekko(data=Movies, mapping=ggplot2::aes(Genre, Country))
ggassoc_marimekko(data=Movies, mapping=ggplot2::aes(Genre, Country), type = "patterns")
ggassoc_marimekko(data=Movies, mapping=ggplot2::aes(Genre, Country), type = "shades")

Bar plot of measures of local association of a crosstabulation

Description

For a cross-tabulation, plots the measures of local association with bars of varying height, using ggplot2.

Usage

ggassoc_phiplot(data, mapping, measure = "phi", 
                limit = NULL, sort = "none",
                na.rm = FALSE, na.value = "NAs")

Arguments

`data`	dataset to use for plot
`mapping`	aesthetics being used. x and y are required, weight can also be specified.
`measure`	character. The measure of association used for filling the rectangles. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence.
`limit`	numeric value, specifying the upper limit of the scale for the height of the bars, i.e. for the measures of association (the lower limit is set to -limit). It corresponds to the maximum absolute value of association one wants to represent in the plot. If NULL (default), the limit is automatically adjusted to the data.
`sort`	character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.

Details

The measure of association measures how much each combination of categories of x and y is over/under-represented. The bars vary in width according to the number of observations in the categories of the column variable. They vary in height according to the measure of association. Bars are black if the association is positive and white if it is negative.

The genuine version of this plot (see Cibois, 2004) uses the measure of association called "pem", i.e. the local percentages of maximum deviation from independence.

This function can be used as a high-level plot with ggduo and ggpairs functions of the GGally package.

Value

a ggplot object

Author(s)

Nicolas Robette

References

Cibois Philippe, 2004, Les écarts à l'indépendance. Techniques simples pour analyser des données d'enquêtes, Collection "Méthodes quantitatives pour les sciences sociales"

Examples

data(Movies)
ggassoc_phiplot(data=Movies, mapping=ggplot2::aes(Country, Genre))

Scatter plot with a smoothing line

Description

Displays a scatter plot and adds a smoothing line, using ggplot2.

Usage

ggassoc_scatter(data, mapping, na.rm = FALSE,
axes.labs = TRUE, ticks.labs = TRUE, text.size = 3)

Arguments

`data`	dataset to use for plot
`mapping`	aesthetic being used. It must specify x and y.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.
`axes.labs`	Whether to display the labels of the axes, i.e. the names of x and y. Default is TRUE.
`ticks.labs`	Whether to display the labels of the categories of x and y. Default is TRUE.
`text.size`	Size of the association measure. If NULL, the text is not added to the plot.

Details

Kendall's tau rank correlation between x and y is displayed in the upper-left corner of the plot.

Smoothing is performed with gam.
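
For reference, Kendall's tau can be computed in base R with `cor()`. The sketch below uses the built-in mtcars data and illustrative names; it does not reproduce the plot or the smoothing.

```r
# Kendall's tau between two continuous variables (base R, unweighted)
tau <- cor(mtcars$wt, mtcars$mpg, method = "kendall")

# A rank correlation only depends on the ordering of the values:
# a strictly increasing transformation leaves it unchanged
tau_log <- cor(log(mtcars$wt), mtcars$mpg, method = "kendall")
c(tau = tau, tau_log = tau_log)
```

This invariance is one reason a rank correlation suits skewed variables, where a linear correlation would be sensitive to extreme values.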

This function can be used as a high-level plot with ggduo and ggpairs functions of the GGally package.

Value

a ggplot object

Author(s)

Nicolas Robette

Examples

data(Movies)
ggassoc_scatter(Movies, mapping = ggplot2::aes(x = Budget, y = Critics))
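
The coefficient printed in the corner of the plot is a Kendall's tau. As a minimal sketch, the unweighted coefficient can be obtained with base R's cor(); the vectors below are made-up illustration data, not taken from Movies, and how ggassoc_scatter computes its label internally is not shown here.

```r
# Illustrative data (hypothetical, not from the Movies dataset)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Kendall's tau: (concordant pairs - discordant pairs) / total pairs
# Here 8 of the 10 pairs are concordant and 2 are discordant.
tau <- cor(x, y, method = "kendall")
print(tau)
```

A value close to 1 or -1 indicates a strongly monotonic relation; a value near 0 indicates none.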

Movies (data)

Description

The data concern a sample of 1000 movies which were on screens in France, together with some of their characteristics.

Usage

data(Movies)

Format

A data frame with 1000 observations and the following 7 variables:

Budget: numeric vector of movie budgets
Genre: factor with 9 levels.
Country: factor with 4 levels. Country of origin of the movie.
ArtHouse: factor with levels No, Yes. Whether the movie had the "Art House" label.
Festival: factor with levels No, Yes. Whether the movie was selected in the Cannes, Berlin or Venice film festivals.
Critics: numeric vector of average ratings from intellectual criticism.
BoxOffice: numeric vector of number of admissions.

Examples

data(Movies)
str(Movies)

Computes the odds ratios for every cell of a contingency table

Description

Computes the odds ratio for every cell of the cross-tabulation between two categorical variables

Usage

or.table(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs", digits = 3)

Arguments

`x`	the first categorical variable
`y`	the second categorical variable
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`digits`	integer. The number of digits (default is 3). If NULL, the results are not rounded.

Value

A table with the odds ratios

Author(s)

Nicolas Robette

Examples

data(Movies)
or.table(Movies$Country, Movies$ArtHouse)
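
To make the notion of a per-cell odds ratio concrete, here is a base R sketch under one common assumption: each cell (i, j) is collapsed into a 2 x 2 table ("row i vs. other rows" by "column j vs. other columns") and the usual ad/bc ratio is taken. The helper name cell_odds_ratios and the counts are hypothetical; this is not necessarily how or.table is implemented.

```r
# Illustrative 2 x 3 contingency table (made-up counts)
tab <- matrix(c(30, 10, 20, 40, 25, 25), nrow = 2,
              dimnames = list(x = c("a", "b"), y = c("u", "v", "w")))

# Hypothetical helper: odds ratio of each cell against the rest of the table
cell_odds_ratios <- function(tab) {
  n <- sum(tab)
  R <- matrix(rowSums(tab), nrow(tab), ncol(tab))                # row totals
  C <- matrix(colSums(tab), nrow(tab), ncol(tab), byrow = TRUE)  # column totals
  a <- tab              # in row i and in column j
  b <- R - tab          # in row i, not in column j
  c <- C - tab          # not in row i, in column j
  d <- n - R - C + tab  # neither in row i nor in column j
  (a * d) / (b * c)
}

or <- cell_odds_ratios(tab)
print(round(or, 3))
```

Values above 1 indicate over-representation of the cell, values below 1 under-representation. Cells whose complementary counts are zero yield Inf or NaN in this sketch.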

Computes the local and global Percentages of Maximum Deviation from Independence (pem)

Description

Computes the local and global Percentages of Maximum Deviation from Independence (pem) of a contingency table.

Usage

pem.table(x, y, weights = NULL, sort = FALSE, na.rm = FALSE, na.value = "NAs", digits = 1)

Arguments

`x`	the first categorical variable
`y`	the second categorical variable
`weights`	an optional numeric vector of weights (by default, a vector of 1 for uniform weights)
`sort`	logical. Whether rows and columns are sorted according to a correspondence analysis or not (default is FALSE).
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`digits`	integer. The number of digits (default is 1). If NULL, the results are not rounded.

Details

The Percentage of Maximum Deviation from Independence (pem) is an association measure for contingency tables. It also provides attraction (resp. repulsion) measures in each cell of the cross-tabulation (see Cibois, 1993). It is an alternative to the chi-squared statistic, Cramer's V coefficient, etc.

Value

Returns a list:

`peml`	Table with local percentages of maximum deviation from independence
`pemg`	Numeric value, i.e. the global percentage of maximum deviation from independence

Author(s)

Nicolas Robette

References

Cibois P., 1993, Le pem, pourcentage de l'écart maximum : un indice de liaison entre modalités d'un tableau de contingence, Bulletin de méthodologie sociologique, n°40, p. 43-63. https://cibois.pagesperso-orange.fr/bms93.pdf

Examples

data(Movies)
pem.table(Movies$Country, Movies$ArtHouse)
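
The following base R sketch illustrates the idea behind the local pem under a stated assumption: the deviation from independence in a cell is divided by the largest deviation of the same sign that the margins allow, and expressed as a percentage. The helper local_pem and the counts are hypothetical, the global pem is not covered, and pem.table may differ in details.

```r
# Illustrative 2 x 2 contingency table (made-up counts)
tab <- matrix(c(30, 10, 20, 40), nrow = 2)

# Hypothetical helper: local percentages of maximum deviation
local_pem <- function(tab) {
  n <- sum(tab)
  R <- matrix(rowSums(tab), nrow(tab), ncol(tab))
  C <- matrix(colSums(tab), nrow(tab), ncol(tab), byrow = TRUE)
  expected <- R * C / n          # counts under independence
  dev <- tab - expected          # observed deviation
  upper <- pmin(R, C)            # largest count the margins allow
  lower <- pmax(0, R + C - n)    # smallest count the margins allow
  max_dev <- ifelse(dev >= 0, upper - expected, expected - lower)
  pem <- ifelse(max_dev == 0, 0, 100 * dev / max_dev)
  dimnames(pem) <- dimnames(tab)
  pem
}

pem <- local_pem(tab)
print(pem)
```

In this example the first cell holds 30 observations against 20 expected, and the margins allow at most 40, so the attraction is 10 / 20, i.e. 50 percent of the maximum.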

Computes the phi coefficient for every cell of a contingency table

Description

Computes the phi coefficient for every cell of the cross-tabulation between two categorical variables

Usage

phi.table(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs", digits = 3)

Arguments

`x`	the first categorical variable
`y`	the second categorical variable
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`digits`	integer. The number of digits (default is 3). If NULL, the results are not rounded.

Value

A table with the phi coefficients

Author(s)

Nicolas Robette

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Examples

data(Movies)
phi.table(Movies$Country, Movies$ArtHouse)
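
As an illustration of what a per-cell phi coefficient can look like, the sketch below treats each cell as the 2 x 2 table "row i vs. other rows" by "column j vs. other columns" and applies the usual phi formula, which simplifies to the expression in the code. The helper cell_phi and the counts are hypothetical, and this is an assumed definition rather than a copy of phi.table.

```r
# Illustrative 2 x 2 contingency table (made-up counts)
tab <- matrix(c(30, 10, 20, 40), nrow = 2)

# Hypothetical helper: phi coefficient of each cell against the rest
cell_phi <- function(tab) {
  n <- sum(tab)
  R <- matrix(rowSums(tab), nrow(tab), ncol(tab))
  C <- matrix(colSums(tab), nrow(tab), ncol(tab), byrow = TRUE)
  # (ad - bc) / sqrt(row and column totals of the collapsed 2 x 2 table)
  (n * tab - R * C) / sqrt(R * (n - R) * C * (n - C))
}

phi <- cell_phi(tab)
print(round(phi, 3))
```

Positive values indicate attraction between the two categories and negative values repulsion, on a scale from -1 to 1.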

Profiles by level of a categorical variable

Description

Computes profiles (frequencies or percentages) for subgroups of observations defined by the levels of a categorical variable.

Usage

profiles(X, y, weights = NULL, stat = "cprop",
 mar = TRUE, digits = 1)

Arguments

`X`	data frame. The variables which are described in the profiles. There should be only factors.
`y`	factor. The categorical variable which defines subgroups of observations whose profiles will be computed.
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`stat`	character. Whether to compute frequencies ("freq"), percentages ("prop"), row percentages ("rprop") or column percentages ("cprop", default).
`mar`	logical, indicating whether to compute margins. Default is TRUE.
`digits`	numeric. Number of digits. Default is 1.

Value

A data frame with profiles in columns

Author(s)

Nicolas Robette

Examples

data(Movies)
profiles(Movies[,c(2,4,5)], Movies$Country)

Univariate and Multivariate Regressions and Their Average Marginal Effects

Description

Computes linear or binomial regressions in two steps: univariate regressions and a multivariate regression. All the results are displayed side by side as average marginal effects.

Usage

regtab(x, y, weights = NULL, continuous = "slopes", 
 show.ci = TRUE, conf.level = 0.95)

Arguments

`x`	data frame. The explanatory (i.e. independent) variables used in regressions. They can be numerical or factors.
`y`	vector. The outcome (i.e. dependent) variable. It can be numerical (linear regression) or a factor with 2 levels (binomial regression).
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`continuous`	character. The kind of average marginal effects computed for continuous explanatory variables. If "slopes" (default), these are average marginal slopes. If "predictions", these are average marginal predictions for a set of values.
`show.ci`	logical. Whether to display the confidence intervals. Default is TRUE.
`conf.level`	numerical value. Defaults to 0.95, which corresponds to a 95 percent confidence interval. Must be strictly greater than 0 and less than 1.

Details

This function is basically a wrapper for regression functions in the gtsummary package. It computes a series of univariate regressions (one for each explanatory variable), then a multivariate regression (with all explanatory variables) and displays the results side by side. These results are presented in the form of average marginal effects: average marginal predictions for categorical variables and average marginal slopes (or predictions) for continuous variables.

Besides, the function is compatible with the variable labels assigned with the labelled package: these labels are displayed automatically.

Value

an object of class tbl_merge from gtsummary package

Author(s)

Nicolas Robette

References

Arel-Bundock V, Greifer N, Heiss A (Forthcoming). “How to Interpret Statistical Models Using marginaleffects in R and Python.” Journal of Statistical Software.

Larmarange J., 2024, “Prédictions marginales, contrastes marginaux & effets marginaux”, in Guide-R, Guide pour l’analyse de données d’enquêtes avec R, https://larmarange.github.io/guide-R/analyses/estimations-marginales.html

Examples

data(Movies)
regtab(x = Movies[, c("Genre", "Budget", "Festival", "Critics")],
       y = Movies$BoxOffice)
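
The two-step logic (one regression per explanatory variable, then one regression with all of them) can be sketched in base R. This example uses the built-in mtcars data and only continuous predictors in a linear model without interactions, where the average marginal slope equals the regression coefficient. It does not reproduce regtab's gtsummary output; the object names are illustrative.

```r
# Built-in data; outcome and predictors chosen for illustration only
predictors <- c("wt", "hp")

# Step 1: one univariate regression per explanatory variable
uni <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "mpg"), data = mtcars)
  coef(fit)[[v]]
})

# Step 2: a single multivariate regression with all explanatory variables
multi_fit <- lm(reformulate(predictors, response = "mpg"), data = mtcars)
multi <- coef(multi_fit)[predictors]

# Slopes displayed side by side
side_by_side <- cbind(univariate = uni, multivariate = multi)
print(round(side_by_side, 3))
```

Comparing the two columns shows how much of each variable's apparent effect is shared with the other predictors.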

Cross-tabulation statistics for ggplot2

Description

Computes statistics of a cross-tabulation using assoc.twocat function.

Usage

stat_twocat(mapping = NULL, 
            data = NULL,
            geom = "point",
            position = "identity",
            ...,
            show.legend = NA,
            inherit.aes = TRUE)

Arguments

`mapping`	Set of aesthetic mappings created by `aes()`. If specified and `inherit.aes = TRUE` (the default), it is combined with the default mapping at the top level of the plot. You must supply `mapping` if there is no plot mapping.
`data`	The data to be displayed in this layer. There are three options: If `NULL`, the default, the data is inherited from the plot data as specified in the call to `ggplot()`. A `data.frame`, or other object, will override the plot data. All objects will be fortified to produce a data frame. See `fortify()` for which variables will be created. A `function` will be called with a single argument, the plot data. The return value must be a `data.frame`, and will be used as the layer data. A `function` can be created from a `formula` (e.g. `~ head(.x, 10)`).
`geom`	Override the default connection with `ggplot2::geom_point()`.
`position`	Position adjustment, either as a string naming the adjustment (e.g. `"jitter"` to use `position_jitter`), or the result of a call to a position adjustment function. Use the latter if you need to change the settings of the adjustment.
`...`	Other arguments passed on to `layer()`. These are often aesthetics, used to set an aesthetic to a fixed value, like `colour = "red"` or `size = 3`. They may also be parameters to the paired geom/stat.
`show.legend`	logical. Should this layer be included in the legends? `NA`, the default, includes if any aesthetics are mapped. `FALSE` never includes, and `TRUE` always includes. It can also be a named logical vector to finely select the aesthetics to display.
`inherit.aes`	If `FALSE`, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. `borders()`.

Value

A ggplot2 plot with the added statistic.

Author(s)

Nicolas Robette

Standardized residuals of a contingency table

Description

Computes standardized or adjusted residuals of a (possibly) weighted contingency table

Usage

stdres.table(x, y, weights = NULL, na.rm = FALSE,
  na.value = "NAs", digits = 3, residuals = "std")

Arguments

`x`	the first categorical variable
`y`	the second categorical variable
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`digits`	integer. The number of digits (default is 3). If NULL, the results are not rounded.
`residuals`	If "std" (default), standardized (i.e. Pearson) residuals are computed. If "adj", adjusted standardized residuals are computed.

Value

A table with the residuals

Note

The adjusted standardized residuals are strictly equivalent to the test-values for nominal variables proposed by Lebart et al. (1984).

Author(s)

Nicolas Robette

References

Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.

Rakotomalala R., Comprendre la taille d'effet (effect size), http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.

Examples

data(Movies)
stdres.table(Movies$Country, Movies$ArtHouse)
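
For an unweighted table, the two kinds of residuals can be written out by hand and checked against base R's chisq.test(), which stores Pearson residuals in its residuals component and adjusted standardized residuals in its stdres component. The counts below are made up; the sketch assumes the standard textbook formulas and is not a copy of stdres.table.

```r
# Illustrative 2 x 3 contingency table (made-up counts)
tab <- matrix(c(30, 10, 20, 40, 25, 25), nrow = 2,
              dimnames = list(x = c("a", "b"), y = c("u", "v", "w")))

n <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / n

# Standardized (Pearson) residuals: (observed - expected) / sqrt(expected)
pearson <- (tab - expected) / sqrt(expected)

# Adjusted residuals: additionally divided by sqrt((1 - row share) * (1 - column share))
row_share <- rowSums(tab) / n
col_share <- colSums(tab) / n
adjusted <- pearson / sqrt(outer(1 - row_share, 1 - col_share))

# Reference values from base R
ref <- chisq.test(tab, correct = FALSE)
print(round(adjusted, 3))
```

Adjusted residuals are approximately standard normal under independence, so absolute values above about 2 flag cells that deviate notably.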

Weighted correlation

Description

Computes the weighted correlation between two distributions. This can be Pearson, Spearman or Kendall correlation.

Usage

weighted.cor(x, y, weights = NULL, method = "pearson", na.rm = FALSE)

Arguments

`x`	numeric vector
`y`	numeric vector
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`method`	a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.

Value

a length-one numeric vector

Author(s)

Nicolas Robette

Examples

data(Movies)
weighted.cor(Movies$Critics, Movies$BoxOffice, weights = rep(c(.8,1.2), 500))
weighted.cor(Movies$Critics, Movies$BoxOffice, weights = rep(c(.8,1.2), 500), method = "spearman")
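
As a sketch of what a weighted Pearson correlation computes, the hypothetical helper below replaces every mean and sum of squares by its weighted counterpart and checks the result against stats::cov.wt(). The data are made up. Rank-based variants (Spearman, Kendall) are not covered, and weighted.cor may handle details differently.

```r
# Illustrative data (hypothetical)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 7, 5)
w <- c(0.8, 1.2, 0.8, 1.2, 0.8, 1.2)

# Hypothetical helper: weighted Pearson correlation
weighted_pearson <- function(x, y, w) {
  mx <- weighted.mean(x, w)
  my <- weighted.mean(y, w)
  sxy <- sum(w * (x - mx) * (y - my))  # weighted cross-product
  sxx <- sum(w * (x - mx)^2)           # weighted sum of squares of x
  syy <- sum(w * (y - my)^2)           # weighted sum of squares of y
  sxy / sqrt(sxx * syy)
}

r_w <- weighted_pearson(x, y, w)

# Cross-check with base R (weights rescaled to sum to 1)
r_ref <- cov.wt(cbind(x, y), wt = w / sum(w), cor = TRUE)$cor[1, 2]
print(c(manual = r_w, cov.wt = r_ref))
```

With uniform weights the helper reduces to the ordinary cor(x, y), because the normalising constants cancel in the ratio.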

Weighted correlations

Description

Computes a matrix of weighted correlations between the columns of x and the columns of y. This can be Pearson, Spearman or Kendall correlation.

Usage

weighted.cor2(x, y = NULL, weights = NULL, method = "pearson", na.rm = FALSE)

Arguments

`x`	a data frame of numeric vectors
`y`	an optional data frame of numeric vectors. Default is NULL, which means that correlations between the columns of `x` are computed.
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`method`	a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.

Value

a matrix of correlations

Author(s)

Nicolas Robette

Examples

data(Movies)
weighted.cor2(Movies[,c("Budget", "Critics", "BoxOffice")], weights = rep(c(.8,1.2), 500))

Weighted covariance

Description

Computes the weighted covariance between two distributions.

Usage

weighted.cov(x, y, weights = NULL, na.rm = FALSE)

Arguments

`x`	numeric vector
`y`	numeric vector
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.

Value

a length-one numeric vector

Author(s)

Nicolas Robette

Examples

data(Movies)
weighted.cov(Movies$Critics, Movies$BoxOffice, weights = rep(c(.8,1.2), 500))

Weighted covariances

Description

Computes a matrix of weighted covariances between the columns of x and the columns of y.

Usage

weighted.cov2(x, y = NULL, weights = NULL, na.rm = FALSE)

Arguments

`x`	a data frame of numeric vectors
`y`	an optional data frame of numeric vectors. Default is NULL, which means that covariances between the columns of `x` are computed.
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.

Value

a matrix of covariances

Author(s)

Nicolas Robette

Examples

data(Movies)
weighted.cov2(Movies[,c("Budget", "Critics", "BoxOffice")], weights = rep(c(.8,1.2), 500))
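
A weighted covariance matrix can also be obtained with base R's cov.wt(). The sketch below assumes the covariances are normalised by the sum of the weights (method = "ML"). Whether weighted.cov2 uses this denominator or a bias-corrected one is not stated in this page, so the numbers may differ slightly. The data frame is made up.

```r
# Illustrative data (hypothetical values, not from Movies)
dat <- data.frame(
  budget  = c(10, 25, 40, 5, 60, 15),
  critics = c(3.5, 2.0, 4.0, 3.0, 1.5, 4.5),
  tickets = c(120, 300, 450, 80, 700, 150)
)
w <- rep(c(0.8, 1.2), 3)
p <- w / sum(w)                      # weights rescaled to sum to 1

# Weighted covariance matrix, denominator = sum of weights
cov_ml <- cov.wt(dat, wt = p, method = "ML")$cov

# Same quantity written out: centre on weighted means, then weighted cross-products
X <- as.matrix(dat)
centered <- sweep(X, 2, colSums(p * X))
cov_manual <- t(centered) %*% (p * centered)

print(round(cov_ml, 2))
```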

Cramer's V

Description

Computes Cramer's V measure of association between two (possibly weighted) categorical variables

Usage

weighted.cramer(x, y, weights = NULL, na.rm = FALSE)

Arguments

`x`	the first categorical variable
`y`	the second categorical variable
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.

Value

Numerical value with Cramer's V.

Author(s)

Nicolas Robette

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Examples

data(Movies)
weighted.cramer(Movies$Country, Movies$ArtHouse)
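
Cramer's V is the square root of the chi-squared statistic divided by its maximum, n * (min(rows, columns) - 1). A minimal base R sketch, assuming the weighted case simply replaces counts by sums of weights (built here with xtabs), could look as follows; the helper cramers_v and the data are hypothetical.

```r
# Illustrative data (hypothetical)
x <- factor(c("a", "a", "b", "b", "a", "b", "a", "b"))
y <- factor(c("u", "v", "u", "v", "u", "v", "v", "u"))
w <- rep(c(0.8, 1.2), 4)

# Hypothetical helper: Cramer's V from a (possibly weighted) two-way table
cramers_v <- function(tab) {
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

# Weighted contingency table: each cell is a sum of weights
tab_w <- xtabs(w ~ x + y)
v_w <- cramers_v(tab_w)

# Unweighted 2 x 2 example with made-up counts
v_22 <- cramers_v(matrix(c(30, 10, 20, 40), nrow = 2))
print(c(weighted = v_w, two_by_two = v_22))
```

For a 2 x 2 table, V equals the absolute value of the phi coefficient.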

Weighted median absolute deviation to median

Description

Computes the weighted median absolute deviation to median (aka MAD) of a distribution.

Usage

weighted.mad(x, weights = NULL, na.rm = FALSE)

Arguments

`x`	numeric vector
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.

Value

a length-one numeric vector

Author(s)

Nicolas Robette

Examples

data(Movies)
weighted.mad(Movies$Critics, weights = rep(c(.8,1.2), 500))

Weighted quantiles

Description

Computes the weighted quantiles of a distribution.

Usage

weighted.quantile(x, weights = NULL, probs = seq(0, 1, 0.25),
                  na.rm = FALSE, names = FALSE)

Arguments

`x`	numeric vector whose sample quantiles are wanted
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`probs`	numeric vector of probabilities with values in [0,1]
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.
`names`	logical. If TRUE, the result has a names attribute. Default is FALSE.

Value

A numeric vector of the same length as probs argument.

Note

This function is taken from https://stackoverflow.com/questions/2748725/is-there-a-weighted-median-function

Examples

data(Movies)
weighted.quantile(Movies$Critics, weights = rep(c(.8,1.2), 500), names = TRUE)
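
Several definitions of weighted quantiles exist. The sketch below shows one of the simplest, assuming no interpolation: sort the values, accumulate the weight shares, and return the first value whose cumulative share reaches the requested probability. The helper weighted_quantile_step is hypothetical and weighted.quantile may interpolate or break ties differently.

```r
# Hypothetical helper: inverse of the weighted empirical distribution function
weighted_quantile_step <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum <- cumsum(w) / sum(w)   # cumulative share of the total weight
  cum[length(cum)] <- 1       # guard against floating-point rounding
  vapply(probs, function(p) x[which(cum >= p)[1]], numeric(1))
}

# Illustrative data: the value 4 carries most of the weight
x <- c(4, 1, 3, 2)
w <- c(5, 1, 1, 1)
q <- weighted_quantile_step(x, w, probs = c(0.25, 0.5))
print(q)
```

Here the weighted median is 4, whereas the same rule with equal weights would give 2.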

Weighted standard deviation

Description

Computes the weighted standard deviation of a distribution.

Usage

weighted.sd(x, weights = NULL, na.rm = FALSE)

Arguments

`x`	numeric vector
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE.

Value

a length-one numeric vector

Author(s)

Nicolas Robette

Examples

data(Movies)
weighted.sd(Movies$Critics, weights = rep(c(.8,1.2), 500))
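
One way to read a weighted standard deviation is as the standard deviation of a sample in which each value is repeated according to its weight. The sketch below assumes the variance is divided by the sum of the weights; the denominator used by weighted.sd is not stated in this page, so this is an illustration of the principle only. The helper weighted_sd_sketch and the data are hypothetical.

```r
# Hypothetical helper: weighted standard deviation, denominator = sum of weights
weighted_sd_sketch <- function(x, w) {
  m <- weighted.mean(x, w)
  sqrt(sum(w * (x - m)^2) / sum(w))
}

# Illustrative data with integer weights
x <- c(2, 4, 4, 5)
w <- c(1, 2, 1, 3)
s_w <- weighted_sd_sketch(x, w)

# Same result from the expanded sample, where each value is repeated w times
expanded <- rep(x, times = w)
s_expanded <- sqrt(mean((expanded - mean(expanded))^2))
print(c(weighted = s_w, expanded = s_expanded))
```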

Computes a (possibly weighted) contingency table

Description

Computes a contingency table from one or two vectors, with the possibility of specifying weights.

Usage

weighted.table(x, y = NULL, weights = NULL, stat = "freq",
              mar = FALSE, na.rm = FALSE, na.value = "NAs", digits = 1)

Arguments

`x`	an object which can be interpreted as factor
`y`	an optional object which can be interpreted as factor
`weights`	numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
`stat`	character. Whether to compute a contingency table ("freq", default), percentages ("prop"), row percentages ("rprop") or column percentages ("cprop").
`mar`	logical, indicating whether to compute margins. Default is FALSE.
`na.rm`	logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).
`na.value`	character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.
`digits`	integer indicating the number of decimal places (default is 1)

Value

Returns a contingency table.

Author(s)

Nicolas Robette

Examples

data(Movies)
weighted.table(Movies$Country, Movies$ArtHouse)
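
The same kind of table can be approximated in base R, which may help to see what the weights, stat and na.value arguments stand for. In this sketch, missing values are recoded to an explicit "NAs" category before tabulating, cells are sums of weights (xtabs), and row percentages come from prop.table. The data are made up and the result is not guaranteed to match weighted.table's formatting or rounding.

```r
# Illustrative data (hypothetical), with one missing value in x
x <- c("a", "b", NA, "a", "b", "a")
y <- c("u", "v", "u", "v", "u", "u")
w <- c(0.8, 1.2, 0.8, 1.2, 0.8, 1.2)

# Keep missing values as their own category, in the spirit of na.value = "NAs"
x[is.na(x)] <- "NAs"

# Weighted frequencies: each cell is the sum of the weights of its observations
tab <- xtabs(w ~ x + y)

# Row percentages and margins
row_pct <- 100 * prop.table(tab, margin = 1)
with_margins <- addmargins(tab)

print(round(row_pct, 1))
```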

Package 'descriptio'

Help Index

Measures the association between a categorical variable and a continuous variable

Measures the groupwise association between a categorical variable and a continuous variable

Cross-tabulation and measures of association between two categorical variables

Groupwise cross-tabulation and measures of association between two categorical variables

Measures the association between two continuous variables

Measures the groupwise association between two continuous variables

Bivariate association measures between pairs of variables.

Bivariate association measures between a response and predictor variables.

Measures the association between a categorical variable and some continuous and/or categorical variables

Description

Usage

Arguments

Value

Note