Title: | Descriptive Statistical Analysis |
---|---|
Description: | Description of statistical associations between variables : measures of local and global association between variables (phi, Cramér V, correlations, eta-squared, Goodman and Kruskal tau, permutation tests, etc.), multiple graphical representations of the associations between variables (using 'ggplot2') and weighted statistics. |
Authors: | Nicolas Robette [aut, cre] |
Maintainer: | Nicolas Robette |
License: | GPL (>= 2) |
Version: | 1.4 |
Built: | 2025-02-06 05:13:04 UTC |
Source: | https://github.com/nicolas-robette/descriptio |
Measures the association between a categorical variable and a continuous variable
assoc.catcont(x, y, weights = NULL, na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, nperm = NULL, distrib = "asympt", digits = 3)
x |
the categorical variable (must be a factor) |
y |
the continuous variable (must be a numeric vector) |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm.cat |
logical, indicating whether NA values in the categorical variable (i.e. x) should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variable (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variable (i.e. y) should be silently removed before the computation proceeds. Default is FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
digits |
integer. The number of digits (default is 3). |
A list with the following elements :
summary |
summary statistics (mean, median, etc.) of the continuous variable for each level of the categorical variable |
eta.squared |
eta-squared between the two variables |
permutation.pvalue |
p-value from a permutation (i.e. non-parametric) test of independence |
cor |
point biserial correlation between the two variables, for each level of the categorical variable |
cor.perm.pval |
permutation p-value of the correlation between the two variables, for each level of the categorical variable |
test.values |
test-values as proposed by Lebart et al (1984) |
test.values.pval |
p-values corresponding to the test-values |
Nicolas Robette
Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]
Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.
assoc.twocat, assoc.twocont, assoc.yx, condesc, catdesc, darma
data(Movies)
with(Movies, assoc.catcont(Country, Budget, nperm = 10))
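As a rough illustration of two of the returned quantities, the sketch below computes eta-squared and the point biserial correlations in base R from their usual textbook definitions. It is not the package's own code: the toy data are hypothetical, weights and NA handling are ignored, and the object names are chosen for illustration only.

```r
# Toy data (hypothetical): a three-level factor and a numeric variable
x <- factor(c("a", "a", "b", "b", "c", "c"))
y <- c(1, 3, 4, 6, 8, 8)

# Eta-squared: between-group sum of squares divided by total sum of squares
group_means <- tapply(y, x, mean)
group_sizes <- tapply(y, x, length)
ss_between <- sum(group_sizes * (group_means - mean(y))^2)
ss_total <- sum((y - mean(y))^2)
eta_squared <- ss_between / ss_total

# Point biserial correlation for each level: Pearson correlation
# between y and the 0/1 indicator of belonging to that level
point_biserial <- sapply(levels(x), function(l) cor(y, as.numeric(x == l)))

eta_squared
point_biserial
```

Here the group means (2, 5 and 8) explain most of the variance of y, so eta-squared is close to 1, while the middle level "b" has a point biserial correlation of zero because its mean equals the overall mean.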
Measures the association between a categorical variable and a continuous variable, for each category of a group variable
assoc.catcont.by(x, y, by, weights = NULL, na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, nperm = NULL, distrib = "asympt", digits = 3)
x |
factor : the categorical variable |
y |
numeric vector : the continuous variable |
by |
factor : the group variable |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm.cat |
logical, indicating whether NA values in the categorical variable (i.e. x) should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variable (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variable (i.e. y) should be silently removed before the computation proceeds. Default is FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
digits |
integer. The number of digits (default is 3). |
A list of items, one for each category of the group variable. Each item is a list with the following elements :
summary |
summary statistics (mean, median, etc.) of the continuous variable for each level of the categorical variable |
eta.squared |
eta-squared between the two variables |
permutation.pvalue |
p-value from a permutation (i.e. non-parametric) test of independence |
cor |
point biserial correlation between the two variables, for each level of the categorical variable |
cor.perm.pval |
permutation p-value of the correlation between the two variables, for each level of the categorical variable |
test.values |
test-values as proposed by Lebart et al (1984) |
test.values.pval |
p-values corresponding to the test-values |
Nicolas Robette
Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]
Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.
assoc.catcont, assoc.twocat, assoc.twocont, assoc.yx, condesc, catdesc, darma
data(Movies)
with(Movies, assoc.catcont.by(Country, Budget, ArtHouse, nperm = 10))
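The "one item per category of the group variable" structure can be pictured with a split-and-apply pattern in base R. The sketch below is only an analogy under stated assumptions: `eta_squared()` is a hypothetical helper, the data are made up, and the package's actual implementation may differ.

```r
# Hypothetical helper: eta-squared of y explained by the factor x
eta_squared <- function(x, y) {
  fitted <- ave(y, x)   # group mean of y repeated for each observation
  sum((fitted - mean(y))^2) / sum((y - mean(y))^2)
}

# Toy data (hypothetical) with a two-level group variable
d <- data.frame(
  x  = factor(c("a", "a", "b", "b", "a", "a", "b", "b")),
  y  = c(1, 3, 5, 7, 2, 4, 2, 4),
  by = factor(c("g1", "g1", "g1", "g1", "g2", "g2", "g2", "g2"))
)

# One result per level of the group variable, returned as a named list
by_group <- lapply(split(d, d$by), function(part) {
  eta_squared(droplevels(part$x), part$y)
})

by_group
```

In this toy example the association is strong in group "g1" and absent in group "g2", which is the kind of contrast a by-group analysis is meant to reveal.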
Cross-tabulation and measures of association between two categorical variables
assoc.twocat(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs", nperm = NULL, distrib = "asympt")
x |
the first categorical variable (must be a factor) |
y |
the second categorical variable (must be a factor) |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
A list of lists with the following elements :
tables
list :
freq |
cross-tabulation frequencies |
prop |
percentages |
rprop |
row percentages |
cprop |
column percentages |
expected |
expected values |
global
list :
chi.squared |
chi-squared value |
cramer.v |
Cramer's V between the two variables |
permutation.pvalue |
p-value from a permutation (i.e. non-parametric) test of independence |
global.pem |
global PEM |
GK.tau.xy |
Goodman and Kruskal tau (forward association, i.e. x is the predictor and y is the response) |
GK.tau.yx |
Goodman and Kruskal tau (backward association, i.e. y is the predictor and x is the response) |
local
list :
std.residuals |
the table of standardized (i.e. Pearson) residuals. |
adj.residuals |
the table of adjusted standardized residuals. |
adj.res.pval |
the table of p-values of adjusted standardized residuals. |
odds.ratios |
the table of odds ratios. |
local.pem |
the table of local PEM |
phi |
the table of the phi coefficients for each pair of levels |
phi.perm.pval |
the table of permutation p-values for each pair of levels |
gather
: a data frame gathering information, with one row per cell of the cross-tabulation.
The adjusted standardized residuals are strictly equivalent to test-values for nominal variables as proposed by Lebart et al (1984).
Nicolas Robette
Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.
Rakotomalala R., Comprendre la taille d'effet (effect size), http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf
Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.
assoc.catcont, assoc.twocont, assoc.yx, condesc, catdesc, darma
data(Movies)
assoc.twocat(Movies$Country, Movies$ArtHouse, nperm = 100)
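To make the global and local measures listed above more concrete, here is a base R sketch that derives several of them from a small contingency table using their standard formulas. It is an illustration, not the package's code: the counts are hypothetical, `gk_tau()` is a helper written for this example, and weights, NA levels and PEM are left out.

```r
# Toy 2 x 2 contingency table (hypothetical counts)
tab <- matrix(c(30, 10,
                10, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(x = c("x1", "x2"), y = c("y1", "y2")))
n <- sum(tab)

# Global measures
test <- chisq.test(tab, correct = FALSE)   # no continuity correction
chi_squared <- unname(test$statistic)
cramer_v <- sqrt(chi_squared / (n * (min(dim(tab)) - 1)))

# Goodman and Kruskal tau, rows as predictor and columns as response
gk_tau <- function(tab) {
  n <- sum(tab)
  within <- sum(tab^2 / rowSums(tab))   # sum of n_ij^2 / n_i. over cells
  marginal <- sum(colSums(tab)^2) / n   # sum of n_.j^2 / n over columns
  (within - marginal) / (n - marginal)
}
tau_xy <- gk_tau(tab)      # x is the predictor, y the response
tau_yx <- gk_tau(t(tab))   # y is the predictor, x the response

# Local measures
expected <- test$expected
std_residuals <- test$residuals   # Pearson residuals
adj_residuals <- test$stdres      # adjusted standardized residuals
odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
```

For a 2 x 2 table, Cramer's V coincides with the absolute phi coefficient and Goodman and Kruskal's tau equals its square, which gives a quick sanity check on the numbers.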
Cross-tabulation and measures of association between two categorical variables, for each category of a group variable
assoc.twocat.by(x, y, by, weights = NULL, na.rm = FALSE, na.value = "NAs", nperm = NULL, distrib = "asympt")
x |
factor : the first categorical variable |
y |
factor : the second categorical variable |
by |
factor : the group variable |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
A list of items, one for each category of the group variable. Each item is a list of lists with the following elements :
tables
list :
freq |
cross-tabulation frequencies |
prop |
percentages |
rprop |
row percentages |
cprop |
column percentages |
expected |
expected values |
global
list :
chi.squared |
chi-squared value |
cramer.v |
Cramer's V between the two variables |
permutation.pvalue |
p-value from a permutation (i.e. non-parametric) test of independence |
global.pem |
global PEM |
GK.tau.xy |
Goodman and Kruskal tau (forward association, i.e. x is the predictor and y is the response) |
GK.tau.yx |
Goodman and Kruskal tau (backward association, i.e. y is the predictor and x is the response) |
local
list :
std.residuals |
the table of standardized (i.e. Pearson) residuals. |
adj.residuals |
the table of adjusted standardized residuals. |
adj.res.pval |
the table of p-values of adjusted standardized residuals. |
odds.ratios |
the table of odds ratios. |
local.pem |
the table of local PEM |
phi |
the table of the phi coefficients for each pair of levels |
phi.perm.pval |
the table of permutation p-values for each pair of levels |
gather
: a data frame gathering information, with one row per cell of the cross-tabulation.
The adjusted standardized residuals are strictly equivalent to test-values for nominal variables as proposed by Lebart et al (1984).
Nicolas Robette
Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.
Rakotomalala R., Comprendre la taille d'effet (effect size), http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf
Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.
assoc.twocat, assoc.catcont, assoc.twocont, assoc.yx, condesc, catdesc, darma
data(Movies)
assoc.twocat.by(Movies$Country, Movies$ArtHouse, Movies$Festival, nperm = 100)
Measures the association between two continuous variables with Pearson, Spearman and Kendall correlations.
assoc.twocont(x, y, weights = NULL, na.rm = FALSE, nperm = NULL, distrib = "asympt")
x |
a continuous variable (must be a numeric vector) |
y |
a continuous variable (must be a numeric vector) |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
A data frame with Pearson, Spearman and Kendall correlations. The correlation value is in the first row and the p-value from a permutation (i.e. non-parametric) test of independence is in the second row.
Nicolas Robette
assoc.twocat, assoc.catcont, assoc.yx, condesc, catdesc, darma
## Hollander & Wolfe (1973), p. 187f.
## Assessment of tuna quality. We compare the Hunter L measure of
## lightness to the averages of consumer panel scores (recoded as
## integer values from 1 to 6 and averaged over 80 such values) in
## 9 lots of canned tuna.
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
assoc.twocont(x, y, nperm = 100)
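For readers who want to see what lies behind the two rows of the result, the following base R sketch computes the three correlations on the same tuna data and approximates a permutation p-value by reshuffling y. It is a plain Monte Carlo illustration under assumptions of its own (two-sided test, Kendall's tau as the statistic, a fixed seed) and does not claim to reproduce the package's procedure or its distrib option.

```r
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

# The three correlation coefficients, computed with stats::cor()
methods <- c("pearson", "spearman", "kendall")
correlations <- sapply(methods, function(m) cor(x, y, method = m))

# Monte Carlo permutation test: shuffle y to break any link with x,
# then count how often the permuted statistic is at least as extreme
set.seed(1)
nperm <- 999
observed <- cor(x, y, method = "kendall")
permuted <- replicate(nperm, cor(x, sample(y), method = "kendall"))
p_value <- (sum(abs(permuted) >= abs(observed)) + 1) / (nperm + 1)

correlations
p_value
```

Adding 1 to both the numerator and the denominator counts the observed arrangement among the permutations, so the estimated p-value can never be exactly zero.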
Measures the association between two continuous variables with Pearson, Spearman and Kendall correlations, for each category of a group variable.
assoc.twocont.by(x, y, by, weights = NULL, na.rm = FALSE, nperm = NULL, distrib = "asympt")
x |
numeric vector : a continuous variable |
y |
numeric vector : a continuous variable |
by |
factor : the group variable |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
A list of items, one for each category of the group variable. Each item is a data frame with Pearson, Spearman and Kendall correlations. The correlation value is in the first row and the p-value from a permutation (i.e. non-parametric) test of independence is in the second row.
Nicolas Robette
assoc.twocont, assoc.twocat, assoc.catcont, assoc.yx, condesc, catdesc, darma
## Hollander & Wolfe (1973), p. 187f.
## Assessment of tuna quality. We compare the Hunter L measure of
## lightness to the averages of consumer panel scores (recoded as
## integer values from 1 to 6 and averaged over 80 such values) in
## 9 lots of canned tuna.
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
group <- factor(c("A","B","C","C","B","A","A","C","B"))
assoc.twocont.by(x, y, group, nperm = 100)
Computes bivariate association measures between every pair of variables from a data frame.
assoc.xx(x, weights = NULL, correlation = "kendall", na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, nperm = NULL, distrib = "asympt", dec = c(3,3))
x |
the data frame of variables |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
correlation |
character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default). |
na.rm.cat |
logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
dec |
vector of 2 integers for number of decimals. The first value is for association measures, the second for permutation p-values. Default is c(3,3). |
The function computes one association measure per pair of variables : Pearson's, Spearman's or Kendall's correlation for pairs of numeric variables, Cramer's V for pairs of factors and eta-squared for pairs made of a numeric variable and a factor. It can also compute the p-value of a permutation test of association for each pair of variables.
A table with the following elements :
measure |
: name of the association measure |
association |
: value of the association measure |
permutation.pvalue |
: p-value from the permutation test |
Nicolas Robette
darma, assoc.twocat, assoc.twocont, assoc.catcont, condesc, catdesc, assoc.yx
data(iris)
iris2 = iris
iris2$Species = factor(iris$Species == "versicolor")
assoc.xx(iris2, nperm = 10)
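The rule "one measure per pair, chosen from the types of the two variables" can be sketched in base R as follows. This is an illustrative reconstruction under assumptions, not the package's implementation: `pair_measure()` and the labels "cramer" and "eta2" are hypothetical, and weights, NA handling and permutation tests are omitted.

```r
# Hypothetical helper: pick the association measure from the variable types
pair_measure <- function(a, b, correlation = "kendall") {
  if (is.numeric(a) && is.numeric(b)) {
    return(list(measure = correlation,
                association = cor(a, b, method = correlation)))
  }
  if (is.factor(a) && is.factor(b)) {
    tab <- table(a, b)
    chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
    v <- sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))
    return(list(measure = "cramer", association = v))
  }
  # Mixed pair: eta-squared of the numeric variable explained by the factor
  if (is.factor(a)) { f <- a; num <- b } else { f <- b; num <- a }
  fitted <- ave(num, f)   # group mean repeated for each observation
  eta2 <- sum((fitted - mean(num))^2) / sum((num - mean(num))^2)
  list(measure = "eta2", association = eta2)
}

iris2 <- iris
iris2$Species <- factor(iris$Species == "versicolor")

# Apply the helper to every pair of columns and stack the results
pairs <- combn(names(iris2), 2, simplify = FALSE)
result <- do.call(rbind, lapply(pairs, function(p) {
  m <- pair_measure(iris2[[p[1]]], iris2[[p[2]]])
  data.frame(variable1 = p[1], variable2 = p[2],
             measure = m$measure, association = round(m$association, 3))
}))
result
```

With five columns, the resulting table has ten rows: six numeric pairs described by Kendall's tau and four numeric-factor pairs described by eta-squared.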
Computes bivariate association measures between a response and predictor variables (and, optionally, between every pair of predictor variables).
assoc.yx(y, x, weights = NULL, xx = TRUE, correlation = "kendall", na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, nperm = NULL, distrib = "asympt", dec = c(3,3))
y |
the response variable |
x |
the predictor variables |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
xx |
logical. Whether the association measures should also be computed for pairs of predictor variables (TRUE, default) or not. With many predictors, consider setting xx to FALSE to reduce computation time. |
correlation |
character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default). |
na.rm.cat |
logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
dec |
vector of 2 integers for number of decimals. The first value is for association measures, the second for permutation p-values. Default is c(3,3). |
The function computes one association measure per pair of variables : Pearson's, Spearman's or Kendall's correlation for pairs of numeric variables, Cramer's V for pairs of factors and eta-squared for pairs made of a numeric variable and a factor. It can also compute the p-value of a permutation test of association for each pair of variables.
A list of the following items :
YX |
: a table with the association measures between the response and predictor variables |
XX |
: a table with the association measures between every pair of predictor variables |
In each table :
measure |
: name of the association measure |
association |
: value of the association measure |
permutation.pvalue |
: p-value from the permutation test |
Nicolas Robette
darma, assoc.twocat, assoc.twocont, assoc.catcont, condesc, catdesc
data(iris)
iris2 = iris
iris2$Species = factor(iris$Species == "versicolor")
assoc.yx(iris2$Species, iris2[,1:4], nperm = 10)
Measures the association between a categorical variable and some continuous and/or categorical variables
catdesc(y, x, weights = NULL, na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, measure = "phi", limit = NULL, correlation = "kendall", robust = TRUE, nperm = NULL, distrib = "asympt", digits = 2)
y |
the categorical variable to describe (must be a factor) |
x |
a data frame with continuous and/or categorical variables |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm.cat |
logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE. |
measure |
character. The measure of local association between categories of categorical variables. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized (i.e. Pearson) residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence. |
limit |
for the relationship between y and a categorical variable, only associations higher or equal to |
correlation |
character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default). |
robust |
logical. If TRUE (default), median and mad are used instead of mean and standard deviation. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
digits |
numeric. Number of digits for mean, median, standard deviation and mad. Default is 2. |
A list of the following items :
variables |
associations between y and the variables in x |
bylevel |
a list with one element for each level of y |
Each element in bylevel has the following items :
categories |
a data frame with categorical variables from x and local associations |
continuous.var |
a data frame with continuous variables from x and associations measured by correlation coefficients |
If nperm is not NULL, permutation tests of independence are computed and the p-values from these tests are provided.
Nicolas Robette
Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]
catdes, condesc, assoc.yx, darma
data(Movies)
catdesc(Movies$ArtHouse, Movies[,c("Budget","Genre","Country")])
Computes bivariate statistics for a set of variables according to the subgroups of observations defined by a categorical variable.
cattab(x, y, weights = NULL, percent = "column", robust = TRUE, show.n = TRUE, show.asso = TRUE, digits = c(1,1), na.rm = TRUE, na.value = "NAs")
x |
data frame. The variables which are described in rows. They can be numerical or factors. |
y |
factor. The categorical variable which defines subgroups of observations described in columns. |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
percent |
character. Whether to compute row percentages ("row") or column percentages ("column", default). |
robust |
logical. Whether to use medians instead of means. Default is TRUE. |
show.n |
logical. Whether to display frequencies (between brackets) in addition to the percentages. Default is TRUE. |
show.asso |
logical. Whether to add a column with measures of global association (Cramer's V and eta-squared). Default is TRUE. |
digits |
vector of 2 integers. The first value sets the number of digits for percentages, the second one sets the number of digits for medians and means. Default is c(1,1). If NULL, the results are not rounded. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is TRUE. If FALSE, an additional level is added to the variables (see
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
The function uses the gtsummary package to build the table of statistics, and then the gt package to finalize the layout. Weights are handled silently with the survey package.
In addition, the function is compatible with the labels attribute assigned with the labelled package : these labels are displayed automatically.
An object of class gt_tbl.
This function is quite similar to profiles, but displays the results in a fancier way.
Nicolas Robette
catdesc, assoc.yx, darma, assoc.twocat, assoc.twocat.by, profiles
data(Movies)
cattab(x = Movies[, c("Genre", "ArtHouse", "Critics", "BoxOffice")], y = Movies$Country)
Measures the association between a continuous variable and some continuous and/or categorical variables
condesc(y, x, weights = NULL, na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, limit = NULL, correlation = "kendall", robust = TRUE, nperm = NULL, distrib = "asympt", digits = 2)
y |
the continuous variable to describe |
x |
a data frame with continuous and/or categorical variables |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm.cat |
logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE. |
limit |
for the relationship between y and a category of a categorical variable, only associations (point-biserial correlations) higher or equal to |
correlation |
character. The type of correlation measure to use between two continuous variables : "pearson", "spearman" or "kendall" (default). |
robust |
logical. If TRUE (default), median and mad are used instead of mean and standard deviation. |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
digits |
numeric. Number of digits for mean, median, standard deviation and mad. Default is 2. |
A list of the following items :
variables |
associations between y and the variables in x |
categories |
a data frame with categorical variables from x and associations measured by point biserial correlation. |
If nperm is not NULL, permutation tests of independence are computed and the p-values from these tests are provided.
Nicolas Robette
Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]
condes, catdesc, assoc.yx, darma
data(Movies)
condesc(Movies$BoxOffice, Movies[,c("Budget","Genre","Country")])
Computes bivariate statistics between a continuous variable and a set of variables, possibly according to a strata variable.
contab(x, y, strata = NULL, weights = NULL, robust = TRUE, digits = c(1,3), na.rm = TRUE, na.value = "NAs")
x |
data frame. The variables which are described in rows. They can be numerical or factors. |
y |
numeric vector. The continuous variable whose statistics are computed against the variables in x. |
strata |
optional categorical variable to stratify the table by column. Default is NULL, which means no strata. |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
robust |
logical. Whether to use medians (and mads) instead of means (and standard deviations). Default is TRUE. |
digits |
vector of 2 integers. The first value sets the number of digits for medians, mads, means and standard deviations (categorical variables). The second one sets the number of digits for slopes (continuous variables). Default is c(1,3). If NULL, the results are not rounded. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is TRUE. If FALSE, an additional level is added to the categorical variables with NA values (see
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
For categorical variables in x, the function computes :
- column 1 : the median and the mad of y for each level of the variable
- column 2 : the global association between the variable and y, measured by the eta-squared
For continuous variables in x, it computes :
- column 1 : the slope of the linear regression of y on the variable
- column 2 : the global association between the variable and y, measured by Pearson and Spearman correlations
An object of class gt_tbl.
Nicolas Robette
regtab, condesc, assoc.yx, darma, assoc.twocont, assoc.twocont.by
data(Movies)
contab(x = Movies[, c("Genre", "ArtHouse", "Budget")], y = Movies$BoxOffice)
Displays pretty 2, 3 or 4-way cross-tabulations, from possibly weighted data, and with the opportunity to color the cells of the table according to a local measure of association (phi coefficients, standardized residuals or PEM).
crosstab(x, y, xstrata = NULL, ystrata = NULL, weights = NULL, stat = "rprop", show.n = FALSE, show.cramer = TRUE, na.rm = FALSE, na.value = "NAs", digits = 1, sort = "none", color.cells = FALSE, measure = "phi", limits = c(-1, 1), min.asso = 0.1, palette = "PRGn", reverse = FALSE)
x |
the row categorical variable |
y |
the column categorical variable |
xstrata |
optional categorical variable to stratify the table by rows. Default is NULL, which means no row strata. |
ystrata |
optional categorical variable to stratify the table by columns. Default is NULL, which means no column strata. |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
stat |
character. Whether to compute a contingency table ("freq", default), percentages ("prop"), row percentages ("rprop") or column percentages ("cprop"). |
show.n |
logical. Whether to display frequencies (between brackets) in addition to the percentages. Ignored if stat = "freq". Default is FALSE. |
show.cramer |
logical. If TRUE (default), Cramer's V measure of association is displayed beside the table. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
digits |
integer. The number of digits (default is 1). If NULL, the results are not rounded. |
sort |
character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done. |
color.cells |
logical, indicating whether the cells of the table should be colored according to local measures of association. Default is FALSE. |
measure |
character. The measure of association used to color the cells. Can be "phi" for phi coefficient (default), "std.residuals" for standardized residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence. Only used if color.cells = TRUE. |
limits |
a numeric vector of length 2 providing limits of the scale. Default is c(-1,1). Only used if color.cells = TRUE. |
min.asso |
numerical value. The cells with a local association below min.asso (in absolute value) are kept blank. Only used if color.cells = TRUE. |
palette |
The colours or colour function that values will be mapped to (see details). |
reverse |
Whether the colors (or color function) in palette should be used in reverse order. For example, if the default order of a palette goes from blue to green, then reverse = TRUE will result in the colors going from green to blue. Default is FALSE. Only used if color.cells = TRUE. |
The function uses the gtsummary package to build the cross-tabulation, and then the gt package to finalize the layout and color the cells. Weights are handled silently with the survey package.
Besides, the function is compatible with the attribute labels assigned with the labelled package: these labels are displayed automatically.
The palette argument can be any of the following:
1. A character vector of RGB or named colours. Examples: palette(), c("#000000", "#0000FF", "#FFFFFF"), topo.colors(10)
2. The name of an RColorBrewer palette, e.g. "BuPu" or "Greens".
3. The full name of a viridis palette: "viridis", "magma", "inferno", or "plasma".
4. A function that receives a single value between 0 and 1 and returns a colour. Examples: colorRamp(c("#000000", "#FFFFFF"), interpolate="spline").
An object of class gt_tbl.
Nicolas Robette
assoc.twocat, weighted.table, phi.table
data(Movies)
# example 1
crosstab(Movies$Genre, Movies$Country)
# example 2
with(Movies, crosstab(Genre, Country, ystrata = ArtHouse, show.n = TRUE, color.cells = TRUE))
Computes bivariate association measures between a response and predictor variables, producing a summary looking like a regression analysis.
darma(y, x, weights = NULL, target = 1, na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, correlation = "kendall", nperm = NULL, distrib = "asympt", dec = c(1,3,3))
y |
the response variable |
x |
the predictor variables |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
target |
rank or name of the category of interest when y is categorical |
na.rm.cat |
logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE. |
correlation |
character. The type of correlation measure to use between two continuous variables: "pearson", "spearman" or "kendall" (default). |
nperm |
numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed. |
distrib |
the null distribution of permutation test of independence can be approximated by its asymptotic distribution ( |
dec |
vector of 3 integers for number of decimals. The first value is for percents or medians, the second for association measures, the third for permutation p-values. Default is c(1,3,3). |
The function computes association measures (phi, correlation coefficient, Kendall's correlation) between the variable of interest and the other variables. It can also compute the p-values of permutation tests.
A data frame
Nicolas Robette
assoc.yx, assoc.twocat, assoc.twocont, assoc.catcont, condesc, catdesc
data(iris)
iris2 = iris
iris2$Species = factor(iris$Species == "versicolor")
darma(iris2$Species, iris2[,1:4], target=2, nperm=100)
For a cross-tabulation, plots measures of local association with bars of varying height and width, using ggplot2.
ggassoc_assocplot(data, mapping, measure = "std.residuals", limits = NULL, sort = "none", na.rm = FALSE, na.value = "NAs", colors = NULL, direction = 1, legend = "right")
data |
dataset to use for plot |
mapping |
aesthetics being used. x and y are required, weight can also be specified. |
measure |
character. The measure of association used to fill the rectangles. Can be "phi" for phi coefficient, "or" for odds ratios, "std.residuals" (default) for standardized (i.e. Pearson) residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence. |
limits |
a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data. |
sort |
character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
colors |
vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from |
direction |
Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed. |
legend |
the position of legend ("none", "left", "right", "bottom", "top"). If "none", no legend is displayed. |
The measure of local association measures how much each combination of categories of x and y is over/under-represented.
The bars vary in width according to the square root of the expected frequency. They vary in height and color shading according to the measure of association. If the measure chosen is "std.residuals" (Pearson's residuals), as in the original association plot from Cohen and Friendly, the area of the bars is proportional to the difference in observed and expected frequencies.
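The area property stated for "std.residuals" can be checked numerically. The following is a minimal sketch in base R, not the package's own code: the table `tab` is made up for illustration, and the names `bar_height`, `bar_width` and `bar_area` are hypothetical.

```r
# Toy 2 x 3 contingency table (made-up counts, not taken from Movies)
tab <- matrix(c(30, 10, 20, 25, 10, 25), nrow = 2,
              dimnames = list(x = c("a", "b"), y = c("u", "v", "w")))

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals give the bar height, sqrt(expected) the bar width
bar_height <- (tab - expected) / sqrt(expected)
bar_width <- sqrt(expected)

# Height * width gives back the raw deviation from independence
bar_area <- bar_height * bar_width
print(all.equal(bar_area, tab - expected))
```

With any other measure (phi, PEM, etc.) the height changes but the width does not, so the area no longer equals the raw deviation.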
This function can be used as a high-level plot with the ggduo and ggpairs functions of the GGally package.
a ggplot object
Nicolas Robette
Cohen, A. (1980), On the graphical display of the significant components in a two-way contingency table. Communications in Statistics—Theory and Methods, 9, 1025–1041. doi:10.1080/03610928008827940.
Friendly, M. (1992), Graphical methods for categorical data. SAS User Group International Conference Proceedings, 17, 190–200. http://datavis.ca/papers/sugi/sugi17.pdf
assoc.twocat, phi.table, catdesc, assoc.yx, darma, ggassoc_crosstab, ggpairs
data(Movies)
ggassoc_assocplot(data=Movies, mapping=ggplot2::aes(Country, Genre))
For a cross-tabulation, plots bars for the conditional percentages of variable y according to variable x, using ggplot2. The general display is inspired by Bertin's plots.
ggassoc_bertin(data, mapping, prop.width = FALSE, sort = "none", add.gray = FALSE, add.rprop = FALSE, na.rm = FALSE, na.value ="NAs")
data |
dataset to use for plot |
mapping |
aesthetics being used. x and y are required, weight can also be specified. |
prop.width |
logical. If TRUE, the width of the bars is proportional to the margin percentages of variable x. |
sort |
character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only variable x is sorted. If "y", only variable y is sorted. If "none" (default), no sorting is done. |
add.gray |
logical. If FALSE (default), only white and black are used to fill the bars. If TRUE, gray is used additionally to fill the part of the bars corresponding to margin percentages of variable y. |
add.rprop |
logical. If TRUE, row percentages are displayed on top of the bars. Default is FALSE. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
The height of the bars is proportional to the conditional frequency of variable y. A bar is filled in black if the conditional frequency is higher than the marginal frequency; otherwise it is filled in white.
This graphical representation is inspired by the principles of Jacques Bertin and the online AMADO tool (https://paris-timemachine.huma-num.fr/amado/main.html).
Note: It does not allow faceting.
a ggplot object
Nicolas Robette
J. Bertin: La graphique et le traitement graphique de l'information. Flammarion: Paris 1977.
assoc.twocat, phi.table, catdesc, ggassoc_crosstab, ggassoc_assocplot, ggassoc_phiplot, ggassoc_chiasmogram
data(Movies)
ggassoc_bertin(Movies, ggplot2::aes(x = Country, y = Genre))
ggassoc_bertin(Movies, ggplot2::aes(x = Country, y = Genre), sort = "both", prop.width = TRUE, add.gray = TRUE, add.rprop = TRUE)
Displays a boxplot and combines it with a violin plot, using ggplot2.
ggassoc_boxplot(data, mapping, na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE, axes.labs = TRUE, ticks.labs = TRUE, text.size = 3, sort = FALSE, box = TRUE, notch = FALSE, violin = TRUE)
data |
dataset to use for plot |
mapping |
aesthetic being used. It must specify x and y. |
na.rm.cat |
logical, indicating whether NA values in the categorical variable (i.e. x) should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variable (see na.value.cat argument). |
na.value.cat |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE. |
na.rm.cont |
logical, indicating whether NA values in the continuous variable (i.e. y) should be silently removed before the computation proceeds. Default is FALSE. |
axes.labs |
Whether to display the labels of the axes, i.e. the names of x and y. Default is TRUE. |
ticks.labs |
Whether to display the labels of the categories of x and y. Default is TRUE. |
text.size |
Size of the association measure. If NULL, the text is not added to the plot. |
sort |
logical. If TRUE, the levels of the categorical variable are reordered according to the conditional medians, so that boxplots are sorted. Default is FALSE. |
box |
Whether to draw boxplots. Default is TRUE. |
notch |
If FALSE (default) make a standard box plot. If TRUE, make a notched box plot. Notches are used to compare groups; if the notches of two boxes do not overlap, this suggests that the medians are significantly different. |
violin |
Whether to draw a violin plot. Default is TRUE. |
The eta-squared measure of global association between x and y is displayed in the upper-left corner of the plot.
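For readers unfamiliar with eta-squared, here is a minimal sketch of its textbook, unweighted definition (between-group sum of squares divided by total sum of squares), computed on the built-in iris data. It is an illustration only: whether the package computes the displayed value exactly this way, in particular with weights or missing values, is an assumption.

```r
# Continuous variable and grouping factor from the built-in iris data
y <- iris$Sepal.Length
x <- iris$Species

# Between-group sum of squares: group sizes times squared deviations
# of the group means from the grand mean
group_means <- tapply(y, x, mean)
group_sizes <- tapply(y, x, length)
ss_between <- sum(group_sizes * (group_means - mean(y))^2)

# Total sum of squares around the grand mean
ss_total <- sum((y - mean(y))^2)

eta2 <- ss_between / ss_total

# Same quantity as the R-squared of a one-way linear model
r2 <- summary(lm(y ~ x))$r.squared
print(all.equal(eta2, r2))
```

Eta-squared ranges from 0 (group means all equal) to 1 (no variation within groups).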
This function can be used as a high-level plot with the ggduo and ggpairs functions of the GGally package.
a ggplot object
Nicolas Robette
assoc.catcont, condesc, assoc.yx, darma, ggpairs
data(Movies)
ggassoc_boxplot(Movies, mapping = ggplot2::aes(x = Critics, y = ArtHouse))
For a cross-tabulation, plots the number of observations by using rectangles with proportional areas, and the phi measures of association between the categories with a diverging gradient of colour, using ggplot2.
ggassoc_chiasmogram(data, mapping, measure = "phi", limits = NULL, sort = "none", na.rm = FALSE, na.value = "NAs", colors = NULL, direction = 1)
data |
dataset to use for plot |
mapping |
aesthetics being used. x and y are required, weight can also be specified. |
measure |
character. The measure of association used for filling the rectangles. Can be "phi" for phi coefficient (default), "or" for odds ratios, "residuals" for Pearson residuals, "std.residuals" for standardized Pearson residuals or "pem" for local percentages of maximum deviation from independence. |
limits |
a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data. |
sort |
character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
colors |
vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from |
direction |
Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed. |
The height of the rectangles is proportional to the marginal frequency of the row variable; their width is proportional to the marginal frequency of the column variable. So the area of the rectangles is proportional to the expected frequency.
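The link between margins and expected frequencies can be verified with a few lines of base R. This is a sketch on a made-up table, independent of the package internals; `rect_height`, `rect_width` and `rect_area` are hypothetical names.

```r
# Toy 2 x 3 contingency table (made-up counts, not taken from Movies)
tab <- matrix(c(30, 10, 20, 25, 10, 25), nrow = 2,
              dimnames = list(x = c("a", "b"), y = c("u", "v", "w")))
n <- sum(tab)

# Rectangle sides: marginal proportions of rows and columns
rect_height <- rowSums(tab) / n
rect_width <- colSums(tab) / n

# Area of each rectangle = product of the two margins
rect_area <- outer(rect_height, rect_width)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / n

# Areas, rescaled by n, coincide with the expected counts
print(all.equal(rect_area * n, expected))
```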
The rectangles are filled according to a measure of local association, which measures how much each combination of categories of x and y is over/under-represented.
This function can be used as a high-level plot with the ggduo and ggpairs functions of the GGally package.
Note: It does not allow faceting.
a ggplot object
Nicolas Robette
Bozon Michel, Héran François. La découverte du conjoint. II. Les scènes de rencontre dans l'espace social. Population, 43(1), 1988, pp. 121-150.
assoc.twocat, phi.table, catdesc, assoc.yx, darma, ggassoc_phiplot, ggpairs
data(Movies)
ggassoc_chiasmogram(data=Movies, mapping=ggplot2::aes(Genre, Country))
For a cross-tabulation, plots the observed (or expected) frequencies by using rectangles with proportional areas, and the measures of local association between the categories with a diverging gradient of colour, using ggplot2.
ggassoc_crosstab(data, mapping, size = "freq", max.size = 20, measure = "phi", limits = NULL, sort = "none", na.rm = FALSE, na.value = "NAs", colors = NULL, direction = 1, legend = "right")
data |
dataset to use for plot |
mapping |
aesthetics being used. x and y are required, weight can also be specified. |
size |
character. If "freq" (default), areas are proportional to observed frequencies. If "expected", they are proportional to expected frequencies. |
max.size |
numeric value, specifying the maximum size of the squares. Default is 20. |
measure |
character. The measure of association used for filling the rectangles. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence. |
limits |
a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data. |
sort |
character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
colors |
vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from |
direction |
Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed. |
legend |
the position of legend ("none", "left", "right", "bottom", "top"). If "none", no legend is displayed. |
The measure of local association measures how much each combination of categories of x and y is over/under-represented.
The areas of the rectangles are proportional to observed or expected frequencies. Their color shading varies according to the measure of association.
This function can be used as a high-level plot with the ggduo and ggpairs functions of the GGally package.
a ggplot object
Nicolas Robette
assoc.twocat, phi.table, catdesc, assoc.yx, darma, ggassoc_phiplot, ggpairs
data(Movies)
ggassoc_crosstab(data=Movies, mapping=ggplot2::aes(Genre, Country))
For a cross-tabulation, plots a marimekko chart (also called mosaic plot), using ggplot2.
ggassoc_marimekko(data, mapping, type = "classic", measure = "phi", limits = NULL, na.rm = FALSE, na.value = "NAs", palette = NULL, colors = NULL, direction = 1, linecolor = "gray60", linewidth = 0.1, sort = "none", legend = "right")
data |
dataset to use for plot |
mapping |
aesthetics being used. x and y are required, weight can also be specified. |
type |
character. If "classic" (default), a simple marimekko chart is plotted, with no use of local associations. If type is "shades", tiles are shaded according to the local associations between categories. If type is "patterns", tiles are filled with patterns, and the density of patterns is proportional to the absolute level of local association between categories. |
measure |
character. The measure of association used for filling (if type is "shades") or patterning (if type is "patterns") the tiles. Can be "phi" for phi coefficient, "or" for odds ratios, "std.residuals" (default) for standardized (i.e. Pearson) residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence. |
limits |
a numeric vector of length two providing limits of the scale. If NULL (default), the limits are automatically adjusted to the data. Only used for type "shades". |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
palette |
A character vector of color codes. The number of colors should be equal or higher than the number of categories in y. If NULL (default), the "Tableau" palette from |
colors |
vector of colors that will be interpolated to produce a color gradient. If NULL (default), the "Temps" palette from |
direction |
Sets the order of colours in the scale. If 1, the default, colours are as output by RColorBrewer::brewer.pal(). If -1, the order of colours is reversed. |
linecolor |
character. Color of the contour lines of the tiles. Default is gray60. |
linewidth |
numeric. Width of the contour lines of the tiles. Default is 0.1. |
sort |
character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done. |
legend |
the position of legend ("none", "left", "right", "bottom", "top"). If "none", no legend is displayed. |
The measure of local association measures how much each combination of categories of x and y is over/under-represented.
This function can be used as a high-level plot with the ggduo and ggpairs functions of the GGally package.
Note: It does not allow faceting.
a ggplot object
Nicolas Robette
Hartigan, J.A., and Kleiner, B. (1984), "A mosaic of television ratings". The American Statistician, 38, 32–35.
Friendly, M. (1994), "Mosaic displays for multi-way contingency tables". Journal of the American Statistical Association, 89, 190–200.
assoc.twocat, phi.table, catdesc, assoc.yx, darma, ggassoc_crosstab, ggpairs
data(Movies)
ggassoc_marimekko(data=Movies, mapping=ggplot2::aes(Genre, Country))
ggassoc_marimekko(data=Movies, mapping=ggplot2::aes(Genre, Country), type = "patterns")
ggassoc_marimekko(data=Movies, mapping=ggplot2::aes(Genre, Country), type = "shades")
For a cross-tabulation, plots the measures of local association with bars of varying height, using ggplot2.
ggassoc_phiplot(data, mapping, measure = "phi", limit = NULL, sort = "none", na.rm = FALSE, na.value = "NAs")
data |
dataset to use for plot |
mapping |
aesthetics being used. x and y are required, weight can also be specified. |
measure |
character. The measure of association used for filling the rectangles. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence. |
limit |
numeric value, specifying the upper limit of the scale for the height of the bars, i.e. for the measures of association (the lower limit is set to 0-limit). It corresponds to the maximum absolute value of association one wants to represent in the plot. If NULL (default), the limit is automatically adjusted to the data. |
sort |
character. If "both", rows and columns are sorted according to the first factor of a correspondence analysis of the contingency table. If "x", only rows are sorted. If "y", only columns are sorted. If "none" (default), no sorting is done. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
The measure of association measures how much each combination of categories of x and y is over/under-represented. The bars vary in width according to the number of observations in the categories of the column variable. They vary in height according to the measure of association. Bars are black if the association is positive and white if it is negative.
The genuine version of this plot (see Cibois, 2004) uses the measure of association called "pem", i.e. the local percentages of maximum deviation from independence.
This function can be used as a high-level plot with the ggduo and ggpairs functions of the GGally package.
a ggplot object
Nicolas Robette
Cibois Philippe, 2004, Les écarts à l'indépendance. Techniques simples pour analyser des données d'enquêtes, Collection "Méthodes quantitatives pour les sciences sociales"
assoc.twocat, phi.table, catdesc, assoc.yx, darma, ggassoc_crosstab, ggpairs
data(Movies)
ggassoc_phiplot(data=Movies, mapping=ggplot2::aes(Country, Genre))
Displays a scatter plot and adds a smoothing line, using ggplot2.
ggassoc_scatter(data, mapping, na.rm = FALSE, axes.labs = TRUE, ticks.labs = TRUE, text.size = 3)
data |
dataset to use for plot |
mapping |
aesthetic being used. It must specify x and y. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
axes.labs |
Whether to display the labels of the axes, i.e. the names of x and y. Default is TRUE. |
ticks.labs |
Whether to display the labels of the categories of x and y. Default is TRUE. |
text.size |
Size of the association measure. If NULL, the text is not added to the plot. |
Kendall's tau rank correlation between x and y is displayed in the upper-left corner of the plot.
Smoothing is performed with gam.
This function can be used as a high-level plot with the ggduo and ggpairs functions of the GGally package.
a ggplot object
Nicolas Robette
assoc.twocont, condesc, assoc.yx, darma, ggpairs
data(Movies)
ggassoc_scatter(Movies, mapping = ggplot2::aes(x = Budget, y = Critics))
The data concern a sample of 1000 movies which were on screens in France, and some of their characteristics.
data(Movies)
A data frame with 1000 observations and the following 7 variables:
Budget
numeric vector of movie budgets
Genre
is a factor with 9 levels
Country
is a factor with 4 levels. Country of origin of the movie.
ArtHouse
is a factor with levels No, Yes. Whether the movie had the "Art House" label.
Festival
is a factor with levels No, Yes. Whether the movie was selected in Cannes, Berlin or Venice film festivals.
Critics
numeric vector of average ratings from intellectual criticism.
BoxOffice
numeric vector of number of admissions.
data(Movies)
str(Movies)
Computes the odds ratio for every cell of the cross-tabulation between two categorical variables
or.table(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs", digits = 3)
x |
the first categorical variable |
y |
the second categorical variable |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
digits |
integer. The number of digits (default is 3). If NULL, the results are not rounded. |
A table with the odds ratios
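A cell-level odds ratio can be obtained by collapsing the table to 2 x 2 around each cell (this category versus all others, on both variables). The sketch below illustrates that idea in base R; that or.table proceeds exactly this way is an assumption, and `cell_or` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: odds ratio of each cell against the rest of the table
cell_or <- function(tab) {
  n <- sum(tab)
  out <- tab
  for (i in seq_len(nrow(tab))) {
    for (j in seq_len(ncol(tab))) {
      n11 <- tab[i, j]              # in row i and in column j
      n12 <- sum(tab[i, ]) - n11    # in row i, not in column j
      n21 <- sum(tab[, j]) - n11    # not in row i, in column j
      n22 <- n - n11 - n12 - n21    # in neither
      out[i, j] <- (n11 * n22) / (n12 * n21)
    }
  }
  out
}

# Toy 2 x 3 contingency table (made-up counts, not taken from Movies)
tab <- matrix(c(30, 10, 20, 25, 10, 25), nrow = 2,
              dimnames = list(x = c("a", "b"), y = c("u", "v", "w")))
print(round(cell_or(tab), 3))
```

Values above 1 indicate over-represented combinations of categories, values below 1 under-represented ones. Empty margins produce divisions by zero, which this sketch does not handle.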
Nicolas Robette
assoc.twocat, assoc.catcont, condesc, catdesc
data(Movies)
or.table(Movies$Country, Movies$ArtHouse)
Computes the local and global Percentages of Maximum Deviation from Independence (pem) of a contingency table.
pem.table(x, y, weights = NULL, sort = FALSE, na.rm = FALSE, na.value = "NAs", digits = 1)
x |
the first categorical variable |
y |
the second categorical variable |
weights |
an optional numeric vector of weights (by default, a vector of 1 for uniform weights) |
sort |
logical. Whether rows and columns are sorted according to a correspondence analysis or not (default is FALSE). |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
digits |
integer. The number of digits (default is 1). If NULL, the results are not rounded. |
The Percentage of Maximum Deviation from Independence (pem) is an association measure for contingency tables and also provides attraction (resp. repulsion) measures in each cell of the crosstabulation (see Cibois, 1993). It is an alternative to the chi-squared statistic, Cramér's V coefficient, etc.
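As an illustration of the local measure, the sketch below implements one common reading of the local pem: the deviation from independence in a cell, divided by the largest deviation that the margins would allow in the same direction. It is a base R sketch under that assumption, not the package's code; `local_pem` is a hypothetical helper, and the global pem is not covered.

```r
# Hypothetical helper: local pem for each cell of a contingency table
local_pem <- function(tab) {
  n <- sum(tab)
  r <- rowSums(tab)
  k <- colSums(tab)
  expected <- outer(r, k) / n
  dev <- tab - expected
  # Largest count the margins allow in a cell: min(row total, column total)
  upper <- outer(r, k, pmin)
  # Smallest count the margins allow: max(0, row total + column total - n)
  lower <- pmax(outer(r, k, "+") - n, 0)
  # Compare each deviation to the extreme in its own direction
  extreme <- ifelse(dev >= 0, upper, lower)
  100 * dev / abs(extreme - expected)
}

# Toy 2 x 3 contingency table (made-up counts, not taken from Movies)
tab <- matrix(c(30, 10, 20, 25, 10, 25), nrow = 2,
              dimnames = list(x = c("a", "b"), y = c("u", "v", "w")))
print(round(local_pem(tab), 1))
```

Under this reading, values lie between -100 (maximum repulsion) and 100 (maximum attraction), with 0 at independence.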
Returns a list:
peml |
Table with local percentages of maximum deviation from independence |
pemg |
Numeric value, i.e. the global percentage of maximum deviation from independence |
Nicolas Robette
Cibois P., 1993, Le pem, pourcentage de l'écart maximum : un indice de liaison entre modalités d'un tableau de contingence, Bulletin de méthodologie sociologique, n°40, p. 43-63. https://cibois.pagesperso-orange.fr/bms93.pdf
table, chisq.test, phi.table, assocstats
data(Movies)
pem.table(Movies$Country, Movies$ArtHouse)
Computes the phi coefficient for every cell of the cross-tabulation between two categorical variables
phi.table(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs", digits = 3)
x |
the first categorical variable |
y |
the second categorical variable |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
digits |
integer. The number of digits (default is 3). If NULL, the results are not rounded. |
A table with the phi coefficients
Nicolas Robette
Rakotomalala R., 'Comprendre la taille d'effet (effect size)', http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf
assoc.twocat, assoc.catcont, condesc, catdesc
data(Movies)
phi.table(Movies$Country, Movies$ArtHouse)
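As an illustration of what a cell-level phi coefficient means, the sketch below assumes that each cell is treated as a 2x2 table (this row category versus the others, this column category versus the others) and computes the usual phi coefficient of that table. The helper name `local_phi` is hypothetical, and the package's own computation may differ in details such as weighting and rounding.

```r
# Hypothetical helper: phi coefficient of the 2x2 table implied by each cell
local_phi <- function(tab) {
  tab <- as.matrix(tab)
  n  <- sum(tab)
  rs <- rowSums(tab)
  cs <- colSums(tab)
  (n * tab - outer(rs, cs)) / sqrt(outer(rs * (n - rs), cs * (n - cs)))
}

tab <- matrix(c(30, 10, 20, 40), nrow = 2)
phi <- local_phi(tab)
phi
```

For a 2x2 table, all four cells have the same absolute value, which equals the square root of the chi-squared statistic divided by the number of observations.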
Computes profiles (frequencies or percentages) for subgroups of observations defined by the levels of a categorical variable.
profiles(X, y, weights = NULL, stat = "cprop", mar = TRUE, digits = 1)
X |
data frame. The variables which are described in the profiles. It should contain only factors. |
y |
factor. The categorical variable which defines subgroups of observations whose profiles will be computed. |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
stat |
character. Whether to compute frequencies ("freq"), percentages ("prop"), row percentages ("rprop") or column percentages ("cprop", default). |
mar |
logical, indicating whether to compute margins. Default is TRUE. |
digits |
numeric. Number of digits. Default is 1. |
A data frame with profiles in columns
Nicolas Robette
catdesc, assoc.yx, darma, assoc.twocat, assoc.twocat.by
data(Movies)
profiles(Movies[,c(2,4,5)], Movies$Country)
Computes linear or binomial regressions in two steps: univariate regressions, then a multivariate regression. The results are displayed side by side as average marginal effects.
regtab(x, y, weights = NULL, continuous = "slopes", show.ci = TRUE, conf.level = 0.95)
x |
data frame. The explanatory (i.e. independent) variables used in regressions. They can be numerical or factors. |
y |
vector. The outcome (i.e. dependent) variable. It can be numerical (linear regression) or a factor with 2 levels (binomial regression). |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
continuous |
character. The kind of average marginal effects computed for continuous explanatory variables. If "slopes" (default), these are average marginal slopes. If "predictions", these are average marginal predictions for a set of values. |
show.ci |
logical. Whether to display the confidence intervals. Default is TRUE. |
conf.level |
numerical value. Defaults to 0.95, which corresponds to a 95 percent confidence interval. Must be strictly greater than 0 and less than 1. |
This function is essentially a wrapper for regression functions in the gtsummary package. It computes a series of univariate regressions (one for each explanatory variable), then a multivariate regression (with all explanatory variables), and displays the results side by side. These results are presented as average marginal effects: average marginal predictions for categorical variables and average marginal slopes (or predictions) for continuous variables.
In addition, the function is compatible with variable labels assigned with the labelled package: these labels are displayed automatically.
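The two-step logic can be sketched in base R without gtsummary. The example below uses the built-in `mtcars` data and `lm()` purely for illustration (neither is part of this package's examples), and relies on the fact that in a linear model without interactions or transformations, the average marginal slope of a continuous predictor equals its coefficient.

```r
# Illustration only: univariate then multivariate linear regressions on mtcars
data(mtcars)
predictors <- c("wt", "hp", "qsec")

# Step 1: one univariate regression per explanatory variable
uni <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "mpg"), data = mtcars)
  coef(fit)[[v]]
})

# Step 2: a single multivariate regression with all explanatory variables
fit_all <- lm(reformulate(predictors, response = "mpg"), data = mtcars)
multi <- coef(fit_all)[predictors]

# Side-by-side display of the slopes
cbind(univariate = uni, multivariate = multi)
```

Comparing the two columns shows how much of each univariate association remains once the other explanatory variables are controlled for.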
An object of class tbl_merge from the gtsummary package
Nicolas Robette
Arel-Bundock V, Greifer N, Heiss A (Forthcoming). “How to Interpret Statistical Models Using marginaleffects in R and Python.” Journal of Statistical Software.
Larmarange J., 2024, “Prédictions marginales, contrastes marginaux & effets marginaux”, in Guide-R, Guide pour l’analyse de données d’enquêtes avec R, https://larmarange.github.io/guide-R/analyses/estimations-marginales.html
cattab, catdesc, condesc, assoc.yx, darma, assoc.twocat, assoc.twocat.by
data(Movies)
regtab(x = Movies[, c("Genre", "Budget", "Festival", "Critics")], y = Movies$BoxOffice)
Computes statistics of a cross-tabulation using the assoc.twocat function.
stat_twocat(mapping = NULL, data = NULL, geom = "point", position = "identity", ..., show.legend = NA, inherit.aes = TRUE)
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If |
geom |
Override the default connection with |
position |
Position adjustment, either as a string naming the adjustment (e.g. |
... |
Other arguments passed on to |
show.legend |
logical. Should this layer be included in the legends? |
inherit.aes |
If |
A ggplot2 plot with the added statistic.
Nicolas Robette
Computes standardized or adjusted residuals of a (possibly) weighted contingency table
stdres.table(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs", digits = 3, residuals = "std")
x |
the first categorical variable |
y |
the second categorical variable |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
digits |
integer. The number of digits (default is 3). If NULL, the results are not rounded. |
residuals |
If "std" (default), standardized (i.e. Pearson) residuals are computed. If "adj", adjusted standardized residuals are computed. |
A table with the residuals
The adjusted standardized residuals are strictly equivalent to the test-values for nominal variables proposed by Lebart et al. (1984).
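The difference between the two kinds of residuals can be checked by hand on an unweighted table. The sketch below uses a small made-up table and compares the manual formulas with base R's `chisq.test()`. Note the naming: what this function calls "std" corresponds to `chisq.test()$residuals` (Pearson residuals), and "adj" corresponds to `chisq.test()$stdres`.

```r
# Made-up 2x2 table, for illustration only
tab <- matrix(c(30, 10, 20, 40), nrow = 2)
n  <- sum(tab)
rs <- rowSums(tab)
cs <- colSums(tab)
expected <- outer(rs, cs) / n

# Standardized (Pearson) residuals
pearson <- (tab - expected) / sqrt(expected)

# Adjusted standardized residuals: Pearson residuals rescaled by the margins
adjusted <- pearson / sqrt(outer(1 - rs / n, 1 - cs / n))

# Cross-check against base R
ct <- chisq.test(tab, correct = FALSE)
all.equal(pearson, ct$residuals, check.attributes = FALSE)
all.equal(adjusted, ct$stdres, check.attributes = FALSE)
```

Under independence the adjusted residuals are approximately standard normal, so absolute values above about 2 point to cells that deviate notably from independence.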
Nicolas Robette
Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.
Rakotomalala R., Comprendre la taille d'effet (effect size), http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf
Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and sons, New-York.
assoc.twocat, phi.table, or.table, pem.table
data(Movies)
stdres.table(Movies$Country, Movies$ArtHouse)
Computes the weighted correlation between two distributions. This can be Pearson, Spearman or Kendall correlation.
weighted.cor(x, y, weights = NULL, method = "pearson", na.rm = FALSE)
x |
numeric vector |
y |
numeric vector |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
method |
a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman". |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
a length-one numeric vector
Nicolas Robette
data(Movies)
weighted.cor(Movies$Critics, Movies$BoxOffice, weights = rep(c(.8,1.2), 500))
weighted.cor(Movies$Critics, Movies$BoxOffice, weights = rep(c(.8,1.2), 500), method = "spearman")
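For the Pearson case, a weighted correlation can be written directly from its definition. The sketch below is a minimal base R version with a hypothetical helper name (`weighted_pearson`) and made-up data. It is meant to show the formula, not to reproduce the package's internals, and it does not cover the Spearman or Kendall variants.

```r
# Hypothetical helper: weighted Pearson correlation from its definition
weighted_pearson <- function(x, y, w = rep(1, length(x))) {
  mx <- weighted.mean(x, w)
  my <- weighted.mean(y, w)
  sum(w * (x - mx) * (y - my)) /
    sqrt(sum(w * (x - mx)^2) * sum(w * (y - my)^2))
}

# Made-up data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)
w <- c(0.8, 1.2, 0.8, 1.2, 0.8)
weighted_pearson(x, y, w)
```

With uniform weights the result equals `cor(x, y)`, and with arbitrary weights it matches the correlation returned by `stats::cov.wt(cbind(x, y), wt = w, cor = TRUE)`.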
Computes a matrix of weighted correlations between the columns of x and the columns of y. This can be Pearson, Spearman or Kendall correlation.
weighted.cor2(x, y = NULL, weights = NULL, method = "pearson", na.rm = FALSE)
x |
a data frame of numeric vectors |
y |
an optional data frame of numeric vectors. Default is NULL, which means that correlations between the columns of |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
method |
a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman". |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
a matrix of correlations
Nicolas Robette
data(Movies)
weighted.cor2(Movies[,c("Budget", "Critics", "BoxOffice")], weights = rep(c(.8,1.2), 500))
Computes the weighted covariance between two distributions.
weighted.cov(x, y, weights = NULL, na.rm = FALSE)
x |
numeric vector |
y |
numeric vector |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
a length-one numeric vector
Nicolas Robette
weighted.sd, weighted.cor, weighted.cov2
data(Movies)
weighted.cov(Movies$Critics, Movies$BoxOffice, weights = rep(c(.8,1.2), 500))
Computes a matrix of weighted covariances between the columns of x and the columns of y.
weighted.cov2(x, y = NULL, weights = NULL, na.rm = FALSE)
x |
a data frame of numeric vectors |
y |
an optional data frame of numeric vectors. Default is NULL, which means that covariances between the columns of |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
a matrix of covariances
Nicolas Robette
data(Movies)
weighted.cov2(Movies[,c("Budget", "Critics", "BoxOffice")], weights = rep(c(.8,1.2), 500))
Computes Cramér's V measure of association between two (possibly weighted) categorical variables
weighted.cramer(x, y, weights = NULL, na.rm = FALSE)
x |
the first categorical variable |
y |
the second categorical variable |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
Numerical value with Cramer's V.
Nicolas Robette
Rakotomalala R., 'Comprendre la taille d'effet (effect size)', http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf
data(Movies)
weighted.cramer(Movies$Country, Movies$ArtHouse)
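To make the measure concrete, the sketch below computes Cramér's V from a weighted contingency table built with base R's `xtabs()`. The helper name `cramers_v` and the toy vectors are hypothetical, and the package may handle missing values and edge cases differently.

```r
# Hypothetical helper: Cramér's V from a (possibly weighted) cross-tabulation
cramers_v <- function(x, y, w = rep(1, length(x))) {
  tab <- xtabs(w ~ x + y)                  # weighted contingency table
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

# Toy data: perfect association, then no association
x <- rep(c("a", "b"), each = 4)
cramers_v(x, x)                        # perfect association
cramers_v(x, rep(c("u", "v"), 4))      # independence
```

Multiplying all weights by the same constant leaves V unchanged, because the chi-squared statistic and the total count scale together.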
Computes the weighted median absolute deviation from the median (MAD) of a distribution.
weighted.mad(x, weights = NULL, na.rm = FALSE)
x |
numeric vector |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
a length-one numeric vector
Nicolas Robette
data(Movies)
weighted.mad(Movies$Critics, weights = rep(c(.8,1.2), 500))
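The idea behind a weighted MAD can be sketched in two steps: take a weighted median, then take the weighted median of the absolute deviations from it. The helpers below are hypothetical and use the "lower" weighted median (the smallest value whose cumulative weight reaches half the total), which is only one of several possible definitions. Whether the package applies a scaling constant, as `stats::mad()` does by default, is not stated here, so none is applied.

```r
# Hypothetical helper: lower weighted median
weighted_median <- function(x, w = rep(1, length(x))) {
  o <- order(x)
  cw <- cumsum(w[o])                       # cumulative weights in sorted order
  x[o][which(cw >= 0.5 * sum(w))[1]]
}

# Hypothetical helper: weighted median of absolute deviations from the median
weighted_mad <- function(x, w = rep(1, length(x))) {
  weighted_median(abs(x - weighted_median(x, w)), w)
}

x <- c(1, 2, 3, 4, 100)
weighted_mad(x)
weighted_median(x, w = c(1, 1, 1, 1, 10))
```

With uniform weights and an odd number of observations, this agrees with `mad(x, constant = 1)`. For an even number of observations it can differ, since `median()` averages the two central values.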
Computes the weighted quantiles of a distribution.
weighted.quantile(x, weights = NULL, probs = seq(0, 1, 0.25), na.rm = FALSE, names = FALSE)
x |
numeric vector whose sample quantiles are wanted |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
probs |
numeric vector of probabilities with values in [0,1] |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
names |
logical. If TRUE, the result has a names attribute. Default is FALSE. |
A numeric vector of the same length as the probs argument.
This function is taken from https://stackoverflow.com/questions/2748725/is-there-a-weighted-median-function
data(Movies)
weighted.quantile(Movies$Critics, weights = rep(c(.8,1.2), 500), names = TRUE)
Computes the weighted standard deviation of a distribution.
weighted.sd(x, weights = NULL, na.rm = FALSE)
x |
numeric vector |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. Default is FALSE. |
a length-one numeric vector
Nicolas Robette
data(Movies)
weighted.sd(Movies$Critics, weights = rep(c(.8,1.2), 500))
Computes a contingency table from one or two vectors, with the possibility of specifying weights.
weighted.table(x, y = NULL, weights = NULL, stat = "freq", mar = FALSE, na.rm = FALSE, na.value = "NAs", digits = 1)
x |
an object which can be interpreted as a factor |
y |
an optional object which can be interpreted as a factor |
weights |
numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used. |
stat |
character. Whether to compute a contingency table ("freq", default), percentages ("prop"), row percentages ("rprop") or column percentages ("cprop"). |
mar |
logical, indicating whether to compute margins. Default is FALSE. |
na.rm |
logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument). |
na.value |
character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE. |
digits |
integer indicating the number of decimal places (default is 1) |
Returns a contingency table.
Nicolas Robette
data(Movies)
weighted.table(Movies$Country, Movies$ArtHouse)
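The four `stat` options can be related to base R tools. The sketch below uses made-up vectors, builds a weighted frequency table with `xtabs()` and derives the percentages with `prop.table()`. It is an approximation of what the function returns, without margins, NA handling or rounding.

```r
# Made-up data, for illustration only
x <- factor(c("a", "a", "b", "b"))
y <- factor(c("u", "v", "u", "u"))
w <- c(1, 2, 1, 1)

freq  <- xtabs(w ~ x + y)                   # weighted counts      ("freq")
prop  <- 100 * prop.table(freq)             # overall percentages  ("prop")
rprop <- 100 * prop.table(freq, margin = 1) # row percentages      ("rprop")
cprop <- 100 * prop.table(freq, margin = 2) # column percentages   ("cprop")

freq
rprop
```

Each row of `rprop` sums to 100, each column of `cprop` sums to 100, and `prop` sums to 100 over the whole table.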