| Title: | Collection of Correlation, Agreement, and Reliability Estimators |
|---|---|
| Description: | Compute correlation, association, agreement, and reliability measures for small to high-dimensional datasets through a consistent matrix-oriented interface. Supports classical correlations (Pearson, Spearman, Kendall, Chatterjee's rank correlation), distance correlation, partial correlation with regularised estimators, shrinkage correlation for p >= n settings, robust correlations including biweight mid-correlation, percentage-bend, Winsorized, and skipped correlation, latent-variable methods for binary and ordinal data, pairwise and overall intraclass correlation for wide data, repeated-measures correlation, and agreement/reliability analyses based on Cohen's kappa, weighted kappa, multi-rater kappa, Gwet's AC1/AC2, Krippendorff's alpha, Bland-Altman methods, Lin's concordance correlation coefficient, Poisson GLMM concordance for count data, and repeated-measures intraclass/concordance correlation. Implemented with optimized C++ backends using BLAS/OpenMP and memory-aware symmetric updates, and returns standard R objects with print/summary/plot methods plus optional Shiny viewers for matrix inspection. Methods based on Ledoit and Wolf (2004) <doi:10.1016/S0047-259X(03)00096-4>; high-dimensional shrinkage covariance estimation <doi:10.2202/1544-6115.1175>; Lin (1989) <doi:10.2307/2532051>; Wilcox (1994) <doi:10.1007/BF02294395>; Wilcox (2004) <doi:10.1080/0266476032000148821>; Hayes and Krippendorff (2007) <doi:10.1080/19312450709336664>; weighted repeated-measures correlation by Kondo et al. (2025) <doi:10.1002/sim.70046>. |
| Authors: | Thiago de Paula Oliveira [aut, cre] |
| Maintainer: | Thiago de Paula Oliveira <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.12.2 |
| Built: | 2026-05-31 09:55:45 UTC |
| Source: | https://github.com/prof-thiagooliveira/matrixcorr |
Summary Accessor for Correlation Summaries
## S3 method for class 'summary.corr_result'
x[[i, ...]]
x |
A |
i |
A column name or summary metadata key. |
... |
Unused. |
A summary column (if present) or summary metadata entry.
Summary Accessor for Correlation Summaries
## S3 method for class 'summary.corr_result'
x$name
x |
A |
name |
A column name or summary metadata key. |
A summary column (if present) or summary metadata entry.
Edge-list data-frame view
## S3 method for class 'corr_edge_list'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
A |
row.names |
Ignored. |
optional |
Ignored. |
... |
Unused. |
A data frame with columns row, col, value.
Computes Bland-Altman mean difference and limits of agreement (LoA)
between two numeric measurement vectors, including t-based confidence
intervals for the mean difference and for each LoA, using a 'C++' backend.
If group2 is omitted and group1 is a numeric matrix or data frame with
at least two numeric columns, ba() computes all pairwise contrasts across
methods and returns a pairwise Bland-Altman matrix object.
Note: Lin's concordance correlation coefficient (CCC) is a complementary, single-number summary of agreement (precision + accuracy). It is useful for quick screening or reporting an overall CI, but may miss systematic or magnitude-dependent bias; consider reporting CCC alongside Bland-Altman.
ba(
  group1, group2,
  loa_multiplier = 1.96, mode = 1L, conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  verbose = FALSE
)

## S3 method for class 'ba'
print(
  x, digits = 3, ci_digits = 3,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL,
  ...
)

## S3 method for class 'ba'
summary(object, digits = 3, ci_digits = 3, ...)

## S3 method for class 'summary.ba'
print(
  x, digits = NULL,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL,
  ...
)

## S3 method for class 'ba'
plot(
  x,
  title = "Bland-Altman Plot", subtitle = NULL,
  point_alpha = 0.7, point_size = 2.2, line_size = 0.8,
  shade_ci = TRUE, shade_alpha = 0.08,
  smoother = c("none", "loess", "lm"),
  symmetrize_y = TRUE, show_value = TRUE,
  ...
)

## S3 method for class 'ba_matrix'
print(
  x, digits = 3, ci_digits = 3,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL,
  style = c("pairs", "matrices"),
  ...
)

## S3 method for class 'ba_matrix'
summary(object, digits = 3, ci_digits = 3, ...)

## S3 method for class 'summary.ba_matrix'
print(
  x, digits = NULL,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL,
  ...
)

## S3 method for class 'ba_matrix'
plot(
  x, pairs = NULL, against = NULL,
  facet_scales = c("free_y", "fixed"),
  title = "Bland-Altman (pairwise)",
  point_alpha = 0.6, point_size = 1.8, line_size = 0.7,
  shade_ci = TRUE, shade_alpha = 0.08,
  smoother = c("none", "loess", "lm"),
  show_value = TRUE,
  ...
)
group1 |
Numeric vector of paired measurements, or a numeric matrix/data
frame with at least two numeric columns when |
group2 |
Optional numeric vector of paired measurements. If omitted,
|
loa_multiplier |
Positive scalar; the multiple of the standard deviation used to
define the LoA (default 1.96 for nominal 95% LoA). Confidence
intervals always use |
mode |
Integer; 1 uses |
conf_level |
Confidence level for CIs (default 0.95). |
n_threads |
Integer |
verbose |
Logical; if TRUE, prints how many OpenMP threads are used. |
x |
A |
digits |
Number of digits for estimates (default 3). |
ci_digits |
Number of digits for CI bounds (default 3). |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Passed to |
object |
A |
title |
Plot title. |
subtitle |
Optional subtitle. If NULL, shows n and LoA summary. |
point_alpha |
Point transparency. |
point_size |
Point size. |
line_size |
Line width for mean/LoA. |
shade_ci |
Logical; if TRUE, draw shaded CI bands instead of 6 dashed lines. |
shade_alpha |
Transparency of CI bands. |
smoother |
One of "none", "loess", "lm" to visualize proportional bias. |
symmetrize_y |
Logical; if TRUE, y-axis centered at mean difference with symmetric limits. |
show_value |
Logical; included for a consistent plotting interface. Bland-Altman plots do not overlay numeric cell values, so this argument currently has no effect. |
style |
Show the pairwise result as |
pairs |
Optional character vector of pair labels to display. |
against |
Optional single method name; if supplied, only contrasts involving that method are plotted. |
facet_scales |
Either |
Given paired measurements $(x_i, y_i)$, Bland-Altman analysis uses
$d_i = x_i - y_i$ (or $y_i - x_i$ if mode = 2) and
$m_i = (x_i + y_i)/2$. The mean difference $\bar d$ estimates bias.
The limits of agreement (LoA) are $\bar d \pm z \cdot s_d$, where
$s_d$ is the sample standard deviation of $d_i$ and $z$
(argument loa_multiplier) is typically 1.96 for nominal 95% LoA.
When group2 is omitted and group1 is a wide numeric matrix or data
frame, the same two-method calculation is applied to every unordered column
pair. The returned pairwise matrices follow the requested mode: with
mode = 1, upper-triangle entries represent row minus column; with
mode = 2, upper-triangle entries represent column minus row.
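Confidence intervals use Student's $t$ distribution with $n - 1$
degrees of freedom, with
Mean-difference CI given by $\bar d \pm t_{n-1,\,1-\alpha/2}\, s_d/\sqrt{n}$; and
LoA CI given by $(\bar d \pm z\, s_d) \pm t_{n-1,\,1-\alpha/2}\, s_d \sqrt{3/n}$.
Here $\bar d$ and $s_d$ are the mean and sample standard deviation of the
paired differences, $z$ is loa_multiplier, and $\alpha$ is one minus conf_level.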
Assumptions include approximately normal differences and roughly constant
variability across the measurement range; if differences increase with
magnitude, consider a transformation before analysis. Missing values are
removed pairwise (rows with an NA in either input are dropped before
calling the C++ backend).
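The quantities described above can be reproduced approximately in base R. The following is a minimal sketch, not the package's C++ backend: `ba_sketch()` is a hypothetical helper, and the LoA standard error `s_d * sqrt(3 / n)` is assumed here as the classical Bland and Altman (1986) approximation.

```r
# Base-R sketch of the two-method Bland-Altman quantities.
# ba_sketch() is illustrative only and is not part of the package.
ba_sketch <- function(x, y, loa_multiplier = 1.96, conf_level = 0.95, mode = 1L) {
  ok <- stats::complete.cases(x, y)        # pairwise NA removal
  x <- x[ok]
  y <- y[ok]
  d <- if (mode == 1L) x - y else y - x    # paired differences
  m <- (x + y) / 2                         # paired means (x-axis of the plot)
  n <- length(d)
  bias <- mean(d)
  s_d  <- stats::sd(d)
  loa  <- bias + c(lower = -1, upper = 1) * loa_multiplier * s_d
  tq   <- stats::qt(1 - (1 - conf_level) / 2, df = n - 1)
  se_bias <- s_d / sqrt(n)
  se_loa  <- s_d * sqrt(3 / n)             # assumed classical approximation
  list(
    n = n, bias = bias, sd = s_d, loa = loa,
    bias_ci      = bias + c(-1, 1) * tq * se_bias,
    loa_lower_ci = loa[["lower"]] + c(-1, 1) * tq * se_loa,
    loa_upper_ci = loa[["upper"]] + c(-1, 1) * tq * se_loa,
    means = m, diffs = d
  )
}

set.seed(1)
x <- rnorm(100, 100, 10)
y <- x + rnorm(100, 0, 8)
res <- ba_sketch(x, y)
res$bias
res$loa
```

Plotting `res$diffs` against `res$means` is a quick way to check the constant-variability assumption before relying on the intervals.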
Probability of agreement, available through prob_agree, is a
tolerance-based companion to Bland-Altman analysis. Bland-Altman reports bias
and limits of agreement for paired differences. prob_agree() instead uses
the sampling distribution of estimated differences to quantify the
probability that two estimated quantities or curves agree within a
user-specified practical tolerance.
If group1 and group2 are both supplied, an object of class
"ba" (list) with elements:
means, diffs: numeric vectors
groups: data.frame used after NA removal
n_obs: integer, number of complete pairs used.
based.on: compatibility alias for n_obs.
lower.limit, mean.diffs, upper.limit
lines: named numeric vector (lower, mean, upper)
CI.lines: named numeric vector for CIs of those lines
loa_multiplier, critical.diff
If group2 is omitted and group1 is a wide numeric matrix or data frame,
an object of class "ba_matrix" with pairwise components:
bias: pairwise matrix of Bland-Altman mean differences.
sd_loa: pairwise matrix of SDs of differences.
loa_lower, loa_upper: pairwise matrices of LoA
endpoints.
width: pairwise matrix of LoA widths.
n: integer pairwise matrix of complete-case counts.
mean_ci_low, mean_ci_high: pairwise matrices of CI
bounds for the mean difference.
loa_lower_ci_low, loa_lower_ci_high: pairwise matrices
of CI bounds for the lower LoA.
loa_upper_ci_low, loa_upper_ci_high: pairwise matrices
of CI bounds for the upper LoA.
methods: character vector of analysed method names.
loa_multiplier, mode: calculation settings reused for
every pair.
Thiago de Paula Oliveira
Bland JM, Altman DG (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476), 307-310.
Bland JM, Altman DG (1999). Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8(2), 135-160.
print.ba, plot.ba,
ccc, prob_agree, ccc_rm_ustat,
ccc_rm_reml
set.seed(1)
x <- rnorm(100, 100, 10)
y <- x + rnorm(100, 0, 8)
fit_ba <- ba(x, y)
print(fit_ba)
estimate(fit_ba)
tidy(fit_ba)
confint(fit_ba)
plot(fit_ba)

# Pairwise Bland-Altman across 3 methods
set.seed(7)
wide3 <- data.frame(
  ref = rnorm(80, 100, 8),
  m2 = rnorm(80, 101, 8),
  m3 = rnorm(80, 99, 9)
)
fit_ba3 <- ba(wide3)
print(fit_ba3)
summary(fit_ba3)
tidy(fit_ba3)
plot(fit_ba3)
Repeated-measures Bland-Altman (BA) analysis for method comparison based on a mixed-effects model fitted to subject-time matched paired differences. The fitted model includes a subject-specific random intercept and, optionally, an AR(1) residual correlation structure within subject.
The function accepts either exactly two methods or $\ge 3$ methods.
With exactly two methods it returns a single fitted BA object. With
$\ge 3$ methods it fits the same model to every unordered method pair and
returns pairwise matrices of results.
Required variables
response: numeric measurements.
subject: subject identifier.
method: method label with at least two distinct levels.
time: replicate/time key used to form within-subject pairs.
For any analysed pair of methods, only records where both methods are present
for the same subject and integer-coerced time contribute to the
fit. Rows with missing values in any required field are excluded for that
analysed pair.
ba_rm(
  data = NULL, response, subject, method, time,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  verbose = FALSE,
  loa_multiplier = 1.96,
  include_slope = FALSE,
  use_ar1 = FALSE, ar1_rho = NA_real_,
  max_iter = 200L, tol = 1e-06
)

## S3 method for class 'ba_repeated'
print(
  x, digits = 3, ci_digits = 3,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL,
  ...
)

## S3 method for class 'ba_repeated_matrix'
print(
  x, digits = 3, ci_digits = 3,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL, show_ci = NULL,
  style = c("pairs", "matrices"),
  ...
)

## S3 method for class 'ba_repeated'
plot(
  x,
  title = "Bland-Altman (repeated measurements)", subtitle = NULL,
  point_alpha = 0.7, point_size = 2.2, line_size = 0.8,
  shade_ci = TRUE, shade_alpha = 0.08,
  smoother = c("none", "loess", "lm"),
  symmetrize_y = TRUE, show_points = TRUE, show_value = TRUE,
  ...
)

## S3 method for class 'ba_repeated_matrix'
plot(
  x, pairs = NULL, against = NULL,
  facet_scales = c("free_y", "fixed"),
  title = "Bland-Altman (repeated, pairwise)",
  point_alpha = 0.6, point_size = 1.8, line_size = 0.7,
  shade_ci = TRUE, shade_alpha = 0.08,
  smoother = c("none", "loess", "lm"),
  show_points = TRUE, show_value = TRUE,
  ...
)
data |
Optional data frame-like object. If supplied, |
response |
Numeric response vector, or a single character string naming
the response column in |
subject |
Subject identifier vector (integer, numeric, or factor), or a
single character string naming the subject column in |
method |
Method label vector (character, factor, integer, or numeric), or
a single character string naming the method column in |
time |
Replicate/time index vector (integer or numeric), or a single
character string naming the time column in |
conf_level |
Confidence level for Wald confidence intervals for the
reported bias and both LoA endpoints. Must lie in |
n_threads |
Integer |
verbose |
Logical. If |
loa_multiplier |
Positive scalar giving the SD multiplier used to form
the limits of agreement. Default is |
include_slope |
Logical. If |
use_ar1 |
Logical. If |
ar1_rho |
Optional AR(1) parameter. Must satisfy |
max_iter |
Maximum number of EM/GLS iterations used by the backend. |
tol |
Convergence tolerance for the backend EM/GLS iterations. |
x |
A |
digits |
Number of digits for estimates (default 3). |
ci_digits |
Number of digits for CI bounds (default 3). |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional theme adjustments passed to |
style |
Show as pairs or matrix format? |
title |
Plot title (character scalar). Defaults to
|
subtitle |
Optional subtitle (character scalar). If |
point_alpha |
Numeric in |
point_size |
Positive numeric. Size of scatter points; passed to
|
line_size |
Positive numeric. Line width for horizontal bands
(bias and both LoA) and, when requested, the proportional-bias line.
Passed to |
shade_ci |
Logical. If |
shade_alpha |
Numeric in |
smoother |
One of |
symmetrize_y |
Logical (two-method plot only). If |
show_points |
Logical. If |
show_value |
Logical; included for a consistent plotting interface. Repeated-measures Bland-Altman plots do not overlay numeric cell values, so this argument currently has no effect. |
pairs |
(Faceted pairwise plot only.) Optional character vector of
labels specifying which method contrasts to display. Labels must match the
"row - column" convention used by |
against |
(Faceted pairwise plot only.) Optional single method name.
If supplied, facets are restricted to contrasts of the chosen method
against all others. Ignored when |
facet_scales |
(Faceted pairwise plot only.) Either |
For a selected pair of methods $(a, b)$, the backend first forms complete
within-subject pairs at matched subject and integer-coerced
time. Let
$$d_{it} = y_{itb} - y_{ita}, \qquad m_{it} = \frac{y_{ita} + y_{itb}}{2},$$
where $d_{it}$ is the paired difference and $m_{it}$ is the paired
mean for subject $i$ at time/replicate $t$. Only complete
subject-time matches contribute to that pairwise fit.
If multiple rows are present for the same subject-time-method
combination within an analysed pair, the backend keeps the last encountered
value for that combination when forming the pair. The function therefore
implicitly assumes at most one observation per subject-time-method
cell for each analysed contrast.
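The pairing rule can be illustrated in base R. This is a minimal sketch under stated assumptions, not the backend itself: `pair_methods()` and the column names `y`, `subject`, `method`, `time` are hypothetical, and the difference is taken as second method minus first, which is assumed here to match the column-minus-row orientation documented for the pairwise output.

```r
# Hypothetical helper: form complete subject-time pairs for methods a and b.
pair_methods <- function(data, a, b) {
  data$time <- as.integer(data$time)                        # integer-coerced time
  data <- data[stats::complete.cases(data), ]               # drop rows with missing fields
  key  <- c("subject", "method", "time")
  data <- data[!duplicated(data[key], fromLast = TRUE), ]   # keep the last row per cell
  ya <- data[data$method == a, c("subject", "time", "y")]
  yb <- data[data$method == b, c("subject", "time", "y")]
  both <- merge(ya, yb, by = c("subject", "time"), suffixes = c("_a", "_b"))
  both$d <- both$y_b - both$y_a          # paired difference (assumed: b minus a)
  both$m <- (both$y_a + both$y_b) / 2    # paired mean
  both
}

toy <- data.frame(
  y       = c(10, 12, 11, 13, 9, 99, 14),
  subject = c(1, 1, 1, 1, 2, 2, 2),
  method  = c("A", "B", "A", "B", "A", "A", "B"),
  time    = c(1, 1, 2, 2, 1, 1, 1)
)
pairs_ab <- pair_methods(toy, "A", "B")
pairs_ab
```

In the toy data, subject 2 has two rows for method A at time 1; only the later value (99) survives, which is why supplying at most one observation per subject-time-method cell is the safer input.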
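The fitted model for each analysed pair is
$$d_{it} = \beta_0 + \beta_1 x_{it} + u_i + \varepsilon_{it},$$
where $x_{it} = m_{it}$ if include_slope = TRUE and the term is omitted
otherwise; $u_i \sim \mathcal{N}(0, \sigma_u^2)$ is a subject-specific
random intercept; and the within-subject residual vector satisfies
$\mathrm{Cov}(\varepsilon_i) = \sigma_e^2 R_i$.
When use_ar1 = FALSE, $R_i = I$. When use_ar1 = TRUE, the backend
works with the residual precision matrix $C_i = R_i^{-1}$ over
contiguous time blocks within subject and uses $\sigma_e^2 C_i^{-1}$ as
the residual covariance.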
Within each subject, paired observations are ordered by integer-coerced
time. AR(1) correlation is applied only over strictly contiguous runs
satisfying $t_{k+1} = t_k + 1$. Gaps break the run. Negative times, and
any isolated positions not belonging to a contiguous run, are treated as
independent singletons.
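For a contiguous run of length $L$ and correlation parameter $\rho$,
the block precision matrix is
$$C = \frac{1}{1-\rho^2}
\begin{pmatrix}
1 & -\rho & & & \\
-\rho & 1+\rho^2 & -\rho & & \\
 & \ddots & \ddots & \ddots & \\
 & & -\rho & 1+\rho^2 & -\rho \\
 & & & -\rho & 1
\end{pmatrix},$$
with a very small ridge added to the diagonal for numerical stability.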
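The run-splitting rule and the tridiagonal AR(1) precision can be sketched in base R. The helpers `contiguous_runs()` and `ar1_precision()` below are hypothetical names used for illustration, the ridge defaults to zero because the backend's actual value is not stated here, and the check against `solve()` relies only on the standard closed form of the inverse AR(1) correlation matrix.

```r
# Split integer times into strictly contiguous runs (t[k + 1] == t[k] + 1).
# Negative times are forced into singleton runs, as described above.
contiguous_runs <- function(time) {
  time <- sort(as.integer(time))
  brk <- c(TRUE,
           diff(time) != 1L | time[-1L] < 0L | time[-length(time)] < 0L)
  split(time, cumsum(brk))
}

# Precision (inverse correlation) matrix of an AR(1) run of length L.
ar1_precision <- function(L, rho, ridge = 0) {
  if (L == 1L) return(matrix(1 + ridge, 1, 1))
  P <- diag(c(1, rep(1 + rho^2, L - 2L), 1))
  idx <- cbind(seq_len(L - 1L), seq_len(L - 1L) + 1L)   # super-diagonal positions
  P[idx] <- -rho
  P[idx[, 2:1, drop = FALSE]] <- -rho                    # sub-diagonal positions
  P / (1 - rho^2) + diag(ridge, L)
}

runs <- contiguous_runs(c(1, 2, 3, 5, 6, 9))
lengths(runs)                                  # run lengths 3, 2, 1

rho <- 0.4
R <- rho^abs(outer(1:4, 1:4, "-"))             # AR(1) correlation matrix
max(abs(ar1_precision(4L, rho) - solve(R)))    # numerically zero
```

Building the precision directly avoids inverting a dense correlation matrix for every block, which matters when runs are long.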
If use_ar1 = TRUE and ar1_rho is supplied, that value is used after
validation and clipping to the admissible numerical range handled by the
backend.
If use_ar1 = TRUE and ar1_rho = NA, the backend estimates rho
separately for each analysed pair by:
fitting the corresponding iid model;
computing a moments-based lag-1 estimate from detrended residuals within contiguous blocks, used only as a seed; and
refining that seed by a short profile search over rho using the
profiled REML log-likelihood.
In the exported ba_rm() wrapper, if an AR(1) fit for a given analysed pair
fails specifically because the backend EM/GLS routine did not converge to
admissible finite variance-component estimates, the wrapper retries that pair
with iid residuals. If the iid refit succeeds, the final reported residual
model for that pair is "iid" and a warning is issued. Other AR(1) failures
are not simplified and are propagated as errors.
When include_slope = TRUE, the paired mean regressor is centred and scaled
internally before fitting. Let $\bar m$ be the mean of the observed paired
means. The backend chooses a scaling denominator from:
the sample SD;
the IQR-based scale $\mathrm{IQR}/1.349$;
the MAD-based scale $1.4826 \times \mathrm{MAD}$.
It uses the first of these that is not judged near-zero relative to the
largest finite positive candidate scale, under a threshold proportional to
. If all candidate scales are treated as
near-zero, the fit stops with an error because the proportional-bias slope is
not estimable on the observed paired-mean scale.
The returned beta_slope is back-transformed to the original paired-mean
scale. The returned BA centre is the fitted mean difference at the centred
reference paired mean $\bar m$ (the mean of the observed paired means), not the original-scale intercept
coefficient.
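The centring, scaling, and back-transformation can be sketched with ordinary least squares standing in for the mixed model. Everything named here is an assumption for illustration: `choose_scale()` is a hypothetical helper, the relative tolerance `sqrt(.Machine$double.eps)` is a placeholder rather than the backend's threshold, and the constants 1.349 and 1.4826 are the usual normal-consistency factors for the IQR and MAD.

```r
# Pick the first candidate scale that is not near-zero relative to the largest.
choose_scale <- function(m, rel_tol = sqrt(.Machine$double.eps)) {
  cand <- c(
    sd  = stats::sd(m),
    iqr = stats::IQR(m) / 1.349,
    mad = stats::mad(m)            # stats::mad() already applies the 1.4826 constant
  )
  ok <- is.finite(cand) & cand > 0
  if (!any(ok)) stop("paired-mean scale is degenerate; slope is not estimable")
  keep <- ok & cand > rel_tol * max(cand[ok])
  cand[which(keep)[1L]]
}

set.seed(42)
m <- rnorm(50, 100, 10)                                  # paired means
d <- 1 + 0.05 * (m - mean(m)) + rnorm(50, 0, 2)          # paired differences

s <- choose_scale(m)
x <- (m - mean(m)) / s                                   # centred and scaled regressor
fit_scaled <- stats::lm(d ~ x)                           # OLS stand-in for the mixed model
fit_raw    <- stats::lm(d ~ m)

beta_slope <- unname(coef(fit_scaled)["x"] / s)          # back to the original scale
centre     <- unname(coef(fit_scaled)["(Intercept)"])    # fitted difference at mean(m)
c(beta_slope = beta_slope, centre = centre)
```

Dividing the scaled slope by `s` recovers the slope per unit of paired mean, while the intercept of the centred fit is the fitted difference at the mean paired mean rather than at zero.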
The backend uses a stabilised EM/GLS scheme.
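Conditional on current variance components, the fixed effects are updated by GLS using the marginal precision of the paired differences after integrating out the random subject intercept. The resulting fixed-effect covariance used in the confidence-interval calculations is the GLS covariance
$$\widehat{\mathrm{Var}}(\hat\beta) = \Big(\sum_i X_i^\top V_i^{-1} X_i\Big)^{-1},$$
where $X_i$ and $V_i$ denote the fixed-effect design matrix and the marginal covariance of the paired differences for subject $i$.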
Given updated fixed effects, the variance components are refreshed by EM using the conditional moments of the subject random intercept and the residual quadratic forms. Variance updates are ratio-damped and clipped to admissible ranges for numerical stability.
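The alternation between the two steps can be sketched for the simplest case: an intercept-only model with iid residuals. This is a plain maximum-likelihood EM loop written for illustration; it omits the damping, clipping, REML profiling, slope term, and AR(1) structure described in this section, and `em_gls_sketch()` is a hypothetical name.

```r
# EM/GLS sketch for d_ij = beta0 + u_i + e_ij with iid residuals.
em_gls_sketch <- function(d, subject, max_iter = 200L, tol = 1e-6) {
  groups <- split(d, subject)
  n_i    <- lengths(groups)
  ybar   <- vapply(groups, mean, numeric(1))
  s2u <- max(stats::var(ybar), 1e-8)     # crude positive starting values
  s2e <- max(stats::var(d), 1e-8)
  for (iter in seq_len(max_iter)) {
    # GLS step: the marginal variance of a subject mean is s2u + s2e / n_i
    w     <- 1 / (s2u + s2e / n_i)
    beta0 <- sum(w * ybar) / sum(w)
    # E step: conditional variance and mean of each random intercept
    v     <- 1 / (1 / s2u + n_i / s2e)
    u_hat <- v * n_i * (ybar - beta0) / s2e
    # M step: refresh the variance components from the conditional moments
    rss     <- mapply(function(y, u) sum((y - beta0 - u)^2), groups, u_hat)
    s2u_new <- mean(u_hat^2 + v)
    s2e_new <- sum(rss + n_i * v) / length(d)
    delta <- max(abs(c(s2u_new - s2u, s2e_new - s2e)))
    s2u <- s2u_new
    s2e <- s2e_new
    if (delta < tol) break
  }
  # Final GLS quantities at the last variance estimates
  w     <- 1 / (s2u + s2e / n_i)
  beta0 <- sum(w * ybar) / sum(w)
  list(beta0 = beta0, var_beta0 = 1 / sum(w),
       sigma2_subject = s2u, sigma2_resid = s2e, iterations = iter)
}

set.seed(10)
subject <- rep(seq_len(30), each = 10)
d <- 1 + rnorm(30, 0, 2)[subject] + rnorm(300, 0, 1)
fit <- em_gls_sketch(d, subject)
str(fit)
```

With a balanced design every subject mean gets the same GLS weight, so the fitted intercept coincides with the grand mean; unbalanced data are where the weighting starts to matter.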
The reported BA centre is always model-based.
When include_slope = FALSE, it is the fitted intercept of the paired-
difference mixed model.
When include_slope = TRUE, it is the fitted mean difference at the centred
reference paired mean used internally by the backend.
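The reported limits of agreement are
$$\mu_0 \pm \text{loa\_multiplier} \times \sqrt{\sigma_u^2 + \sigma_e^2},$$
where $\mu_0$ is the reported model-based BA centre, and $\sigma_u^2$ and
$\sigma_e^2$ are the subject-level and residual variances. These LoA are for a
single new paired difference from a random subject under the fitted model.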
Under the implemented parameterisation, AR(1) correlation affects the
off-diagonal within-subject covariance structure and therefore the estimation
of the model parameters and their uncertainty, but not the marginal variance
of a single paired difference. Consequently rho does not appear explicitly
in the LoA point-estimate formula.
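A short numerical sketch makes both points concrete: the LoA depend only on the centre and the sum of the two variance components, and the AR(1) parameter leaves the diagonal of the within-subject covariance unchanged. The function name and the numbers below are illustrative assumptions, not package output.

```r
# LoA point estimates from a centre and two variance components.
loa_from_components <- function(centre, sigma2_subject, sigma2_resid,
                                loa_multiplier = 1.96) {
  sd_loa <- sqrt(sigma2_subject + sigma2_resid)
  c(lower  = centre - loa_multiplier * sd_loa,
    centre = centre,
    upper  = centre + loa_multiplier * sd_loa,
    sd_loa = sd_loa)
}

loa <- loa_from_components(centre = 1.5, sigma2_subject = 4, sigma2_resid = 5)
loa

# Within-subject covariance: random intercept plus AR(1) residuals.
within_cov <- function(rho, L = 4, s2u = 4, s2e = 5) {
  s2u * matrix(1, L, L) + s2e * rho^abs(outer(seq_len(L), seq_len(L), "-"))
}
diag(within_cov(0))     # marginal variances with iid residuals
diag(within_cov(0.6))   # same marginal variances under AR(1)
```

Only the off-diagonal entries change with `rho`, which is why the correlation influences the estimates and their uncertainty but not the LoA formula itself.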
The backend returns Wald confidence intervals for the reported BA centre and for both LoA endpoints.
These intervals combine:
the conditional GLS uncertainty in the fixed effects at the fitted covariance parameters; and
a delta-method propagation of covariance-parameter uncertainty from the observed information matrix of the profiled REML log-likelihood.
The covariance-parameter vector is profiled on transformed scales:
log-variances for the subject-level variance $\sigma_u^2$ and the residual variance $\sigma_e^2$, and, when rho is
estimated internally under AR(1), a transformed correlation parameter
mapped back by .
Numerical central finite differences are used to approximate both the
observed Hessian of the profiled REML log-likelihood and the gradients of the
reported derived quantities. The resulting variances are combined and the
final intervals are formed with the normal quantile corresponding to
conf_level.
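The delta-method step can be sketched for the upper LoA. The covariance matrix of the log-variance parameters and the variance of the centre below are made-up numbers standing in for the quantities the backend derives from the REML Hessian and the GLS covariance; `num_grad()` and `upper_loa()` are hypothetical helpers.

```r
# Central-difference gradient of a scalar function g at theta.
num_grad <- function(g, theta, h = 1e-5) {
  vapply(seq_along(theta), function(j) {
    e <- replace(numeric(length(theta)), j, h)
    (g(theta + e) - g(theta - e)) / (2 * h)
  }, numeric(1))
}

# Upper LoA as a function of the log-variance parameters (centre held fixed).
upper_loa <- function(theta, centre = 1.5, k = 1.96) {
  centre + k * sqrt(exp(theta[1]) + exp(theta[2]))
}

theta_hat  <- log(c(4, 5))                                # log(sigma2_u), log(sigma2_e)
cov_theta  <- matrix(c(0.09, 0.01, 0.01, 0.04), 2, 2)     # hypothetical covariance
var_centre <- 0.25                                        # hypothetical GLS variance

g <- num_grad(upper_loa, theta_hat)
var_total <- var_centre + drop(t(g) %*% cov_theta %*% g)  # combine both sources
z  <- stats::qnorm(1 - (1 - 0.95) / 2)                    # normal quantile for conf_level
ci <- upper_loa(theta_hat) + c(-1, 1) * z * sqrt(var_total)
ci
```

Working on the log-variance scale keeps the finite-difference perturbations inside the admissible parameter space, so no boundary handling is needed when the variances are small.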
With exactly two methods, at least two complete subject-time pairs are required; otherwise the function errors.
With $\ge 3$ methods, the function analyses every unordered pair of method
levels. For a given pair with fewer than two complete subject-time matches,
that contrast is skipped and the corresponding matrix entries remain NA.
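For a fitted contrast between methods in matrix positions $(j, k)$ with
$j < k$, the stored orientation is:
$$\text{bias}[j, k] \approx \text{method}_k - \text{method}_j, \qquad \text{bias}[k, j] = -\,\text{bias}[j, k].$$
Hence the transposed entry changes sign, while sd_loa and width are
symmetric.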
Separate estimation of the residual and subject-level variance components requires sufficient complete within-subject replication after pairing. If the paired data are not adequate to separate these components, the fit stops with an identifiability error.
If the model is conceptually estimable but no finite positive pooled
within-subject variance can be formed during initialisation, the backend uses
only as a temporary positive starting value
for the EM routine and records a warning string in the backend output. The
exported wrapper does not otherwise modify the final estimates.
If the EM/GLS routine fails to reach admissible finite variance-component estimates, the backend throws an explicit convergence error rather than returning fallback estimates.
Either a "ba_repeated" object (exactly two methods) or a
"ba_repeated_matrix" object ($\ge 3$ methods).
If exactly two methods are supplied, the returned
"ba_repeated" object is a list with components:
means: numeric vector of paired means
used for plotting helpers.
diffs: numeric vector of paired differences
used for plotting helpers.
n_obs: integer number of complete subject-time pairs used.
based.on: compatibility alias for n_obs.
mean.diffs: scalar model-based BA centre. When
include_slope = FALSE, this is the fitted intercept of the paired-
difference model. When include_slope = TRUE, this is the fitted mean
difference at the centred reference paired mean used internally by the
backend.
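lower.limit, upper.limit: scalar limits of agreement,
computed as
$\mu_0 \mp \text{loa\_multiplier} \times \sqrt{\sigma_u^2 + \sigma_e^2}$, where $\mu_0$ is mean.diffs and $\sigma_u^2$, $\sigma_e^2$ are sigma2_subject and sigma2_resid.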
lines: named numeric vector with entries lower, mean, and
upper.
CI.lines: named numeric vector containing Wald confidence
interval bounds for the bias and both LoA endpoints:
mean.diff.ci.lower, mean.diff.ci.upper,
lower.limit.ci.lower, lower.limit.ci.upper,
upper.limit.ci.lower, upper.limit.ci.upper.
loa_multiplier: scalar LoA multiplier actually used.
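critical.diff: scalar LoA half-width
$\text{loa\_multiplier} \times \sqrt{\sigma_u^2 + \sigma_e^2}$, with $\sigma_u^2$ and $\sigma_e^2$ given by sigma2_subject and sigma2_resid.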
include_slope: logical, copied from the call.
beta_slope: proportional-bias slope on the original paired-mean
scale when include_slope = TRUE; otherwise NA.
sigma2_subject: estimated variance of the subject-level random
intercept on paired differences.
sigma2_resid: estimated residual variance on paired differences.
use_ar1: logical, copied from the call.
residual_model: either "ar1" or "iid", indicating the final
residual structure actually used.
ar1_rho: AR(1) correlation actually used in the final fit when
residual_model == "ar1"; otherwise NA.
ar1_estimated: logical indicating whether ar1_rho was
estimated internally (TRUE) or supplied by the user (FALSE) when the
final residual model is AR(1); otherwise NA.
The confidence level is stored as attr(x, "conf.level").
If $\ge 3$ methods are supplied, the returned
"ba_repeated_matrix" object is a list with components:
bias: numeric matrix of model-based BA centres.
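For indices $(j, k)$ with $j < k$, bias[j, k] estimates
$\text{method}_k - \text{method}_j$. Thus the matrix orientation is
column minus row, not row minus column. The diagonal is NA.
sd_loa: numeric matrix of LoA SDs,
$\sqrt{\sigma_u^2 + \sigma_e^2}$ (from sigma2_subject and sigma2_resid). This matrix is symmetric.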
loa_lower, loa_upper: numeric matrices
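of LoA endpoints corresponding to bias. These satisfy
$\text{loa\_lower} = \text{bias} - \text{loa\_multiplier} \times \text{sd\_loa}$ and
$\text{loa\_upper} = \text{bias} + \text{loa\_multiplier} \times \text{sd\_loa}$.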
width: numeric matrix of LoA widths,
loa_upper - loa_lower. This matrix is symmetric.
n: integer matrix giving the number of complete
subject-time pairs used for each analysed contrast. Pairs with fewer than
two complete matches are left as NA in the estimate matrices.
mean_ci_low, mean_ci_high: numeric
matrices of Wald confidence interval bounds for bias.
loa_lower_ci_low, loa_lower_ci_high: numeric
matrices of Wald confidence interval bounds for the lower
LoA.
loa_upper_ci_low, loa_upper_ci_high: numeric
matrices of Wald confidence interval bounds for the upper
LoA.
slope: optional numeric matrix of
proportional-bias slopes on the original paired-mean scale when
include_slope = TRUE; otherwise NULL. This matrix is antisymmetric in
sign because each fitted contrast is reversed across the transpose.
methods: character vector of method levels defining matrix row
and column order.
loa_multiplier: scalar LoA multiplier actually used.
conf_level: scalar confidence level used for the reported Wald
intervals.
use_ar1: logical, copied from the call.
ar1_rho: scalar equal to the user-supplied common ar1_rho
when use_ar1 = TRUE and a value was supplied; otherwise NA. This field
does not store the per-pair estimated AR(1) parameters.
residual_model: character matrix whose entries
are "ar1", "iid", or NA, indicating the final residual structure used
for each pair.
sigma2_subject: numeric matrix of estimated
subject-level random-intercept variances.
sigma2_resid: numeric matrix of estimated
residual variances.
ar1_rho_pair: optional numeric matrix giving
the AR(1) correlation actually used for each pair when the final residual
model is "ar1"; otherwise NA for that entry. Present only when
use_ar1 = TRUE.
ar1_estimated: optional logical matrix
indicating whether the pair-specific ar1_rho_pair was estimated internally
(TRUE) or supplied by the user (FALSE) for entries whose final residual
model is "ar1"; otherwise NA. Present only when use_ar1 = TRUE.
data_long: canonical long-form data frame with columns
.response, .subject, .method, and .time
retained for pairwise plotting helpers.
mapping: named list mapping response, subject,
method, and time to the canonical stored columns in
data_long.
Thiago de Paula Oliveira
# -------- Simulate repeated-measures data --------
set.seed(1)

# design (no AR)
# subjects
S <- 30L
# replicates per subject
Tm <- 15L
subj <- rep(seq_len(S), each = Tm)
time <- rep(seq_len(Tm), times = S)

# subject signal centered at 0 so BA "bias" won't be driven by the mean level
mu_s <- rnorm(S, mean = 0, sd = 8)
# constant within subject across replicates
true <- mu_s[subj]

# common noise (no AR, i.i.d.)
sd_e <- 2
e0 <- rnorm(length(true), 0, sd_e)

# --- Methods ---
# M1: signal + noise
y1 <- true + e0
# M2: same precision as M1; here nearly identical so M3 can be
# almost perfectly the inverse of both M1 and M2
y2 <- y1 + rnorm(length(true), 0, 0.01)
# M3: perfect inverse of M1 (and almost of M2)
y3 <- -y1 # = -(true + e0)
# M4: unrelated to all others (pure noise, different scale)
y4 <- rnorm(length(true), 3, 6)

data <- rbind(
  data.frame(y = y1, subject = subj, method = "M1", time = time),
  data.frame(y = y2, subject = subj, method = "M2", time = time),
  data.frame(y = y3, subject = subj, method = "M3", time = time),
  data.frame(y = y4, subject = subj, method = "M4", time = time)
)
data$method <- factor(data$method, levels = c("M1", "M2", "M3", "M4"))

# quick sanity checks
with(data, {
  Y <- split(y, method)
  round(cor(cbind(M1 = Y$M1, M2 = Y$M2, M3 = Y$M3, M4 = Y$M4)), 3)
})

# Run BA (no AR)
ba4 <- ba_rm(
  data = data,
  response = "y", subject = "subject", method = "method", time = "time",
  loa_multiplier = 1.96, conf_level = 0.95,
  include_slope = FALSE, use_ar1 = FALSE
)
summary(ba4)
estimate(ba4)
tidy(ba4)
confint(ba4)
plot(ba4)

# -------- Simulate repeated-measures with AR(1) data --------
set.seed(123)
S <- 40L                      # subjects
Tm <- 50L                     # replicates per subject
methods <- c("A", "B", "C")   # N = 3 methods
rho <- 0.4                    # AR(1) within-subject across time

ar1_sim <- function(n, rho, sd = 1) {
  z <- rnorm(n)
  e <- numeric(n)
  e[1] <- z[1] * sd
  if (n > 1) for (t in 2:n) e[t] <- rho * e[t - 1] + sqrt(1 - rho^2) * z[t] * sd
  e
}

# Subject baseline + time trend (latent "true" signal)
subj <- rep(seq_len(S), each = Tm)
time <- rep(seq_len(Tm), times = S)
# subject effects
mu_s <- rnorm(S, 50, 7)
trend <- rep(seq_len(Tm) - mean(seq_len(Tm)), times = S) * 0.8
true <- mu_s[subj] + trend

# Method-specific biases (B has +1.5 constant; C has slight proportional bias)
bias <- c(A = 0, B = 1.5, C = -0.5)
# proportional component on "true"
prop <- c(A = 0.00, B = 0.00, C = 0.10)

# Build long data: for each method, add AR(1) noise within subject over time
make_method <- function(meth, sd = 3) {
  e <- unlist(lapply(split(seq_along(time), subj),
                     function(ix) ar1_sim(length(ix), rho, sd)))
  y <- true * (1 + prop[meth]) + bias[meth] + e
  data.frame(y = y, subject = subj, method = meth, time = time,
             check.names = FALSE)
}

data <- do.call(rbind, lapply(methods, make_method))
data$method <- factor(data$method, levels = methods)

# -------- Repeated BA (pairwise matrix) ---------------------
baN <- ba_rm(
  response = data$y, subject = data$subject,
  method = data$method, time = data$time,
  loa_multiplier = 1.96, conf_level = 0.95,
  include_slope = FALSE,   # set to TRUE to estimate proportional bias per pair
  use_ar1 = TRUE, ar1_rho = rho
)

# Pairwise matrices (bias[j, k] with j < k is column minus row; see Value)
print(baN)
summary(baN)
tidy(baN)

# Faceted BA scatter by pair
plot(baN, smoother = "lm", facet_scales = "free_y")

# -------- Two-method AR(1) path (A vs B only) ------------------------------
data_AB <- subset(data, method %in% c("A", "B"))
baAB <- ba_rm(
  response = data_AB$y, subject = data_AB$subject,
  method = droplevels(data_AB$method), time = data_AB$time,
  include_slope = FALSE, use_ar1 = TRUE, ar1_rho = 0.4
)
print(baAB)
plot(baAB)
Computes pairwise biweight mid-correlations for numeric data. Bicor is a robust, Pearson-like correlation that down-weights outliers and heavy-tailed observations. Optional large-sample confidence intervals are available as a derived feature.
bicor(
  data,
  na_method = c("error", "pairwise", "complete"),
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  c_const = 9,
  max_p_outliers = 1,
  pearson_fallback = c("hybrid", "none", "all"),
  mad_consistent = FALSE,
  w = NULL,
  sparse_threshold = NULL
)

diag.bicor(x, ...)

## S3 method for class 'bicor'
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  ci_digits = 3, show_ci = NULL, na_print = "NA", ...)

## S3 method for class 'bicor'
plot(x, title = "Biweight mid-correlation heatmap",
  reorder = c("none", "hclust"), triangle = c("full", "lower", "upper"),
  low_color = "indianred1", mid_color = "white", high_color = "steelblue1",
  value_text_size = 3, ci_text_size = 2.5, show_value = TRUE,
  na_fill = "grey90", ...)

## S3 method for class 'bicor'
summary(object, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  ci_digits = 3, p_digits = 4, show_ci = NULL, ...)

## S3 method for class 'summary.bicor'
print(x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)
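| Argument | Description |
|---|---|
| data | A numeric matrix or a data frame containing numeric columns. Factors, logicals and common time classes are dropped in the data-frame path. Missing values are not allowed unless na_method permits them. |
| na_method | Character scalar controlling missing-data handling. One of "error" (default), "pairwise", or "complete". |
| ci | Logical (default FALSE); if TRUE, approximate Fisher-z confidence intervals are computed. |
| conf_level | Confidence level used when ci = TRUE; default 0.95. |
| n_threads | Integer; number of OpenMP threads. Defaults to getOption("matrixCorr.threads", 1L). |
| output | Output representation for the computed estimates. One of "matrix" (default), "sparse", or "edge_list". |
| threshold | Non-negative absolute-value filter for non-matrix outputs: keep entries with abs(estimate) >= threshold. Default 0. |
| diag | Logical; whether to include diagonal entries in "sparse" and "edge_list" outputs. Default TRUE. |
| c_const | Positive numeric. Tukey biweight tuning constant applied to the raw MAD; default 9. |
| max_p_outliers | Numeric; cap on the proportion of observations on each side that may be treated as outliers. The default 1 applies no cap. |
| pearson_fallback | Character scalar indicating the fallback policy. One of "hybrid" (default), "none", or "all". |
| mad_consistent | Logical; if TRUE, the raw MAD is multiplied by 1.4826. Default FALSE. |
| w | Optional non-negative numeric vector of row weights, with length equal to the number of rows of data. Default NULL. |
| sparse_threshold | Optional numeric; if supplied, entries with absolute value below it are set to 0 and a sparse matrix is returned. Default NULL. |
| x | An object of class bicor or summary.bicor. |
| ... | Additional arguments passed on to the underlying methods. |
| digits | Integer; number of decimal places used for the matrix. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns. |
| width | Optional display width. |
| ci_digits | Integer; digits for bicor confidence limits in the pairwise summary. |
| show_ci | Optional setting controlling whether confidence intervals are displayed; default NULL. |
| na_print | Character; how to display missing values. |
| title | Plot title. Default is "Biweight mid-correlation heatmap". |
| reorder | Character; one of "none" (default) or "hclust". |
| triangle | One of "full" (default), "lower", or "upper". |
| low_color, mid_color, high_color | Colours for the heatmap fill gradient. Defaults are "indianred1", "white", and "steelblue1". |
| value_text_size | Numeric; font size for cell labels. Default 3. |
| ci_text_size | Text size for confidence-interval labels in the heatmap. |
| show_value | Logical; if TRUE (default), estimates are printed in the heatmap cells. |
| na_fill | Fill colour for NA cells. Default "grey90". |
| object | An object of class bicor. |
| p_digits | Integer; digits for bicor p-values in the pairwise summary. |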
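For a column $x = (x_a)_{a=1}^{m}$, let $\mathrm{med}(x)$ be the median and
$\mathrm{MAD}(x) = \mathrm{med}(|x - \mathrm{med}(x)|)$ the (raw) median
absolute deviation. If mad_consistent = TRUE, the consistent scale
$\mathrm{MAD}^{*}(x) = 1.4826\,\mathrm{MAD}(x)$ is used. With tuning constant
$c > 0$, define
$$u_a = \frac{x_a - \mathrm{med}(x)}{c\,\mathrm{MAD}(x)}.$$
The Tukey biweight gives per-observation weights
$$w_a = (1 - u_a^2)^2\,\mathbf{1}\{|u_a| < 1\}.$$
Robust standardisation of a column is
$$\tilde x_a = \frac{(x_a - \mathrm{med}(x))\,w_a}{\sqrt{\sum_{b=1}^{m} \left[(x_b - \mathrm{med}(x))\,w_b\right]^2}}.$$
For two columns $x, y$, the biweight mid-correlation is
$$\mathrm{bicor}(x, y) = \sum_{a=1}^{m} \tilde x_a\,\tilde y_a \in [-1, 1].$$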
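The definition above can be sketched in a few lines of base R. This is a minimal illustration, not the package's C++ backend: `bicor_sketch()` is a hypothetical helper, it handles only two complete numeric vectors, and it assumes the raw-MAD convention with no outlier cap, no fallback, and no row weights.

```r
# Minimal base-R sketch of the biweight mid-correlation for two vectors.
# bicor_sketch() is a hypothetical helper, not a matrixCorr function.
bicor_sketch <- function(x, y, c_const = 9) {
  robust_std <- function(v) {
    med <- stats::median(v)
    mad_raw <- stats::median(abs(v - med))   # raw MAD, no 1.4826 factor
    u <- (v - med) / (c_const * mad_raw)
    w <- (1 - u^2)^2 * (abs(u) < 1)          # Tukey biweight; 0 when |u| >= 1
    num <- (v - med) * w
    num / sqrt(sum(num^2))                   # unit-norm robust standardisation
  }
  sum(robust_std(x) * robust_std(y))
}

set.seed(42)
x <- rnorm(200)
y <- 0.6 * x + rnorm(200, sd = 0.8)
y_out <- y
y_out[1:3] <- 50                             # three gross outliers

# Compare Pearson and the sketch on the contaminated data
c(pearson = cor(x, y_out), bicor = bicor_sketch(x, y_out))
```

Observations with $|u_a| \ge 1$ receive zero weight, so the three contaminated values drop out of the sketch entirely, while they still enter the Pearson estimate.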
Capping the maximum proportion of outliers (max_p_outliers).
If max_p_outliers < 1, let $q_L = Q_x(\text{max\_p\_outliers})$ and
$q_U = Q_x(1 - \text{max\_p\_outliers})$ be the lower/upper quantiles of $x$.
If the corresponding $|u|$ at either quantile exceeds 1, $u$ is rescaled
separately on the negative and positive sides so that those quantiles land at
$|u| = 1$. This guarantees that all observations between the two quantiles receive
positive weight. Note the bound applies per side, so up to
$2 \times \text{max\_p\_outliers}$ of observations can be treated as outliers overall.
Fallback for zero MAD / degeneracy (pearson_fallback).
If a column has $\mathrm{MAD}(x) = 0$ or the robust denominator becomes zero,
the following rules apply:
"none": correlations involving that column are NA (diagonal
remains 1).
"hybrid": only the affected column switches to Pearson standardisation
$$\breve x_a = \frac{x_a - \operatorname{mean}(x)}{\sqrt{\sum_{b} (x_b - \operatorname{mean}(x))^2}},$$
yielding the hybrid correlation
$$\mathrm{bicor}_{\mathrm{hyb}}(x, y) = \sum_{a} \breve x_a\,\tilde y_a,$$
with the other column still robust-standardised.
"all": all columns use ordinary Pearson standardisation; the result
equals stats::cor(..., method="pearson") when the NA policy matches.
Handling missing values (na_method).
"error" (default): inputs must be finite; this yields a symmetric,
positive semidefinite (PSD) matrix since $R = \tilde X^\top \tilde X$.
"pairwise": each $R_{jk}$ is computed on the intersection of
rows where both columns are finite. Pairs with fewer than 5 overlapping
rows return NA (guarding against instability). Pairwise deletion can
break PSD, as in the Pearson case.
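The loss of positive semidefiniteness under pairwise deletion can be seen with a small constructed example. The sketch below uses base R's `stats::cor()` with Pearson correlation rather than `bicor()`, and the toy data are chosen by hand so that each pair of columns overlaps on a different block of rows.

```r
# Each pair of columns overlaps on a different block of four rows.
x <- c(1, 2, 3, 4,   1, 2, 3, 4,   NA, NA, NA, NA)
y <- c(1, 2, 3, 4,   NA, NA, NA, NA,   1, 2, 3, 4)
z <- c(NA, NA, NA, NA,   1, 2, 3, 4,   4, 3, 2, 1)

# Pairwise deletion: cor(x, y) = 1, cor(x, z) = 1, but cor(y, z) = -1
R_pair <- cor(cbind(x, y, z), use = "pairwise.complete.obs")
R_pair

# A valid correlation matrix has no negative eigenvalues; this one does
eigen(R_pair, symmetric = TRUE)$values
```

The three pairwise estimates are mutually incompatible, so no single joint distribution could produce them and the matrix has a negative eigenvalue.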
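Row weights (w).
When w is supplied (non-negative, length $m$), the weighted median
$\mathrm{med}_w(x)$ and weighted MAD
$\mathrm{MAD}_w(x) = \mathrm{med}_w(|x - \mathrm{med}_w(x)|)$ are used to form
$u$. The Tukey weights are then multiplied by the observation weights prior
to normalisation:
$$\tilde x_a = \frac{(x_a - \mathrm{med}_w(x))\,w_a\,v_a}{\sqrt{\sum_{b} \left[(x_b - \mathrm{med}_w(x))\,w_b\,v_b\right]^2}},$$
where $v_a \ge 0$ are the user-supplied row weights and $w_a$
are the Tukey biweights built from the weighted median/MAD. Weighted pairwise
behaves analogously on each column pair's overlap.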
MAD choice (mad_consistent).
Setting mad_consistent = TRUE multiplies the raw MAD by 1.4826 inside
$u$. Equivalently, it uses an effective tuning constant
$c^{*} = 1.4826 \times c$. The default FALSE reproduces the convention
in Langfelder & Horvath (2012).
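The equivalence between a consistent MAD and a larger effective tuning constant can be checked directly in base R. The sketch relies on `stats::mad()`, whose default `constant = 1.4826` gives the consistent scale and whose `constant = 1` gives the raw MAD; the variable names are illustrative only.

```r
set.seed(7)
x <- rexp(50)
c_const <- 9
med <- median(x)

mad_raw  <- mad(x, constant = 1)   # raw MAD
mad_cons <- mad(x)                 # 1.4826 * raw MAD (stats::mad default)

# Consistent MAD with c = 9 versus raw MAD with c* = 9 * 1.4826
u_consistent <- (x - med) / (c_const * mad_cons)
u_effective  <- (x - med) / ((c_const * 1.4826) * mad_raw)
all.equal(u_consistent, u_effective)

# The raw-MAD convention uses a smaller scale, so it can only zero-weight
# at least as many observations as the consistent one
u_raw <- (x - med) / (c_const * mad_raw)
c(zero_weight_raw = sum(abs(u_raw) >= 1),
  zero_weight_consistent = sum(abs(u_consistent) >= 1))
```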
Optional sparsification (sparse_threshold).
If provided, entries with $|r| < \text{sparse\_threshold}$ are set to 0 and the
result is returned as a sparse "dgCMatrix" (diagonal is forced to 1). This is a
post-processing step that does not alter the per-pair estimates.
Computation and threads.
Columns are robust-standardised in parallel and the matrix is formed as
$R = \tilde X^\top \tilde X$. n_threads selects the number of OpenMP
threads; by default it uses getOption("matrixCorr.threads", 1L).
Large-sample inference.
For a pairwise estimate $r$ computed from $n$ observed rows, the
standard large-sample summaries use the $t$-statistic
$$T = r\sqrt{\frac{n - 2}{1 - r^2}}$$
and a Fisher-transformed statistic $Z$ based on $\operatorname{atanh}(r)$.
The reported p-value is the two-sided Student-$t$ tail probability with
$n - 2$ degrees of freedom. When ci = TRUE, the package also
reports an approximate Fisher-z confidence interval obtained from
$$\operatorname{atanh}(r) \pm \frac{z_{1 - \alpha/2}}{\sqrt{n - 3}}$$
followed by back-transformation with tanh(). Confidence intervals are currently available only for dense,
unweighted outputs.
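The large-sample summaries can be sketched for a single estimate as follows. `cor_inference_sketch()` is a hypothetical helper, and the Fisher-z standard error $1/\sqrt{n-3}$ is the textbook choice assumed here; the package's internal scaling may differ.

```r
# Hypothetical helper: t-statistic, two-sided p-value and Fisher-z interval
# for a correlation-type estimate r based on n complete rows.
cor_inference_sketch <- function(r, n, conf_level = 0.95) {
  t_stat  <- r * sqrt((n - 2) / (1 - r^2))
  p_value <- 2 * stats::pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
  z_crit  <- stats::qnorm(1 - (1 - conf_level) / 2)
  half    <- z_crit / sqrt(n - 3)             # assumed Fisher-z standard error
  c(statistic = t_stat,
    p_value   = p_value,
    lwr       = tanh(atanh(r) - half),
    upr       = tanh(atanh(r) + half))
}

cor_inference_sketch(r = 0.42, n = 60)
```

For a Pearson correlation these are the same quantities that `stats::cor.test()` reports, which makes the helper easy to check against base R.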
Basic properties.
$\mathrm{bicor}(a x + b,\; c y + d) = \operatorname{sign}(ac)\,\mathrm{bicor}(x, y)$ for $a, c \neq 0$.
With no missing data (and with per-column hybrid/robust standardisation), the
output is symmetric and PSD. As with Pearson, affine equivariance does not hold
for the associated biweight midcovariance.
A symmetric correlation matrix with class bicor
(or a dgCMatrix if sparse_threshold is used), with attributes:
method = "biweight_mid_correlation", description,
and package = "matrixCorr". Downstream code should be prepared to
handle either a dense numeric matrix or a sparse dgCMatrix. When
ci = TRUE, the object also carries a ci attribute with
elements est, lwr.ci, upr.ci, conf.level, and
ci.method, together with an inference attribute containing
the standard large-sample summary matrices estimate,
statistic, p_value, Z, and n_obs. Pairwise
complete-case counts are stored in attr(x, "diagnostics")$n_complete.
Internally, all medians/MADs, Tukey weights, optional pairwise-NA handling,
and OpenMP loops are implemented in the C++ helpers
(bicor_*_cpp()), so the R wrapper mostly validates arguments and
dispatches to the appropriate backend.
Invisibly returns x.
A ggplot object.
Thiago de Paula Oliveira
Langfelder, P. & Horvath, S. (2012). Fast R Functions for Robust Correlations and Hierarchical Clustering. Journal of Statistical Software, 46(11), 1-17. doi:10.18637/jss.v046.i11
set.seed(1)
X <- matrix(rnorm(2000 * 40), 2000, 40)
R <- bicor(X, c_const = 9, max_p_outliers = 1, pearson_fallback = "hybrid")
print(attr(R, "method"))
summary(R)

R_ci <- bicor(X[, 1:5], ci = TRUE)
summary(R_ci)
estimate(R_ci)
tidy(R_ci)
ci(R_ci)
confint(R_ci)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}
Computes biserial correlations between continuous variables in data
and binary variables in y. Both pairwise vector mode and rectangular
matrix/data-frame mode are supported.
biserial(data, y, na_method = c("error", "pairwise"), ci = FALSE,
  p_value = FALSE, conf_level = 0.95, ...)

## S3 method for class 'biserial_corr'
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)

## S3 method for class 'biserial_corr'
plot(x, title = "Biserial correlation heatmap", low_color = "indianred1",
  high_color = "steelblue1", mid_color = "white", value_text_size = 4,
  ci_text_size = 3, show_value = TRUE, ...)

## S3 method for class 'biserial_corr'
summary(object, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  ci_digits = 3, p_digits = 4, show_ci = NULL, ...)

## S3 method for class 'summary.biserial_corr'
print(x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)
| Argument | Description |
|---|---|
| data | A numeric vector, matrix, or data frame containing continuous variables. |
| y | A binary vector, matrix, or data frame. In data-frame mode, only two-level columns are retained. |
| na_method | Character scalar controlling missing-data handling. One of "error" (default) or "pairwise". |
| ci | Logical (default FALSE); if TRUE, approximate Fisher-z confidence intervals are computed. |
| p_value | Logical (default FALSE); if TRUE, large-sample t-statistics and p-values are computed. |
| conf_level | Confidence level used when ci = TRUE; default 0.95. |
| ... | Additional arguments passed on to the underlying methods. |
| x | An object of class biserial_corr or summary.biserial_corr. |
| digits | Integer; number of decimal places to print. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns. |
| width | Optional display width. |
| show_ci | Optional setting controlling whether confidence intervals are displayed; default NULL. |
| title | Plot title. Default is "Biserial correlation heatmap". |
| low_color | Color for the minimum correlation. |
| high_color | Color for the maximum correlation. |
| mid_color | Color for zero correlation. |
| value_text_size | Font size used in tile labels. |
| ci_text_size | Text size for confidence intervals in the heatmap. |
| show_value | Logical; if TRUE (default), estimates are printed in the heatmap tiles. |
| object | An object of class biserial_corr. |
| ci_digits | Integer; digits for biserial confidence limits in the pairwise summary. |
| p_digits | Integer; digits for biserial p-values in the pairwise summary. |
The biserial correlation is the special two-category case of the polyserial
model. It assumes that a binary variable $Y$ arises by thresholding an
unobserved standard-normal variable $Z$ that is jointly normal with a
continuous variable $X$. Writing $p = P(Y = 1)$ and
$q = 1 - p$, let $z_p = \Phi^{-1}(p)$ and $\phi(z_p)$ be the
standard-normal density evaluated at $z_p$. If $\bar x_1$ and
$\bar x_0$ denote the sample means of $X$ in the two observed groups
and $s_x$ is the sample standard deviation of $X$, the usual
biserial estimator is
$$r_b = \frac{\bar x_1 - \bar x_0}{s_x} \cdot \frac{p\,q}{\phi(z_p)}.$$
This is exactly the estimator implemented in the underlying C++ kernel.
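The closed-form estimator, mean difference over $s_x$ times $pq/\phi(z_p)$, can be sketched in base R for one continuous vector and one binary vector. `biserial_sketch()` is a hypothetical helper; it assumes complete data and uses `stats::sd()` (divisor $n - 1$) for the standard deviation, which may not match the package's kernel exactly.

```r
# Hypothetical helper: closed-form biserial estimate for one pair.
biserial_sketch <- function(x, g) {
  g <- as.logical(g)
  p <- mean(g)                                  # observed P(Y = 1)
  q <- 1 - p
  dens <- stats::dnorm(stats::qnorm(p))         # normal density at the threshold
  (mean(x[g]) - mean(x[!g])) / stats::sd(x) * p * q / dens
}

set.seed(126)
n <- 5000
z <- rnorm(n)                                   # latent normal variable
x <- 0.5 * z + sqrt(1 - 0.5^2) * rnorm(n)       # latent correlation 0.5
g <- z > qnorm(0.65)                            # observed binary version of z

est <- c(biserial = biserial_sketch(x, g),
         point_biserial = cor(x, as.numeric(g)))
est
```

The point-biserial value is attenuated by the dichotomisation, while the biserial estimate rescales the mean difference to target the latent correlation used in the simulation.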
Assumptions. The biserial coefficient is appropriate when the observed binary variable is viewed as a thresholded version of an unobserved continuous latent variable that is jointly normal with the observed continuous variable. The optional p-values and confidence intervals adopt this latent-normal interpretation together with the usual large-sample approximations used for correlation coefficients. These inferential quantities are therefore model-based and should not be interpreted as distribution-free summaries.
Inference. When p_value = TRUE, the package reports the
large-sample $t$-statistic
$$t = r_b\sqrt{\frac{n - 2}{1 - r_b^2}}$$
referenced to a Student $t$-distribution with $n - 2$ degrees of
freedom. When ci = TRUE, the package forms an approximate Fisher
$z$-interval by transforming $r_b$ with
$z = \operatorname{atanh}(r_b)$, using standard error
$1/\sqrt{n - 3}$, and mapping the limits back with
$\tanh(z)$. The CI is therefore an internal large-sample
extension and is only computed when explicitly requested.
In vector mode a single biserial correlation is returned. In
matrix/data-frame mode, every numeric column of data is paired with every
binary column of y, producing a rectangular matrix of
continuous-by-binary biserial correlations.
Unlike the point-biserial correlation, which is just Pearson correlation on a 0/1 coding of the binary variable, the biserial coefficient explicitly assumes an underlying latent normal threshold model and rescales the mean difference accordingly.
Computational complexity. If data has $p_x$ continuous
columns and y has $p_y$ binary columns, the matrix path computes
$p_x\,p_y$ closed-form estimates with negligible extra memory beyond the
output matrix.
If both data and y are vectors, a numeric scalar. Otherwise a
numeric matrix of class biserial_corr with rows corresponding to
the continuous variables in data and columns to the binary variables
in y. Matrix outputs carry attributes method,
description, and package = "matrixCorr". When
p_value = TRUE, the object also carries an inference
attribute with matrices estimate, statistic,
parameter, p_value, and n_obs. When ci = TRUE,
it additionally carries a ci attribute with matrices
lwr.ci and upr.ci, plus attr(x, "conf.level"). Scalar
outputs keep the same point estimate and gain the same metadata only when
inference is requested.
Thiago de Paula Oliveira
Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347.
Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1, 3-32.
set.seed(126)
n <- 1000
Sigma <- matrix(c(
  1.00, 0.35, 0.50, 0.25,
  0.35, 1.00, 0.30, 0.55,
  0.50, 0.30, 1.00, 0.40,
  0.25, 0.55, 0.40, 1.00
), 4, 4, byrow = TRUE)
Z <- mnormt::rmnorm(n = n, mean = rep(0, 4), varcov = Sigma)
X <- data.frame(x1 = Z[, 1], x2 = Z[, 2])
Y <- data.frame(
  g1 = Z[, 3] > stats::qnorm(0.65),
  g2 = Z[, 4] > stats::qnorm(0.55)
)

bs <- biserial(X, Y, ci = TRUE, p_value = TRUE)
print(bs, digits = 3)
summary(bs)
estimate(bs)
tidy(bs)
ci(bs)
confint(bs)
plot(bs)
Computes all pairwise Lin's Concordance Correlation Coefficients (CCC) from the numeric columns of a matrix or data frame. CCC measures both precision (Pearson correlation) and accuracy (closeness to the 45-degree line). This function is backed by a high-performance 'C++' implementation.
Lin's CCC quantifies the concordance between a new test/measurement
and a gold-standard for the same variable. Like a correlation, CCC
ranges from -1 to 1 with perfect agreement at 1, and it cannot exceed the
absolute value of the Pearson correlation between variables. It can be
legitimately computed even with small samples (e.g., 10 observations),
and results are often similar to intraclass correlation coefficients.
CCC provides a single summary of agreement, so it does not show where
disagreement comes from; a Bland-Altman plot (differences vs. means) is recommended
to visualize bias, proportional trends, and heteroscedasticity (see
ba).
ccc(
  data,
  ci = FALSE,
  conf_level = 0.95,
  na_method = c("error", "complete", "pairwise"),
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  verbose = FALSE
)

## S3 method for class 'ccc'
print(x, digits = 4, ci_digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...)

## S3 method for class 'ccc'
summary(object, digits = 4, ci_digits = 2, n = NULL, topn = NULL,
  max_vars = NULL, width = NULL, show_ci = NULL, ...)

## S3 method for class 'summary.ccc'
print(x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)

## S3 method for class 'ccc'
plot(x, title = "Lin's Concordance Correlation Heatmap",
  low_color = "indianred1", high_color = "steelblue1", mid_color = "white",
  value_text_size = 4, ci_text_size = 3, show_value = TRUE, ...)
| Argument | Description |
|---|---|
| data | A numeric matrix or data frame with at least two numeric columns. Non-numeric columns will be ignored. |
| ci | Logical; if TRUE, return lower and upper confidence bounds. Default FALSE. |
| conf_level | Confidence level for the CI; default 0.95. |
| na_method | Character scalar controlling missing-data handling. One of "error" (default), "complete", or "pairwise". |
| n_threads | Integer; number of threads. Defaults to getOption("matrixCorr.threads", 1L). |
| output | Output representation for the computed estimates. One of "matrix" (default), "sparse", or "edge_list". |
| threshold | Non-negative absolute-value filter for non-matrix outputs: keep entries with abs(CCC) >= threshold. Default 0. |
| diag | Logical; whether to include diagonal entries in "sparse" and "edge_list" outputs. Default TRUE. |
| verbose | Logical; if TRUE, prints how many threads are used. |
| x | An object of class ccc or summary.ccc. |
| digits | Integer; decimals for CCC estimates (default 4). |
| ci_digits | Integer; decimals for CI bounds (default 4 in print, 2 in summary). |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns. |
| width | Optional display width. |
| show_ci | Optional setting controlling whether confidence intervals are displayed; default NULL. |
| ... | Passed on to the underlying methods. |
| object | An object of class ccc. |
| title | Title for the plot. |
| low_color | Color for low CCC values. |
| high_color | Color for high CCC values. |
| mid_color | Color for mid CCC values. |
| value_text_size | Text size for CCC values in the heatmap. |
| ci_text_size | Text size for confidence intervals. |
| show_value | Logical; if TRUE (default), CCC values are printed in the heatmap tiles. |
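Lin's CCC is defined as
$$\rho_c = \frac{2\,\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2},$$
where $\mu_x, \mu_y$ are the means, $\sigma_x^2, \sigma_y^2$ the
variances, and $\sigma_{xy}$ the covariance. Equivalently,
$$\rho_c = \rho \times C_b, \qquad C_b = \frac{2\,\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}.$$
Hence $|\rho_c| \le |\rho| \le 1$, $\rho_c = \rho$ iff
$\mu_x = \mu_y$ and $\sigma_x = \sigma_y$, and $\rho_c = 1$ iff, in
addition, $\rho = 1$. CCC is symmetric in $(x, y)$ and penalises both
location and scale differences; unlike Pearson's $r$, it is not invariant
to affine transformations that change means or variances.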
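The decomposition of Lin's CCC into Pearson correlation times a bias-correction factor can be verified numerically. `ccc_sketch()` below is a hypothetical base-R helper for one pair of complete vectors; it assumes moment estimators with divisor $n$, which may differ from the package's finite-sample convention.

```r
# Hypothetical helper: Lin's CCC, Pearson r, and the bias-correction
# factor C_b for one pair of vectors (moment estimators with divisor n).
ccc_sketch <- function(x, y) {
  mx <- mean(x); my <- mean(y)
  sxx <- mean((x - mx)^2)
  syy <- mean((y - my)^2)
  sxy <- mean((x - mx) * (y - my))
  denom <- sxx + syy + (mx - my)^2
  c(ccc = 2 * sxy / denom,
    pearson = sxy / sqrt(sxx * syy),
    bias_correction = 2 * sqrt(sxx * syy) / denom)
}

set.seed(123)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)

out <- ccc_sketch(x, y)            # similar location and scale
out_shift <- ccc_sketch(x, y + 2)  # same Pearson r, shifted mean
rbind(original = out, shifted = out_shift)
```

Adding a constant to one variable leaves the Pearson column unchanged but lowers both the bias-correction factor and the CCC, which is the location penalty described above.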
When ci = TRUE, large-sample confidence intervals for
$\rho_c$ are returned for each pair. The implementation uses Lin's
delta-method standard error and then forms limits on a Fisher-z transformed
CCC scale before mapping back to $[-1, 1]$. For speed, CIs are omitted
when ci = FALSE.
If either variable has zero variance, $\rho_c$ is
undefined and NA is returned for that pair (including the diagonal).
Missing-data handling follows the same contract as pearson_corr
and spearman_rho. With na_method = "error" (default),
missing and non-finite values are rejected. With "complete", rows
incomplete in any retained numeric column are removed once before all pairwise
CCC estimates are computed. With "pairwise", each method pair is
computed on its own complete finite overlap, and n_complete
diagnostics may vary by pair. Pairwise CIs require at least three complete
observations for that pair.
Negative CCC estimates are fully supported. Matrix output preserves the
signed estimate, while thresholded "sparse" and "edge_list"
outputs retain entries according to abs(CCC) >= threshold.
Probability of agreement, available through prob_agree,
answers a
different question from CCC. CCC is a coefficient-based summary of
concordance between paired measurements. prob_agree() follows Stevens and
Anderson-Cook (2017) and estimates the probability that two estimated
quantities or curves differ by no more than a user-specified practical
tolerance.
A symmetric numeric matrix with class "ccc" and attributes:
method: The method used ("Lin's concordance")
description: Description string
If ci = FALSE, returns matrix of class "ccc".
If ci = TRUE, returns a list with elements: est,
lwr.ci, upr.ci.
For summary.ccc, a data frame with columns
item1, item2, estimate, and (optionally)
lwr, upr, plus n_complete when available.
Thiago de Paula Oliveira
Lin L (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45: 255-268.
Lin L (2000). A note on the concordance correlation coefficient. Biometrics 56: 324-325.
Bland J, Altman D (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet 327: 307-310.
print.ccc, plot.ccc,
ba, prob_agree, and cia.
ccc() answers the question "How well do two paired measurements
agree overall, accounting for both correlation and mean/scale bias?".
In contrast, cia() answers "Are two or more methods
interchangeable at the individual level relative to within-method
replicate disagreement?".
For repeated measurements, see ccc_rm_reml,
ccc_rm_ustat, or ba_rm.
# Example with multivariate normal data
Sigma <- matrix(c(1, 0.5, 0.3,
                  0.5, 1, 0.4,
                  0.3, 0.4, 1), nrow = 3)
mu <- c(0, 0, 0)
set.seed(123)
mat_mvn <- MASS::mvrnorm(n = 100, mu = mu, Sigma = Sigma)
result_mvn <- ccc(mat_mvn, ci = TRUE)
print(result_mvn)
summary(result_mvn)
estimate(result_mvn)
tidy(result_mvn)
ci(result_mvn)
confint(result_mvn)
plot(result_mvn)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(result_mvn)
}
Computes pairwise generalized concordance correlation coefficients (CCC) for count outcomes measured repeatedly by two or more methods, observers, or devices. The current implementation fits Poisson-log generalized linear mixed models and reports total, inter-method, and intra-method agreement summaries from the fitted mean and variance components.
ccc_glmm(
  data,
  response,
  subject,
  method,
  replicate = NULL,
  family = "poisson",
  link = "log",
  overdispersion = c("none", "pearson"),
  include_subject_method = FALSE,
  ci = FALSE,
  conf_level = 0.95,
  max_iter = 1000,
  tol = 1e-08,
  n_threads = getOption("matrixCorr.threads", 1L),
  verbose = FALSE
)
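| Argument | Description |
|---|---|
| data | A data frame containing the measurements. |
| response | Character. Name of the non-negative integer-like count column. |
| subject | Character. Name of the subject identifier column. |
| method | Character. Name of the method/observer column. |
| replicate | Optional character. Name of the replicate column; default NULL. |
| family | Character. Currently only "poisson" is supported. |
| link | Character. Currently only "log" is supported. |
| overdispersion | Character. "none" (default) uses the Poisson model; "pearson" replaces the residual term by a Pearson dispersion estimate. |
| include_subject_method | Logical. If TRUE, a subject-by-method random effect is included; default FALSE. |
| ci | Logical. If TRUE, delta-method confidence intervals are computed; default FALSE. |
| conf_level | Confidence level for delta-method confidence intervals. |
| max_iter | Positive integer. Maximum optimiser iterations. |
| tol | Positive numeric convergence tolerance. |
| n_threads | Integer; number of threads. Defaults to getOption("matrixCorr.threads", 1L). |
| verbose | Logical. If TRUE, additional information is printed during fitting; default FALSE. |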
The fitted model for a pair of methods is
$$y_{ijk} \mid u_i, v_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad \log \mu_{ij} = \beta_0 + \beta_j + u_i + v_{ij},$$
where $i$ indexes subjects, $j$ indexes methods, and
$k$ indexes replicate readings. The subject effect $u_i$ captures
between-subject heterogeneity. When include_subject_method = TRUE,
$v_{ij}$ captures subject-specific method departures; otherwise it
is fixed at zero. The fixed method difference $\beta_j$ contributes to disagreement
through sigma2_method.
The model is fitted by marginal maximum likelihood using Gauss-Hermite quadrature. The random-intercept model uses 40 quadrature points. The subject-by-method model uses tensor-product quadrature with fewer points per dimension because it integrates over a three-dimensional subject block. The reported CCC quantities are then computed from the fitted Poisson-log mean and variance components.
The main matrix value, rho_ccc, is the total agreement coefficient.
Use it as the primary overall count-agreement summary when each individual
reading is the unit of inference. It penalizes lack of between-subject
signal, systematic method bias, subject-by-method disagreement, and Poisson
residual variation.
rho_ccc_inter is the inter-method agreement coefficient for averages
over replicated readings. It is useful when decisions are based on the mean
of replicate readings per subject-method cell. Replication reduces
only the residual count variation term; it does not dilute systematic method
disagreement or between-subject variation.
rho_ccc_intra_method1 and rho_ccc_intra_method2 are
method-specific repeatability coefficients. Use them to diagnose whether one
method is internally more repeatable than the other on the count scale. These
are not direct method-comparison coefficients; they describe within-method
reproducibility after accounting for the fitted mean and random effects.
precision isolates the share of non-systematic variation attributable
to subject ranking/heterogeneity, while accuracy is the ratio
rho_ccc / precision. Low accuracy with reasonable precision indicates
that disagreement is driven mainly by method bias or extra method-specific
variation rather than poor subject discrimination.
overdispersion = "none" fixes $\phi = 1$, the Poisson model.
overdispersion = "pearson" replaces the residual term by a Pearson
dispersion estimate. Use the Pearson adjustment as a sensitivity analysis
when counts appear more variable than the Poisson model allows; it should
generally reduce CCC when extra-Poisson variation is present.
When ci = TRUE, large-sample confidence intervals are computed for
rho_ccc, rho_ccc_inter, rho_ccc_intra_method1, and
rho_ccc_intra_method2. The implementation uses a delta-method
standard error based on the fitted GLMM parameter vector $\hat\theta$ and
the inverse Hessian $H^{-1}$ of the marginal negative log-likelihood:
$$\widehat{\mathrm{Var}}\{g(\hat\theta)\} \approx \nabla g(\hat\theta)^\top H^{-1} \nabla g(\hat\theta),$$
where $g$ is the relevant CCC function. Gradients are evaluated
numerically by central finite differences. The reported point estimates are
not replaced by $g(\hat\theta)$; they remain the values from the standard
point-estimate path.
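The delta-method recipe above can be sketched in a few lines of base R. The function names num_grad and delta_se, the toy function g_toy, and the Hessian values are all hypothetical; the sketch only shows how a central-difference gradient combines with an inverse Hessian, not how the package evaluates its own CCC functions.

```r
# Central finite-difference gradient of a scalar function g at theta.
num_grad <- function(g, theta, h = 1e-5) {
  vapply(seq_along(theta), function(k) {
    step <- replace(numeric(length(theta)), k, h)
    (g(theta + step) - g(theta - step)) / (2 * h)
  }, numeric(1))
}

# Delta-method standard error: sqrt(grad' V grad) with V = inverse Hessian.
delta_se <- function(g, theta_hat, hessian) {
  grad <- num_grad(g, theta_hat)
  V    <- solve(hessian)
  sqrt(drop(t(grad) %*% V %*% grad))
}

# Toy example (made-up numbers): a ratio of two variance-like parameters.
g_toy     <- function(theta) theta[1] / (theta[1] + theta[2])
theta_hat <- c(0.5, 0.25)
H_toy     <- diag(c(400, 900))  # hypothetical Hessian of the negative log-likelihood

se_toy <- delta_se(g_toy, theta_hat, H_toy)
se_toy
```

The point estimate g_toy(theta_hat) is left untouched; only the standard error comes from the gradient and the Hessian.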
Intervals are formed on a Fisher-Z transformed CCC scale,
$z = \operatorname{atanh}(\rho) = \tfrac{1}{2} \log\{(1 + \rho) / (1 - \rho)\}$,
and then back-transformed to the
CCC scale. This is the same broad strategy used elsewhere for CCC-style
intervals because it is usually more stable than a raw Wald interval near
the boundaries. CI limits are not forcibly truncated to $[0, 1]$.
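A minimal sketch of a Fisher-Z interval, assuming the standard error on the CCC scale is carried to the z scale by the usual delta-method factor 1 / (1 - est^2). The helper fisher_z_ci and the input values are hypothetical and are not taken from the package.

```r
# Hypothetical helper: Fisher-Z interval from an estimate and its raw-scale SE.
fisher_z_ci <- function(est, se, conf_level = 0.95) {
  z    <- atanh(est)           # 0.5 * log((1 + est) / (1 - est))
  se_z <- se / (1 - est^2)     # d atanh(r) / dr = 1 / (1 - r^2)
  q    <- qnorm(1 - (1 - conf_level) / 2)
  tanh(z + c(lower = -1, upper = 1) * q * se_z)
}

est <- 0.95
se  <- 0.04

ci_z   <- fisher_z_ci(est, se)                   # stays inside (-1, 1)
ci_raw <- est + c(-1, 1) * qnorm(0.975) * se     # raw Wald limits can exceed 1

ci_z
ci_raw
```

Near the upper boundary the raw Wald limit overshoots 1, while the back-transformed limit cannot, which is the stability argument made above.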
For overdispersion = "pearson", $\hat\phi$ is a post-fit Pearson
dispersion estimate rather than a likelihood parameter. Delta-method
confidence intervals therefore treat $\hat\phi$ as fixed and a warning is
issued.
This function currently implements the Poisson-log count-data case. The
family and link arguments are present for API stability and
future extensions, but only family = "poisson" and
link = "log" are currently supported. The returned estimates are
variance-component CCCs constrained to $[0, 1]$; they are not Lin's raw
moment CCC and should not be expected to produce negative values.
A symmetric numeric matrix of class c("ccc_glmm", "ccc") containing
rho_ccc. Additional pairwise matrices are stored as attributes:
rho_ccc_inter: agreement for replicated method averages.
rho_ccc_intra_method1, rho_ccc_intra_method2:
method-specific repeatability coefficients.
sigma2_subject, sigma2_method,
sigma2_subject_method: fitted variance/disagreement components.
phi, mu, precision, accuracy: fitted
count-scale diagnostics used in the CCC decomposition.
beta0, beta_method, nll, logLik,
convergence_code: fitting diagnostics.
n_obs, n_subjects, m_reps: design diagnostics.
when ci = TRUE, standard error and confidence-limit matrices
for rho_ccc, rho_ccc_inter,
rho_ccc_intra_method1, and rho_ccc_intra_method2.
Carrasco JL (2010). A generalized concordance correlation coefficient based on the variance components generalized linear mixed models for overdispersed count data. Biometrics.
# Example 1: overall agreement for two Poisson count methods.
# Here the methods have similar means, so rho_ccc is mainly driven by
# between-subject count heterogeneity versus Poisson residual noise.
set.seed(1)
df1 <- expand.grid(
  subject = factor(seq_len(12)),
  method = factor(c("A", "B")),
  replicate = factor(seq_len(2))
)
subject_eff <- rnorm(12, 0, 0.5)
df1$eta <- 1.1 + subject_eff[as.integer(df1$subject)]
df1$y <- rpois(nrow(df1), exp(df1$eta))
fit1 <- ccc_glmm(df1, "y", "subject", "method",
                 replicate = "replicate", ci = TRUE)
fit1
summary(fit1)
estimate(fit1)
tidy(fit1)
ci(fit1)
confint(fit1)

# Example 2: method bias lowers total agreement.
# This tests whether a systematic method shift is reflected in rho_ccc
# and the accuracy component.
set.seed(2)
df2 <- expand.grid(
  subject = factor(seq_len(12)),
  method = factor(c("A", "B")),
  replicate = factor(seq_len(2))
)
subject_eff <- rnorm(12, 0, 0.6)
df2$eta <- 1.0 + subject_eff[as.integer(df2$subject)] +
  ifelse(df2$method == "B", 0.6, 0)
df2$y <- rpois(nrow(df2), exp(df2$eta))
fit2 <- ccc_glmm(df2, "y", "subject", "method", replicate = "replicate")
summary(fit2)

# Example 3: subject-by-method variation.
# This tests whether individual subjects respond differently by method.
# The subject-method component is useful when disagreement is not explained
# by a single fixed method bias.
set.seed(3)
df3 <- expand.grid(
  subject = factor(seq_len(10)),
  method = factor(c("A", "B")),
  replicate = factor(seq_len(2))
)
subject_eff <- rnorm(10, 0, 0.4)
subject_method_eff <- matrix(rnorm(20, 0, 0.25), nrow = 10)
method_id <- as.integer(df3$method)
df3$eta <- 1.1 + subject_eff[as.integer(df3$subject)] +
  subject_method_eff[cbind(as.integer(df3$subject), method_id)]
df3$y <- rpois(nrow(df3), exp(df3$eta))
fit3 <- ccc_glmm(
  df3, "y", "subject", "method",
  replicate = "replicate",
  include_subject_method = TRUE
)
attr(fit3, "sigma2_subject_method")

# Example 4: four methods.
# This tests pairwise agreement across several count methods and helps
# identify which method pairs have the strongest total CCC.
set.seed(4)
df4 <- expand.grid(
  subject = factor(seq_len(10)),
  method = factor(c("A", "B", "C", "D")),
  replicate = factor(seq_len(2))
)
subject_eff <- rnorm(10, 0, 0.5)
method_bias <- c(A = 0, B = 0.1, C = 0.4, D = -0.2)
df4$eta <- 1.0 + subject_eff[as.integer(df4$subject)] +
  method_bias[as.character(df4$method)]
df4$y <- rpois(nrow(df4), exp(df4$eta))
fit4 <- ccc_glmm(df4, "y", "subject", "method", replicate = "replicate")
fit4
summary(fit4, n = 3)
Compute Lin's Concordance Correlation Coefficient (CCC) from a linear
mixed-effects model fitted by REML. The fixed-effects part can include
method and/or time (optionally their interaction), with a
subject-specific random intercept to capture between-subject variation.
Large $n \times n$ inversions are avoided by solving small $r \times r$
per-subject systems.
Assumption: time levels are treated as regular, equally spaced
visits indexed by their order within subject. The AR(1) residual model is
in discrete time on the visit index (not calendar time). NA time codes
break the serial run. Gaps in the factor levels are ignored (adjacent
observed visits are treated as lag-1).
ccc_rm_reml(
  data,
  response,
  subject,
  method = NULL,
  time = NULL,
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  ci_mode = c("auto", "raw", "logit"),
  verbose = FALSE,
  digits = 4,
  use_message = TRUE,
  interaction = FALSE,
  max_iter = 100,
  tol = 1e-06,
  Dmat = NULL,
  Dmat_type = c("time-avg", "typical-visit", "weighted-avg", "weighted-sq"),
  Dmat_weights = NULL,
  Dmat_rescale = TRUE,
  ar = c("none", "ar1"),
  ar_rho = NA_real_,
  slope = c("none", "subject", "method", "custom"),
  slope_var = NULL,
  slope_Z = NULL,
  drop_zero_cols = TRUE,
  vc_select = c("auto", "none"),
  vc_alpha = 0.05,
  vc_test_order = c("subj_time", "subj_method"),
  include_subj_method = NULL,
  include_subj_time = NULL,
  sb_zero_tol = 1e-10
)
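| Argument | Description |
|---|---|
| data | A data frame. |
| response | Character. Response variable name. |
| subject | Character. Subject ID variable name. |
| method | Character or NULL (default NULL). |
| time | Character or NULL (default NULL). |
| ci | Logical (default FALSE). |
| conf_level | Numeric in (0, 1) (default 0.95). |
| n_threads | Integer (default getOption("matrixCorr.threads", 1L)). |
| ci_mode | Character scalar; one of "auto", "raw", "logit". |
| verbose | Logical (default FALSE). |
| digits | Integer (default 4). |
| use_message | Logical (default TRUE). |
| interaction | Logical (default FALSE). Include the method:time interaction in the fixed effects. |
| max_iter | Integer. Maximum iterations for variance-component updates (default 100). |
| tol | Numeric. Convergence tolerance on parameter change (default 1e-06). |
| Dmat | Optional (default NULL). |
| Dmat_type | Character, one of "time-avg", "typical-visit", "weighted-avg", "weighted-sq". |
| Dmat_weights | Optional numeric weights (default NULL). |
| Dmat_rescale | Logical (default TRUE). |
| ar | Character. Residual correlation structure: "none" or "ar1". |
| ar_rho | Numeric (default NA_real_). |
| slope | Character. Optional extra random-effect design; one of "none", "subject", "method", "custom". |
| slope_var | Name of the slope regressor column, used with slope = "subject" or slope = "method". |
| slope_Z | For slope = "custom": the extra random-effect design, supplied in full. |
| drop_zero_cols | Logical (default TRUE). Drop all-zero columns of the slope design after subsetting. |
| vc_select | Character scalar; one of "auto", "none". |
| vc_alpha | Numeric scalar in (0, 1) (default 0.05). |
| vc_test_order | Character vector (length 2) with a permutation of "subj_time", "subj_method". |
| include_subj_method, include_subj_time | Logical scalars or NULL (default NULL). |
| sb_zero_tol | Non-negative numeric scalar; default 1e-10. |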
For measurement $y_{ij}$ on subject $i$ under fixed
levels (method, time), we fit
$$y = X\beta + Zu + \varepsilon, \qquad u \sim N(0, G), \qquad \varepsilon \sim N(0, R).$$
Notation: $m$ subjects, $n = \sum_i n_i$ total rows;
$nm$ method levels; $nt$ time levels; $q_Z$ extra
random-slope columns (if any); $r = 1 + nm + nt$ (or $1 + nm + nt + q_Z$ with slopes).
Here $Z$ is the subject-structured random-effects design and $G$ is
block-diagonal at the subject level with the following per-subject
parameterisation. Specifically,
one random intercept with variance $\sigma_A^2$;
optionally, method deviations (one column per method level)
with a common variance $\sigma_{A \times M}^2$ and zero
covariances across levels (i.e., multiple of an identity);
optionally, time deviations (one column per time level)
with a common variance $\sigma_{A \times T}^2$ and zero
covariances across levels;
optionally, an extra random effect aligned with a slope design $Z_s$
(random slope), where each column $j$ has its own variance $\sigma_{Z,j}^2$
and columns are uncorrelated.
The fixed-effects design is ~ 1 + method + time and, if
interaction=TRUE, + method:time.
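Residual correlation $R$ (regular, equally spaced time).
Write $R_i = \sigma_E^2 C_i(\rho)$. With ar="none", $C_i = I$.
With ar="ar1", within-subject residuals follow a discrete AR(1)
process along the visit index after sorting by increasing time level. Ties
retain input order, and any NA time code breaks the series so each
contiguous block of non-NA times forms a run. The correlation
between adjacent observed visits in a run is $\rho$; we do not use
calendar-time gaps. Internally we work with the precision of the AR(1)
correlation: for a run of length $L \ge 2$, the tridiagonal inverse has
$$(C^{-1})_{11} = (C^{-1})_{LL} = \frac{1}{1 - \rho^2}, \qquad (C^{-1})_{tt} = \frac{1 + \rho^2}{1 - \rho^2} \ \ (2 \le t \le L - 1), \qquad (C^{-1})_{t,t+1} = (C^{-1})_{t+1,t} = \frac{-\rho}{1 - \rho^2}.$$
The working inverse is $R_i^{-1} = \sigma_E^{-2} C_i(\rho)^{-1}$.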
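The tridiagonal form of the AR(1) precision can be checked numerically. The sketch below builds the inverse correlation for one run directly and compares it with a dense inversion; the helper name ar1_precision is hypothetical and the code is independent of the package's C++ backend.

```r
# Hypothetical helper: closed-form inverse of an AR(1) correlation matrix
# for a single run of length L (L >= 2) with lag-1 correlation rho.
ar1_precision <- function(L, rho) {
  stopifnot(L >= 2, abs(rho) < 1)
  P <- matrix(0, L, L)
  diag(P) <- 1 + rho^2          # interior diagonal entries
  P[1, 1] <- 1                  # first and last visits have a single neighbour
  P[L, L] <- 1
  idx <- cbind(seq_len(L - 1), seq_len(L - 1) + 1)
  P[idx] <- -rho                # super-diagonal
  P[idx[, 2:1, drop = FALSE]] <- -rho   # sub-diagonal
  P / (1 - rho^2)
}

L   <- 5
rho <- 0.6
C   <- rho^abs(outer(seq_len(L), seq_len(L), "-"))   # dense AR(1) correlation

max_abs_diff <- max(abs(ar1_precision(L, rho) - solve(C)))
max_abs_diff  # numerically zero
```

Working with the banded inverse avoids inverting the dense correlation matrix for every run.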
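Per-subject Woodbury system. For subject $i$ with $n_i$
rows, define the per-subject random-effects design $U_i$ (columns:
intercept, method indicators, time indicators; dimension
$r = 1 + nm + nt$). The core never forms
$V_i = R_i + U_i G U_i^\top$ explicitly. Instead, it forms
$$M_i = G^{-1} + U_i^\top R_i^{-1} U_i$$
and accumulates GLS blocks via rank-$r$ corrections using
$V_i^{-1} = R_i^{-1} - R_i^{-1} U_i M_i^{-1} U_i^\top R_i^{-1}$:
$$X^\top V^{-1} X = \sum_i \left[ X_i^\top R_i^{-1} X_i - (X_i^\top R_i^{-1} U_i) M_i^{-1} (U_i^\top R_i^{-1} X_i) \right],$$
$$X^\top V^{-1} y = \sum_i \left[ X_i^\top R_i^{-1} y_i - (X_i^\top R_i^{-1} U_i) M_i^{-1} (U_i^\top R_i^{-1} y_i) \right].$$
Because $G^{-1}$ is diagonal with positive entries, each $M_i$ is
symmetric positive definite; solves/inversions use symmetric-PD routines with
a small diagonal ridge and a pseudo-inverse if needed.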
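The Woodbury step can be verified on a toy subject. In the sketch below the design U, the variance values, and all object names are made up for illustration; it compares the small-system route with a direct inversion of the full per-subject covariance.

```r
# Toy subject: 6 rows, random intercept plus two method indicators (r = 3).
n_i <- 6
U <- cbind(intercept = 1,
           method_A  = rep(c(1, 0), each = 3),
           method_B  = rep(c(0, 1), each = 3))
G        <- diag(c(0.8, 0.3, 0.3))   # hypothetical diagonal random-effect variances
sigma2_E <- 0.5                      # hypothetical residual variance (iid residuals)
R_inv    <- diag(n_i) / sigma2_E

# Small r x r system instead of an n_i x n_i inversion.
M <- solve(G) + t(U) %*% R_inv %*% U
V_inv_woodbury <- R_inv - R_inv %*% U %*% solve(M, t(U) %*% R_inv)

# Reference: invert V = R + U G U' directly.
V_inv_direct <- solve(sigma2_E * diag(n_i) + U %*% G %*% t(U))

woodbury_err <- max(abs(V_inv_woodbury - V_inv_direct))
woodbury_err  # numerically zero
```

The only matrix that is factorised on the Woodbury route is the 3 x 3 matrix M, whatever the number of rows for the subject.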
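Random-slope design $Z_s$.
Besides the core per-subject design $U_i$, the function can include an extra
design $Z_s$, with $Z_{s,i}$ denoting its subject-$i$ block.
slope="subject": $Z_s$ has one column (the regressor in
slope_var); $Z_{s,i}$ is the subject-$i$ block, with its own
variance $\sigma_{Z,1}^2$.
slope="method": $Z_s$ has one column per method level;
a row uses the slope regressor in column $\ell$ if its method equals level $\ell$,
otherwise 0; all-zero columns can be dropped via
drop_zero_cols=TRUE after subsetting. Each column has its own
variance $\sigma_{Z,\ell}^2$.
slope="custom": $Z_s$ is provided fully via slope_Z.
Each column is an independent random effect with its own variance
$\sigma_{Z,j}^2$; cross-covariances among columns are set to 0.
Computations simply augment $U_i$ to $[U_i \;\; Z_{s,i}]$ and the corresponding
inverse-variance block. The EM updates then include, for each column $j$,
$$\hat\sigma_{Z,j}^2 = \frac{1}{m} \sum_{i=1}^{m} \left[ b_{i,\mathrm{extra},j}^2 + (M_i^{-1})_{\mathrm{extra},jj} \right],$$
where $b_i$ is the subject-$i$ BLUP vector, $M_i^{-1}$ its conditional
covariance, and "extra" indexes the slope coordinates.
Interpretation: the $\sigma_{Z,j}^2$ represent additional within-subject
variability explained by the slope regressor(s) in column $j$ and are not part of the CCC
denominator (agreement across methods/time).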
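EM-style variance-component updates. With current $\hat\beta$,
form residuals $r_i = y_i - X_i \hat\beta$. The BLUPs and conditional
covariances are
$$b_i = M_i^{-1} U_i^\top R_i^{-1} r_i, \qquad \mathrm{Var}(b_i \mid y) = M_i^{-1}.$$
Let $e_i = r_i - U_i b_i$. Expected squares then yield closed-form updates:
$$\hat\sigma_A^2 = \frac{1}{m} \sum_i \left[ b_{i,0}^2 + (M_i^{-1})_{00} \right],$$
$$\hat\sigma_{A \times M}^2 = \frac{1}{m \cdot nm} \sum_i \sum_{\ell = 1}^{nm} \left[ b_{i,\ell}^2 + (M_i^{-1})_{\ell\ell} \right],$$
$$\hat\sigma_{A \times T}^2 = \frac{1}{m \cdot nt} \sum_i \sum_{t = 1}^{nt} \left[ b_{i,t}^2 + (M_i^{-1})_{tt} \right],$$
$$\hat\sigma_E^2 = \frac{1}{n} \sum_i \left[ e_i^\top C_i(\rho)^{-1} e_i + \mathrm{tr}\!\left( M_i^{-1} U_i^\top C_i(\rho)^{-1} U_i \right) \right],$$
where index 0 is the intercept coordinate of $b_i$ and the sums over $\ell$
and $t$ run over its method and time coordinates, respectively;
together with the per-column update for $\sigma_{Z,j}^2$ given above.
Iterate until the change across components falls below tol
or max_iter is reached.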
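Fixed-effect dispersion $S_B$: choosing the time-kernel $D_m$.
Let $d = L^\top \hat\beta$ stack the within-time, pairwise method differences,
grouped by time as $d = (d_1^\top, \ldots, d_{nt}^\top)^\top$ with
$d_t \in \mathbb{R}^{P}$ and $P$ the number of pairwise method contrasts
per time level. The symmetric
positive semidefinite kernel $D_m \succeq 0$ selects which functional of the
bias profile $t \mapsto d_t$ is targeted by $S_B$. Internally, the code
rescales any supplied/built $D_m$ to a fixed normalisation for
stability and comparability; the expressions below are stated up to that
scaling.
Dmat_type = "time-avg" (square of the time-averaged bias).
Let
$$w = \frac{1}{nt} \mathbf{1}_{nt}, \qquad D_m \propto (w w^\top) \otimes I_P,$$
so that
$$d^\top D_m d \propto \sum_{p = 1}^{P} \left( \frac{1}{nt} \sum_{t = 1}^{nt} d_{t,p} \right)^2.$$
This target is zero when methods have equal time-averaged
means within subject, i.e. $\sum_{t} d_{t,p} / nt = 0$ for all
$p$. Appropriate when decisions depend on an average over time and
opposite-signed biases are allowed to cancel.
Dmat_type = "typical-visit" (average of squared per-time biases).
With equal visit probability, take
$$D_m \propto \mathrm{diag}\!\left( \tfrac{1}{nt}, \ldots, \tfrac{1}{nt} \right) \otimes I_P,$$
yielding
$$d^\top D_m d \propto \frac{1}{nt} \sum_{t = 1}^{nt} \sum_{p = 1}^{P} d_{t,p}^2.$$
This target is zero when methods agree on a typical
occasion drawn uniformly from the visit set. Use when each visit matters
on its own; alternating signs do not cancel.
Dmat_type = "weighted-avg" (square of a weighted time average).
For user weights $a = (a_1, \ldots, a_{nt})$ with $a_t \ge 0$, set
$$w = a / \textstyle\sum_t a_t, \qquad D_m \propto (w w^\top) \otimes I_P,$$
so that
$$d^\top D_m d \propto \sum_{p = 1}^{P} \left( \sum_{t = 1}^{nt} w_t d_{t,p} \right)^2.$$
This target is zero when methods have equal weighted time-averaged means,
i.e. $\sum_t w_t d_{t,p} = 0$ for all
$p$. Use when some visits (e.g., baseline/harvest) are a priori more
influential; opposite-signed biases may cancel according to $w$.
Dmat_type = "weighted-sq" (weighted average of squared per-time biases).
With the same weights $w$, take
$$D_m \propto \mathrm{diag}(w_1, \ldots, w_{nt}) \otimes I_P,$$
giving
$$d^\top D_m d \propto \sum_{t = 1}^{nt} w_t \sum_{p = 1}^{P} d_{t,p}^2.$$
This target is zero when methods agree at visits sampled with
probabilities $\{w_t\}$, counting each visit's discrepancy on its own.
Use when per-visit agreement is required but some visits should be
emphasised more than others.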
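The practical difference between averaging before squaring and squaring before averaging can be seen with a toy bias profile. The sketch below builds unscaled kernels with kronecker() for a single method pair and four visits; the object names, the bias values, and the absence of the internal rescaling are assumptions of the example.

```r
n_t <- 4   # number of visits
P   <- 1   # one pairwise method contrast (two methods)

# Hypothetical bias profile: method difference alternates in sign over visits.
d <- c(0.3, -0.3, 0.3, -0.3)

w <- rep(1 / n_t, n_t)                                  # equal visit weights
D_time_avg <- kronecker(tcrossprod(w), diag(P))         # (w w') x I_P
D_typical  <- kronecker(diag(w), diag(P))               # diag(w) x I_P

q_time_avg <- drop(t(d) %*% D_time_avg %*% d)   # square of the mean bias
q_typical  <- drop(t(d) %*% D_typical %*% d)    # mean of the squared biases

c(time_avg = q_time_avg, typical_visit = q_typical)
```

With alternating signs the time-averaged target vanishes while the typical-visit target equals the squared per-visit bias, which is why the two choices can lead to different values of the fixed-effect dispersion term.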
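Time-averaging for CCC (regular visits).
The reported CCC targets agreement of the time-averaged measurements
per method within subject by default (Dmat_type="time-avg"). Averaging over
$T$ non-NA visits shrinks time-varying components by
$$\kappa_T^{(g)} = \frac{1}{T}, \qquad \kappa_T^{(e)} = \frac{T + 2 \sum_{k = 1}^{T - 1} (T - k) \rho^k}{T^2},$$
with $\kappa_T^{(e)} = 1/T$ when residuals are i.i.d. With unbalanced $T$, the
implementation averages the per-(subject,method) $\kappa$ values across the
pairs contributing to CCC and then clamps the averaged factor to a bounded
range for numerical stability. Choosing
Dmat_type="typical-visit" makes $S_B$ match the interpretation of a
randomly sampled occasion instead.
Concordance correlation coefficient. The CCC used is
$$\mathrm{CCC} = \frac{\sigma_A^2 + \kappa_T^{(g)} \sigma_{A \times T}^2}{\sigma_A^2 + \sigma_{A \times M}^2 + \kappa_T^{(g)} \sigma_{A \times T}^2 + S_B + \kappa_T^{(e)} \sigma_E^2}.$$
Special cases: with no method factor, $S_B = \sigma_{A \times M}^2 = 0$; with
a single time level, $\sigma_{A \times T}^2 = 0$ (no $\kappa$-shrinkage).
When $T = 1$, both $\kappa$-factors equal 1. The extra
random-effect variances $\{\sigma_{Z,j}^2\}$ (if used) are not included.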
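To make the role of the shrinkage factors concrete, the sketch below computes a variance-component CCC from hypothetical component values. The factor for the residual term uses the textbook variance of the mean of T equally spaced AR(1) observations, and the subject-by-time factor is taken as 1/T; both are assumptions of this sketch, as are the function names and the numbers, and the package's handling of unbalanced designs and clamping is not reproduced.

```r
# Shrinkage of the residual variance when averaging n_visits AR(1) observations.
kappa_e <- function(n_visits, rho = 0) {
  if (n_visits == 1) return(1)
  k <- seq_len(n_visits - 1)
  (n_visits + 2 * sum((n_visits - k) * rho^k)) / n_visits^2
}

# Shrinkage of independent subject-by-time deviations under averaging.
kappa_g <- function(n_visits) 1 / n_visits

# CCC from variance components (all inputs are variances, S_B the bias term).
ccc_from_vc <- function(s2_A, s2_AM, s2_AT, s2_E, S_B, n_visits = 1, rho = 0) {
  num <- s2_A + kappa_g(n_visits) * s2_AT
  num / (num + s2_AM + S_B + kappa_e(n_visits, rho) * s2_E)
}

# Hypothetical components, four visits, iid residuals versus AR(1) residuals.
ccc_iid <- ccc_from_vc(0.8, 0.3, 0.4, 0.5, S_B = 0.04, n_visits = 4, rho = 0)
ccc_ar1 <- ccc_from_vc(0.8, 0.3, 0.4, 0.5, S_B = 0.04, n_visits = 4, rho = 0.6)
c(iid = ccc_iid, ar1 = ccc_ar1)
```

Positive serial correlation makes averaging less effective at removing residual noise, so the AR(1) value is smaller than the iid value for the same components.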
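CIs / SEs (delta method for CCC). Let
$\theta = (\sigma_A^2, \sigma_{A \times M}^2, \sigma_{A \times T}^2, \sigma_E^2, S_B)^\top$
and write $\mathrm{CCC}(\theta) = N / D$ with
$N = \sigma_A^2 + \kappa_T^{(g)} \sigma_{A \times T}^2$
and
$D = \sigma_A^2 + \sigma_{A \times M}^2 + \kappa_T^{(g)} \sigma_{A \times T}^2 + S_B + \kappa_T^{(e)} \sigma_E^2$.
The gradient components are
$$\frac{\partial\, \mathrm{CCC}}{\partial \sigma_A^2} = \frac{D - N}{D^2}, \qquad \frac{\partial\, \mathrm{CCC}}{\partial \sigma_{A \times M}^2} = -\frac{N}{D^2}, \qquad \frac{\partial\, \mathrm{CCC}}{\partial \sigma_{A \times T}^2} = \frac{\kappa_T^{(g)} (D - N)}{D^2},$$
$$\frac{\partial\, \mathrm{CCC}}{\partial \sigma_E^2} = -\frac{\kappa_T^{(e)} N}{D^2}, \qquad \frac{\partial\, \mathrm{CCC}}{\partial S_B} = -\frac{N}{D^2},$$
where $D - N = \sigma_{A \times M}^2 + S_B + \kappa_T^{(e)} \sigma_E^2$.
Estimating $\mathrm{Var}(\hat\theta)$.
The EM updates write each variance component as an average of per-subject
quantities. For subject $i$,
$$t_{A,i} = b_{i,0}^2 + (M_i^{-1})_{00}, \qquad t_{M,i} = \frac{1}{nm} \sum_{\ell = 1}^{nm} \left[ b_{i,\ell}^2 + (M_i^{-1})_{\ell\ell} \right], \qquad t_{T,i} = \frac{1}{nt} \sum_{t = 1}^{nt} \left[ b_{i,t}^2 + (M_i^{-1})_{tt} \right],$$
where $b_i = M_i^{-1} (U_i^\top R_i^{-1} r_i)$ and
$M_i = G^{-1} + U_i^\top R_i^{-1} U_i$.
With $m$ subjects, form the empirical covariance of the stacked
subject vectors and scale by $m$ to approximate the covariance of the
means:
$$\widehat{\mathrm{Cov}}\left\{ (\hat\sigma_A^2, \hat\sigma_{A \times M}^2, \hat\sigma_{A \times T}^2)^\top \right\} \approx \frac{1}{m}\, \widehat{\mathrm{Cov}}_i \left\{ (t_{A,i}, t_{M,i}, t_{T,i})^\top \right\}.$$
(Drop rows/columns as needed when nm==0 or nt==0.)
The residual variance estimator is a weighted mean
$\hat\sigma_E^2 = \sum_i w_i v_i$ of per-subject contributions $v_i$
with $w_i = n_i / n$. Its variance is
approximated by the variance of a weighted mean of independent terms,
$$\widehat{\mathrm{Var}}(\hat\sigma_E^2) \approx \Big( \sum_i w_i^2 \Big)\, \widehat{\mathrm{Var}}(v_i),$$
where $\widehat{\mathrm{Var}}(v_i)$ is the sample variance across
subjects. The method-dispersion term uses the quadratic-form delta already
computed for $S_B$:
$$\widehat{\mathrm{Var}}(S_B) \approx \frac{2\, \mathrm{tr}\{ (A_{\mathrm{fix}} V_\beta)^2 \} + 4\, \hat\beta^\top A_{\mathrm{fix}} V_\beta A_{\mathrm{fix}} \hat\beta}{c^2},$$
with $A_{\mathrm{fix}} = L D_m L^\top$, $V_\beta = \widehat{\mathrm{Var}}(\hat\beta)$,
and $c$ the normalising divisor applied to the quadratic form in $S_B$.
Putting it together. Assemble
$\widehat{\mathrm{Var}}(\hat\theta)$
by combining the
$(\sigma_A^2, \sigma_{A \times M}^2, \sigma_{A \times T}^2)$
covariance
block from the subject-level empirical covariance, add the
$\widehat{\mathrm{Var}}(\hat\sigma_E^2)$
and
$\widehat{\mathrm{Var}}(S_B)$
terms on the diagonal,
and ignore cross-covariances across these blocks (a standard large-sample
simplification). Then
$$\widehat{\mathrm{se}}(\widehat{\mathrm{CCC}}) = \sqrt{ \nabla \mathrm{CCC}(\hat\theta)^\top\, \widehat{\mathrm{Var}}(\hat\theta)\, \nabla \mathrm{CCC}(\hat\theta) }.$$
A two-sided $(1 - \alpha)$ normal CI is
$$\widehat{\mathrm{CCC}} \pm z_{1 - \alpha/2}\, \widehat{\mathrm{se}}(\widehat{\mathrm{CCC}}),$$
truncated to $[0, 1]$ in the output for convenience. When $S_B$ is
truncated at 0 or samples are very small/imbalanced, the normal CI can be
mildly anti-conservative near the boundary; a logit transform for CCC or a
subject-level (cluster) bootstrap can be used for sensitivity analysis.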
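The delta-method interval and its logit-scale alternative can be sketched as follows. The CCC is written as a ratio of variance components with shrinkage factors kg and ke; the component values, the diagonal covariance matrix, and the function names are hypothetical, and the logit-scale standard error uses the usual delta-method factor 1 / (ccc * (1 - ccc)), which is an assumption about how such an interval might be formed rather than a description of ci_mode = "logit".

```r
# CCC and its analytic gradient for theta = (s2_A, s2_AM, s2_AT, s2_E, S_B).
ccc_value <- function(theta, kg = 1, ke = 1) {
  N <- theta[["s2_A"]] + kg * theta[["s2_AT"]]
  N / (N + theta[["s2_AM"]] + theta[["S_B"]] + ke * theta[["s2_E"]])
}
ccc_grad <- function(theta, kg = 1, ke = 1) {
  N <- theta[["s2_A"]] + kg * theta[["s2_AT"]]
  D <- N + theta[["s2_AM"]] + theta[["S_B"]] + ke * theta[["s2_E"]]
  c(s2_A  = (D - N) / D^2,
    s2_AM = -N / D^2,
    s2_AT = kg * (D - N) / D^2,
    s2_E  = -ke * N / D^2,
    S_B   = -N / D^2)
}

# Hypothetical estimates and a made-up diagonal covariance for theta_hat.
theta <- c(s2_A = 0.8, s2_AM = 0.3, s2_AT = 0.4, s2_E = 0.5, S_B = 0.04)
V     <- diag(c(0.02, 0.005, 0.004, 0.001, 0.0004))
kg <- 0.25
ke <- 0.25

ccc <- ccc_value(theta, kg, ke)
g   <- ccc_grad(theta, kg, ke)
se  <- sqrt(drop(t(g) %*% V %*% g))
q   <- qnorm(0.975)

# Raw normal interval, truncated to [0, 1].
ci_raw <- pmin(pmax(ccc + c(-1, 1) * q * se, 0), 1)

# Logit-scale interval, back-transformed; stays inside (0, 1) by construction.
ci_logit <- plogis(qlogis(ccc) + c(-1, 1) * q * se / (ccc * (1 - ccc)))

rbind(raw = ci_raw, logit = ci_logit)
```

Comparing the two rows gives a quick sensitivity check when the estimate sits near 0 or 1.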
Choosing $\rho$ for AR(1).
When ar="ar1" and ar_rho = NA, $\rho$ is estimated by
profiling the REML log-likelihood at $(\hat\beta, \hat G, \hat\sigma_E^2)$.
With very few visits per subject, $\rho$ can be weakly identified; consider
sensitivity checks over a plausible range.
All per-subject solves are $r \times r$ with $r = 1 + nm + nt + q_Z$
(intercept, method levels, time levels, and any random-slope columns), so cost
scales with the number of subjects and the fixed-effects dimension rather
than the total number of observations. Solvers use symmetric-PD paths with
a small diagonal ridge and pseudo-inverse,
which helps for very small/unbalanced subsets and near-boundary estimates.
For AR(1), observations are ordered by time within subject; NA time codes
break the run, and gaps between factor levels are treated as regular steps
(elapsed time is not used).
Heteroscedastic slopes across columns of the slope design are supported.
Each slope column has its own variance component $\sigma_{Z,j}^2$, but
cross-covariances among slope columns are set to zero (diagonal block). Column
rescaling changes the implied prior on $b_{i,\mathrm{extra}}$ but does not
introduce correlations.
The C++ backend uses OpenMP loops while also forcing vendor BLAS libraries to
run single-threaded so that overall CPU usage stays predictable. This guard
is applied to OpenBLAS, Apple's Accelerate, and Intel MKL when their runtime
controls are available. You can opt out manually by setting
MATRIXCORR_DISABLE_BLAS_GUARD=1 in the environment before loading
matrixCorr.
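A short sketch of how these settings might be applied in a session. The environment variable and the matrixCorr.threads option are the ones named in this documentation; the thread count of 2 is an arbitrary choice, and the library() call is left commented out so the snippet runs without the package installed.

```r
# Opt out of the BLAS thread guard; set this before the package is loaded.
Sys.setenv(MATRIXCORR_DISABLE_BLAS_GUARD = "1")

# Default thread count picked up by n_threads = getOption("matrixCorr.threads", 1L).
options(matrixCorr.threads = 2L)

# library(matrixCorr)  # load afterwards so both settings are in place

guard_disabled <- identical(Sys.getenv("MATRIXCORR_DISABLE_BLAS_GUARD"), "1")
guard_disabled
```

With the guard disabled, BLAS threading and the OpenMP loops can compete for cores, so it may be worth keeping n_threads small in that configuration.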
Internally, the call is routed to ccc_lmm_reml_pairwise(), which fits one
repeated-measures mixed model per pair of methods. Each model includes:
subject random intercepts (always)
optional subject-by-method ($\sigma^2_{A \times M}$) and
subject-by-time ($\sigma^2_{A \times T}$) variance components
optional random slopes specified via slope/slope_var/slope_Z
residual structure ar = "none" (iid) or ar = "ar1"
D-matrix options (Dmat_type, Dmat, Dmat_weights) control how time
averaging operates when translating variance components into CCC summaries.
Thiago de Paula Oliveira
Lin L (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45: 255-268.
Lin L (2000). A note on the concordance correlation coefficient. Biometrics, 56: 324-325.
Carrasco, J. L. et al. (2013). Estimation of the concordance correlation coefficient for repeated measures using SAS and R. Computer Methods and Programs in Biomedicine, 109(3), 293-304. doi:10.1016/j.cmpb.2012.09.002
King et al. (2007). A Class of Repeated Measures Concordance Correlation Coefficients. Journal of Biopharmaceutical Statistics, 17(4). doi:10.1080/10543400701329455
build_L_Dm_Z_cpp for constructing $L$/$D_m$/$Z$; ccc_rm_ustat
for a U-statistic alternative; and cccrm for a reference approach via
nlme.
# ====================================================================
# 1) Subject x METHOD and subject x TIME variance present (8 visits)
#    y_{i,m,t} = mu + b_m + u_i + w_{i,m} + g_{i,t} + e_{i,m,t}
#    with u_i ~ N(0, s_A^2), w_{i,m} ~ N(0, s_{AxM}^2),
#    g_{i,t} ~ N(0, s_{AxT}^2)
# ====================================================================
set.seed(102)
n_subj <- 60
n_time <- 8
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
time <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
sigA <- 0.6   # subject
sigAM <- 0.3  # subject x method
sigAT <- 0.5  # subject x time
sigE <- 0.4   # residual
# Expected estimate S_B = 0.2^2 = 0.04
biasB <- 0.2  # fixed method bias
# random effects
u_i <- rnorm(n_subj, 0, sqrt(sigA))
u <- u_i[as.integer(id)]
sm <- interaction(id, method, drop = TRUE)
w_im_lv <- rnorm(nlevels(sm), 0, sqrt(sigAM))
w_im <- w_im_lv[as.integer(sm)]
st <- interaction(id, time, drop = TRUE)
g_it_lv <- rnorm(nlevels(st), 0, sqrt(sigAT))
g_it <- g_it_lv[as.integer(st)]
# residuals & response
e <- rnorm(length(id), 0, sqrt(sigE))
y <- (method == "B") * biasB + u + w_im + g_it + e
dat_both <- data.frame(y, id, method, time)
# Both sigma2_subject_method and sigma2_subject_time are identifiable here
fit_both <- ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
                        vc_select = "auto", ci = TRUE, verbose = TRUE)
summary(fit_both)
estimate(fit_both)
tidy(fit_both)
ci(fit_both)
confint(fit_both)
plot(fit_both)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(fit_both)
}

# ====================================================================
# 2) Subject x TIME variance present (sag > 0) with two methods
#    y_{i,m,t} = mu + b_m + u_i + g_{i,t} + e_{i,m,t}
#    where g_{i,t} ~ N(0, s_{AxT}^2) shared across methods at time t
# ====================================================================
set.seed(202)
n_subj <- 60; n_time <- 14
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
time <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
sigA <- 0.7
sigAT <- 0.5
sigE <- 0.5
biasB <- 0.25
u <- rnorm(n_subj, 0, sqrt(sigA))[as.integer(id)]
gIT <- rnorm(n_subj * n_time, 0, sqrt(sigAT))
g <- gIT[ (as.integer(id) - 1L) * n_time + as.integer(time) ]
y <- (method == "B") * biasB + u + g + rnorm(length(id), 0, sqrt(sigE))
dat_sag <- data.frame(y, id, method, time)
# sigma_AT should be retained; sigma_AM may be dropped (since w_{i,m}=0)
fit_sag <- ccc_rm_reml(dat_sag, "y", "id", method = "method", time = "time",
                       vc_select = "auto", verbose = TRUE)
summary(fit_sag)
plot(fit_sag)

# ====================================================================
# 3) BOTH components present: sab > 0 and sag > 0 (2 methods x T times)
#    y_{i,m,t} = mu + b_m + u_i + w_{i,m} + g_{i,t} + e_{i,m,t}
# ====================================================================
set.seed(303)
n_subj <- 60; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
time <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
sigA <- 0.8
sigAM <- 0.3
sigAT <- 0.4
sigE <- 0.5
biasB <- 0.2
u <- rnorm(n_subj, 0, sqrt(sigA))[as.integer(id)]
# (subject, method) random deviations: repeat per (i,m) across its times
wIM <- rnorm(n_subj * 2, 0, sqrt(sigAM))
w <- wIM[ (as.integer(id) - 1L) * 2 + as.integer(method) ]
# (subject, time) random deviations: shared across methods at time t
gIT <- rnorm(n_subj * n_time, 0, sqrt(sigAT))
g <- gIT[ (as.integer(id) - 1L) * n_time + as.integer(time) ]
y <- (method == "B") * biasB + u + w + g + rnorm(length(id), 0, sqrt(sigE))
dat_both <- data.frame(y, id, method, time)
fit_both <- ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
                        vc_select = "auto", verbose = TRUE, ci = TRUE)
summary(fit_both)
plot(fit_both)

# If you want to force-include both VCs (skip testing):
fit_both_forced <- ccc_rm_reml(dat_both, "y", "id", method = "method",
                               time = "time", vc_select = "none",
                               include_subj_method = TRUE,
                               include_subj_time = TRUE, verbose = TRUE)
summary(fit_both_forced)
plot(fit_both_forced)

# ====================================================================
# 4) D_m choices: time-averaged (default) vs typical visit
# ====================================================================
# Time-average
ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
            vc_select = "none", include_subj_method = TRUE,
            include_subj_time = TRUE, Dmat_type = "time-avg")
# Typical visit
ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
            vc_select = "none", include_subj_method = TRUE,
            include_subj_time = TRUE, Dmat_type = "typical-visit")

# ====================================================================
# 5) AR(1) residual correlation with fixed rho (larger example)
# ====================================================================
set.seed(10)
n_subj <- 40
n_time <- 10
methods <- c("A", "B", "C", "D")
nm <- length(methods)
id <- factor(rep(seq_len(n_subj), each = n_time * nm))
method <- factor(rep(rep(methods, each = n_time), times = n_subj),
                 levels = methods)
time <- factor(rep(rep(seq_len(n_time), times = nm), times = n_subj))
beta0 <- 0
beta_t <- 0.2
bias_met <- c(A = 0.00, B = 0.30, C = -0.15, D = 0.05)
sigA <- 1.0
rho_true <- 0.6
sigE <- 0.7
t_num <- as.integer(time)
t_c <- t_num - mean(seq_len(n_time))
mu <- beta0 + beta_t * t_c + bias_met[as.character(method)]
u_subj <- rnorm(n_subj, 0, sqrt(sigA))
u <- u_subj[as.integer(id)]
e <- numeric(length(id))
for (s in seq_len(n_subj)) {
  for (m in methods) {
    idx <- which(id == levels(id)[s] & method == m)
    e[idx] <- stats::arima.sim(list(ar = rho_true), n = n_time, sd = sigE)
  }
}
y <- mu + u + e
dat_ar4 <- data.frame(y = y, id = id, method = method, time = time)
fit4 <- ccc_rm_reml(dat_ar4, response = "y", subject = "id",
                    method = "method", time = "time",
                    ar = "ar1", ar_rho = 0.6, verbose = TRUE)
fit4
summary(fit4)
plot(fit4)

# ====================================================================
# 6) Random slope variants (subject, method, custom Z)
# ====================================================================
## By SUBJECT
set.seed(2)
n_subj <- 60; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
subj <- as.integer(id)
slope_i <- rnorm(n_subj, 0, 0.15)
slope_vec <- slope_i[subj]
base <- rnorm(n_subj, 0, 1.0)[subj]
tnum <- as.integer(tim)
y <- base + 0.3*(method=="B") +
  slope_vec*(tnum - mean(seq_len(n_time))) +
  rnorm(length(id), 0, 0.5)
dat_s <- data.frame(y, id, method, time = tim)
dat_s$t_num <- as.integer(dat_s$time)
dat_s$t_c <- ave(dat_s$t_num, dat_s$id, FUN = function(v) v - mean(v))
ccc_rm_reml(dat_s, "y", "id", method = "method", time = "time",
            slope = "subject", slope_var = "t_c", verbose = TRUE)

## By METHOD
set.seed(3)
n_subj <- 60; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
slope_m <- ifelse(method=="B", 0.25, 0.00)
base <- rnorm(n_subj, 0, 1.0)[as.integer(id)]
tnum <- as.integer(tim)
y <- base + 0.3*(method=="B") +
  slope_m*(tnum - mean(seq_len(n_time))) +
  rnorm(length(id), 0, 0.5)
dat_m <- data.frame(y, id, method, time = tim)
dat_m$t_num <- as.integer(dat_m$time)
dat_m$t_c <- ave(dat_m$t_num, dat_m$id, FUN = function(v) v - mean(v))
ccc_rm_reml(dat_m, "y", "id", method = "method", time = "time",
            slope = "method", slope_var = "t_c", verbose = TRUE)

## SUBJECT + METHOD random slopes (custom Z)
set.seed(4)
n_subj <- 50; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
subj <- as.integer(id)
slope_subj <- rnorm(n_subj, 0, 0.12)[subj]
slope_B <- ifelse(method=="B", 0.18, 0.00)
tnum <- as.integer(tim)
base <- rnorm(n_subj, 0, 1.0)[subj]
y <- base + 0.3*(method=="B") +
  (slope_subj + slope_B) * (tnum - mean(seq_len(n_time))) +
  rnorm(length(id), 0, 0.5)
dat_bothRS <- data.frame(y, id, method, time = tim)
dat_bothRS$t_num <- as.integer(dat_bothRS$time)
dat_bothRS$t_c <- ave(dat_bothRS$t_num, dat_bothRS$id,
                      FUN = function(v) v - mean(v))
MM <- model.matrix(~ 0 + method, data = dat_bothRS)
Z_custom <- cbind(
  subj_slope = dat_bothRS$t_c,
  MM * dat_bothRS$t_c
)
ccc_rm_reml(dat_bothRS, "y", "id", method = "method", time = "time",
            slope = "custom", slope_Z = Z_custom, verbose = TRUE)
Computes all pairwise Lin's Concordance Correlation Coefficients (CCC)
across multiple methods (L ≥ 2) for repeated-measures data.
Each subject must be measured by all methods across the same set of time
points or replicates.
CCC measures both accuracy (how close measurements are to the line of equality) and precision (Pearson correlation). Confidence intervals are optionally computed using a U-statistics-based estimator with Fisher's Z transformation.
ccc_rm_ustat(
  data,
  response,
  subject,
  method,
  time = NULL,
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  verbose = FALSE,
  Dmat = NULL,
  delta = 1
)
data |
A data frame containing the repeated-measures dataset. |
response |
Character. Name of the numeric outcome column. |
subject |
Character. Column identifying subjects. Every subject must have measurements from all methods (and times, when supplied); rows with incomplete {subject, time, method} coverage are dropped per pair. |
method |
Character. Name of the method column (factor with L ≥ 2 levels). |
time |
Character or NULL. Name of the time/repetition column. If NULL, one time point is assumed. |
ci |
Logical. If TRUE, returns confidence intervals (default FALSE). |
conf_level |
Confidence level for CI (default 0.95). |
n_threads |
Integer ( |
verbose |
Logical. If TRUE, prints diagnostic output (default FALSE). |
Dmat |
Optional numeric weight matrix (T |
delta |
Numeric. Exponent applied to the absolute pointwise differences
between two method trajectories before the time-weighted quadratic form is
evaluated. Internally, the function forms $(|X - Y|^{\delta})^\top D\,(|X - Y|^{\delta})$.
In most applications, the default delta = 1 is appropriate. |
This function computes pairwise Lin's Concordance Correlation Coefficient (CCC) between methods in a repeated-measures design using a U-statistics-based nonparametric estimator proposed by Carrasco et al. (2013). It is computationally efficient and robust, particularly for large-scale or balanced longitudinal designs.
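Lin's CCC is defined as
$$\rho_c = \frac{2\,\sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 + (\mu_X - \mu_Y)^2}$$
where:
$X$ and $Y$ are paired measurements from two methods, with covariance $\sigma_{XY}$.
$\mu_X$, $\mu_Y$ are means, and $\sigma_X^2$, $\sigma_Y^2$
are variances.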
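As a minimal illustration of this definition, the sketch below computes Lin's CCC for two paired vectors in base R. The helper name `lin_ccc` is hypothetical and is not part of the package; it uses divide-by-n moments, which is one common convention and an assumption here.

```r
# Hypothetical helper (not part of matrixCorr): Lin's CCC for two paired vectors.
lin_ccc <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1L)
  mx <- mean(x)
  my <- mean(y)
  # Divide-by-n moments (assumed convention; n - 1 is also used in practice)
  sxx <- mean((x - mx)^2)
  syy <- mean((y - my)^2)
  sxy <- mean((x - mx) * (y - my))
  2 * sxy / (sxx + syy + (mx - my)^2)
}

x <- c(1, 2, 3, 4, 5)
lin_ccc(x, x)      # 1: identical measurements
lin_ccc(x, x + 1)  # 0.8: a constant bias lowers CCC although Pearson r is 1
```

The second call shows why CCC is stricter than Pearson correlation: a systematic shift between methods enters the denominator through the squared mean difference.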
For repeated measures across $T$ time points and $n$ subjects, we
assume:
All pairs of subjects are considered to compute a
U-statistic estimator for within-method and cross-method distances.
If delta > 0, pairwise distances are raised to a power before
applying a time-weighted kernel matrix $D$.
If delta = 0, the method reduces to a version similar to a
repeated-measures kappa.
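The role of delta and the time-weight matrix can be sketched in a few lines of base R. The helper `rm_distance` below is hypothetical, and the form it uses (absolute differences raised to delta, then a quadratic form in D) is inferred from the argument description and the comments in the examples, not taken from the package source.

```r
# Hypothetical helper: time-weighted distance between two method trajectories
# observed at the same T time points.
rm_distance <- function(x, y, D = diag(length(x)), delta = 1) {
  stopifnot(length(x) == length(y), all(dim(D) == length(x)))
  d <- abs(x - y)^delta      # pointwise absolute differences raised to delta
  drop(t(d) %*% D %*% d)     # quadratic form d' D d
}

x <- c(1, 2, 4)
y <- c(1, 3, 2)
rm_distance(x, y, delta = 1)  # 5:  0^2 + 1^2 + 2^2 with identity weights
rm_distance(x, y, delta = 2)  # 17: 0^4 + 1^4 + 2^4, large gaps dominate
```

The delta = 0 case is left out on purpose: base R evaluates `0^0` as 1, so this sketch would not reproduce a disagreement indicator for tied values, and how the package treats ties is not stated here.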
Confidence intervals are constructed using a Fisher Z-transformation of the CCC. Specifically:
The CCC is transformed using
$Z = \tfrac{1}{2}\log\left(\frac{1 + \hat{\rho}_c}{1 - \hat{\rho}_c}\right)$.
Standard errors are computed from the asymptotic variance of the U-statistic.
Normal-based intervals are computed on the Z-scale and then back-transformed to the CCC scale.
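These steps can be sketched in base R as follows. The helper `fisher_z_ci` is hypothetical; it assumes the standard error is supplied on the CCC scale and maps it to the Z scale with the delta method, which may differ from how the package propagates the U-statistic variance.

```r
# Hypothetical helper: Fisher-Z interval for a CCC estimate.
# `se` is assumed to be the standard error on the CCC scale.
fisher_z_ci <- function(est, se, conf_level = 0.95) {
  z    <- atanh(est)              # 0.5 * log((1 + est) / (1 - est))
  se_z <- se / (1 - est^2)        # delta-method standard error on the Z scale
  crit <- stats::qnorm(1 - (1 - conf_level) / 2)
  # Back-transform the normal interval to the CCC scale
  c(lwr = tanh(z - crit * se_z), upr = tanh(z + crit * se_z))
}

ci_example <- fisher_z_ci(est = 0.80, se = 0.05)
ci_example
```

Because the back-transformation is `tanh`, the limits always stay inside (-1, 1) and the interval is asymmetric around the estimate when the estimate is far from zero.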
The design must be balanced: all subjects must have complete observations for all methods and time points.
The method is nonparametric and does not require assumptions of normality or linear mixed effects.
Weights (Dmat) allow differential importance of time points.
For unbalanced or complex hierarchical data (e.g.,
missing timepoints, covariate adjustments), consider using
ccc_rm_reml, which uses a variance components approach
via linear mixed models.
If ci = FALSE, a symmetric matrix of class "ccc" (estimates only).
If ci = TRUE, a list of class c("ccc", "ccc_ci") with elements:
est: CCC estimate matrix
lwr.ci: Lower bound matrix
upr.ci: Upper bound matrix
Thiago de Paula Oliveira
Lin L (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45: 255-268.
Lin L (2000). A note on the concordance correlation coefficient. Biometrics, 56: 324-325.
Carrasco JL, Jover L (2003). Estimating the concordance correlation coefficient: a new approach. Computational Statistics & Data Analysis, 47(4): 519-539.
ccc, ccc_rm_reml,
plot.ccc, print.ccc
set.seed(123)
df <- expand.grid(subject = 1:10, time = 1:2, method = c("A", "B", "C"))
df$y <- rnorm(nrow(df), mean = match(df$method, c("A", "B", "C")), sd = 1)

# CCC matrix (no CIs)
ccc1 <- ccc_rm_ustat(df, response = "y", subject = "subject",
                     method = "method", time = "time")
print(ccc1)
summary(ccc1)
plot(ccc1)

# With confidence intervals
ccc2 <- ccc_rm_ustat(df, response = "y", subject = "subject",
                     method = "method", time = "time", ci = TRUE)
print(ccc2)
summary(ccc2)
estimate(ccc2)
tidy(ccc2)
ci(ccc2)
confint(ccc2)
plot(ccc2)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(ccc2)
}

#------------------------------------------------------------------------
# Choosing delta based on distance sensitivity
#------------------------------------------------------------------------
# Standard quadratic RM-CCC distance: (X - Y)' D (X - Y)
ccc_rm_ustat(df, response = "y", subject = "subject",
             method = "method", time = "time", delta = 1)

# Fourth-power loss when D is diagonal: emphasises large disagreements
ccc_rm_ustat(df, response = "y", subject = "subject",
             method = "method", time = "time", delta = 2)

# Binary disagreement indicator before aggregation
ccc_rm_ustat(df, response = "y", subject = "subject",
             method = "method", time = "time", delta = 0)
cia() estimates the coefficient of individual agreement. CIA assesses
individual-level interchangeability by comparing between-method disagreement
with within-method replicate disagreement. Unlike CCC, CIA is not intended
to be driven by between-subject heterogeneity.
The estimator requires replicated readings within method. A data set with one observation per subject per method is insufficient for CIA because the within-method disagreement term cannot be estimated.
cia(
  data,
  response,
  subject,
  method,
  replicate,
  reference = NULL,
  scope = c("pairwise", "overall"),
  estimator = c("mom_unconstrained", "vc_constrained"),
  ci = FALSE,
  conf_level = 0.95,
  inference = c("delta", "bootstrap", "none"),
  B = 1000L,
  seed = NULL,
  n_threads = getOption("matrixCorr.threads", 1L),
  verbose = FALSE
)

## S3 method for class 'cia'
summary(object, digits = 4, ci_digits = 3, ...)

## S3 method for class 'cia'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.cia'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'cia'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)

## S3 method for class 'cia_ci'
print(x, ...)
data |
A data frame in long format. |
response |
Character scalar naming the numeric measurement column. |
subject |
Character scalar naming the subject/unit identifier column. |
method |
Character scalar naming the method/device/rater column. |
replicate |
Character scalar naming the replicate identifier within each subject-method cell. |
reference |
Optional character scalar naming the reference method. When
|
scope |
One of |
estimator |
One of |
ci |
Logical; if |
conf_level |
Confidence level used when |
inference |
One of |
B |
Number of subject-bootstrap resamples when |
seed |
Optional positive integer seed for reproducible bootstrap resampling. |
n_threads |
Integer >= 1. Number of OpenMP threads passed to the C++ backend. |
verbose |
Logical; if |
object |
A |
digits |
Integer; number of decimal places for estimates. |
ci_digits |
Integer; number of decimal places for confidence limits. |
... |
Additional arguments passed to downstream print helpers. |
x |
A |
n |
Optional preview row threshold. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
One of |
title |
Optional plot title. |
low_color |
Color used for lower agreement values. |
high_color |
Color used for higher agreement values. |
mid_color |
Midpoint color for pairwise heatmaps. |
value_text_size |
Text size for overlaid estimate labels. |
ci_text_size |
Text size for CI labels. |
show_value |
Logical; whether to overlay numeric values. |
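Let Y_ijk denote replicate k on subject i measured by method j.
Without a reference method, CIA is defined by
$$\mathrm{CIA}^{N} = \frac{\frac{1}{J}\sum_{j=1}^{J} E\left[(Y_{ijk} - Y_{ijk'})^2\right]}{\frac{2}{J(J-1)}\sum_{j<j'} E\left[(Y_{ijk} - Y_{ij'k'})^2\right]}.$$
With a reference method J, CIA is defined by
$$\mathrm{CIA}^{R} = \frac{E\left[(Y_{iJk} - Y_{iJk'})^2\right]}{\frac{1}{J-1}\sum_{j=1}^{J-1} E\left[(Y_{ijk} - Y_{iJk'})^2\right]}.$$
This implementation uses method-of-moments estimators built from subject-level disagreement functions.
For pairwise CIA, consider two methods X and Y. For each eligible
subject i, define
$$W_{X,i} = \operatorname*{mean}_{k<k'}\,(X_{ik} - X_{ik'})^2, \quad W_{Y,i} = \operatorname*{mean}_{k<k'}\,(Y_{ik} - Y_{ik'})^2, \quad B_{XY,i} = \operatorname*{mean}_{k,l}\,(X_{ik} - Y_{il})^2,$$
where the first two means are over all distinct within-method replicate
pairs and the third mean is over all cross-method replicate combinations.
Let $\bar{W}_X$, $\bar{W}_Y$, and $\bar{B}_{XY}$ denote the averages
of these subject-level quantities across eligible subjects. Then the
no-reference pairwise estimator is
$$\widehat{\mathrm{CIA}}^{N}_{XY} = \frac{(\bar{W}_X + \bar{W}_Y)/2}{\bar{B}_{XY}},$$
and the reference pairwise estimator is
$$\widehat{\mathrm{CIA}}^{R}_{XY} = \frac{\bar{W}_X}{\bar{B}_{XY}},$$
where X is the reference method.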
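The pairwise moment estimator can be illustrated with a short base R sketch. The helper `cia_pair` is hypothetical and is not the package implementation; it assumes the subject-level disagreement functions are mean squared differences and that each method's replicates are stored as an n x K matrix (rows are subjects, columns are replicates).

```r
# Hypothetical helper: pairwise CIA from two n x K replicate matrices.
cia_pair <- function(X, Y, reference = FALSE) {
  stopifnot(nrow(X) == nrow(Y), ncol(X) >= 2L, ncol(Y) >= 2L)
  # Mean squared difference over all distinct replicate pairs of one subject
  within_msd <- function(v) {
    d <- outer(v, v, "-")^2
    mean(d[upper.tri(d)])
  }
  # Mean squared difference over all cross-method replicate combinations
  between_msd <- function(a, b) mean(outer(a, b, "-")^2)

  w_x  <- mean(apply(X, 1, within_msd))
  w_y  <- mean(apply(Y, 1, within_msd))
  b_xy <- mean(vapply(seq_len(nrow(X)),
                      function(i) between_msd(X[i, ], Y[i, ]), numeric(1)))
  # With a reference method, only the reference (X) within-term is used
  if (reference) w_x / b_xy else (w_x + w_y) / (2 * b_xy)
}

set.seed(42)
n <- 200; K <- 3
truth <- rnorm(n, sd = 2)
X <- truth + matrix(rnorm(n * K, sd = 0.5), n, K)
Y <- truth + matrix(rnorm(n * K, sd = 0.5), n, K)  # interchangeable with X
cia_same   <- cia_pair(X, Y)        # close to 1
cia_biased <- cia_pair(X, Y + 1)    # a constant shift pulls CIA well below 1
c(same = cia_same, biased = cia_biased)
```

Note that the large between-subject spread (`sd = 2`) does not inflate the result, since every term is a within-subject difference.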
For overall CIA, the current implementation follows the balanced replicated
formulas in the cited papers. This requires n common subjects, J
retained methods, and the same replicate count K >= 2 in every retained
subject-method cell. If the data are not balanced in this sense,
scope = "overall" returns an informative error rather than using an
approximation.
Write for the mean of the K replicates on subject i
and method j. Define the within-method mean square
For the no-reference overall estimator, define the subject-level within term
the subject-wide mean
and the subject-level denominator
With and
, the overall no-reference estimator is
For the reference overall estimator, let method R be the retained
reference method. Define
and
With and
, the overall reference estimator is
Two estimators are available. estimator = "mom_unconstrained" reports the
raw ratio estimator directly. estimator = "vc_constrained" applies the
non-negative variance-component boundary from the cited papers. In the
no-reference setting this uses
, and in the reference
setting it uses .
The constrained estimator then sets
and reports
on the corresponding scale. This
applies the boundary on the implied inter-method variance component rather
than clamping CIA directly.
When confidence intervals are requested, pairwise CIA uses a large-sample delta-method normal interval by default and also supports subject-bootstrap percentile intervals. For overall CIA, delta-method normal intervals are available for the unconstrained moment estimator, and subject-bootstrap percentile intervals are also available.
High CIA indicates stronger individual agreement. The FDA individual
bioequivalence boundary IEC <= 2.4948 corresponds to CIA >= 0.445, and
CIA >= 0.8 is sometimes used as a stronger practical rule. Such thresholds
are context-dependent and are not hard-coded by this function.
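The correspondence between the two thresholds can be checked with a one-line conversion. The sketch below assumes the relation IEC = 2 * (1 / CIA - 1) for the reference-scaled coefficient, which is not stated in this manual; the function names are hypothetical.

```r
# Assumed relation between the individual bioequivalence criterion (IEC)
# and the reference-scaled CIA: IEC = 2 * (1 / CIA - 1).
iec_to_cia <- function(iec) 1 / (1 + iec / 2)
cia_to_iec <- function(cia) 2 * (1 / cia - 1)

round(iec_to_cia(2.4948), 3)  # 0.445
cia_to_iec(0.8)               # IEC implied by the stricter CIA >= 0.8 rule
```

Under this relation a larger IEC always maps to a smaller CIA, so an upper bound on IEC translates into a lower bound on CIA.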
Missing rows in the required columns are removed before estimation and the
counts are recorded in attr(x, "diagnostics").
For scope = "overall", a one-row data frame with class
c("cia_overall", "cia", "data.frame"). For scope = "pairwise" and
ci = FALSE, a dense matrix-style object using the package's standard
correlation-result infrastructure. For scope = "pairwise" and ci = TRUE,
a list with elements est, lwr.ci, and upr.ci, classed as
c("cia", "cia_ci").
Thiago de Paula Oliveira
Barnhart HX, Kosinski AS, Haber M. (2007). Assessing individual agreement. Journal of Biopharmaceutical Statistics, 17(4), 697-719. doi:10.1080/10543400701329489
Barnhart HX, Haber M, Lokhnygina Y, Kosinski AS. (2007). Comparison of concordance correlation coefficient and coefficient of individual agreement in assessing agreement. Journal of Biopharmaceutical Statistics, 17(4), 721-738.
Pan Y, Gao J, Haber M, Barnhart HX. (2010). Estimation of coefficients of individual agreement (CIA's) for quantitative and binary data using SAS and R. Computer Methods and Programs in Biomedicine.
ccc, ba, and prob_agree.
cia() answers the question "Are methods interchangeable for
individual subjects once within-method replicate disagreement is taken into
account?". In contrast, ccc() answers "How well do two paired
measurements agree overall, combining correlation with location/scale
agreement?".
set.seed(1)
subjects <- sprintf("s%02d", 1:30)
methods <- c("A", "B", "C")
replicates <- sprintf("r%02d", 1:20)
dat <- expand.grid(
  subject = subjects,
  method = methods,
  replicate = replicates,
  KEEP.OUT.ATTRS = FALSE,
  stringsAsFactors = FALSE
)
subject_effect <- stats::rnorm(length(subjects), sd = 3)
method_shift <- c(A = 0, B = 0.15, C = -0.10)
method_sd <- c(A = 0.45, B = 0.45, C = 0.30)
dat$value <- subject_effect[match(dat$subject, subjects)] +
  method_shift[dat$method] +
  stats::rnorm(nrow(dat), sd = method_sd[dat$method])

fit_overall <- cia(
  dat,
  response = "value",
  subject = "subject",
  method = "method",
  replicate = "replicate",
  scope = "overall"
)
print(fit_overall)
summary(fit_overall)
estimate(fit_overall)
tidy(fit_overall)
plot(fit_overall)

fit_pairwise <- cia(
  dat,
  response = "value",
  subject = "subject",
  method = "method",
  replicate = "replicate",
  scope = "pairwise"
)
print(fit_pairwise)
summary(fit_pairwise)
tidy(fit_pairwise)
plot(fit_pairwise)
Computes pairwise repeated-measures coefficients of individual agreement (CIA) from long-format matched repeated-measures data using the categorical condition ANOVA formulation of Haber, Gao, and Barnhart (2010).
cia_rm(
  data,
  response,
  subject,
  method = NULL,
  time = NULL,
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  estimator = c("vc_constrained", "mom_unconstrained"),
  inference = c("delta", "bootstrap", "none"),
  verbose = FALSE,
  digits = 4,
  use_message = TRUE,
  homogeneous = FALSE,
  B = 1000L,
  seed = NULL,
  ...
)

## S3 method for class 'cia_rm'
summary(
  object,
  digits = 4,
  ci_digits = 3,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'cia_rm'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.cia_rm'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'cia_rm'
plot(
  x,
  title = NULL,
  facet_by_pair = FALSE,
  facet_scales = c("fixed", "free_y"),
  show_ci = NULL,
  show_common = TRUE,
  point_size = 2.2,
  line_size = 0.7,
  ci_alpha = 0.16,
  ci_linewidth = 0.45,
  ...
)
data |
A data frame in long format. |
response |
Character scalar naming the numeric response column. |
subject |
Character scalar naming the subject identifier column. |
method |
Character scalar naming the method column. Required. |
time |
Character scalar naming the repeated condition column. Required. |
ci |
Logical; if |
conf_level |
Confidence level for intervals when |
n_threads |
Positive integer thread hint passed to the C++ backend. |
estimator |
One of |
inference |
One of |
verbose |
Logical; if |
digits |
Integer print precision carried on the returned object. |
use_message |
Logical; if |
homogeneous |
Logical; if |
B |
Number of subject-bootstrap resamples when |
seed |
Optional positive integer seed for reproducible bootstrap resampling. |
... |
Additional arguments passed to |
object |
A |
ci_digits |
Integer; number of digits for confidence interval bounds. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
Logical; if |
x |
A |
title |
Optional plot title. |
facet_by_pair |
Logical; if |
facet_scales |
Passed to |
show_common |
Logical; if |
point_size |
Numeric point size passed to |
line_size |
Numeric line width passed to |
ci_alpha |
Numeric alpha used for confidence ribbons in continuous plots. |
ci_linewidth |
Numeric line width used for confidence-interval error bars in categorical plots. |
cia_rm() is for matched repeated measurements under conditions such as
raters, visits, laboratories, treatments, or time points. It is not a
technical-replicate estimator. If the same subject has true technical
replicates within each subject-method cell, use cia() instead.
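Let Y_ijk denote the measurement on subject i, with method j, under
condition k, and consider the model
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + e_{ijk}.$$
Here subject ($\alpha_i$) is random, method ($\beta_j$) and time/condition ($\gamma_k$) are fixed,
subject-method $(\alpha\beta)_{ij}$ and subject-time $(\alpha\gamma)_{ik}$ are random interactions, method-time $(\beta\gamma)_{jk}$ is a
fixed interaction, and e_ijk is the residual.
For each method pair and condition k, the repeated-measures CIA is a function of three quantities:
d_k, the condition-specific mean difference between the two
methods; $\sigma^2_{\alpha\beta}$, the subject-method variance
component; and $\sigma^2_e$, the residual variance component.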
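One way to see how these quantities combine is to compare the expected squared difference between the two methods with the expected squared difference between two hypothetical repeat readings $Y_{ijk}$ and $Y^{*}_{ijk}$ by the same method on the same subject and condition. The following is a sketch of that derivation under the model described above. The symbols $\sigma^2_{\alpha\beta}$ and $\sigma^2_e$ for the subject-method and residual variance components are labels chosen here, and the assumption that within-method disagreement is driven by residual error only is ours; the package's exact expression may be parameterised differently.

```latex
\begin{aligned}
E\left[(Y_{i1k} - Y_{i2k})^2\right] &= d_k^2 + 2\sigma^2_{\alpha\beta} + 2\sigma^2_e, \\
E\left[(Y_{ijk} - Y^{*}_{ijk})^2\right] &= 2\sigma^2_e, \\
\mathrm{CIA}_k &= \frac{2\sigma^2_e}{d_k^2 + 2\sigma^2_{\alpha\beta} + 2\sigma^2_e}.
\end{aligned}
```

The subject and subject-time effects cancel in the between-method difference because both readings share the same subject and condition, which is why only the subject-method and residual components appear.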
The function fits this estimator separately to every method pair. The homogeneity of agreement across conditions is tested by the method-time interaction:
$$F = \frac{\mathrm{MS}_{\text{method-time}}}{\mathrm{MS}_{\text{error}}}.$$
The returned homogeneity_F attribute stores this test statistic for each
method pair, and homogeneity_p stores the corresponding upper-tail
F-test p-value using degrees of freedom df_method_time and df_error.
Larger homogeneity_F values and smaller homogeneity_p values indicate
stronger evidence that agreement changes across conditions, meaning that the
condition-specific CIA curves should be interpreted directly rather than
relying on a single pooled/common coefficient. Conversely, a small
homogeneity_F and a non-small homogeneity_p indicate that the data do
not show strong evidence of method-by-condition heterogeneity, so a common
estimate may be a reasonable summary if it is scientifically meaningful.
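The upper-tail p-value described above can be reproduced from the stored statistic and degrees of freedom with `stats::pf()`. The numbers below are hypothetical and only illustrate the calculation; the degrees of freedom shown assume a balanced layout with 8 subjects, 2 methods, and 3 conditions.

```r
# Hypothetical values for illustration only (not output from cia_rm()).
homogeneity_F  <- 2.10
df_method_time <- (2 - 1) * (3 - 1)            # (methods - 1) * (conditions - 1)
df_error       <- (8 - 1) * (2 - 1) * (3 - 1)  # assumed residual df, balanced design

# Upper-tail F-test p-value for the method-time interaction
homogeneity_p <- stats::pf(homogeneity_F, df_method_time, df_error,
                           lower.tail = FALSE)
homogeneity_p
```

A p-value computed this way that is not small would be read, following the text above, as no strong evidence against a common coefficient across conditions.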
When homogeneous = TRUE, the function also reports a common or pooled CIA
for each method pair from the reduced model that pools the method-time sum of
squares into the residual term. That common estimate is meaningful only when
agreement is reasonably homogeneous across conditions.
Confidence intervals can be computed by a delta-method normal approximation
or by a subject-level percentile bootstrap. The delta-method interval is the
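default. For each method pair and condition k, let $\widehat{\mathrm{CIA}}_k$
be the ANOVA estimator, let $g_k$ be its numerical
gradient with respect to the subject-level moment vector, and let
$\hat{\Sigma}$ be the empirical covariance matrix of that moment vector divided
by the number of subjects. The delta-method standard error is
$$\mathrm{SE}(\widehat{\mathrm{CIA}}_k) = \sqrt{g_k^\top \hat{\Sigma}\, g_k}.$$
The raw normal interval is
$$\widehat{\mathrm{CIA}}_k \pm z_{1-\alpha/2}\,\mathrm{SE}(\widehat{\mathrm{CIA}}_k), \qquad \alpha = 1 - \texttt{conf\_level}.$$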
Under the bootstrap option, the interval is given by the empirical
percentile limits from subject-resampled estimates. The argument
estimator controls whether the reported result is the literal
method-of-moments ratio or a bounded variance-component variant. Under
estimator = "vc_constrained", the estimated subject-method variance
component is constrained to be non-negative before converting it back to CIA,
and the reported interval limits are also truncated to the CIA parameter
space,
$$[\max(0, L),\ \min(1, U)],$$
where L and U are the raw lower and upper limits.
This keeps the reported interval inside the CIA parameter space [0, 1].
Use estimator = "mom_unconstrained" to inspect the literal raw
method-of-moments estimator and the corresponding unbounded interval on the
estimator scale.
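The delta-method interval and the optional truncation can be sketched generically in base R. The helper `delta_ci` is hypothetical: it takes a matrix of subject-level moments (one row per subject) and a function `f` of the moment means, uses a central-difference numerical gradient, and optionally clips the limits. The toy ratio used below stands in for the ANOVA estimator and is not the package's formula.

```r
# Hypothetical helper: delta-method normal interval for f(mean moments).
delta_ci <- function(moments, f, conf_level = 0.95,
                     bounds = c(-Inf, Inf), eps = 1e-6) {
  m_bar <- unname(colMeans(moments))
  est   <- f(m_bar)
  # Central-difference numerical gradient of f at the moment means
  grad <- vapply(seq_along(m_bar), function(j) {
    h <- replace(numeric(length(m_bar)), j, eps)
    (f(m_bar + h) - f(m_bar - h)) / (2 * eps)
  }, numeric(1))
  # Empirical covariance of the moment vector divided by the number of subjects
  Sigma <- stats::cov(moments) / nrow(moments)
  se    <- sqrt(drop(t(grad) %*% Sigma %*% grad))
  crit  <- stats::qnorm(1 - (1 - conf_level) / 2)
  c(est = est, se = se,
    lwr = max(bounds[1], est - crit * se),   # truncate to the parameter space
    upr = min(bounds[2], est + crit * se))
}

set.seed(1)
# Toy subject-level moments: a within-method term and a between-method term
mom <- cbind(within  = 0.2 * rchisq(500, df = 2),
             between = 0.4 * rchisq(500, df = 2))
res <- delta_ci(mom, f = function(m) m[1] / m[2], bounds = c(0, 1))
res
```

Leaving `bounds` at its default returns the raw, unbounded interval, which mirrors the distinction the text draws between the constrained and unconstrained reporting modes.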
The current implementation requires exactly one observation in every subject-method-time cell. It therefore targets the balanced repeated-measures ANOVA setting from the cited paper and returns an error otherwise.
If ci = FALSE, the result is a list of class c("cia_rm", "cia").
If ci = TRUE, the result is a list of class
c("cia_rm", "cia_ci", "cia").
Main components:
est: condition-specific pairwise CIA estimates.
common: pairwise overall summary CIA estimates from the reduced
homogeneous model.
condition: the ordered condition/time labels used in the output.
se: standard errors for est when ci = TRUE.
lwr.ci and upr.ci: lower and upper confidence limits for est
when ci = TRUE.
common.se: standard errors for common when ci = TRUE.
common.lwr.ci and common.upr.ci: lower and upper confidence
limits for common when ci = TRUE.
When there are three or more methods, additional overall multi-method components are returned:
overall: condition-specific overall CIA across all methods.
overall.common: the overall summary CIA across all methods when
homogeneous = TRUE.
overall.se: standard errors for overall when ci = TRUE.
overall.lwr.ci and overall.upr.ci: lower and upper confidence
limits for overall when ci = TRUE.
overall.common.se: standard errors for overall.common when
ci = TRUE.
overall.common.lwr.ci and overall.common.upr.ci: lower and upper
confidence limits for overall.common when ci = TRUE.
The object carries pairwise ANOVA diagnostics as attributes, including
sigma2_error, sigma2_subject_method, repeatability,
homogeneity_F, homogeneity_p, df_method_time, df_error,
n_obs, n_subjects, n_methods, n_times, time_levels, and
method_levels. Here homogeneity_F is the method-time interaction test
statistic for each method pair, and homogeneity_p is the corresponding
p-value for the null hypothesis of homogeneous agreement across conditions.
Thiago de Paula Oliveira
Haber M, Gao J, Barnhart HX. (2010). Evaluation of agreement between measurement methods from data with matched repeated measurements via the coefficient of individual agreement. Journal of Data Science, 8, 457-469.
Barnhart HX, Kosinski AS, Haber M. (2007). Assessing individual agreement. Journal of Biopharmaceutical Statistics, 17(4), 697-719.
ccc_rm_reml(), icc_rm_reml(), and cia().
# Example 1
set.seed(1)
dat_rater <- expand.grid(
  id = factor(sprintf("s%02d", 1:8)),
  method = factor(c("Device_A", "Device_B")),
  time = factor(c("Rater_1", "Rater_2", "Rater_3")),
  KEEP.OUT.ATTRS = FALSE
)
subj_eff <- rnorm(nlevels(dat_rater$id), sd = 1)[dat_rater$id]
method_shift <- c(Device_A = 0, Device_B = 0.25)[dat_rater$method]
rater_shift <- c(Rater_1 = 0, Rater_2 = 0.15, Rater_3 = -0.10)[dat_rater$time]
interaction_shift <- ifelse(
  dat_rater$method == "Device_B" & dat_rater$time == "Rater_3",
  0.12,
  0
)
dat_rater$y <- subj_eff + method_shift + rater_shift + interaction_shift +
  rnorm(nrow(dat_rater), sd = 0.25)

fit_rater <- cia_rm(
  dat_rater,
  response = "y",
  subject = "id",
  method = "method",
  time = "time",
  homogeneous = TRUE
)
print(fit_rater)
summary(fit_rater)
estimate(fit_rater)
tidy(fit_rater)
plot(fit_rater)

# Example 2
set.seed(2)
dat_time <- expand.grid(
  id = factor(sprintf("s%02d", 1:10)),
  method = factor(c("Assay_A", "Assay_B", "Assay_C", "Assay_D")),
  time = factor(
    c("baseline", "week2", "month1", "month2", "month3"),
    levels = c("baseline", "week2", "month1", "month2", "month3")
  ),
  KEEP.OUT.ATTRS = FALSE
)
subj_eff <- rnorm(nlevels(dat_time$id), sd = 0.9)[dat_time$id]
method_shift <- c(Assay_A = 0, Assay_B = 0.20, Assay_C = -0.10,
                  Assay_D = 0.35)[dat_time$method]
time_shift <- c(
  baseline = 0,
  week2 = 0.10,
  month1 = 0.22,
  month2 = 0.32,
  month3 = 0.42
)[dat_time$time]
interaction_shift <-
  ifelse(dat_time$method == "Assay_B" & dat_time$time == "month3", 0.10, 0) +
  ifelse(dat_time$method == "Assay_D" & dat_time$time == "month2", -0.08, 0)
dat_time$y <- subj_eff + method_shift + time_shift + interaction_shift +
  rnorm(nrow(dat_time), sd = 0.30)

fit_time <- cia_rm(
  dat_time,
  response = "y",
  subject = "id",
  method = "method",
  time = "time",
  ci = TRUE
)
print(fit_time)
plot(fit_time)

# Example 3
set.seed(3)
dat_days <- expand.grid(
  id = factor(sprintf("s%02d", 1:12)),
  method = factor(c("Sensor_A", "Sensor_B", "Sensor_C")),
  time = 1:15,
  KEEP.OUT.ATTRS = FALSE
)
subj_eff <- rnorm(nlevels(dat_days$id), sd = 0.8)[dat_days$id]
method_shift <- c(Sensor_A = 0, Sensor_B = 0.15,
                  Sensor_C = -0.08)[dat_days$method]
day_trend <- 0.05 * dat_days$time
interaction_shift <-
  ifelse(dat_days$method == "Sensor_B", 0.01 * dat_days$time, 0) +
  ifelse(dat_days$method == "Sensor_C", -0.005 * dat_days$time, 0)
dat_days$y <- subj_eff + method_shift + day_trend + interaction_shift +
  rnorm(nrow(dat_days), sd = 0.22)

fit_days <- cia_rm(
  dat_days,
  response = "y",
  subject = "id",
  method = "method",
  time = "time",
  ci = TRUE
)
plot(fit_days, facet_by_pair = TRUE)
Computes unweighted Cohen's kappa for either a pair of nominal rating vectors or all pairwise combinations of nominal columns in a matrix or data frame.

    cohen_kappa(
      data,
      y = NULL,
      na_method = c("error", "pairwise", "complete"),
      ci = FALSE,
      p_value = FALSE,
      conf_level = 0.95,
      n_threads = getOption("matrixCorr.threads", 1L),
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE,
      ...
    )

    ## S3 method for class 'cohen_kappa'
    summary(object, digits = 4, ci_digits = 3, p_digits = 4, ...)

    ## S3 method for class 'summary.cohen_kappa'
    print(
      x,
      digits = NULL,
      n = NULL,
      topn = NULL,
      max_vars = NULL,
      width = NULL,
      show_ci = NULL,
      ...
    )

    ## S3 method for class 'cohen_kappa'
    print(
      x,
      digits = 4,
      n = NULL,
      topn = NULL,
      max_vars = NULL,
      width = NULL,
      show_ci = NULL,
      ...
    )

    ## S3 method for class 'cohen_kappa'
    plot(
      x,
      title = NULL,
      low_color = "indianred1",
      high_color = "steelblue1",
      mid_color = "white",
      value_text_size = 4,
      ci_text_size = 3,
      show_value = TRUE,
      ...
    )

data |
In matrix mode, a matrix or data frame whose rows are
observational units and whose columns are raters or classifiers. Supported
column types are factor, ordered factor, character, logical, integer, and
numeric, all treated as nominal categories here. If the ratings are truly
ordinal and disagreements should be weighted by distance, use
|
y |
Optional second nominal rating vector. When supplied, the function
returns a single Cohen's kappa estimate for |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical; if |
p_value |
Logical; if |
conf_level |
Confidence level used when |
n_threads |
Integer |
output |
Output representation for matrix mode.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
retain entries with |
diag |
Logical; whether to include diagonal entries in sparse and edge-list outputs. |
... |
Additional theme arguments. |
object |
A scalar or matrix-style |
digits |
Integer; number of decimal places for displayed values. |
ci_digits |
Integer; number of decimal places for confidence limits. |
p_digits |
Integer; number of decimal places for p-values. |
x |
A scalar or matrix-style |
n |
Optional preview row threshold. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
One of |
title |
Optional plot title. |
low_color |
Fill/color used for negative agreement. |
high_color |
Fill/color used for positive agreement. |
mid_color |
Unused placeholder for API consistency with matrix heatmaps. |
value_text_size |
Text size for the estimate label. |
ci_text_size |
Text size for the CI label. |
show_value |
Logical; whether to print the estimate and CI labels. |
Cohen's kappa is an agreement coefficient for two raters assigning the same
units to mutually exclusive nominal categories. For contingency-table cell
proportions $p_{ij}$,

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o = \sum_i p_{ii}$ is the observed agreement and
$p_e = \sum_i p_{i\cdot}\, p_{\cdot i}$ is the chance agreement implied by the
marginal category proportions.
This implementation is strictly the original unweighted nominal-scale Cohen's
kappa. If the categories are ordinal and near disagreements should count as
less severe than distant disagreements, use weighted_kappa() instead.
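The definition above can be checked by hand in base R. The following is a minimal sketch, not the package's C++ implementation; `kappa_sketch()` is a hypothetical helper name introduced here for illustration only.

```r
# Unweighted Cohen's kappa for two rating vectors, using base R only.
# kappa_sketch() is a hypothetical helper, not part of matrixCorr.
kappa_sketch <- function(r1, r2) {
  # Shared category labels so that both margins use the same ordering.
  lev <- sort(unique(c(as.character(r1), as.character(r2))))
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  p <- unclass(tab) / sum(tab)          # cell proportions p_ij
  p_o <- sum(diag(p))                   # observed agreement
  p_e <- sum(rowSums(p) * colSums(p))   # chance agreement from the margins
  (p_o - p_e) / (1 - p_e)
}

x <- factor(c("A", "A", "B", "B", "A", "B"))
y <- factor(c("A", "B", "B", "B", "A", "A"))
kappa_sketch(x, y)
```

With these six ratings, four pairs agree, so the observed agreement is 4/6, and the balanced margins give a chance agreement of 1/2.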
In matrix mode, columns are treated as raters/classifiers and rows as shared observational units. All pairwise Cohen's kappas between columns are computed. Category labels are encoded in R before dispatch to C++, using a common label mapping across all columns so that identical labels in different columns correspond to the same category code.
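A common label mapping of this kind can be sketched in base R as follows. This only illustrates the idea, assuming labels are compared as character strings; the object names `lev` and `codes` are introduced here and the package's internal encoding may differ in detail.

```r
# Encode all rater columns against one shared set of labels, so the same
# label receives the same integer code in every column.
raters <- data.frame(
  r1 = factor(c("low", "low", "high", "high", "mid")),
  r2 = factor(c("low", "mid", "high", "high", "mid")),
  r3 = c("low", "low", "high", "mid", "mid")
)

# Union of labels across columns (factor and character columns alike).
lev <- sort(unique(unlist(lapply(raters, as.character))))

# Integer codes per column; a missing rating would map to NA_integer_.
codes <- vapply(
  raters,
  function(col) match(as.character(col), lev),
  integer(nrow(raters))
)
codes
```

Because every column is matched against the same `lev`, a per-column `as.integer(factor(...))` mismatch (different level sets in different columns) cannot occur.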
Missing values are encoded as NA_integer_ before entering the C++
backend. With na_method = "error", missing values are rejected before
computation. With na_method = "complete", listwise deletion is
applied across retained columns. With na_method = "pairwise", each
pair uses its own complete observations. Pairwise complete counts are stored
in attr(x, "diagnostics")$n_complete.
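The difference between listwise and pairwise deletion can be illustrated with base R alone. This is a sketch of the counting logic only, under the assumption that a rating is "complete" when it is not `NA`; the object names are hypothetical.

```r
ratings <- data.frame(
  r1 = c("a", "b", NA,  "a", "b"),
  r2 = c("a", "b", "b", NA,  "b"),
  r3 = c("a", NA,  "b", "a", "b")
)

# "complete": a row is kept only if every column is observed.
n_listwise <- sum(complete.cases(ratings))

# "pairwise": each pair of columns keeps the rows where both are observed.
# Entry (j, k) of the cross-product counts the rows usable for pair (j, k).
observed <- 1 * !is.na(as.matrix(ratings))
n_pairwise <- crossprod(observed)

n_listwise
n_pairwise
```

Here only two rows survive listwise deletion, while every pair of columns retains three rows, which is the kind of information the stored pairwise complete counts are meant to expose.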
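Confidence intervals and standard errors.
The implementation uses the exact large-sample formula. Let $p_{ij}$
be the empirical cell proportions, $p_{i\cdot}$ the row margins, $p_{\cdot j}$ the
column margins, and $\hat\kappa = (p_o - p_e) / (1 - p_e)$. For each cell $(i, j)$, define
the influence contribution

$$d_{ij} = \mathbf{1}\{i = j\}\,(1 - p_e) - (1 - p_o)\,(p_{\cdot i} + p_{j\cdot}).$$

The variance estimator used by the code is

$$\widehat{\mathrm{Var}}(\hat\kappa) = \frac{\sum_{i,j} p_{ij}\, d_{ij}^2 - \bar d^{\,2}}{n\,(1 - p_e)^4},$$

with $n$ the number of complete paired ratings and
$\bar d = \sum_{i,j} p_{ij}\, d_{ij}$.
The confidence interval is the Wald interval

$$\hat\kappa \pm z_{1 - \alpha/2}\, \sqrt{\widehat{\mathrm{Var}}(\hat\kappa)},$$

truncated to $[-1, 1]$ in the returned result.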
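As a rough illustration of a large-sample standard error and Wald interval for kappa, the sketch below uses the delta-method (Fleiss-Cohen-Everitt type) variance written in terms of per-cell contributions. It is an assumption that this matches the package's estimator term for term; `kappa_ci_sketch()` and its internal names are hypothetical.

```r
# Delta-method standard error and truncated Wald interval for Cohen's kappa.
# kappa_ci_sketch() is a hypothetical helper, not the package's C++ code.
kappa_ci_sketch <- function(r1, r2, conf_level = 0.95) {
  lev <- sort(unique(c(as.character(r1), as.character(r2))))
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  n <- sum(tab)                         # complete paired ratings
  p <- unclass(tab) / n                 # cell proportions
  row_m <- rowSums(p)
  col_m <- colSums(p)
  p_o <- sum(diag(p))
  p_e <- sum(row_m * col_m)
  kappa <- (p_o - p_e) / (1 - p_e)

  # Per-cell contribution:
  # d[i, j] = 1{i == j} * (1 - p_e) - (1 - p_o) * (col_m[i] + row_m[j])
  d <- diag(1 - p_e, length(lev)) - (1 - p_o) * outer(col_m, row_m, "+")
  v <- (sum(p * d^2) - sum(p * d)^2) / (n * (1 - p_e)^4)
  se <- sqrt(v)

  z <- qnorm(1 - (1 - conf_level) / 2)
  c(
    estimate = kappa,
    se = se,
    lwr = max(-1, kappa - z * se),      # truncate the Wald interval to [-1, 1]
    upr = min(1, kappa + z * se)
  )
}

x <- factor(c("A", "A", "B", "B", "A", "B"))
y <- factor(c("A", "B", "B", "B", "A", "A"))
kappa_ci_sketch(x, y)
```

With only six paired ratings the interval is very wide, which is a reminder that the Wald interval is a large-sample approximation.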
If y is supplied, a scalar S3 object of class "cohen_kappa"
backed by a numeric value, with the attribute diagnostics and,
optionally, ci, inference, and conf.level. Otherwise, a
symmetric matrix-style result with estimator class cohen_kappa.
Thiago de Paula Oliveira
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. doi:10.1177/001316446002000104
weighted_kappa() for two-rater ordered-category agreement with
distance-sensitive disagreement weights; multirater_kappa() for
panel-level nominal agreement among three or more raters.

    x <- factor(c("A", "A", "B", "B", "A", "B"))
    y <- factor(c("A", "B", "B", "B", "A", "A"))
    cohen_kappa(x, y)

    raters <- data.frame(
      r1 = factor(c("low", "low", "high", "high", "mid")),
      r2 = factor(c("low", "mid", "high", "high", "mid")),
      r3 = c("low", "low", "high", "mid", "mid")
    )
    ck <- cohen_kappa(raters)
    print(ck)
    summary(ck)
    estimate(ck)
    tidy(ck)
    plot(ck)

Computes pairwise distance correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Distance correlation detects general dependence, including non-linear relationships. Optional p-values are available via the bias-corrected distance-correlation t-test.

    dcor(
      data,
      na_method = c("error", "pairwise", "complete"),
      p_value = FALSE,
      n_threads = getOption("matrixCorr.threads", 1L),
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE,
      ...
    )

    ## S3 method for class 'dcor'
    print(
      x,
      digits = 4,
      n = NULL,
      topn = NULL,
      max_vars = NULL,
      width = NULL,
      show_ci = NULL,
      ...
    )

    ## S3 method for class 'dcor'
    plot(
      x,
      title = "Distance correlation heatmap",
      low_color = "white",
      high_color = "steelblue1",
      value_text_size = 4,
      show_value = TRUE,
      ...
    )

    ## S3 method for class 'dcor'
    summary(
      object,
      n = NULL,
      topn = NULL,
      max_vars = NULL,
      width = NULL,
      show_ci = NULL,
      ...
    )

    ## S3 method for class 'summary.dcor'
    print(
      x,
      digits = NULL,
      n = NULL,
      topn = NULL,
      max_vars = NULL,
      width = NULL,
      show_ci = NULL,
      ...
    )

data |
A numeric matrix or a data frame with at least two numeric columns. Non-numeric columns are dropped before computation. |
na_method |
Character scalar controlling missing-data handling.
|
p_value |
Logical (default |
n_threads |
Integer |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
title |
Plot title. Default is |
low_color |
Colour for zero correlation. Default is |
high_color |
Colour for strong correlation. Default is |
value_text_size |
Font size for displaying values. Default is |
show_value |
Logical; if |
object |
An object of class |
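Let $x \in \mathbb{R}^n$ and $D^{(x)}$ be the pairwise distance matrix
with zero diagonal: $D^{(x)}_{ii} = 0$, $D^{(x)}_{ij} = |x_i - x_j|$ for
$i \neq j$. Define row sums $r^{(x)}_i = \sum_{k \neq i} D^{(x)}_{ik}$ and
grand sum $S^{(x)} = \sum_{i \neq k} D^{(x)}_{ik}$. The U-centred matrix is

$$A^{(x)}_{ij} = \begin{cases} D^{(x)}_{ij} - \dfrac{r^{(x)}_i + r^{(x)}_j}{n - 2} + \dfrac{S^{(x)}}{(n - 1)(n - 2)}, & i \neq j, \\ 0, & i = j. \end{cases}$$

For two variables $x, y$, the unbiased distance covariance and variances are

$$\widehat{\mathrm{dCov}}^2_u(x, y) = \frac{2}{n(n - 3)} \sum_{i < j} A^{(x)}_{ij} A^{(y)}_{ij}, \qquad \widehat{\mathrm{dVar}}^2_u(x) = \frac{2}{n(n - 3)} \sum_{i < j} \left(A^{(x)}_{ij}\right)^2,$$

with $\widehat{\mathrm{dVar}}^2_u(y)$ defined analogously from $A^{(y)}$.
The unbiased distance correlation is

$$\widehat{\mathrm{dCor}}_u(x, y) = \frac{\widehat{\mathrm{dCov}}^2_u(x, y)}{\sqrt{\widehat{\mathrm{dVar}}^2_u(x)\, \widehat{\mathrm{dVar}}^2_u(y)}}.$$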
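The U-centring and the unbiased ratio can be sketched in a few lines of base R. This is a plain quadratic-time illustration assuming univariate inputs without missing values; `u_center()` and `dcor_u_sketch()` are hypothetical names and do not reproduce the package's C++ fast path or its clipping of the reported value.

```r
# U-centre the pairwise distance matrix of a numeric vector.
u_center <- function(v) {
  n <- length(v)
  D <- abs(outer(v, v, "-"))            # pairwise distances, zero diagonal
  r <- rowSums(D)                       # row sums
  S <- sum(D)                           # grand sum
  A <- D - outer(r, r, "+") / (n - 2) + S / ((n - 1) * (n - 2))
  diag(A) <- 0                          # diagonal is defined as zero
  A
}

# Signed bias-corrected distance correlation (no clipping applied here).
dcor_u_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n, n >= 4)     # n * (n - 3) must be positive
  A <- u_center(x)
  B <- u_center(y)
  k <- n * (n - 3)
  dcov <- sum(A * B) / k                # sum over i != j equals 2 * sum over i < j
  dvar_x <- sum(A * A) / k
  dvar_y <- sum(B * B) / k
  dcov / sqrt(dvar_x * dvar_y)
}

set.seed(42)
x <- rnorm(200)
y <- x^2 + rnorm(200, sd = 0.2)
c(pearson = cor(x, y), dcor_u = dcor_u_sketch(x, y))
```

Because the estimator is unbiased rather than constrained, the raw ratio can be slightly negative under independence, which is why a signed version and a clipped version are distinguished in the inference notes.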
Computation. The heavy lifting (distance matrices, U-centering,
and unbiased scaling) is implemented in C++ (ustat_dcor_matrix_cpp);
the R wrapper only validates and coerces the input. OpenMP parallelises the
upper-triangular loops. Pairwise terms are dispatched to a Huo-Szekely style
univariate fast path, and an exact unbiased fallback is retained for
robustness in small-sample or non-finite-path cases. No external
dependencies are used.
Inference. When p_value = TRUE, the package computes the
bias-corrected distance-correlation t-test of independence of Szekely and
Rizzo (2013). Let $R^*_n$ denote the signed
bias-corrected distance correlation used internally by the test (that is,
the same ratio before the package's usual clipping to $[0, 1]$). With
$M = n(n - 3)/2$, the test statistic is

$$T = \sqrt{M - 1}\; \frac{R^*_n}{\sqrt{1 - (R^*_n)^2}},$$

referenced to a Student $t$-distribution with $M - 1$ degrees of
freedom. The reported p-value uses the upper-tail probability
$P(t_{M - 1} \geq T)$. This inference payload is attached as metadata; the
main returned matrix is unchanged unless p_value is explicitly
requested. The t reference is an asymptotic approximation. For small
complete-case sample sizes, especially when $M$ is small, p-values
can be unstable and should be interpreted cautiously; a permutation-based
dependence test is preferable when exact small-sample calibration matters.
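The t-test of Szekely and Rizzo (2013) reduces to a short calculation once the signed bias-corrected correlation is available. The sketch below assumes that value is supplied directly as `r_star`; `dcor_ttest_sketch()` is a hypothetical helper, not a function exported by the package.

```r
# Distance-correlation t-test from a signed bias-corrected correlation.
# r_star: signed bias-corrected distance correlation, strictly inside (-1, 1)
# n:      number of complete observations (n >= 4)
dcor_ttest_sketch <- function(r_star, n) {
  M <- n * (n - 3) / 2
  t_stat <- sqrt(M - 1) * r_star / sqrt(1 - r_star^2)
  c(
    statistic = t_stat,
    parameter = M - 1,                                   # degrees of freedom
    p_value = pt(t_stat, df = M - 1, lower.tail = FALSE) # upper-tail probability
  )
}

dcor_ttest_sketch(r_star = 0.15, n = 50)
```

The test is one-sided: negative values of `r_star` give p-values above 0.5, so they never count as evidence of dependence.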
A symmetric numeric matrix where the (i, j) entry is the
unbiased distance correlation between the i-th and j-th
numeric columns. The object has class dcor with attributes
method = "distance_correlation", description, and
package = "matrixCorr". When p_value = TRUE, the object also
carries an inference attribute with matrices estimate,
statistic, parameter, and p_value, plus
attr(x, "diagnostics")$n_complete. The main returned matrix remains
the usual non-negative unbiased distance-correlation estimate.
Invisibly returns x.
A ggplot object representing the heatmap.
Requires $n \geq 4$. Columns with (near) zero unbiased distance
variance yield NA in their row/column. Typical per-pair cost uses
the $O(n \log n)$ fast path, with an $O(n^2)$ fallback when needed.
Thiago de Paula Oliveira
Szekely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794.
Szekely, G. J., & Rizzo, M. L. (2013). The distance correlation t-test of independence. Journal of Multivariate Analysis, 117, 193-213.
Rizzo, M. L., & Szekely, G. J. (2024). energy: E-statistics (energy statistics). R package version 1.7-12.

    ## Independent variables -> dCor ~ 0
    set.seed(1)
    X <- cbind(a = rnorm(200), b = rnorm(200))
    D <- dcor(X)
    print(D, digits = 3)
    summary(D)

    ## Non-linear dependence: Pearson ~ 0, but unbiased dCor > 0
    set.seed(42)
    n <- 200
    x <- rnorm(n)
    y <- x^2 + rnorm(n, sd = 0.2)
    XY <- cbind(x = x, y = y)
    D2 <- dcor(XY)
    # Compare Pearson vs unbiased distance correlation
    round(c(pearson = cor(XY)[1, 2], dcor = D2["x", "y"]), 3)
    summary(D2)
    plot(D2, title = "Unbiased distance correlation (non-linear example)")

    ## Small AR(1) multivariate normal example
    set.seed(7)
    p <- 5; n <- 150; rho <- 0.6
    Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
    X3 <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
    colnames(X3) <- paste0("V", seq_len(p))
    D3 <- dcor(X3)
    print(D3[1:3, 1:3], digits = 2)

    ## Optional inference
    D4 <- dcor(XY, p_value = TRUE)
    summary(D4)
    estimate(D4)
    tidy(D4)

    # Interactive viewing (requires shiny)
    if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
      view_corr_shiny(D)
    }

Temporary wrappers for functions renamed in matrixCorr 1.0.0. These
wrappers preserve the pre-1.0 entry points while warning that they will be
removed in 2.0.0.

    bland_altman(
      group1,
      group2,
      two = 1.96,
      mode = 1L,
      conf_level = 0.95,
      n_threads = getOption("matrixCorr.threads", 1L),
      verbose = FALSE
    )

    bland_altman_repeated(
      data = NULL,
      response,
      subject,
      method,
      time,
      two = 1.96,
      conf_level = 0.95,
      n_threads = getOption("matrixCorr.threads", 1L),
      include_slope = FALSE,
      use_ar1 = FALSE,
      ar1_rho = NA_real_,
      max_iter = 200L,
      tol = 1e-06,
      verbose = FALSE
    )

    biweight_mid_corr(
      data,
      c_const = 9,
      max_p_outliers = 1,
      pearson_fallback = c("hybrid", "none", "all"),
      na_method = c("error", "pairwise", "complete"),
      mad_consistent = FALSE,
      w = NULL,
      sparse_threshold = NULL,
      n_threads = getOption("matrixCorr.threads", 1L),
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE
    )

    distance_corr(data, na_method = c("error", "pairwise", "complete"), ...)

    partial_correlation(
      data,
      method = c("oas", "ridge", "sample"),
      lambda = 0.001,
      return_cov_precision = FALSE,
      ci = FALSE,
      conf_level = 0.95,
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE
    )

    ccc_lmm_reml(
      data,
      response,
      rind,
      method = NULL,
      time = NULL,
      interaction = FALSE,
      max_iter = 100,
      tol = 1e-06,
      Dmat = NULL,
      Dmat_type = c("time-avg", "typical-visit", "weighted-avg", "weighted-sq"),
      Dmat_weights = NULL,
      Dmat_rescale = TRUE,
      ci = FALSE,
      conf_level = 0.95,
      ci_mode = c("auto", "raw", "logit"),
      verbose = FALSE,
      digits = 4,
      use_message = TRUE,
      ar = c("none", "ar1"),
      ar_rho = NA_real_,
      slope = c("none", "subject", "method", "custom"),
      slope_var = NULL,
      slope_Z = NULL,
      drop_zero_cols = TRUE,
      vc_select = c("auto", "none"),
      vc_alpha = 0.05,
      vc_test_order = c("subj_time", "subj_method"),
      include_subj_method = NULL,
      include_subj_time = NULL,
      sb_zero_tol = 1e-10
    )

    ccc_pairwise_u_stat(
      data,
      response,
      method,
      subject,
      time = NULL,
      Dmat = NULL,
      delta = 1,
      ci = FALSE,
      conf_level = 0.95,
      n_threads = getOption("matrixCorr.threads", 1L),
      verbose = FALSE
    )

group1, group2
|
Numeric vectors of equal length. |
two |
Positive scalar; the multiple of the standard deviation used to define the limits of agreement. |
mode |
Integer; 1 uses |
conf_level |
Confidence level. |
n_threads |
Integer number of OpenMP threads. |
verbose |
Logical; print brief progress or diagnostic output. |
data |
A |
response |
Numeric response vector or column name, depending on the target method. |
subject |
Subject identifier or subject column name. |
method |
Method label or method column name. |
time |
Replicate/time index or time column name. |
include_slope |
Logical; whether to estimate proportional bias. |
use_ar1 |
Logical; whether to use AR(1) within-subject correlation. |
ar1_rho |
AR(1) parameter. |
max_iter, tol
|
EM control parameters. |
c_const |
Positive numeric Tukey biweight tuning constant. |
max_p_outliers |
Numeric in |
pearson_fallback |
Character fallback policy used by |
na_method |
Missing-data policy forwarded to the replacement function when supported. |
mad_consistent |
Logical; if |
w |
Optional vector of case weights. |
sparse_threshold |
Optional threshold controlling sparse output. |
output |
Output representation for the computed estimates. |
threshold |
Non-negative absolute-value filter for non-matrix outputs. |
diag |
Logical; whether to include diagonal entries in non-matrix outputs. |
... |
Additional arguments forwarded to the replacement function when supported. |
lambda |
Numeric regularisation strength used by |
return_cov_precision |
Logical; if |
ci |
Logical; if |
rind |
Character; column identifying subjects, forwarded as |
interaction |
Logical; forwarded to |
Dmat |
Optional distance matrix forwarded to |
Dmat_type |
Character selector controlling how |
Dmat_weights |
Optional weights used when |
Dmat_rescale |
Logical; whether to rescale |
ci_mode |
Character selector for the confidence-interval scale used by
|
digits |
Display precision forwarded to |
use_message |
Logical; whether the deprecated wrapper emits a lifecycle message. |
ar |
Character selector for the within-subject residual correlation model. |
ar_rho |
Numeric AR(1) parameter. |
slope |
Character selector for the proportional-bias slope structure. |
slope_var |
Optional covariance matrix for custom slopes. |
slope_Z |
Optional design matrix for custom slopes. |
drop_zero_cols |
Logical; whether zero-variance design columns are dropped. |
vc_select |
Character selector controlling variance-component selection. |
vc_alpha |
Significance level used in variance-component selection. |
vc_test_order |
Character vector controlling the variance-component test order. |
include_subj_method |
Optional logical override for the subject-by-method component. |
include_subj_time |
Optional logical override for the subject-by-time component. |
sb_zero_tol |
Numerical tolerance used when stabilising the scale-bias term. |
delta |
Numeric power exponent for U-statistics distances. |
Renamed functions:
bland_altman() -> ba()
bland_altman_repeated() -> ba_rm()
biweight_mid_corr() -> bicor()
distance_corr() -> dcor()
partial_correlation() -> pcorr()
ccc_lmm_reml() -> ccc_rm_reml()
ccc_pairwise_u_stat() -> ccc_rm_ustat()
The deprecated wrappers will be removed in matrixCorr 2.0.0.
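A deprecated wrapper of this kind typically warns and then forwards its arguments to the replacement. The sketch below shows the pattern with base R's `.Deprecated()`; it is an assumption that this resembles the package's wrappers, which may signal deprecation differently, and `new_fun()` and `old_fun()` are hypothetical names.

```r
# Hypothetical replacement function.
new_fun <- function(x, ...) mean(x, ...)

# Hypothetical deprecated wrapper: warn, then forward all arguments unchanged.
old_fun <- function(x, ...) {
  .Deprecated("new_fun")
  new_fun(x, ...)
}

# Same result as the replacement, plus a deprecation warning.
res <- suppressWarnings(old_fun(c(1, 2, 3)))
res
```

For callers, migrating means swapping the old name for the new one in the list above; where argument names differ (for example rind being forwarded as the subject identifier), those need renaming too.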
Lightweight accessors for matrixCorr result objects. These functions do not change the stored object structure; they just provide a stable way to extract the estimate matrix, confidence intervals, or a pairwise data frame without reading attributes directly.

    estimate(x, ...)

    tidy(x, ...)

    ci(x, ...)

    ## S3 method for class 'corr_result'
    estimate(x, ...)

    ## S3 method for class 'summary.corr_result'
    estimate(x, ...)

    ## S3 method for class 'summary.matrixCorr'
    estimate(x, ...)

    ## S3 method for class 'corr_result'
    coef(object, ...)

    ## S3 method for class 'corr_result'
    ci(x, ...)

    ## Default S3 method:
    ci(x, ...)

    ## S3 method for class 'ba'
    ci(x, ...)

    ## S3 method for class 'ba_matrix'
    ci(x, ...)

    ## S3 method for class 'ba_repeated'
    ci(x, ...)

    ## S3 method for class 'ba_repeated_matrix'
    ci(x, ...)

    ## S3 method for class 'ccc'
    ci(x, ...)

    ## S3 method for class 'ccc_ci'
    ci(x, ...)

    ## S3 method for class 'ccc_glmm'
    ci(x, ...)

    ## S3 method for class 'cia'
    ci(x, ...)

    ## S3 method for class 'cia_ci'
    ci(x, ...)

    ## S3 method for class 'cia_rm'
    ci(x, ...)

    ## S3 method for class 'cohen_kappa'
    ci(x, ...)

    ## S3 method for class 'gwet_ac'
    ci(x, ...)

    ## S3 method for class 'icc'
    ci(x, ...)

    ## S3 method for class 'icc_overall'
    ci(x, ...)

    ## S3 method for class 'icc_rm_reml'
    ci(x, ...)

    ## S3 method for class 'krippendorff_alpha'
    ci(x, ...)

    ## S3 method for class 'multirater_kappa'
    ci(x, ...)

    ## S3 method for class 'partial_corr'
    ci(x, ...)

    ## S3 method for class 'prob_agree'
    ci(x, ...)

    ## S3 method for class 'rmcorr'
    ci(x, ...)

    ## S3 method for class 'rmcorr_matrix'
    ci(x, ...)

    ## S3 method for class 'weighted_kappa'
    ci(x, ...)

    ## S3 method for class 'partial_corr'
    estimate(x, ...)

    ## S3 method for class 'ba'
    estimate(x, ...)

    ## S3 method for class 'ba_matrix'
    estimate(x, ...)

    ## S3 method for class 'ba_repeated'
    estimate(x, ...)

    ## S3 method for class 'ba_repeated_matrix'
    estimate(x, ...)

    ## S3 method for class 'ccc_glmm'
    estimate(x, ...)

    ## S3 method for class 'ccc_glmm'
    coef(object, ...)

    ## Default S3 method:
    estimate(x, ...)

    ## S3 method for class 'corr_result'
    tidy(x, diag = FALSE, triangle = c("upper", "lower", "full"), ...)

    ## S3 method for class 'summary.corr_result'
    tidy(x, ...)

    ## S3 method for class 'summary.matrixCorr'
    tidy(x, ...)

    ## S3 method for class 'partial_corr'
    tidy(x, diag = FALSE, triangle = c("upper", "lower", "full"), ...)

    ## S3 method for class 'ba'
    tidy(x, ...)

    ## S3 method for class 'ba_matrix'
    tidy(x, ...)

    ## S3 method for class 'ba_repeated'
    tidy(x, ...)

    ## S3 method for class 'ba_repeated_matrix'
    tidy(x, ...)

    ## Default S3 method:
    tidy(x, diag = FALSE, triangle = c("upper", "lower", "full"), ...)

    ## S3 method for class 'corr_result'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'summary.corr_result'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'summary.matrixCorr'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'ba'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'ba_repeated'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'ba_matrix'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'ba_repeated_matrix'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'ccc'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'ccc_ci'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'ccc_glmm'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'chatterjee_xi_scalar'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'cia'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'cia_ci'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'cia_rm'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'cohen_kappa'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'gwet_ac'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'icc'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'icc_overall'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'icc_rm_reml'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'krippendorff_alpha'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'multirater_kappa'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'prob_agree'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'partial_corr'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'rmcorr'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'rmcorr_matrix'
    confint(object, parm, level = NULL, ...)

    ## S3 method for class 'weighted_kappa'
    confint(object, parm, level = NULL, ...)

x, object
|
A matrixCorr result, summary object, or scalar result. |
... |
Additional arguments passed to methods. |
diag |
Logical; include diagonal entries for matrix-style results.
Default is |
triangle |
For matrix-style correlation results, which triangle to
return from |
parm |
Ignored; included for compatibility with |
level |
Confidence level requested by |
estimate() returns the primary estimate in its natural shape: a
matrix-like result returns a matrix, an edge-list result returns a data frame,
and a scalar result returns a numeric value.
tidy() returns a data frame with columns such as item1,
item2, estimate, lwr, upr, diagnostics, and
inferential quantities when available.
confint() returns a data frame containing confidence limits when they
are available.
ci() returns the stored confidence-interval payload. It is mainly a
structured alternative to reading attr(x, "ci") directly.
Thiago de Paula Oliveira
X <- cbind(a = 1:10, b = 1:10 + rnorm(10), c = rnorm(10))
fit <- pearson_corr(X, ci = TRUE)
estimate(fit)
tidy(fit)
ci(fit)
confint(fit)
Estimates Gwet's chance-corrected agreement coefficient for either unweighted nominal agreement (AC1) or weighted agreement (AC2) in two-rater pairwise form or panel-level multi-rater form.
gwet_ac(
  data,
  y = NULL,
  weights = c("unweighted", "linear", "quadratic", "ordinal", "radical", "ratio", "circular", "bipolar"),
  levels = NULL,
  input = c("pairwise", "ratings", "counts"),
  na_method = c("error", "pairwise", "complete", "available"),
  min_raters = 2L,
  by_category = FALSE,
  ci = FALSE,
  p_value = FALSE,
  conf_level = 0.95,
  se_method = c("asymptotic", "jackknife", "none"),
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  verbose = FALSE,
  ...
)

## S3 method for class 'gwet_ac'
summary(object, digits = 4, ci_digits = 3, p_digits = 4, ...)

## S3 method for class 'summary.gwet_ac'
print(
  x,
  digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'gwet_ac'
print(
  x,
  digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  show_by_category = FALSE,
  ...
)

## S3 method for class 'gwet_ac'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  type = c("agreement_map", "estimate", "item_agreement", "category_proportion", "by_category"),
  bins = 30L,
  ...
)
data |
Input data. If |
y |
Optional second rater vector for scalar two-rater agreement. |
weights |
Weight specification. |
levels |
Optional explicit category labels. This is mainly useful when unused categories should still be retained in the calculation, and is the recommended way to define ordering for ordinal AC2 weights. |
input |
One of |
na_method |
Missing-data rule. For scalar two-rater and pairwise matrix
modes, |
min_raters |
Minimum number of observed ratings required for an item to
be retained in panel mode. Must be at least |
by_category |
Logical; if |
ci |
Logical; if |
p_value |
Logical; if |
conf_level |
Confidence level for intervals. Default is 0.95. |
se_method |
Standard-error method. |
n_threads |
Integer |
output |
One of |
threshold |
Numeric threshold used only for thresholded pairwise matrix output. |
diag |
Logical; include diagonal entries in thresholded pairwise matrix output. |
verbose |
Logical; if |
... |
Unused. |
object |
A |
digits |
Integer; number of decimal places for displayed values. |
ci_digits |
Integer; number of decimal places for confidence limits. |
p_digits |
Integer; number of decimal places for p-values. |
x |
A |
n |
Optional preview row threshold. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
One of |
show_by_category |
Logical; whether to print attached category-wise panel results when available. |
title |
Optional plot title. |
low_color |
Fill/color used for negative agreement. |
high_color |
Fill/color used for positive agreement. |
mid_color |
Unused placeholder for API consistency with matrix heatmaps. |
value_text_size |
Text size for estimate labels in matrix plots. |
ci_text_size |
Text size for CI labels in matrix plots. |
show_value |
Logical; whether to print estimate and CI labels. |
type |
Plot type for panel-level fits: |
bins |
Integer number of bins retained for compatibility with the panel-level item-agreement plot. |
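For two raters, let $w_{kl}$ denote the agreement weight for cell
$(k, l)$, $p_{kl}$ the cell proportion, and
$\pi_k = (p_{k+} + p_{+k})/2$. Then
$$p_a = \sum_{k,l} w_{kl}\, p_{kl}, \qquad p_e = \frac{T_w}{q(q-1)} \sum_{k} \pi_k (1 - \pi_k),$$
and
$$\mathrm{AC} = \frac{p_a - p_e}{1 - p_e},$$
where $q$ is the number of categories and $T_w = \sum_{k,l} w_{kl}$, so that $T_w = q$ for identity weights.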
With identity weights this is Gwet's AC1, the
nominal-agreement coefficient. With non-identity agreement weights this is
Gwet's AC2, which extends the same chance-correction idea to ordered or
partially creditable disagreement by letting near disagreements receive
larger agreement weights than distant disagreements.
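As a quick check of the two-rater definitions, the following base-R sketch computes the unweighted coefficient (AC1) by hand for the short rating vectors used in the examples of this page. It assumes the usual identity-weight chance term, $p_e = \frac{1}{q-1}\sum_k \pi_k(1-\pi_k)$; the helper name `ac1_by_hand` is illustrative and is not part of the package.

```r
# Hand computation of Gwet's AC1 for two raters.
# Illustrative helper only; it is not the package implementation.
ac1_by_hand <- function(x, y, levels = sort(unique(c(x, y)))) {
  tab  <- table(factor(x, levels = levels), factor(y, levels = levels))
  P    <- tab / sum(tab)                     # cell proportions p_kl
  q    <- length(levels)                     # number of categories
  p_a  <- sum(diag(P))                       # observed agreement (identity weights)
  pi_k <- (rowSums(P) + colSums(P)) / 2      # average marginal proportions
  p_e  <- sum(pi_k * (1 - pi_k)) / (q - 1)   # chance agreement
  (p_a - p_e) / (1 - p_e)
}

x <- c("A", "A", "B", "B", "A", "C")
y <- c("A", "B", "B", "B", "A", "C")
ac1 <- ac1_by_hand(x, y)
ac1  # 25/33, about 0.7576
```

Here five of the six pairs agree, so $p_a = 5/6$, and the averaged margins give $p_e = 5/16$.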
Category ordering for weighted AC2. Built-in weighted schemes use
the category order to define near and distant disagreements. If
levels is supplied, that order is used. Otherwise, factor inputs use
their factor levels, numeric/integer/logical inputs use sorted observed
values, character inputs use first-observed order, and counts inputs use
column order (or levels, when supplied). For ordinal analyses,
supplying levels is recommended whenever first-observed character
order is not the intended scale order.
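The effect of category order can be seen with a small base-R sketch. It builds quadratic agreement weights under the common definition $w_{kl} = 1 - (k-l)^2/(q-1)^2$, which is assumed here rather than taken from the package source, and compares first-observed character order with an explicitly supplied order. The helper `quadratic_weights` and the rating labels are hypothetical.

```r
# Quadratic agreement weights depend on the position of each category.
# Assumed definition: w_kl = 1 - (k - l)^2 / (q - 1)^2.
quadratic_weights <- function(levels) {
  q   <- length(levels)
  idx <- seq_len(q)
  W   <- 1 - outer(idx, idx, "-")^2 / (q - 1)^2
  dimnames(W) <- list(levels, levels)
  W
}

ratings <- c("low", "high", "medium", "low")

first_observed <- unique(ratings)             # "low" "high" "medium"
intended       <- c("low", "medium", "high")  # explicit scale order

W_first    <- quadratic_weights(first_observed)
W_intended <- quadratic_weights(intended)

W_first["low", "high"]     # 0.75: scored as a near disagreement
W_intended["low", "high"]  # 0: scored as the most distant disagreement
```

With first-observed order, "low" and "high" become neighbours, which is why supplying levels is the safer choice for ordinal character data.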
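For panel-level counts, if $r_{ik}$ is the number of raters assigning
item $i$ to category $k$, $r_i = \sum_k r_{ik}$, and
$W = (w_{kl})$ is the agreement-weight matrix, then the item-level agreement is
$$a_i = \sum_{k} \frac{r_{ik}\,(r^{*}_{ik} - 1)}{r_i (r_i - 1)}, \qquad r^{*}_{ik} = \sum_{l} w_{kl}\, r_{il}.$$
The observed agreement is the mean of $a_i$ over retained items. The
chance term uses the average item-wise category proportions
$\pi_k = \frac{1}{n} \sum_i r_{ik} / r_i$, where $n$ is the number of
retained items.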
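The panel-level quantities can be traced by hand for the three-rater data frame used in the examples. This base-R sketch assumes identity weights, so the item-level agreement reduces to the share of agreeing rater pairs, and it assumes the same chance term as in the two-rater case. It is a reading of the description above, not the package backend.

```r
raters <- data.frame(
  r1 = c("A", "A", "B", "C", "A", "B"),
  r2 = c("A", "B", "B", "C", "A", "B"),
  r3 = c("A", "A", "B", "B", "A", "C"),
  stringsAsFactors = FALSE
)
levels <- c("A", "B", "C")

# Items x categories count matrix r_ik.
counts <- t(apply(raters, 1, function(row) table(factor(row, levels = levels))))
r_i <- rowSums(counts)   # ratings observed per item
q   <- length(levels)

# Identity weights: a_i is the proportion of agreeing rater pairs within item i.
a_i  <- rowSums(counts * (counts - 1)) / (r_i * (r_i - 1))
p_a  <- mean(a_i)                         # observed agreement
pi_k <- colMeans(counts / r_i)            # average item-wise category proportions
p_e  <- sum(pi_k * (1 - pi_k)) / (q - 1)  # chance agreement
ac1_panel <- (p_a - p_e) / (1 - p_e)
ac1_panel  # 115/223, about 0.5157
```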
Confidence intervals and inference.
The primary inferential path is the analytic large-sample method. For two raters, define
The backend variance estimator is
where is the number of complete paired ratings.
For panel-level counts, write
and define the item-specific chance term
The asymptotic linearised contribution used by the backend is
giving
where is the number of retained items.
The reported standard error is
, and the confidence interval is
the t interval
truncated to . Here for the two-rater path and
for panel-level asymptotic inference. The reported test
statistic is the corresponding t ratio for . For
panel-level fits, se_method = "jackknife" remains available as a
second option, using the leave-one-item-out variance
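For orientation, a leave-one-item-out jackknife can be sketched in base R as follows. The sketch assumes the usual delete-one form, $\frac{n-1}{n}\sum_i(\hat\theta_{(-i)} - \bar\theta_{(\cdot)})^2$, which may differ in detail from the backend; `jackknife_var` is a hypothetical helper. The sample mean is used as the statistic because its jackknife variance is known to equal $s^2/n$.

```r
# Delete-one jackknife variance for a statistic evaluated on a subset of items.
# `stat` receives the indices of the retained items and returns one number.
jackknife_var <- function(n_items, stat) {
  idx <- seq_len(n_items)
  theta_loo <- vapply(idx, function(i) stat(setdiff(idx, i)), numeric(1))
  (n_items - 1) / n_items * sum((theta_loo - mean(theta_loo))^2)
}

set.seed(42)
z <- rnorm(25)
v_jack <- jackknife_var(length(z), function(keep) mean(z[keep]))
all.equal(v_jack, var(z) / length(z))  # TRUE
```

For an agreement coefficient, `stat` would recompute the panel-level estimate on the retained items instead of a mean.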
If y is supplied, a scalar numeric object of class
c("gwet_ac", "numeric"). If input = "pairwise", a
corr_result. If input is "ratings" or "counts",
a one-row data frame with class
c("gwet_ac", "agreement_result", "data.frame").
Thiago de Paula Oliveira
Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61, 29-48. doi:10.1348/000711006X126600
cohen_kappa() for two-rater nominal kappa;
multirater_kappa() for panel-level nominal kappa;
weighted_kappa() for weighted Cohen-type agreement.
x <- c("A", "A", "B", "B", "A", "C")
y <- c("A", "B", "B", "B", "A", "C")
gwet_ac(x, y)
gwet_ac(x, y, weights = "quadratic", levels = c("A", "B", "C"))

raters <- data.frame(
  r1 = c("A", "A", "B", "C", "A", "B"),
  r2 = c("A", "B", "B", "C", "A", "B"),
  r3 = c("A", "A", "B", "B", "A", "C"),
  stringsAsFactors = FALSE
)
fit_pw <- gwet_ac(raters)
fit_panel <- gwet_ac(raters, input = "ratings")
print(fit_pw)
print(fit_panel)
estimate(fit_pw)
tidy(fit_pw)
tidy(fit_panel)
Computes pairwise kernel dependence measures for the numeric columns of a
matrix or data frame using the Hilbert-Schmidt Independence Criterion
(HSIC). By default, hsic() returns a normalised kernel independence
correlation in [0, 1], analogous to a distance-correlation matrix.
hsic(
  data,
  kernel = c("gaussian", "linear", "laplace", "polynomial"),
  bandwidth = c("median", "silverman", "scott"),
  normalise = TRUE,
  estimator = c("biased", "unbiased"),
  na_method = c("error", "pairwise", "complete"),
  p_value = FALSE,
  B = 499L,
  seed = NULL,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

## S3 method for class 'hsic'
print(
  x,
  digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'hsic'
summary(
  object,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.hsic'
print(
  x,
  digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'hsic'
plot(
  x,
  title = NULL,
  low_color = "white",
  high_color = "steelblue1",
  value_text_size = 4,
  show_value = TRUE,
  ...
)
data |
A numeric matrix or data frame with at least two numeric columns. Non-numeric columns are dropped. |
kernel |
Kernel used to build univariate Gram matrices. Supported
options are |
bandwidth |
Bandwidth rule for Gaussian and Laplace kernels.
|
normalise |
Logical. If |
estimator |
HSIC estimator. |
na_method |
Missing-data handling. |
p_value |
Logical. If |
B |
Positive integer. Number of permutations used when
|
seed |
Optional positive integer seed for reproducible permutation inference. The seed is passed to the C++ permutation engine and does not mutate the user's global R RNG state. |
n_threads |
Integer |
output |
Output representation: |
threshold |
Non-negative absolute-value filter for non-matrix outputs.
Must be |
diag |
Logical; whether to include diagonal entries in sparse and edge-list outputs. |
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
One of |
... |
Additional arguments passed to print or plot methods. |
object |
An object of class |
title |
Optional plot title. |
low_color |
Colour for low HSIC values. |
high_color |
Colour for high HSIC values. |
value_text_size |
Font size for tile labels. |
show_value |
Logical; whether to overlay numeric values on the heatmap. |
For paired observations , HSIC measures the squared
Hilbert-Schmidt norm of the RKHS cross-covariance operator. With centred Gram
matrices and , the biased empirical estimator
is
With normalise = TRUE, the returned matrix contains the kernel
independence correlation
The raw HSIC matrix is stored in attr(x, "hsic_raw"). When
estimator = "unbiased", raw HSIC can be slightly negative in finite
samples; these signed raw values are retained in hsic_raw, while the
displayed normalised coefficient is clipped only for the user-facing matrix.
Kernel formulas are:
Gaussian , Laplace
, linear , and polynomial
. For bandwidth = "median", is the
upper median of divided by ; if
this is non-finite or non-positive, .
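A compact base-R sketch of the biased estimator with a Gaussian kernel may help fix ideas. Several details are assumptions rather than the package's choices: the estimator is scaled by $1/n^2$, the bandwidth is the plain median of pairwise distances (not the upper median), the fallback bandwidth is 1, and the normalised value is the simple ratio to the geometric mean of the self-dependence terms. All function names are hypothetical.

```r
# Gaussian Gram matrix with a median-distance bandwidth (assumed convention).
gaussian_gram <- function(v) {
  d <- abs(outer(v, v, "-"))
  sigma <- median(d[upper.tri(d)])
  if (!is.finite(sigma) || sigma <= 0) sigma <- 1  # assumed fallback value
  exp(-d^2 / (2 * sigma^2))
}

# Double-centre a Gram matrix: H K H with H = I - (1/n) 11'.
centre_gram <- function(K) {
  n <- nrow(K)
  H <- diag(n) - 1 / n
  H %*% K %*% H
}

# Biased HSIC: trace of the product of centred Gram matrices, scaled by 1/n^2.
hsic_biased <- function(x, y) {
  Kc <- centre_gram(gaussian_gram(x))
  Lc <- centre_gram(gaussian_gram(y))
  sum(Kc * Lc) / length(x)^2   # equals tr(Kc %*% Lc) for symmetric matrices
}

# Normalised version, bounded by 1 through the Cauchy-Schwarz inequality.
hsic_normalised <- function(x, y) {
  hsic_biased(x, y) / sqrt(hsic_biased(x, x) * hsic_biased(y, y))
}

set.seed(2)
x     <- rnorm(200)
y_dep <- x^2 + rnorm(200, sd = 0.1)   # nonlinear dependence, near-zero correlation
y_ind <- rnorm(200)                   # independent of x
h_dep <- hsic_normalised(x, y_dep)
h_ind <- hsic_normalised(x, y_ind)
c(dependent = h_dep, independent = h_ind)
```

The dependent pair should score clearly higher than the independent pair even though its linear correlation is close to zero.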
For p_value = TRUE, the test statistic is the raw HSIC estimate. The null
distribution is generated by permuting one variable within each pair, and the
p-value is . This is an
pairwise procedure and can be expensive for large matrices or
large B.
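The permutation logic can be sketched generically in base R. The sketch assumes the common add-one p-value, $(1 + \#\{T^{*}_b \ge T_{\mathrm{obs}}\})/(B + 1)$, and uses the absolute Pearson correlation purely as a stand-in statistic; `perm_p_value` is a hypothetical helper. Unlike the package's C++ engine, this version draws from R's global RNG.

```r
# Generic permutation p-value: permute one variable, recompute the statistic.
perm_p_value <- function(x, y, stat, B = 199L) {
  observed <- stat(x, y)
  permuted <- replicate(B, stat(x, sample(y)))
  (1 + sum(permuted >= observed)) / (B + 1)
}

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.1)
p <- perm_p_value(x, y, stat = function(a, b) abs(cor(a, b)), B = 199L)
p  # the smallest attainable value is 1 / (B + 1)
```

Because every pair needs B extra evaluations of the statistic, the cost grows with both the number of column pairs and B.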
The polynomial kernel currently uses the fixed form
. More polynomial controls can be added later without
changing the main HSIC object contract.
A matrix-like correlation result. With normalise = TRUE, entries
are normalised kernel independence correlations and the diagonal is 1
when the variable has positive kernel self-dependence. With
normalise = FALSE, entries are raw HSIC estimates. Attributes record the
kernel, bandwidth rule, estimator, raw HSIC matrix, diagnostics, and
optional permutation inference.
Thiago de Paula Oliveira
Gretton, A., Bousquet, O., Smola, A., & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. Algorithmic Learning Theory.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2007). A kernel statistical test of independence. Advances in Neural Information Processing Systems.
Gretton, A., Herbrich, R., Smola, A., Bousquet, O., & Schölkopf, B. (2005). Kernel methods for measuring independence. Journal of Machine Learning Research, 6, 2075-2129.
Pfister, N., Bühlmann, P., Schölkopf, B., & Peters, J. (2018). Kernel-based tests for joint independence. Journal of the Royal Statistical Society: Series B, 80(1), 5-31.
set.seed(1)
x <- rnorm(200)
y <- rnorm(200)
H0 <- hsic(cbind(x = x, y = y))
H0

set.seed(2)
x <- rnorm(200)
y <- x^2 + rnorm(200, sd = 0.1)
H1 <- hsic(cbind(x = x, y = y))
H1

set.seed(3)
X <- cbind(a = rnorm(80), b = rnorm(80), c = rnorm(80))
H_perm <- hsic(X, p_value = TRUE, B = 19, seed = 1)
summary(H_perm)
estimate(H_perm)
tidy(H_perm)
Computes intraclass correlation coefficients for the numeric columns of a matrix or data frame using the classical ANOVA mean-square formulas. The output can be either a pairwise matrix across columns or an overall all-column coefficient table.
icc(
  data,
  model = c("oneway", "twoway_random", "twoway_mixed"),
  type = c("consistency", "agreement"),
  unit = c("single", "average"),
  scope = c("pairwise", "overall"),
  na_method = c("error", "pairwise", "complete"),
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  verbose = FALSE,
  ...
)

## S3 method for class 'icc'
print(
  x,
  digits = 4, ci_digits = 4,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'icc'
summary(
  object,
  digits = 4, ci_digits = 2,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.icc'
print(
  x,
  digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'icc_overall'
print(
  x,
  digits = 4, ci_digits = 4,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'icc_overall'
summary(
  object,
  digits = 4, ci_digits = 2,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.icc_overall'
print(
  x,
  digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)
data |
A numeric matrix or data frame with at least two numeric columns. |
model |
Character scalar selecting the classical ICC model.
|
type |
Character scalar selecting the reliability target.
|
unit |
Character scalar selecting whether the coefficient refers to a
single measurement ( |
scope |
Character scalar selecting the analysis target.
|
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical; if |
conf_level |
Confidence level for the interval output. Ignored when
|
n_threads |
Integer number of OpenMP threads. |
verbose |
Logical; if |
... |
Passed to the underlying print helper. |
x |
An intraclass-correlation object returned by |
digits |
Integer; number of digits to print. |
ci_digits |
Integer; number of digits for CI bounds. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
object |
An intraclass-correlation object returned by |
Each column is treated as a measurement channel, method, or rater and each row is treated as a subject.
The function supports two distinct analysis targets.
scope = "pairwise" answers: "how reliable is each specific column pair?"
Each estimate is based on exactly two columns and the output is a symmetric
matrix.
scope = "overall" answers: "how reliable is the full set of columns when
analysed jointly?" The output is the standard six-form overall ANOVA table
(ICC1, ICC2, ICC3, ICC1k, ICC2k, ICC3k).
These two scopes do not target the same quantity. The overall coefficients are computed from the full multi-column ANOVA decomposition and are not obtained by averaging or otherwise aggregating the pairwise matrix.
The three main choice arguments determine the classical ICC form.
model controls the rater structure:
"oneway" uses the one-way random-effects formulation.
"twoway_random" uses the two-way random-effects formulation.
"twoway_mixed" uses the two-way mixed-effects formulation.
type controls whether systematic column mean differences are penalized:
"consistency" targets consistency across columns.
"agreement" targets absolute agreement across columns.
unit controls whether reliability refers to one measurement or to the
average of multiple measurements:
"single" returns the single-measure coefficient.
"average" returns the average-measure coefficient.
The supported mappings are:
model = "oneway", type = "consistency", unit = "single" gives ICC1.
model = "oneway", type = "consistency", unit = "average" gives ICC1k.
model = "twoway_random", type = "agreement", unit = "single" gives ICC2.
model = "twoway_random", type = "agreement", unit = "average" gives ICC2k.
model = "twoway_random", type = "consistency", unit = "single" gives ICC3.
model = "twoway_random", type = "consistency", unit = "average" gives ICC3k.
model = "twoway_mixed", type = "agreement", unit = "single" gives the
mixed-effects absolute-agreement analogue with the same classical point
formula as ICC2.
model = "twoway_mixed", type = "agreement", unit = "average" gives the
corresponding average-measure analogue.
model = "twoway_mixed", type = "consistency", unit = "single" gives ICC3.
model = "twoway_mixed", type = "consistency", unit = "average" gives ICC3k.
The combination model = "oneway", type = "agreement" is not defined here
and returns an error.
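The six classical forms can be reproduced from the ANOVA mean squares in a few lines of base R. The sketch below follows the Shrout and Fleiss (1979) formulas and uses the six-subject, four-judge example from that paper; `icc_six` is a hypothetical helper and does not call the package.

```r
# Shrout and Fleiss (1979) example: 6 subjects (rows) rated by 4 judges (columns).
sf <- matrix(c( 9, 2, 5, 8,
                6, 1, 3, 2,
                8, 4, 6, 8,
                7, 1, 2, 6,
               10, 5, 6, 9,
                6, 2, 4, 7), ncol = 4, byrow = TRUE)

icc_six <- function(X) {
  n <- nrow(X); k <- ncol(X)
  grand   <- mean(X)
  ss_rows <- k * sum((rowMeans(X) - grand)^2)   # between-subject sum of squares
  ss_cols <- n * sum((colMeans(X) - grand)^2)   # between-column sum of squares
  ss_tot  <- sum((X - grand)^2)
  msr <- ss_rows / (n - 1)                                  # subjects
  msc <- ss_cols / (k - 1)                                  # columns
  mse <- (ss_tot - ss_rows - ss_cols) / ((n - 1) * (k - 1)) # residual
  msw <- (ss_tot - ss_rows) / (n * (k - 1))                 # within subjects
  c(
    ICC1  = (msr - msw) / (msr + (k - 1) * msw),
    ICC2  = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
    ICC3  = (msr - mse) / (msr + (k - 1) * mse),
    ICC1k = (msr - msw) / msr,
    ICC2k = (msr - mse) / (msr + (msc - mse) / n),
    ICC3k = (msr - mse) / msr
  )
}

res <- icc_six(sf)
round(res, 2)  # ICC1 0.17, ICC2 0.29, ICC3 0.71, ICC1k 0.44, ICC2k 0.62, ICC3k 0.91
```

The gap between ICC2 and ICC3 reflects the large judge mean differences in these data, which only the agreement forms penalize.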
For scope = "pairwise", the point estimates are computed in C++ directly
from the two-column ANOVA mean squares for each complete pair. For
unit = "average", the implementation uses k = 2 because each estimate is
based on exactly two columns.
For scope = "overall", the point estimates are computed jointly from the
full wide matrix using the classical ANOVA decomposition over all columns.
Here the average-measure coefficients use k = ncol(data) after any row
filtering required by na_method.
Missing-data handling depends on scope:
with na_method = "error", missing values are rejected before estimation;
with na_method = "pairwise" and scope = "pairwise", each pair uses its
own complete-case overlap;
with na_method = "pairwise" and scope = "overall", rows are restricted
to complete cases across all columns because the overall ANOVA requires a
common wide matrix.
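The difference between pair-specific overlap and all-column complete cases is easy to see on a toy data frame. The base-R sketch below only counts usable rows under each rule; the data values are made up for illustration.

```r
dat <- data.frame(
  m1 = c(1.0, 2.1,  NA, 4.2, 5.0),
  m2 = c(1.2,  NA, 3.1, 4.0, 5.3),
  m3 = c(0.9, 2.0, 3.0,  NA, 5.1)
)

# Overall scope: only rows complete across all columns can enter the ANOVA.
rows_overall <- complete.cases(dat)

# Pairwise scope: each pair keeps the rows where both of its columns are observed.
pairs <- combn(names(dat), 2, simplify = FALSE)
n_pairwise <- vapply(pairs, function(p) sum(complete.cases(dat[p])), integer(1))
names(n_pairwise) <- vapply(pairs, paste, character(1), collapse = "-")

sum(rows_overall)  # 2
n_pairwise         # m1-m2: 3, m1-m3: 3, m2-m3: 3
```

Each pair keeps three rows, but only two rows survive when all three columns must be observed together.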
When ci = TRUE, confidence intervals are obtained from the classical
F-based ANOVA formulas corresponding to the selected coefficient. For
scope = "pairwise", non-estimable off-diagonal pairs return NA. For
scope = "overall", the coefficient table includes interval columns for all
six standard rows.
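As an illustration of an F-based interval, the sketch below computes the Shrout and Fleiss (1979) confidence limits for ICC3 on the example data from that paper. It is a base-R restatement of the published formula and is only assumed to correspond to what the package reports for this coefficient.

```r
sf <- matrix(c( 9, 2, 5, 8,
                6, 1, 3, 2,
                8, 4, 6, 8,
                7, 1, 2, 6,
               10, 5, 6, 9,
                6, 2, 4, 7), ncol = 4, byrow = TRUE)
n <- nrow(sf); k <- ncol(sf)

# Two-way ANOVA mean squares.
grand   <- mean(sf)
ss_rows <- k * sum((rowMeans(sf) - grand)^2)
ss_cols <- n * sum((colMeans(sf) - grand)^2)
ss_err  <- sum((sf - grand)^2) - ss_rows - ss_cols
msr <- ss_rows / (n - 1)
mse <- ss_err / ((n - 1) * (k - 1))

icc3 <- (msr - mse) / (msr + (k - 1) * mse)

# F-based limits for ICC3 (single measure, consistency).
alpha <- 0.05
F0  <- msr / mse
df1 <- n - 1
df2 <- (n - 1) * (k - 1)
FL  <- F0 / qf(1 - alpha / 2, df1, df2)
FU  <- F0 * qf(1 - alpha / 2, df2, df1)
ci3 <- c(lwr = (FL - 1) / (FL + k - 1), upr = (FU - 1) / (FU + k - 1))

round(c(est = icc3, ci3), 2)  # est 0.71, lwr 0.34, upr 0.95
```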
For scope = "pairwise", if ci = FALSE, a symmetric matrix of class
icc. If ci = TRUE, a list with elements est, lwr.ci, and upr.ci
and class c("icc", "icc_ci").
For scope = "overall", a list of class c("icc_overall", "icc") with a
coefficient table, ANOVA table, and mean-square metadata. The coefficient
table always includes the standard six overall coefficients. Confidence
interval columns are attached when ci = TRUE.
All outputs carry attributes describing the selected model, type, unit, and method.
Shrout PE, Fleiss JL (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.
McGraw KO, Wong SP (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46.
ccc(), ccc_rm_reml(), ba(), ba_rm(), rmcorr()
set.seed(123)
n <- 40
subj <- rnorm(n, sd = 1)
dat <- data.frame(
  m1 = subj + rnorm(n, sd = 0.3),
  m2 = 0.2 + subj + rnorm(n, sd = 0.4),
  m3 = -0.1 + subj + rnorm(n, sd = 0.5)
)

fit_icc <- icc(dat,
  model = "twoway_random", type = "agreement",
  unit = "single", scope = "pairwise"
)
print(fit_icc)
summary(fit_icc)
estimate(fit_icc)
tidy(fit_icc)

fit_icc_overall <- icc(dat, scope = "overall", ci = TRUE)
print(fit_icc_overall)
summary(fit_icc_overall)
confint(fit_icc_overall)
Computes pairwise repeated-measures intraclass correlation coefficients from long-format data using the same REML and Woodbury-identity backend used by the repeated-measures agreement models.
icc_rm_reml(
  data,
  response,
  subject,
  method = NULL,
  time = NULL,
  type = c("consistency", "agreement"),
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  ci_mode = c("auto", "raw", "logit"),
  verbose = FALSE,
  digits = 4,
  use_message = TRUE,
  interaction = FALSE,
  max_iter = 100,
  tol = 1e-06,
  Dmat = NULL,
  Dmat_type = c("time-avg", "typical-visit", "weighted-avg", "weighted-sq"),
  Dmat_weights = NULL,
  Dmat_rescale = TRUE,
  ar = c("none", "ar1"),
  ar_rho = NA_real_,
  slope = c("none", "subject", "method", "custom"),
  slope_var = NULL,
  slope_Z = NULL,
  drop_zero_cols = TRUE,
  vc_select = c("auto", "none"),
  vc_alpha = 0.05,
  vc_test_order = c("subj_time", "subj_method"),
  include_subj_method = NULL,
  include_subj_time = NULL,
  sb_zero_tol = 1e-10
)

## S3 method for class 'icc_rm_reml'
print(
  x,
  digits = 4, ci_digits = 4,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'icc_rm_reml'
summary(
  object,
  digits = 4, ci_digits = 2,
  n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.icc_rm_reml'
print(
  x,
  digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL,
  ...
)
data |
A data frame. |
response |
Character. Response variable name. |
subject |
Character. Subject ID variable name. |
method |
Character or |
time |
Character or |
type |
Character scalar; one of |
ci |
Logical. If |
conf_level |
Numeric in |
n_threads |
Integer |
ci_mode |
Character scalar; one of |
verbose |
Logical. If |
digits |
Integer |
use_message |
Logical. When |
interaction |
Logical. Include |
max_iter |
Integer. Maximum iterations for variance-component updates (default 100). |
tol |
Numeric. Convergence tolerance on parameter change (default 1e-06). |
Dmat |
Optional |
Dmat_type |
Character, one of
Pick |
Dmat_weights |
Optional numeric weights |
Dmat_rescale |
Logical. When |
ar |
Character. Residual correlation structure: |
ar_rho |
Numeric in |
slope |
Character. Optional extra random-effect design |
slope_var |
For |
slope_Z |
For |
drop_zero_cols |
Logical. When |
vc_select |
Character scalar; one of |
vc_alpha |
Numeric scalar in |
vc_test_order |
Character vector (length 2) with a permutation of
|
include_subj_method, include_subj_time
|
Logical scalars or |
sb_zero_tol |
Non-negative numeric scalar; default |
x |
For |
ci_digits |
Integer; number of digits for confidence interval bounds in printed method summaries. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Passed to the underlying display helpers. |
object |
For |
The repeated-measures model is fit separately for each method pair using the same REML and Woodbury-identity backend used by the repeated-measures concordance estimator.
Kernel $D$ and the fixed-bias term $S_B$.
Let $d$ stack the within-time, pairwise method
differences, grouped by time. The symmetric positive semidefinite kernel
$D$ selects which functional of the bias profile is targeted
by $S_B$. Internally, the code rescales any supplied or constructed
$D$ to satisfy $\mathbf{1}^\top D \mathbf{1} = n_t$, where $n_t$ is the number of time levels, for stability and comparability.
Dmat_type = "time-avg" targets the square of the
time-averaged bias.
Dmat_type = "typical-visit" targets the average of squared
per-time biases.
Dmat_type = "weighted-avg" targets the square of a weighted
time average.
Dmat_type = "weighted-sq" targets the weighted average of
squared per-time biases.
As in the repeated-measures concordance implementation, $S_B$ is the
fitted fixed-effect dispersion term induced by $D$. It enters the
denominator only for type = "agreement".
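The four kernel targets can be illustrated as quadratic forms in a vector of per-time biases for a single method pair. In this base-R sketch the kernels are scaled so that each quadratic form equals the named functional exactly; that scaling is chosen for readability and is an assumption, since the package applies its own rescaling. The bias values and visit weights are hypothetical.

```r
d   <- c(0.4, -0.1, 0.3, 0.2)   # hypothetical per-time biases for one method pair
n_t <- length(d)
w   <- c(0.4, 0.3, 0.2, 0.1)    # hypothetical visit weights, summing to 1

D_time_avg      <- matrix(1 / n_t^2, n_t, n_t)  # (mean of d)^2
D_typical_visit <- diag(1 / n_t, n_t)           # mean of d^2
D_weighted_avg  <- outer(w, w)                  # (weighted mean of d)^2
D_weighted_sq   <- diag(w)                      # weighted mean of d^2

# Quadratic form d' D d for a given kernel.
quad <- function(D) drop(t(d) %*% D %*% d)

targets <- c(
  time_avg      = quad(D_time_avg),
  typical_visit = quad(D_typical_visit),
  weighted_avg  = quad(D_weighted_avg),
  weighted_sq   = quad(D_weighted_sq)
)
targets
```

The squared-average targets never exceed their average-of-squares counterparts, so biases that change sign over time are penalized less under the time-averaged kernels.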
Time-averaging and shrinkage factors.
The fitted variance components reported in the summary are
$\sigma_A^2$, $\sigma_{A\times M}^2$, $\sigma_{A\times T}^2$,
and $\sigma_E^2$, stored respectively as sigma2_subject,
sigma2_subject_method, sigma2_subject_time, and sigma2_error.
The repeated-measures ICC always uses $\sigma_A^2$ in the numerator only.
The denominator uses the same time-averaging logic already used by the
repeated-measures concordance backend, through two pair-specific factors
$\kappa_g$ and $\kappa_e$.
If time is absent, the implementation sets
If time is present and the target is a single visit
(Dmat_type %in% c("typical-visit", "weighted-sq")), the implementation
leaves the time-varying terms unshrunk:
If time is present and the target is time-averaged
(Dmat_type %in% c("time-avg", "weighted-avg")), then for each observed
subject-method unit with distinct observed visits:
with equal weights,
with when residuals are iid;
with normalized visit weights ,
with when residuals are iid.
With unbalanced , the implementation averages the per-unit
values across the observations contributing to the pair and then
clamps both and to
for numerical stability.
Repeated-measures ICC.
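Let $\tilde\sigma_{A\times M}^2$ denote the effective
subject-by-method variance term, equal to $\sigma_{A\times M}^2$ when
the subject-by-method random effect is included and $0$ otherwise. Let
$\tilde\sigma_{A\times T}^2$ denote the effective subject-by-time
variance term, equal to $\sigma_{A\times T}^2$ when the subject-by-time
random effect is included and $0$ otherwise.
For type = "consistency", the reported ICC is
$$\mathrm{ICC} = \frac{\sigma_A^2}{\sigma_A^2 + \tilde\sigma_{A\times M}^2 + \kappa_g\, \tilde\sigma_{A\times T}^2 + \kappa_e\, \sigma_E^2}.$$
For type = "agreement", the denominator additionally includes $S_B$:
$$\mathrm{ICC} = \frac{\sigma_A^2}{\sigma_A^2 + \tilde\sigma_{A\times M}^2 + \kappa_g\, \tilde\sigma_{A\times T}^2 + S_B + \kappa_e\, \sigma_E^2}.$$
Here $\sigma_A^2$ is the subject variance, $\sigma_E^2$ the residual variance, and $\kappa_g$, $\kappa_e$ the time-averaging factors.
This differs from repeated-measures concordance because ICC uses only
$\sigma_A^2$ in the numerator, whereas the concordance numerator also
includes the time-averaged subject-time term. Extra random-effect variances
from slope / slope_Z are estimated by the shared
backend but are not included in the ICC denominator.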
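To make the roles of the components concrete, the base-R sketch below assembles the ICC from a set of variance components. It assumes the numerator is the subject variance alone and that the denominator adds the subject-by-method term, the time-averaged subject-by-time and residual terms, and, for agreement only, the fixed-bias term. The function `rm_icc`, its argument names, and all numeric values are hypothetical.

```r
# Repeated-measures ICC from variance components (illustrative reading of the text).
rm_icc <- function(s2_subject, s2_subject_method, s2_subject_time, s2_error,
                   kappa_g = 1, kappa_e = 1, S_B = 0,
                   type = c("consistency", "agreement")) {
  type  <- match.arg(type)
  denom <- s2_subject + s2_subject_method +
    kappa_g * s2_subject_time + kappa_e * s2_error
  if (type == "agreement") denom <- denom + S_B   # bias enters for agreement only
  s2_subject / denom
}

# Hypothetical components, with time-averaging factors of 1/4 (four visits, iid errors).
icc_cons <- rm_icc(2.0, 0.3, 0.2, 0.5, kappa_g = 0.25, kappa_e = 0.25)
icc_agr  <- rm_icc(2.0, 0.3, 0.2, 0.5, kappa_g = 0.25, kappa_e = 0.25,
                   S_B = 0.1, type = "agreement")
c(consistency = icc_cons, agreement = icc_agr)
```

Any positive fixed bias lowers the agreement coefficient relative to the consistency coefficient, while leaving the latter unchanged.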
CIs / SEs (delta method for ICC).
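Let $c = 1$ for type = "agreement" and $c = 0$ for
type = "consistency", and define
$$D = \sigma_A^2 + \tilde\sigma_{A\times M}^2 + \kappa_g\, \tilde\sigma_{A\times T}^2 + c\, S_B + \kappa_e\, \sigma_E^2.$$
Write $\mathrm{ICC} = N / D$ with $N = \sigma_A^2$.
The gradient used in the delta method is
$$\frac{\partial\,\mathrm{ICC}}{\partial \sigma_A^2} = \frac{D - N}{D^2}, \quad
\frac{\partial\,\mathrm{ICC}}{\partial \tilde\sigma_{A\times M}^2} = -\frac{N}{D^2}, \quad
\frac{\partial\,\mathrm{ICC}}{\partial \tilde\sigma_{A\times T}^2} = -\frac{\kappa_g N}{D^2}, \quad
\frac{\partial\,\mathrm{ICC}}{\partial \sigma_E^2} = -\frac{\kappa_e N}{D^2}, \quad
\frac{\partial\,\mathrm{ICC}}{\partial S_B} = -\frac{c\, N}{D^2}.$$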
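The covariance matrix of the estimated components is assembled
from the same REML fit:
the $(\sigma_A^2, \sigma_{A\times M}^2, \sigma_{A\times T}^2)$
block comes from the empirical
subject-level covariance of the per-subject REML component updates;
$\mathrm{Var}(\hat\sigma_E^2)$ is approximated as the
variance of the weighted mean of subject-level residual quadratic forms;
$\mathrm{Var}(\hat S_B)$ uses the fixed-effect quadratic-form
delta method already computed in the backend.
Cross-covariances across these blocks are ignored as a large-sample simplification, so the variance of the ICC is approximated by the sum of the three block-wise delta-method contributions.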
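If ci_mode = "raw", a Wald interval is formed on the ICC scale,
$\widehat{\mathrm{ICC}} \pm z_{1-\alpha/2}\, \widehat{\mathrm{SE}}(\widehat{\mathrm{ICC}})$,
and truncated to $[0, 1]$. If ci_mode = "logit", the backend applies
the same Wald construction after the transform
$\eta = \log\{\mathrm{ICC} / (1 - \mathrm{ICC})\}$, with
$\widehat{\mathrm{SE}}(\hat\eta) = \widehat{\mathrm{SE}}(\widehat{\mathrm{ICC}}) / \{\widehat{\mathrm{ICC}}\,(1 - \widehat{\mathrm{ICC}})\}$,
and then back-transforms the limits through $\eta \mapsto 1 / (1 + e^{-\eta})$.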
If ci_mode = "auto", the backend selects between the raw-scale and
logit-scale interval per estimate, typically preferring the logit form near
the boundaries.
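To make the difference between the raw-scale and logit-scale intervals concrete, here is a minimal sketch in base R. The helper `wald_ci` is hypothetical (it is not a matrixCorr function) and assumes the estimate lies strictly inside (0, 1) and that a standard error on the ICC scale is already available.

```r
# Hypothetical helper: Wald intervals for an estimate in (0, 1),
# either directly on the raw scale or after a logit transform.
wald_ci <- function(est, se, conf_level = 0.95, scale = c("raw", "logit")) {
  scale <- match.arg(scale)
  z <- qnorm(1 - (1 - conf_level) / 2)
  if (scale == "raw") {
    ci <- est + c(-1, 1) * z * se
    return(pmin(pmax(ci, 0), 1))      # truncate to [0, 1]
  }
  eta <- qlogis(est)                  # log(est / (1 - est))
  se_eta <- se / (est * (1 - est))    # delta-method SE on the logit scale
  plogis(eta + c(-1, 1) * z * se_eta) # back-transform to (0, 1)
}

wald_ci(0.95, 0.04, scale = "raw")    # upper limit hits the boundary at 1
wald_ci(0.95, 0.04, scale = "logit")  # asymmetric, stays inside (0, 1)
```

Near the boundary the raw interval has to be truncated, whereas the logit interval respects the parameter range by construction, which is the usual motivation for preferring it there.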
Choosing $\rho$ for AR(1).
When ar="ar1" and ar_rho = NA, $\rho$ is estimated by
profiling the REML log-likelihood at the current fixed-effect and variance-component estimates.
With very few visits per subject, $\rho$ can be weakly identified; consider
sensitivity checks over a plausible range.
A repeated-measures pairwise ICC object. Without confidence intervals the
result is a symmetric matrix of class c("icc_rm_reml", "icc", "matrix").
With confidence intervals it is a list with est, lwr.ci, and upr.ci
and class c("icc_rm_reml", "icc_ci", "icc"). Both carry the fitted
variance-component matrices as attributes.
All per-subject solves are $r \times r$ with
$r = 1 + nm + nt\ (+\, q_Z)$, so cost scales with the number of subjects and
the fixed-effects dimension rather than the total number of observations.
Solvers use symmetric positive-definite paths with a small diagonal ridge and
pseudo-inverse fallback, which helps for very small or unbalanced subsets and
near-boundary estimates. For AR(1), observations are ordered by time
within subject; NA time codes break the run, and gaps between factor
levels are treated as regular steps.
Heteroscedastic slopes across $Z$ columns are supported. Each
$Z$ column has its own variance component $\sigma^2_{Z,j}$, but
cross-covariances among $Z$ columns are set to zero.
The C++ backend uses OpenMP loops while also forcing vendor BLAS libraries to
run single-threaded so that overall CPU usage stays predictable. This guard
is applied to OpenBLAS, Apple's Accelerate, and Intel MKL when their runtime
controls are available. You can opt out manually by setting
MATRIXCORR_DISABLE_BLAS_GUARD=1 in the environment before loading the
package.
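A minimal sketch of that opt-out from within R, assuming the variable is read when the package is loaded as described above (the `library()` call is left commented so the snippet runs without the package installed):

```r
# Set the opt-out flag before the package is loaded.
Sys.setenv(MATRIXCORR_DISABLE_BLAS_GUARD = "1")

# library(matrixCorr)  # load only after the variable has been set

# Confirm the value that the current R session will see.
Sys.getenv("MATRIXCORR_DISABLE_BLAS_GUARD")
```

Setting the variable in the shell or in an `.Renviron` file before starting R should have the same effect, since both populate the process environment.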
Shrout PE, Fleiss JL (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.
McGraw KO, Wong SP (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46.
icc(), ccc(), ccc_rm_reml(), ba(), ba_rm(), rmcorr()
    set.seed(321)
    n_id <- 20
    n_time <- 3
    id <- factor(rep(seq_len(n_id), each = 2 * n_time))
    method <- factor(rep(rep(c("A", "B"), each = n_time), times = n_id))
    time <- factor(rep(seq_len(n_time), times = 2 * n_id))
    subj <- rnorm(n_id, sd = 1)[as.integer(id)]
    subj_method <- rnorm(n_id * 2, sd = 0.25)
    sm <- subj_method[(as.integer(id) - 1L) * 2L + as.integer(method)]
    y <- subj + sm + 0.3 * (method == "B") + rnorm(length(id), sd = 0.35)
    dat_rm <- data.frame(y = y, id = id, method = method, time = time)
    fit_icc_rm <- icc_rm_reml(
      dat_rm,
      response = "y",
      subject = "id",
      method = "method",
      time = "time",
      type = "consistency",
      ci = TRUE
    )
    print(fit_icc_rm)
    summary(fit_icc_rm)
    confint(fit_icc_rm)
    tidy(fit_icc_rm)
Computes pairwise Kendall's tau correlations for numeric data using a high-performance 'C++' backend. Optional confidence intervals are available for matrix and data-frame input.
    kendall_tau(
      data,
      y = NULL,
      na_method = c("error", "pairwise", "complete"),
      ci = FALSE,
      conf_level = 0.95,
      ci_method = c("fieller", "if_el", "brown_benedetti"),
      n_threads = getOption("matrixCorr.threads", 1L),
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE,
      ...
    )

    ## S3 method for class 'kendall_matrix'
    print(
      x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      ci_digits = 3, show_ci = NULL, ...
    )

    ## S3 method for class 'kendall_matrix'
    plot(
      x,
      title = "Kendall's Tau correlation heatmap",
      low_color = "indianred1",
      high_color = "steelblue1",
      mid_color = "white",
      value_text_size = 4,
      ci_text_size = 3,
      show_value = TRUE,
      ...
    )

    ## S3 method for class 'kendall_matrix'
    summary(
      object, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      ci_digits = 3, show_ci = NULL, ...
    )

    ## S3 method for class 'summary.kendall_matrix'
    print(
      x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      show_ci = NULL, ...
    )
data |
For matrix/data frame mode, a numeric matrix or a data frame with at least
two numeric columns. All non-numeric columns are excluded. For two-vector
mode, a numeric vector |
y |
Optional numeric vector |
na_method |
Character scalar controlling missing-data handling.
|
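ci |
Logical (default FALSE); if TRUE, attach confidence intervals to matrix/data-frame results. |
conf_level |
Confidence level used when ci = TRUE. Default is 0.95. |
ci_method |
Confidence-interval engine used when ci = TRUE: one of "fieller" (default), "if_el", or "brown_benedetti". |
n_threads |
Integer >= 1; number of OpenMP threads. Defaults to getOption("matrixCorr.threads", 1L). |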
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
ci_digits |
Integer; digits for Kendall confidence limits in the pairwise summary. |
show_ci |
One of |
title |
Plot title. Default is "Kendall's Tau correlation heatmap". |
low_color |
Color for the minimum tau value. Default is
"indianred1". |
high_color |
Color for the maximum tau value. Default is
"steelblue1". |
mid_color |
Color for zero correlation. Default is "white". |
value_text_size |
Font size for displaying correlation values. Default
is 4. |
ci_text_size |
Text size for confidence intervals in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
Kendall's tau is a rank-based measure of association between two variables.
For a dataset with $n$ observations on variables $X$ and $Y$,
let $n_0 = n(n-1)/2$ be the number of unordered pairs, $C$ the
number of concordant pairs, and $D$ the number of discordant pairs.
Let $T_x = \sum_g t_g(t_g - 1)/2$ and $T_y = \sum_h u_h(u_h - 1)/2$
be the numbers of tied pairs within $X$ and within $Y$, respectively,
where $t_g$ and $u_h$ are tie-group sizes in $X$ and $Y$.
The tie-robust Kendall's tau-b is:
$$\tau_b = \frac{C - D}{\sqrt{(n_0 - T_x)(n_0 - T_y)}}.$$
When there are no ties ($T_x = T_y = 0$), this reduces to tau-a:
$$\tau_a = \frac{C - D}{n(n-1)/2}.$$
The function automatically handles ties. In degenerate cases where a
variable is constant ($T_x = n_0$ or $T_y = n_0$), the tau-b
denominator is zero and the correlation is undefined (returned as NA
off the diagonal).
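The tau-b definition above can be checked directly against base R. The sketch below is a deliberately naive $O(n^2)$ pair enumeration (the helper name `tau_b_naive` is hypothetical and unrelated to the package's C++ backend); it is meant only to show how $C$, $D$, $T_x$, and $T_y$ enter the formula.

```r
# Hypothetical helper: tau-b by brute-force enumeration of all pairs.
tau_b_naive <- function(x, y) {
  n <- length(x)
  idx <- utils::combn(n, 2)                       # all unordered pairs (i, j)
  s <- sign(x[idx[1, ]] - x[idx[2, ]]) *
    sign(y[idx[1, ]] - y[idx[2, ]])
  n_conc <- sum(s > 0)                            # concordant pairs C
  n_disc <- sum(s < 0)                            # discordant pairs D
  n0 <- n * (n - 1) / 2
  tied_pairs <- function(v) {                     # sum over tie groups of t(t-1)/2
    t <- table(v)
    sum(t * (t - 1) / 2)
  }
  (n_conc - n_disc) / sqrt((n0 - tied_pairs(x)) * (n0 - tied_pairs(y)))
}

x <- c(1, 2, 2, 3, 4, 4, 5)
y <- c(2, 1, 3, 3, 5, 4, 4)
tau_b_naive(x, y)
cor(x, y, method = "kendall")   # base R also reports tau-b when ties are present
```

For a constant input the denominator is zero and the helper returns NaN, which mirrors the undefined case described above.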
When na_method = "pairwise", each $(i, j)$ estimate is recomputed
on the pairwise complete-case overlap of columns $i$ and $j$.
Confidence intervals use the observed pairwise-complete Kendall estimate and
the same pairwise complete-case overlap.
With ci_method = "fieller", the interval is built on the Fisher-style
transformed scale $z = \operatorname{atanh}(\hat\tau)$ using Fieller's
asymptotic standard error
$$\mathrm{SE}(z) = \sqrt{\frac{0.437}{n - 4}},$$
where $n$ is the pairwise complete-case sample size. The interval is then
mapped back with tanh() and clipped to $[-1, 1]$ for numerical
safety. This is the default Kendall CI and is intended to be the fast,
production-oriented choice.
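A minimal sketch of a Fieller-style interval follows, assuming the classical constant 0.437 from Fieller, Hartley and Pearson (1957) for the variance of the transformed Kendall estimate. The helper `fieller_ci` is hypothetical; the package's own implementation may differ in details such as small-sample guards.

```r
# Hypothetical helper: Fieller-type confidence interval for Kendall's tau.
fieller_ci <- function(tau, n, conf_level = 0.95) {
  stopifnot(n > 4, abs(tau) < 1)
  z <- atanh(tau)                          # Fisher-style transform
  se <- sqrt(0.437 / (n - 4))              # Fieller et al. (1957) approximation
  crit <- qnorm(1 - (1 - conf_level) / 2)
  ci <- tanh(z + c(-1, 1) * crit * se)     # map back to the tau scale
  pmin(pmax(ci, -1), 1)                    # clip for numerical safety
}

fieller_ci(0.35, n = 100)
```

Because the interval is symmetric on the transformed scale, it is generally asymmetric around the estimate once mapped back with `tanh()`.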
With ci_method = "brown_benedetti", the interval is computed on the
Kendall tau scale using the Brown-Benedetti large-sample variance for
Kendall's tau-b. This path is tie-aware, remains on the original Kendall
scale, and is intended as a conventional asymptotic alternative when a
direct tau-scale interval is preferred.
With ci_method = "if_el", the interval is computed in 'C++' using an
influence-function empirical-likelihood construction built from the
linearised Kendall estimating equation. The lower and upper limits are found
by solving the empirical-likelihood ratio equation against the
$\chi^2_1$-cutoff implied by conf_level. This method is slower
than "fieller" and is intended for specialised inference.
Performance:
In the two-vector mode (y supplied), the C++ backend uses a
raw-double path with minimal overhead.
In the matrix/data-frame mode, the no-missing estimate-only path
uses the Knight (1966) algorithm. Pairwise-complete
inference paths recompute each pair on its complete-case overlap; the
"brown_benedetti" interval adds tie-aware large-sample variance
calculations and "if_el" adds extra per-pair likelihood solving.
If y is NULL and data is a matrix/data frame: a
symmetric numeric matrix where entry (i, j) is the Kendall's tau
correlation between the i-th and j-th numeric columns. When
ci = TRUE, the object also carries a ci attribute with
elements est, lwr.ci, upr.ci, conf.level, and
ci.method. Pairwise complete-case sample sizes are stored in
attr(x, "diagnostics")$n_complete.
If y is provided (two-vector mode): a single numeric scalar,
the Kendall's tau correlation between data and y.
Invisibly returns the kendall_matrix object.
A ggplot object representing the heatmap.
Missing values are rejected when na_method = "error". Columns
with fewer than two usable observations are excluded. Confidence intervals
are not available in the two-vector interface.
Thiago de Paula Oliveira
Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81-93.
Knight, W. R. (1966). A Computer Method for Calculating Kendall's Tau with Ungrouped Data. Journal of the American Statistical Association, 61(314), 436-439.
Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3/4), 470-481.
Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72(358), 309-315.
Huang, Z., & Qin, G. (2023). Influence function-based confidence intervals for the Kendall rank correlation coefficient. Computational Statistics, 38(2), 1041-1055.
Croux, C., & Dehon, C. (2010). Influence functions of the Spearman and Kendall correlation measures. Statistical Methods & Applications, 19, 497-515.
print.kendall_matrix, plot.kendall_matrix
    # Basic usage with a matrix
    mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
    kt <- kendall_tau(mat)
    print(kt)
    summary(kt)
    plot(kt)

    # Confidence intervals
    kt_ci <- kendall_tau(mat[, 1:3], ci = TRUE)
    print(kt_ci, show_ci = "yes")
    summary(kt_ci)
    estimate(kt_ci)
    tidy(kt_ci)
    ci(kt_ci)
    confint(kt_ci)

    # Two-vector mode (scalar path)
    x <- rnorm(1000); y <- 0.5 * x + rnorm(1000)
    kendall_tau(x, y)

    # Including ties
    tied_df <- data.frame(
      v1 = rep(1:5, each = 20),
      v2 = rep(5:1, each = 20),
      v3 = rnorm(100)
    )
    kt_tied <- kendall_tau(tied_df, ci = TRUE, ci_method = "fieller")
    print(kt_tied, show_ci = "yes")

    # Interactive viewing (requires shiny)
    if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
      view_corr_shiny(kt)
    }
Estimates Krippendorff's alpha as a panel-level reliability/agreement
coefficient among two or more coders, raters, observers, judges, or
instruments. Missing ratings are supported through na_method, and
nominal, ordinal, interval, and ratio disagreement functions are available.
    krippendorff_alpha(
      data,
      levels = NULL,
      input = c("ratings", "counts"),
      level = c("nominal", "ordinal", "interval", "ratio"),
      method = c("customary", "analytical"),
      na_method = c("error", "complete", "available"),
      min_raters = 2L,
      ci = FALSE,
      p_value = FALSE,
      conf_level = 0.95,
      se_method = c("auto", "bootstrap", "jackknife", "none"),
      n_boot = 1000L,
      seed = NULL,
      n_threads = getOption("matrixCorr.threads", 1L),
      return_matrices = FALSE,
      verbose = FALSE,
      ...
    )

    ## S3 method for class 'krippendorff_alpha'
    print(x, digits = 4, ...)

    ## S3 method for class 'krippendorff_alpha'
    summary(object, digits = 4, ...)

    ## S3 method for class 'summary.krippendorff_alpha'
    print(
      x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      show_ci = NULL, ...
    )

    ## S3 method for class 'krippendorff_alpha'
    plot(
      x,
      type = c("estimate", "item_disagreement", "category_proportion", "coincidence"),
      ...
    )
data |
Input data.
For |
levels |
Optional category labels in analysis order. For ordinal
character data, explicit |
input |
One of |
level |
Measurement level: |
method |
Estimator. |
na_method |
Missing-data rule. |
min_raters |
Minimum number of observed ratings required for a retained
item. Must be at least |
ci |
Logical; if |
p_value |
Logical; if |
conf_level |
Confidence level for intervals. Default is 0.95. |
se_method |
Standard-error method. |
n_boot |
Number of bootstrap resamples used for customary alpha when
|
seed |
Optional positive integer seed used for reproducible bootstrap resampling. |
n_threads |
Integer |
return_matrices |
Logical; if |
verbose |
Logical; if |
... |
Unused. |
x |
A |
digits |
Integer; number of decimal places for rounded numeric columns. |
object |
A |
n |
Optional preview row threshold. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
One of |
type |
Plot type: |
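Krippendorff's alpha is a panel-level reliability coefficient, not a pairwise correlation matrix. The customary coincidence-matrix estimator is
$$\alpha = 1 - \frac{D_o}{D_e},$$
where $D_o$ is the observed disagreement and $D_e$ is the expected
disagreement under chance assignment.
For retained item $u$ with $m_u$ observed ratings and category
counts $n_{uc}$, the observed coincidence matrix updates by
$$o_{ck} \leftarrow o_{ck} + \frac{n_{uc}\,(n_{uk} - \mathbb{1}[c = k])}{m_u - 1}.$$
Let $n_c = \sum_k o_{ck}$ and $n = \sum_c n_c$. Then
$$D_o = \frac{1}{n}\sum_c\sum_k o_{ck}\,\delta^2_{ck}, \qquad D_e = \frac{1}{n(n-1)}\sum_c\sum_k n_c\,n_k\,\delta^2_{ck},$$
where $\delta^2_{ck}$ is the level-specific disagreement matrix.
The disagreement functions implemented here are:
nominal: $\delta^2_{ck} = \mathbb{1}[c \neq k]$
ordinal: Krippendorff's cumulative-margin disagreement based on the pooled category margins
interval: $\delta^2_{ck} = (c - k)^2$
ratio: $\delta^2_{ck} = \{(c - k)/(c + k)\}^2$, with $\delta^2_{ck} = 0$ when both values are zero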
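The coincidence-matrix construction can be traced by hand for the nominal case. The following base-R sketch is illustrative only: `kripp_alpha_nominal` is a hypothetical helper, it assumes an item-by-rater matrix with at least two observed categories, and it skips items with fewer than two ratings.

```r
# Hypothetical helper: customary Krippendorff's alpha with nominal disagreement.
kripp_alpha_nominal <- function(ratings) {
  ratings <- as.matrix(ratings)
  cats <- sort(unique(as.vector(ratings[!is.na(ratings)])))
  K <- length(cats)
  O <- matrix(0, K, K, dimnames = list(cats, cats))   # coincidence matrix
  for (u in seq_len(nrow(ratings))) {
    vals <- ratings[u, ]
    vals <- vals[!is.na(vals)]
    m_u <- length(vals)
    if (m_u < 2L) next                                # item carries no pairs
    n_uc <- as.numeric(table(factor(vals, levels = cats)))
    # n_uc * (n_uk - 1[c == k]) / (m_u - 1) for every category pair (c, k)
    O <- O + (outer(n_uc, n_uc) - diag(n_uc, K)) / (m_u - 1)
  }
  n_c <- rowSums(O)
  n <- sum(n_c)
  delta2 <- 1 - diag(K)                               # nominal: 1 if c != k
  D_o <- sum(O * delta2) / n
  D_e <- sum(outer(n_c, n_c) * delta2) / (n * (n - 1))
  1 - D_o / D_e
}

raters <- data.frame(
  r1 = c("A", "A", "B", "B", "C", "A"),
  r2 = c("A", "B", "B", "B", "C", "A"),
  r3 = c("A", "A", "B", "C", "C", "A"),
  stringsAsFactors = FALSE
)
kripp_alpha_nominal(raters)   # 35/52, about 0.673
```

For these six items the off-diagonal coincidences sum to 4 out of 18 pairable values, giving $D_o = 4/18$ and $D_e = 208/306$. Other measurement levels would only change the `delta2` matrix.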
Analytical estimator and inference.
Hughes (2024) recommends a different point estimator based on within-item
and pooled pairwise disagreement. Let be the number of retained
items, the number of ratings in item , and
. Define
The within-item mean square is
the pooled total disagreement is
where are the pooled observed ratings, and
With and , the analytical
alpha estimate is
When analytical inference is requested, the implementation uses the
delete-one-item jackknife on . If is the
leave-one-item-out transform and is the number of retained items,
the jackknife pseudo-values are
with standard error
The confidence interval is formed on the -scale with a
critical value and back-transformed to alpha using the full
data . The analytical p-value tests ,
equivalently , via the t statistic
For customary alpha, percentile bootstrap intervals are available by resampling retained items with replacement and recomputing the customary estimate.
A one-row data frame with class
c("krippendorff_alpha", "agreement_result", "data.frame"). The
object stores method metadata, diagnostics, and optional matrices in
attributes.
Thiago de Paula Oliveira
Krippendorff, K. (2011/2013). Computing Krippendorff's alpha-reliability.
Hayes, A. F. and Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77-89.
Hughes, J. (2024). Toward improved inference for Krippendorff's Alpha agreement coefficient. Journal of Statistical Planning and Inference, 233, 106170.
multirater_kappa() for nominal multi-rater kappa;
gwet_ac() for AC1/AC2 agreement coefficients.
    raters <- data.frame(
      r1 = c("A", "A", "B", "B", "C", "A"),
      r2 = c("A", "B", "B", "B", "C", "A"),
      r3 = c("A", "A", "B", "C", "C", "A"),
      stringsAsFactors = FALSE
    )
    fit <- krippendorff_alpha(raters, level = "nominal", na_method = "available")
    print(fit)
    summary(fit)
    estimate(fit)
    tidy(fit)
    plot(fit)
Estimates panel-level chance-corrected agreement among multiple raters assigning items to nominal categories.
    multirater_kappa(
      data,
      levels = NULL,
      input = c("ratings", "counts"),
      method = c("fleiss", "randolph"),
      exact = FALSE,
      na_method = c("error", "complete", "available"),
      min_raters = 2L,
      by_category = FALSE,
      ci = FALSE,
      p_value = FALSE,
      conf_level = 0.95,
      se_method = c("asymptotic", "jackknife", "none"),
      n_threads = getOption("matrixCorr.threads", 1L),
      verbose = FALSE,
      ...
    )

    ## S3 method for class 'multirater_kappa'
    print(x, digits = 4, show_by_category = FALSE, ...)

    ## S3 method for class 'multirater_kappa'
    summary(object, digits = 4, ...)

    ## S3 method for class 'summary.multirater_kappa'
    print(
      x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      show_ci = NULL, ...
    )

    ## S3 method for class 'multirater_kappa'
    plot(
      x,
      type = c("agreement_map", "estimate", "item_agreement", "category_proportion", "by_category"),
      bins = 30L,
      ...
    )
data |
Input ratings or counts data.
For |
levels |
Optional explicit category labels. For nominal multi-rater
kappa, category order is not used in the estimator itself, but explicit
levels are useful when unobserved categories should be retained in the
output. If factor columns are mixed with non-factor columns, |
input |
One of |
method |
One of |
exact |
Logical; if |
na_method |
Missing-data rule for |
min_raters |
Minimum number of observed ratings required for an item to
be retained when |
by_category |
Logical; if |
ci |
Logical; if |
p_value |
Logical; if |
conf_level |
Confidence level for intervals. Default is 0.95. |
se_method |
Standard-error method. |
n_threads |
Integer |
verbose |
Logical; if |
... |
Unused. |
x |
A |
digits |
Integer; number of decimal places for displayed values. |
show_by_category |
Logical; whether to print the attached category-wise kappa table when available. |
object |
A |
n |
Optional preview row threshold. |
topn |
Optional number of leading/trailing rows when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
One of |
type |
Plot type: |
bins |
Integer number of bins retained for compatibility; currently unused by the default item-agreement profile plot. |
multirater_kappa() returns one panel-level agreement coefficient, not a
pairwise matrix. Fleiss' kappa is the default fixed-marginal estimator:
$$\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e},$$
where $\bar{P}$ is the mean item-level agreement and
$\bar{P}_e = \sum_j p_j^2$ uses the pooled marginal category proportions $p_j$.
Randolph's free-marginal alternative replaces the expected agreement with
$1/q$, where $q$ is the number of categories. This function is for nominal categories and does not use
category ordering. Use weighted_kappa() for two-rater ordered-category
agreement and cohen_kappa() for two-rater nominal agreement.
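The two estimators differ only in the expected-agreement term, which a short base-R sketch makes explicit. The helper `multirater_kappa_sketch` is hypothetical and assumes a complete item-by-rater matrix with the same number of raters for every item; it is not the package's implementation.

```r
# Hypothetical helper: Fleiss (fixed-marginal) and Randolph (free-marginal)
# kappa for a complete item-by-rater matrix of nominal ratings.
multirater_kappa_sketch <- function(ratings) {
  ratings <- as.matrix(ratings)
  cats <- sort(unique(as.vector(ratings)))
  K <- length(cats)
  m <- ncol(ratings)                                  # raters per item
  counts <- matrix(0, nrow(ratings), K)
  for (i in seq_len(nrow(ratings))) {
    counts[i, ] <- tabulate(match(ratings[i, ], cats), nbins = K)
  }
  P_i <- rowSums(counts * (counts - 1)) / (m * (m - 1))  # item-level agreement
  P_bar <- mean(P_i)
  p_j <- colSums(counts) / sum(counts)                # pooled category proportions
  Pe_fleiss <- sum(p_j^2)
  Pe_randolph <- 1 / K
  c(
    fleiss = (P_bar - Pe_fleiss) / (1 - Pe_fleiss),
    randolph = (P_bar - Pe_randolph) / (1 - Pe_randolph)
  )
}

raters <- data.frame(
  r1 = c("A", "A", "B", "C", "A", "B"),
  r2 = c("A", "B", "B", "C", "A", "B"),
  r3 = c("A", "A", "B", "B", "A", "C"),
  stringsAsFactors = FALSE
)
multirater_kappa_sketch(raters)   # fleiss = 47/101 (about 0.465), randolph = 0.5
```

Here the mean item agreement is 2/3 in both cases; Fleiss' version subtracts more chance agreement because the pooled margins are unbalanced (8, 7, and 3 ratings per category).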
When exact = TRUE, the exact Fleiss fixed-marginal estimate requires
the original item-by-rater rating matrix. In matrixCorr, closed-form
asymptotic inference is only available for the standard non-exact Fleiss
estimator with a common number of raters per item. In other settings, use
se_method = "jackknife" when inferential quantities are needed.
With input = "ratings" and na_method = "available", the
implementation supports unbalanced item-specific numbers of raters as a
generalisation beyond the strict equal-rater Fleiss setting. The returned
diagnostics record the observed per-item rater counts.
Confidence intervals and standard errors. Two inference paths are implemented.
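If se_method = "jackknife", the method computes leave-one-item-out
estimates $\hat\kappa_{(-i)}$, $i = 1, \ldots, n$, over the $n$
retained items and forms
$$\bar\kappa_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\kappa_{(-i)}.$$
The standard error is
$$\mathrm{SE}_{\text{jack}} = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat\kappa_{(-i)} - \bar\kappa_{(\cdot)}\bigr)^2}$$
and the CI is
$$\hat\kappa \pm z_{1-\alpha/2}\,\mathrm{SE}_{\text{jack}},$$
truncated to $[-1, 1]$.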
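The leave-one-item-out idea is generic, so it can be sketched with any statistic. The example below assumes the conventional $(n-1)/n$ jackknife scaling and uses the sample mean as a stand-in statistic, because in that case the jackknife standard error is known to coincide with `sd(x) / sqrt(n)`. The helper `jackknife_se` is hypothetical; for kappa, the "items" would be rows of the rating matrix rather than elements of a vector.

```r
# Hypothetical helper: delete-one jackknife standard error for a statistic
# computed on a vector of items.
jackknife_se <- function(items, stat) {
  n <- length(items)
  # Recompute the statistic with each item left out in turn.
  loo <- vapply(seq_len(n), function(i) stat(items[-i]), numeric(1))
  sqrt((n - 1) / n * sum((loo - mean(loo))^2))
}

set.seed(42)
x <- rnorm(30)
jackknife_se(x, mean)
sd(x) / sqrt(length(x))   # identical for the mean, up to rounding error
```

A Wald interval then follows as estimate plus or minus a normal quantile times this standard error, truncated to the coefficient's admissible range.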
If se_method = "asymptotic", the closed-form variance is available
only for the standard non-exact Fleiss estimator
with a common number of raters per item. Let be the number of items,
the common number of raters, the pooled category
proportions, and define
Then the variance estimator is
The standard error and CI are then obtained from the same Wald form
above and truncated to . When asymptotic inference is
unavailable and inferential output is requested with the default setting,
multirater_kappa() automatically falls back to the jackknife path.
A one-row data frame with class
c("multirater_kappa", "agreement_result", "data.frame"). The object
carries estimator metadata and diagnostics in attributes, and may also
attach category-wise binary-collapsed results in
attr(x, "by_category").
Thiago de Paula Oliveira
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378-382. doi:10.1037/h0031619
Randolph, J. J. (2005). Free-Marginal Multirater Kappa: An Alternative to Fleiss' Fixed-Marginal Multirater Kappa.
cohen_kappa() for unweighted two-rater nominal agreement;
weighted_kappa() for two-rater ordered-category agreement.
    raters <- data.frame(
      r1 = c("A", "A", "B", "C", "A", "B"),
      r2 = c("A", "B", "B", "C", "A", "B"),
      r3 = c("A", "A", "B", "B", "A", "C"),
      stringsAsFactors = FALSE
    )
    fit <- multirater_kappa(raters)
    print(fit)
    summary(fit)
    estimate(fit)
    tidy(fit)

    # The default plot is an item-by-category agreement map.
    # Rows are items, ordered from stronger to weaker item-level agreement.
    # Columns are categories. Each tile shows how many raters assigned that
    # category to that item, and darker fill means a larger share of raters
    # chose that category for that item.
    plot(fit)
Computes all pairwise percentage bend correlations for the numeric columns of
a matrix or data frame. Percentage bend correlation limits the influence of
extreme marginal observations by bending standardised deviations into the
interval $[-1, 1]$, yielding a Pearson-like measure that is robust to
outliers and heavy tails.
    pbcor(
      data,
      na_method = c("error", "pairwise", "complete"),
      ci = FALSE,
      p_value = FALSE,
      conf_level = 0.95,
      n_threads = getOption("matrixCorr.threads", 1L),
      beta = 0.2,
      n_boot = 500L,
      seed = NULL,
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE
    )

    ## S3 method for class 'pbcor'
    print(
      x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      show_ci = NULL, ...
    )

    ## S3 method for class 'pbcor'
    plot(
      x,
      title = "Percentage bend correlation heatmap",
      low_color = "indianred1",
      high_color = "steelblue1",
      mid_color = "white",
      value_text_size = 4,
      show_value = TRUE,
      ...
    )

    ## S3 method for class 'pbcor'
    summary(
      object, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      ci_digits = 3, p_digits = 4, show_ci = NULL, ...
    )

    ## S3 method for class 'summary.pbcor'
    print(
      x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      show_ci = NULL, ...
    )
data |
A numeric matrix or data frame containing numeric columns. |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default |
p_value |
Logical (default |
conf_level |
Confidence level used when |
n_threads |
Integer |
beta |
Bending constant in |
n_boot |
Integer |
seed |
Optional positive integer used to seed the bootstrap resampling
when |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
x |
An object of class |
digits |
Integer; number of digits to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional arguments passed to the underlying print or plot helper. |
title |
Character; plot title. |
low_color, high_color, mid_color
|
Colors used in the heatmap. |
value_text_size |
Numeric text size for overlaid cell values. |
show_value |
Logical; if |
object |
An object of class |
ci_digits |
Integer; digits used for confidence limits in pairwise summaries. |
p_digits |
Integer; digits used for p-values in pairwise summaries. |
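Let $X \in \mathbb{R}^{n \times p}$ be a numeric matrix with rows as
observations and columns as variables. For a column
$x = (x_1, \ldots, x_n)$, let $M_x = \operatorname{med}(x)$ and define
$\hat\omega_\beta(x)$ as the $\lfloor (1 - \beta)\,n \rfloor$-th order
statistic of $|x_i - M_x|$. Larger values of beta reduce
$\hat\omega_\beta(x)$, so more observations are bent to the bounds
$-1$ and $1$.
The one-step percentage-bend location is
$$\hat\phi_x = \frac{\hat\omega_\beta(x)\,(i_2 - i_1) + \sum_{i = i_1 + 1}^{n - i_2} x_{(i)}}{n - i_1 - i_2},$$
where $i_1 = \#\{i : (x_i - M_x)/\hat\omega_\beta(x) < -1\}$ and
$i_2 = \#\{i : (x_i - M_x)/\hat\omega_\beta(x) > 1\}$. The bent scores are
$$a_i = \max\bigl\{-1,\ \min\bigl(1,\ (x_i - \hat\phi_x)/\hat\omega_\beta(x)\bigr)\bigr\},$$
and likewise $b_i$ for a second column $y$. The percentage bend
correlation for the pair is
$$r_{pb}(x, y) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2\ \sum_{i=1}^{n} b_i^2}}.$$
In the complete-data path, the bent score vectors are computed once per
column and collected into a matrix $A \in \mathbb{R}^{n \times p}$,
after which the correlation matrix is formed from their cross-products:
$$R = \Delta^{-1/2}\,A^\top A\,\Delta^{-1/2}, \qquad \Delta = \operatorname{diag}(A^\top A).$$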
If a column yields an undefined bent-score denominator, the corresponding row
and column are returned as NA. With na_method = "pairwise",
each pair is recomputed on its complete-case overlap. As with pairwise
Pearson correlation, this pairwise path can break positive semidefiniteness.
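For a single pair of complete vectors, the steps above can be sketched in a few lines of base R. The helpers `pb_bent_scores` and `pb_cor` are hypothetical names, and the order-statistic index `floor((1 - beta) * n)` is an assumption borrowed from common implementations of Wilcox's estimator; the package's C++ backend may use a different rounding rule.

```r
# Hypothetical helper: bent scores for one numeric vector.
pb_bent_scores <- function(x, beta = 0.2) {
  n <- length(x)
  med <- median(x)
  # Scale: an upper order statistic of the absolute deviations from the median.
  omega <- sort(abs(x - med))[floor((1 - beta) * n)]
  u <- (x - med) / omega
  i1 <- sum(u < -1)                       # observations bent to -1
  i2 <- sum(u > 1)                        # observations bent to +1
  # One-step percentage-bend location.
  phi <- (omega * (i2 - i1) + sum(x[abs(u) <= 1])) / (n - i1 - i2)
  pmax(-1, pmin(1, (x - phi) / omega))    # bend standardised deviations
}

# Hypothetical helper: percentage bend correlation for one pair.
pb_cor <- function(x, y, beta = 0.2) {
  a <- pb_bent_scores(x, beta)
  b <- pb_bent_scores(y, beta)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
y_out <- y
y_out[1] <- 100                           # one gross outlier
c(pb = pb_cor(x, y_out), pearson = cor(x, y_out))
```

Because every bent score is confined to $[-1, 1]$, a single extreme value can move the percentage bend estimate far less than it moves Pearson's coefficient.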
When p_value = TRUE, the method-specific test statistic for a pairwise
estimate $r_{pb}$ based on $n$ complete observations is
$$T = r_{pb}\sqrt{\frac{n - 2}{1 - r_{pb}^2}},$$
and the reported p-value is the two-sided Student-$t$ tail probability
with $n - 2$ degrees of freedom. When ci = TRUE, the interval
is a percentile bootstrap interval based on $B$ (n_boot)
resamples drawn from the pairwise complete cases. If
$r^*_{(1)} \le \cdots \le r^*_{(B^*)}$ denotes the sorted
bootstrap sample of finite estimates with $B^*$ retained resamples, the
reported limits are
$$\bigl(r^*_{(\ell)},\ r^*_{(u)}\bigr),$$
where $\ell$ and $u$ are the lower and upper percentile indices
for $\alpha = 1 - \text{conf\_level}$. Resamples that yield undefined
estimates are discarded before the percentile limits are formed.
Computational complexity. In the complete-data path, forming the
bent scores requires sorting within each column and the cross-product step
costs $O(n p^2)$ with $O(p^2)$ output storage. When
ci = TRUE, the bootstrap cost is incurred separately for each column
pair.
A symmetric correlation matrix with class pbcor and
attributes method = "percentage_bend_correlation",
description, and package = "matrixCorr". When
ci = TRUE, the returned object also carries a ci attribute
with elements est, lwr.ci, upr.ci,
conf.level, and ci.method, plus
attr(x, "conf.level"). When p_value = TRUE, it also carries
an inference attribute with elements estimate,
statistic, parameter, p_value, n_obs, and
alternative. When either inferential option is requested, the
object also carries diagnostics$n_complete.
Thiago de Paula Oliveira
Wilcox, R. R. (1994). The percentage bend correlation coefficient. Psychometrika, 59(4), 601-616. doi:10.1007/BF02294395
wincor(), skipped_corr(), bicor()
    set.seed(10)
    X <- matrix(rnorm(150 * 4), ncol = 4)
    X[sample(length(X), 8)] <- X[sample(length(X), 8)] + 10
    R <- pbcor(X)
    print(R, digits = 2)
    summary(R)
    estimate(R)
    tidy(R)
    plot(R)

    ## Bootstrap confidence intervals
    R_ci <- pbcor(X, ci = TRUE, n_boot = 49, seed = 10)
    ci(R_ci)
    confint(R_ci)

    # Interactive viewing (requires shiny)
    if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
      view_corr_shiny(R)
    }
Computes Gaussian partial correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Covariance estimation is available via the classical sample estimator, ridge regularisation, OAS shrinkage, or graphical lasso. Optional p-values and Fisher-z confidence intervals are available for the classical sample estimator in the ordinary low-dimensional setting.
pcorr(
  data,
  method = c("sample", "oas", "ridge", "glasso"),
  na_method = c("error", "complete"),
  ci = FALSE,
  conf_level = 0.95,
  return_cov_precision = FALSE,
  return_details = FALSE,
  return_p_value = FALSE,
  lambda = 0.001,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

## S3 method for class 'partial_corr'
print(
  x,
  digits = 3,
  show_method = TRUE,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'partial_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.partial_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'partial_corr'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  mask_diag = TRUE,
  reorder = FALSE,
  ...
)
data |
A numeric matrix or data frame with at least two numeric columns. Non-numeric columns are ignored. |
method |
Character; one of |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default |
conf_level |
Confidence level used when |
return_cov_precision |
Logical; if |
return_details |
Logical; if |
return_p_value |
Logical; if |
lambda |
Numeric |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
x |
An object of class |
digits |
Integer; number of decimal places for display (default 3). |
show_method |
Logical; print a one-line header with |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional arguments passed to |
object |
An object of class |
title |
Plot title. By default, constructed from the estimator in
|
low_color |
Colour for low (negative) values. Default
|
high_color |
Colour for high (positive) values. Default
|
mid_color |
Colour for zero. Default |
value_text_size |
Font size for cell labels. Default |
show_value |
Logical; if |
mask_diag |
Logical; if |
reorder |
Logical; if |
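Statistical overview. Given an $n \times p$ data matrix $X$
(rows are observations, columns are variables), the routine estimates a
partial correlation matrix via the precision (inverse covariance)
matrix. Let $\mu$ be the vector of column means and
$$S = (X - \mathbf{1}\mu^\top)^\top (X - \mathbf{1}\mu^\top)$$
be the centred cross-product matrix computed without forming a centred copy
of $X$. Two conventional covariance scalings are formed:
$$\hat\Sigma_{\mathrm{MLE}} = S / n, \qquad \hat\Sigma_{\mathrm{unb}} = S / (n - 1).$$
Sample: $\Sigma = \hat\Sigma_{\mathrm{unb}}$.
Ridge: $\Sigma = \hat\Sigma_{\mathrm{unb}} + \lambda I_p$
with user-supplied $\lambda \ge 0$ (diagonal inflation).
OAS (Oracle Approximating Shrinkage):
shrink $\hat\Sigma_{\mathrm{MLE}}$ towards a scaled identity
target $\mu_I I_p$, where $\mu_I = \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}) / p$.
The data-driven weight $\rho \in [0, 1]$ is
$$\rho = \min\left\{1, \max\left(0, \frac{(1 - 2/p)\,\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) + \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}{(n + 1 - 2/p)\left[\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) - \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2 / p\right]}\right)\right\}$$
and
$$\Sigma = (1 - \rho)\,\hat\Sigma_{\mathrm{MLE}} + \rho\,\mu_I I_p.$$
Graphical lasso: estimate a sparse precision matrix
$\Theta$ by maximising
$$\log\det\Theta - \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}\Theta) - \lambda \sum_{i \ne j} |\theta_{ij}|$$
with $\lambda \ge 0$. The returned covariance matrix is
$\Sigma = \Theta^{-1}$.
The method then ensures positive definiteness of $\Sigma$ (adding a very
small diagonal jitter only if necessary) and computes the precision
matrix $\Theta = \Sigma^{-1}$. Partial correlations are obtained by
standardising the off-diagonals of $\Theta$:
$$\mathrm{pcor}_{ij} = -\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \quad i \ne j, \qquad \mathrm{pcor}_{ii} = 1.$$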
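The standardisation step can be sketched in base R as follows. This is an illustration only, not the package's C++ code path; the helper name `pcor_from_cov()` is hypothetical and the example uses the unbiased sample covariance from `cov()`.

```r
# Hypothetical helper: partial correlations from a covariance matrix by
# inverting it and standardising the off-diagonals of the precision matrix.
pcor_from_cov <- function(Sigma) {
  Theta <- solve(Sigma)            # precision matrix
  d <- sqrt(diag(Theta))
  pcor <- -Theta / outer(d, d)     # -theta_ij / sqrt(theta_ii * theta_jj)
  diag(pcor) <- 1                  # diagonal is 1 by convention
  pcor
}

set.seed(1)
n <- 200; p <- 4
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + 0.5 * X[, 2]    # induce dependence between columns 1 and 2

pcor <- pcor_from_cov(cov(X))      # cov() uses the 1 / (n - 1) scaling
round(pcor, 3)
```

The same helper applies unchanged to a regularised covariance (ridge, OAS, or glasso), since only the matrix being inverted differs.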
If return_p_value = TRUE, the function also reports the classical
two-sided test p-values for the sample partial correlations, using
$$t_{ij} = \mathrm{pcor}_{ij}\,\sqrt{\frac{n - p}{1 - \mathrm{pcor}_{ij}^2}}$$
with $n - p$ degrees of freedom, for $n$ observations on $p$ variables.
These p-values are returned only for
method = "sample", where they match the standard full-model partial
correlation test.
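As a rough check of the test described above, the sketch below computes the t statistics and two-sided p-values in base R and compares one entry with the t value of the matching coefficient in a full linear model. It assumes complete data and the sample estimator; it does not call the package.

```r
set.seed(3)
n <- 120; p <- 5
X <- matrix(rnorm(n * p), n, p)

# Sample partial correlations from the inverse sample covariance
Theta <- solve(cov(X))
d <- sqrt(diag(Theta))
pcor <- -Theta / outer(d, d)
diag(pcor) <- NA                   # the test is undefined on the diagonal

# t statistic with n - p degrees of freedom (p - 2 conditioning variables)
df <- n - p
t_stat <- pcor * sqrt(df / (1 - pcor^2))
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Full-model check: regress column 1 on all other columns; the t value of
# the coefficient for column 2 should agree with t_stat[1, 2] up to rounding.
fit <- lm(X[, 1] ~ X[, -1])
t_ref <- summary(fit)$coefficients[2, "t value"]
c(from_pcor = t_stat[1, 2], from_lm = t_ref)
```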
When ci = TRUE, the function reports Fisher-$z$ confidence
intervals for the sample partial correlations. For a partial correlation
$r_{xy \cdot Z}$ conditioning on $c$ variables, the transformed
statistic is $z = \operatorname{atanh}(r_{xy \cdot Z})$ with standard
error
$$\mathrm{SE}(z) = \frac{1}{\sqrt{n - 3 - c}},$$
where $n$ is the effective complete-case sample size used for the
estimate. The two-sided normal-theory interval is formed on the transformed
scale using conf_level and then mapped back with tanh(). In
the full matrix path implemented here, each off-diagonal entry conditions on
all remaining variables, so $c = p - 2$ for $p$ variables and the classical
CI requires $n > p + 1$. This inference is only supported for
method = "sample" without positive-definiteness repair; in
unsupported or numerically singular settings, CI bounds are returned as
NA with an informative cli warning or the request is rejected.
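A minimal sketch of the interval for a single partial correlation, assuming the standard error $1 / \sqrt{n - 3 - c}$. The function name `fisher_z_partial_ci()` and its argument `k` (the number of conditioning variables) are hypothetical and not part of the package.

```r
# Hypothetical helper: Fisher-z interval for a partial correlation r that
# conditions on k variables, based on n complete cases.
fisher_z_partial_ci <- function(r, n, k = 0, conf_level = 0.95) {
  if (n - 3 - k <= 0) {
    return(c(lwr = NA_real_, upr = NA_real_))  # too few observations
  }
  se <- 1 / sqrt(n - 3 - k)
  zcrit <- qnorm(1 - (1 - conf_level) / 2)
  z <- atanh(r)
  c(lwr = tanh(z - zcrit * se), upr = tanh(z + zcrit * se))
}

# Five variables in total, so each pair conditions on k = 5 - 2 = 3 others
fisher_z_partial_ci(r = 0.30, n = 100, k = 3)
```

Each additional conditioning variable reduces the effective sample size by one, so the interval widens as more variables are conditioned on.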
Interpretation. For Gaussian data, $\mathrm{pcor}_{ij}$ equals
the correlation between residuals from regressing variable $i$ and
variable $j$ on all the remaining variables; equivalently, it encodes
conditional dependence in a Gaussian graphical model, where
$\mathrm{pcor}_{ij} = 0$ if and only if variables $i$ and $j$ are
conditionally independent given the others. Partial correlations are
invariant to separate positive rescalings of each
variable; in particular, multiplying $\Sigma$ by any positive scalar
leaves the partial correlations unchanged.
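Both properties can be checked numerically in base R. The sketch below uses simulated data and `lm()` residuals and does not call the package; the variable names are illustrative.

```r
set.seed(2)
n <- 300
Z <- matrix(rnorm(n * 2), n, 2)            # conditioning variables
x <- Z[, 1] + rnorm(n)
y <- Z[, 1] - Z[, 2] + 0.5 * x + rnorm(n)
X <- cbind(x = x, y = y, z1 = Z[, 1], z2 = Z[, 2])

# Partial correlation of x and y from the precision matrix
Theta <- solve(cov(X))
pcor_xy <- -Theta[1, 2] / sqrt(Theta[1, 1] * Theta[2, 2])

# Correlation of residuals after regressing x and y on the other variables
r_res <- cor(resid(lm(x ~ Z)), resid(lm(y ~ Z)))
c(precision = pcor_xy, residuals = r_res)

# Rescale every column by its own positive constant and recompute
X_scaled <- sweep(X, 2, c(10, 0.1, 3, 7), `*`)
Theta_s <- solve(cov(X_scaled))
pcor_xy_scaled <- -Theta_s[1, 2] / sqrt(Theta_s[1, 1] * Theta_s[2, 2])
all.equal(pcor_xy, pcor_xy_scaled)
```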
Why shrinkage/regularisation? When $p \ge n$, the sample
covariance is singular and inversion is ill-posed. Ridge and OAS both yield
well-conditioned $\Sigma$. Ridge adds a fixed $\lambda$ on the
diagonal, whereas OAS shrinks adaptively towards $\mu_I I_p$ with a
weight chosen to minimise (approximately) the Frobenius risk under a
Gaussian model, often improving mean-square accuracy in high dimension.
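The contrast between the three covariance estimates can be sketched in base R. The helper `oas_cov()` is hypothetical and uses the standard OAS weight of Chen et al.; that the package computes exactly this expression is an assumption.

```r
# Hypothetical helper: OAS shrinkage of the MLE covariance towards a scaled
# identity, with the weight clipped to [0, 1].
oas_cov <- function(X) {
  n <- nrow(X); p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))
  S_mle <- crossprod(Xc) / n
  tr_S <- sum(diag(S_mle))
  tr_S2 <- sum(S_mle^2)                    # tr(S %*% S) for symmetric S
  mu_I <- tr_S / p
  num <- (1 - 2 / p) * tr_S2 + tr_S^2
  den <- (n + 1 - 2 / p) * (tr_S2 - tr_S^2 / p)
  rho <- if (den > 0) min(1, max(0, num / den)) else 1
  list(cov = (1 - rho) * S_mle + rho * mu_I * diag(p), rho = rho)
}

set.seed(4)
n <- 30; p <- 50                           # more variables than observations
X <- matrix(rnorm(n * p), n, p)

S_unb <- cov(X)                            # singular: rank is at most n - 1
ridge <- S_unb + 0.01 * diag(p)            # fixed diagonal inflation
oas <- oas_cov(X)                          # adaptive shrinkage

min_eig <- function(M) min(eigen(M, symmetric = TRUE, only.values = TRUE)$values)
c(rank_sample = qr(S_unb)$rank, oas_rho = oas$rho)
c(ridge = min_eig(ridge), oas = min_eig(oas$cov))
```

Both regularised matrices have strictly positive eigenvalues and can be inverted, whereas the sample covariance cannot.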
Why glasso? Glasso is useful when the goal is not just to stabilise a covariance estimate, but to recover a manageable network of direct relationships rather than a dense matrix of overall associations. In Gaussian models, zeros in the precision matrix correspond to conditional independences, so glasso can suppress indirect associations that are explained by the other variables and return a smaller, more interpretable conditional-dependence graph. This is especially practical in high-dimensional settings, where the sample covariance may be unstable or singular. Glasso yields a positive-definite precision estimate and supports edge selection, graph recovery, and downstream network analysis.
Computational notes. The implementation forms $S$ using 'BLAS'
syrk when available and constructs partial correlations by traversing
only the upper triangle with 'OpenMP' parallelism. Positive definiteness is
verified via a Cholesky factorisation; if it fails, a tiny diagonal jitter is
increased geometrically up to a small cap, at which point the routine
signals an error.
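The positive-definiteness repair can be illustrated with a short base R sketch. The helper `ensure_pd()` and its starting value, growth factor, and cap are hypothetical choices for illustration; the package's actual constants are not documented here.

```r
# Hypothetical helper: try a Cholesky factorisation and, on failure, add a
# geometrically increasing diagonal jitter until it succeeds or a cap is hit.
ensure_pd <- function(Sigma, start = 1e-10, growth = 10, cap = 1e-4) {
  p <- nrow(Sigma)
  jitter <- 0
  repeat {
    candidate <- Sigma + jitter * diag(p)
    ok <- tryCatch({ chol(candidate); TRUE }, error = function(e) FALSE)
    if (ok) return(list(cov = candidate, jitter = jitter))
    jitter <- if (jitter == 0) start else jitter * growth
    if (jitter > cap) {
      stop("covariance is not positive definite within the jitter cap")
    }
  }
}

ensure_pd(diag(3))$jitter            # already positive definite: no jitter
ensure_pd(matrix(1, 2, 2))$jitter    # singular: a small jitter is added
```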
With the default return_details = FALSE, a standard
matrixCorr correlation result: a dense corr_matrix, sparse matrix,
or corr_edge_list, depending on output. With
return_details = TRUE, an object of class "partial_corr" (a
list) with elements:
pcor: partial correlation matrix.
cov (if requested): covariance matrix used.
precision (if requested): precision matrix $\Theta = \Sigma^{-1}$.
p_value (if requested): matrix of two-sided p-values for
the sample partial correlations.
ci (if requested): a list with elements est,
lwr.ci, upr.ci, conf.level, and ci.method.
diagnostics: metadata used for inference, including the
effective complete-case sample size and number of conditioned variables.
method: the estimator used ("oas", "ridge",
"sample", or "glasso").
lambda: ridge or graphical-lasso penalty
(or NA_real_).
rho: OAS shrinkage weight in $[0, 1]$ (or NA_real_).
jitter: diagonal jitter added (if any) to ensure positive
definiteness.
Invisibly returns x.
A compact summary object of class summary.partial_corr.
A ggplot object.
Chen, Y., Wiesel, A., & Hero, A. O. III (2011). Robust Shrinkage Estimation of High-dimensional Covariance Matrices. IEEE Transactions on Signal Processing.
Friedman, J., Hastie, T., & Tibshirani, R. (2007). Sparse inverse covariance estimation with the graphical lasso. Biostatistics.
Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365-411.
Schafer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 32.
## Structured MVN with known partial correlations
set.seed(42)
p <- 12; n <- 1000

## Build a tri-diagonal precision (Omega) so the true partial correlations
## are sparse
phi <- 0.35
Omega <- diag(p)
for (j in 1:(p - 1)) {
  Omega[j, j + 1] <- Omega[j + 1, j] <- -phi
}
## Strict diagonal dominance
diag(Omega) <- 1 + 2 * abs(phi) + 0.05
Sigma <- solve(Omega)

## Upper Cholesky
L <- chol(Sigma)
Z <- matrix(rnorm(n * p), n, p)
X <- Z %*% L
colnames(X) <- sprintf("V%02d", seq_len(p))

pc <- pcorr(X)
summary(pc)
estimate(pc)
tidy(pc)

## Fisher-z confidence intervals for sample partial correlations
pc_ci <- pcorr(X[, 1:5], ci = TRUE)
ci(pc_ci)
confint(pc_ci)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(pc)
}

## True partial correlation from Omega
pcor_true <- -Omega / sqrt(diag(Omega) %o% diag(Omega))
diag(pcor_true) <- 1

## Quick visual check (first 5x5 block)
round(estimate(pc)[1:5, 1:5], 2)
round(pcor_true[1:5, 1:5], 2)

## Plot method
plot(pc)

## Graphical-lasso example
set.seed(100)
p <- 20; n <- 250
Theta_g <- diag(p)
Theta_g[cbind(1:5, 2:6)] <- -0.25
Theta_g[cbind(2:6, 1:5)] <- -0.25
Theta_g[cbind(8:11, 9:12)] <- -0.20
Theta_g[cbind(9:12, 8:11)] <- -0.20
diag(Theta_g) <- rowSums(abs(Theta_g)) + 0.2
Sigma_g <- solve(Theta_g)
X_g <- matrix(rnorm(n * p), n, p) %*% chol(Sigma_g)
colnames(X_g) <- paste0("Node", seq_len(p))

gfit_1 <- pcorr(X_g, method = "glasso", lambda = 0.02,
                return_cov_precision = TRUE, return_details = TRUE)
gfit_2 <- pcorr(X_g, method = "glasso", lambda = 0.08,
                return_cov_precision = TRUE, return_details = TRUE)

## Larger lambda gives a sparser conditional-dependence graph
edge_count <- function(M, tol = 1e-8) {
  sum(abs(M[upper.tri(M, diag = FALSE)]) > tol)
}
c(edges_lambda_002 = edge_count(gfit_1$precision),
  edges_lambda_008 = edge_count(gfit_2$precision))

## Inspect strongest estimated conditional associations
pcor_g <- gfit_1$pcor
idx <- which(upper.tri(pcor_g), arr.ind = TRUE)
ord <- order(abs(pcor_g[idx]), decreasing = TRUE)
head(data.frame(
  i = rownames(pcor_g)[idx[ord, 1]],
  j = colnames(pcor_g)[idx[ord, 2]],
  pcor = round(pcor_g[idx][ord], 2)
))

## High-dimensional case p >> n
set.seed(7)
n <- 60; p <- 120

ar_block <- function(m, rho = 0.6) rho^abs(outer(seq_len(m), seq_len(m), "-"))

## Two AR(1) blocks on the diagonal
if (requireNamespace("Matrix", quietly = TRUE)) {
  Sigma_hd <- as.matrix(Matrix::bdiag(ar_block(60, 0.6), ar_block(60, 0.6)))
} else {
  Sigma_hd <- rbind(
    cbind(ar_block(60, 0.6), matrix(0, 60, 60)),
    cbind(matrix(0, 60, 60), ar_block(60, 0.6))
  )
}

L <- chol(Sigma_hd)
X_hd <- matrix(rnorm(n * p), n, p) %*% L
colnames(X_hd) <- paste0("G", seq_len(p))

pc_oas <- pcorr(X_hd, method = "oas",
                return_cov_precision = TRUE, return_details = TRUE)
pc_ridge <- pcorr(X_hd, method = "ridge", lambda = 1e-2,
                  return_cov_precision = TRUE, return_details = TRUE)
pc_samp <- pcorr(X_hd, method = "sample",
                 return_cov_precision = TRUE, return_details = TRUE)
pc_glasso <- pcorr(X_hd, method = "glasso", lambda = 5e-3,
                   return_cov_precision = TRUE, return_details = TRUE)

## Show how much diagonal regularisation was used
c(oas_jitter = pc_oas$jitter,
  ridge_lambda = pc_ridge$lambda,
  sample_jitter = pc_samp$jitter,
  glasso_lambda = pc_glasso$lambda)

## Compare conditioning of the estimated covariance matrices
c(kappa_oas = kappa(pc_oas$cov),
  kappa_ridge = kappa(pc_ridge$cov),
  kappa_sample = kappa(pc_samp$cov))

## Simple conditional-dependence graph from partial correlations
pcor <- pc_oas$pcor
vals <- abs(pcor[upper.tri(pcor, diag = FALSE)])
thresh <- quantile(vals, 0.98)  # top 2%
edges <- which(abs(pcor) > thresh & upper.tri(pcor), arr.ind = TRUE)
head(data.frame(i = colnames(pcor)[edges[, 1]],
                j = colnames(pcor)[edges[, 2]],
                pcor = round(pcor[edges], 2)))
Computes pairwise Pearson correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Optional Fisher-z confidence intervals are available.
pearson_corr(
  data,
  na_method = c("error", "pairwise", "complete"),
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  ...
)

## S3 method for class 'pearson_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

## S3 method for class 'pearson_corr'
plot(
  x,
  title = "Pearson correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)

## S3 method for class 'pearson_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.pearson_corr'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values. |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default |
conf_level |
Confidence level used when |
n_threads |
Integer |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print in the concordance |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
ci_digits |
Integer; digits for Pearson confidence limits in the pairwise summary. |
show_ci |
One of |
title |
Plot title. Default is |
low_color |
Color for the minimum correlation. Default is
|
high_color |
Color for the maximum correlation. Default is
|
mid_color |
Color for zero correlation. Default is |
value_text_size |
Font size for displaying correlation values. Default
is |
ci_text_size |
Text size for confidence intervals in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
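Let $X \in \mathbb{R}^{n \times p}$ be a
numeric matrix with rows as observations and columns as variables, and let
$\mathbf{1} \in \mathbb{R}^{n}$ denote the all-ones vector. Define the column
means $\mu = \tfrac{1}{n} X^\top \mathbf{1}$ and the centred cross-product
matrix
$$S = (X - \mathbf{1}\mu^\top)^\top (X - \mathbf{1}\mu^\top) = X^\top X - n\,\mu\mu^\top.$$
The (unbiased) sample covariance is
$$\hat\Sigma = \frac{1}{n - 1}\, S,$$
and the sample standard deviations are $s_i = \sqrt{\hat\Sigma_{ii}}$.
The Pearson correlation matrix is obtained by standardising $\hat\Sigma$, and it is given by
$$R = D^{-1/2}\, \hat\Sigma\, D^{-1/2}, \qquad D = \mathrm{diag}(\hat\Sigma_{11}, \ldots, \hat\Sigma_{pp}),$$
equivalently, entrywise $R_{ij} = \hat\Sigma_{ij} / (s_i s_j)$ for
$i \ne j$ and $R_{ii} = 1$. With $1/(n - 1)$ scaling,
$\hat\Sigma$ is unbiased for the covariance; the induced
correlations are biased in finite samples.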
The implementation forms $X^\top X$ via a BLAS
symmetric rank-$k$ update (SYRK) on the upper triangle, then applies the
rank-1 correction $-n\,\mu\mu^\top$ to obtain $S$ without
explicitly materialising $X - \mathbf{1}\mu^\top$. After scaling by
$1/(n - 1)$, triangular normalisation by $D^{-1/2}$ yields $R$,
which is then symmetrised to remove round-off asymmetry. Tiny negative values
on the covariance diagonal due to floating-point rounding are truncated to
zero before taking square roots.
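The same sequence of steps can be mirrored in base R to see that it reproduces `cor()`. This is only an illustration of the algebra on complete data; the package performs these steps in C++ and the variable names below are illustrative.

```r
set.seed(5)
n <- 50; p <- 3
X <- matrix(rnorm(n * p), n, p)

mu <- colMeans(X)
S <- crossprod(X) - n * tcrossprod(mu)     # X'X - n * mu mu', no centred copy
Sigma_hat <- S / (n - 1)                   # unbiased sample covariance

s <- sqrt(pmax(diag(Sigma_hat), 0))        # truncate tiny negatives to zero
R <- Sigma_hat / outer(s, s)               # standardise by s_i * s_j
R <- (R + t(R)) / 2                        # remove round-off asymmetry
diag(R) <- 1

all.equal(Sigma_hat, cov(X))
all.equal(R, cor(X))
```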
If a variable has zero variance ($s_i = 0$), the corresponding row and
column of $R$ are set to NA. When
na_method = "pairwise", each correlation is recomputed on
the pairwise complete-case overlap of columns $i$ and $j$.
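When ci = TRUE, Fisher-$z$ confidence intervals are computed from
the observed pairwise Pearson correlation $r_{ij}$ and the pairwise
complete-case sample size $n_{ij}$:
$$z_{ij} = \operatorname{atanh}(r_{ij}), \qquad \mathrm{SE}(z_{ij}) = \frac{1}{\sqrt{n_{ij} - 3}}.$$
With $\alpha = 1 - \mathrm{conf\_level}$ and $z_{1 - \alpha/2} = \Phi^{-1}(1 - \alpha/2)$, the confidence limits are
$$\tanh\left(z_{ij} \pm z_{1 - \alpha/2}\, \mathrm{SE}(z_{ij})\right).$$
Confidence intervals are reported only when $n_{ij} > 3$.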
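For a single pair, the Fisher-z interval can be computed by hand and compared with `stats::cor.test()`, which uses the same transformation for Pearson correlations. The sketch assumes complete data and does not call the package.

```r
set.seed(6)
x <- rnorm(40)
y <- 0.6 * x + rnorm(40)

r <- cor(x, y)
n <- length(x)
conf_level <- 0.95

z <- atanh(r)                               # Fisher transform
se <- 1 / sqrt(n - 3)
zcrit <- qnorm(1 - (1 - conf_level) / 2)
ci_manual <- tanh(z + c(-1, 1) * zcrit * se)

ci_ref <- as.numeric(cor.test(x, y, conf.level = conf_level)$conf.int)
rbind(manual = ci_manual, cor.test = ci_ref)
```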
Computational complexity. The dominant cost is $O(n p^2)$ flops
with $O(p^2)$ memory.
A symmetric numeric matrix where the (i, j)-th element is
the Pearson correlation between the i-th and j-th
numeric columns of the input. When ci = TRUE, the object also
carries a ci attribute with elements est, lwr.ci,
upr.ci, and conf.level. When pairwise-complete evaluation is
used, pairwise sample sizes are stored in attr(x, "diagnostics")$n_complete.
Invisibly returns the pearson_corr object.
A ggplot object representing the heatmap.
na_method = "complete" is useful when a common analysis sample is
required across all matrix entries. For covariance- or cross-product-based
correlations, it also avoids the non-positive-semidefinite matrices that can
arise from pairwise deletion.
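A small constructed example shows how pairwise deletion can yield a matrix that is not positive semidefinite. It uses base `cor()` with `use = "pairwise.complete.obs"` rather than the package, and the data are artificial.

```r
# Each pair of columns overlaps on a different block of three rows
x1 <- c(1, 2, 3,  1,  2,  3, NA, NA, NA)
x2 <- c(1, 2, 3, NA, NA, NA,  1,  2,  3)
x3 <- c(NA, NA, NA, 1, 2, 3,  3,  2,  1)
X <- cbind(x1, x2, x3)

# r12 = 1 and r13 = 1, but r23 = -1: no single data set could produce this
R_pw <- cor(X, use = "pairwise.complete.obs")
R_pw

# The smallest eigenvalue is negative, so R_pw is not a valid correlation matrix
eig <- eigen(R_pw, symmetric = TRUE, only.values = TRUE)$values
min(eig)
```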
Thiago de Paula Oliveira
Pearson, K. (1895). "Notes on regression and inheritance in the case of two parents". Proceedings of the Royal Society of London, 58, 240-242.
print.pearson_corr, plot.pearson_corr
## MVN with AR(1) correlation
set.seed(123)
p <- 6; n <- 300; rho <- 0.5
# true correlation
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
L <- chol(Sigma)
# MVN(n, 0, Sigma)
X <- matrix(rnorm(n * p), n, p) %*% L
colnames(X) <- paste0("V", seq_len(p))

pr <- pearson_corr(X)
print(pr, digits = 2)
summary(pr)
estimate(pr)
tidy(pr)
plot(pr)

## Confidence intervals
pr_ci <- pearson_corr(X[, 1:3], ci = TRUE)
ci(pr_ci)
confint(pr_ci)

## Compare the sample estimate to the truth
Rhat <- cor(X)
# estimated
round(Rhat[1:4, 1:4], 2)
# true
round(Sigma[1:4, 1:4], 2)
off <- upper.tri(Sigma, diag = FALSE)
# MAE on off-diagonals
mean(abs(Rhat[off] - Sigma[off]))

## Larger n reduces sampling error
n2 <- 2000
X2 <- matrix(rnorm(n2 * p), n2, p) %*% L
Rhat2 <- cor(X2)
off <- upper.tri(Sigma, diag = FALSE)
## mean absolute error (MAE) of the off-diagonal correlations
mean(abs(Rhat2[off] - Sigma[off]))

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(pr)
}
S3 Plot for Edge-List Correlation Results
## S3 method for class 'corr_edge_list'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)
x |
An edge-list correlation result. |
title |
Optional plot title. |
low_color |
Fill color for -1. |
high_color |
Fill color for +1. |
mid_color |
Fill color for 0. |
value_text_size |
Text size for optional overlaid values. |
ci_text_size |
Text size for optional confidence-interval labels. |
show_value |
Logical; overlay values if |
... |
Additional theme arguments. |
S3 Plot for Dense Correlation Results
## S3 method for class 'corr_matrix'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)
x |
A dense correlation result ( |
title |
Optional plot title. |
low_color |
Fill color for -1. |
high_color |
Fill color for +1. |
mid_color |
Fill color for 0. |
value_text_size |
Text size for optional overlaid values. |
ci_text_size |
Text size for optional confidence-interval labels. |
show_value |
Logical; overlay values if |
... |
Additional theme arguments. |
A ggplot heatmap.
S3 Plot for Sparse Correlation Results
## S3 method for class 'corr_sparse'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)
x |
A sparse correlation result. |
title |
Optional plot title. |
low_color |
Fill color for -1. |
high_color |
Fill color for +1. |
mid_color |
Fill color for 0. |
value_text_size |
Text size for optional overlaid values. |
ci_text_size |
Text size for optional confidence-interval labels. |
show_value |
Logical; overlay values if |
... |
Additional theme arguments. |
Plot probability-of-agreement results
## S3 method for class 'prob_agree'
plot(
  x,
  threshold = NULL,
  title = "Probability of agreement",
  style = c("auto", "curve", "facet", "heatmap"),
  show_ci = NULL,
  ...
)
x |
An object returned by |
threshold |
Optional probability threshold shown in curve plots. |
title |
Optional plot title. |
style |
Plot style: |
show_ci |
Logical; for curve plots, controls whether confidence ribbons are shown. By default ribbons are shown for one comparison and suppressed for multiple comparisons. |
... |
Additional arguments passed to |
A ggplot object.
Computes the polychoric correlation for either a pair of ordinal variables or all pairwise combinations of ordinal columns in a matrix/data frame.
polychoric(
  data,
  y = NULL,
  na_method = c("error", "pairwise"),
  ci = FALSE,
  p_value = FALSE,
  conf_level = 0.95,
  correct = 0.5,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  ...
)

## S3 method for class 'polychoric_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'polychoric_corr'
plot(
  x,
  title = "Polychoric correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

## S3 method for class 'polychoric_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  p_digits = 4,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.polychoric_corr'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)
data |
An ordinal vector, matrix, or data frame. Supported columns are factors, ordered factors, logical values, or integer-like numerics. In matrix/data-frame mode, only supported ordinal columns are retained. |
y |
Optional second ordinal vector. When supplied, the function returns a single polychoric correlation estimate. |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default |
p_value |
Logical (default |
conf_level |
Confidence level used when |
correct |
Non-negative continuity correction added to zero-count cells.
Default is |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
title |
Plot title. Default is |
low_color |
Color for the minimum correlation. |
high_color |
Color for the maximum correlation. |
mid_color |
Color for zero correlation. |
value_text_size |
Font size used in tile labels. |
show_value |
Logical; if |
object |
An object of class |
ci_digits |
Integer; digits for confidence limits in the pairwise summary. |
p_digits |
Integer; digits for p-values in the pairwise summary. |
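The polychoric correlation generalises the tetrachoric model to ordered
categorical variables with more than two levels. It assumes latent
standard-normal variables $(Z_1, Z_2)$ with correlation $\rho$, and
cut-points
$$-\infty = \alpha_0 < \alpha_1 < \cdots < \alpha_{R-1} < \alpha_R = \infty$$
and
$$-\infty = \beta_0 < \beta_1 < \cdots < \beta_{C-1} < \beta_C = \infty$$
such that
$$X = r \iff \alpha_{r-1} < Z_1 \le \alpha_r, \qquad Y = c \iff \beta_{c-1} < Z_2 \le \beta_c.$$
For an observed $R \times C$ contingency table with counts $n_{rc}$,
row totals $n_{r\cdot}$, column totals $n_{\cdot c}$, and grand total $n$,
the thresholds are estimated from the marginal cumulative proportions:
$$\hat\alpha_r = \Phi^{-1}\left(\frac{1}{n}\sum_{k \le r} n_{k\cdot}\right), \qquad \hat\beta_c = \Phi^{-1}\left(\frac{1}{n}\sum_{l \le c} n_{\cdot l}\right).$$
Holding those thresholds fixed, the log-likelihood for the latent correlation is
$$\ell(\rho) = \sum_{r=1}^{R}\sum_{c=1}^{C} n_{rc}\, \log \Pr_\rho\left(\hat\alpha_{r-1} < Z_1 \le \hat\alpha_r,\ \hat\beta_{c-1} < Z_2 \le \hat\beta_c\right)$$
and the estimator returned is the maximiser over $\rho \in (-1, 1)$.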
The C++ implementation performs a dense one-dimensional search followed by
Brent refinement.
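The two-step estimator can be sketched in base R under stated assumptions: rectangle probabilities are obtained by one-dimensional numerical integration with `integrate()`, the search uses `optimize()` (golden-section search with parabolic interpolation) instead of the package's dense search plus Brent refinement, and no continuity correction is applied. The helper names `rect_prob()` and `polychoric_2step()` are hypothetical.

```r
# P(a_lo < Z1 <= a_hi, b_lo < Z2 <= b_hi) for a standard bivariate normal,
# integrating the conditional probability of Z2 given Z1 = z over z.
rect_prob <- function(a_lo, a_hi, b_lo, b_hi, rho) {
  s <- sqrt(1 - rho^2)
  f <- function(z) {
    dnorm(z) * (pnorm((b_hi - rho * z) / s) - pnorm((b_lo - rho * z) / s))
  }
  integrate(f, a_lo, a_hi)$value
}

polychoric_2step <- function(tab) {
  n <- sum(tab)
  # Step 1: thresholds from the marginal cumulative proportions
  a <- c(-Inf, qnorm(cumsum(rowSums(tab)) / n)); a[length(a)] <- Inf
  b <- c(-Inf, qnorm(cumsum(colSums(tab)) / n)); b[length(b)] <- Inf
  # Step 2: maximise the log-likelihood in rho with thresholds held fixed
  loglik <- function(rho) {
    ll <- 0
    for (r in seq_len(nrow(tab))) {
      for (k in seq_len(ncol(tab))) {
        if (tab[r, k] > 0) {
          pr <- rect_prob(a[r], a[r + 1], b[k], b[k + 1], rho)
          ll <- ll + tab[r, k] * log(max(pr, .Machine$double.xmin))
        }
      }
    }
    ll
  }
  optimize(loglik, interval = c(-0.99, 0.99), maximum = TRUE)$maximum
}

# Simulated latent bivariate normal data with correlation 0.6, then discretised
set.seed(8)
n <- 2000; rho_true <- 0.6
z1 <- rnorm(n)
z2 <- rho_true * z1 + sqrt(1 - rho_true^2) * rnorm(n)
x <- cut(z1, c(-Inf, -0.5, 0.5, Inf))
y <- cut(z2, c(-Inf, 0, 1, Inf))
rho_hat <- polychoric_2step(table(x, y))
rho_hat
```

With a table this dense the estimate should land close to the simulated latent correlation; sparse tables with empty cells are where the continuity correction described next becomes relevant.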
The argument correct adds a non-negative continuity correction to
empty cells before marginal threshold estimation and likelihood evaluation.
This avoids numerical failures for sparse tables with structurally zero cells.
When correct = 0 and zero cells are present, the corresponding fit can
be boundary-driven rather than a regular interior maximum-likelihood problem.
The returned object stores sparse-fit diagnostics and the thresholds used for
estimation so those cases can be inspected explicitly.
Assumptions. The coefficient is appropriate when both observed ordinal variables are viewed as discretisations of jointly normal latent variables. The optional p-values and confidence intervals adopt this latent-normal interpretation and use the same likelihood that defines the polychoric estimate. These inferential quantities are therefore model-based and should not be interpreted as distribution-free summaries.
Inference. When ci = TRUE or p_value = TRUE, the
function refits the pairwise polychoric model by maximum likelihood and
obtains the observed information matrix numerically in C++. The reported
confidence interval is a Wald interval
$\hat\rho \pm z_{1-\alpha/2}\,\mathrm{SE}(\hat\rho)$, and the
reported p-value is from the large-sample Wald $z$-test for
$H_0: \rho = 0$. These inferential quantities are only computed when
explicitly requested.
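As a rough illustration of Wald inference from a numerically differentiated log-likelihood, the sketch below handles a single scalar parameter. The function `wald_from_loglik()` is hypothetical, and the toy likelihood (a normal mean with unit variance) is chosen only because its standard error is known exactly; the package works with the full information matrix of the latent model.

```r
# Hypothetical helper: Wald interval and z-test from a scalar log-likelihood.
wald_from_loglik <- function(loglik, est, null = 0, conf_level = 0.95, h = 1e-3) {
  # Central finite-difference second derivative gives minus the observed information
  d2 <- (loglik(est + h) - 2 * loglik(est) + loglik(est - h)) / h^2
  se <- sqrt(1 / -d2)
  z_crit <- qnorm(1 - (1 - conf_level) / 2)
  stat <- (est - null) / se
  list(
    estimate  = est,
    se        = se,
    lwr       = est - z_crit * se,
    upr       = est + z_crit * se,
    statistic = stat,
    p_value   = 2 * pnorm(-abs(stat))
  )
}

# Toy example: normal mean with known unit variance, so SE = 1 / sqrt(n)
set.seed(42)
obs <- rnorm(100, mean = 0.3)
ll_mean <- function(mu) -0.5 * sum((obs - mu)^2)
w <- wald_from_loglik(ll_mean, est = mean(obs))
w$se
```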
In matrix/data-frame mode, all pairwise polychoric correlations are computed
between supported ordinal columns. Diagonal entries are 1 for
non-degenerate columns and NA when a column has fewer than two
observed levels.
Computational complexity. For $p$ ordinal variables, the matrix
path evaluates $p(p-1)/2$ bivariate likelihoods. Each pair optimises a
single scalar parameter $\rho$, so the main cost is repeated evaluation
of bivariate normal rectangle probabilities.
If y is supplied, a numeric scalar with attributes
diagnostics and thresholds. Otherwise a symmetric matrix of
class polychoric_corr with attributes method,
description, package = "matrixCorr", diagnostics,
thresholds, and correct. When p_value = TRUE, the
returned object also carries an inference attribute with elements
estimate, statistic, parameter, p_value, and
n_obs. When ci = TRUE, it also carries a ci attribute
with elements est, lwr.ci, upr.ci, conf.level,
and ci.method, plus attr(x, "conf.level"). Scalar outputs keep
the same point estimate and gain the same metadata only when inference is
requested. In matrix mode, output = "edge_list" returns a data frame with columns
row, col, value; output = "sparse" returns a
symmetric sparse matrix.
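To make the edge-list representation concrete, the following sketch converts a symmetric correlation matrix into a data frame with columns row, col, and value. It assumes that each unordered pair is listed once (upper triangle) and that entries are kept when their absolute value is at least `threshold`; both are assumptions about the filtering rule, and `edge_list_sketch()` is a hypothetical helper.

```r
# Hypothetical helper: symmetric matrix -> filtered edge list.
edge_list_sketch <- function(m, threshold = 0, diag = TRUE) {
  keep <- upper.tri(m, diag = diag)        # one entry per unordered pair
  idx <- which(keep, arr.ind = TRUE)       # columns are named "row" and "col"
  out <- data.frame(
    row   = rownames(m)[idx[, "row"]],
    col   = colnames(m)[idx[, "col"]],
    value = m[idx],
    stringsAsFactors = FALSE
  )
  out[abs(out$value) >= threshold, , drop = FALSE]
}

vars <- c("y1", "y2", "y3")
m <- matrix(c(1.0, 0.6, 0.2,
              0.6, 1.0, 0.5,
              0.2, 0.5, 1.0), 3, 3, dimnames = list(vars, vars))
edges <- edge_list_sketch(m, threshold = 0.3, diag = FALSE)
edges
```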
Thiago de Paula Oliveira
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.
set.seed(124)
n <- 1200
Sigma <- matrix(c(
  1.00, 0.60, 0.40,
  0.60, 1.00, 0.50,
  0.40, 0.50, 1.00
), 3, 3, byrow = TRUE)

Z <- mnormt::rmnorm(n = n, mean = rep(0, 3), varcov = Sigma)
Y <- data.frame(
  y1 = ordered(cut(
    Z[, 1],
    breaks = c(-Inf, -0.7, 0.4, Inf),
    labels = c("low", "mid", "high")
  )),
  y2 = ordered(cut(
    Z[, 2],
    breaks = c(-Inf, -1.0, -0.1, 0.8, Inf),
    labels = c("1", "2", "3", "4")
  )),
  y3 = ordered(cut(
    Z[, 3],
    breaks = c(-Inf, -0.4, 0.2, 1.1, Inf),
    labels = c("A", "B", "C", "D")
  ))
)

pc <- polychoric(Y)
print(pc, digits = 3)
summary(pc)
estimate(pc)
tidy(pc)

pc_ci <- polychoric(Y, ci = TRUE)
ci(pc_ci)
confint(pc_ci)
plot(pc)

polychoric(Y, output = "edge_list", threshold = 0.3, diag = FALSE)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(pc)
}

# latent Pearson correlations used to generate the ordinal variables
round(stats::cor(Z), 2)
Computes polyserial correlations between continuous variables in data
and ordinal variables in y. Both pairwise vector mode and rectangular
matrix/data-frame mode are supported.
polyserial(data, y, na_method = c("error", "pairwise"), ci = FALSE,
  p_value = FALSE, conf_level = 0.95, ...)

## S3 method for class 'polyserial_corr'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)

## S3 method for class 'polyserial_corr'
plot(
  x,
  title = "Polyserial correlation heatmap",
  low_color = "indianred1", high_color = "steelblue1", mid_color = "white",
  value_text_size = 4, show_value = TRUE, ...
)

## S3 method for class 'polyserial_corr'
summary(
  object, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  ci_digits = 3, p_digits = 4, show_ci = NULL, ...
)

## S3 method for class 'summary.polyserial_corr'
print(
  x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)
data |
A numeric vector, matrix, or data frame containing continuous variables. |
y |
An ordinal vector, matrix, or data frame containing ordinal variables. Supported columns are factors, ordered factors, logical values, or integer-like numerics. |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default |
p_value |
Logical (default |
conf_level |
Confidence level used when |
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
title |
Plot title. Default is |
low_color |
Color for the minimum correlation. |
high_color |
Color for the maximum correlation. |
mid_color |
Color for zero correlation. |
value_text_size |
Font size used in tile labels. |
show_value |
Logical; if |
object |
An object of class |
ci_digits |
Integer; digits for confidence limits in the pairwise summary. |
p_digits |
Integer; digits for p-values in the pairwise summary. |
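The polyserial correlation assumes a latent bivariate normal model between a
continuous variable and an unobserved continuous propensity underlying an
ordinal variable. Let
$$(X, Z)^\top \sim N_2(0, \Sigma)$$
with
$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, and suppose the observed ordinal response
$Y$ is formed by cut-points
$-\infty = \tau_0 < \tau_1 < \cdots < \tau_K = \infty$:
$$Y = k \iff \tau_{k-1} < Z \le \tau_k.$$
After standardising the observed continuous variable $X$, the thresholds
are estimated from the marginal proportions of $Y$. Conditional on an
observed $x$, the category probability is
$$P(Y = k \mid X = x) = \Phi\!\left(\frac{\tau_k - \rho x}{\sqrt{1 - \rho^2}}\right) - \Phi\!\left(\frac{\tau_{k-1} - \rho x}{\sqrt{1 - \rho^2}}\right).$$
The returned estimate maximises the log-likelihood
$$\ell(\rho) = \sum_{i=1}^{n} \log P(Y = y_i \mid X = x_i;\ \rho)$$
over $\rho \in (-1, 1)$ via a one-dimensional Brent search in C++.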
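A minimal base-R sketch of this conditional-likelihood estimator follows. It is an illustration under stated assumptions rather than the package's code: `polyserial_sketch()` is a hypothetical name, thresholds come from the marginal cumulative proportions of the ordinal variable, and `optimize()` plays the role of the one-dimensional Brent search.

```r
# Hypothetical helper: one-parameter polyserial estimate.
polyserial_sketch <- function(x, y) {
  y <- as.integer(factor(y))              # ordinal codes 1..K in level order
  z <- (x - mean(x)) / sd(x)              # standardise the continuous variable
  K <- max(y)
  # Thresholds from marginal cumulative proportions of y
  tau <- c(-Inf, qnorm(cumsum(tabulate(y, K))[-K] / length(y)), Inf)
  loglik <- function(rho) {
    s <- sqrt(1 - rho^2)
    p <- pnorm((tau[y + 1] - rho * z) / s) - pnorm((tau[y] - rho * z) / s)
    sum(log(pmax(p, 1e-300)))
  }
  optimize(loglik, interval = c(-0.99, 0.99), maximum = TRUE)$maximum
}

# Simulated check: latent correlation 0.5, three ordinal levels
set.seed(125)
n <- 1000
x <- rnorm(n)
latent <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)
y <- ordered(cut(latent, c(-Inf, -0.5, 0.7, Inf), labels = c("low", "mid", "high")))
ps_hat <- polyserial_sketch(x, y)
ps_hat
```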
Assumptions. The coefficient is appropriate when the ordinal variable is viewed as the discretised version of a latent normal variable that is jointly normal with the observed continuous variable. The optional p-values and confidence intervals adopt this latent-normal interpretation and use the same likelihood that defines the polyserial estimate. These inferential quantities are therefore model-based and should not be interpreted as distribution-free summaries.
Inference. When ci = TRUE or p_value = TRUE, the
function refits the pairwise polyserial model by maximum likelihood and
obtains the observed information matrix numerically in C++. The reported
confidence interval is a Wald interval
$\hat\rho \pm z_{1-\alpha/2}\,\mathrm{SE}(\hat\rho)$, and the
reported p-value is from the large-sample Wald $z$-test for
$H_0: \rho = 0$. These inferential quantities are only computed when
explicitly requested.
In vector mode a single estimate is returned. In matrix/data-frame mode,
every numeric column of data is paired with every ordinal column of
y, producing a rectangular matrix of continuous-by-ordinal
polyserial correlations.
Computational complexity. If data has $p_x$ continuous
columns and y has $p_y$ ordinal columns, the matrix path computes
$p_x p_y$ separate one-parameter likelihood optimisations.
If both data and y are vectors, a numeric scalar. Otherwise a
numeric matrix of class polyserial_corr with rows corresponding to
the continuous variables in data and columns to the ordinal variables
in y. Matrix outputs carry attributes method,
description, and package = "matrixCorr". When
p_value = TRUE, the returned object also carries an inference
attribute with elements estimate, statistic, parameter,
p_value, and n_obs. When ci = TRUE, it also carries a
ci attribute with elements est, lwr.ci,
upr.ci, conf.level, and ci.method, plus
attr(x, "conf.level"). Scalar outputs keep the same point estimate
and gain the same metadata only when inference is requested.
Thiago de Paula Oliveira
Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347.
set.seed(125)
n <- 1000
Sigma <- matrix(c(
  1.00, 0.30, 0.55, 0.20,
  0.30, 1.00, 0.25, 0.50,
  0.55, 0.25, 1.00, 0.40,
  0.20, 0.50, 0.40, 1.00
), 4, 4, byrow = TRUE)

Z <- mnormt::rmnorm(n = n, mean = rep(0, 4), varcov = Sigma)
X <- data.frame(x1 = Z[, 1], x2 = Z[, 2])
Y <- data.frame(
  y1 = ordered(cut(
    Z[, 3],
    breaks = c(-Inf, -0.5, 0.7, Inf),
    labels = c("low", "mid", "high")
  )),
  y2 = ordered(cut(
    Z[, 4],
    breaks = c(-Inf, -1.0, 0.0, 1.0, Inf),
    labels = c("1", "2", "3", "4")
  ))
)

ps <- polyserial(X, Y)
print(ps, digits = 3)
summary(ps)
estimate(ps)
tidy(ps)

ps_ci <- polyserial(X, Y, ci = TRUE)
ci(ps_ci)
confint(ps_ci)
plot(ps)
ccc_ci
For compatibility with objects that still carry class "ccc_ci".
## S3 method for class 'ccc_ci'
print(x, ...)
x |
A |
... |
Passed to underlying printers. |
Print Edge-List Correlation Results
## S3 method for class 'corr_edge_list'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)
x |
An edge-list correlation result. |
digits |
Number of digits for numeric values. |
n |
Optional preview row threshold. |
topn |
Optional number of head/tail rows when preview is truncated. |
max_vars |
Optional maximum number of visible columns in preview. |
width |
Optional output width. |
show_ci |
One of |
... |
Unused. |
Print method for matrixCorr CCC objects
## S3 method for class 'matrixCorr_ccc'
print(
  x, digits = 4, ci_digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)
x |
A |
digits |
Number of digits for CCC estimates. |
ci_digits |
Number of digits for CI bounds. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Passed to underlying printers. |
Print method for matrixCorr CCC objects with CIs
## S3 method for class 'matrixCorr_ccc_ci'
print(x, ...)
x |
A |
... |
Passed to underlying printers. |
Print, summarize, and plot methods for pairwise
repeated-measures correlation objects of class "rmcorr" and
"summary.rmcorr".
## S3 method for class 'rmcorr'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)

## S3 method for class 'rmcorr'
summary(
  object, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)

## S3 method for class 'summary.rmcorr'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)

## S3 method for class 'rmcorr'
plot(
  x, title = NULL, point_alpha = 0.8, line_width = 0.8,
  show_legend = FALSE, show_value = TRUE, ...
)
x |
An object of class |
digits |
Number of significant digits to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional arguments passed to downstream methods. For
|
object |
An object of class |
title |
Optional plot title for |
point_alpha |
Alpha transparency for scatterplot points. |
line_width |
Line width for subject-specific fitted lines. |
show_legend |
Logical; if |
show_value |
Logical; included for a consistent plotting interface. Pairwise repeated-measures plots do not overlay numeric cell values, so this argument currently has no effect. |
Print, summarize, and plot methods for repeated-measures
correlation matrix objects of class "rmcorr_matrix" and
"summary.rmcorr_matrix".
## S3 method for class 'rmcorr_matrix'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)

## S3 method for class 'rmcorr_matrix'
plot(
  x,
  title = "Repeated-measures correlation heatmap",
  low_color = "indianred1", high_color = "steelblue1", mid_color = "white",
  value_text_size = 4, show_value = TRUE, ...
)

## S3 method for class 'rmcorr_matrix'
summary(
  object, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)

## S3 method for class 'summary.rmcorr_matrix'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)
x |
An object of class |
digits |
Number of significant digits to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional arguments passed to downstream methods. |
title |
Plot title for |
low_color, high_color, mid_color
|
Colours used for negative, positive, and midpoint values in the heatmap. |
value_text_size |
Size of the overlaid numeric value labels in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
Print Standardized Correlation Summaries
## S3 method for class 'summary.corr_result'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)
x |
A |
digits |
Number of digits for numeric values. |
n |
Optional preview row threshold. |
topn |
Optional number of head/tail rows when preview is truncated. |
max_vars |
Optional maximum number of visible columns in preview. |
width |
Optional output width. |
show_ci |
One of |
... |
Unused. |
Invisibly returns x.
Prints compact summary statistics returned by
summary.tetrachoric_corr(), summary.polychoric_corr(),
summary.polyserial_corr(), and summary.biserial_corr().
## S3 method for class 'summary.latent_corr'
print(x, digits = 4, ...)
x |
An object of class |
digits |
Integer; number of decimal places to print. |
... |
Unused. |
Invisibly returns x.
Prints compact summary statistics returned by matrix-style
summary() methods in matrixCorr.
## S3 method for class 'summary.matrixCorr'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Unused. |
Invisibly returns x.
prob_agree() implements the probability of agreement following Stevens and
Anderson-Cook (2017). It fits binomial reliability curves by population or
generation and estimates, for each pair, the probability that the fitted
reliabilities differ by no more than a user-specified practically negligible
amount.
This is not a correlation coefficient and should not be interpreted as linear association. It is a tolerance-based agreement measure. It is computed from the sampling distribution of the estimated difference between two reliability curves.
prob_agree(
  data, response, predictor, group,
  delta = NULL, limits = NULL,
  link = c("logit", "probit"),
  newdata = NULL, grid_size = 100L,
  ci = TRUE, conf_level = 0.95,
  max_iter = 50L, tol = 1e-08, verbose = FALSE
)
data |
A data frame containing the response, predictor, and group variables. |
response |
Character scalar naming the binary pass/fail response column. |
predictor |
Character scalar naming the numeric age, time, or operating condition column. |
group |
Character scalar naming the population/generation column. All pairwise comparisons among the observed non-missing levels are evaluated. |
delta |
Positive scalar or vector tolerance for symmetric limits
|
limits |
Numeric vector |
link |
Link used for the reliability curve: |
newdata |
Optional data frame containing predictor values where the
probability of agreement is evaluated. If |
grid_size |
Integer number of evaluation points when |
ci |
Logical; if |
conf_level |
Confidence level for intervals. |
max_iter |
Maximum number of IRLS iterations for each fitted curve. |
tol |
Convergence tolerance for IRLS coefficient updates. |
verbose |
Logical; if |
For each pair of fitted reliability curves $\pi_1(x)$ and
$\pi_2(x)$, Stevens and Anderson-Cook define
$$\theta(x) = P\big(\lvert \hat\pi_1(x) - \hat\pi_2(x) \rvert \le \delta\big).$$
The models are fit with a binomial GLM using either a logit or probit link,
with $g\{\pi_j(x)\} = \beta_{0j} + \beta_{1j} x$. The C++ backend evaluates the
two-population large-sample normal approximation described in the paper and
Supplementary Material A; when more than two groups are supplied, prob_agree()
applies that calculation to all pairwise group comparisons.
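The large-sample calculation can be approximated in base R with two separate logistic fits. The sketch below is hedged: `prob_agree_sketch()` is a hypothetical helper, the column names `pass`, `age`, and `generation` are illustrative, the variance of each fitted reliability comes from the delta method, and the two groups are treated as independent. It is not the package's C++ backend.

```r
# Hypothetical helper: probability of agreement for two groups, logit link.
prob_agree_sketch <- function(dat, x_grid, delta = 0.05) {
  fit_one <- function(g) {
    fit <- glm(pass ~ age, family = binomial("logit"),
               data = dat[dat$generation == g, ])
    pr <- predict(fit, newdata = data.frame(age = x_grid),
                  type = "link", se.fit = TRUE)
    p <- plogis(pr$fit)
    # Delta method: Var(p) is approximately (dp/deta)^2 * Var(eta)
    list(p = p, var = (p * (1 - p))^2 * pr$se.fit^2)
  }
  a <- fit_one("Gen1")
  b <- fit_one("Gen2")
  d  <- a$p - b$p                  # estimated reliability difference
  se <- sqrt(a$var + b$var)        # independent groups
  # Normal approximation to P(-delta <= difference <= delta)
  unname(pnorm((delta - d) / se) - pnorm((-delta - d) / se))
}

set.seed(1)
n <- 160
dat <- data.frame(
  age = c(runif(n, 0, 60), runif(n, 0, 45)),
  generation = rep(c("Gen1", "Gen2"), each = n)
)
eta <- ifelse(dat$generation == "Gen1",
              4.3 - 0.045 * dat$age,
              4.0 - 0.040 * dat$age)
dat$pass <- rbinom(nrow(dat), size = 1, prob = plogis(eta))

x_grid <- seq(0, 45, by = 5)
pa <- prob_agree_sketch(dat, x_grid, delta = 0.05)
round(pa, 3)
```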
Missing rows in response, predictor, or group are removed before model
fitting. The response must be binary, coded as 0/1, FALSE/TRUE, or a
two-level factor.
A data frame with class
c("prob_agree_curve", "prob_agree", "data.frame") and columns group1,
group2, the predictor, prob_agree, and optional lwr.ci, upr.ci.
Thiago de Paula Oliveira
Stevens, N. T. and Anderson-Cook, C. M. (2017). Comparing the Reliability of Related Populations With the Probability of Agreement. Technometrics, 59(3), 371-380. doi:10.1080/00401706.2016.1214180.
# Stevens and Anderson-Cook's probability of agreement evaluates whether
# two related populations have reliability curves that are similar enough
# to be treated as practically homogeneous. At each age, the agreement
# hypothesis is that the difference between the two fitted reliabilities
# lies inside the user-specified practical tolerance interval.
set.seed(1)
n <- 160
dat <- data.frame(
  age = c(runif(n, 0, 60), runif(n, 0, 45)),
  generation = rep(c("Gen1", "Gen2"), each = n)
)
eta <- ifelse(
  dat$generation == "Gen1",
  4.3 - 0.045 * dat$age,
  4.0 - 0.040 * dat$age
)
dat$pass <- rbinom(nrow(dat), size = 1, prob = plogis(eta))

fit_pa <- prob_agree(
  dat,
  response = "pass",
  predictor = "age",
  group = "generation",
  delta = 0.05,
  link = "logit",
  ci = TRUE
)
print(fit_pa)
summary(fit_pa)
estimate(fit_pa)
tidy(fit_pa)
confint(fit_pa)
plot(fit_pa)

# Four generations are compared as all pairwise two-generation contrasts.
set.seed(2)
n4 <- 120
dat4 <- data.frame(
  age = rep(runif(n4, 0, 55), times = 4),
  generation = rep(paste0("Gen", 1:4), each = n4)
)
shifts <- c(4.2, 4.0, 3.8, 3.6)
slopes <- c(-0.040, -0.042, -0.044, -0.046)
gen_id <- match(dat4$generation, paste0("Gen", 1:4))
eta4 <- shifts[gen_id] + slopes[gen_id] * dat4$age
dat4$pass <- rbinom(nrow(dat4), size = 1, prob = plogis(eta4))

fit4 <- prob_agree(
  dat4,
  response = "pass",
  predictor = "age",
  group = "generation",
  limits = c(-0.03, 0.05),
  link = "logit",
  ci = FALSE
)
print(fit4)
plot(fit4)
Computes repeated-measures correlation for two or more continuous responses
observed repeatedly within subjects. Supply a data.frame plus column
names, or pass the response matrix/data frame and subject vector directly.
The repeated observations are indexed only by subject, thus no explicit
time variable is modeled, and the method targets a common within-subject
linear association after removing subject-specific means.
rmcorr(
  data = NULL, response, subject,
  na_method = c("error", "pairwise", "complete"),
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  keep_data = FALSE, verbose = FALSE,
  estimator = c("ancova", "weighted"),
  ci_method = c("auto", "fisher_z", "bootstrap", "none"),
  n_boot = 999L, seed = NULL, ...
)

rmcorr_weighted(data = NULL, response, subject, ...)
data |
Optional |
response |
Either:
If exactly two responses are supplied, the function returns a pairwise
repeated-measures correlation object of class |
subject |
Subject identifier (factor/character/integer/numeric) or a
single character string naming the subject column in |
na_method |
Character scalar controlling missing-data handling.
|
conf_level |
Confidence level used for Wald confidence intervals on the
repeated-measures correlation (default |
n_threads |
Integer |
keep_data |
Logical (default |
verbose |
Logical; if |
estimator |
Character scalar selecting the repeated-measures
correlation estimator. |
ci_method |
Confidence-interval engine. |
n_boot |
Integer |
seed |
Optional positive integer seed used for bootstrap reproducibility. |
... |
Deprecated compatibility aliases. The legacy |
Repeated-measures correlation estimates the common within-subject linear association between two variables measured repeatedly on the same subjects. It differs from agreement methods such as Lin's CCC or Bland-Altman analysis because those target concordance or interchangeability, whereas repeated-measures correlation targets the strength of the subject-centred association.
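For subject $i = 1, \dots, S$ and repeated observations
$j = 1, \dots, n_i$, let $x_{ij}$ and $y_{ij}$ denote the two
responses. Define subject-specific means
$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \qquad \bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}.$$
The repeated-measures correlation uses within-subject centred values
$x^*_{ij} = x_{ij} - \bar{x}_i$ and $y^*_{ij} = y_{ij} - \bar{y}_i$,
and computes
$$r_{rm} = \frac{\sum_i \sum_j x^*_{ij}\, y^*_{ij}}{\sqrt{\sum_i \sum_j (x^*_{ij})^2 \ \sum_i \sum_j (y^*_{ij})^2}}.$$
Equivalently, this is the correlation implied by an ANCOVA model with a common slope and subject-specific intercepts:
$$y_{ij} = \alpha_i + \beta x_{ij} + \varepsilon_{ij}.$$
The returned slope is
$$\hat\beta = \frac{\sum_i \sum_j x^*_{ij}\, y^*_{ij}}{\sum_i \sum_j (x^*_{ij})^2},$$
and the subject-specific fitted intercepts are
$\hat\alpha_i = \bar{y}_i - \hat\beta\, \bar{x}_i$. Residual degrees of
freedom are $N - S - 1$, where $N = \sum_i n_i$ after filtering to
complete observations and retaining only subjects with at least two repeated
pairs.
Confidence intervals are computed with a Fisher $z$-transformation of
$r_{rm}$ and then back-transformed to the correlation scale. In
matrix mode, the same estimator is applied to every pair of selected
response columns.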
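The within-subject centring view of the estimator is easy to reproduce in base R. The sketch below uses a hypothetical helper `rmcorr_sketch()`; the Fisher-z standard error `1 / sqrt(df - 1)` is an assumption made for illustration, since implementations differ on the exact denominator.

```r
# Hypothetical helper: pairwise repeated-measures correlation by centring.
rmcorr_sketch <- function(x, y, subject, conf_level = 0.95) {
  xc <- x - ave(x, subject)               # remove subject-specific means
  yc <- y - ave(y, subject)
  sxy <- sum(xc * yc)
  r <- sxy / sqrt(sum(xc^2) * sum(yc^2))
  slope <- sxy / sum(xc^2)                # common within-subject slope
  df <- length(x) - length(unique(subject)) - 1
  z_crit <- qnorm(1 - (1 - conf_level) / 2)
  se_z <- 1 / sqrt(df - 1)                # assumed standard error on the z scale
  ci <- tanh(atanh(r) + c(-1, 1) * z_crit * se_z)
  list(estimate = r, slope = slope, df = df, lwr = ci[1], upr = ci[2])
}

set.seed(2026)
n_subjects <- 20
n_rep <- 4
subject <- rep(seq_len(n_subjects), each = n_rep)
within_signal <- rnorm(n_subjects * n_rep)
x <- rnorm(n_subjects, sd = 1.5)[subject] + within_signal +
  rnorm(n_subjects * n_rep, sd = 0.2)
y <- rnorm(n_subjects, sd = 2.0)[subject] + 0.8 * within_signal +
  rnorm(n_subjects * n_rep, sd = 0.3)

fit_rm <- rmcorr_sketch(x, y, subject)

# Cross-check against the ANCOVA formulation with subject-specific intercepts
ancova <- lm(y ~ x + factor(subject))
c(sketch = fit_rm$slope, ancova = unname(coef(ancova)["x"]))
```

The slope and residual degrees of freedom from the centred calculation match the ANCOVA fit, which is the equivalence stated above.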
With estimator = "weighted", the package implements the weighted
repeated-measures estimator from Kondo et al. (2025) using complete observed
pairs per contrast. This estimator does not impute missing data;
it uses per-subject complete-pair sets and weighted sums of explained and
residual quantities from the fixed-subject ANCOVA decomposition.
Bootstrap confidence intervals use non-parametric subject-level resampling (subjects are resampled as blocks, not rows).
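Subject-level resampling means that whole subjects, with all of their repeated pairs, are drawn with replacement. A minimal sketch follows, assuming a percentile interval (the interval type used by the package is not stated here) and a hypothetical helper `rm_boot_ci()`.

```r
# Hypothetical helper: subject-block bootstrap for the rm correlation.
rm_boot_ci <- function(x, y, subject, n_boot = 499, conf_level = 0.95) {
  # Centring is per subject, so it can be done once before resampling
  xc <- x - ave(x, subject)
  yc <- y - ave(y, subject)
  rows_by_subject <- split(seq_along(subject), subject)
  rm_r <- function(idx) {
    sum(xc[idx] * yc[idx]) / sqrt(sum(xc[idx]^2) * sum(yc[idx]^2))
  }
  boot <- replicate(n_boot, {
    # Draw subjects (blocks of rows), not individual rows
    picked <- sample(length(rows_by_subject), replace = TRUE)
    rm_r(unlist(rows_by_subject[picked], use.names = FALSE))
  })
  a <- (1 - conf_level) / 2
  quantile(boot, c(a, 1 - a), names = FALSE)
}

set.seed(7)
subject <- rep(seq_len(20), each = 4)
w <- rnorm(80)
x <- rnorm(20)[subject] + w + rnorm(80, sd = 0.2)
y <- rnorm(20)[subject] + 0.8 * w + rnorm(80, sd = 0.3)
boot_ci <- rm_boot_ci(x, y, subject)
boot_ci
```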
Either a "rmcorr" object (exactly two responses) or a
"rmcorr_matrix" object (pairwise results when there are three or more responses).
If "rmcorr" (exactly two responses), outputs include:
estimate; repeated-measures correlation estimate.
p_value; two-sided p-value for the common within-subject slope.
lwr, upr; confidence interval limits for
estimate.
slope; common within-subject slope.
df; residual degrees of freedom $N - S - 1$.
n_obs; number of complete observations retained after
dropping subjects with fewer than two repeated pairs.
n_subjects; number of contributing subjects.
responses; names of the fitted response variables.
compatibility aliases r, conf_int, and based.on
are reconstructed on access without duplicate storage.
when keep_data = TRUE, compact source data are retained so
plot() can lazily reconstruct data_long,
intercepts, and fitted lines; these are otherwise not stored.
If "rmcorr_matrix" (three or more responses), outputs are:
a symmetric numeric matrix of pairwise repeated-measures correlations.
attributes method, description, and
package = "matrixCorr".
diagnostics; a list with square matrices for slope,
p_value, df, n_complete, n_subjects,
conf_low, and conf_high, plus scalar
conf_level.
Thiago de Paula Oliveira
Bakdash, J. Z., & Marusich, L. R. (2017). Repeated Measures Correlation. Frontiers in Psychology, 8, 456. doi:10.3389/fpsyg.2017.00456
Kondo, M., Nagashima, K., Isono, S., & Sato, Y. (2025). Weighted Repeated Measures Correlation Coefficient: A New Correlation Coefficient for Handling Missing Data With Repeated Measures. Statistics in Medicine, 44(10-12), e70046. doi:10.1002/sim.70046
set.seed(2026)
n_subjects <- 20
n_rep <- 4
subject <- rep(seq_len(n_subjects), each = n_rep)
subj_eff_x <- rnorm(n_subjects, sd = 1.5)
subj_eff_y <- rnorm(n_subjects, sd = 2.0)
within_signal <- rnorm(n_subjects * n_rep)

dat <- data.frame(
  subject = subject,
  x = subj_eff_x[subject] + within_signal +
    rnorm(n_subjects * n_rep, sd = 0.2),
  y = subj_eff_y[subject] + 0.8 * within_signal +
    rnorm(n_subjects * n_rep, sd = 0.3),
  z = subj_eff_y[subject] - 0.4 * within_signal +
    rnorm(n_subjects * n_rep, sd = 0.4)
)

fit_xy <- rmcorr(dat, response = c("x", "y"), subject = "subject",
                 keep_data = TRUE)
print(fit_xy)
summary(fit_xy)
estimate(fit_xy)
tidy(fit_xy)
ci(fit_xy)
confint(fit_xy)
plot(fit_xy)

fit_mat <- rmcorr(dat, response = c("x", "y", "z"), subject = "subject")
print(fit_mat, digits = 3)
summary(fit_mat)
tidy(fit_mat)
plot(fit_mat)

if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  fit_mat_view <- rmcorr(
    dat,
    response = c("x", "y", "z"),
    subject = "subject",
    keep_data = TRUE
  )
  view_rmcorr_shiny(fit_mat_view)
}
Computes robust distance correlations by applying the biloop transformation to each numeric variable and then computing unbiased distance correlation on the transformed variables.
robust_dcor(
  data,
  na_method = c("error", "pairwise", "complete"),
  p_value = FALSE,
  inference = c("none", "permutation"),
  n_perm = 999L, seed = NULL, c_const = 4,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0, diag = TRUE, ...
)

## S3 method for class 'robust_dcor'
print(
  x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
  width = NULL, show_ci = NULL, ...
)

## S3 method for class 'robust_dcor'
summary(object, topn = NULL, show_ci = NULL, ...)

## S3 method for class 'robust_dcor'
plot(
  x, title = NULL,
  low_color = "indianred1", high_color = "steelblue1", mid_color = "white",
  value_text_size = 4, ci_text_size = 3, show_value = TRUE, ...
)
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns are dropped. Columns must be numeric. |
na_method |
Character scalar controlling missing-data handling.
|
p_value |
Logical (default FALSE). |
inference |
Character scalar. Use |
n_perm |
Positive integer; number of permutations used when
|
seed |
Optional positive integer seed for permutation inference. |
c_const |
Positive numeric tuning constant for the biloop transformation. Default 4. |
n_threads |
Integer |
output |
Output representation for the computed estimates:
|
threshold |
Non-negative absolute-value filter for non-matrix outputs.
Must be |
diag |
Logical; whether to include diagonal entries in
|
... |
Deprecated compatibility aliases. Currently only
|
x, object
|
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns. |
width |
Optional display width. |
show_ci |
One of |
title |
Plot title. |
low_color, high_color, mid_color
|
Colors used in the heatmap. |
value_text_size, ci_text_size
|
Text sizes for plot labels. |
show_value |
Logical; overlay numeric values if |
Permutation inference is computed for every unique column pair, so the workload grows as $p(p-1)/2 \times$ n_perm, where $p$ is the number of columns. A warning is emitted for large requests; reduce n_perm, reduce the number of columns, or run estimates without permutation p-values when this cost is not intended.
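As a rough illustration of that growth, the sketch below counts the pairwise permutation evaluations for a given number of columns. The helper name `perm_workload` is hypothetical and not part of the package.

```r
# Hypothetical helper: number of (pair, permutation) evaluations requested.
perm_workload <- function(p, n_perm) choose(p, 2) * n_perm

perm_workload(p = 10, n_perm = 999)   # 45 pairs
perm_workload(p = 50, n_perm = 999)   # 1225 pairs
```

Going from 10 to 50 columns multiplies the number of pairs by about 27, which is why reducing the column count is usually the most effective way to cut the cost.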
For each variable $x_j = (x_{1j}, \ldots, x_{nj})$, the wrapper first robustly standardises values using the median $m_j = \operatorname{median}_i(x_{ij})$ and raw median absolute deviation $s_j = \operatorname{median}_i |x_{ij} - m_j|$. If this scale is zero or non-finite, the implementation falls back to an intermediate fallback scale and then the ordinary sample standard deviation. Columns with no positive finite fallback scale are treated as degenerate.
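A minimal base-R sketch of this standardisation step follows. It is not the package's implementation: the function name `robust_standardise` is hypothetical, and only the median, the raw MAD, and the final standard-deviation fallback are reproduced (the intermediate fallback scale is skipped).

```r
# Sketch: median / raw-MAD standardisation with a standard-deviation fallback.
# Assumption: the package's intermediate fallback scale is omitted here.
robust_standardise <- function(x) {
  m <- stats::median(x)
  s <- stats::mad(x, center = m, constant = 1)    # raw (unscaled) MAD
  if (!is.finite(s) || s <= 0) s <- stats::sd(x)  # last-resort fallback
  if (!is.finite(s) || s <= 0) {
    return(rep(NA_real_, length(x)))              # degenerate column
  }
  (x - m) / s
}

z <- robust_standardise(c(1, 2, 3, 4, 100))
z_const <- robust_standardise(rep(5, 4))          # constant column
```

The extreme value 100 barely affects the centre and scale, so it remains clearly separated after standardisation, while the constant column is reported as degenerate.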
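The standardised value $z_{ij} = (x_{ij} - m_j)/s_j$ is mapped to two bounded coordinates by the biloop transformation of Leyder, Raymaekers, and Rousseeuw (2025), whose shape is controlled by the tuning constant c_const. Pairwise distances are Euclidean distances in this transformed two-dimensional space, with zero diagonal.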
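The transformed distance matrix $A = (a_{rs})$ is U-centred as
$$\tilde A_{rs} = a_{rs} - \frac{a_{r\cdot} + a_{s\cdot}}{n-2} + \frac{a_{\cdot\cdot}}{(n-1)(n-2)}, \qquad r \neq s,$$
where $a_{r\cdot} = \sum_{t=1}^{n} a_{rt}$ and $a_{\cdot\cdot} = \sum_{r,t=1}^{n} a_{rt}$. The diagonal of $\tilde A$ is zero.
For variables $j$ and $k$, with U-centred matrices $\tilde A^{(j)}$ and $\tilde A^{(k)}$, the unbiased robust distance covariance is
$$\widehat{\mathrm{dCov}}^2(j,k) = \frac{1}{n(n-3)} \sum_{r \neq s} \tilde A^{(j)}_{rs}\, \tilde A^{(k)}_{rs}.$$
The corresponding robust distance correlation is the usual non-negative distance-correlation ratio based on this covariance and the two transformed distance variances. Small negative numerical artifacts are clipped to zero.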
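The U-centring and unbiased covariance steps can be sketched in base R as below. This is an illustration under stated assumptions: the helper names are hypothetical, plain absolute differences stand in for the biloop-transformed distances, and the ratio is returned without a square root (whether the package takes a further root is not stated here).

```r
# U-centre a symmetric distance matrix with zero diagonal (needs n > 3).
u_centre <- function(D) {
  n <- nrow(D)
  rs <- rowSums(D)
  A <- D - outer(rs, rs, "+") / (n - 2) + sum(D) / ((n - 1) * (n - 2))
  diag(A) <- 0
  A
}

# Unbiased squared distance covariance from two U-centred matrices.
u_dcov2 <- function(A, B) {
  n <- nrow(A)
  sum(A * B) / (n * (n - 3))   # diagonals are zero, so this sums over r != s
}

# Ratio of covariance to the two variances, clipped at zero.
u_dcor <- function(x, y) {
  A <- u_centre(as.matrix(stats::dist(x)))
  B <- u_centre(as.matrix(stats::dist(y)))
  den <- sqrt(u_dcov2(A, A) * u_dcov2(B, B))
  if (!is.finite(den) || den <= 0) return(NA_real_)
  max(u_dcov2(A, B) / den, 0)
}

set.seed(1)
x <- rnorm(100)
u_dcor(x, x^2)          # non-linear dependence
u_dcor(x, rnorm(100))   # independent draws
```

A useful sanity check is that each row of a U-centred matrix sums to zero, and that a variable paired with itself gives a ratio of one.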
A symmetric correlation result. Dense output inherits from
robust_dcor, corr_matrix, and matrix. Sparse and
edge-list outputs use the package-standard corr_sparse and
corr_edge_list representations.
dcor()
robust_dcor() is not a replacement for dcor(). It estimates distance
correlation after a bounded robust transformation. Large differences between
dcor() and robust_dcor() can indicate that the classical dependence
signal is driven by tail observations or outliers.
robust_dcor() is more robust to extreme observations than classical
dcor(), but it may downweight genuine tail dependence. Classical
dcor() may be preferable when tail dependence is the scientific target.
Comparing both methods is recommended.
Thiago de Paula Oliveira
Leyder, J., Raymaekers, J., & Rousseeuw, P. J. (2025). Robust distance correlation through bounded transformations.
dcor(), wincor(), pbcor(), skipped_corr()
    ## Non-linear dependence: both estimators detect association.
    set.seed(1)
    n <- 200
    x <- rnorm(n)
    y <- x^2 + rnorm(n, sd = 0.2)
    X <- cbind(x = x, y = y)
    classical <- dcor(X)
    robust <- robust_dcor(X)
    round(c(
      dcor = classical["x", "y"],
      robust_dcor = robust["x", "y"]
    ), 3)

    ## One diagonal outlier can inflate classical dCor more than robust dCor.
    set.seed(45)
    x <- rnorm(20)
    y <- rnorm(20)
    x[1] <- 10
    y[1] <- 10
    X_out <- cbind(x = x, y = y)
    classical <- dcor(X_out)
    robust <- robust_dcor(X_out)
    round(c(
      dcor = classical["x", "y"],
      robust_dcor = robust["x", "y"]
    ), 3)
    print(classical)
    print(robust)
    summary(robust)
    estimate(robust)
    tidy(robust)
    plot(robust)

    ## Several variables.
    set.seed(7)
    z <- rnorm(120)
    X_multi <- cbind(
      linear = z + rnorm(120, sd = 0.3),
      nonlinear = z^2 + rnorm(120, sd = 0.3),
      noise = rnorm(120),
      outlier = rnorm(120)
    )
    X_multi[1, "outlier"] <- 12
    X_multi[1, "noise"] <- 12
    robust_multi <- robust_dcor(X_multi)
    print(robust_multi)
    summary(robust_multi)
    plot(robust_multi)
Computes a shrinkage correlation matrix for numeric data using a high-performance 'C++' backend. The current implementation uses the Schafer-Strimmer shrinkage estimator to stabilise Pearson correlation estimates by shrinking off-diagonal entries towards zero.
    shrinkage_corr(
      data,
      n_threads = getOption("matrixCorr.threads", 1L),
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE
    )

    schafer_corr(
      data,
      n_threads = getOption("matrixCorr.threads", 1L),
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE
    )

    ## S3 method for class 'shrinkage_corr'
    print(
      x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )

    ## S3 method for class 'schafer_corr'
    print(
      x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )

    ## S3 method for class 'shrinkage_corr'
    plot(
      x,
      title = "Schafer-Strimmer shrinkage correlation",
      cluster = TRUE,
      hclust_method = "complete",
      triangle = c("upper", "lower", "full"),
      show_value = TRUE,
      show_values = NULL,
      value_text_limit = 60,
      value_text_size = 3,
      palette = c("diverging", "viridis"),
      ...
    )

    ## S3 method for class 'schafer_corr'
    plot(
      x,
      title = "Schafer-Strimmer shrinkage correlation",
      cluster = TRUE,
      hclust_method = "complete",
      triangle = c("upper", "lower", "full"),
      show_value = TRUE,
      show_values = NULL,
      value_text_limit = 60,
      value_text_size = 3,
      palette = c("diverging", "viridis"),
      ...
    )

    ## S3 method for class 'shrinkage_corr'
    summary(
      object, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )

    ## S3 method for class 'schafer_corr'
    summary(
      object, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )
data |
A numeric matrix or a data frame with at least two numeric
columns. All non-numeric columns will be excluded. Columns must be numeric
and contain no |
n_threads |
Integer |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional arguments passed to |
title |
Plot title. |
cluster |
Logical; if TRUE, reorder rows/cols by hierarchical clustering
on distance |
hclust_method |
Linkage method for |
triangle |
One of |
show_value |
Logical; if |
show_values |
Deprecated compatibility alias for |
value_text_limit |
Integer threshold controlling when values are drawn. |
value_text_size |
Font size for values if shown. |
palette |
Character; |
object |
An object of class |
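Let $R = (r_{ij})$ be the sample Pearson correlation matrix. The Schafer-Strimmer shrinkage estimator targets the identity in correlation space and uses
$$\hat\lambda = \frac{\sum_{i<j} \widehat{\mathrm{Var}}(r_{ij})}{\sum_{i<j} r_{ij}^2}$$
(clamped to $[0, 1]$), where $\widehat{\mathrm{Var}}(r_{ij}) \approx \dfrac{(1 - r_{ij}^2)^2}{n - 1}$.
The returned estimator is $R_{\mathrm{shr}} = (1 - \hat\lambda) R + \hat\lambda I$.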
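The shrinkage rule can be sketched in a few lines of base R. This is an illustration rather than the package's C++ code: the function name `shrink_to_identity` is hypothetical, and the variance of each sample correlation is assumed to be approximated by $(1 - r_{ij}^2)^2 / (n - 1)$.

```r
# Sketch of identity-target shrinkage in correlation space.
# Assumption: Var(r_ij) is approximated by (1 - r_ij^2)^2 / (n - 1).
shrink_to_identity <- function(X) {
  n <- nrow(X)
  R <- stats::cor(X)
  off <- upper.tri(R)
  lambda <- sum((1 - R[off]^2)^2 / (n - 1)) / sum(R[off]^2)
  lambda <- min(max(lambda, 0), 1)                   # clamp to [0, 1]
  R_shr <- (1 - lambda) * R + lambda * diag(ncol(R)) # shrink towards identity
  list(lambda = lambda, R = R_shr)
}

set.seed(1)
X <- matrix(rnorm(30 * 10), nrow = 30)   # independent columns, small n
fit <- shrink_to_identity(X)
fit$lambda
```

Because the target is the identity, the diagonal stays at one and every off-diagonal entry is multiplied by $1 - \hat\lambda$, so no entry can grow in absolute value.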
A symmetric numeric matrix of class shrinkage_corr (with
compatibility class schafer_corr) where entry (i, j) is the
shrunk correlation between the i-th and j-th numeric columns.
Attributes:
method = "schafer_shrinkage"
description = "Schafer-Strimmer shrinkage correlation matrix"
package = "matrixCorr"
Columns with zero variance are set to NA across row/column (including
the diagonal), matching pearson_corr() behaviour.
Invisibly returns x.
A ggplot object.
No missing values are permitted. Columns with fewer than two observations
or zero variance are flagged as NA (row/column).
Thiago de Paula Oliveira
Schafer, J. & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1).
print.shrinkage_corr,
plot.shrinkage_corr, pearson_corr
    ## Multivariate normal with AR(1) dependence (Toeplitz correlation)
    set.seed(1)
    n <- 80; p <- 40; rho <- 0.6
    d <- abs(outer(seq_len(p), seq_len(p), "-"))
    Sigma <- rho^d
    X <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
    colnames(X) <- paste0("V", seq_len(p))
    Rshr <- shrinkage_corr(X)
    print(Rshr, digits = 2, n = 6, max_vars = 6)
    summary(Rshr)
    estimate(Rshr)
    tidy(Rshr)
    plot(Rshr)

    ## Shrinkage typically moves the sample correlation closer to the truth
    Rraw <- stats::cor(X)
    off <- upper.tri(Sigma, diag = FALSE)
    mae_raw <- mean(abs(Rraw[off] - Sigma[off]))
    mae_shr <- mean(abs(Rshr[off] - Sigma[off]))
    print(c(MAE_raw = mae_raw, MAE_shrunk = mae_shr))
    plot(Rshr, title = "Schafer-Strimmer shrinkage correlation")

    # Interactive viewing (requires shiny)
    if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
      view_corr_shiny(Rshr)
    }
Computes all pairwise skipped correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.
Skipped correlation detects bivariate outliers using a projection rule and then computes Pearson or Spearman correlation on the retained observations. It is designed for situations where marginally robust methods can still be distorted by unusual points in the joint data cloud.
    skipped_corr(
      data,
      method = c("pearson", "spearman"),
      na_method = c("error", "pairwise", "complete"),
      ci = FALSE,
      p_value = FALSE,
      conf_level = 0.95,
      n_threads = getOption("matrixCorr.threads", 1L),
      return_masks = FALSE,
      stand = TRUE,
      outlier_rule = c("idealf", "mad"),
      cutoff = sqrt(stats::qchisq(0.975, df = 2)),
      n_boot = 2000L,
      p_adjust = c("none", "hochberg", "ecp"),
      fwe_level = 0.05,
      n_mc = 1000L,
      seed = NULL,
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE
    )

    skipped_corr_masks(x, var1 = NULL, var2 = NULL)

    ## S3 method for class 'skipped_corr'
    print(
      x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
      ci_digits = 4, show_ci = NULL, show_p = c("auto", "yes", "no"), ...
    )

    ## S3 method for class 'skipped_corr'
    plot(
      x,
      title = "Skipped correlation heatmap",
      low_color = "indianred1",
      high_color = "steelblue1",
      mid_color = "white",
      value_text_size = 4,
      ci_text_size = 3,
      show_value = TRUE,
      ...
    )

    ## S3 method for class 'skipped_corr'
    summary(
      object, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )

    ## S3 method for class 'summary.skipped_corr'
    print(
      x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. |
method |
Correlation computed after removing projected outliers. One of
|
na_method |
Character scalar controlling missing-data handling.
With |
ci |
Logical; if |
p_value |
Logical; if |
conf_level |
Confidence level used when |
n_threads |
Integer |
return_masks |
Logical; if |
stand |
Logical; if |
outlier_rule |
One of |
cutoff |
Positive numeric constant multiplying the projected spread in
the outlier rule
|
n_boot |
Integer |
p_adjust |
One of |
fwe_level |
Familywise-error level used when
|
n_mc |
Integer |
seed |
Optional positive integer used to seed the bootstrap resampling
when |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
x |
An object of class |
var1, var2
|
Optional column names or 1-based column indices used by
|
digits |
Integer; number of digits to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
ci_digits |
Integer; digits for skipped-correlation confidence limits. |
show_ci |
One of |
show_p |
One of |
... |
Additional arguments passed to the underlying print or plot helper. |
title |
Character; plot title. |
low_color, high_color, mid_color
|
Colors used in the heatmap. |
value_text_size |
Numeric text size for overlaid cell values. |
ci_text_size |
Text size for confidence intervals in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
Let $X \in \mathbb{R}^{n \times p}$ be a numeric matrix with rows as observations and columns as variables. For a given pair of columns $(j, k)$, write the observed bivariate points as $(x_{ij}, x_{ik})$, $i = 1, \ldots, n$. If stand = TRUE, each margin is first centred by its median and divided by a robust scale estimate before outlier detection; otherwise the original pair is used. The robust scale is the MAD when positive, with fallback to an intermediate fallback scale and then the usual sample standard deviation if needed. Let $z_i \in \mathbb{R}^2$ denote the resulting points and let $c$ be the componentwise median center of the detection cloud.
For each observation $i$, define the direction vector $u_i = z_i - c$. When $\|u_i\| > 0$, all observations are projected onto the line through $c$ in direction $u_i$. The projected distances are
$$d_{il} = \frac{\left| (z_l - c)^\top u_i \right|}{\|u_i\|}, \qquad l = 1, \ldots, n.$$
For each direction $i$, observation $l$ is flagged as an outlier if
$$d_{il} > \operatorname{median}(d_{i1}, \ldots, d_{in}) + g\, s_i,$$
where $g$ is the cutoff constant and $s_i$ is either the ideal-fourths interquartile width (outlier_rule = "idealf") or the median absolute deviation (outlier_rule = "mad") of the projected distances. An observation is removed if it is flagged for at least one projection direction. The skipped correlation is then the ordinary Pearson or Spearman correlation computed from the retained observations:
$$r_{\mathrm{skip}}(j, k) = \operatorname{cor}\left\{ (x_{ij}, x_{ik}) : i \in \mathcal{K}_{jk} \right\},$$
where $\mathcal{K}_{jk}$ is the index set of observations not flagged as outliers.
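The projection rule can be sketched in base R as follows. This is a simplified illustration, not the package's C++ routine: the function name `projection_outliers` is hypothetical, only the MAD variant is shown, the flagging threshold is assumed to be the median of the projected distances plus `cutoff` times their spread, and `stats::mad()` with its default consistency constant stands in for that spread.

```r
# Sketch of projection-based bivariate outlier detection (MAD rule).
# Assumption: an observation is flagged when d > median(d) + cutoff * mad(d).
projection_outliers <- function(x, y,
                                cutoff = sqrt(stats::qchisq(0.975, df = 2))) {
  Z <- cbind(x, y)
  centre <- apply(Z, 2, stats::median)   # componentwise median centre
  U <- sweep(Z, 2, centre)               # direction vectors u_i = z_i - centre
  flagged <- rep(FALSE, nrow(Z))
  for (i in seq_len(nrow(Z))) {
    len <- sqrt(sum(U[i, ]^2))
    if (len == 0) next                   # no direction defined for this point
    d <- abs(drop(U %*% U[i, ])) / len   # projected distances along u_i
    flagged <- flagged | (d > stats::median(d) + cutoff * stats::mad(d))
  }
  flagged
}

set.seed(12)
x <- rnorm(160)
y <- rnorm(160)
x[1] <- 9
y[1] <- -8                               # planted bivariate outlier
out <- projection_outliers(x, y)
r_skip <- stats::cor(x[!out], y[!out])   # skipped Pearson correlation
```

The planted point lies far from the cloud along its own direction, so it is flagged and excluded before the final correlation is computed.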
Unlike marginally robust methods such as pbcor(), wincor(),
or bicor(), skipped correlation is explicitly pairwise because
outlier detection depends on the joint geometry of each variable pair. As a
result, the reported matrix need not be positive semidefinite, even with
complete data.
Computational notes. In the complete-data path, each column pair
requires a full bivariate projection search, so the dominant cost is higher
than for marginal robust methods. The implementation evaluates pairs in
'C++'; where available, pairs are processed with 'OpenMP' parallelism. With
na_method = "pairwise", each pair is recomputed on its overlap of
non-missing rows. With na_method = "complete", rows with any
non-finite value across the retained numeric columns are removed before any
pairwise outlier search. This gives a common row universe for all
skipped-correlation masks.
Bootstrap inference. When ci = TRUE or p_value = TRUE,
the implementation uses the percentile-bootstrap strategy studied by Wilcox
(2015). Each bootstrap replicate resamples whole observation pairs with
replacement, reruns the skipped-correlation outlier detection on the
resampled data, and recomputes the skipped correlation on the retained
observations. This corresponds to Wilcox's B2 method and avoids the
statistically unsatisfactory shortcut of removing outliers only once before
bootstrapping. Bootstrap inference currently requires complete data
(na_method = "error" or "complete"). When
p_adjust = "hochberg", the
bootstrap p-values are processed with Hochberg's step-up procedure (method H
in Wilcox, Rousselet, and Pernet, 2018). When p_adjust = "ecp", the
package follows their ECP method and simulates n_mc null data sets
from a $p$-variate normal distribution with identity covariance,
recomputes the pairwise bootstrap p-values for each null data set, stores the
minimum p-value from each run, and estimates the fwe_level quantile of
that null distribution using the Harrell-Davis estimator. Hypotheses are then
rejected when their observed bootstrap p-values are less than or equal to the
estimated critical p-value. The calibrated H1 procedure from Wilcox,
Rousselet, and Pernet (2018) is not currently implemented.
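Two ingredients of the multiplicity adjustments can be illustrated with base R. The sketch below is not the package's implementation: the p-values are made up, `harrell_davis` is a hypothetical helper, and uniform draws stand in for the minimum bootstrap p-values that the ECP method would obtain from simulated null data sets.

```r
# Hochberg step-up adjustment of illustrative (made-up) bootstrap p-values.
p_boot <- c(0.001, 0.012, 0.030, 0.200)
p_hoch <- stats::p.adjust(p_boot, method = "hochberg")

# Harrell-Davis quantile estimate: Beta-weighted sum of order statistics.
harrell_davis <- function(x, q) {
  n <- length(x)
  a <- (n + 1) * q
  b <- (n + 1) * (1 - q)
  k <- seq_len(n)
  w <- stats::pbeta(k / n, a, b) - stats::pbeta((k - 1) / n, a, b)
  sum(w * sort(x))
}

# ECP-style critical p-value: the fwe_level quantile of null minimum p-values.
# Placeholder null: minima of 6 independent uniform p-values per run.
set.seed(1)
min_p_null <- replicate(1000, min(stats::runif(6)))
p_crit <- harrell_davis(min_p_null, q = 0.05)
reject <- p_boot <= p_crit
```

Hypotheses are rejected when the observed p-value does not exceed `p_crit`, which mirrors the decision rule described above.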
A symmetric correlation matrix with class skipped_corr and
attributes method = "skipped_correlation", description, and
package = "matrixCorr". When return_masks = TRUE, the matrix
also carries a skipped_masks attribute containing compact pairwise
skipped-row indices. The diagnostics attribute stores per-pair
complete-case counts and skipped-row counts/proportions. When
ci = TRUE or p_value = TRUE, bootstrap inference matrices are
attached via attributes.
Thiago de Paula Oliveira
Wilcox, R. R. (2004). Inferences based on a skipped correlation coefficient. Journal of Applied Statistics, 31(2), 131-143. doi:10.1080/0266476032000148821
Wilcox, R. R. (2015). Inferences about the skipped correlation coefficient: Dealing with heteroscedasticity and non-normality. Journal of Modern Applied Statistical Methods, 14(1), 172-188. doi:10.22237/jmasm/1430453580
Wilcox, R. R., Rousselet, G. A., & Pernet, C. R. (2018). Improved methods for making inferences about multiple skipped correlations. Journal of Statistical Computation and Simulation, 88(16), 3116-3131. doi:10.1080/00949655.2018.1501051
    set.seed(12)
    X <- matrix(rnorm(160 * 4), ncol = 4)
    X[1, 1] <- 9
    X[1, 2] <- -8
    R <- skipped_corr(X, method = "pearson")
    print(R, digits = 2)
    summary(R)
    plot(R)
    Rm <- skipped_corr(X, method = "pearson", return_masks = TRUE)
    skipped_corr_masks(Rm, 1, 2)

    # Example 1:
    Xm <- as.matrix(datasets::mtcars[, c("mpg", "disp", "hp", "wt")])
    Rm2 <- skipped_corr(Xm, method = "spearman")
    print(Rm2, digits = 2)

    # Example 2:
    Ri <- skipped_corr(Xm, method = "pearson", ci = TRUE, n_boot = 40, seed = 1)
    ci(Ri)
    confint(Ri)
    tidy(Ri)

    # Interactive viewing (requires shiny)
    if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
      view_corr_shiny(R)
    }
Computes pairwise Spearman's rank correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Optional confidence intervals are available via a jackknife Euclidean-likelihood method.
    spearman_rho(
      data,
      na_method = c("error", "pairwise", "complete"),
      ci = FALSE,
      conf_level = 0.95,
      n_threads = getOption("matrixCorr.threads", 1L),
      output = c("matrix", "sparse", "edge_list"),
      threshold = 0,
      diag = TRUE,
      ...
    )

    ## S3 method for class 'spearman_rho'
    print(
      x, digits = 4, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, ci_digits = 3, show_ci = NULL, ...
    )

    ## S3 method for class 'spearman_rho'
    plot(
      x,
      title = "Spearman's rank correlation heatmap",
      low_color = "indianred1",
      high_color = "steelblue1",
      mid_color = "white",
      value_text_size = 4,
      ci_text_size = 3,
      show_value = TRUE,
      ...
    )

    ## S3 method for class 'spearman_rho'
    summary(
      object, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, ci_digits = 3, show_ci = NULL, ...
    )

    ## S3 method for class 'summary.spearman_rho'
    print(
      x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values. |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default FALSE). |
conf_level |
Confidence level used when |
n_threads |
Integer |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
ci_digits |
Integer; digits for Spearman confidence limits in the pairwise summary. |
show_ci |
One of |
title |
Plot title. Default is "Spearman's rank correlation heatmap". |
low_color |
Color for the minimum rho value. Default is "indianred1". |
high_color |
Color for the maximum rho value. Default is "steelblue1". |
mid_color |
Color for zero correlation. Default is "white". |
value_text_size |
Font size for displaying correlation values. Default is 4. |
ci_text_size |
Text size for confidence intervals in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
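For each column $j = 1, \ldots, p$, let $R_{\cdot j} \in [1, n]^n$ denote the (mid-)ranks of $X_{\cdot j}$, assigning average ranks to ties. The mean rank is $\bar R_j = (n+1)/2$ regardless of ties. Define the centred rank vectors $\tilde R_{\cdot j} = R_{\cdot j} - \bar R_j \mathbf{1}$, where $\mathbf{1} \in \mathbb{R}^n$ is the all-ones vector. The Spearman correlation between columns $j$ and $k$ is the Pearson correlation of their rank vectors:
$$\hat\rho_S(j, k) = \frac{\tilde R_{\cdot j}^\top \tilde R_{\cdot k}}{\|\tilde R_{\cdot j}\|_2 \, \|\tilde R_{\cdot k}\|_2}.$$
In matrix form, with $R = [R_{\cdot 1}, \ldots, R_{\cdot p}]$, $\mu = \frac{n+1}{2} \mathbf{1}_p$ for $\mathbf{1}_p \in \mathbb{R}^p$, and $S_R = (R - \mathbf{1}\mu^\top)^\top (R - \mathbf{1}\mu^\top) / (n - 1)$, the Spearman correlation matrix is
$$\hat\rho_S = D^{-1/2} S_R D^{-1/2}, \qquad D = \operatorname{diag}(\operatorname{diag}(S_R)).$$
When there are no ties, the familiar rank-difference formula obtains
$$\hat\rho_S(j, k) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = R_{ij} - R_{ik},$$
but this expression does not hold under ties; computing Pearson on mid-ranks (as above) is the standard tie-robust approach. Without ties, $\operatorname{Var}(R_{\cdot j}) = n(n+1)/12$ under the $n - 1$ divisor used above (equivalently $(n^2 - 1)/12$ with divisor $n$); with ties, the variance is smaller.
$\hat\rho_S(j, k) \in [-1, 1]$ and $\hat\rho_S$ is symmetric positive semi-definite by construction (up to floating-point error). The implementation symmetrises the result to remove round-off asymmetry. Spearman's correlation is invariant to strictly monotone transformations applied separately to each variable.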
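These properties can be checked with base R alone. The sketch below is illustrative and uses simulated data: it compares Pearson on mid-ranks with the rank-difference formula (valid only without ties) and confirms invariance under a strictly increasing transformation.

```r
# Spearman's rho as the Pearson correlation of mid-ranks.
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
rho_rank <- stats::cor(rank(x), rank(y))   # rank() averages tied ranks

# Rank-difference formula, valid only when there are no ties.
n <- length(x)
d <- rank(x) - rank(y)
rho_diff <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Strictly monotone transformations leave the ranks, and hence rho, unchanged.
rho_mono <- stats::cor(rank(exp(x)), rank(y^3))

# With heavy ties, Pearson on mid-ranks is still well defined.
xt <- round(x)
rho_tied <- stats::cor(rank(xt), rank(y))
```

Without ties the first three quantities agree; with ties only the mid-rank version should be used.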
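Computation. Each column is ranked (mid-ranks) to form the rank matrix $R$. The product $R^\top R$ is computed via a 'BLAS' symmetric rank update ('SYRK'), and centred using
$$(R - \mathbf{1}\mu^\top)^\top (R - \mathbf{1}\mu^\top) = R^\top R - n\, \mu \mu^\top, \qquad \mu = \tfrac{n+1}{2} \mathbf{1}_p,$$
avoiding an explicit centred copy. Division by $n - 1$ yields the sample covariance of ranks; standardising by $D^{-1/2}$, with $D$ the diagonal of that covariance, gives $\hat\rho_S$.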
Columns with zero rank variance (all values equal) are returned as NA
along their row/column; the corresponding diagonal entry is also NA.
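The computation path can be mimicked in base R for complete data. This is a sketch rather than the package's C++ backend: `crossprod()` plays the role of the symmetric rank update, and the centring is applied to the cross-product rather than to a centred copy of the ranks.

```r
# Rank matrix, cross-product, and centring without an explicit centred copy.
set.seed(2)
X <- matrix(rnorm(200 * 4), ncol = 4)
n <- nrow(X)
R <- apply(X, 2, rank)                    # mid-ranks, column by column
mu <- rep((n + 1) / 2, ncol(X))           # mean rank, with or without ties
S <- (crossprod(R) - n * tcrossprod(mu)) / (n - 1)   # covariance of ranks
s <- sqrt(diag(S))
rho <- S / tcrossprod(s)                  # D^{-1/2} S D^{-1/2}
rho <- (rho + t(rho)) / 2                 # remove round-off asymmetry
```

The result agrees with `stats::cor(X, method = "spearman")` up to floating-point error.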
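When na_method = "pairwise", each estimate $\hat\rho_S(j, k)$ is recomputed on the pairwise complete-case overlap of columns $j$ and $k$. When ci = TRUE, confidence intervals are computed in 'C++' using the jackknife Euclidean-likelihood method of de Carvalho and Marques (2012). For a pairwise estimate $\hat\rho$, delete-one jackknife pseudo-values are formed as
$$Z_i = n \hat\rho - (n - 1)\, \hat\rho_{(-i)}, \qquad i = 1, \ldots, n,$$
where $\hat\rho_{(-i)}$ is the Spearman correlation after removing observation $i$. The confidence limits solve the jackknife Euclidean-likelihood equation of de Carvalho and Marques (2012) at the requested conf_level.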
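The pseudo-value step can be sketched in base R, assuming the standard delete-one form $Z_i = n\hat\rho - (n-1)\hat\rho_{(-i)}$. The function name `jackknife_pseudo` is hypothetical, and the final interval construction is deliberately left out because it depends on the Euclidean-likelihood equation used by the package.

```r
# Delete-one jackknife pseudo-values for a pairwise Spearman estimate.
jackknife_pseudo <- function(x, y) {
  n <- length(x)
  rho_hat <- stats::cor(x, y, method = "spearman")
  rho_del <- vapply(
    seq_len(n),
    function(i) stats::cor(x[-i], y[-i], method = "spearman"),
    numeric(1)
  )
  n * rho_hat - (n - 1) * rho_del
}

set.seed(3)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
Z <- jackknife_pseudo(x, y)
c(mean = mean(Z), se = stats::sd(Z) / sqrt(length(Z)))
```

Each pair requires one re-ranking per deleted observation, which is why interval estimation costs noticeably more than the point estimates.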
Ranking costs $O(p\, n \log n)$; forming and normalising $R^\top R$ costs $O(n p^2)$ with $O(p^2)$ additional memory. The optional jackknife Euclidean-likelihood confidence intervals add per-pair delete-one recomputation work and are intended for inference rather than raw-matrix throughput.
A symmetric numeric matrix where the (i, j)-th element is
the Spearman correlation between the i-th and j-th
numeric columns of the input. When ci = TRUE, the object also
carries a ci attribute with elements est, lwr.ci,
upr.ci, and conf.level. When pairwise-complete evaluation is
used, pairwise sample sizes are stored in attr(x, "diagnostics")$n_complete.
Invisibly returns the spearman_rho object.
A ggplot object representing the heatmap.
Missing values are rejected when na_method = "error". Columns
with fewer than two usable observations are excluded.
Thiago de Paula Oliveira
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15(1), 72-101. Reprinted (2010) in International Journal of Epidemiology, 39(5), 1137-1150.
de Carvalho, M., & Marques, F. (2012). Jackknife Euclidean likelihood-based inference for Spearman's rho. North American Actuarial Journal, 16(4), 487-492.
print.spearman_rho, plot.spearman_rho
    ## Monotone transformation invariance (Spearman is rank-based)
    set.seed(123)
    n <- 400; p <- 6; rho <- 0.6
    Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
    L <- chol(Sigma)
    X <- matrix(rnorm(n * p), n, p) %*% L
    colnames(X) <- paste0("V", seq_len(p))
    X_mono <- X
    X_mono[, 1] <- exp(X_mono[, 1])
    X_mono[, 2] <- log1p(exp(X_mono[, 2]))
    X_mono[, 3] <- X_mono[, 3]^3
    sp_X <- spearman_rho(X)
    sp_m <- spearman_rho(X_mono)
    summary(sp_X)
    round(max(abs(sp_X - sp_m)), 3)
    plot(sp_X)

    ## Confidence intervals
    sp_ci <- spearman_rho(X[, 1:3], ci = TRUE)
    print(sp_ci, show_ci = "yes")
    summary(sp_ci)
    estimate(sp_ci)
    tidy(sp_ci)
    ci(sp_ci)
    confint(sp_ci)

    ## Ties handled via mid-ranks
    tied <- cbind(
      a = rep(1:5, each = 20),
      b = rep(5:1, each = 20) + rnorm(100, sd = 0.1),
      c = as.numeric(gl(10, 10))
    )
    sp_tied <- spearman_rho(tied, ci = TRUE)
    print(sp_tied, digits = 2, show_ci = "yes")

    # Interactive viewing (requires shiny)
    if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
      view_corr_shiny(sp_X)
    }
Summary Method for ccc_rm_reml Objects
Produces a detailed summary of a "ccc_rm_reml" object, including Lin's CCC estimates and associated variance component estimates per method pair.
    ## S3 method for class 'ccc_rm_reml'
    summary(
      object, digits = 4, ci_digits = 2, n = NULL, topn = NULL,
      max_vars = NULL, width = NULL, show_ci = NULL, ...
    )

    ## S3 method for class 'summary.ccc_rm_reml'
    print(
      x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL,
      width = NULL, show_ci = NULL, ...
    )
| Argument | Description |
|---|---|
| object | An object of class |
| digits | Integer; number of decimal places to round CCC estimates and components. |
| ci_digits | Integer; decimal places for confidence interval bounds. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns; |
| width | Optional display width; defaults to |
| show_ci | Character string indicating whether to show confidence intervals: |
| ... | Passed to |
| x | An object of class |
A data frame of class "summary.ccc_rm_reml" with columns:
item1, item2, estimate, and optionally lwr,
upr, plus canonical repeated-measures counts n_subjects and
n_obs. Method-specific columns retain the scientific variance
component outputs: sigma2_subject, sigma2_subject_method,
sigma2_subject_time, sigma2_error, sigma2_extra,
SB, and se_ccc.
Representation-first summary for edge-list outputs.
## S3 method for class 'corr_edge_list'
summary(object, topn = NULL, show_ci = NULL, ...)
| Argument | Description |
|---|---|
| object | An edge-list correlation result. |
| topn | Optional number of head/tail rows when preview is truncated. |
| show_ci | One of |
| ... | Unused. |
A standardized summary data frame with class
c("summary.corr_result", "data.frame") (plus compatibility classes).
Representation-first summary for dense correlation outputs.
## S3 method for class 'corr_matrix'
summary(object, topn = NULL, show_ci = NULL, ...)
| Argument | Description |
|---|---|
| object | A dense correlation result ( |
| topn | Optional number of head/tail rows when preview is truncated. |
| show_ci | One of |
| ... | Unused. |
A standardized summary data frame with class
c("summary.corr_result", "data.frame") (plus compatibility classes).
Representation-first summary for sparse correlation outputs.
## S3 method for class 'corr_sparse'
summary(object, topn = NULL, show_ci = NULL, ...)
| Argument | Description |
|---|---|
| object | A sparse correlation result. |
| topn | Optional number of head/tail rows when preview is truncated. |
| show_ci | One of |
| ... | Unused. |
A standardized summary data frame with class
c("summary.corr_result", "data.frame") (plus compatibility classes).
Computes the tetrachoric correlation for either a pair of binary variables or all pairwise combinations of binary columns in a matrix/data frame.
tetrachoric(data, y = NULL, na_method = c("error", "pairwise"), ci = FALSE,
  p_value = FALSE, conf_level = 0.95, correct = 0.5,
  output = c("matrix", "sparse", "edge_list"), threshold = 0, diag = TRUE, ...)

## S3 method for class 'tetrachoric_corr'
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)

## S3 method for class 'tetrachoric_corr'
plot(x, title = "Tetrachoric correlation heatmap", low_color = "indianred1",
  high_color = "steelblue1", mid_color = "white", value_text_size = 4,
  show_value = TRUE, ...)

## S3 method for class 'tetrachoric_corr'
summary(object, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  ci_digits = 3, p_digits = 4, show_ci = NULL, ...)

## S3 method for class 'summary.tetrachoric_corr'
print(x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)
| Argument | Description |
|---|---|
| data | A binary vector, matrix, or data frame. In matrix/data-frame mode, only binary columns are retained. |
| y | Optional second binary vector. When supplied, the function returns a single tetrachoric correlation estimate. |
| na_method | Character scalar controlling missing-data handling. |
| ci | Logical (default |
| p_value | Logical (default |
| conf_level | Confidence level used when |
| correct | Non-negative continuity correction added to zero-count cells. Default is |
| output | Output representation for the computed estimates. |
| threshold | Non-negative absolute-value filter for non-matrix outputs: keep entries with |
| diag | Logical; whether to include diagonal entries in |
| ... | Additional arguments passed to |
| x | An object of class |
| digits | Integer; number of decimal places to print. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns; |
| width | Optional display width; defaults to |
| show_ci | One of |
| title | Plot title. Default is |
| low_color | Color for the minimum correlation. |
| high_color | Color for the maximum correlation. |
| mid_color | Color for zero correlation. |
| value_text_size | Font size used in tile labels. |
| show_value | Logical; if |
| object | An object of class |
| ci_digits | Integer; digits for confidence limits in the pairwise summary. |
| p_digits | Integer; digits for p-values in the pairwise summary. |
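The tetrachoric correlation assumes that the observed binary variables arise
by dichotomising latent standard-normal variables. Let
$$(Z_1, Z_2)^\top \sim N_2\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$
with latent correlation $\rho$, and define
observed binary variables by thresholds $\tau_1, \tau_2$:
$$X = \mathbf{1}(Z_1 > \tau_1), \qquad Y = \mathbf{1}(Z_2 > \tau_2).$$
If the observed $2 \times 2$ table has counts $n_{ij}$
for $i, j \in \{0, 1\}$, the marginal proportions determine
the thresholds:
$$\tau_1 = \Phi^{-1}\{P(X = 0)\}, \qquad \tau_2 = \Phi^{-1}\{P(Y = 0)\}.$$
The estimator returned here is the maximum-likelihood estimate of the latent
correlation $\rho$, obtained by maximizing the multinomial log-likelihood
built from the rectangle probabilities of the bivariate normal distribution:
$$\ell(\rho) = \sum_{i=0}^{1} \sum_{j=0}^{1} n_{ij} \log \pi_{ij}(\rho; \tau_1, \tau_2),$$
where $\pi_{ij}$ are the four bivariate-normal cell probabilities implied
by $\rho$ and the fixed thresholds. The implementation evaluates the
likelihood over $\rho \in (-1, 1)$ by a coarse search followed by Brent
refinement in C++.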
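The two estimation steps described above (thresholds fixed from the margins, then a one-dimensional likelihood search over the latent correlation) can be sketched in base R. This is only an illustration under stated assumptions, not the package's C++ backend: the helper name `tetra_mle_sketch` is hypothetical, the bivariate-normal cell probability is obtained by one-dimensional numerical integration, `stats::optimize()` stands in for the coarse search plus Brent refinement, and no continuity correction is applied.

```r
# Hypothetical helper (not part of matrixCorr): ML tetrachoric correlation
# for two binary vectors. Assumes both levels are observed in x and in y.
tetra_mle_sketch <- function(x, y) {
  x <- as.logical(x)
  y <- as.logical(y)
  tab <- table(factor(x, levels = c(FALSE, TRUE)),
               factor(y, levels = c(FALSE, TRUE)))
  n <- sum(tab)

  # Thresholds are fixed from the marginal proportions of zeros
  tau1 <- qnorm(sum(tab[1, ]) / n)
  tau2 <- qnorm(sum(tab[, 1]) / n)
  b1 <- pnorm(tau1, lower.tail = FALSE)  # P(X = 1)
  b2 <- pnorm(tau2, lower.tail = FALSE)  # P(Y = 1)

  # P(Z1 > tau1, Z2 > tau2) for a given rho, by 1-D numerical integration
  p11 <- function(rho) {
    integrate(function(z) {
      dnorm(z) * pnorm((tau2 - rho * z) / sqrt(1 - rho^2), lower.tail = FALSE)
    }, lower = tau1, upper = Inf)$value
  }

  # Negative multinomial log-likelihood over the four cells
  negloglik <- function(rho) {
    a <- p11(rho)
    probs <- c(1 - b1 - b2 + a, b2 - a, b1 - a, a)   # cells 00, 01, 10, 11
    counts <- c(tab[1, 1], tab[1, 2], tab[2, 1], tab[2, 2])
    -sum(counts * log(pmax(probs, 1e-12)))
  }

  optimize(negloglik, interval = c(-0.99, 0.99))$minimum
}

# Simulated check: latent correlation 0.6, arbitrary thresholds
set.seed(1)
n <- 2000
z1 <- rnorm(n)
z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(n)
rho_hat <- tetra_mle_sketch(z1 > 0.3, z2 > -0.2)
rho_hat
```

The estimate should land close to the latent value used in the simulation; the observed binary Pearson correlation of the same data would typically be noticeably smaller.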
The argument correct adds a continuity correction only to zero-count
cells before threshold estimation and likelihood evaluation. This stabilises
the estimator for sparse tables and mirrors the conventional
correct = 0.5 continuity-correction behaviour used in several
latent-correlation implementations.
When correct = 0 and the observed contingency table contains zero
cells, the fit is non-regular and may be boundary-driven. In those cases the
returned object stores sparse-fit diagnostics, including whether the fit was
classified as boundary or near_boundary.
Assumptions. The coefficient is appropriate when both observed binary variables are viewed as thresholded versions of jointly normal latent variables. The optional p-values and confidence intervals adopt this latent-normal interpretation and use the same likelihood that defines the tetrachoric estimate. These inferential quantities are therefore model-based and should not be interpreted as distribution-free summaries.
Inference. When ci = TRUE or p_value = TRUE, the
function refits the pairwise tetrachoric model by maximum likelihood and
obtains the observed information matrix numerically in C++. The reported
confidence interval is a Wald interval
$\hat\rho \pm z_{1 - \alpha/2}\, \widehat{\mathrm{SE}}(\hat\rho)$, and the
reported p-value is from the large-sample Wald $z$-test for
$H_0\colon \rho = 0$. These inferential quantities are only computed when
explicitly requested.
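As a small illustration of the Wald quantities described above, the following sketch turns a point estimate and a standard error into an interval and a two-sided p-value. The function name `wald_inference` and the numeric inputs are hypothetical; in the package the standard error comes from the numerically evaluated observed information, and the sketch makes no claim about any clipping the package may apply to the limits.

```r
# Hypothetical helper: Wald interval and z-test from an estimate and its SE.
wald_inference <- function(estimate, se, conf_level = 0.95) {
  alpha <- 1 - conf_level
  z_crit <- qnorm(1 - alpha / 2)     # normal critical value
  z_stat <- estimate / se            # test of H0: parameter = 0
  c(
    lwr = estimate - z_crit * se,
    upr = estimate + z_crit * se,
    statistic = z_stat,
    p_value = 2 * pnorm(-abs(z_stat))
  )
}

# Made-up inputs, for illustration only
res <- wald_inference(estimate = 0.45, se = 0.08)
res
```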
In matrix/data-frame mode, all pairwise tetrachoric correlations are computed
between binary columns. Diagonal entries are 1 for non-degenerate
columns and NA for columns with fewer than two observed levels.
Variable-specific latent thresholds are stored in the thresholds
attribute, and pairwise sparse-fit diagnostics are stored in
diagnostics.
Computational complexity. For $p$ binary variables, the matrix
path evaluates $p(p-1)/2$ pairwise likelihoods. Each pair uses a
one-dimensional optimisation with negligible memory overhead beyond the
output matrix.
If y is supplied, a numeric scalar with attributes
diagnostics and thresholds. Otherwise a symmetric matrix of
class tetrachoric_corr with attributes method,
description, package = "matrixCorr", diagnostics,
thresholds, and correct. When p_value = TRUE, the
returned object also carries an inference attribute with elements
estimate, statistic, parameter, p_value, and
n_obs. When ci = TRUE, it also carries a ci attribute
with elements est, lwr.ci, upr.ci, conf.level,
and ci.method, plus attr(x, "conf.level"). Scalar outputs keep
the same point estimate and gain the same metadata only when inference is
requested. In matrix mode, output = "edge_list" returns a data frame with columns
row, col, value; output = "sparse" returns a
symmetric sparse matrix.
Thiago de Paula Oliveira
Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society A, 195, 1-47.
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.
set.seed(123)
n <- 1000
Sigma <- matrix(c(
  1.00, 0.55, 0.35,
  0.55, 1.00, 0.45,
  0.35, 0.45, 1.00
), 3, 3, byrow = TRUE)
Z <- mnormt::rmnorm(n = n, mean = rep(0, 3), varcov = Sigma)
X <- data.frame(
  item1 = Z[, 1] > stats::qnorm(0.70),
  item2 = Z[, 2] > stats::qnorm(0.60),
  item3 = Z[, 3] > stats::qnorm(0.50)
)

tc <- tetrachoric(X)
print(tc, digits = 3)
summary(tc)
estimate(tc)
tidy(tc)

tc_ci <- tetrachoric(X, ci = TRUE)
ci(tc_ci)
confint(tc_ci)
plot(tc)

tetrachoric(X, output = "edge_list", diag = FALSE)
tetrachoric(X, output = "sparse", threshold = 0.4, diag = FALSE)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(tc)
}

# latent Pearson correlations used to generate the binary items
round(stats::cor(Z), 2)
Launches an interactive Shiny gadget that displays correlation heatmaps with
filtering, clustering, and hover inspection. The viewer accepts any
matrixCorr correlation result (for example the outputs from
pearson_corr(), spearman_rho(), kendall_tau(), bicor(),
pbcor(), wincor(), skipped_corr(), pcorr(), dcor(), or
shrinkage_corr()), a plain
matrix, or a named list of such objects. When a list is supplied the gadget
offers a picker to switch between results.
view_corr_shiny(x, title = NULL, default_max_vars = 40L)
| Argument | Description |
|---|---|
| x | A correlation result, a numeric matrix, or a named list of those objects. Each element must be square with matching row/column names. |
| title | Optional character title shown at the top of the gadget. |
| default_max_vars | Integer; maximum number of variables pre-selected when the app opens. Defaults to 40 so very wide matrices start collapsed. |
The dependencies of this helper are listed under Suggests: it requires the
shiny and shinyWidgets packages at runtime and optionally converts the plot to
an interactive widget when plotly is installed. Variable selection uses
a searchable picker, and clustering controls let you reorder variables via
hierarchical clustering on either absolute or signed correlations with a
choice of linkage methods.
Invisibly returns NULL; the function is called for its side
effect of launching a Shiny gadget.
if (interactive()) {
  data <- mtcars
  results <- list(
    Pearson = pearson_corr(data),
    Spearman = spearman_rho(data),
    Kendall = kendall_tau(data)
  )
  view_corr_shiny(results)
}
Launches a dedicated Shiny gadget for repeated-measures correlation matrix
objects of class "rmcorr_matrix". The viewer combines the correlation
heatmap with a pairwise scatterplot panel that rebuilds the corresponding
two-variable "rmcorr" fit for user-selected variables.
view_rmcorr_shiny(x, title = NULL, default_max_vars = 40L)
| Argument | Description |
|---|---|
| x | An object of class |
| title | Optional character title shown at the top of the gadget. |
| default_max_vars | Integer; maximum number of variables pre-selected in the heatmap view when the app opens. Defaults to 40. |
This helper requires the shiny and shinyWidgets
packages at runtime and will optionally use plotly for the heatmap
when available. The pairwise panel reuses the package's regular
plot.rmcorr() method, so the Shiny scatterplot matches the standard
pairwise repeated-measures correlation plot. To rebuild pairwise fits from a
returned "rmcorr_matrix" object, the matrix must have been created
with keep_data = TRUE.
Invisibly returns NULL; the function is called for its side
effect of launching a Shiny gadget.
if (interactive()) {
  set.seed(2026)
  n_subjects <- 20
  n_rep <- 4
  subject <- rep(seq_len(n_subjects), each = n_rep)
  subj_eff_x <- rnorm(n_subjects, sd = 1.5)
  subj_eff_y <- rnorm(n_subjects, sd = 2.0)
  within_signal <- rnorm(n_subjects * n_rep)
  dat <- data.frame(
    subject = subject,
    x = subj_eff_x[subject] + within_signal +
      rnorm(n_subjects * n_rep, sd = 0.2),
    y = subj_eff_y[subject] + 0.8 * within_signal +
      rnorm(n_subjects * n_rep, sd = 0.3),
    z = subj_eff_y[subject] - 0.4 * within_signal +
      rnorm(n_subjects * n_rep, sd = 0.4)
  )
  fit_mat <- rmcorr(
    dat,
    response = c("x", "y", "z"),
    subject = "subject",
    keep_data = TRUE
  )
  view_rmcorr_shiny(fit_mat)
}
Computes weighted Cohen's kappa for either a pair of ordinal rating vectors or all pairwise combinations of ordinal columns in a matrix or data frame.
weighted_kappa(data, y = NULL,
  weights = c("quadratic", "linear", "unweighted"), levels = NULL,
  na_method = c("error", "pairwise", "complete"), ci = FALSE,
  p_value = FALSE, conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"), threshold = 0, diag = TRUE, ...)

## S3 method for class 'weighted_kappa'
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)
| Argument | Description |
|---|---|
| data | In matrix mode, a matrix or data frame whose rows are observational units and whose columns are raters or classifiers. Each column contains ordered category ratings. In two-vector mode, the first ordinal rating vector. |
| y | Optional second ordinal rating vector. When supplied, the function returns a single weighted kappa estimate for |
| weights | Weight specification. The default |
| levels | Optional ordered category levels. Weighted kappa depends on category order and will not silently alphabetise arbitrary labels. If omitted, order is inferred only when all involved ratings are ordered factors with identical levels or when all involved ratings are numeric or integer and can be ordered by their sorted unique values. |
| na_method | Character scalar controlling missing-data handling. |
| ci | Logical; if |
| p_value | Logical; if |
| conf_level | Confidence level used when |
| n_threads | Integer |
| output | Output representation for matrix mode. |
| threshold | Non-negative absolute-value filter for non-matrix outputs: retain entries with |
| diag | Logical; whether to include diagonal entries in sparse and edge-list outputs. |
| ... | Reserved for future extensions. Unsupported extra arguments are rejected. |
| x | A matrix-style |
| digits | Integer; number of decimal places for displayed values. |
| n | Optional preview row threshold. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns. |
| width | Optional display width. |
| show_ci | One of |
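This implementation uses agreement/similarity weights internally. For $k$
ordered categories indexed by $i, j = 1, \dots, k$:
"unweighted": $w_{ij} = \mathbf{1}(i = j)$
"linear": $w_{ij} = 1 - |i - j| / (k - 1)$
"quadratic": $w_{ij} = 1 - (i - j)^2 / (k - 1)^2$
The "quadratic" scheme is the default and corresponds to
Fleiss-Cohen-style agreement weights. The "linear" scheme
corresponds to equal-spacing agreement weights.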
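The three weighting schemes and the resulting coefficient can be sketched in a few lines of base R. This is an illustrative sketch, not the package's C++ implementation: the helper names `agreement_weights` and `weighted_kappa_sketch` are hypothetical, the weights follow the standard agreement-weight definitions for $k$ equally spaced categories, and missing values are assumed absent.

```r
# Hypothetical helper: k x k agreement weights for ordered categories.
agreement_weights <- function(k, scheme = c("quadratic", "linear", "unweighted")) {
  scheme <- match.arg(scheme)
  d <- abs(outer(seq_len(k), seq_len(k), "-"))   # category distance |i - j|
  switch(scheme,
    unweighted = (d == 0) * 1,
    linear     = 1 - d / (k - 1),
    quadratic  = 1 - (d / (k - 1))^2
  )
}

# Hypothetical helper: weighted kappa for two complete rating vectors.
weighted_kappa_sketch <- function(x, y, levels, scheme = "quadratic") {
  k <- length(levels)
  p <- table(factor(x, levels = levels), factor(y, levels = levels)) / length(x)
  w <- agreement_weights(k, scheme)
  p_o <- sum(w * p)                                 # observed weighted agreement
  p_e <- sum(w * outer(rowSums(p), colSums(p)))     # expected under independence
  (p_o - p_e) / (1 - p_e)
}

lv <- c("low", "mid", "high")
r1 <- ordered(c("low", "low", "mid", "high", "high"), levels = lv)
r2 <- ordered(c("low", "mid", "mid", "high", "high"), levels = lv)

k_lin  <- weighted_kappa_sketch(r1, r2, lv, scheme = "linear")
k_quad <- weighted_kappa_sketch(r1, r2, lv, scheme = "quadratic")
k_self <- weighted_kappa_sketch(r1, r1, lv)   # perfect agreement
c(linear = k_lin, quadratic = k_quad, self = k_self)
```

Because the single disagreement in this toy data is between adjacent categories, the quadratic scheme penalises it less than the linear scheme does.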
For matrix mode, rows are shared observational units and columns are raters
or classifiers. A common category map is resolved in R and the main
computation is performed in C++. Missing-data handling follows the usual
matrixCorr na_method conventions. Pairwise complete counts are
stored in attr(x, "diagnostics")$n_complete.
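Confidence intervals and standard errors.
Confidence intervals and p-values use the exact large-sample multinomial
delta-method formula. With observed cell proportions $p_{ij}$ and margins
$p_{i\cdot}$, $p_{\cdot j}$, let
$$p_o = \sum_{i,j} w_{ij}\, p_{ij}$$
be the observed weighted agreement,
$$p_e = \sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}$$
the expected weighted agreement,
and $\hat\kappa_w = (p_o - p_e) / (1 - p_e)$. Define the weighted margin summaries
$$\bar w_{i\cdot} = \sum_{j} w_{ij}\, p_{\cdot j}, \qquad \bar w_{\cdot j} = \sum_{i} w_{ij}\, p_{i\cdot}.$$
For each cell $(i, j)$, the backend uses
$$d_{ij} = w_{ij}(1 - p_e) - (\bar w_{i\cdot} + \bar w_{\cdot j})(1 - p_o).$$
The variance estimator is
$$\widehat{\mathrm{Var}}(\hat\kappa_w) = \frac{\sum_{i,j} p_{ij}\, d_{ij}^2 - (p_o p_e - 2 p_e + p_o)^2}{n (1 - p_e)^4},$$
where $n$ is the number of complete paired ratings. The standard error
is $\sqrt{\widehat{\mathrm{Var}}(\hat\kappa_w)}$ and the CI is the Wald interval
$\hat\kappa_w \pm z_{1 - \alpha/2}\, \widehat{\mathrm{SE}}$,
truncated to $[-1, 1]$ in the returned result. Use cohen_kappa() for
unordered nominal categories where all disagreements are equally serious.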
If y is supplied, a scalar S3 object of class
c("weighted_kappa", "numeric") is returned with diagnostics and
optional ci and inference attributes. Otherwise a symmetric
matrix-style result with estimator class weighted_kappa.
Thiago de Paula Oliveira
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213-220. doi:10.1037/h0026256
cohen_kappa() for unweighted two-rater nominal agreement;
multirater_kappa() for panel-level nominal agreement among three or more
raters.
raters <- data.frame(
  r1 = ordered(c("low", "low", "mid", "high", "high"),
               levels = c("low", "mid", "high")),
  r2 = ordered(c("low", "mid", "mid", "high", "high"),
               levels = c("low", "mid", "high")),
  r3 = ordered(c("low", "low", "high", "high", "mid"),
               levels = c("low", "mid", "high"))
)

wk <- weighted_kappa(raters)
print(wk)
summary(wk)
estimate(wk)
tidy(wk)
plot(wk)

x <- raters$r1
y <- raters$r2
weighted_kappa(x, y, weights = "linear")
Computes all pairwise Winsorized correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.
This function Winsorizes each margin at proportion tr and then
computes ordinary Pearson correlation on the Winsorized values. It is a
simple robust alternative to Pearson correlation when the main concern is
unusually large or small observations in the marginal distributions.
wincor(data, na_method = c("error", "pairwise", "complete"), ci = FALSE,
  p_value = FALSE, conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L), tr = 0.2, n_boot = 500L,
  seed = NULL, output = c("matrix", "sparse", "edge_list"), threshold = 0,
  diag = TRUE)

## S3 method for class 'wincor'
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)

## S3 method for class 'wincor'
plot(x, title = "Winsorized correlation heatmap", low_color = "indianred1",
  high_color = "steelblue1", mid_color = "white", value_text_size = 4,
  show_value = TRUE, ...)

## S3 method for class 'wincor'
summary(object, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  ci_digits = 3, p_digits = 4, show_ci = NULL, ...)

## S3 method for class 'summary.wincor'
print(x, digits = NULL, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  show_ci = NULL, ...)
| Argument | Description |
|---|---|
| data | A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. |
| na_method | Character scalar controlling missing-data handling. |
| ci | Logical (default |
| p_value | Logical (default |
| conf_level | Confidence level used when |
| n_threads | Integer |
| tr | Winsorization proportion in |
| n_boot | Integer |
| seed | Optional positive integer used to seed the bootstrap resampling when |
| output | Output representation for the computed estimates. |
| threshold | Non-negative absolute-value filter for non-matrix outputs: keep entries with |
| diag | Logical; whether to include diagonal entries in |
| x | An object of class |
| digits | Integer; number of digits to print. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns; |
| width | Optional display width; defaults to |
| show_ci | One of |
| ... | Additional arguments passed to the underlying print or plot helper. |
| title | Character; plot title. |
| low_color, high_color, mid_color | Colors used in the heatmap. |
| value_text_size | Numeric text size for overlaid cell values. |
| show_value | Logical; if |
| object | An object of class |
| ci_digits | Integer; digits used for confidence limits in pairwise summaries. |
| p_digits | Integer; digits used for p-values in pairwise summaries. |
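Let $X \in \mathbb{R}^{n \times p}$ be a numeric matrix with rows as
observations and columns as variables. For a column
$x = (x_1, \dots, x_n)$, write the order statistics as
$x_{(1)} \le \dots \le x_{(n)}$ and let
$g = \lfloor \mathrm{tr} \cdot n \rfloor$. The Winsorized values can be written as
$$x_i^{w} = \min\{\max(x_i,\, x_{(g+1)}),\, x_{(n-g)}\}.$$
For two columns $x$ and $y$, the Winsorized correlation is the
ordinary Pearson correlation computed from $x^{w}$ and $y^{w}$:
$$r_w(x, y) = \frac{\sum_{i=1}^{n} (x_i^{w} - \bar x^{w})(y_i^{w} - \bar y^{w})}{\sqrt{\sum_{i=1}^{n} (x_i^{w} - \bar x^{w})^2 \, \sum_{i=1}^{n} (y_i^{w} - \bar y^{w})^2}}.$$
In matrix form, let $X^{w}$ contain the Winsorized columns and define
the centred, unit-norm columns
$$z_j = \frac{x_j^{w} - \bar x_j^{w} \mathbf{1}}{\lVert x_j^{w} - \bar x_j^{w} \mathbf{1} \rVert_2}.$$
If $Z = [z_1, \dots, z_p]$, then the Winsorized
correlation matrix is
$$R_w = Z^\top Z.$$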
Winsorization acts on each margin separately, so it guards against marginal
outliers and heavy tails but does not target unusual points in the joint
cloud. This implementation Winsorizes each column in 'C++', centres and
normalises it, and forms the complete-data matrix from cross-products. With
na_method = "pairwise", each pair is recomputed on its overlap of
non-missing rows. As with Pearson correlation, the complete-data path yields
a symmetric positive semidefinite matrix, whereas pairwise deletion can
break positive semidefiniteness. If the Winsorized variance of a column is
zero, correlations involving that column are returned as NA.
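The complete-data path described above (Winsorize each column, centre and normalise, then take cross-products) can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's C++ backend: the helper names `winsorize` and `wincor_sketch` are hypothetical, the Winsorizing rule used is the common one that clamps each column to its $(g+1)$-th and $(n-g)$-th order statistics with $g = \lfloor \mathrm{tr} \cdot n \rfloor$, and the input is assumed complete with non-constant columns.

```r
# Hypothetical helper: clamp a vector to its (g+1)-th and (n-g)-th order statistics.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmin(pmax(x, xs[g + 1]), xs[n - g])
}

# Hypothetical helper: Winsorized correlation matrix via cross-products.
wincor_sketch <- function(X, tr = 0.2) {
  W <- apply(X, 2, winsorize, tr = tr)          # Winsorize each margin separately
  Z <- sweep(W, 2, colMeans(W))                 # centre columns
  Z <- sweep(Z, 2, sqrt(colSums(Z^2)), "/")     # scale to unit Euclidean norm
  crossprod(Z)                                  # Z'Z
}

set.seed(11)
X <- matrix(rnorm(180 * 4), ncol = 4)
X[sample(length(X), 6)] <- -12                  # inject a few marginal outliers

R_sketch <- wincor_sketch(X, tr = 0.2)
R_check  <- cor(apply(X, 2, winsorize, tr = 0.2))
round(R_sketch, 3)
```

The cross-product form and Pearson correlation of the Winsorized columns agree up to floating-point error, and with `tr = 0` the sketch reduces to ordinary Pearson correlation.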
When p_value = TRUE, inference follows the method-specific test based
on
$$t = r_w \sqrt{\frac{n - 2}{1 - r_w^2}},$$
evaluated against a $t$-distribution with $n - 2g - 2$
degrees of freedom, where $g = \lfloor \mathrm{tr} \cdot n \rfloor$
and $n$ is the
pairwise complete-case sample size for the corresponding column pair. The
p-value is reported only when the pair is not identical and the resulting
degrees of freedom are positive. When ci = TRUE, the interval is a
percentile bootstrap interval based on n_boot resamples
drawn from the pairwise complete cases. If
$r^{*}_{(1)} \le \dots \le r^{*}_{(B)}$ denotes the sorted
bootstrap sample of finite estimates with $B$ retained resamples, the
reported limits are the order statistics $r^{*}_{(\ell)}$ and $r^{*}_{(u)}$,
where the indices $\ell$ and $u$ are determined by $B$ and
$\alpha = 1 - \mathrm{conf\_level}$. Resamples that yield undefined
estimates are discarded before the percentile limits are formed.
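A percentile bootstrap interval of this kind can be sketched for a single column pair as follows. The sketch is illustrative only: the helper names are hypothetical, rows are resampled with replacement as in an ordinary paired bootstrap, and `stats::quantile()` with its default interpolation is used for the limits, which is an assumption and may differ from the exact order-statistic index rule the package applies.

```r
# Hypothetical helpers, not the package's implementation.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmin(pmax(x, xs[g + 1]), xs[n - g])
}

wincor_pair <- function(x, y, tr = 0.2) {
  cor(winsorize(x, tr), winsorize(y, tr))
}

boot_wincor_ci <- function(x, y, tr = 0.2, n_boot = 500, conf_level = 0.95) {
  n <- length(x)
  est <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)     # resample paired rows
    suppressWarnings(wincor_pair(x[idx], y[idx], tr))
  })
  est <- est[is.finite(est)]                    # drop undefined resamples
  alpha <- 1 - conf_level
  quantile(est, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(11)
x <- rnorm(120)
y <- 0.5 * x + rnorm(120)
ci_sketch <- boot_wincor_ci(x, y, n_boot = 200)
ci_sketch
```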
Computational complexity. In the complete-data path, Winsorizing the
columns requires sorting within each column, and forming the cross-product
matrix costs $O(n p^2)$ with $O(p^2)$ output storage. When
ci = TRUE, the bootstrap cost is incurred separately for each column
pair.
A symmetric correlation matrix with class wincor and
attributes method = "winsorized_correlation", description,
and package = "matrixCorr". When ci = TRUE, the returned
object also carries a ci attribute with elements est,
lwr.ci, upr.ci, conf.level, and ci.method,
plus attr(x, "conf.level"). When p_value = TRUE, it also
carries an inference attribute with elements estimate,
statistic, parameter, p_value, n_obs, and
alternative. When either inferential option is requested, the
object also carries diagnostics$n_complete.
Thiago de Paula Oliveira
Wilcox, R. R. (1993). Some results on a Winsorized correlation coefficient. British Journal of Mathematical and Statistical Psychology, 46(2), 339-349. doi:10.1111/j.2044-8317.1993.tb01020.x
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press.
pbcor(), skipped_corr(), bicor()
set.seed(11)
X <- matrix(rnorm(180 * 4), ncol = 4)
X[sample(length(X), 6)] <- X[sample(length(X), 6)] - 12

R <- wincor(X, tr = 0.2)
print(R, digits = 2)
summary(R)
estimate(R)
tidy(R)
plot(R)

## Bootstrap confidence intervals
R_ci <- wincor(X, tr = 0.2, ci = TRUE, n_boot = 49, seed = 11)
ci(R_ci)
confint(R_ci)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}
Computes the directed Chatterjee rank correlation coefficient for numeric
vectors or for all directed column pairs of a numeric matrix/data frame.
The matrix orientation is result[i, j] = xi(V_i, V_j), where
V_i is the predictor/sorting variable and V_j is the
response/ranked variable.
xi_corr(data, y = NULL, na_method = c("error", "pairwise", "complete"),
  ci = FALSE, conf_level = 0.95,
  ci_method = c("auto", "dette_kroll", "n_choose_m"), bootstrap_reps = 999L,
  m = NULL, large_sample_cutoff = 1000L,
  bias_correction = c("none", "upper_bound"),
  tie_method = c("random", "first"), seed = NULL,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"), threshold = 0, diag = TRUE, ...)

## S3 method for class 'chatterjee_xi'
print(x, digits = 4, n = NULL, topn = NULL, max_vars = NULL, width = NULL,
  ci_digits = 3, show_ci = NULL, ...)

## S3 method for class 'chatterjee_xi_scalar'
print(x, digits = 4, ci_digits = 4, ...)

## S3 method for class 'chatterjee_xi_scalar'
summary(object, digits = 4, ci_digits = 4, ...)

## S3 method for class 'summary.chatterjee_xi_scalar'
print(x, digits = NULL, ci_digits = NULL, ...)
| Argument | Description |
|---|---|
| data | A numeric matrix or a data frame with at least two numeric columns, or a numeric predictor vector when |
| y | Optional numeric response vector for two-vector mode. When supplied, |
| na_method | Character scalar controlling missing-data handling. |
| ci | Logical (default |
| conf_level | Confidence level used when |
| ci_method | Confidence interval method: |
| bootstrap_reps | Number of m-out-of-n bootstrap replicates used when |
| m | Optional subsample size for the m-out-of-n bootstrap. If |
| large_sample_cutoff | For |
| bias_correction | Finite-sample normalisation: |
| tie_method | How ties in the sorting variable |
| seed | Optional positive integer seed for reproducible tie breaking and bootstrap resampling. |
| n_threads | Integer |
| output | Output representation for the computed directed estimates. |
| threshold | Non-negative absolute-value filter for non-matrix outputs: keep entries with |
| diag | Logical; whether to include diagonal entries in |
| ... | Compatibility arguments. The deprecated |
| x | An object of class |
| digits | Integer; number of decimal places to print. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns; |
| width | Optional display width; defaults to |
| ci_digits | Integer; digits for confidence limits. |
| show_ci | One of |
| object | An object of class |
Let $(X_i, Y_i)$, $i = 1, \dots, n$, be complete paired observations.
Chatterjee's rank correlation is directed: $\xi(X, Y)$ measures how well
$Y$ is functionally determined by $X$. It is not a symmetric
association measure, so in matrix mode result[i, j]
need not equal result[j, i]. The row variable is always the
sorting/predictor variable and the column variable is the response/ranked
variable.
For a given directed pair, observations are sorted by $X$. Let
$r_i$ be the number of response values $Y_j \le Y_{(i)}$, and let
$l_i$ be the number of response values $Y_j \ge Y_{(i)}$, evaluated for
the $i$-th observation in sorted-by-$X$ order. The tied-response
finite-sample estimator is
$$\xi_n(X, Y) = 1 - \frac{n \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{2 \sum_{i=1}^{n} l_i (n - l_i)}.$$
If all response values are equal, the denominator is zero and the estimate
is NA. When there are no ties in $Y$, this reduces to the familiar
rank-difference expression
$$\xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1}.$$
The raw finite-sample statistic is not forced to one on the diagonal; with
no ties and X = Y, it equals $(n - 2)/(n + 1)$. Use
bias_correction = "upper_bound" only when this finite-sample
upper-bound normalisation is desired.
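The directed estimator can be sketched in base R for a single pair of complete vectors. This is an illustrative sketch rather than the package's implementation: the helper name `xi_sketch` is hypothetical, it uses the standard tied-response form of Chatterjee's statistic, ties in the sorting variable are resolved by input order (comparable in spirit to tie_method = "first"), and no bias correction is applied.

```r
# Hypothetical helper: Chatterjee's xi for predictor x and response y.
xi_sketch <- function(x, y) {
  n <- length(x)
  ord <- order(x)                  # stable: tied x values keep input order
  ys <- y[ord]
  r <- rank(ys, ties.method = "max")             # number of y_j <= y_i
  l <- n + 1 - rank(ys, ties.method = "min")     # number of y_j >= y_i
  den <- 2 * sum(l * (n - l))
  if (den == 0) return(NA_real_)                 # constant response
  1 - n * sum(abs(diff(r))) / den
}

# Identity case: equals (n - 2) / (n + 1), not 1
xi_id <- xi_sketch(1:10, 1:10)

# Directedness: y is a function of x, but x is not a function of y
x <- seq(-1, 1, length.out = 101)
y <- x^2
xi_xy <- xi_sketch(x, y)
xi_yx <- xi_sketch(y, x)
c(identity = xi_id, x_to_y = xi_xy, y_to_x = xi_yx)
```

The non-monotone example shows why the matrix output is generally asymmetric: the coefficient is high when the row variable determines the column variable and can be low in the reverse direction.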
Ties in the sorting variable $X$ are handled before computing adjacent
rank differences. Chatterjee's definition uses random tie breaking, provided
here by tie_method = "random". For reproducible diagnostics or when
the input order should define tied-$X$ ordering, use
tie_method = "first". Ties in the response variable $Y$ are
handled by the general formula above and do not require random
tie breaking.
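The two tie rules for the sorting variable can be illustrated with base R's order(). This is a sketch under assumptions: `sort_order_sketch` is a hypothetical helper, and the random variant uses a random secondary sort key, which may differ from how the package breaks ties internally.

```r
# Hypothetical helper: sorting permutation for x under two tie rules.
sort_order_sketch <- function(x, tie_method = c("first", "random")) {
  tie_method <- match.arg(tie_method)
  if (tie_method == "first") {
    order(x)                          # tied values stay in input order
  } else {
    order(x, sample.int(length(x)))   # random secondary key breaks ties
  }
}

x <- c(2, 1, 1, 3, 1)
sort_order_sketch(x, "first")   # 2 3 5 1 4: the tied 1s keep input order

set.seed(1)                     # fixing the seed makes the random order reproducible
ord_random <- sort_order_sketch(x, "random")
x[ord_random]                   # still sorted; only the order within ties varies
```

Only the order within groups of tied sorting values changes between the two rules, which is why the choice matters for the estimate only when the sorting variable has ties.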
Confidence intervals. The ordinary n-out-of-n bootstrap is not
used for Chatterjee's coefficient. Both available intervals use
m-out-of-n subsampling without replacement, consistent with the bootstrap
family considered for Chatterjee's rank correlation.
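With ci_method = "dette_kroll", subsamples of size $m$ are drawn
without replacement and the limiting standard deviation is estimated from
the resulting subsample estimates. The reported interval is the
normal-approximation interval built from that estimate. If m = NULL, this
method chooses a default subsample size from the complete-case sample size,
bounded to a valid range.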
With ci_method = "n_choose_m", subsamples are also drawn without
replacement, but the interval is obtained by inverting the lower and upper
empirical quantiles of the centred and scaled subsample statistic, which
gives the basic m-out-of-n limits.
If finite Monte Carlo quantiles fall entirely on one side of zero, the
inversion is anchored at zero so that the reported interval contains the
observed estimate; the bounds are not clipped to the population parameter
range. If m = NULL, this method chooses a default subsample size from the
complete-case sample size, bounded to a valid range, following the
implementation rule used by Dalitz, Arning and Goebbels.
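The two interval constructions can be sketched in base R as follows. Everything here is an assumption made for illustration, not the package's code: the helper names are hypothetical, the subsample statistic is assumed to be scaled by sqrt(m) and rescaled by sqrt(n), the normal interval is centred on the observed estimate, and m = 40 is an arbitrary choice rather than a package default.

```r
# Hypothetical helpers; scaling and interval forms are assumptions.
xi_raw <- function(x, y) {
  n <- as.numeric(length(x))
  y_sorted <- y[order(x)]
  r <- as.numeric(rank(y_sorted, ties.method = "max"))
  l <- n + 1 - as.numeric(rank(y_sorted, ties.method = "min"))
  denom <- 2 * sum(l * (n - l))
  if (denom == 0) return(NA_real_)
  1 - n * sum(abs(diff(r))) / denom
}

# Estimates on `reps` subsamples of size m drawn without replacement.
subsample_xi <- function(x, y, m, reps) {
  n <- length(x)
  vapply(seq_len(reps), function(b) {
    idx <- sample.int(n, m)
    xi_raw(x[idx], y[idx])
  }, numeric(1))
}

# Normal interval: subsamples are used only to estimate a standard error.
ci_normal_sketch <- function(x, y, m, reps = 199, conf_level = 0.95) {
  n <- length(x)
  est <- xi_raw(x, y)
  sigma_hat <- sqrt(m) * sd(subsample_xi(x, y, m, reps))  # assumed scaling
  half <- qnorm(1 - (1 - conf_level) / 2) * sigma_hat / sqrt(n)
  c(est = est, lwr = est - half, upr = est + half)
}

# Basic interval: invert quantiles of the centred and scaled statistic.
ci_basic_sketch <- function(x, y, m, reps = 199, conf_level = 0.95) {
  n <- length(x)
  est <- xi_raw(x, y)
  t_star <- sqrt(m) * (subsample_xi(x, y, m, reps) - est)
  alpha <- 1 - conf_level
  q <- quantile(t_star, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  q_lo <- min(q[1], 0)   # anchor at zero so the interval contains est
  q_hi <- max(q[2], 0)
  c(est = est, lwr = est - q_hi / sqrt(n), upr = est - q_lo / sqrt(n))
}

set.seed(1)
x <- runif(400, -1, 1)
y <- sin(2 * pi * x) + rnorm(400, sd = 0.2)
ci_normal_sketch(x, y, m = 40)
ci_basic_sketch(x, y, m = 40)
```

Because small subsamples tend to give lower estimates than the full sample under strong dependence, the centred statistic can sit entirely below zero; the anchoring step then makes the corresponding limit coincide with the observed estimate instead of excluding it.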
Which CI method should be used? The Dette-Kroll interval is the
more conservative default for small to moderate complete-case sample sizes
in this implementation, because it uses the m-out-of-n bootstrap only to
estimate a normal-approximation standard error. The non-parametric
n-choose-m interval is useful for larger samples when a direct
m-out-of-n bootstrap interval is preferred, but Dalitz, Arning and Goebbels
report that this interval may need fairly large $n$ to approach nominal
coverage in some settings. Therefore ci_method = "auto" uses
"dette_kroll" when pair-specific n_complete <=
large_sample_cutoff and "n_choose_m" when
n_complete > large_sample_cutoff. Increase
bootstrap_reps for final analyses to reduce Monte Carlo error, and
consider setting m explicitly when a study protocol requires a
fixed subsample size.
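The ci_method = "auto" rule amounts to a single comparison per directed pair. The sketch below restates it in base R; `resolve_ci_method` is a hypothetical helper, and the cutoff of 500 is an illustrative value, not the package default.

```r
# Hypothetical helper restating the documented "auto" rule.
# Vectorised because n_complete is pair-specific in matrix mode.
resolve_ci_method <- function(n_complete, large_sample_cutoff) {
  ifelse(n_complete <= large_sample_cutoff, "dette_kroll", "n_choose_m")
}

# Illustrative cutoff only; the boundary value itself maps to "dette_kroll".
resolve_ci_method(c(120, 500, 501, 1200), large_sample_cutoff = 500)
```

With pairwise-complete evaluation, different pairs in the same matrix can therefore receive intervals from different methods.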
Computation. For complete finite matrices without confidence
intervals, the C++ backend precomputes each column's sorting order and
response ranks and reuses them across directed pairs, so sorting and ranking
are done once per column rather than once per pair. Pairwise
missing-data evaluation and bootstrap confidence intervals recompute on the
relevant complete-case samples and are correspondingly more expensive.
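The precompute-and-reuse idea can be illustrated in plain R. This is a sketch of the strategy only, not the C++ backend: `xi_matrix_sketch` is a hypothetical helper that assumes a complete numeric matrix with at least two rows and keeps tied sorting values in input order.

```r
# Hypothetical helper: directed xi matrix with per-column work done once.
xi_matrix_sketch <- function(X) {
  X <- as.matrix(X)
  stopifnot(is.numeric(X), !anyNA(X), nrow(X) >= 2)
  n <- as.numeric(nrow(X))
  p <- ncol(X)
  # Computed once per column and reused for every directed pair.
  ord <- lapply(seq_len(p), function(j) order(X[, j]))
  r <- apply(X, 2, function(col) as.numeric(rank(col, ties.method = "max")))
  l <- apply(X, 2, function(col) n + 1 - as.numeric(rank(col, ties.method = "min")))
  denom <- 2 * colSums(l * (n - l))
  out <- matrix(NA_real_, p, p, dimnames = list(colnames(X), colnames(X)))
  for (j in seq_len(p)) {      # row: sorting/predictor variable
    for (k in seq_len(p)) {    # column: response/ranked variable
      if (denom[k] > 0) {
        # Reorder the precomputed response ranks by the row variable's order.
        out[j, k] <- 1 - n * sum(abs(diff(r[ord[[j]], k]))) / denom[k]
      }
    }
  }
  out
}

x <- seq(-1, 1, length.out = 300)
X <- cbind(x = x, square = x^2)
round(xi_matrix_sketch(X), 3)
```

The denominator depends only on the response column, so it is also computed once per column; each directed pair then needs only one pass over the reordered ranks.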
A directed numeric matrix where the (i, j)-th element is
Chatterjee's $\xi$ from the i-th numeric column to the
j-th numeric column. The dense matrix inherits from
c("corr_matrix", "chatterjee_xi", "corr_result", "matrix"). When
ci = TRUE, the object also carries a ci attribute with
elements est, lwr.ci, upr.ci, conf.level,
ci.method, se, m, and bootstrap_reps. When
pairwise-complete evaluation is used, pairwise sample sizes are stored in
attr(x, "diagnostics")$n_complete. In two-vector mode, a numeric
scalar is returned; when ci = TRUE, it carries confidence-interval
attributes.
Thiago de Paula Oliveira
Chatterjee, S. (2021). A New Coefficient of Correlation. Journal of the American Statistical Association, 116, 2009-2022.
Dette, H. and Kroll, M. (2025). A simple bootstrap for Chatterjee's rank correlation. Biometrika, 112(1), asae045. doi:10.1093/biomet/asae045.
Lin, Z. and Han, F. (2025). Limit theorems of Chatterjee's rank correlation. arXiv:2204.08031v4. doi:10.48550/arXiv.2204.08031.
Dalitz, C., Arning, J. and Goebbels, S. (2024). A Simple Bias Reduction for Chatterjee's Correlation.

    ## Example 1: independence versus functional dependence
    ## Chatterjee's xi targets whether Y is determined by X.
    set.seed(1)
    n <- 300
    x <- runif(n, -1, 1)
    y_independent <- rnorm(n)
    y_function <- sin(2 * pi * x)
    c(
      independent = xi_corr(x, y_independent, tie_method = "first"),
      functional = xi_corr(x, y_function, tie_method = "first")
    )

    ## Example 2: non-monotone functional dependence is directed
    ## x determines x^2, but x^2 does not determine the sign of x.
    ## The reverse raw finite-sample estimate can therefore be much smaller,
    ## and may be negative when sorted response ranks oscillate strongly.
    x <- seq(-1, 1, length.out = 300)
    y <- x^2
    c(
      xi_x_to_y = xi_corr(x, y, tie_method = "first"),
      xi_y_to_x = xi_corr(y, x, tie_method = "first")
    )

    ## Example 3: the raw finite-sample diagonal is not forced to one
    ## The optional upper-bound normalisation rescales this finite-sample ceiling.
    z <- 1:20
    c(
      raw = xi_corr(z, z, tie_method = "first"),
      upper_bound = xi_corr(
        z, z,
        tie_method = "first",
        bias_correction = "upper_bound"
      )
    )

    ## Example 4: directed matrix workflow
    X <- cbind(
      x = x,
      square = x^2,
      sine = sin(2 * pi * x),
      noise = rnorm(length(x))
    )
    xi <- xi_corr(X, tie_method = "first")
    print(xi, digits = 3)
    summary(xi)
    estimate(xi)
    tidy(xi)

    ## Example 5: Dette-Kroll bootstrap interval for a moderate sample
    xi_ci <- xi_corr(
      X[, c("x", "sine")],
      ci = TRUE,
      ci_method = "dette_kroll",
      bootstrap_reps = 49,
      seed = 1,
      tie_method = "first"
    )
    summary(xi_ci)
    ci(xi_ci)
    confint(xi_ci)
    plot(xi_ci)

    ## Example 6: n-choose-m interval for larger samples
    set.seed(2)
    x_large <- runif(1200, -1, 1)
    y_large <- sin(2 * pi * x_large) + rnorm(1200, sd = 0.2)
    xi_corr(
      x_large, y_large,
      ci = TRUE,
      ci_method = "n_choose_m",
      bootstrap_reps = 99,
      seed = 2,
      tie_method = "first"
    )