Title: | Quantification of Asymmetric Dependence |
---|---|
Description: | A copula-based measure for quantifying asymmetry in dependence and associations. Documentation and theory about 'qad' is provided by the paper by Junker, Griessenberger & Trutschnig (2021, <doi:10.1016/j.csda.2020.107058>), and the paper by Trutschnig (2011, <doi:10.1016/j.jmaa.2011.06.013>). |
Authors: | Thimo Kasper [aut, cre], Florian Griessenberger [aut], Robert R. Junker [aut], Valentin Petzel [aut], Wolfgang Trutschnig [aut] |
Maintainer: | Thimo Kasper <[email protected]> |
License: | GPL-2 |
Version: | 1.0.3 |
Built: | 2024-11-26 04:37:54 UTC |
Source: | https://github.com/griefl/qad |
A copula-based measure for quantifying asymmetry in dependence and associations.
Package: | qad |
Type: | Package |
Version: | 1.0.0 |
Date: | 2021-02-26 |
Florian Griessenberger: [email protected],
Robert R. Junker: [email protected],
Valentin Petztel,
Wolfgang Trutschnig: [email protected]
# #Create data set # n <- 100 # x <- rnorm(n,0,2) # y <- x^2 + rnorm(n) # sample <- data.frame(x,y) # plot(sample, pch = 16) # # # #Function: qad() # qad(sample, p.value = TRUE, p.value_asymmetry = TRUE) # fit <- qad(sample) # plot(fit) # plot(fit, copula = TRUE) # plot(fit, copula = TRUE, addSample = T) # # #Functions: summary(), coef() # summary(fit) # coef(fit) # # #Function: predict() # values <- c(-2.4,1,0,2.6) # predict.qad(fit, values = values, conditioned = 'x1') # predict(fit, values, conditioned = "x1", pred_plot = TRUE, panel.grid = FALSE) # # values <- c(0.1,0.5) # predict(fit, values, conditioned = "x2", copula = TRUE, pred_plot = TRUE) # # #Function: pairwise.qad and heatmap.qad # df <- iris[1:4] # mod <- pairwise.qad(df) # heatmap.qad(mod, select = 'dependence')
# #Create data set # n <- 100 # x <- rnorm(n,0,2) # y <- x^2 + rnorm(n) # sample <- data.frame(x,y) # plot(sample, pch = 16) # # # #Function: qad() # qad(sample, p.value = TRUE, p.value_asymmetry = TRUE) # fit <- qad(sample) # plot(fit) # plot(fit, copula = TRUE) # plot(fit, copula = TRUE, addSample = T) # # #Functions: summary(), coef() # summary(fit) # coef(fit) # # #Function: predict() # values <- c(-2.4,1,0,2.6) # predict.qad(fit, values = values, conditioned = 'x1') # predict(fit, values, conditioned = "x1", pred_plot = TRUE, panel.grid = FALSE) # # values <- c(0.1,0.5) # predict(fit, values, conditioned = "x2", copula = TRUE, pred_plot = TRUE) # # #Function: pairwise.qad and heatmap.qad # df <- iris[1:4] # mod <- pairwise.qad(df) # heatmap.qad(mod, select = 'dependence')
Computation of the D1 distance between two checkerboard copulas A and B, corresponding to the random vectors (X1,Y1) and (X2,Y2), respectively.
The function D1()
computes the difference between the dependence structures of two random vectors. The function D1.ECBC()
computes the D1-distance between two checkerboard copulas with the same resolution.
The function zeta1()
is defined as 3D1(A,Pi), where Pi denotes the independence copula and returns the dependence measure computed in qad.
D1(x1, y1, x2, y2, resolution = NULL) D1.ECBC(A, B) zeta1(X, Y, resolution = NULL)
D1(x1, y1, x2, y2, resolution = NULL) D1.ECBC(A, B) zeta1(X, Y, resolution = NULL)
x1 |
a (non-empty) numeric vector of data values for the first random vector (first coordinate) |
y1 |
a (non-empty) numeric vector of data values for the first random vector (second coordinate) |
x2 |
a (non-empty) numeric vector of data values for the second random vector (first coordinate) |
y2 |
a (non-empty) numeric vector of data values for the second random vector (second coordinate) |
resolution |
integer indicating the resolution of the checkerboard copula. (default = NULL) |
A |
Numeric matrix of dimension NxN indicating the mass of the first N-checkerboad copula |
B |
Numeric matrix of dimension NxN indicating the mass of the second N-checkerboad copula |
X |
Numeric vector of values in the first coordinate |
Y |
Numeric vector of values in the second coordinate |
D1()
returns the D1 distance, introduced in (Trutschnig, 2011).
D1.ECBC()
returns the D1-distance between to checkerboard copulas A and B with same resolution
zeta1()
returns the directed dependence from x to y.
Trutschnig, W. (2011). On a strong metric on the space of copulas and its induced dependence measure. Journal of Mathematical Analysis and Applications. 384 (2), 690-705.
Junker, R.R., Griessenberger, F. and Trutschnig, W. (2021). Estimating scale-invariant directed dependence of bivariate distributions. Computational Statistics and Data Analysis, 153, 107058.
n <- 100 x1 <- runif(n) y1 <- x1 x2 <- runif(n) y2 <- 1-x2 D1(x1,y1,x2,y2) n <- 1000 x <- runif(n, 0, 1) y1 <- ifelse(x < 0.5, runif(length(x < 0.5), 0,0.5), runif(length(x >= 0.5), 0.5, 1)) y2 <- ifelse(x > 0.5, runif(length(x < 0.5), 0,0.5), runif(length(x >= 0.5), 0.5, 1)) A <- ECBC(x,y1, resolution = 50) B <- ECBC(x,y2, resolution = 50) #plot_density(A) #plot_density(B) D1.ECBC(A,B)
n <- 100 x1 <- runif(n) y1 <- x1 x2 <- runif(n) y2 <- 1-x2 D1(x1,y1,x2,y2) n <- 1000 x <- runif(n, 0, 1) y1 <- ifelse(x < 0.5, runif(length(x < 0.5), 0,0.5), runif(length(x >= 0.5), 0.5, 1)) y2 <- ifelse(x > 0.5, runif(length(x < 0.5), 0,0.5), runif(length(x >= 0.5), 0.5, 1)) A <- ECBC(x,y1, resolution = 50) B <- ECBC(x,y2, resolution = 50) #plot_density(A) #plot_density(B) D1.ECBC(A,B)
The function ECBC()
computes the mass distribution of the
empirical (checkerboard) copula, given a bi-variate sample.
If resolution equals sample size, the bi-linearly extended empirical copula is returned.
Note, if there are ties in the sample an adjusted empirical copula is calculated. The function
ECBC.eval()
evaluates the checkerboard copula at given points.
ECBC(X, Y, resolution = NULL) ECBC.eval(CB, eval.points)
ECBC(X, Y, resolution = NULL) ECBC.eval(CB, eval.points)
X |
Numeric vector of values in the first coordinate |
Y |
Numeric vector of values in the second coordinate |
resolution |
Integer indicating the resolution of the checkerboard aggregation, i.e. the number of vertical/horizontal strips of the checkerboard copula. (default = NULL, sets resolution = floor(sqrt(sample size))) |
CB |
A numeric mass matrix of a checkerboard copula (ECBC) |
eval.points |
A numeric matrix or data.frame indicating the eval.points (x,y) |
If the observations are drawn from a continuous distribution (no ties in the sample),
the function ECBC()
returns the commonly used empirical checkerboard copula.
If there are ties in the sample, the empirical copula is adjusted and calculated in the following way:
Let (u_i,v_i) := (F_n(x_i),G_n(y_i)) be the pseudo-observations for i in {1,...,n} and (u_1',v_1'),..., (u_m',v_m') the distinct pairs of pseudo-observations with m leq n. Moreover set S_1:={0, u_1, ..., u_m_1} and S_2:={0, v_1,..., v_m_2} and define the quantities t_i,r_i and s_i for i=1,..., m by
where 1 defines the indicator function. Define the empirical subcopula A'_n: S_1 x S_2 to {0,1/n, ..., (n-1)/n,1} by
for all s_1 in S_1 and s_2 in S_2.
We extend the subcopula A'_n to a copula by defining the transformations w_i:[0,1]^2 to [u_i'-r_i/n,u_i'] x [v_i'-s_i/n,v_i'] by
and set the measure of the empirical copula mu_A_n^B := 1/n sum_i=1^m t_i mu_B^w_i, where B denotes the product copula.
ECBC()
returns a matrix with the mass distribution of the empirical
(checkerboard) copula.
Deheuvels, P. (1979). La fonction de dependance empirique et ses proprietas: un test non parametrique d'independance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274-292.
Li, X., Mikusinski, P. and Taylor, M.D. (1998). Strong approximation of copulas, Journal of Mathematical Analysis and Applications, 255, 608-623.
Genest, C., Neshlehova J.G. and Remillard, B. (2014). On the empirical multilinear copula process for count data. Bernoulli, 20 (3), 1344-1371.
Junker, R.R., Griessenberger, F. and Trutschnig, W. (2021). Estimating scale-invariant directed dependence of bivariate distributions. Computational Statistics and Data Analysis, 153, 107058.
##Generate data drawn from the product copula and compute the empirical (checkerboard) copula n <- 100 x <- runif(n, 0, 1) y <- runif(n, 0, 1) mass <- ECBC(x,y, resolution = 10) plot_density(mass) mass <- ECBC(x,y, resolution = n) plot_density(mass) ## Compute empirical checkerboard copula of a sample with ties and plot density n <- 100 x <- sample(runif(n, -1, 1), n, replace=TRUE) y <- x^2 + rnorm(n, 0, 1) mass <- ECBC(x,y) plot_density(mass)
##Generate data drawn from the product copula and compute the empirical (checkerboard) copula n <- 100 x <- runif(n, 0, 1) y <- runif(n, 0, 1) mass <- ECBC(x,y, resolution = 10) plot_density(mass) mass <- ECBC(x,y, resolution = n) plot_density(mass) ## Compute empirical checkerboard copula of a sample with ties and plot density n <- 100 x <- sample(runif(n, -1, 1), n, replace=TRUE) y <- x^2 + rnorm(n, 0, 1) mass <- ECBC(x,y) plot_density(mass)
The function emp_c_copula()
computes the mass distribution of the
empirical (checkerboard) copula, given a bivariate sample. emp_c_copula_eval()
evaluates the
the empirical (checkerboard) copula at given points.
If smoothing
= FALSE, the empirical copula is computed (if there are ties in the sample an adjusted empirical copula is computed),
otherwise the empirical checkerboard copula - a smoothed version of the empirical copula - is computed. For more information of the calculations, see details.
emp_c_copula(X, smoothing = TRUE, resolution) emp_c_copula_eval(X, u, smoothing = TRUE, resolution)
emp_c_copula(X, smoothing = TRUE, resolution) emp_c_copula_eval(X, u, smoothing = TRUE, resolution)
X |
a data frame with two columns containing the observations of the sample. Each row contains one observation. |
smoothing |
a logial indicating whether the checkerboard aggregation is computed (default = TRUE). |
resolution |
an integer indicating the resolution of the checkerboard aggregation, i.e. the number of vertical/horizontal strips of the checkerboard copula. |
u |
a data.frame with two columns containing the evaluation points. Each row consists of a x and y value. |
If the observations come from a distribution with continuous margins,
i.e. there are no ties in the sample, the function emp_c_copula()
gives the same result
as the function C.n()
in the copula
package.
If there are ties in the sample, the empirical copula is adjusted and calculated in the following way:
Let (u_i,v_i) := (F_n(x_i),G_n(y_i)) be the pseudo-observations for i in {1,...,n} and (u_1',v_1'),..., (u_m',v_m') the distinct pairs of pseudo-observations with m leq n. Moreover set S_1:={0, u_1, ..., u_m_1} and S_2:={0, v_1,..., v_m_2} and define the quantities t_i,r_i and s_i for i=1,..., m by
where 1 defines the indicator function. Define the empirical subcopula A'_n: S_1 x S_2 to {0,1/n, ..., (n-1)/n,1} by
for all s_1 in S_1 and s_2 in S_2.
We extend the subcopula A'_n to a copula by defining the transformations w_i:[0,1]^2 to [u_i'-r_i/n,u_i'] x [v_i'-s_i/n,v_i'] by
and set the measure of the empirical copula mu_A_n^B := 1/n sum_i=1^m t_i mu_B^w_i, where B denotes the product copula.
The checkerboard aggregation is computed as usual (see references).
emp_c_copula()
returns a matrix with the mass distribution of the empirical
(checkerboard) copula.
emp_c_copula_eval()
returns a vector of evaluations of the empirical
(checkerboard) copula.
The calculation of the empirical copula with a high sample size (and resolution rate) can take time.
Deheuvels, P. (1979). La fonction de dependance empirique et ses proprietas: un test non parametrique d'independance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274-292.
Li, X., Mikusinski, P. and Taylor, M.D. (1998). Strong approximation of copulas, Journal of Mathematical Analysis and Applications, 255, 608-623.
Genest, C., Neshlehova J.G. and Remillard, B. (2014). On the empirical multilinear copula process for count data. Bernoulli, 20 (3), 1344-1371.
The pairwise computed dependence measures (output of the function pairwise.qad()
) are illustrated by a heatmap.
heatmap.qad( pw_qad, select = c("dependence", "max.dependence", "asymmetry"), fontsize = 4, significance = FALSE, use_p.adjust = TRUE, sign.level = 0.05, scale = "abs", color = "plasma", white_font = 0.7, rb_values = c(10, 0.315, 0.15), title = "" )
heatmap.qad( pw_qad, select = c("dependence", "max.dependence", "asymmetry"), fontsize = 4, significance = FALSE, use_p.adjust = TRUE, sign.level = 0.05, scale = "abs", color = "plasma", white_font = 0.7, rb_values = c(10, 0.315, 0.15), title = "" )
pw_qad |
output of the function |
select |
a character indicating which dependence value is plotted. Options are c("dependence", "max.dependence", "asymmetry"). |
fontsize |
a numeric specifying the font size of the values. |
significance |
a logical indicating whether significant values with respect to the (adjusted) qad p.values are denoted by a star. |
use_p.adjust |
a logical indicating if the adjusted p.values are used (default = TRUE). |
sign.level |
numeric value indicating the significance level. |
scale |
character indicating whether the heatmap uses a relative or absolute scale. Options are 'rel' or 'abs' (default). |
color |
Select the color palette. Options are c("plasma" (default), "viridis", "inferno", "magma", "cividis", "rainbow"). |
white_font |
numeric between 0 and 1 denoting the start value for white text font (default = 0.7) |
rb_values |
a vector of size 3 with number of values, start value and end value in the rainbow colors space (if color = 'rainbow'). |
title |
The text for the title |
If the output of pairwise.qad
() contains p-values, significant values can be highlighted by stars by setting significance=TRUE.
a heatmap
n <- 100 x1 <- runif(n, 0, 1) x2 <- x1^2 + rnorm(n, 0, 0.1) x3 <- runif(n, 0, 1) x4 <- x3 - x2 + rnorm(n, 0, 0.1) sample_df <- data.frame(x1,x2,x3,x4) #Fit qad model <- pairwise.qad(sample_df, p.value = FALSE) heatmap.qad(model, select = "dependence", fontsize = 6)
n <- 100 x1 <- runif(n, 0, 1) x2 <- x1^2 + rnorm(n, 0, 0.1) x3 <- runif(n, 0, 1) x4 <- x3 - x2 + rnorm(n, 0, 0.1) sample_df <- data.frame(x1,x2,x3,x4) #Fit qad model <- pairwise.qad(sample_df, p.value = FALSE) heatmap.qad(model, select = "dependence", fontsize = 6)
Pairwise computation of the function qad
(). qad
() is applied on each pair of variables of a numeric data.frame.
pairwise.qad( data_df, remove.00 = FALSE, min.res = 3, p.value = TRUE, nperm = 1000, p.adjust.method = "fdr", p.value_asymmetry = FALSE, nboot = 1000 )
pairwise.qad( data_df, remove.00 = FALSE, min.res = 3, p.value = TRUE, nperm = 1000, p.adjust.method = "fdr", p.value_asymmetry = FALSE, nboot = 1000 )
data_df |
a data frame containing numeric columns with the observations of the sample. |
remove.00 |
a logical indicating whether double 0 entries should be excluded (default = FALSE) |
min.res |
an integer indicating the necessary minimum resolution of the checkerboard grid to compute qad, otherwise the result is NA (default = 3). |
p.value |
a logical indicating whether to return a p-value of rejecting independence (based on permutation). |
nperm |
an integer indicating the number of permutation runs. |
p.adjust.method |
a character string denoting the p.value correction method (see function p.adjust in stats). Options are c('holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr' (default), 'none') |
p.value_asymmetry |
a logical indicating whether a p-value (based on bootstrap) is computed for the measure of asymmetry. |
nboot |
an integer indicating the number of bootstrapping runs. |
a list, containing data.frames with the dependence measures, corresponding p.values, the resolution of the checkerboard aggregation and the number of removed double zero entries (only if remove.00 = TRUE).
The output of pairwise.qad() can be illustrated using the function heatmap.qad()
.
n <- 100 x1 <- runif(n, 0, 1) x2 <- x1^2 + rnorm(n, 0, 0.1) x3 <- runif(n, 0, 1) x4 <- x3 - x2 + rnorm(n, 0, 0.1) sample_df <- data.frame(x1,x2,x3,x4) #Fit qad model <- pairwise.qad(sample_df, p.value = TRUE, p.adjust.method = "fdr") heatmap.qad(model, select = "dependence", fontsize = 6)
n <- 100 x1 <- runif(n, 0, 1) x2 <- x1^2 + rnorm(n, 0, 0.1) x3 <- runif(n, 0, 1) x4 <- x3 - x2 + rnorm(n, 0, 0.1) sample_df <- data.frame(x1,x2,x3,x4) #Fit qad model <- pairwise.qad(sample_df, p.value = TRUE, p.adjust.method = "fdr") heatmap.qad(model, select = "dependence", fontsize = 6)
A dataset containing 140 plots with estimated year of deglaciation and plant Shannon diversity.
PlantDiversity
PlantDiversity
A data frame with 140 rows and 2 variables:
estimated year of deglaciation
Shannon diversity of plant ecology
The data set contains the estimated year of deglaciation and plant diversity of 140 plots along the successional gradient of the forefield of the Ödenwinkelkees glacier (Austria). The plots (1 to 135) were established within the glacier forefield, the plots (136 to 140) were established outside the glacier forefield (i.e., age greater than 170 years). Since qad is invariant w.r.t. transformations the estimated year of these 5 plots is set to 1800.
#' @source doi:10.5194/we-20-95-2020
Plots the density/mass of the empirical checkerboard copula.
plot_density( mass_matrix, density = TRUE, color = "plasma", rb_values = c(10, 0.315, 0.15) )
plot_density( mass_matrix, density = TRUE, color = "plasma", rb_values = c(10, 0.315, 0.15) )
mass_matrix |
a squared matrix containing the mass distribution, e.g. output of the function |
density |
a logical (TRUE = default) whether the density or the mass is plotted. |
color |
Select the color palette. Options are c("plasma" (default), "viridis", "inferno", "magma", "cividis"). |
rb_values |
a vector of size 3 with number of values, start value and end value in the rainbow colors space. |
a density plot (or mass distribution)
n <- 1000 x <- runif(n,0,1) y <- runif(n,0,1) plot(x,y,pch = 16) mass <- ECBC(x,y,resolution = 10) plot_density(mass, density=TRUE) plot_density(mass, density=FALSE)
n <- 1000 x <- runif(n,0,1) y <- runif(n,0,1) plot(x,y,pch = 16) mass <- ECBC(x,y,resolution = 10) plot_density(mass, density=TRUE) plot_density(mass, density=FALSE)
Visualizes the conditional probabilities for each strip of the checkerboard copula in the copula setting or in the retransformed sample setting.
## S3 method for class 'qad' plot( x, addSample = FALSE, copula = FALSE, density = FALSE, margins = FALSE, title = "", x.axis = "X1", y.axis = "X2", point.size = 0.9, panel.grid = TRUE, color = "plasma", rb_values = c(10, 0.315, 0.15), ... )
## S3 method for class 'qad' plot( x, addSample = FALSE, copula = FALSE, density = FALSE, margins = FALSE, title = "", x.axis = "X1", y.axis = "X2", point.size = 0.9, panel.grid = TRUE, color = "plasma", rb_values = c(10, 0.315, 0.15), ... )
x |
an object of class qad. |
addSample |
a logical indicating whether the observations are returned. In the copula setting the mass squares of the empirical copula are added too. (default = FALSE). |
copula |
a logical indicating whether the plot depicts the conditional probabilities of the empirical checkerboard copula or of the retransformed data setting (default = FALSE). |
density |
a logical indicating whether the density should be plotted instead of the conditional probabilites (default = FALSE). Only works in the copula setting, i.e. if copula = TRUE. |
margins |
a logical indicating whether the margin distribution is added in form of a rug plot. |
title |
The text for the title |
x.axis |
The text for the x-axis |
y.axis |
The text for the y-axis |
point.size |
a numeric specifying the point size of the sample (relevant if addSample = TRUE). |
panel.grid |
a logical indicating whether the panel grid is plotted. (default = TRUE) |
color |
a color palette of the viridis package or rainbow. options are c("viridis", "magma", "inferno", "plasma", "cividis", "rainbow") |
rb_values |
a vector of size 3 with number of values, start value and end value in the rainbow colors space. |
... |
some methods for this generic require additional arguments. None are used in this method. |
The conditional probabilities are constant at squares in the copula setting. If the squares are retransformed in the data setting, the resulting objects are rectangles.
## Example 1 n <- 100 x <- runif(n, 0, 1) y <- runif(n, 0, 1) sample <- data.frame(x, y) #qad fit <- qad(sample) plot(fit, addSample = TRUE, copula = FALSE) ## Example 2 n <- 100 x <- runif(n, -1, 1) y <- x^2 + rnorm(n, 0, 0.1) sample <- data.frame(x, y) #qad fit <- qad(sample) plot(fit, addSample = TRUE, copula = TRUE) plot(fit, addSample = TRUE, copula = FALSE)
## Example 1 n <- 100 x <- runif(n, 0, 1) y <- runif(n, 0, 1) sample <- data.frame(x, y) #qad fit <- qad(sample) plot(fit, addSample = TRUE, copula = FALSE) ## Example 2 n <- 100 x <- runif(n, -1, 1) y <- x^2 + rnorm(n, 0, 0.1) sample <- data.frame(x, y) #qad fit <- qad(sample) plot(fit, addSample = TRUE, copula = TRUE) plot(fit, addSample = TRUE, copula = FALSE)
The function predict.qad()
can be utilized to predict the probabilities of the event that Y lies in
specific intervals given X=x, or vice versa. Thereby, the intervals are calculated as retransformed intervals (by using the empirical quantil function) defining the checkerboard grid. Additionally, the mass of the conditional distribution functions are plotted.
The prediction can be computed in the sample setting as well as in the copula setting (pseudo-observation in the unit square).
## S3 method for class 'qad' predict( object, values, conditioned = "x1", nr_intervals = NULL, prediction_interval = NULL, copula = FALSE, pred_plot = FALSE, panel.grid = TRUE, ... )
## S3 method for class 'qad' predict( object, values, conditioned = "x1", nr_intervals = NULL, prediction_interval = NULL, copula = FALSE, pred_plot = FALSE, panel.grid = TRUE, ... )
object |
an object of class 'qad', which determines the underlying checkerboard aggregation. |
values |
a vector containing the x or the y values for which the conditional probabilities should be predicted. |
conditioned |
a character specifying on which variable is conditioned. Options are "x1" (default) or "x2". |
nr_intervals |
an integer, which determines a different number of intervals for the prediction (only possible in the copula setting). |
prediction_interval |
a vector specifying the interval boundaries for which the conditional probability is computed. Options are NULL (default) to predict the conditional probabilities for all intervals or a vector c(lower boundary, upper boundary) indicating the boundaries. |
copula |
a logical (default = FALSE) determining whether the empirical checkerboard copula is used or the retransformed data. |
pred_plot |
a logical indicating if the conditional probabilities are plotted. |
panel.grid |
a logical indicating whether the panel.grid is plotted. |
... |
some methods for this generic require additional arguments. None are used in this method. |
a list containing a data.frame with the computed intervals (lower and upper boundaries) and the prediction probabilities (i.e., the probability that Y lies in the interval Ii given X = x). Furthermore, a heatmap depicting the mass of the conditional distribution functions is returned.
Predictions are only possible for values within the range of the sample (or between 0 and 1 in the copula setting). For given values exceeding the range NA is returned.
set.seed(1) n <- 100 x <- runif(n, -4 ,4) y <- x^2 + rnorm(n, 0, 1) sample <- data.frame(x, y) ##(Not Run) qad.fit <- qad(sample) predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = TRUE) predict.qad(qad.fit, values = c(1,9), conditioned = "x2", pred_plot = TRUE) predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = FALSE, nr_intervals = 4) predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = FALSE, prediction_interval = c(4,6)) predict.qad(qad.fit, values = c(4,0.6), conditioned = "x2", pred_plot = FALSE, prediction_interval = c(2,3)) qad.pred <- predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = FALSE) qad.pred$prediction qad.pred$plot
set.seed(1) n <- 100 x <- runif(n, -4 ,4) y <- x^2 + rnorm(n, 0, 1) sample <- data.frame(x, y) ##(Not Run) qad.fit <- qad(sample) predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = TRUE) predict.qad(qad.fit, values = c(1,9), conditioned = "x2", pred_plot = TRUE) predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = FALSE, nr_intervals = 4) predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = FALSE, prediction_interval = c(4,6)) predict.qad(qad.fit, values = c(4,0.6), conditioned = "x2", pred_plot = FALSE, prediction_interval = c(2,3)) qad.pred <- predict.qad(qad.fit, values = c(-2,0.6), conditioned = "x1", pred_plot = FALSE) qad.pred$prediction qad.pred$plot
Quantification of (asymmetric and directed) dependence structures between two random variables X and Y.
qad(x, ...) ## S3 method for class 'data.frame' qad( x, resolution = NULL, p.value = TRUE, nperm = 1000, p.value_asymmetry = FALSE, nboot = 1000, print = TRUE, remove.00 = FALSE, ... ) ## S3 method for class 'numeric' qad( x, y, resolution = NULL, p.value = TRUE, nperm = 1000, p.value_asymmetry = FALSE, nboot = 1000, print = TRUE, remove.00 = FALSE, ... )
qad(x, ...) ## S3 method for class 'data.frame' qad( x, resolution = NULL, p.value = TRUE, nperm = 1000, p.value_asymmetry = FALSE, nboot = 1000, print = TRUE, remove.00 = FALSE, ... ) ## S3 method for class 'numeric' qad( x, y, resolution = NULL, p.value = TRUE, nperm = 1000, p.value_asymmetry = FALSE, nboot = 1000, print = TRUE, remove.00 = FALSE, ... )
x |
a data.frame containing two columns with the observations of the bi-variate sample or a (non-empty) numeric vector of data values |
... |
Further arguments passed to 'qad' will be ignored |
resolution |
an integer indicating the number of strips for the checkerboard aggregation (see ECBC). We recommend to use the default value (resolution = NULL) |
p.value |
a logical indicating whether to return a p-value of rejecting independence (based on permutation). |
nperm |
an integer indicating the number of permutation runs (if p.value = TRUE) |
p.value_asymmetry |
a logical indicating whether to return a (heuristic) p-value for the measure of asymmetry (based on bootstrap). |
nboot |
an integer indicating the number of runs for the bootstrap. |
print |
a logical indicating whether the result of qad is printed. |
remove.00 |
a logical indicating whether double 0 entries should be excluded (default = FALSE) |
y |
a (non-empty) numeric vector of data values. |
qad is the implementation of a strongly consistent estimator of the copula based dependence measure zeta_1 introduced in Trutschnig 2011. We first compute the empirical copula of a two-dimensional sample, aggregate it to the so called empirical checkerboard copula (ECBC), and calculate zeta_1 of the ECBC and its transpose. In order to test for independence (in both directions), a built-in p-value is implemented (a permutation test with nperm permutation runs to estimate the p-value). Furthermore, a (heuristic) bootstrap test with nboot runs can be applied to estimate a p-value for the measure of asymmetry a.
qad returns an object of class qad containing the following components:
data |
a data.frame containing the input data. |
q(X , Y)
|
influence of X on Y |
q(Y , X)
|
influence of Y on X |
max.dependence |
maximal dependence |
results |
a data.frame containing the results of the dependence measures. |
mass_matrix |
a matrix containing the mass distribution of the empirical checkerboard copula. |
resolution |
an integer containing the used resolution of the checkerboard aggregation. |
n |
Sample size. |
Trutschnig, W. (2011). On a strong metric on the space of copulas and its induced dependence measure, Journal of Mathematical Analysis and Applications 384, 690-705.
Junker, R., Griessenberger, F. and Trutschnig, W. (2021). Estimating scale-invariant directed dependence of bivariate distributions. Computational Statistics and Data Analysis, 153.
A tutorial can be found at http://www.trutschnig.net/software.html.
#Example 1 (independence) n <- 100 x <- runif(n,0,1) y <- runif(n,0,1) sample <- data.frame(x,y) qad(sample) ### #Example 2 (mutual complete dependence) n <- 500 x <- runif(n,0,1) y <- x^2 sample <- data.frame(x,y) qad(sample) #Example 3 (complete dependence) n <- 1000 x <- runif(n,-10,10) y <- sin(x) sample <- data.frame(x,y) qad(sample) #Example 4 (Asymmetry) n <- 100 x <- runif(n,0,1) y <- (2*x) %% 1 qad(x, y)
#Example 1 (independence) n <- 100 x <- runif(n,0,1) y <- runif(n,0,1) sample <- data.frame(x,y) qad(sample) ### #Example 2 (mutual complete dependence) n <- 500 x <- runif(n,0,1) y <- x^2 sample <- data.frame(x,y) qad(sample) #Example 3 (complete dependence) n <- 1000 x <- runif(n,-10,10) y <- sin(x) sample <- data.frame(x,y) qad(sample) #Example 4 (Asymmetry) n <- 100 x <- runif(n,0,1) y <- (2*x) %% 1 qad(x, y)
Distribution function - P_H0(qad <= q) - and quantile function for the qad distribution with regard to the null hypthesis (H0) stating independence between X and Y.
pqad(q, n, R = 1000, resolution = NULL) qqad(p, n, R = 1000, resolution = NULL)
pqad(q, n, R = 1000, resolution = NULL) qqad(p, n, R = 1000, resolution = NULL)
q |
vector of quantiles. |
n |
number of observations (or minimum of unique values, if ties occur). |
R |
number of repetitions (default R = 1000) |
resolution |
resolution of checkerboard copula (default = NULL) |
p |
vector of probabilities. |
The distribution of qad in the setting of independence, i.e., the random variables X and Y are independent. The distribution is calculated in the following way: Samples of size n are drawn from independent random variables. Then qad is calculated. The procedure is repeated R times. #'
pqad
gives the distribution function, i.e. P(qad <= q). qqad
gives the quantile function.
The length of the result is determined by the length of q or p, respectively.
pqad(0.3, 45) qqad(0.5, 30)
pqad(0.3, 45) qqad(0.5, 30)
Summary and coefficients of a qad
output. The function
summary()
prints the dependence measures, sample size and resolution of
the checkerboard copula and returns a list with the mentioned values.
The function coef()
returns a named vector with the selected values.
## S3 method for class 'qad' summary(object, ...) ## S3 method for class 'qad' coef( object, select = c("q(x1,x2)", "q(x2,x1)", "max.dependence", "asymmetry", "p.q(x1,x2)", "p.q(x2,x1)", "p.max.dependence", "p.asymmetry"), ... )
## S3 method for class 'qad' summary(object, ...) ## S3 method for class 'qad' coef( object, select = c("q(x1,x2)", "q(x2,x1)", "max.dependence", "asymmetry", "p.q(x1,x2)", "p.q(x2,x1)", "p.max.dependence", "p.asymmetry"), ... )
object |
an object of class 'qad' |
... |
some methods for this generic require additional arguments. None are used in this method. |
select |
a vector of strings indicating which dependence measure should be returned. Options are c('q(x1,x2)', 'q(x2,x1)', 'max.dependence', 'asymmetry') |
an object containing the calculated values of a qad
object.
n <- 100 x <- runif(n, 0, 1) y <- x^2 + rnorm(n, 0, 0.1) sample <- data.frame(x, y) ##(Not Run) fit <- qad(sample) summary(fit) coef(fit) coef(fit, select = c('q(x1,x2)','p.q(x1,x2)'))
n <- 100 x <- runif(n, 0, 1) y <- x^2 + rnorm(n, 0, 0.1) sample <- data.frame(x, y) ##(Not Run) fit <- qad(sample) summary(fit) coef(fit) coef(fit, select = c('q(x1,x2)','p.q(x1,x2)'))