Title: | Evidential Analysis of Genetic Association Data |
---|---|
Description: | Evidential regression analysis for dichotomous and quantitative outcome data. The following references described the methods in this package: Strug, L. J., Hodge, S. E., Chiang, T., Pal, D. K., Corey, P. N., & Rohde, C. (2010) <doi:10.1038/ejhg.2010.47>. Strug, L. J., & Hodge, S. E. (2006) <doi:10.1159/000094709>. Royall, R. (1997) <ISBN:0-412-04411-0>. |
Authors: | Dr. Lisa J Strug <[email protected]>; Dr. Zeynep Baskurt <[email protected]>; Bowei Xiao <[email protected]>; Ted Chiang <[email protected]> |
Maintainer: | Jiafen Gong <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.0 |
Built: | 2025-02-13 04:12:00 UTC |
Source: | https://github.com/cran/evian |
This is a helper function that adjusts the genotye coding scheme based on the genetic model specified.
adjustModel(data_nomiss,model)
adjustModel(data_nomiss,model)
data_nomiss |
a data frame that contains the phenotype, covariates and genotype columns. Genotypes need to be coded as 0/1/2. |
model |
The model specified. Must be a string from one of the following: |
adjustModel
is an interior function that adjusts the genotype coding based on the genetic model specified. The coding scheme for different genetic models can be found in calculateEvianMLE
.
This function returns the data frame with the same columns but changed genotype coding based on the genetic model specified.
This is the function that calculates profileLikelihood for a single SNP. The main function evian
calls this function repeatedly to obtain results for multiple SNPs.
calculateEvianMLE(snp, formula_tofit, model, data, bim, lolim, hilim, m, bse, k, robust, family, plinkCC)
calculateEvianMLE(snp, formula_tofit, model, data, bim, lolim, hilim, m, bse, k, robust, family, plinkCC)
snp |
a string specifying the SNP of interests to be calculated. |
formula_tofit |
a formula object of the genetic model. The model should be formatted as |
model |
a string specifying the mode of inheritance parameterization: |
data |
data frame; read from the argument |
bim |
data frame; read from from the argument |
lolim |
numeric; the lower limit for the grid or the minimum value of the regression parameter |
hilim |
numeric; the upper limit for the grid or the maximum value of the regression parameter |
m |
numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function. |
bse |
numeric; the number of beta standard errors to utilize in constraining the beta grid limits. Beta grid is evaluated at |
k |
numeric or numeric vector; The strength of evidence criterion k. Reads from the input of |
robust |
logical; if |
family |
the link function for |
plinkCC |
A boolean type that specifies how case/control are coded. case/control were coded 1/0 if it is FALSE, and were coded 2/1 if TRUE. |
calculateEvianMLE
calculates the profile likelihood for a single SNP. A proper grid range is first established for then the standardized profile likelihood is evaluated at each of the
m
cuts uniformly spread across the grid. Based on the standardized profile likelihood, the MLE for is computed as well as the likelihood intervals for each value of
k
provided.
For different genetic models, their coding schemes are shown as below:
Additive AA 0 AB 1 BB 2 Dominant AA 0 AB 1 BB 1 Recessive AA 0 AB 0 BB 1 Overdominance model A D AA 0 0 AB 1 1 BB 2 0
Specifically for the overdominance model, the column of interest is the D column.
This function outputs a list containg 4 elements that can be directly accessed using '$
' operator.
theta |
numeric vector; It stores all |
profile.lik.norm |
numeric vector; the corresponding |
k_cutoff |
numeric vector; It specifies which k-cutoff had been used in the calculation, ordered from the smallest k to the largest k. |
SummaryStats |
data frame; contains the summary statistics of the profile likelihood calculation. It contains the following columns:
|
When lolim
or hilim
are NOT defined, then the boundaries of the beta grid will be determined by the default bse=5
, or a bse
defined by the user. Otherwise, the user can define the exact beta grid boundaries using lolim
and hilim
.
In some cases the beta grid (using bse
or lolim
,hilim
) may need to be increased substantially (bse
as large as 15) if covariates are present in the formula. This is automatically dealt by the current function, but contributes to longer computation time to find the appropriate ranges. Estimation may become inaccurate with large number of correlated covariates, which is a known limitation of profile likelihoods.
Strug, L. J., Hodge, S. E., Chiang, T., Pal, D. K., Corey, P. N., & Rohde, C. (2010). A pure likelihood approach to the analysis of genetic association data: an alternative to Bayesian and frequentist analysis. Eur J Hum Genet, 18(8), 933-941. doi:10.1038/ejhg.2010.47
Strug, L. J., & Hodge, S. E. (2006). An alternative foundation for the planning and evaluation of linkage analysis. I. Decoupling "error probabilities" from "measures of evidence". Hum Hered, 61(3), 166-188. doi:10.1159/000094709
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. London, Chapman and Hall.
This is the function that calculates profileLikelihood for a single SNP. The main function evian
calls this function repeatedly to obtain results for multiple SNPs.
calculateGLR(snp, formula_tofit, model, data, bim, lolim, hilim, m, bse, family, c, plinkCC)
calculateGLR(snp, formula_tofit, model, data, bim, lolim, hilim, m, bse, family, c, plinkCC)
snp |
a string specifying the SNP of interests to be calculated. |
formula_tofit |
a formula object of the genetic model. The model should be formatted as |
model |
a string specifying the mode of inheritance parameterization: |
data |
data frame; read from the argument |
bim |
data frame; read from from the argument |
lolim |
numeric; the lower limit for the grid or the minimum value of the regression parameter |
hilim |
numeric; the upper limit for the grid or the maximum value of the regression parameter |
m |
numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function. |
bse |
numeric; the number of beta standard errors to utilize in constraining the beta grid limits. Beta grid is evaluated at |
family |
the link function for |
c |
numeric; interval of Null Hypothesis to be tested. |
plinkCC |
A boolean type that specifies how case/control are coded. case/control were coded 1/0 if it is FALSE, and were coded 2/1 if TRUE. |
calculateGLR
conducts a likelihood ratio test for testing the SNP of interest. It uses the same numerical approach as the main function calculateEvianMLE to construct the likelihood function and it is then testing whether the effect of the SNP falls in an interval (-c, c) instead of testing whether the effect is 0 as in the calculateEvianMLE.
This function outputs a dataframe that contains the summary statistics of the profile likelihood calculation. It contains the following columns:
GLR
: the estimated generalized Likelihood ratio, a value smaller than 1 indicating in favor of the null hypothesis whereas a value greater than 1 indicating in favor of the alternative hypothesis.
boundary
: the boundary where null hypothesis is defined. i.e. the value c in (-c, c)
AF
: allele frequency for the effective allele
SNP
: SNP ID
bp
: base pair position from the bim
input
effect
, ref
: the effective allele and the other allele from the bim
input
When lolim
or hilim
are NOT defined, then the boundaries of the beta grid will be determined by the default bse=5
, or a bse
defined by the user. Otherwise, the user can define the exact beta grid boundaries using lolim
and hilim
.
In some cases the beta grid (using bse
or lolim
,hilim
) may need to be increased substantially (bse
as large as 15) if covariates are present in the formula. This is automatically dealt by the current function, but contributes to longer computation time to find the appropriate ranges. Estimation may become inaccurate with large number of correlated covariates, which is a known limitation of profile likelihoods.
Dr. Lisa J Strug [email protected]
Bickel, D. R. (2012). “The strength of statistical evidence for composite hypotheses: Inference to the best explanation.” Statistica Sinica, 22, 1147-1198.
Zhang, Z., \& Zhang, B. (2013). “A likelihood paradigm forclinical trials. Journal of Statistical Theory and Practice”, 7, 157-177.
This function plots the density distribution for a single SNP calculated from the evian
functions.
densityPlot(dList, snpName, kcut = NULL, pl = 'linear', xlim = NULL, color = c('red','orange','green','blue'), round = 2,legend_cex=1)
densityPlot(dList, snpName, kcut = NULL, pl = 'linear', xlim = NULL, color = c('red','orange','green','blue'), round = 2,legend_cex=1)
dList |
a row-combined list, output from |
snpName |
a string specifying the SNP to be plotted. |
kcut |
numeric; the cut-off to be plotted. If |
pl |
a string specifying the y-axis for the plot. The y-axis will be plotted as 'Odds Ratio' if |
xlim |
graphical parameter used in |
color |
color of the likelihood interval lines from smallest to largest. For instance, |
round |
numeric; number of digits displayed on the plot. |
legend_cex |
numeric; control the size of the legend, default 1. |
This function takes output from evian
as input. It will plot the density of the estimated standardized profile likelihood for the SNP of interest. Some basic summary statistics will be included on the plot too.
Strug, L. J., Hodge, S. E., Chiang, T., Pal, D. K., Corey, P. N., & Rohde, C. (2010). A pure likelihood approach to the analysis of genetic association data: an alternative to Bayesian and frequentist analysis. Eur J Hum Genet, 18(8), 933-941. doi:10.1038/ejhg.2010.47
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. London, Chapman and Hall.
data(evian_linear_raw) data(evian_linear_bim) rst1=evian(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), robust=FALSE, model="additive", m=200, lolim=-0.4, hilim=0.4, kcutoff = c(32,100), multiThread=1,family='gaussian',plinkCC=FALSE) # Plot the density for rs912 densityPlot(dList=rst1,snpName='rs912')
data(evian_linear_raw) data(evian_linear_bim) rst1=evian(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), robust=FALSE, model="additive", m=200, lolim=-0.4, hilim=0.4, kcutoff = c(32,100), multiThread=1,family='gaussian',plinkCC=FALSE) # Plot the density for rs912 densityPlot(dList=rst1,snpName='rs912')
Calculates the likelihood intervals for genetic association in a genomic region of interest. Covariates can be accommodated.
evian(data, bim, xcols = NULL, ycol = NULL, covariateCol = NULL, formula = NULL, robust = FALSE, model='additive', m=200, bse = 5, lolim = NULL, hilim = NULL, kcutoff = c(8,32,100,1000), multiThread = 1, family='gaussian',plinkCC=F)
evian(data, bim, xcols = NULL, ycol = NULL, covariateCol = NULL, formula = NULL, robust = FALSE, model='additive', m=200, bse = 5, lolim = NULL, hilim = NULL, kcutoff = c(8,32,100,1000), multiThread = 1, family='gaussian',plinkCC=F)
data |
a data frame includes a column for the response variable, one or multiple columns of genotype data (coded as |
bim |
a data frame with six columns representing chromosome, SNP ID, physical distance, base pair position, effective allele, and reference allele. i.e. data from a file in PLINK binary format (bim). No header is assumed, but the ordering of the columns must follow the standard bim file format. |
ycol |
numeric; column index in the |
xcols |
numeric vector; the column range in the |
covariateCol |
numeric or numeric vector; optional argument specifying which columns represent covariates. If left as |
formula |
string; this is an alternative way of specifying model rather than using |
robust |
logical; default |
model |
a string that specifies the mode of inheritance parameterization: |
m |
numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function. |
bse |
numeric; the number of beta standard errors to utilize in constraining the beta grid limits. Beta grid is evaluated at |
lolim |
numeric; the lower limit for the grid or the minimum value of the regression parameter |
hilim |
numeric; the upper limit for the grid or the maximum value of the regression parameter |
kcutoff |
numeric or numeric vector; default = |
multiThread |
numeric; number of threads to use for parallel computing. |
family |
the link function for |
plinkCC |
A boolean type that specifies how case/control are coded. case/control were coded 1/0 if it is FALSE, and were coded 2/1 if TRUE. |
evian
is the main function called to calculate the 1/k
likelihood intervals for the additive, dominant, recessive, or overdominance genotypic models. This function calls calculateEvianMLE
in parallel to calculate the likelihood for each SNP. The calculation details can be found in calculateEvianMLE
.
The input for the data
and bim
arguments can be obtained from the PLINK files; data
is expected to follow PLINK format when run with the --recodeA
option and bim
can be obtained directly from a PLINK binary format file. Note if covariates are to be included, it is expected that the covariates are appended to the data
file with a header for each covariate.
The statistical model can be specified in two ways. Column index can be provided through the xcols
, ycol
, and covariateCol
arguments or through the formula
argument, which can accept a formula specified as the formula
argument in the R glm
function. We recommend using xcols
, ycol
, and covariateCol
arguments in most scenarios as this is relatively easier to input and it works for all the cases that we have considered so far. The alternative formula
argument is not able to detect non-rsID variants as parameters of interests, and is only suggested in the scenario where only a single variant is of interest and that its rsID is known in advance. Since the profileLikelihood can only accomendate scalar parameter and thus if multiple rsID variants are inputted through formula
option, it will only assume the first one to be parameter of interests.s
Parallel computing is avaliable through the use of the multiThread
argument. This parallelization uses the foreach
and doMC
packages and will typically reduce computation time significantly. Due to this dependency, parallelization is not available on Windows OS as foreach
and doMC
are not supported on Windows.
This function outputs the row-combined the results from calculateEvianMLE
for each of the SNPs included in the data/bim files. The exact output for each SNP can be found in the calculateEvianMLE
documentation.
When lolim/hilim
is NOT defined, then the boundaries of the beta grid will be determined by the default bse=5
, or a bse
defined by the user. Otherwise, the user can define the exact beta grid boundaries using lolim/hilim
.
In some cases the beta grid (using bse
or lolim/hilim
) may need to be increased substantially (bse
as large as 15) if covariates are present in the formula. This is automatically dealt by the current function, but contribute to longer computation time to find the appropriate ranges. Estimation may become inaccurate with large number of correlated covariates, which is a known limitation of profile likelihoods.
data(evian_linear_raw) data(evian_linear_bim) rst1=evian(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), robust=FALSE, model="additive", m=200, lolim=-0.4, hilim=0.4, kcutoff = c(32,100), multiThread=1,family='gaussian',plinkCC=FALSE)
data(evian_linear_raw) data(evian_linear_bim) rst1=evian(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), robust=FALSE, model="additive", m=200, lolim=-0.4, hilim=0.4, kcutoff = c(32,100), multiThread=1,family='gaussian',plinkCC=FALSE)
This dataset stored the corresponding SNP information from evian_binary_raw
.
data(evian_binary_bim)
data(evian_binary_bim)
This is the corresponding map file for evian_binary_raw
. Specifically it stores the chromosome, base pair, and two alleles for the 30 SNPs listed in evian_binary_raw
in the same order.
This dataset included the genotypic and phenotypic information of 250 individuals in proper formats. This dataset is used together with evian_binary_bim
to illustrate how to use the main function evian
.
data(evian_binary_raw)
data(evian_binary_raw)
This is an example dataset for evian
function. It contained 250 individuals, and for each of the individuals, their genotype at 30 SNPs and a binary outcome (PHENOTYPE; coded as 0/1) were stored. Three additional covariates (age, weight, city) were provided as well. Specifically, our function can incorporate with related individuals, some of these individuals in the dataset are correlated with others, and are specified through the FID
column. This is usually what a plink .raw file looks like except for the case/control status not coded as 1/2.
This dataset stored the corresponding SNP information from evian_linear_raw
.
data(evian_linear_bim)
data(evian_linear_bim)
This is the corresponding map file for evian_linear_raw
. Specifically it stores the chromosome, base pair, and two alleles for the 10 SNPs listed in evian_linear_raw
in the same order.
This dataset included the genotypic and phenotypic information of 781 individuals in proper formats. This dataset is used together with evian_linear_bim
to illustrate how to use the main function evian
.
data(evian_linear_raw)
data(evian_linear_raw)
This is an example dataset for evian
function. It contained 781 individuals, and for each of the individuals, their genotype at 10 SNPs and a continuous outcome (Y_norma) were stored. Three additional covariates (Fev, BMI_group, Age_group) were provided as well. Specifically, our function can incorporate with related individuals, some of these individuals in the dataset are correlated with others, and are specified through the FID
column. This is usually what a plink .raw file looks like.
This is an internal function that finds the proper boundary of the grid.
expandBound(data,bse,parameters,formula,m,k,family)
expandBound(data,bse,parameters,formula,m,k,family)
data |
a data frame inputted from the main function. |
bse |
numeric. The number of beta standard errors to utilize in constraining the beta grid limits. Passed down from argument |
parameters |
a numeric vector of length 3 providing the starting values for the search. This is obtained from the |
formula |
a formula specifying the response and possible covariates to keep in the output dataframe. This is directly obtained from |
k |
numeric vector. The strength of evidence criterion k. Passed down from argument |
m |
numeric. The density of the grid at which to compute the standardized likelihood function. Passed down from argument |
family |
a string representing the link function for |
Even though the initial grid bound calculated from getGridBound
works for most of the data, there can be cases where bse
needs to be increased in order to observe all the Likelihood Intervals (LIs) specified from the main function in the range kcutoff
calculated. In this case, our approach is to check whether the current grid range includes the largest LIs. The function will expand the grid range by increasing bse
by 1 if it is not included. This step will be running recursively until the largest LIs are included in the current grid range.
This function returns a numeric vector of length two representing the optimal lower and upper bounds for the grid on which the later functions will search for MLE.
This is an internal function that provides the range where the profileLikelihood function would search for MLE.
getGridBound(formula,data,bse,k,m,family,robust)
getGridBound(formula,data,bse,k,m,family,robust)
formula |
a formula specifying the response and possible covariates to keep in the output dataframe. This is directly obtained from |
data |
a data frame inputted from the output of |
bse |
numeric. The number of beta standard errors to utilize in constraining the beta grid limits. Passed down from argument |
k |
numeric vector. The strength of evidence criterion k. Passed down from argument |
m |
numeric. The density of the grid at which to compute the standardized likelihood function. Passed down from argument |
family |
a string representing the link function for |
robust |
A numeric value, robust correction factor. |
getGridBound
is an interior function that searches for the proper grid range that would be used to search for MLE. This is done through two steps: First, it finds a starting grid range by fitting a (generalized) linear model to obtain the estimate and s.e. of the beta. Then the starting grid range can be defined as mean +- bse*s.e. . In the case where robust correction is needed, the grid will be defined as mean +- bse*s.e./correction factor. Then the function determines an optimal grid range by using expandBound
function.
This function returns a numeric vector of length 2 that represents the lower and upper bounds of the grid for the MLE search.
Conducts a generalized likelihood ratio test testing whether (-
c
,c
). Covariates can be accommodated.
glr(data, bim, xcols = NULL, ycol = NULL, covariateCol = NULL, formula = NULL, c, robust = FALSE, model = 'additive', m = 200, bse = 5, lolim = NULL, hilim = NULL, multiThread = 1, family='gaussian',plinkCC=F)
glr(data, bim, xcols = NULL, ycol = NULL, covariateCol = NULL, formula = NULL, c, robust = FALSE, model = 'additive', m = 200, bse = 5, lolim = NULL, hilim = NULL, multiThread = 1, family='gaussian',plinkCC=F)
data |
a data frame includes a column for the response variable, one or multiple columns of genotype data (coded as |
bim |
a data frame with six columns representing chromosome, SNP ID, physical distance, base pair position, effective allele, and reference allele. i.e. data from a file in PLINK binary format (bim). No header is assumed, but the ordering of the columns must follow the standard bim file format. |
ycol |
numeric; column index in the |
xcols |
numeric vector; the column range in the |
covariateCol |
numeric or numeric vector; optional argument specifying which columns represent covariates. If left as |
formula |
string; this is an alternative way of specifying model rather than using |
c |
numeric; interval of Null Hypothesis to be tested. |
robust |
logical; default |
model |
a string that specifies the mode of inheritance parameterization: |
m |
numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function. |
bse |
numeric; the number of beta standard errors to utilize in constraining the beta grid limits. Beta grid is evaluated at |
lolim |
numeric; the lower limit for the grid or the minimum value of the regression parameter |
hilim |
numeric; the upper limit for the grid or the maximum value of the regression parameter |
multiThread |
numeric; number of threads to use for parallel computing. |
family |
the link function for |
plinkCC |
A boolean type that specifies how case/control are coded. case/control were coded 1/0 if it is FALSE, and were coded 2/1 if TRUE. |
This is a similar function to the main function evian
. Instead of testing H0: =0, it tests for whether H0:
(-
c
,c
).
This function outputs the row-combined the results from calculateGLR
for each of the SNPs included in the data/bim files. The exact output for each SNP can be found in the calculateGLR
documentation.
When lolim/hilim
is NOT defined, then the boundaries of the beta grid will be determined by the default bse=5
, or a bse
defined by the user. Otherwise, the user can define the exact beta grid boundaries using lolim/hilim
.
In some cases the beta grid (using bse
or lolim/hilim
) may need to be increased substantially (bse
as large as 15) if covariates are present in the formula. This is automatically dealt by the current function, but contribute to longer computation time to find the appropriate ranges. Estimation may become inaccurate with large number of correlated covariates, which is a known limitation of profile likelihoods.
data(evian_linear_raw) data(evian_linear_bim) rst2=glr(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), c=0.025,robust=F, model="additive", m=200, lolim=-0.6, hilim=0.6, multiThread=1,family='gaussian',plinkCC=FALSE)
data(evian_linear_raw) data(evian_linear_bim) rst2=glr(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), c=0.025,robust=F, model="additive", m=200, lolim=-0.6, hilim=0.6, multiThread=1,family='gaussian',plinkCC=FALSE)
This function plots the log10 GLR value in the region calculated from the glr
functions.
glr_plot(glr, snpName,kcut,col,legend_cex=1,...)
glr_plot(glr, snpName,kcut,col,legend_cex=1,...)
glr |
a dataframe, output from |
snpName |
a string specifying the SNP to be marked. |
kcut |
numeric; the cut-off to be plotted. Can have value <=1. |
col |
color of the likelihood interval lines from smallest to largest. For instance, |
legend_cex |
numeric; control the size of the legend, default 1. |
... |
Argument passed to the |
This function takes output from glr
as input. It will plot the GLR value on a log-scale and marked the SNPs of interests (defaultly the SNP with the maximum GLR values).
data(evian_linear_raw) data(evian_linear_bim) rst2=glr(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), c=0.025,robust=F, model="additive", m=200, lolim=-0.6, hilim=0.6, multiThread=1,family='gaussian',plinkCC=FALSE) # Plot the density for rs912 glr_plot(glr=rst2,ylim=c(-0.5,2))
data(evian_linear_raw) data(evian_linear_bim) rst2=glr(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), c=0.025,robust=F, model="additive", m=200, lolim=-0.6, hilim=0.6, multiThread=1,family='gaussian',plinkCC=FALSE) # Plot the density for rs912 glr_plot(glr=rst2,ylim=c(-0.5,2))
This function plots the likelihood intervals (LIs) for all SNPs calculated using evian
.
multiLine_plot(bpstart = 0, bpend = 1000000000, dList, title = NULL, showmaxlr = 3, kcut = NULL, pl = 'linear', ylim = c(-0.5,10), color = c('violet','green','red','blue'), markSNP = NULL, round = 2,legend_cex=1)
multiLine_plot(bpstart = 0, bpend = 1000000000, dList, title = NULL, showmaxlr = 3, kcut = NULL, pl = 'linear', ylim = c(-0.5,10), color = c('violet','green','red','blue'), markSNP = NULL, round = 2,legend_cex=1)
bpstart , bpend
|
numeric; indicating the range of base pairs to be plotted. From |
dList |
a row-combined list, output from |
title |
string; title of plot |
showmaxlr |
numeric; number of top SNPs to display on the graph. Default = 3. SNPs are chosen by their maximum likelihood ratio values. |
kcut |
numeric; the cut-off to be plotted. If |
pl |
a string specifying the y-axis for the plot. The y-axis will be plotted as 'Odds Ratio' if |
markSNP |
vector of strings; indicates which SNPs to be marked on the plot. By default it will mark all SNPs that are significant at the smallest cut-off. |
round |
numeric; number of digits displayed on the plot. |
ylim |
graphical parameter used in |
color |
color of the likelihood interval lines from smallest to largest. For instance, |
legend_cex |
numeric; control the size of the legend, default 1. |
This function takes output from evian
as input. It will plot the likelihood intervals for each of the SNPs analyzed. If 1/k interval is significant then it will be colored by the specified color and will remain grey if the interval is not significant.
Dr. Lisa J Strug [email protected]
Strug, L. J., Hodge, S. E., Chiang, T., Pal, D. K., Corey, P. N., & Rohde, C. (2010). A pure likelihood approach to the analysis of genetic association data: an alternative to Bayesian and frequentist analysis. Eur J Hum Genet, 18(8), 933-941. doi:10.1038/ejhg.2010.47
data(evian_linear_raw) data(evian_linear_bim) rst1=evian(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), robust=FALSE, model="additive", m=200, lolim=-0.4, hilim=0.4, kcutoff = c(32,100), multiThread=1,family='gaussian',plinkCC=FALSE) # Plot the LIs for all 3 SNPs multiLine_plot(dList=rst1)
data(evian_linear_raw) data(evian_linear_bim) rst1=evian(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw), ycol=6, covariateCol=c(5,7:9), robust=FALSE, model="additive", m=200, lolim=-0.4, hilim=0.4, kcutoff = c(32,100), multiThread=1,family='gaussian',plinkCC=FALSE) # Plot the LIs for all 3 SNPs multiLine_plot(dList=rst1)
The robust function computes an adjustment that is applied to the likelihood function to account for the cluster nature of the data.
robust_forCluster(formula, data, family)
robust_forCluster(formula, data, family)
formula |
a formula specifying the response and possible covariates to keep in the output data frame. This is directly obtained from the |
data |
data frame; from the output of |
family |
string; the link function for |
The robust function is called from within evian
functions. It computes a robust adjustment factor that is
applied to the likelihood function to account for the cluster nature of the data. The family ID column (FID) specifies the clusters. The robust adjustment factor is the ratio of the regular variance estimator of the maximum likelihood estimate (MLE) to the sandwich variance estimator of the MLE, where the ‘meat’ of the sandwich variance estimator is corrected for clustering in the data (Blume et.al, 2007).
If the data is not clustered (i.e. the observations are independent) then the adjustment factor can still be applied to make the working model robust to possible model misspecifications (Royall and Tsou, 2003).
A numeric constant.
Zeynep Baskurt [email protected]
Blume, J. D., Su, L., Remigio, M. O., & McGarvey, S. T. (2007). Statistical evidence for GLM regression parameters: A robust likelihood approach. Statistics in Medicine, 26, 2919-2936.
Royall, R. , Tsou, T. S. (2003). Interpreting statistical evidence by using imperfect models: robust adjusted likelihood functions. J Roy Stat Soc B; 65: 391-404.
This is an internal function that subsets the SNP column with matching name and removes rows with missing observations.
subsetData(snp,formula_tofit,data,plinkCC)
subsetData(snp,formula_tofit,data,plinkCC)
snp |
a string specifying the SNP of interests. The SNP ID must exist in |
formula_tofit |
a formula object of the genetic model. This is directly obtained from |
data |
a data frame inputted from the main function. Should contain the SNP ID |
plinkCC |
A boolean type that specifies how case/control are coded. case/control were coded 1/0 if it is FALSE, and were coded 2/1 if TRUE. |
subsetData
is an interior function that subsets the full dataset into a smaller set containing only one specific SNP by the snp
option. It will then remove any rows with missing values.
This function returned a dataframe containing phenotype, covariates in their original column names as in the full dataset, and a column called X
representing the genotype information for the SNP chosen. The column names are essetial.