COMETS Analytics supports all cohort-specific analyses of the COMETS consortium. This collaborative work is carried out through the activities of the COMETS harmonization group. For more information, see the [COMETS website](http://epi.grants.cancer.gov/comets/).
The required input file should be an Excel workbook with the following 5 sheets:

- Metabolites
- SubjectMetabolites
- SubjectData
- VarMap
- Models
An example input file, cometsInputAge.xlsx, is bundled with the package and is used in the examples below.
The first step of the analysis is to load the data with the readCOMETSinput() function. Its input is an Excel workbook laid out as described above.
# Retrieve the full path of the input data
dir <- system.file("extdata", package="COMETS", mustWork=TRUE)
xlsxfile <- file.path(dir, "cometsInputAge.xlsx")
# Read in and process the input data
exmetabdata <- COMETS::readCOMETSinput(xlsxfile)
## [1] "Metabolites sheet is read in"
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
## [1] "SubjectMetabolites sheet is read in"
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
## [1] "SubjectData sheet is read in"
## [1] "VarMap sheet is read in"
## [1] "Models sheet is read in"
## [1] "There are 14 categorical variables"
## [1] "Running Integrity Check..."
## character(0)
## Joining, by = "sample_id"
## Joining, by = "hmdb_id"
## [1] "Input data has passed QC (metabolite and sample names match in all input files)"
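The integrity check confirms that metabolite and sample names agree across the sheets. A minimal base-R sketch of that kind of check is shown below; the helper `check_integrity()`, the toy data frames, and the `metabid` column name are illustrative assumptions, not the internal COMETS data structures.

```r
# Hypothetical integrity check (not the COMETS implementation): do metabolite
# and sample IDs agree across the sheets of the input workbook?
# Assumed layout: the Metabolites sheet has a `metabid` column, and the
# SubjectMetabolites and SubjectData sheets share a `sample_id` column.
check_integrity <- function(metabolites, subject_metabolites, subject_data) {
  # Metabolite columns in the abundance sheet (all columns except sample_id)
  abundance_ids <- setdiff(names(subject_metabolites), "sample_id")
  missing_metab <- setdiff(abundance_ids, metabolites$metabid)
  missing_samples <- setdiff(subject_metabolites$sample_id,
                             subject_data$sample_id)
  problems <- character(0)
  if (length(missing_metab) > 0) {
    problems <- c(problems, paste("Not in Metabolites sheet:",
                                  paste(missing_metab, collapse = ", ")))
  }
  if (length(missing_samples) > 0) {
    problems <- c(problems, paste("Not in SubjectData sheet:",
                                  paste(missing_samples, collapse = ", ")))
  }
  problems  # character(0) when everything matches
}

# Toy data in which all names match
metabolites <- data.frame(metabid = c("m1", "m2"))
subject_metabolites <- data.frame(sample_id = c("s1", "s2", "s3"),
                                  m1 = c(1.2, 3.4, 2.2),
                                  m2 = c(0.5, 0.7, 0.9))
subject_data <- data.frame(sample_id = c("s1", "s2", "s3"),
                           age = c(45, 38, 52))
check_integrity(metabolites, subject_metabolites, subject_data)
```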
To plot the distribution of variances across metabolites:
## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.
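The quantity behind this plot can be illustrated with a short base-R sketch that computes the variance of each metabolite in a simulated subjects-by-metabolites matrix and plots their distribution. The simulated data and the log10 scale are assumptions for illustration; this is not the COMETS plotting code.

```r
# Simulated abundance matrix: rows = subjects, columns = metabolites,
# each metabolite drawn with a different spread (illustrative data only)
set.seed(1)
n_subjects <- 100
sds <- seq(0.2, 2, length.out = 20)
abund <- sapply(sds, function(s) rlnorm(n_subjects, meanlog = 0, sdlog = s))
colnames(abund) <- paste0("metab", seq_along(sds))

# Variance of each metabolite (column), ignoring missing values
metab_var <- apply(abund, 2, var, na.rm = TRUE)

# Distribution of the variances on a log10 scale
hist(log10(metab_var), main = "Distribution of metabolite variances",
     xlab = "log10(variance)")
```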
To plot the distribution of minimum/missing values:
## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.
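As a rough illustration of what "minimum/missing values" means here, the sketch below counts, for each metabolite, the values that are missing or equal to that metabolite's minimum (a common marker of imputed values below the detection limit). Treating the minimum as an imputed value is an assumption, and the helper `count_min_or_missing()` is hypothetical.

```r
# Hypothetical helper: per metabolite, count values that are missing or equal
# to that metabolite's minimum (assumed here to flag imputed values).
count_min_or_missing <- function(abund) {
  apply(abund, 2, function(x) {
    # min() of an all-NA column is Inf with a warning; silence it
    min_x <- suppressWarnings(min(x, na.rm = TRUE))
    sum(is.na(x) | (!is.na(x) & x == min_x))
  })
}

# Toy subjects-by-metabolites matrix
abund <- cbind(
  metab1 = c(0.1, 0.1, 0.5, 0.9, NA),
  metab2 = c(1.0, 2.0, 3.0, 4.0, 5.0),
  metab3 = c(NA, NA, NA, 2.0, 2.0)
)
n_min <- count_min_or_missing(abund)
n_min  # metab1: 3, metab2: 1, metab3: 5

# Fraction of subjects at the minimum or missing, per metabolite
barplot(n_min / nrow(abund), ylab = "Fraction at minimum or missing")
```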
There are two ways to specify models: batch or interactive. In batch mode, models are specified in the Models sheet of the input file. The model information is read in and processed by the function getModelData() so that the software knows which models to run. Its input is the data object returned by readCOMETSinput() in the previous step.
In interactive mode, models are specified as arguments to getModelData() rather than in the input file. Its input is again the data object from the previous step:
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec="Interactive",
colvars=c("age","bmi_grp"), where=c("age>40","bmi_grp>2"))
## [1] "Analysis will run on 'All metabolites'"
## [1] "Filtering subjects according to the rule(s)age>40. 279 of 1000are retained"
## [2] "Filtering subjects according to the rule(s)bmi_grp>2. 279 of 1000are retained"
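The `where` rules filter subjects before the analysis. The following base-R sketch applies rule strings of the same form to a toy data frame. Combining the rules with AND is an assumption, consistent with the log above where both rules report the same retained count, and the helper `apply_where()` is hypothetical. Because it evaluates the strings as R code, it should only be used with trusted input.

```r
# Hypothetical helper: keep subjects satisfying every rule string
# (rules combined with AND; NA results are treated as "drop")
apply_where <- function(subjects, where) {
  keep <- rep(TRUE, nrow(subjects))
  for (rule in where) {
    # Parse the rule and evaluate it with the data frame columns in scope
    cond <- eval(str2lang(rule), envir = subjects)
    keep <- keep & !is.na(cond) & cond
    message(sprintf("Rule %s: %d of %d subjects retained",
                    rule, sum(keep), nrow(subjects)))
  }
  subjects[keep, , drop = FALSE]
}

subjects <- data.frame(id = 1:6,
                       age = c(35, 42, 58, 61, 39, 47),
                       bmi_grp = c(1, 3, 4, 2, 4, 3))
filtered <- apply_where(subjects, c("age>40", "bmi_grp>2"))
filtered$id  # 2 3 6
```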
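The unstratified correlation analysis is run by calling the function runCorr(). This function runs the model(s) contained in the model data returned by getModelData(), whether they were defined in the Models tab of the input file (batch mode) or passed as arguments (interactive mode).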
## [1] "No factors found, only performing near zero variance check for all covariates."
## [1] "Removed 1exposure(s):NA because of zero-variance"
## [1] "running unadjusted"
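Conceptually, the unadjusted analysis drops variables with zero variance and then correlates each outcome with each exposure, reporting the correlation, sample size and p-value. The sketch below does this in base R; Spearman rank correlation via `cor.test()` and pairwise complete cases are assumptions, and `unadjusted_corr()` is a hypothetical helper rather than the runCorr() implementation.

```r
# Minimal sketch (not the runCorr() implementation): unadjusted correlations
# between outcomes and exposures after a zero-variance check.
# Assumptions: Spearman rank correlation, pairwise complete cases.
has_variance <- function(x) {
  sum(!is.na(x)) > 1 && var(x, na.rm = TRUE) > 0
}

unadjusted_corr <- function(data, outcomes, exposures) {
  # Drop outcomes/exposures with zero variance before correlating
  keep <- function(v) has_variance(data[[v]])
  outcomes <- Filter(keep, outcomes)
  exposures <- Filter(keep, exposures)
  rows <- list()
  for (o in outcomes) {
    for (e in exposures) {
      ok <- complete.cases(data[[o]], data[[e]])
      # suppressWarnings() silences the "exact p-value with ties" warning
      test <- suppressWarnings(
        cor.test(data[[o]][ok], data[[e]][ok], method = "spearman"))
      rows[[length(rows) + 1]] <- data.frame(
        outcomespec = o, exposurespec = e,
        corr = unname(test$estimate), n = sum(ok), pvalue = test$p.value)
    }
  }
  do.call(rbind, rows)
}

# Simulated example; `const` has zero variance and is dropped
set.seed(42)
dat <- data.frame(age = rnorm(50, 50, 10), bmi = rnorm(50, 27, 4),
                  const = rep(1, 50))
dat$metab1 <- 0.1 * dat$age + rnorm(50)
dat$metab2 <- rnorm(50)
res <- unadjusted_corr(dat, outcomes = c("metab1", "metab2"),
                       exposures = c("age", "bmi", "const"))
res
```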
The output of the correlation analysis can then be compiled and output to a CSV file with the following function:
To view the first 3 lines of the correlation analysis output, simply type:
## cohort spec model outcomespec exposurespec
## 1 DPP Interactive _1_2_3_benzenetriol_sulfate_2 age
## 2 DPP Interactive _1_2_dipalmitoylglycerol age
## 3 DPP Interactive _1_2_propanediol age
## corr n pvalue adjspec adjvars outcome_uid
## 1 0.164624501 279 0.005506392 None None CHEM100006374
## 2 0.068903188 279 0.247951327 None None HMDB07098
## 3 0.001667259 279 0.977723441 None None HMDB01881
## outcome exposure_uid exposure adj_uid adj
## 1 1,2,3-benzenetriol sulfate (2) age Age at Entry None None
## 2 DG(16:0/16:0/0:0) age Age at Entry None None
## 3 Propylene glycol age Age at Entry None None
To display the heatmap of the resulting correlation matrix, use the showheatmap function.
To display the hierarchical clustering of the resulting correlation matrix, use the showHClust function. This display requires at least 2 rows and 2 columns in the correlation matrix.
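Both displays start from an outcome-by-exposure matrix of correlations. The base-R sketch below reshapes long-format results (one row per outcome/exposure pair, with toy values) into such a matrix, clusters the rows with `hclust()`, and draws a clustered heatmap with `stats::heatmap()`, which itself requires at least 2 rows and 2 columns. This is an illustrative alternative, not how showheatmap() or showHClust() are implemented.

```r
# Toy long-format results: one row per outcome/exposure pair
results <- data.frame(
  outcome  = rep(c("m1", "m2", "m3"), each = 2),
  exposure = rep(c("age", "bmi"), times = 3),
  corr     = c(0.40, 0.10, 0.35, 0.05, -0.20, 0.30)
)

# Reshape to an outcome-by-exposure matrix (one value per cell)
corr_mat <- tapply(results$corr, list(results$outcome, results$exposure), mean)

# Clustering and heatmap both need at least 2 rows and 2 columns
stopifnot(nrow(corr_mat) >= 2, ncol(corr_mat) >= 2)

# Hierarchical clustering of outcomes by their correlation profiles
row_clust <- hclust(dist(corr_mat), method = "average")
plot(row_clust, main = "Outcomes clustered by correlation profile")

# Heatmap; stats::heatmap() reorders rows and columns by clustering
heatmap(corr_mat, scale = "none", col = hcl.colors(25, "Blue-Red"))
```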
## [1] "Analysis will run on 'All metabolites'"
## [1] "No factors found, only performing near zero variance check for all covariates."
## [1] "Removed 1 outcome(s): bradykinin because of zero-variance"
## [1] "running unadjusted"
Results can be written to an output CSV file with the following command:
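Outside the package, a results data frame can be written with base R's `write.csv()`. The sketch below is generic; the helper `write_results_csv()`, its file-naming pattern, and writing to `tempdir()` are hypothetical choices, not the behavior of the COMETS output function.

```r
# Hypothetical helper: write a results data frame to a CSV file whose name
# includes a prefix, the cohort label, and today's date
write_results_csv <- function(results, cohort, prefix = "corr",
                              dir = tempdir()) {
  path <- file.path(dir, paste0(prefix, "_", cohort, "_", Sys.Date(), ".csv"))
  write.csv(results, path, row.names = FALSE)
  path  # return the file path so it can be reported or re-read
}

res <- data.frame(outcomespec = "metab1", exposurespec = "age",
                  corr = 0.16, n = 279, pvalue = 0.0055)
out <- write_results_csv(res, cohort = "DPP")
read.csv(out)
```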
The stratified correlation analysis is also run with runCorr(), using model data that names a stratification variable, either in the input data (Models tab) or in interactive mode. In this example, exmodeldata2 is created with strvars="race_grp", so the correlations are computed separately within each level of race_grp. Only one stratification variable can be used at a time.
exmodeldata2 <- COMETS::getModelData(exmetabdata,modelspec="Interactive",rowvars=c("lactose","lactate"),
colvars=c("age","bmi_grp"),strvars="race_grp")
excorrdata2 <- COMETS::runCorr(exmodeldata2,exmetabdata,"DPP")
## [1] "Running analysis on subjects stratified by race_grp 0"
## [1] 912 4
## [1] "No factors found, only performing near zero variance check for all covariates."
## [1] "running unadjusted"
## [1] "Running analysis on subjects stratified by race_grp 1"
## [1] 87 4
## [1] "No factors found, only performing near zero variance check for all covariates."
## [1] "running unadjusted"
## [1] "Running analysis on subjects stratified by race_grp 2"
## [1] 1 4
## Warning in COMETS::runCorr(exmodeldata2, exmetabdata, "DPP"): Warning: strata
## (race_grp=2 could not be run because model check failed
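The log above shows each stratum being analyzed separately and a stratum with a single subject being skipped. A base-R sketch of that pattern follows; the helper `strat_corr()`, its output columns, and the `min_n` threshold are hypothetical, and Spearman correlation is an assumption.

```r
# Hypothetical helper: run the same correlation within each level of one
# stratification variable, skipping strata with fewer than min_n subjects
strat_corr <- function(data, outcome, exposure, strvar, min_n = 10) {
  out <- list()
  # sort() drops NA, so subjects with a missing stratum are not analyzed
  for (level in sort(unique(data[[strvar]]))) {
    sub <- data[which(data[[strvar]] == level), , drop = FALSE]
    if (nrow(sub) < min_n) {
      warning(sprintf("stratum %s=%s skipped: only %d subject(s)",
                      strvar, level, nrow(sub)))
      next
    }
    test <- suppressWarnings(cor.test(sub[[outcome]], sub[[exposure]],
                                      method = "spearman"))
    out[[length(out) + 1]] <- data.frame(
      stratavar = strvar, strata = level, corr = unname(test$estimate),
      n = nrow(sub), pvalue = test$p.value)
  }
  do.call(rbind, out)
}

# Simulated data: stratum 2 has a single subject and is skipped with a warning
set.seed(7)
dat <- data.frame(race_grp = c(rep(0, 60), rep(1, 30), 2),
                  age = rnorm(91, 50, 10))
dat$lactate <- 0.05 * dat$age + rnorm(91)
res <- strat_corr(dat, "lactate", "age", "race_grp")
res
```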
All models designated in the input file can be run with one command. By default, an individual CSV file of correlation results is written to the current directory for each model, and the function returns a list of the resulting data frames.
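The pattern of running every model and returning a named list of data frames can be sketched with `lapply()`. The model specification format and the helpers `run_one()` and `run_all()` below are hypothetical, and writing to `tempdir()` (rather than the current directory) keeps the example tidy.

```r
# Hypothetical runner: apply run_one() to every model spec, keep the results
# as a named list, and optionally write one CSV per model
run_all <- function(models, run_one, write_dir = NULL) {
  results <- lapply(models, run_one)  # lapply() preserves the model names
  if (!is.null(write_dir)) {
    for (nm in names(results)) {
      write.csv(results[[nm]], file.path(write_dir, paste0(nm, ".csv")),
                row.names = FALSE)
    }
  }
  results
}

dat <- data.frame(age = c(40, 50, 60, 70, 80),
                  bmi = c(22, 30, 25, 28, 24),
                  metab1 = c(1.1, 1.9, 3.2, 3.8, 5.1))
models <- list(
  age_model = list(exposure = "age", outcome = "metab1"),
  bmi_model = list(exposure = "bmi", outcome = "metab1")
)
# One model = one unadjusted Spearman correlation (illustrative only)
run_one <- function(m) {
  data.frame(outcome = m$outcome, exposure = m$exposure,
             corr = cor(dat[[m$outcome]], dat[[m$exposure]],
                        method = "spearman"),
             n = nrow(dat))
}
all_results <- run_all(models, run_one, write_dir = tempdir())
names(all_results)
```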
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 tidyr_1.1.2 jsonlite_1.7.2
## [4] viridisLite_0.3.0 splines_4.0.4 foreach_1.5.1
## [7] tmvnsim_1.0-2 prodlim_2019.11.13 Formula_1.2-4
## [10] assertthat_0.2.1 stats4_4.0.4 latticeExtra_0.6-29
## [13] cellranger_1.1.0 yaml_2.2.1 ipred_0.9-10
## [16] pillar_1.5.1 backports_1.2.1 lattice_0.20-41
## [19] glue_1.4.2 pROC_1.17.0.1 digest_0.6.27
## [22] RColorBrewer_1.1-2 checkmate_2.0.0 colorspace_2.0-0
## [25] recipes_0.1.15 htmltools_0.5.1.1 Matrix_1.3-2
## [28] plyr_1.8.6 psych_2.0.12 timeDate_3043.102
## [31] pkgconfig_2.0.3 caret_6.0-86 purrr_0.3.4
## [34] scales_1.1.1 jpeg_0.1-8.1 gower_0.2.2
## [37] lava_1.6.8.1 tibble_3.1.0 htmlTable_2.1.0
## [40] farver_2.0.3 generics_0.1.0 ggplot2_3.3.3
## [43] ellipsis_0.3.1 withr_2.4.1 nnet_7.3-15
## [46] lazyeval_0.2.2 mnormt_2.0.2 readxl_1.3.1
## [49] survival_3.2-7 magrittr_2.0.1 crayon_1.4.1
## [52] evaluate_0.14 fansi_0.4.2 nlme_3.1-152
## [55] MASS_7.3-53 foreign_0.8-81 class_7.3-18
## [58] tools_4.0.4 data.table_1.14.0 lifecycle_1.0.0
## [61] stringr_1.4.0 plotly_4.9.3 munsell_0.5.0
## [64] cluster_2.1.0 compiler_4.0.4 rlang_0.4.10
## [67] grid_4.0.4 iterators_1.0.13 rstudioapi_0.13
## [70] htmlwidgets_1.5.3 crosstalk_1.1.1 base64enc_0.1-3
## [73] rmarkdown_2.6 gtable_0.3.0 ModelMetrics_1.2.2.2
## [76] codetools_0.2-18 d3heatmap_0.6.1.3 reshape2_1.4.4
## [79] R6_2.5.0 gridExtra_2.3 lubridate_1.7.9.2
## [82] knitr_1.31 dplyr_0.8.5 COMETS_1.4.0.0
## [85] utf8_1.2.1 Hmisc_4.5-0 stringi_1.5.3
## [88] parallel_4.0.4 Rcpp_1.0.6 vctrs_0.3.6
## [91] rpart_4.1-15 png_0.1-7 tidyselect_1.1.0
## [94] xfun_0.20