Running COMETS Analytics locally

Ewy Mathé, Ella Temprosa

2021-03-23

Introduction

COMETS Analytics supports all cohort-specific analyses of the COMETS consortium. This collaborative work is done through the COMETS harmonization group activities. For more information, see the [COMETS website](http://epi.grants.cancer.gov/comets/).

Data Input Format

The required input file should be in Excel format with the following 5 sheets:

  1. Metabolites - from harmonized metabolites output
  2. SubjectMetabolites - abundances in columns and subject in rows
  3. SubjectData - other exposure and adjustment variables
  4. VarMap - maps the variables needed to conduct the cohort-specific analysis. Specify the cohort's variable names in the CohortVariable column. If a variable has the same name in the cohort as in VarReference, leave CohortVariable blank
  5. Models - models used to conduct the COMETS analysis. Outcomes, exposures and adjustments can each list multiple covariates delimited by spaces (e.g., age bmi).
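
Two of the conventions above can be illustrated in a few lines of base R. This is only a sketch of the idea, not the package's own parsing code: the toy `varmap` table, the `resolved` column, and the `adjustment` string are made up for illustration, and only the column names VarReference and CohortVariable come from the description above.

```r
# Toy VarMap: CohortVariable is left blank when the cohort already uses
# the reference name (hypothetical values, for illustration only).
varmap <- data.frame(
  VarReference   = c("age", "bmi", "id"),
  CohortVariable = c("AGE_ENTRY", "", NA),
  stringsAsFactors = FALSE
)

# A blank (or missing) CohortVariable falls back to the VarReference name.
blank <- is.na(varmap$CohortVariable) | trimws(varmap$CohortVariable) == ""
varmap$resolved <- ifelse(blank, varmap$VarReference, varmap$CohortVariable)

# Space-delimited covariates, as they might appear in one cell of the Models sheet.
adjustment <- "age bmi"
adj_vars <- strsplit(trimws(adjustment), "\\s+")[[1]]
```

Here `varmap$resolved` holds the name to look up in the cohort data for each reference variable, and `adj_vars` is a character vector with one element per covariate.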

An example input file is available HERE.

Analysis Workflow for correlation analysis

1. Load Data

The first step of the analysis is to load the data with the readCOMETSinput() function. The input to this function is an Excel spreadsheet in the format described above.
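
Before loading, it can help to confirm that the workbook contains the five sheets listed under Data Input Format. The helper below is a hypothetical convenience function, not part of COMETS; it only compares sheet names and does not read any data.

```r
# Sheet names required by the input format described above.
required_sheets <- c("Metabolites", "SubjectMetabolites", "SubjectData",
                     "VarMap", "Models")

# Hypothetical helper: return the required sheets that are absent.
missing_sheets <- function(sheet_names) {
  setdiff(required_sheets, sheet_names)
}

# Example with a workbook that lacks the Models sheet.
found <- c("Metabolites", "SubjectMetabolites", "SubjectData", "VarMap")
missing_sheets(found)  # returns "Models"
```

With a real file, the sheet names could be obtained with `readxl::excel_sheets()` on the workbook path and passed to `missing_sheets()`.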

## [1] "Metabolites sheet is read in"
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
## [1] "SubjectMetabolites sheet is read in"
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
## [1] "SubjectData sheet is read in"
## [1] "VarMap sheet is read in"
## [1] "Models sheet is read in"
## [1] "There are 14 categorical variables"
## [1] "Running Integrity Check..."
## character(0)
## Joining, by = "sample_id"
## Joining, by = "hmdb_id"
## [1] "Input data has passed QC (metabolite and sample names match in all input files)"

To plot the distribution of variances for each metabolite:
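
As a rough base-R illustration of the quantity being plotted (not the package's own plotting function), the sketch below computes one variance per metabolite and draws a histogram. The `abund` table is made up, and whether the package transforms abundances before computing variances is not stated here, so no transform is applied.

```r
# Toy abundance table: subjects in rows, metabolites in columns
# (made-up values, not COMETS data).
abund <- data.frame(
  met_a = c(1.0, 2.0, 3.0, 4.0),
  met_b = c(5.0, 5.0, 5.0, 5.0),
  met_c = c(2.0, 4.0, 6.0, 8.0)
)

# Variance of each metabolite across subjects.
met_var <- vapply(abund, var, numeric(1))

# Histogram of the per-metabolite variances (drawn only in interactive sessions).
if (interactive()) {
  hist(met_var, main = "Distribution of metabolite variances",
       xlab = "Variance")
}
```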

## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.

To plot the distribution of minimum/missing values:
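
One plausible reading of "minimum/missing values" is the number of subjects, per metabolite, whose abundance is either missing or equal to the smallest observed value. The sketch below counts exactly that on a made-up table; the function name `count_min_or_missing` is hypothetical and the package's own definition may differ.

```r
# Toy abundance table with a repeated minimum and one missing value.
abund <- data.frame(
  met_a = c(0.5, 0.5, 3.0, NA),
  met_b = c(1.0, 2.0, 3.0, 4.0)
)

# Count, per metabolite, the subjects whose value is missing or equal to
# the smallest observed value for that metabolite.
count_min_or_missing <- function(x) {
  sum(is.na(x) | x == min(x, na.rm = TRUE), na.rm = TRUE)
}
n_min_missing <- vapply(abund, count_min_or_missing, integer(1))
```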

## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.

2. Get Model Data

There are two ways to specify a model: Batch or Interactive. In Batch mode, models are specified in your input file. The model information needs to be read in with the function getModelData() and processed so the software knows which models to run. The input to this function is the data loaded in the previous step:

In Interactive mode, models are specified as parameters to getModelData() rather than in the input file. As in Batch mode, the function takes the data loaded in the previous step as input and processes the model information so the software knows which models to run:
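
The log output for this step reports subject filters written as short rule strings such as age>40 and bmi_grp>2. The sketch below shows one way such rules could be applied to a subject table in base R. It is an illustration only: the `subjects` table is made up, and combining the rules with a logical AND is an assumption rather than documented package behavior.

```r
# Toy subject table (made-up values).
subjects <- data.frame(
  sample_id = 1:6,
  age       = c(35, 42, 50, 61, 38, 45),
  bmi_grp   = c(3, 1, 3, 4, 4, 2)
)

# Filter rules written as strings, in the same form as the log output.
rules <- c("age>40", "bmi_grp>2")

# Evaluate each rule against the table and keep subjects that satisfy all of them.
keep <- Reduce(`&`, lapply(rules, function(r) {
  eval(parse(text = r), envir = subjects)
}))
filtered <- subjects[keep, ]
```

Evaluating rule strings with `eval(parse())` is convenient for a sketch, but it executes arbitrary text, so rules should only come from a trusted source.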

## [1] "Analysis will run on 'All metabolites'"
## [1] "Filtering subjects according to the rule(s)age>40. 279 of 1000are retained"   
## [2] "Filtering subjects according to the rule(s)bmi_grp>2. 279 of 1000are retained"

3. Run Simple Correlation Analysis

The unstratified correlation analysis is run by calling the function runCorr(). This function runs the model or models defined in the input data (Models tab).
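
To make the shape of an unadjusted correlation result concrete, the base-R sketch below correlates one exposure with each metabolite and collects the correlation, sample size, and p-value, mirroring the corr, n, and pvalue columns in the output shown later. The function `simple_corr` and the data are hypothetical, and the correlation method is an assumption because it is not stated in the text.

```r
# Toy data: one exposure and two metabolites (made-up values).
dat <- data.frame(
  age   = c(41, 45, 52, 58, 63, 70),
  met_a = c(1.1, 1.4, 1.3, 1.9, 2.2, 2.0),
  met_b = c(3.0, 2.1, 3.5, 2.8, 3.1, 2.6)
)

# Correlate one exposure with each outcome and collect corr, n, and p-value.
# The default method is an assumption, not taken from the package.
simple_corr <- function(data, outcomes, exposure, method = "pearson") {
  rows <- lapply(outcomes, function(out) {
    ok <- complete.cases(data[, c(out, exposure)])
    ct <- cor.test(data[[out]][ok], data[[exposure]][ok], method = method)
    data.frame(outcome = out, exposure = exposure,
               corr = unname(ct$estimate), n = sum(ok), pvalue = ct$p.value,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res <- simple_corr(dat, c("met_a", "met_b"), "age")
```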

## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "Removed 1exposure(s):NA because of zero-variance"
## [1] "running unadjusted"

The output of the correlation analysis can then be compiled and output to a CSV file with the following function:
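
For readers who want to see the general pattern, the sketch below writes a small results table to CSV with base R and reads it back. It is not the package's output function; the table contents and the file name `corr_results.csv` are made up for illustration.

```r
# Toy results table (made-up values).
res <- data.frame(
  cohort   = "DPP",
  outcome  = c("met_a", "met_b"),
  exposure = "age",
  corr     = c(0.31, -0.12),
  n        = 100L,
  pvalue   = c(0.002, 0.23),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the example leaves nothing behind.
out_file <- file.path(tempdir(), "corr_results.csv")
write.csv(res, out_file, row.names = FALSE)

# Read it back and show the first rows.
check <- read.csv(out_file, stringsAsFactors = FALSE)
head(check, 3)
```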

To view the first 3 lines of the correlation analysis output, simply type:

##   cohort        spec model                   outcomespec exposurespec
## 1    DPP Interactive       _1_2_3_benzenetriol_sulfate_2          age
## 2    DPP Interactive            _1_2_dipalmitoylglycerol          age
## 3    DPP Interactive                    _1_2_propanediol          age
##          corr   n      pvalue adjspec adjvars   outcome_uid
## 1 0.164624501 279 0.005506392    None    None CHEM100006374
## 2 0.068903188 279 0.247951327    None    None     HMDB07098
## 3 0.001667259 279 0.977723441    None    None     HMDB01881
##                          outcome exposure_uid     exposure adj_uid  adj
## 1 1,2,3-benzenetriol sulfate (2)          age Age at Entry    None None
## 2              DG(16:0/16:0/0:0)          age Age at Entry    None None
## 3               Propylene glycol          age Age at Entry    None None

To display the heatmap of the resulting correlation matrix, use the showHeatmap function.

To display the hierarchical clustering of the resulting correlation matrix, use the showHClust function. This display requires at least 2 rows and 2 columns in the correlation matrix.
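
The 2-by-2 requirement is easier to see with the matrix in hand. The base-R sketch below reshapes a long results table into an outcome-by-exposure correlation matrix, checks its dimensions, and clusters rows and columns. It uses `hclust()` and `heatmap()` from base R as stand-ins and does not reproduce the package's own display functions; all values are made up.

```r
# Toy long-format results (made-up correlations).
res <- data.frame(
  outcome  = rep(c("met_a", "met_b", "met_c"), times = 2),
  exposure = rep(c("age", "bmi_grp"), each = 3),
  corr     = c(0.16, 0.07, -0.20, 0.30, -0.05, 0.12),
  stringsAsFactors = FALSE
)

# Reshape to an outcome-by-exposure correlation matrix.
corr_mat <- tapply(res$corr, list(res$outcome, res$exposure), mean)

# Clustering needs at least 2 rows and 2 columns.
if (nrow(corr_mat) >= 2 && ncol(corr_mat) >= 2) {
  row_clust <- hclust(dist(corr_mat))      # cluster outcomes
  col_clust <- hclust(dist(t(corr_mat)))   # cluster exposures
  if (interactive()) heatmap(corr_mat, scale = "none")
} else {
  stop("Need at least 2 rows and 2 columns to cluster")
}
```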

## [1] "Analysis will run on 'All metabolites'"
## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "Removed 1 outcome(s): bradykinin because of zero-variance"
## [1] "running unadjusted"

Results can be written to an output CSV file with the following command:

4. Run Stratified Correlation Analysis

The stratified correlation analysis is run by calling the function stratCorr(). This function runs the model or models defined in the input data (Models tab) or in interactive mode. In this example, exmodeldata includes an object scovs that specifies the stratification variable. The function accepts only one stratification variable at a time.
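
Conceptually, a stratified analysis repeats the correlation within each level of the stratification variable and skips strata that cannot be analyzed, as happens for race_grp=2 in the output for this step. The base-R sketch below illustrates that pattern with made-up data. The threshold `min_n` and the skip message are hypothetical and do not reproduce the package's model check. The warning in that output shows the call form COMETS::runCorr(exmodeldata2, exmetabdata, "DPP"), but no package function is called here.

```r
# Toy data with three strata; the last stratum has a single subject.
dat <- data.frame(
  race_grp = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 2),
  age      = c(41, 45, 52, 58, 63, 44, 50, 57, 66, 49),
  met_a    = c(1.1, 1.4, 1.3, 1.9, 2.2, 2.0, 1.7, 1.5, 1.2, 1.6)
)

min_n <- 3  # hypothetical threshold; cor.test() needs at least 3 pairs

# Run the correlation separately within each stratum.
strata <- split(dat, dat$race_grp)
results <- lapply(names(strata), function(s) {
  d <- strata[[s]]
  if (nrow(d) < min_n) {
    message("Skipping race_grp=", s, ": too few subjects")
    return(NULL)
  }
  ct <- cor.test(d$met_a, d$age)
  data.frame(stratum = s, corr = unname(ct$estimate), n = nrow(d),
             pvalue = ct$p.value, stringsAsFactors = FALSE)
})

# Drop skipped strata and stack the rest.
strat_res <- do.call(rbind, Filter(Negate(is.null), results))
```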

## [1] "Running analysis on subjects stratified by  race_grp   0"
## [1] 912   4
## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "running unadjusted"
## [1] "Running analysis on subjects stratified by  race_grp   1"
## [1] 87  4
## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "running unadjusted"
## [1] "Running analysis on subjects stratified by  race_grp   2"
## [1] 1 4
## Warning in COMETS::runCorr(exmodeldata2, exmetabdata, "DPP"): Warning: strata
## (race_grp=2 could not be run because model check failed

5. Run Analysis on all models defined in the input Excel sheet (‘super-batch’ mode)

All models designated in the input file can be run with one command, and individual output CSV files of correlation results will be written to the current directory by default. The function returns a list of the resulting data frames.
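
The run-everything pattern described here (loop over the models, write one CSV per model, return a named list of data frames) can be sketched in base R as follows. The function `run_all`, the structure of `models`, and the data are hypothetical and stand in for the package's own command; the example writes to a temporary directory rather than the current one.

```r
# Toy data (made-up values).
dat <- data.frame(
  age   = c(41, 45, 52, 58, 63, 70),
  bmi   = c(22, 31, 27, 25, 33, 29),
  met_a = c(1.1, 1.4, 1.3, 1.9, 2.2, 2.0),
  met_b = c(3.0, 2.1, 3.5, 2.8, 3.1, 2.6)
)

# Hypothetical model list: each entry names an exposure and its outcomes.
models <- list(
  age_model = list(exposure = "age", outcomes = c("met_a", "met_b")),
  bmi_model = list(exposure = "bmi", outcomes = c("met_a", "met_b"))
)

# Run every model, write one CSV per model, and return a named list.
run_all <- function(data, models, out_dir = getwd()) {
  results <- lapply(names(models), function(m) {
    spec <- models[[m]]
    corr <- vapply(spec$outcomes, function(out) {
      cor(data[[out]], data[[spec$exposure]], use = "complete.obs")
    }, numeric(1))
    res <- data.frame(model = m, outcome = spec$outcomes,
                      exposure = spec$exposure, corr = unname(corr),
                      stringsAsFactors = FALSE)
    write.csv(res, file.path(out_dir, paste0(m, ".csv")), row.names = FALSE)
    res
  })
  names(results) <- names(models)
  results
}

all_res <- run_all(dat, models, out_dir = tempdir())
```

Each element of `all_res` is the data frame for one model, so a single model's results can be pulled out by name, for example `all_res$age_model`.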

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2           tidyr_1.1.2          jsonlite_1.7.2      
##  [4] viridisLite_0.3.0    splines_4.0.4        foreach_1.5.1       
##  [7] tmvnsim_1.0-2        prodlim_2019.11.13   Formula_1.2-4       
## [10] assertthat_0.2.1     stats4_4.0.4         latticeExtra_0.6-29 
## [13] cellranger_1.1.0     yaml_2.2.1           ipred_0.9-10        
## [16] pillar_1.5.1         backports_1.2.1      lattice_0.20-41     
## [19] glue_1.4.2           pROC_1.17.0.1        digest_0.6.27       
## [22] RColorBrewer_1.1-2   checkmate_2.0.0      colorspace_2.0-0    
## [25] recipes_0.1.15       htmltools_0.5.1.1    Matrix_1.3-2        
## [28] plyr_1.8.6           psych_2.0.12         timeDate_3043.102   
## [31] pkgconfig_2.0.3      caret_6.0-86         purrr_0.3.4         
## [34] scales_1.1.1         jpeg_0.1-8.1         gower_0.2.2         
## [37] lava_1.6.8.1         tibble_3.1.0         htmlTable_2.1.0     
## [40] farver_2.0.3         generics_0.1.0       ggplot2_3.3.3       
## [43] ellipsis_0.3.1       withr_2.4.1          nnet_7.3-15         
## [46] lazyeval_0.2.2       mnormt_2.0.2         readxl_1.3.1        
## [49] survival_3.2-7       magrittr_2.0.1       crayon_1.4.1        
## [52] evaluate_0.14        fansi_0.4.2          nlme_3.1-152        
## [55] MASS_7.3-53          foreign_0.8-81       class_7.3-18        
## [58] tools_4.0.4          data.table_1.14.0    lifecycle_1.0.0     
## [61] stringr_1.4.0        plotly_4.9.3         munsell_0.5.0       
## [64] cluster_2.1.0        compiler_4.0.4       rlang_0.4.10        
## [67] grid_4.0.4           iterators_1.0.13     rstudioapi_0.13     
## [70] htmlwidgets_1.5.3    crosstalk_1.1.1      base64enc_0.1-3     
## [73] rmarkdown_2.6        gtable_0.3.0         ModelMetrics_1.2.2.2
## [76] codetools_0.2-18     d3heatmap_0.6.1.3    reshape2_1.4.4      
## [79] R6_2.5.0             gridExtra_2.3        lubridate_1.7.9.2   
## [82] knitr_1.31           dplyr_0.8.5          COMETS_1.4.0.0      
## [85] utf8_1.2.1           Hmisc_4.5-0          stringi_1.5.3       
## [88] parallel_4.0.4       Rcpp_1.0.6           vctrs_0.3.6         
## [91] rpart_4.1-15         png_0.1-7            tidyselect_1.1.0    
## [94] xfun_0.20