Running COMETS Analytics locally

Analysis Workflow for correlation analysis

1. Load Data

The first step of the analysis is to load the data with the readCOMETSinput() function. The input for this function is an Excel spreadsheet in the format described above.

# Retrieve the full path of the example input file shipped with COMETS
dir <- system.file("extdata", package = "COMETS", mustWork = TRUE)
xlsxfile <- file.path(dir, "cometsInputAge.xlsx")

# Read in and process the input data (all sheets of the Excel file)
exmetabdata <- COMETS::readCOMETSinput(xlsxfile)

## [1] "Metabolites sheet is read in"

## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...

## [1] "SubjectMetabolites sheet is read in"

## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...

## [1] "SubjectData sheet is read in"
## [1] "VarMap sheet is read in"
## [1] "Models sheet is read in"
## [1] "There are 14 categorical variables"
## [1] "Running Integrity Check..."
## character(0)

## Joining, by = "sample_id"

## Joining, by = "hmdb_id"

## [1] "Input data has passed QC (metabolite and sample names match in all input files)"
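The integrity check above reports that metabolite and sample names match across the input sheets. The idea can be sketched in base R as follows. The toy data frames and the check_integrity() helper are hypothetical. The sheet layout, with sample_id and hmdb_id as keys, is an assumption based on the "Joining" messages and is not the actual COMETS implementation.

```r
# Toy stand-ins for three of the input sheets (values are made up).
metabolites <- data.frame(hmdb_id = c("HMDB0001", "HMDB0002", "HMDB0003"))
subject_data <- data.frame(sample_id = c("S1", "S2", "S3"), age = c(45, 52, 38))
# Subject-by-metabolite abundances: one row per sample, one column per metabolite.
subject_metabolites <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  HMDB0001 = c(1.2, 0.8, 1.5),
  HMDB0002 = c(2.1, 2.4, 1.9),
  HMDB0003 = c(0.3, 0.5, 0.4)
)

# Hypothetical helper: TRUE when sample and metabolite names agree across sheets.
check_integrity <- function(metabolites, subject_data, subject_metabolites) {
  abundance_ids <- setdiff(names(subject_metabolites), "sample_id")
  samples_ok <- setequal(subject_metabolites$sample_id, subject_data$sample_id)
  metabolites_ok <- setequal(abundance_ids, metabolites$hmdb_id)
  samples_ok && metabolites_ok
}

check_integrity(metabolites, subject_data, subject_metabolites)
```

Dropping a metabolite from only one of the sheets makes the check return FALSE.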

To plot the distribution of variances across metabolites:

COMETS::plotVar(exmetabdata,titlesize=12)

## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.
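plotVar() summarizes how variable each metabolite is. The base R sketch below produces a similar summary from a simulated abundance matrix, not the COMETS example data. It computes one variance per metabolite column and draws a histogram of the variances. It illustrates the idea only and does not reproduce the plotVar() implementation.

```r
set.seed(42)
# Hypothetical abundance matrix: 100 subjects (rows) by 50 metabolites (columns).
abund <- matrix(rlnorm(100 * 50), nrow = 100,
                dimnames = list(NULL, paste0("met", 1:50)))
# One variance per metabolite (column).
met_var <- apply(abund, 2, var)
# Distribution of the per-metabolite variances.
hist(met_var, main = "Metabolite variances", xlab = "Variance")
```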

To plot the distribution of minimum/missing values:

COMETS::plotMinvalues(exmetabdata,titlesize=12)

## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.
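The plot title pairs minimum and missing values. This suggests that values sitting at a metabolite's minimum stand in for missing measurements, but that reading is an assumption here. Under it, the quantity of interest is how many subjects sit at each metabolite's minimum. The simulated data and the fill-with-minimum step below are hypothetical and are not taken from plotMinvalues().

```r
set.seed(1)
# Hypothetical abundance matrix with some values knocked out as missing.
abund <- matrix(round(rlnorm(100 * 20), 1), nrow = 100,
                dimnames = list(NULL, paste0("met", 1:20)))
abund[sample(length(abund), 300)] <- NA
# Assumption of this sketch: missing values are filled with each metabolite's minimum.
abund <- apply(abund, 2, function(x) { x[is.na(x)] <- min(x, na.rm = TRUE); x })
# Number of subjects at the minimum value of each metabolite.
n_at_min <- apply(abund, 2, function(x) sum(x == min(x)))
hist(n_at_min, main = "Subjects at minimum value", xlab = "Count per metabolite")
```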

2. Get Model Data

There are two ways to specify a model: batch mode or interactive mode. In batch mode, models are defined in the Models sheet of the input file. The function getModelData() reads the model selected with the modlabel argument and processes it so the software knows which model to run. Its input is the data object created in the previous step:

exmodeldata <- COMETS::getModelData(exmetabdata,modlabel="1 Gender adjusted")

In interactive mode (modelspec = "Interactive"), the model is specified directly through function arguments rather than in the input file. Here, colvars sets the column (exposure) variables and where sets the rules for filtering subjects. As in batch mode, the input to getModelData() is the data object from the previous step:

exmodeldata <- COMETS::getModelData(exmetabdata, modelspec="Interactive",
    colvars=c("age","bmi_grp"), where=c("age>40","bmi_grp>2"))

## [1] "Analysis will run on 'All metabolites'"
## [1] "Filtering subjects according to the rule(s)age>40. 279 of 1000are retained"   
## [2] "Filtering subjects according to the rule(s)bmi_grp>2. 279 of 1000are retained"
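Each where rule is an R-style condition on a subject variable. Both log lines report the same 279 retained subjects, which suggests that the rules are combined with a logical AND. The apply_where() helper below is a hypothetical base R illustration of that behavior on made-up data, not COMETS code.

```r
# Toy subject data (values are made up).
subjects <- data.frame(id = 1:6,
                       age = c(35, 42, 51, 60, 47, 39),
                       bmi_grp = c(1, 3, 4, 2, 3, 4))

# Hypothetical helper: apply each rule in turn, keeping rows where it is TRUE
# (rows where a rule evaluates to NA are dropped by which()).
apply_where <- function(df, rules) {
  n_start <- nrow(df)
  for (rule in rules) {
    keep <- which(eval(str2lang(rule), df))
    df <- df[keep, , drop = FALSE]
    message(sprintf("%s: %d of %d subjects retained", rule, nrow(df), n_start))
  }
  df
}

filtered <- apply_where(subjects, c("age>40", "bmi_grp>2"))
filtered$id
```

Subjects 2, 3 and 5 are the only ones that pass both rules.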

3. Run Simple Correlation Analysis

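The unstratified correlation analysis is run with runCorr(). Its arguments are the model data from getModelData(), the input data from readCOMETSinput(), and a cohort label (here "DPP"). It runs the model whether it was defined in the Models sheet (batch mode) or through function arguments (interactive mode).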

excorrdata  <- COMETS::runCorr(exmodeldata,exmetabdata,"DPP")

## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "Removed 1exposure(s):NA because of zero-variance"
## [1] "running unadjusted"
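The example output reports one correlation, sample size and p-value per metabolite-exposure pair, after variables with zero variance are dropped. The hypothetical base R sketch below follows the same workflow. It assumes Spearman rank correlation with no covariate adjustment, which matches the "running unadjusted" message. The exact method is an assumption of this sketch, so check the COMETS documentation before relying on it. This is not the runCorr() implementation.

```r
set.seed(7)
n <- 50
# Hypothetical data: 3 metabolites (outcomes) and 2 exposures, one of them constant.
outcomes <- data.frame(met1 = rnorm(n), met2 = rnorm(n), met3 = rnorm(n))
exposures <- data.frame(age = rnorm(n, 50, 8), bmi_grp = rep(3, n))

# Drop zero-variance columns, mirroring the check reported in the log.
drop_constant <- function(df) {
  df[, vapply(df, function(x) var(x) > 0, logical(1)), drop = FALSE]
}
exposures <- drop_constant(exposures)

# One Spearman correlation (with p-value) per outcome-exposure pair.
pairs <- expand.grid(outcome = names(outcomes), exposure = names(exposures),
                     stringsAsFactors = FALSE)
res <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  ct <- cor.test(outcomes[[pairs$outcome[i]]], exposures[[pairs$exposure[i]]],
                 method = "spearman", exact = FALSE)
  data.frame(pairs[i, ], corr = unname(ct$estimate), n = n, pvalue = ct$p.value)
}))
res
```

The constant bmi_grp column is removed, so only the three metabolite-age pairs remain.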

The correlation results can then be written to a CSV file with OutputCSVResults():

COMETS::OutputCSVResults(filename="corr",dataf=excorrdata,cohort="DPP")

To view the first three rows of the correlation results:

COMETS::showCorr(excorrdata,nlines=3)

##   cohort        spec model                   outcomespec exposurespec
## 1    DPP Interactive       _1_2_3_benzenetriol_sulfate_2          age
## 2    DPP Interactive            _1_2_dipalmitoylglycerol          age
## 3    DPP Interactive                    _1_2_propanediol          age
##          corr   n      pvalue adjspec adjvars   outcome_uid
## 1 0.164624501 279 0.005506392    None    None CHEM100006374
## 2 0.068903188 279 0.247951327    None    None     HMDB07098
## 3 0.001667259 279 0.977723441    None    None     HMDB01881
##                          outcome exposure_uid     exposure adj_uid  adj
## 1 1,2,3-benzenetriol sulfate (2)          age Age at Entry    None None
## 2              DG(16:0/16:0/0:0)          age Age at Entry    None None
## 3               Propylene glycol          age Age at Entry    None None

To display a heatmap of the resulting correlation matrix, use the showHeatmap() function:

COMETS::showHeatmap(excorrdata)
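showCorr() prints results in long format, with one row per outcome-exposure pair, but a heatmap needs a matrix. The hypothetical sketch below uses toy results. It pivots them with tapply() and draws them with base R's heatmap(). It illustrates the reshaping step only and is not the showHeatmap() code.

```r
# Hypothetical long-format results with columns like those shown by showCorr().
res <- data.frame(
  outcome  = rep(c("met1", "met2", "met3"), times = 2),
  exposure = rep(c("age", "bmi_grp"), each = 3),
  corr     = c(0.16, 0.07, 0.00, -0.12, 0.21, 0.05)
)
# Pivot to an outcome-by-exposure matrix (one correlation per cell).
corr_mat <- tapply(res$corr, list(res$outcome, res$exposure), mean)
# Draw without reordering or rescaling, so cells show the raw correlations.
heatmap(corr_mat, Rowv = NA, Colv = NA, scale = "none")
```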

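To display the hierarchical clustering of the resulting correlation matrix, use the showHClust() function. This display requires at least 2 rows and 2 columns in the correlation matrix. The zero-variance check removed one of the two exposures from the filtered model above, so the model is first rerun on all subjects with bmi_grp and age as column variables: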

exmodeldata<-COMETS::getModelData(exmetabdata,modelspec = "Interactive",colvars = c("bmi_grp","age"))

## [1] "Analysis will run on 'All metabolites'"

excorrdata  <- COMETS::runCorr(exmodeldata,exmetabdata,"DPP")

## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "Removed 1 outcome(s): bradykinin because of zero-variance"
## [1] "running unadjusted"

COMETS::showHClust(excorrdata)
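One plausible reason for the 2-by-2 minimum is that hierarchical clustering needs at least two objects, and rows and columns are clustered separately. The base R sketch below uses a made-up matrix to show this with hclust() on Euclidean distances between correlation profiles. The distance measure and the linkage (the hclust() default) are choices of this sketch, not necessarily those of showHClust().

```r
# Hypothetical outcome-by-exposure correlation matrix (rows: metabolites).
corr_mat <- matrix(c(0.16, 0.07, 0.00, 0.30,
                     -0.12, 0.21, 0.05, 0.25),
                   nrow = 4,
                   dimnames = list(c("met1", "met2", "met3", "met4"),
                                   c("age", "bmi_grp")))
# Cluster metabolites by their correlation profiles across exposures.
row_tree <- hclust(dist(corr_mat))
# Cluster exposures by their correlation profiles across metabolites.
col_tree <- hclust(dist(t(corr_mat)))
plot(row_tree, main = "Metabolites")
```

With a single row, hclust() stops with an error because there is only one object to cluster.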

Results can be written to an output CSV file with the following command:

COMETS::OutputCSVResults("Model1",excorrdata,cohort="")

4. Run Stratified Correlation Analysis

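Stratified correlation analysis is also run with runCorr(). When the model specifies a stratification variable, runCorr() runs the model separately within each stratum. The model can be defined in the input data (Models tab) or in interactive mode. In this example, the strvars argument of getModelData() sets race_grp as the stratification variable, which exmodeldata2 stores in its scovs element. Only one stratification variable can be used at a time. Strata that fail the model check are skipped with a warning. For example, race_grp = 2 below contains a single subject.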

exmodeldata2 <- COMETS::getModelData(exmetabdata, modelspec = "Interactive",
    rowvars = c("lactose", "lactate"), colvars = c("age", "bmi_grp"),
    strvars = "race_grp")
excorrdata2 <- COMETS::runCorr(exmodeldata2, exmetabdata, "DPP")

## [1] "Running analysis on subjects stratified by  race_grp   0"
## [1] 912   4
## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "running unadjusted"
## [1] "Running analysis on subjects stratified by  race_grp   1"
## [1] 87  4
## [1] "No factors found,  only performing near zero variance check for all covariates."
## [1] "running unadjusted"
## [1] "Running analysis on subjects stratified by  race_grp   2"
## [1] 1 4

## Warning in COMETS::runCorr(exmodeldata2, exmetabdata, "DPP"): Warning: strata
## (race_grp=2 could not be run because model check failed
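The log shows the model being run separately within each level of race_grp, with the single-subject level failing the model check. The split-and-run pattern can be sketched in base R as follows. The data and the min_n threshold are hypothetical, and the real COMETS model check may use different criteria.

```r
set.seed(3)
# Hypothetical subjects: one metabolite, one exposure, and a 3-level stratum
# whose last level has a single subject.
d <- data.frame(met1 = rnorm(60), age = rnorm(60, 50, 8),
                race_grp = c(rep(0, 40), rep(1, 19), 2))
min_n <- 3  # hypothetical minimum stratum size, not a COMETS setting

strata <- split(d, d$race_grp)
strat_corr <- list()
for (lev in names(strata)) {
  s <- strata[[lev]]
  if (nrow(s) < min_n) {
    # Skip strata too small to analyze, as runCorr() does for race_grp = 2.
    warning(sprintf("stratum race_grp=%s skipped: only %d subject(s)", lev, nrow(s)))
    next
  }
  strat_corr[[lev]] <- cor(s$met1, s$age, method = "spearman")
}
strat_corr
```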

5. Run Analysis on all models defined in the input Excel sheet (‘super-batch’ mode)

All models designated in the input file can be run with a single command. By default, a CSV file of correlation results is written to the current directory for each model. The example below sets writeTofile = FALSE, which turns off writing the files. The function returns a list of the resulting data frames.

allresults <- COMETS::runAllModels(exmetabdata, writeTofile = FALSE)
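Because runAllModels() returns a list of data frames, the results can also be post-processed in R, for example by stacking them into a single table. The sketch below uses a made-up two-element list in place of allresults and assumes that the data frames share the same columns. The model names and the output file name are hypothetical.

```r
# Hypothetical stand-in for the list returned by runAllModels().
allresults <- list(
  model_A = data.frame(outcome = c("met1", "met2"), exposure = "age",
                       corr = c(0.16, 0.07)),
  model_B = data.frame(outcome = c("met1", "met2"), exposure = "bmi_grp",
                       corr = c(-0.12, 0.21))
)
# Tag each row with its model name, then stack into one table.
combined <- do.call(rbind, Map(function(df, nm) cbind(model = nm, df),
                               allresults, names(allresults)))
rownames(combined) <- NULL
# Write a single CSV to a temporary directory (path is illustrative).
out_file <- file.path(tempdir(), "all_models.csv")
write.csv(combined, out_file, row.names = FALSE)
```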

sessionInfo()

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2           tidyr_1.1.2          jsonlite_1.7.2      
##  [4] viridisLite_0.3.0    splines_4.0.4        foreach_1.5.1       
##  [7] tmvnsim_1.0-2        prodlim_2019.11.13   Formula_1.2-4       
## [10] assertthat_0.2.1     stats4_4.0.4         latticeExtra_0.6-29 
## [13] cellranger_1.1.0     yaml_2.2.1           ipred_0.9-10        
## [16] pillar_1.5.1         backports_1.2.1      lattice_0.20-41     
## [19] glue_1.4.2           pROC_1.17.0.1        digest_0.6.27       
## [22] RColorBrewer_1.1-2   checkmate_2.0.0      colorspace_2.0-0    
## [25] recipes_0.1.15       htmltools_0.5.1.1    Matrix_1.3-2        
## [28] plyr_1.8.6           psych_2.0.12         timeDate_3043.102   
## [31] pkgconfig_2.0.3      caret_6.0-86         purrr_0.3.4         
## [34] scales_1.1.1         jpeg_0.1-8.1         gower_0.2.2         
## [37] lava_1.6.8.1         tibble_3.1.0         htmlTable_2.1.0     
## [40] farver_2.0.3         generics_0.1.0       ggplot2_3.3.3       
## [43] ellipsis_0.3.1       withr_2.4.1          nnet_7.3-15         
## [46] lazyeval_0.2.2       mnormt_2.0.2         readxl_1.3.1        
## [49] survival_3.2-7       magrittr_2.0.1       crayon_1.4.1        
## [52] evaluate_0.14        fansi_0.4.2          nlme_3.1-152        
## [55] MASS_7.3-53          foreign_0.8-81       class_7.3-18        
## [58] tools_4.0.4          data.table_1.14.0    lifecycle_1.0.0     
## [61] stringr_1.4.0        plotly_4.9.3         munsell_0.5.0       
## [64] cluster_2.1.0        compiler_4.0.4       rlang_0.4.10        
## [67] grid_4.0.4           iterators_1.0.13     rstudioapi_0.13     
## [70] htmlwidgets_1.5.3    crosstalk_1.1.1      base64enc_0.1-3     
## [73] rmarkdown_2.6        gtable_0.3.0         ModelMetrics_1.2.2.2
## [76] codetools_0.2-18     d3heatmap_0.6.1.3    reshape2_1.4.4      
## [79] R6_2.5.0             gridExtra_2.3        lubridate_1.7.9.2   
## [82] knitr_1.31           dplyr_0.8.5          COMETS_1.4.0.0      
## [85] utf8_1.2.1           Hmisc_4.5-0          stringi_1.5.3       
## [88] parallel_4.0.4       Rcpp_1.0.6           vctrs_0.3.6         
## [91] rpart_4.1-15         png_0.1-7            tidyselect_1.1.0    
## [94] xfun_0.20


Ewy Mathé, Ella Temprosa

2021-03-23

Introduction

Data Input Format