bento_meta and MDF

Model Description Files

Model description files, or MDF, are simple descriptions of a graph data model, usually written in YAML. They are organized in terms of sections defined by the top-level keys Nodes, Relationships, and PropDefinitions. They describe the graph nodes and relationships belonging to a model, the properties associated with each. The MDF details and documents:

  • the “local” or internal names for each of these entities,

  • the node types that are allowed as the source and destination for each relationship,

  • detailed attributes that may be associated with any of these entities, for example:

    • the data types or enumerated values that are valid for a given property

    • whether a particular relationship or property is required to be present, for data to be valid

    • the cardinality (one-to-one, one-to-many, etc.) that is valid for a given relationship.

Examples of valid MDF are available for the Integrated Canine Data Commons and the Clinical Trials Data Commons models.

The MDF syntax is described here in detail. This syntax is defined by and can be validated against a JSONSchema document. This document lives here.

Slurping MDF into bento_meta

MDF functionality for bento_meta is included in the bento_mdf package, found at https://github.com/CBIIT/bento-mdf. The latest version can be installed as follows:

$ pip install bento_mdf@git+https://github.com/CBIIT/bento-mdf.git#egg=subdir\&subdirectory=drivers/python

Create a bento_meta.model.Model from MDF files as follows:

from bento_mdf.mdf import MDF

mdf = MDF('model.yml','model-props.yml', handle="test")
model = mdf.model

The model object can be used and modified as discussed in The Object Model. No database connection is necessary.

Note that the MDF can be spread out over multiple YAML files. Typically, the nodes and relationships are defined in one file, and the properties in a separate files. MDF merges the files provided according to the spec.

handle= is a keyword-only argument that must be provided, which sets the name (a.k.a. handle) for the model. This is used, for example, in setting the Entity.model attribute for the model objects. It enables pushing and pulling a model to and from a Neo4j database in a single call.

URLs that resolve to MDF files can also be used. To load the latest CTDC model, for example:

>>> from bento_mdf.mdf import MDF
>>> mdf = MDF('https://cbiit.github.io/ctdc-model/model-desc/ctdc_model_file.yaml',
>>>           'https://cbiit.github.io/ctdc-model/model-desc/ctdc_model_properties_file.yaml',
>>>            handle='CTDC')
>>> ctdc_model = mdf.model
>>> [x for x in mdf.model.nodes]
['case', 'specimen', 'metastatic_site', 'nucleic_acid', 'ihc_assay_report', 'sequencing_assay', 'variant_report', 'file', 'snv_variant', 'delins_variant', 'indel_variant', 'copy_number_variant', 'gene_fusion_variant', 'assignment_report', 'disease_eligibility_criterion', 'drug_eligibility_criterion', 'arm', 'clinical_trial']
>>> [x for x in mdf.model.nodes['case'].props]
['show_node', 'case_id', 'source_id', 'gender', 'race', 'ethnicity', 'patient_status', 'current_step', 'disease', 'ctep_category', 'ctep_subcategory', 'meddra_code', 'prior_drugs', 'extent_of_disease', 'ecog_performance_status']

Squirting a Model into MDF YAML

The MDF.write_mdf() returns a MDF-structured dictionary for a Model object, and optionally writes a YAML file encoding this dictionary:

mdf_dict = MDF.write_mdf(model=your_model)
# write to file
MDF.write_mdf(model=your_model, file=open("your_mdf.yaml", "w"))

The MDF will be written as a single file. Property definitions, for example, will be included with everything else.