bento_meta and MDF
Model Description Files
Model description files, or MDF, are simple descriptions of a graph data model, usually written in YAML. They are organized into sections defined by the top-level keys Nodes, Relationships, and PropDefinitions. They describe the graph nodes and relationships belonging to a model, and the properties associated with each. The MDF details and documents:
the “local” or internal names for each of these entities,
the node types that are allowed as the source and destination for each relationship,
detailed attributes that may be associated with any of these entities, for example:
the data types or enumerated values that are valid for a given property
whether a particular relationship or property is required to be present, for data to be valid
the cardinality (one-to-one, one-to-many, etc.) that is valid for a given relationship.
Examples of valid MDF are available for the Integrated Canine Data Commons and the Clinical Trials Data Commons models.
The MDF syntax is described here in detail. This syntax is defined by and can be validated against a JSONSchema document. This document lives here.
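To make the three sections concrete, here is a minimal, hand-written sketch of MDF-like content, shown as the Python dictionary a YAML parser would produce. Only the top-level keys Nodes, Relationships, and PropDefinitions come from the description above. The inner keys (Props, Ends, Src, Dst, Mul, Type, Enum, Req) and the node and property names are assumptions for illustration and should be checked against the MDF syntax description. The helper check_references is hypothetical and is not part of bento_mdf; it only illustrates how the sections refer to one another.

```python
# Hypothetical MDF content as a parsed dictionary. Inner key names are
# assumptions for illustration; consult the MDF spec for the real syntax.
example_mdf = {
    "Nodes": {
        "case": {"Props": ["case_id", "gender"]},
        "specimen": {"Props": ["specimen_id"]},
    },
    "Relationships": {
        "of_case": {
            "Mul": "many_to_one",  # cardinality of the relationship
            "Ends": [{"Src": "specimen", "Dst": "case"}],  # allowed source/destination
        },
    },
    "PropDefinitions": {
        "case_id": {"Type": "string", "Req": True},
        "gender": {"Enum": ["male", "female", "unknown"]},
        "specimen_id": {"Type": "string", "Req": True},
    },
}


def check_references(mdf):
    """Return a list of problems where one section names something undeclared."""
    problems = []
    nodes = mdf.get("Nodes", {})
    prop_defs = mdf.get("PropDefinitions", {})
    # Every property listed on a node should have a definition.
    for node_name, node in nodes.items():
        for prop in node.get("Props") or []:
            if prop not in prop_defs:
                problems.append(f"node {node_name!r}: undefined property {prop!r}")
    # Every relationship end should point at a declared node.
    for rel_name, rel in mdf.get("Relationships", {}).items():
        for end in rel.get("Ends", []):
            for role in ("Src", "Dst"):
                if end[role] not in nodes:
                    problems.append(
                        f"relationship {rel_name!r}: unknown {role} node {end[role]!r}"
                    )
    return problems


print(check_references(example_mdf))  # prints []
```

This is not a substitute for validating against the JSONSchema document; it only shows the cross-references between the three sections.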
Slurping MDF into bento_meta
MDF functionality for bento_meta is included in the bento_mdf package, found at https://github.com/CBIIT/bento-mdf. The latest version can be installed as follows:
$ pip install bento_mdf@git+https://github.com/CBIIT/bento-mdf.git#egg=subdir\&subdirectory=drivers/python
Create a bento_meta.model.Model from MDF files as follows:
from bento_mdf.mdf import MDF
mdf = MDF('model.yml','model-props.yml', handle="test")
model = mdf.model
The model object can be used and modified as discussed in The Object Model. No database connection is necessary.
Note that the MDF can be spread out over multiple YAML files. Typically, the nodes and relationships are defined in one file, and the properties in a separate file. MDF merges the files provided according to the spec.
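As a rough illustration of what combining several files means, the sketch below merges two MDF-like dictionaries so that sections from both end up in one structure. It is a deliberately naive recursive merge in which later values win, written only for this example. It is not the merge algorithm of the MDF spec or of bento_mdf, whose rules (for lists in particular) may differ. The function name merge_mdf and the sample data are hypothetical.

```python
def merge_mdf(base, overlay):
    """Naively merge two MDF-like dicts; values from `overlay` win on conflict.

    Simplified illustration only, NOT the merge rules defined by the MDF spec.
    """
    merged = dict(base)  # shallow copy so `base` is left untouched
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_mdf(merged[key], value)
        else:
            # Scalars, lists, and new keys: take the later value as-is.
            merged[key] = value
    return merged


# Hypothetical contents of a model file and a separate properties file.
model_part = {"Nodes": {"case": {"Props": ["case_id"]}}, "Relationships": {}}
props_part = {"PropDefinitions": {"case_id": {"Type": "string"}}}

combined = merge_mdf(model_part, props_part)
print(sorted(combined))  # prints ['Nodes', 'PropDefinitions', 'Relationships']
```

In practice, pass all the files to MDF and let it do the merging.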
handle= is a keyword-only argument that must be provided, which sets the name (a.k.a. handle) for the model. This is used, for example, in setting the Entity.model attribute for the model objects. It enables pushing and pulling a model to and from a Neo4j database in a single call.
URLs that resolve to MDF files can also be used. To load the latest CTDC model, for example:
>>> from bento_mdf.mdf import MDF
>>> mdf = MDF('https://cbiit.github.io/ctdc-model/model-desc/ctdc_model_file.yaml',
...           'https://cbiit.github.io/ctdc-model/model-desc/ctdc_model_properties_file.yaml',
...           handle='CTDC')
>>> ctdc_model = mdf.model
>>> [x for x in mdf.model.nodes]
['case', 'specimen', 'metastatic_site', 'nucleic_acid', 'ihc_assay_report', 'sequencing_assay', 'variant_report', 'file', 'snv_variant', 'delins_variant', 'indel_variant', 'copy_number_variant', 'gene_fusion_variant', 'assignment_report', 'disease_eligibility_criterion', 'drug_eligibility_criterion', 'arm', 'clinical_trial']
>>> [x for x in mdf.model.nodes['case'].props]
['show_node', 'case_id', 'source_id', 'gender', 'race', 'ethnicity', 'patient_status', 'current_step', 'disease', 'ctep_category', 'ctep_subcategory', 'meddra_code', 'prior_drugs', 'extent_of_disease', 'ecog_performance_status']
Squirting a Model into MDF YAML
The MDF.write_mdf() method returns an MDF-structured dictionary for a Model object, and optionally writes a YAML file encoding this dictionary:
mdf_dict = MDF.write_mdf(model=your_model)
# write to file; the with-block closes the file handle when done
with open("your_mdf.yaml", "w") as f:
    MDF.write_mdf(model=your_model, file=f)
The MDF will be written as a single file. Property definitions, for example, will be included with everything else.