Example Usage
To use bento-mdf in a project, start by installing the latest version with pip install bento-mdf and importing it into your project.
import bento_mdf
from pathlib import Path # for file paths
from importlib.metadata import version # check package version
version("bento_mdf")
'0.13.0'
Loading the Model from MDF(s)
The bento-mdf package provides functionality for loading, validating, and manipulating MDF file content in Python.
The MDFReader class parses and validates MDF files, creating a bento-meta Model interface with convenient features, demonstrated below. An MDFReader is initialized with the relevant MDF file(s), filepath(s), or URL pointing to these.
from bento_mdf import MDFReader
Loading from File(s)
First, we specify the paths to the MDF files we want to load. Then, we pass these to the MDFReader class to initialize the model. This loads the content of the files into their corresponding bento-meta Python object representations, which we can access via the Model object found at MDFReader.model.
(Note: if a top-level model Handle is not present in the MDFs, it needs to be provided to the MDFReader class’s handle argument.)
import logging
logging.basicConfig(filename='mdf.log')
mdf_dir = Path.cwd().parent / "tests" / "samples"
ctdc_model = mdf_dir / "ctdc_model_file.yaml"
ctdc_props = mdf_dir / "ctdc_model_properties_file.yaml"
mdf_from_file = MDFReader(ctdc_model, ctdc_props, handle="CTDC")
mdf_from_file.model
<bento_meta.model.Model at 0x7f30ac350da0>
Loading from URL(s)
Similarly, we can instantiate an MDFReader from URL(s) pointing to the model file(s):
model_url = "https://cbiit.github.io/icdc-model-tool/model-desc/icdc-model.yml"
props_url = "https://cbiit.github.io/icdc-model-tool/model-desc/icdc-model-props.yml"
mdf = MDFReader(model_url, props_url, handle="ICDC")
mdf.model
<bento_meta.model.Model at 0x7f309849dd90>
Setting the parameter raise_error to True in the MDFReader call will raise a RuntimeError if any MDF issues are found. In any case, all issues found will appear in the log.
Exploring the Model
Once we’ve loaded the model, we can start looking at the entities that make it up, including Nodes, Relationships, Properties, and Terms. These are conveniently stored in the bento-meta Model object.
Note: This example will use the model created in the previous section from a URL.
Nodes
Model nodes are stored as dictionaries in Model.nodes, where the keys are node handles and the values are bento-meta Node objects.
nodes = mdf.model.nodes
len(nodes)
26
list(nodes.keys())[:3]
['program', 'study', 'consent_group']
list(nodes.values())[:3]
[<bento_meta.objects.Node at 0x7f309849e2d0>,
<bento_meta.objects.Node at 0x7f308bfe05f0>,
<bento_meta.objects.Node at 0x7f308bfe0740>]
nodes["study"]
<bento_meta.objects.Node at 0x7f308bfe05f0>
The get_attr_dict() method is a convenient way to get a dictionary of a bento-meta Entity's set attributes. This will return string versions of the attributes. This can be useful for exploring the entity or for providing parameters to Neo4j Cypher queries.
Note: this only includes simple attributes and not other bento-meta Entities or collections of Entities. Every attribute, including those, can be accessed on the entity under its own name (for example, node.handle or node.props).
nodes["diagnosis"].get_attr_dict()
{'handle': 'diagnosis',
'model': 'ICDC',
'desc': 'The Diagnosis node contains numerous properties which fully characterize the type of cancer with which any given patient/subject/donor was diagnosed, inclusive of stage. This node also contains properties pertaining to comorbidities, and the availability of pathology reports, treatment data and follow-up data.'}
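As a rough illustration of the Cypher use case mentioned above, the sketch below turns an attribute dictionary of strings into a parameterized query plus its parameter map. The helper name merge_statement, the graph label "node", and the choice of handle and model as identifying fields are assumptions made for this example; they are not part of bento-mdf or bento-meta, and the dictionary is a shortened stand-in for the get_attr_dict() output.

```python
# Stand-in for the dictionary returned by get_attr_dict() above
# (the description is shortened for readability).
attrs = {
    "handle": "diagnosis",
    "model": "ICDC",
    "desc": "shortened description of the diagnosis node",
}


def merge_statement(label, attrs, key_fields=("handle", "model")):
    """Build a parameterized Cypher MERGE statement (hypothetical helper).

    Attributes named in key_fields identify the graph node; any remaining
    attributes are applied with SET. Values are never interpolated into the
    query text; they are returned separately as query parameters.
    """
    keys = ", ".join(f"{k}: ${k}" for k in key_fields if k in attrs)
    rest = [k for k in attrs if k not in key_fields]
    query = f"MERGE (n:{label} {{{keys}}})"
    if rest:
        query += " SET " + ", ".join(f"n.{k} = ${k}" for k in rest)
    return query, dict(attrs)


query, params = merge_statement("node", attrs)
print(query)
# MERGE (n:node {handle: $handle, model: $model}) SET n.desc = $desc
```

The query text and the params dictionary would then be handed together to whichever Neo4j client is in use, so that values are passed as parameters rather than concatenated into the query.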
Relationships
Similarly, Model relationships are stored in Model.edges. This is a dictionary where the keys are (edge.handle, src.handle, dst.handle) tuples. The values are Edge objects.
edges = mdf.model.edges
len(edges)
40
list(edges.keys())[:3]
[('member_of', 'case', 'cohort'),
('member_of', 'cohort', 'study_arm'),
('member_of', 'study_arm', 'study')]
list(edges.values())[:3]
[<bento_meta.objects.Edge at 0x7f308bfd1e50>,
<bento_meta.objects.Edge at 0x7f308bfd3950>,
<bento_meta.objects.Edge at 0x7f308bfd3560>]
edges[("of_case", "diagnosis", "case")].get_attr_dict()
{'handle': 'of_case', 'model': 'ICDC', 'multiplicity': 'many_to_one'}
edge = edges[("of_case", "diagnosis", "case")]
print(edge.handle, edge.src.handle, edge.dst.handle, sep=", ")
# TIP: here's a convenient method to get the 3-tuple of an edge
print(edge.triplet)
of_case, diagnosis, case
('of_case', 'diagnosis', 'case')
An Edge's src and dst attributes are Node objects:
print(edge.src)
print(edge.src.handle)
<bento_meta.objects.Node object at 0x7f308bf69d00>
diagnosis
The Model object also has some useful methods to work with relationships/edges including:
edges_by_src(node) - get all edges that have a given node as their src attribute
edges_by_dst(node) - get all edges that have a given node as their dst attribute
edges_by_type(edge_handle) - get all edges that have a given edge type (i.e., handle)
[e.triplet for e in mdf.model.edges_by_dst(mdf.model.nodes["case"])]
[('of_case', 'enrollment', 'case'),
('of_case', 'demographic', 'case'),
('of_case', 'diagnosis', 'case'),
('of_case', 'cycle', 'case'),
('of_case', 'sample', 'case'),
('of_case', 'file', 'case'),
('of_case', 'visit', 'case'),
('of_case', 'adverse_event', 'case'),
('of_case', 'registration', 'case')]
[e.triplet for e in mdf.model.edges_by_type("of_study")]
[('of_study', 'human_relevance', 'study'),
('of_study', 'study_site', 'study'),
('of_study', 'principal_investigator', 'study'),
('of_study', 'file', 'study'),
('of_study', 'publication', 'study')]
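Because the keys of Model.edges are (edge handle, src handle, dst handle) tuples, the same kinds of lookups can also be expressed as plain filters over the keys. The sketch below is only an illustration of that key structure, not how bento-meta implements these methods; the dictionary is a small stand-in with strings in place of Edge objects, and the function names are made up for this example.

```python
# Stand-in for Model.edges: keys are (edge handle, src handle, dst handle).
# Real values would be Edge objects; plain strings are used here instead.
edges = {
    ("member_of", "case", "cohort"): "edge-1",
    ("of_case", "diagnosis", "case"): "edge-2",
    ("of_case", "sample", "case"): "edge-3",
    ("of_study", "file", "study"): "edge-4",
}


def triplets_by_src(edges, src_handle):
    """Return the keys whose source node handle matches."""
    return [key for key in edges if key[1] == src_handle]


def triplets_by_dst(edges, dst_handle):
    """Return the keys whose destination node handle matches."""
    return [key for key in edges if key[2] == dst_handle]


def triplets_by_type(edges, edge_handle):
    """Return the keys whose edge handle (type) matches."""
    return [key for key in edges if key[0] == edge_handle]


print(triplets_by_dst(edges, "case"))
# [('of_case', 'diagnosis', 'case'), ('of_case', 'sample', 'case')]
print(triplets_by_type(edges, "of_study"))
# [('of_study', 'file', 'study')]
```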
Properties
Model properties are stored in Model.props. This is a dictionary where the keys are ({edge|node}.handle, prop.handle) tuples. The values are Property objects.
props = mdf.model.props
len(props)
229
list(props.keys())[:3]
[('program', 'program_name'),
('program', 'program_acronym'),
('program', 'program_short_description')]
list(props.values())[:3]
[<bento_meta.objects.Property at 0x7f308bfc7890>,
<bento_meta.objects.Property at 0x7f308bfc7a10>,
<bento_meta.objects.Property at 0x7f308bfc7fb0>]
primary_disease_site = props[("diagnosis", "primary_disease_site")]
primary_disease_site.get_attr_dict()
{'handle': 'primary_disease_site',
'model': 'ICDC',
'value_domain': 'value_set',
'is_required': 'Yes',
'is_key': 'False',
'is_nullable': 'False',
'is_strict': 'True',
'desc': 'The anatomical location at which the primary disease originated, recorded in relatively general terms at the subject level; the anatomical locations from which tumor samples subject to downstream analysis were acquired is recorded in more detailed terms at the sample level.'}
Properties with Value Sets
Properties with the value_domain “value_set” have the value_set attribute (bento-meta ValueSet), which has a terms attribute (bento-meta Term dictionary like {term.value: Term}).
primary_disease_site.value_set
<bento_meta.objects.ValueSet at 0x7f308b98a1b0>
primary_disease_site.value_set.terms
{'Abdomen': <bento_meta.objects.Term object at 0x7f308b98a540>, 'Bladder': <bento_meta.objects.Term object at 0x7f308b989b50>, 'Bladder, Prostate': <bento_meta.objects.Term object at 0x7f308b98adb0>, 'Bladder, Urethra': <bento_meta.objects.Term object at 0x7f308b989fd0>, 'Bladder, Urethra, Prostate': <bento_meta.objects.Term object at 0x7f308b98a570>, 'Bladder, Urethra, Vagina': <bento_meta.objects.Term object at 0x7f308b989fa0>, 'Bone': <bento_meta.objects.Term object at 0x7f308b98ad20>, 'Bone (Appendicular)': <bento_meta.objects.Term object at 0x7f308b98a780>, 'Bone (Axial)': <bento_meta.objects.Term object at 0x7f308b98a420>, 'Bone Marrow': <bento_meta.objects.Term object at 0x7f308b98a7e0>, 'Brain': <bento_meta.objects.Term object at 0x7f308b98b710>, 'Carpus': <bento_meta.objects.Term object at 0x7f308b98a960>, 'Chest Wall': <bento_meta.objects.Term object at 0x7f308b98ade0>, 'Cranial Sternum': <bento_meta.objects.Term object at 0x7f308b98a930>, 'Distal Urethra': <bento_meta.objects.Term object at 0x7f308b98acc0>, 'Elbow Joint': <bento_meta.objects.Term object at 0x7f308b98abd0>, 'Femur': <bento_meta.objects.Term object at 0x7f308b98b3b0>, 'Flank': <bento_meta.objects.Term object at 0x7f308b98afc0>, 'Hip': <bento_meta.objects.Term object at 0x7f308b98aff0>, 'Hock': <bento_meta.objects.Term object at 0x7f308b98b290>, 'Humerus': <bento_meta.objects.Term object at 0x7f308b98af60>, 'Inguinal Region': <bento_meta.objects.Term object at 0x7f308b98b1a0>, 'Kidney': <bento_meta.objects.Term object at 0x7f308b98b440>, 'Knee Region': <bento_meta.objects.Term object at 0x7f308b98b320>, 'Lip': <bento_meta.objects.Term object at 0x7f308b98b080>, 'Lung': <bento_meta.objects.Term object at 0x7f308b98b620>, 'Lymph Node': <bento_meta.objects.Term object at 0x7f308b98b740>, 'Mammary Gland': <bento_meta.objects.Term object at 0x7f308b98bb60>, 'Mandible': <bento_meta.objects.Term object at 0x7f308b98b8f0>, 'Maxilla': <bento_meta.objects.Term object at 0x7f308b98bf80>, 'Mouth': 
<bento_meta.objects.Term object at 0x7f308b98b950>, 'Neck': <bento_meta.objects.Term object at 0x7f308b98bcb0>, 'Not Applicable': <bento_meta.objects.Term object at 0x7f308b98b830>, 'Pleural Cavity': <bento_meta.objects.Term object at 0x7f308b98bb00>, 'Rib Region': <bento_meta.objects.Term object at 0x7f308b98bd40>, 'Shoulder': <bento_meta.objects.Term object at 0x7f308b98baa0>, 'Skin': <bento_meta.objects.Term object at 0x7f308b98bad0>, 'Spleen': <bento_meta.objects.Term object at 0x7f308b98bd10>, 'Subcutis': <bento_meta.objects.Term object at 0x7f308b98be00>, 'Tarsus': <bento_meta.objects.Term object at 0x7f308b98be30>, 'Thigh': <bento_meta.objects.Term object at 0x7f308b98bfe0>, 'Thorax': <bento_meta.objects.Term object at 0x7f308bfb5be0>, 'Thyroid Gland': <bento_meta.objects.Term object at 0x7f308bfb7620>, 'Unknown': <bento_meta.objects.Term object at 0x7f308bfb6cf0>, 'Urethra': <bento_meta.objects.Term object at 0x7f308bfb6a50>, 'Urethra, Prostate': <bento_meta.objects.Term object at 0x7f308bfb6570>, 'Urinary Tract': <bento_meta.objects.Term object at 0x7f308bfb6ba0>, 'Urogenital Tract': <bento_meta.objects.Term object at 0x7f308bfb6a80>}
Property objects with value sets have some useful methods to get to those terms and their values including:
.terms returns the dictionary of Term objects from the property’s value set, keyed by term value
.values returns a list of the term values from the property’s value set
print(primary_disease_site.terms)
# TIP: this is the same object found at the ValueSet's `terms` attribute
print(primary_disease_site.terms is primary_disease_site.value_set.terms)
{'Abdomen': <bento_meta.objects.Term object at 0x7f308b98a540>, 'Bladder': <bento_meta.objects.Term object at 0x7f308b989b50>, 'Bladder, Prostate': <bento_meta.objects.Term object at 0x7f308b98adb0>, 'Bladder, Urethra': <bento_meta.objects.Term object at 0x7f308b989fd0>, 'Bladder, Urethra, Prostate': <bento_meta.objects.Term object at 0x7f308b98a570>, 'Bladder, Urethra, Vagina': <bento_meta.objects.Term object at 0x7f308b989fa0>, 'Bone': <bento_meta.objects.Term object at 0x7f308b98ad20>, 'Bone (Appendicular)': <bento_meta.objects.Term object at 0x7f308b98a780>, 'Bone (Axial)': <bento_meta.objects.Term object at 0x7f308b98a420>, 'Bone Marrow': <bento_meta.objects.Term object at 0x7f308b98a7e0>, 'Brain': <bento_meta.objects.Term object at 0x7f308b98b710>, 'Carpus': <bento_meta.objects.Term object at 0x7f308b98a960>, 'Chest Wall': <bento_meta.objects.Term object at 0x7f308b98ade0>, 'Cranial Sternum': <bento_meta.objects.Term object at 0x7f308b98a930>, 'Distal Urethra': <bento_meta.objects.Term object at 0x7f308b98acc0>, 'Elbow Joint': <bento_meta.objects.Term object at 0x7f308b98abd0>, 'Femur': <bento_meta.objects.Term object at 0x7f308b98b3b0>, 'Flank': <bento_meta.objects.Term object at 0x7f308b98afc0>, 'Hip': <bento_meta.objects.Term object at 0x7f308b98aff0>, 'Hock': <bento_meta.objects.Term object at 0x7f308b98b290>, 'Humerus': <bento_meta.objects.Term object at 0x7f308b98af60>, 'Inguinal Region': <bento_meta.objects.Term object at 0x7f308b98b1a0>, 'Kidney': <bento_meta.objects.Term object at 0x7f308b98b440>, 'Knee Region': <bento_meta.objects.Term object at 0x7f308b98b320>, 'Lip': <bento_meta.objects.Term object at 0x7f308b98b080>, 'Lung': <bento_meta.objects.Term object at 0x7f308b98b620>, 'Lymph Node': <bento_meta.objects.Term object at 0x7f308b98b740>, 'Mammary Gland': <bento_meta.objects.Term object at 0x7f308b98bb60>, 'Mandible': <bento_meta.objects.Term object at 0x7f308b98b8f0>, 'Maxilla': <bento_meta.objects.Term object at 0x7f308b98bf80>, 'Mouth': 
<bento_meta.objects.Term object at 0x7f308b98b950>, 'Neck': <bento_meta.objects.Term object at 0x7f308b98bcb0>, 'Not Applicable': <bento_meta.objects.Term object at 0x7f308b98b830>, 'Pleural Cavity': <bento_meta.objects.Term object at 0x7f308b98bb00>, 'Rib Region': <bento_meta.objects.Term object at 0x7f308b98bd40>, 'Shoulder': <bento_meta.objects.Term object at 0x7f308b98baa0>, 'Skin': <bento_meta.objects.Term object at 0x7f308b98bad0>, 'Spleen': <bento_meta.objects.Term object at 0x7f308b98bd10>, 'Subcutis': <bento_meta.objects.Term object at 0x7f308b98be00>, 'Tarsus': <bento_meta.objects.Term object at 0x7f308b98be30>, 'Thigh': <bento_meta.objects.Term object at 0x7f308b98bfe0>, 'Thorax': <bento_meta.objects.Term object at 0x7f308bfb5be0>, 'Thyroid Gland': <bento_meta.objects.Term object at 0x7f308bfb7620>, 'Unknown': <bento_meta.objects.Term object at 0x7f308bfb6cf0>, 'Urethra': <bento_meta.objects.Term object at 0x7f308bfb6a50>, 'Urethra, Prostate': <bento_meta.objects.Term object at 0x7f308bfb6570>, 'Urinary Tract': <bento_meta.objects.Term object at 0x7f308bfb6ba0>, 'Urogenital Tract': <bento_meta.objects.Term object at 0x7f308bfb6a80>}
True
print(primary_disease_site.values[20])
print(len(primary_disease_site.values))
print(primary_disease_site.values == list(primary_disease_site.terms.keys()))
Humerus
48
True
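One possible use of the list of values is checking incoming data against a property's permissible values. The sketch below assumes a short stand-in list in place of primary_disease_site.values, and the helper name invalid_values is hypothetical. The comparison is exact and case-sensitive, which may or may not match how a given project wants to treat its data.

```python
# Stand-in for a few entries of primary_disease_site.values.
permissible = ["Abdomen", "Bladder", "Bone", "Humerus", "Unknown"]


def invalid_values(candidates, permissible):
    """Return the candidates that are not in the permissible list.

    Order is preserved and matching is exact (case-sensitive).
    """
    allowed = set(permissible)
    return [value for value in candidates if value not in allowed]


records = ["Humerus", "humerus", "Bone", "Tail"]
print(invalid_values(records, permissible))
# ['humerus', 'Tail']
```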
Properties via Parent
Model properties can also be accessed via their parent node|edge’s props attribute, which is a dictionary of properties.
diagnosis_props = nodes["diagnosis"].props
len(diagnosis_props)
15
list(diagnosis_props.keys())[:3]
['crdc_id', 'diagnosis_record_id', 'disease_term']
list(diagnosis_props.values())[:3]
[<bento_meta.objects.Property at 0x7f308bfe8d10>,
<bento_meta.objects.Property at 0x7f308b988080>,
<bento_meta.objects.Property at 0x7f308b9885f0>]
Properties accessed via their parents are the same Property objects found in Model.props.
diagnosis_props["primary_disease_site"] is props[("diagnosis", "primary_disease_site")]
True
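The relationship between the two views can be pictured as a regrouping of the (parent handle, property handle) keys. The sketch below uses a small stand-in dictionary with strings in place of Property objects and a hypothetical helper name. It only illustrates the key layout and makes no claim about how bento-meta builds either dictionary.

```python
from collections import defaultdict

# Stand-in for Model.props: keys are (parent handle, property handle).
props = {
    ("program", "program_name"): "prop-1",
    ("program", "program_acronym"): "prop-2",
    ("diagnosis", "primary_disease_site"): "prop-3",
}


def props_by_parent(props):
    """Regroup a flat {(parent, handle): prop} dict by parent handle."""
    grouped = defaultdict(dict)
    for (parent, handle), prop in props.items():
        grouped[parent][handle] = prop  # same object, not a copy
    return dict(grouped)


grouped = props_by_parent(props)
print(list(grouped["program"]))
# ['program_name', 'program_acronym']
```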
Terms
Model terms are stored in Model.terms as a dictionary of bento-meta Term objects. Terms are used to relate string descriptors in the model, such as permissible values in a property’s value set, or semantic concepts from other frameworks that can describe an entity in the model via annotation (e.g. a caDSR Common Data Element/CDE annotating a model property).
The keys in Model.terms are tuples that begin with (term.handle, term.origin), and the values are bento-meta Term objects. In the output below, the keys carry two further elements after the origin.
terms = mdf.model.terms
len(terms)
588
list(terms.keys())[:3]
[('program_name_text', 'caDSR', '11444542', '1.00'),
('program_short_name_text', 'caDSR', '11459801', '1.0'),
('clinical_study_identifier', 'caDSR', '5054234', '1')]
list(terms.values())[:3]
[<bento_meta.objects.Term at 0x7f308bfc7a40>,
<bento_meta.objects.Term at 0x7f308bfc7b90>,
<bento_meta.objects.Term at 0x7f308bfe88f0>]
shoulder = terms[("Shoulder", "ICDC")]
shoulder.get_attr_dict()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[33], line 1
----> 1 shoulder = terms[("Shoulder", "ICDC")]
2 shoulder.get_attr_dict()
KeyError: ('Shoulder', 'ICDC')
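The keys listed above have four elements, whereas the failing lookup passes a two-element tuple. Assuming that mismatch is what causes the KeyError, one way to locate the right key is to search by the leading elements instead of guessing the full tuple. The sketch below uses a stand-in dictionary built from the three keys shown above, with strings in place of Term objects; find_term_keys is a hypothetical helper, not a bento-meta function.

```python
# Stand-in for Model.terms, using the first three keys shown above.
terms = {
    ("program_name_text", "caDSR", "11444542", "1.00"): "term-1",
    ("program_short_name_text", "caDSR", "11459801", "1.0"): "term-2",
    ("clinical_study_identifier", "caDSR", "5054234", "1"): "term-3",
}


def find_term_keys(terms, handle, origin=None):
    """Return every key whose first element equals handle.

    If origin is given, the second element of the key must match it too.
    Works regardless of how many elements each key tuple has.
    """
    matches = []
    for key in terms:
        if key[0] != handle:
            continue
        if origin is not None and (len(key) < 2 or key[1] != origin):
            continue
        matches.append(key)
    return matches


print(find_term_keys(terms, "program_name_text", "caDSR"))
# [('program_name_text', 'caDSR', '11444542', '1.00')]
```

Any key returned this way can then be used to index the dictionary directly.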
Terms via ValueSet
Terms that are part of a value set can also be accessed via the owner of that value set. This is the same object found in Model.terms.
primary_disease_site.terms["Shoulder"] is shoulder
True
Term Annotations
Terms are also used to annotate model entities with semantic representations from some other framework. For example, a Term from caDSR may be used to annotate a model property with a semantically equivalent CDE. In the MDF, these annotations are provided under the Term key for a given entity.
mdf_dir = Path.cwd().parent / "tests" / "samples"
model_with_terms = mdf_dir / "test-model-with-terms-a.yml"
# TIP: the model 'Handle' key is in the YAML file, so we don't need to provide one to MDFReader()
terms_mdf = MDFReader(model_with_terms)
terms_mdf.model
100%|███████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 8665.92it/s]
<bento_meta.model.Model at 0x105772c30>
Terms can annotate nodes, relationships, and properties. The annotating term(s) are linked to the annotated entity via a bento-meta Concept, which stores them in a dictionary of the same format found at Model.terms (i.e. {(term.value, term.origin_name): Term}).
case_concept = terms_mdf.model.nodes["case"].concept
case_concept
<bento_meta.objects.Concept at 0x106153140>
case_concept.terms
{('case_term', 'CTDC'): <bento_meta.objects.Term object at 0x106153020>, ('subject', 'caDSR'): <bento_meta.objects.Term object at 0x1061507d0>}
# TIP: to find an annotating CDE, we can look for entries where the origin is 'caDSR'
for term_key, term in case_concept.terms.items():
    if term_key[1] == "caDSR":
        print(term.get_attr_dict())
{'handle': 'subject', 'value': 'subject', 'origin_name': 'caDSR'}
terms_mdf.model.edges[("of_case", "sample", "case")].concept.terms
{('of_case_term', 'CTDC'): <bento_meta.objects.Term object at 0x106153c20>}
terms_mdf.model.props[("case", "case_id")].concept.terms
{('case_id', 'CTDC'): <bento_meta.objects.Term object at 0x106178fe0>}
# TIP: terms found in Model.terms are the same objects as those in an entity's concept
case_id_anno = terms_mdf.model.props[("case", "case_id")].concept.terms[("case_id", "CTDC")]
terms_mdf.model.terms[("case_id", "CTDC")] is case_id_anno
True
Model Diff
bento-mdf also provides the diff_models function, which can be used to compare two models and report on the differences between them. This is useful for comparing models that have been updated or modified over time.
diff_models() has two required arguments, both of which are bento_meta.Model objects:
mdl_a: The first model to compare.
mdl_b: The second model to compare.
The function returns a dict with keys for nodes, edges, props, and terms, each with a dictionary with keys:
"added": found in mdl_b but not in mdl_a
"removed": found in mdl_a but not in mdl_b
"changed": found in both models but with altered attributes
Writing MDF from the Model
Schema-valid MDF may be produced from a bento-meta Model using the MDFWriter class. This can be useful if you wish to make changes to the Model within Python using the update methods of that interface, and then write out the updated model in MDF format for sharing.
Consider a simple data model in MDF format:
# sample-model.yml
Handle: test
Version: 0.01
Nodes:
  sample:
    Props:
      - sample_type
      - amount
Relationships:
  is_subsample_of:
    Mul: many_to_one
    Ends:
      - Src: sample
        Dst: sample
    Props: null
PropDefinitions:
  sample_type:
    Enum:
      - normal
      - tumor
  amount:
    Type:
      units:
        - mg
      value_type: number
Suppose we want to add a property from the ICDC model to this simple model, and write out a new MDF. We add the property to the model, then we can create an MDFWriter instance from the MDFReader instance. Then the mdf attribute of the writer will contain a dict that can be written as YAML.
import yaml
from bento_mdf import MDFReader, MDFWriter
smodel = MDFReader("./sample-model.yml")
new_prop = mdf.model.props[('sample', 'tumor_sample_origin')]
smodel.model.add_prop( smodel.model.nodes['sample'], new_prop )
print(yaml.dump(MDFWriter(smodel).mdf, indent=4))
Handle: test
Nodes:
    sample:
        Props:
        - amount
        - sample_type
        - tumor_sample_origin
PropDefinitions:
    amount:
        Key: false
        Nul: false
        Req: false
        Strict: true
        Type:
            units:
            - mg
            value_type: number
    sample_type:
        Enum:
        - normal
        - tumor
        Key: false
        Nul: false
        Req: false
        Strict: true
    tumor_sample_origin:
        Desc: An indication as to whether a tumor sample was derived from a primary
            versus a metastatic tumor.
        Enum:
        - Primary
        - Metastatic
        - Not Applicable
        - Unknown
        Key: false
        Nul: false
        Req: 'Yes'
        Strict: true
        Tags:
            Labeled: Tumor Sample Origin
Relationships:
    is_subsample_of:
        Ends:
        -   Dst: sample
            Props: null
            Src: sample
        Mul: many_to_one
        Props: null
Terms:
    normal:
        Origin: test
        Value: normal
    tumor:
        Origin: test
        Value: tumor
URI: null
Version: 0.01
Note that the new property tumor_sample_origin appears in the new MDF.
Validating the Model
As the MDFReader class loads the model, it automatically validates it against the MDF schema and will raise an exception if the model is invalid. This will use the default schema unless one is provided via the MDFReader class’s mdf_schema argument.
bento-mdf also provides the MDFValidator class, which can be used to validate a model against the MDF schema directly.
from bento_mdf.validator import MDFValidator
validator = MDFValidator(
    None,
    *[ctdc_model, ctdc_props],
    raise_error=True,
)
validator
<bento_mdf.validator.MDFValidator at 0x106186bd0>
validator.load_and_validate_schema(); # load and check that JSON schema is valid
validator.load_and_validate_yaml().as_dict(); # load and check YAML is valid
validator.validate_instance_with_schema(); # check YAML against the schema
If the schema or yaml instances (from MDF files) are invalid, the validation will fail.
from jsonschema import SchemaError, ValidationError
from yaml.parser import ParserError
from IPython.display import clear_output
Schema is invalid
bad_schema = mdf_dir / "mdf-bad-schema.yaml"
try:
    MDFValidator(bad_schema, raise_error=True).load_and_validate_schema()
except SchemaError as e:
    clear_output()
    print(e)
'crobject' is not valid under any of the given schemas
Failed validating 'anyOf' in metaschema['properties']['properties']['additionalProperties']['properties']['type']:
{'anyOf': [{'$ref': '#/definitions/simpleTypes'},
{'type': 'array',
'items': {'$ref': '#/definitions/simpleTypes'},
'minItems': 1,
'uniqueItems': True}]}
On schema['properties']['UniversalNodeProperties']['type']:
'crobject'
YAML structure is invalid
bad_yaml = mdf_dir / "ctdc_model_bad.yaml"
try:
    MDFValidator(None, bad_yaml, raise_error=True).load_and_validate_yaml()
except ParserError as e:
    clear_output()
    print(e)
while parsing a block mapping
in "/Users/jensenma/Code/bento-mdf/python/tests/samples/ctdc_model_bad.yaml", line 1, column 1
expected <block end>, but found '<block mapping start>'
in "/Users/jensenma/Code/bento-mdf/python/tests/samples/ctdc_model_bad.yaml", line 3, column 3
MDF YAMLs are invalid against the MDF schema
test_schema = mdf_dir / "mdf-schema.yaml"
ctdc_bad = mdf_dir / "ctdc_model_file_invalid.yaml"
try:
    v = MDFValidator(
        test_schema,
        *[ctdc_bad, ctdc_props],
        raise_error=True,
    )
    v.load_and_validate_schema()
    v.load_and_validate_yaml()
    v.validate_instance_with_schema()
except ValidationError as e:
    clear_output()
    print(e)
'case.show_node' does not match '^[A-Za-z_][A-Za-z0-9_]*$'
Failed validating 'pattern' in schema['properties']['PropDefinitions']['propertyNames']:
{'$id': '#snake_case_id',
'type': 'string',
'pattern': '^[A-Za-z_][A-Za-z0-9_]*$'}
On instance['PropDefinitions']:
'case.show_node'
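The failure above comes down to a property name that does not match the identifier pattern quoted in the error message. As a quick pre-check before running full validation, that single rule can be applied with the standard library. The sketch below covers only this one pattern, copied from the message above; it is not a substitute for MDFValidator, and invalid_names and the sample dictionary are made up for this example.

```python
import re

# Pattern copied from the validation error message above.
SNAKE_CASE_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def invalid_names(names):
    """Return the names that do not match the identifier pattern."""
    return [name for name in names if not SNAKE_CASE_ID.match(name)]


# Stand-in for the keys of a PropDefinitions section.
prop_definitions = {"case_id": {}, "case.show_node": {}, "_internal": {}}
print(invalid_names(prop_definitions))
# ['case.show_node']
```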
from bento_mdf.diff import diff_models
old_model = mdf_dir / "test-model-d.yml"
new_model = mdf_dir / "test-model-e.yml"
old_mdf = MDFReader(old_model, handle="TEST")
new_mdf = MDFReader(new_model, handle="TEST")
diff_models(mdl_a=old_mdf.model, mdl_b=new_mdf.model)
{'nodes': {'changed': {'diagnosis': {'props': {'removed': {'fatal': <bento_meta.objects.Property at 0x1061d73b0>},
'added': None}}},
'removed': None,
'added': {'outcome': <bento_meta.objects.Node at 0x10619cad0>}},
'edges': {'removed': None,
'added': {('end_result',
'diagnosis',
'outcome'): <bento_meta.objects.Edge at 0x10619ffe0>}},
'props': {'removed': {('diagnosis',
'fatal'): <bento_meta.objects.Property at 0x1061d73b0>},
'added': {('outcome',
'fatal'): <bento_meta.objects.Property at 0x1061e0c80>}}}
diff_models has two optional arguments:
objects_as_dicts: if True, the output will convert bento-meta Entity objects like Node or Edge to dictionaries with get_attr_dict()
include_summary: if True, the output will include a formatted string summary of the differences between the two models. This can be useful for GitHub changelogs when a model is updated, for example.
diff = diff_models(
    old_mdf.model,
    new_mdf.model,
    objects_as_dicts=True,
    include_summary=True,
)
diff["nodes"]["changed"]
{'diagnosis': {'props': {'removed': {'fatal': {'handle': 'fatal',
'model': 'TEST',
'value_domain': 'value_set',
'is_required': 'False',
'is_key': 'False',
'is_nullable': 'False',
'is_strict': 'True'}},
'added': None}}}
print(diff["summary"], sep="\n")
1 node(s) added; 1 edge(s) added; 1 prop(s) removed; 1 prop(s) added; 1 attribute(s) changed for 1 node(s)
- Added node: 'outcome'
- Added edge: 'end_result' with src: 'diagnosis' and dst: 'outcome'
- Removed prop: 'fatal' with parent: 'diagnosis'
- Added prop: 'fatal' with parent: 'outcome'
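To make the added/removed/changed categories concrete, here is a minimal sketch of the same idea over plain dictionaries. It follows the example above, where the old model is passed first and entities present only in the new model are reported as added. This is not bento-mdf's implementation: diff_mappings is a hypothetical helper, the stand-in data maps node handles to lists of property handles, and the shape of the "changed" entries is an assumption chosen for brevity.

```python
def diff_mappings(old, new):
    """Compare two dictionaries of entities (hypothetical helper).

    Returns 'removed' (only in old), 'added' (only in new) and 'changed'
    (in both, but with unequal values). Empty categories are reported as
    None, mirroring the example output above.
    """
    removed = {k: v for k, v in old.items() if k not in new}
    added = {k: v for k, v in new.items() if k not in old}
    changed = {
        k: {"old": old[k], "new": new[k]}
        for k in old
        if k in new and old[k] != new[k]
    }
    return {
        "removed": removed or None,
        "added": added or None,
        "changed": changed or None,
    }


# Stand-in node tables: node handle -> list of property handles.
old_nodes = {"case": ["case_id"], "diagnosis": ["disease", "fatal"]}
new_nodes = {
    "case": ["case_id"],
    "diagnosis": ["disease"],
    "outcome": ["fatal"],
}

result = diff_mappings(old_nodes, new_nodes)
print(result["added"])            # {'outcome': ['fatal']}
print(result["removed"])          # None
print(sorted(result["changed"]))  # ['diagnosis']
```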