Example Usage
To use bento-mdf in a project, start by installing the latest version with pip install bento-mdf and importing it into your project.
import bento_mdf
from pathlib import Path # for file paths
from importlib.metadata import version # check package version
version("bento_mdf")
'0.13.1'
Loading the Model from MDF(s)
The bento-mdf package provides functionality for loading, validating, and manipulating MDF file content in Python.
The MDFReader class parses and validates MDF files, creating a bento-meta Model interface with convenient features, demonstrated below. An MDFReader is initialized with the relevant MDF file(s), filepath(s), or URL pointing to these.
from bento_mdf import MDFReader
Loading from File(s)
First, we specify the paths to the MDF files we want to load. Then, we provide these to the MDFReader class to initialize the model. This loads the content of these files into their corresponding bento-meta Python object representations, which we can access via the Model object found at MDFReader.model.
(Note: if a top-level model Handle is not present in the MDFs, it needs to be provided to the MDFReader class’s handle argument.)
import logging
logging.basicConfig(filename='mdf.log')
mdf_dir = Path.cwd().parent / "tests" / "samples"
ctdc_model = mdf_dir / "ctdc_model_file.yaml"
ctdc_props = mdf_dir / "ctdc_model_properties_file.yaml"
mdf_from_file = MDFReader(ctdc_model, ctdc_props, handle="CTDC")
mdf_from_file.model
<bento_meta.model.Model at 0x7fcdb872ef60>
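When a model is split across several files, collecting the paths programmatically can be less error-prone than listing them by hand. The following is a minimal sketch only: find_mdf_files and load_model_from_dir are hypothetical helpers, not part of bento-mdf, and the sketch assumes that every YAML file in the directory belongs to the same model and that alphabetical order is acceptable (the order in which files are passed may matter).

```python
from pathlib import Path


def find_mdf_files(directory, suffixes=(".yaml", ".yml")):
    """Return the YAML files directly inside `directory`, sorted by path."""
    directory = Path(directory)
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix in suffixes
    )


def load_model_from_dir(directory, handle):
    """Load every MDF file found in `directory` into one bento-meta Model."""
    # Imported lazily so find_mdf_files() works without bento-mdf installed.
    from bento_mdf import MDFReader

    paths = find_mdf_files(directory)
    if not paths:
        raise FileNotFoundError(f"no MDF files found in {directory}")
    # Same call shape as above: file paths as positional arguments, plus a handle.
    return MDFReader(*paths, handle=handle).model
```

A call such as load_model_from_dir("path/to/mdf", handle="CTDC") would then return the Model; the directory name here is a placeholder.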
Loading from URL(s)
Similarly, we can instantiate an MDFReader from URL(s) pointing to the model file(s):
model_url = "https://cbiit.github.io/icdc-model-tool/model-desc/icdc-model.yml"
props_url = "https://cbiit.github.io/icdc-model-tool/model-desc/icdc-model-props.yml"
mdf = MDFReader(model_url, props_url, handle="ICDC")
mdf.model
<bento_meta.model.Model at 0x7fcdb8b4ede0>
Setting the parameter raise_error to True in the MDFReader call will raise a RuntimeError if any MDF issues are found. In any case, all issues found will appear in the log.
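One way to use raise_error in a script is to catch the RuntimeError and carry on without a model. This is a sketch under assumptions: load_or_none is a hypothetical helper, and the reader class is passed in as an argument (expected to be MDFReader) so the helper can also be tried with a stand-in class.

```python
import logging

logger = logging.getLogger(__name__)


def load_or_none(reader_cls, *sources, handle=None):
    """Build a reader with raise_error=True; return its model, or None on failure.

    `reader_cls` is expected to be bento_mdf.MDFReader.
    """
    try:
        reader = reader_cls(*sources, handle=handle, raise_error=True)
    except RuntimeError as err:
        # The issues themselves are written to the log by the reader; this
        # line only records which sources were rejected.
        logger.error("MDF problems found in %s: %s", sources, err)
        return None
    return reader.model
```

With the URLs defined above, this could be called as load_or_none(MDFReader, model_url, props_url, handle="ICDC").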
Exploring the Model
Once we’ve loaded the model, we can start looking at the entities that make it up, including Nodes, Relationships, Properties, and Terms. These are conveniently stored in the bento-meta Model object.
Note: This example will use the model created in the previous section from a URL.
Nodes
Model nodes are stored as dictionaries in Model.nodes, where the keys are node handles and the values are bento-meta Node objects.
nodes = mdf.model.nodes
len(nodes)
26
list(nodes.keys())[:3]
['program', 'study', 'consent_group']
list(nodes.values())[:3]
[<bento_meta.objects.Node at 0x7fcdb8b4ef00>,
<bento_meta.objects.Node at 0x7fcde09d2c30>,
<bento_meta.objects.Node at 0x7fcdb81bf920>]
nodes["study"]
<bento_meta.objects.Node at 0x7fcde09d2c30>
The get_attr_dict() method is a convenient way to get a dictionary of a bento-meta Entity's set attributes. This will return string versions of the attributes. This can be useful for exploring the entity or for providing parameters to Neo4j Cypher queries.
Note: this only includes simple attributes and not other bento-meta Entities or collections of Entities. All attributes can be accessed via methods matching their names.
nodes["diagnosis"].get_attr_dict()
{'handle': 'diagnosis',
'model': 'ICDC',
'desc': 'The Diagnosis node contains numerous properties which fully characterize the type of cancer with which any given patient/subject/donor was diagnosed, inclusive of stage. This node also contains properties pertaining to comorbidities, and the availability of pathology reports, treatment data and follow-up data.'}
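To illustrate the Cypher use case mentioned above, the sketch below turns an attribute dictionary into a parameterized query. It makes assumptions that are not from bento-mdf: match_query_from_attrs is a hypothetical helper, and the node label "node" and the choice of handle and model as matching keys describe an imagined graph schema. The dictionary is a shortened stand-in for the output of get_attr_dict().

```python
def match_query_from_attrs(label, attrs, key_fields=("handle", "model")):
    """Build a parameterized Cypher MATCH from an attribute dictionary.

    Only the `key_fields` present in `attrs` go into the pattern. Values are
    returned separately as parameters instead of being pasted into the query.
    """
    keys = [k for k in key_fields if k in attrs]
    pattern = ", ".join(f"{k}: ${k}" for k in keys)
    query = f"MATCH (n:{label} {{{pattern}}}) RETURN n"
    params = {k: attrs[k] for k in keys}
    return query, params


# Stand-in for nodes["diagnosis"].get_attr_dict(), with the description shortened.
attrs = {"handle": "diagnosis", "model": "ICDC", "desc": "..."}
query, params = match_query_from_attrs("node", attrs)  # "node" label is assumed
print(query)
print(params)
```

Keeping the values in a separate parameter dictionary avoids quoting problems with long descriptions like the one shown above.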
Relationships
Similarly, Model relationships are stored in Model.edges. This is a dictionary where the keys are (edge.handle, src.handle, dst.handle) tuples. The values are Edge objects.
edges = mdf.model.edges
len(edges)
40
list(edges.keys())[:3]
[('member_of', 'case', 'cohort'),
('member_of', 'cohort', 'study_arm'),
('member_of', 'study_arm', 'study')]
list(edges.values())[:3]
[<bento_meta.objects.Edge at 0x7fcdb81e7050>,
<bento_meta.objects.Edge at 0x7fcdb81e7110>,
<bento_meta.objects.Edge at 0x7fcdb81e70e0>]
edges[("of_case", "diagnosis", "case")].get_attr_dict()
{'handle': 'of_case', 'model': 'ICDC', 'multiplicity': 'many_to_one'}
edge = edges[("of_case", "diagnosis", "case")]
print(edge.handle, edge.src.handle, edge.dst.handle, sep=", ")
# TIP: here's a convenient method to get the 3-tuple of an edge
print(edge.triplet)
of_case, diagnosis, case
('of_case', 'diagnosis', 'case')
An Edge's src and dst attributes are Node objects:
print(edge.src)
print(edge.src.handle)
<bento_meta.objects.Node object at 0x7fcdb8175be0>
diagnosis
The Model object also has some useful methods to work with relationships/edges including:
- edges_by_src(node): get all edges that have a given node as their src attribute
- edges_by_dst(node): get all edges that have a given node as their dst attribute
- edges_by_type(edge_handle): get all edges that have a given edge type (i.e., handle)
[e.triplet for e in mdf.model.edges_by_dst(mdf.model.nodes["case"])]
[('of_case', 'enrollment', 'case'),
('of_case', 'demographic', 'case'),
('of_case', 'diagnosis', 'case'),
('of_case', 'cycle', 'case'),
('of_case', 'sample', 'case'),
('of_case', 'file', 'case'),
('of_case', 'visit', 'case'),
('of_case', 'adverse_event', 'case'),
('of_case', 'registration', 'case')]
[e.triplet for e in mdf.model.edges_by_type("of_study")]
[('of_study', 'human_relevance', 'study'),
('of_study', 'study_site', 'study'),
('of_study', 'principal_investigator', 'study'),
('of_study', 'file', 'study'),
('of_study', 'publication', 'study')]
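Because the keys of Model.edges are plain 3-tuples, simple graph views can also be built from the keys alone. The sketch below is illustrative: neighbors_by_source is a hypothetical helper, and the list of triplets is a small stand-in copied from the output above rather than a live model.

```python
from collections import defaultdict


def neighbors_by_source(edge_keys):
    """Map each source node handle to the set of (edge handle, destination handle)."""
    outgoing = defaultdict(set)
    for edge_handle, src, dst in edge_keys:
        outgoing[src].add((edge_handle, dst))
    return dict(outgoing)


# Stand-in for mdf.model.edges.keys()
edge_keys = [
    ("member_of", "case", "cohort"),
    ("member_of", "cohort", "study_arm"),
    ("member_of", "study_arm", "study"),
    ("of_case", "diagnosis", "case"),
]
graph = neighbors_by_source(edge_keys)
print(graph["case"])
```

With a loaded model, the same function could be called as neighbors_by_source(mdf.model.edges.keys()).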
Properties
Model properties are stored in Model.props. This is a dictionary where the keys are ({edge|node}.handle, prop.handle) tuples. The values are Property objects.
props = mdf.model.props
len(props)
230
list(props.keys())[:3]
[('program', 'program_name'),
('program', 'program_acronym'),
('program', 'program_short_description')]
list(props.values())[:3]
[<bento_meta.objects.Property at 0x7fcdb8003290>,
<bento_meta.objects.Property at 0x7fcdb80034a0>,
<bento_meta.objects.Property at 0x7fcdb8003d40>]
primary_disease_site = props[("diagnosis", "primary_disease_site")]
primary_disease_site.get_attr_dict()
{'handle': 'primary_disease_site',
'model': 'ICDC',
'value_domain': 'value_set',
'is_required': 'Yes',
'is_key': 'False',
'is_nullable': 'False',
'is_strict': 'True',
'desc': 'The anatomical location at which the primary disease originated, recorded in relatively general terms at the subject level; the anatomical locations from which tumor samples subject to downstream analysis were acquired is recorded in more detailed terms at the sample level.'}
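Since get_attr_dict() returns strings, flags such as is_required arrive as 'Yes' or 'False' rather than booleans. A small normalizing helper can make them easier to test. This is a sketch: flag_to_bool is hypothetical, and only 'Yes', 'True' and 'False' appear in the output above; treating 'No' as false is an assumption.

```python
TRUE_STRINGS = {"true", "yes"}
FALSE_STRINGS = {"false", "no"}  # "no" is assumed; it does not appear above


def flag_to_bool(value):
    """Convert a string flag such as 'Yes' or 'False' to a bool.

    Returns None when the string is not recognized, so callers can decide
    how to treat unexpected values.
    """
    text = str(value).strip().lower()
    if text in TRUE_STRINGS:
        return True
    if text in FALSE_STRINGS:
        return False
    return None


# Stand-in for primary_disease_site.get_attr_dict(), flags only
attrs = {
    "handle": "primary_disease_site",
    "is_required": "Yes",
    "is_key": "False",
    "is_nullable": "False",
    "is_strict": "True",
}
flags = {k: flag_to_bool(v) for k, v in attrs.items() if k.startswith("is_")}
print(flags)
```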
Properties with Value Sets
Properties with the value_domain “value_set” have the value_set attribute (bento-meta ValueSet), which has a terms attribute (bento-meta Term dictionary like {term.value: Term}).
primary_disease_site.value_set
<bento_meta.objects.ValueSet at 0x7fcdb8022840>
primary_disease_site.value_set.terms
{'Abdomen': <bento_meta.objects.Term object at 0x7fcdb8020aa0>, 'Bladder': <bento_meta.objects.Term object at 0x7fcdb80229c0>, 'Bladder, Prostate': <bento_meta.objects.Term object at 0x7fcdb8022ba0>, 'Bladder, Urethra': <bento_meta.objects.Term object at 0x7fcdb8022b70>, 'Bladder, Urethra, Prostate': <bento_meta.objects.Term object at 0x7fcdb8022630>, 'Bladder, Urethra, Vagina': <bento_meta.objects.Term object at 0x7fcdb8022c60>, 'Bone': <bento_meta.objects.Term object at 0x7fcdb8022510>, 'Bone (Appendicular)': <bento_meta.objects.Term object at 0x7fcdb8023890>, 'Bone (Axial)': <bento_meta.objects.Term object at 0x7fcdb8022ff0>, 'Bone Marrow': <bento_meta.objects.Term object at 0x7fcdb8022d80>, 'Brain': <bento_meta.objects.Term object at 0x7fcdb8023230>, 'Carpus': <bento_meta.objects.Term object at 0x7fcdb8022de0>, 'Chest Wall': <bento_meta.objects.Term object at 0x7fcdb8023050>, 'Cranial Sternum': <bento_meta.objects.Term object at 0x7fcdb80233b0>, 'Distal Urethra': <bento_meta.objects.Term object at 0x7fcdb80234a0>, 'Elbow Joint': <bento_meta.objects.Term object at 0x7fcdb8023620>, 'Femur': <bento_meta.objects.Term object at 0x7fcdb80233e0>, 'Flank': <bento_meta.objects.Term object at 0x7fcdb80239b0>, 'Hip': <bento_meta.objects.Term object at 0x7fcdb8023b00>, 'Hock': <bento_meta.objects.Term object at 0x7fcdb80236b0>, 'Humerus': <bento_meta.objects.Term object at 0x7fcdb8023b30>, 'Inguinal Region': <bento_meta.objects.Term object at 0x7fcdb8023b60>, 'Kidney': <bento_meta.objects.Term object at 0x7fcdb8023bf0>, 'Knee Region': <bento_meta.objects.Term object at 0x7fcdb8023c80>, 'Lip': <bento_meta.objects.Term object at 0x7fcdb8023e60>, 'Lung': <bento_meta.objects.Term object at 0x7fcdb8023e90>, 'Lymph Node': <bento_meta.objects.Term object at 0x7fcdb8023fe0>, 'Mammary Gland': <bento_meta.objects.Term object at 0x7fcdb8023e00>, 'Mandible': <bento_meta.objects.Term object at 0x7fcdb8023f20>, 'Maxilla': <bento_meta.objects.Term object at 0x7fcdb8023dd0>, 'Mouth': 
<bento_meta.objects.Term object at 0x7fcdb8023350>, 'Neck': <bento_meta.objects.Term object at 0x7fcdb8022f60>, 'Not Applicable': <bento_meta.objects.Term object at 0x7fcdb8020530>, 'Pleural Cavity': <bento_meta.objects.Term object at 0x7fcdb8155730>, 'Rib Region': <bento_meta.objects.Term object at 0x7fcdb8154fe0>, 'Shoulder': <bento_meta.objects.Term object at 0x7fcdb81d64e0>, 'Skin': <bento_meta.objects.Term object at 0x7fcdb81d51f0>, 'Spleen': <bento_meta.objects.Term object at 0x7fcdb81d55b0>, 'Subcutis': <bento_meta.objects.Term object at 0x7fcdb81d5730>, 'Tarsus': <bento_meta.objects.Term object at 0x7fcdb81d5100>, 'Thigh': <bento_meta.objects.Term object at 0x7fcdb81d5790>, 'Thorax': <bento_meta.objects.Term object at 0x7fcdb81d5640>, 'Thyroid Gland': <bento_meta.objects.Term object at 0x7fcdb81d5ca0>, 'Unknown': <bento_meta.objects.Term object at 0x7fcdb81d5880>, 'Urethra': <bento_meta.objects.Term object at 0x7fcdb81d6f00>, 'Urethra, Prostate': <bento_meta.objects.Term object at 0x7fcdb81d5c10>, 'Urinary Tract': <bento_meta.objects.Term object at 0x7fcdb81d5dc0>, 'Urogenital Tract': <bento_meta.objects.Term object at 0x7fcdb81d5f10>}
Property objects with value sets have some useful methods to get to those terms and their values including:
- .terms returns the Term objects from the property’s value set (the same dictionary found at the ValueSet's terms attribute)
- .values returns a list of the term values from the property’s value set
print(primary_disease_site.terms)
# TIP: this is the same object found at the ValueSet's `terms` attribute
print(primary_disease_site.terms is primary_disease_site.value_set.terms)
{'Abdomen': <bento_meta.objects.Term object at 0x7fcdb8020aa0>, 'Bladder': <bento_meta.objects.Term object at 0x7fcdb80229c0>, 'Bladder, Prostate': <bento_meta.objects.Term object at 0x7fcdb8022ba0>, 'Bladder, Urethra': <bento_meta.objects.Term object at 0x7fcdb8022b70>, 'Bladder, Urethra, Prostate': <bento_meta.objects.Term object at 0x7fcdb8022630>, 'Bladder, Urethra, Vagina': <bento_meta.objects.Term object at 0x7fcdb8022c60>, 'Bone': <bento_meta.objects.Term object at 0x7fcdb8022510>, 'Bone (Appendicular)': <bento_meta.objects.Term object at 0x7fcdb8023890>, 'Bone (Axial)': <bento_meta.objects.Term object at 0x7fcdb8022ff0>, 'Bone Marrow': <bento_meta.objects.Term object at 0x7fcdb8022d80>, 'Brain': <bento_meta.objects.Term object at 0x7fcdb8023230>, 'Carpus': <bento_meta.objects.Term object at 0x7fcdb8022de0>, 'Chest Wall': <bento_meta.objects.Term object at 0x7fcdb8023050>, 'Cranial Sternum': <bento_meta.objects.Term object at 0x7fcdb80233b0>, 'Distal Urethra': <bento_meta.objects.Term object at 0x7fcdb80234a0>, 'Elbow Joint': <bento_meta.objects.Term object at 0x7fcdb8023620>, 'Femur': <bento_meta.objects.Term object at 0x7fcdb80233e0>, 'Flank': <bento_meta.objects.Term object at 0x7fcdb80239b0>, 'Hip': <bento_meta.objects.Term object at 0x7fcdb8023b00>, 'Hock': <bento_meta.objects.Term object at 0x7fcdb80236b0>, 'Humerus': <bento_meta.objects.Term object at 0x7fcdb8023b30>, 'Inguinal Region': <bento_meta.objects.Term object at 0x7fcdb8023b60>, 'Kidney': <bento_meta.objects.Term object at 0x7fcdb8023bf0>, 'Knee Region': <bento_meta.objects.Term object at 0x7fcdb8023c80>, 'Lip': <bento_meta.objects.Term object at 0x7fcdb8023e60>, 'Lung': <bento_meta.objects.Term object at 0x7fcdb8023e90>, 'Lymph Node': <bento_meta.objects.Term object at 0x7fcdb8023fe0>, 'Mammary Gland': <bento_meta.objects.Term object at 0x7fcdb8023e00>, 'Mandible': <bento_meta.objects.Term object at 0x7fcdb8023f20>, 'Maxilla': <bento_meta.objects.Term object at 0x7fcdb8023dd0>, 'Mouth': 
<bento_meta.objects.Term object at 0x7fcdb8023350>, 'Neck': <bento_meta.objects.Term object at 0x7fcdb8022f60>, 'Not Applicable': <bento_meta.objects.Term object at 0x7fcdb8020530>, 'Pleural Cavity': <bento_meta.objects.Term object at 0x7fcdb8155730>, 'Rib Region': <bento_meta.objects.Term object at 0x7fcdb8154fe0>, 'Shoulder': <bento_meta.objects.Term object at 0x7fcdb81d64e0>, 'Skin': <bento_meta.objects.Term object at 0x7fcdb81d51f0>, 'Spleen': <bento_meta.objects.Term object at 0x7fcdb81d55b0>, 'Subcutis': <bento_meta.objects.Term object at 0x7fcdb81d5730>, 'Tarsus': <bento_meta.objects.Term object at 0x7fcdb81d5100>, 'Thigh': <bento_meta.objects.Term object at 0x7fcdb81d5790>, 'Thorax': <bento_meta.objects.Term object at 0x7fcdb81d5640>, 'Thyroid Gland': <bento_meta.objects.Term object at 0x7fcdb81d5ca0>, 'Unknown': <bento_meta.objects.Term object at 0x7fcdb81d5880>, 'Urethra': <bento_meta.objects.Term object at 0x7fcdb81d6f00>, 'Urethra, Prostate': <bento_meta.objects.Term object at 0x7fcdb81d5c10>, 'Urinary Tract': <bento_meta.objects.Term object at 0x7fcdb81d5dc0>, 'Urogenital Tract': <bento_meta.objects.Term object at 0x7fcdb81d5f10>}
True
print(primary_disease_site.values[20])
print(len(primary_disease_site.values))
print(primary_disease_site.values == list(primary_disease_site.terms.keys()))
Humerus
48
True
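A typical use of the values list is checking incoming data against the permissible values. The sketch below is not part of bento-mdf: check_value is a hypothetical helper that uses the standard library's difflib to suggest near matches, and the permitted list is a short stand-in for primary_disease_site.values.

```python
from difflib import get_close_matches


def check_value(value, permitted, max_suggestions=3):
    """Return (is_valid, suggestions) for `value` against permitted values."""
    if value in permitted:
        return True, []
    # Offer close spellings to help diagnose typos in submitted data.
    return False, get_close_matches(value, permitted, n=max_suggestions)


# Stand-in for primary_disease_site.values (a few entries only)
permitted = ["Bladder", "Bone", "Bone Marrow", "Brain", "Humerus", "Shoulder"]

print(check_value("Brain", permitted))
print(check_value("Brian", permitted))
```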
Properties via Parent
Model properties can also be accessed via their parent node|edge’s props attribute, which is a dictionary of properties.
diagnosis_props = nodes["diagnosis"].props
len(diagnosis_props)
15
list(diagnosis_props.keys())[:3]
['crdc_id', 'diagnosis_record_id', 'disease_term']
list(diagnosis_props.values())[:3]
[<bento_meta.objects.Property at 0x7fcdb81dc680>,
<bento_meta.objects.Property at 0x7fcdb8020b90>,
<bento_meta.objects.Property at 0x7fcdb8021580>]
Properties accessed via their parents are the same Property objects found in Model.props.
diagnosis_props["primary_disease_site"] is props[("diagnosis", "primary_disease_site")]
True
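The (parent handle, property handle) keys of Model.props can also be inverted to see which parents share a property handle. This is a sketch with a hypothetical helper, parents_by_prop, and an illustrative list of keys; the pairs are examples chosen for the demonstration and are not claimed to come from a single model.

```python
from collections import defaultdict


def parents_by_prop(prop_keys):
    """Map each property handle to the sorted list of parent handles that carry it."""
    parents = defaultdict(set)
    for parent_handle, prop_handle in prop_keys:
        parents[prop_handle].add(parent_handle)
    return {prop: sorted(owners) for prop, owners in parents.items()}


# Illustrative stand-in for mdf.model.props.keys()
prop_keys = [
    ("program", "program_name"),
    ("diagnosis", "primary_disease_site"),
    ("diagnosis", "fatal"),
    ("outcome", "fatal"),
]
shared = {p: o for p, o in parents_by_prop(prop_keys).items() if len(o) > 1}
print(shared)
```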
Terms
Model terms are stored in Model.terms as a dictionary of Term objects. Terms are used to relate string descriptors in the model, such as permissible values in a property’s value set, or semantic concepts from other frameworks that can describe an entity in the model via annotation (e.g. a caDSR Common Data Element/CDE annotating a model property).
The keys in Model.terms are tuples that begin with (term.handle, term.origin), and the values are bento-meta Term objects. As the listing below shows, some keys carry additional elements after these two.
terms = mdf.model.terms
len(terms)
588
list(terms.keys())[:3]
[('program_name_text', 'caDSR', '11444542', '1'),
('program_short_name_text', 'caDSR', '11459801', '1'),
('clinical_study_identifier', 'caDSR', '5054234', '1')]
list(terms.values())[:3]
[<bento_meta.objects.Term at 0x7fcdb80028a0>,
<bento_meta.objects.Term at 0x7fcdb8003770>,
<bento_meta.objects.Term at 0x7fcdb81ddd60>]
shoulder = terms[("Shoulder", "ICDC")]
shoulder.get_attr_dict()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[33], line 1
----> 1 shoulder = terms[("Shoulder", "ICDC")]
2 shoulder.get_attr_dict()
KeyError: ('Shoulder', 'ICDC')
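The key listing above shows four-element tuples, and the two-element lookup raises a KeyError. When the exact key shape is not known in advance, one defensive option is to search on the leading elements of each key. This is a sketch only: find_terms is a hypothetical helper, and the dictionary is a stand-in whose values are placeholder strings rather than Term objects.

```python
def find_terms(terms, handle, origin=None):
    """Return {key: term} for keys whose first element equals `handle`.

    If `origin` is given, the second element of the key must equal it too.
    Any further elements of the key are ignored.
    """
    matches = {}
    for key, term in terms.items():
        if not key or key[0] != handle:
            continue
        if origin is not None and (len(key) < 2 or key[1] != origin):
            continue
        matches[key] = term
    return matches


# Stand-in for mdf.model.terms, using keys of the shape printed above
terms_standin = {
    ("program_name_text", "caDSR", "11444542", "1"): "Term A",
    ("program_short_name_text", "caDSR", "11459801", "1"): "Term B",
    ("clinical_study_identifier", "caDSR", "5054234", "1"): "Term C",
}
print(find_terms(terms_standin, "program_name_text", origin="caDSR"))
```

An empty result then signals that no key starts with the requested handle and origin, without raising an exception.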
Terms via ValueSet
Terms that are part of a value set can be accessed via the owner of that value set as well. This is the same object found in Model.terms.
primary_disease_site.terms["Shoulder"] is shoulder
True
Term Annotations
Terms are also used to annotate model entities with semantic representations from some other framework. For example, a Term from caDSR may be used to annotate a model property with a semantically equivalent CDE. In the MDF, these annotations are provided under the Term key for a given entity.
mdf_dir = Path.cwd().parent / "tests" / "samples"
model_with_terms = mdf_dir / "test-model-with-terms-a.yml"
# TIP: the model 'Handle' key is in the YAML file, so we don't need to provide one to MDFReader()
terms_mdf = MDFReader(model_with_terms)
terms_mdf.model
100%|███████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 8665.92it/s]
<bento_meta.model.Model at 0x105772c30>
Terms can annotate nodes, relationships, and properties. The annotating term(s) are linked to the annotated entity via a bento-meta Concept, which stores them in a dictionary of the same format found at Model.terms (i.e. {(term.value, term.origin_name): Term}).
case_concept = terms_mdf.model.nodes["case"].concept
case_concept
<bento_meta.objects.Concept at 0x106153140>
case_concept.terms
{('case_term', 'CTDC'): <bento_meta.objects.Term object at 0x106153020>, ('subject', 'caDSR'): <bento_meta.objects.Term object at 0x1061507d0>}
# TIP: to find an annotating CDE, we can look for entries where the origin is 'caDSR'
for term_key, term in case_concept.terms.items():
    if term_key[1] == "caDSR":
        print(term.get_attr_dict())
{'handle': 'subject', 'value': 'subject', 'origin_name': 'caDSR'}
terms_mdf.model.edges[("of_case", "sample", "case")].concept.terms
{('of_case_term', 'CTDC'): <bento_meta.objects.Term object at 0x106153c20>}
terms_mdf.model.props[("case", "case_id")].concept.terms
{('case_id', 'CTDC'): <bento_meta.objects.Term object at 0x106178fe0>}
# TIP: terms found in Model.terms are the same objects as those in an entity's concept
case_id_anno = terms_mdf.model.props[("case", "case_id")].concept.terms[("case_id", "CTDC")]
terms_mdf.model.terms[("case_id", "CTDC")] is case_id_anno
True
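The same origin check can be applied across a whole collection of entities, for example to list every node annotated by a caDSR term. The sketch below uses stand-in objects built with SimpleNamespace instead of a loaded model. annotations_from is a hypothetical helper, and the handling of entities whose concept is None is an assumption about unannotated entities.

```python
from types import SimpleNamespace


def annotations_from(entities, origin):
    """Map entity key -> list of concept term keys whose origin equals `origin`."""
    found = {}
    for key, entity in entities.items():
        concept = getattr(entity, "concept", None)
        if concept is None:  # assumed: unannotated entities have no concept
            continue
        hits = [term_key for term_key in concept.terms if term_key[1] == origin]
        if hits:
            found[key] = hits
    return found


# Stand-ins shaped like terms_mdf.model.nodes; values are placeholders
nodes_standin = {
    "case": SimpleNamespace(
        concept=SimpleNamespace(
            terms={("case_term", "CTDC"): "Term", ("subject", "caDSR"): "Term"}
        )
    ),
    "sample": SimpleNamespace(concept=None),
}
cde_hits = annotations_from(nodes_standin, "caDSR")
print(cde_hits)
```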
Model Diff
bento-mdf also provides the diff_models function, which can be used to compare two models and report on the differences between them. This is useful for comparing models that have been updated or modified over time.
diff_models() has two required arguments, both of which are bento_meta.Model objects:
- mdl_a: The first model to compare.
- mdl_b: The second model to compare.
The function returns a dict with keys for nodes, edges, props, and terms. Each of these maps to a dictionary with the keys:
- "added": found in mdl_b but not in mdl_a
- "removed": found in mdl_a but not in mdl_b
- "changed": found in both models but with altered attributes
Writing MDF from the Model
Schema-valid MDF may be produced from a bento-meta Model using the MDFWriter class. This can be useful if you wish to make changes to the Model within Python using the update methods of that interface, and then write out the updated model in MDF format for sharing.
Consider a simple data model in MDF format:
# sample-model.yml
Handle: test
Version: 0.01
Nodes:
  sample:
    Props:
      - sample_type
      - amount
Relationships:
  is_subsample_of:
    Mul: many_to_one
    Ends:
      - Src: sample
        Dst: sample
    Props: null
PropDefinitions:
  sample_type:
    Enum:
      - normal
      - tumor
  amount:
    Type:
      units:
        - mg
      value_type: number
Suppose we want to add a property from the ICDC model to this simple model, and write out a new MDF. We add the property to the model, then we can create an MDFWriter instance from the MDFReader instance. Then the mdf attribute of the writer will contain a dict that can be written as YAML.
import yaml
from bento_mdf import MDFReader, MDFWriter
smodel = MDFReader("./sample-model.yml")
new_prop = mdf.model.props[('sample', 'tumor_sample_origin')]
smodel.model.add_prop( smodel.model.nodes['sample'], new_prop )
print(yaml.dump(MDFWriter(smodel).mdf, indent=4))
Handle: test
Nodes:
    sample:
        Props:
        - amount
        - sample_type
        - tumor_sample_origin
PropDefinitions:
    amount:
        Key: false
        Nul: false
        Req: false
        Strict: true
        Type:
            units:
            - mg
            value_type: number
    sample_type:
        Enum:
        - normal
        - tumor
        Key: false
        Nul: false
        Req: false
        Strict: true
    tumor_sample_origin:
        Desc: An indication as to whether a tumor sample was derived from a primary
            versus a metastatic tumor.
        Enum:
        - Primary
        - Metastatic
        - Not Applicable
        - Unknown
        Key: false
        Nul: false
        Req: 'Yes'
        Strict: true
        Tags:
            Labeled: Tumor Sample Origin
Relationships:
    is_subsample_of:
        Ends:
        -   Dst: sample
            Props: null
            Src: sample
        Mul: many_to_one
        Props: null
Terms:
    normal:
        Origin: test
        Value: normal
    tumor:
        Origin: test
        Value: tumor
URI: null
Version: 0.01
Note that the new property tumor_sample_origin appears in the new MDF.
Validating the Model
As the MDFReader class loads the model, it automatically validates it against the MDF schema; with raise_error set to True, it raises an exception if the model is invalid. Validation uses the default schema unless one is provided via the MDFReader class’s mdf_schema argument.
bento-mdf also provides the MDFValidator class, which can be used to validate a model against the MDF schema directly.
from bento_mdf.validator import MDFValidator
validator = MDFValidator(
    None,
    *[ctdc_model, ctdc_props],
    raise_error=True,
)
validator
<bento_mdf.validator.MDFValidator at 0x106186bd0>
validator.load_and_validate_schema(); # load and check that JSON schema is valid
validator.load_and_validate_yaml().as_dict(); # load and check YAML is valid
validator.validate_instance_with_schema(); # check YAML against the schema
If the schema or the YAML instances (from MDF files) are invalid, the validation will fail.
from jsonschema import SchemaError, ValidationError
from yaml.parser import ParserError
from IPython.display import clear_output
Schema is invalid
bad_schema = mdf_dir / "mdf-bad-schema.yaml"
try:
    MDFValidator(bad_schema, raise_error=True).load_and_validate_schema()
except SchemaError as e:
    clear_output()
    print(e)
'crobject' is not valid under any of the given schemas
Failed validating 'anyOf' in metaschema['properties']['properties']['additionalProperties']['properties']['type']:
{'anyOf': [{'$ref': '#/definitions/simpleTypes'},
{'type': 'array',
'items': {'$ref': '#/definitions/simpleTypes'},
'minItems': 1,
'uniqueItems': True}]}
On schema['properties']['UniversalNodeProperties']['type']:
'crobject'
YAML structure is invalid
bad_yaml = mdf_dir / "ctdc_model_bad.yaml"
try:
    MDFValidator(None, bad_yaml, raise_error=True).load_and_validate_yaml()
except ParserError as e:
    clear_output()
    print(e)
while parsing a block mapping
in "/Users/jensenma/Code/bento-mdf/python/tests/samples/ctdc_model_bad.yaml", line 1, column 1
expected <block end>, but found '<block mapping start>'
in "/Users/jensenma/Code/bento-mdf/python/tests/samples/ctdc_model_bad.yaml", line 3, column 3
MDF YAMLs are invalid against the MDF schema
test_schema = mdf_dir / "mdf-schema.yaml"
ctdc_bad = mdf_dir / "ctdc_model_file_invalid.yaml"
try:
    v = MDFValidator(
        test_schema,
        *[ctdc_bad, ctdc_props],
        raise_error=True
    )
    v.load_and_validate_schema()
    v.load_and_validate_yaml()
    v.validate_instance_with_schema()
except ValidationError as e:
    clear_output()
    print(e)
'case.show_node' does not match '^[A-Za-z_][A-Za-z0-9_]*$'
Failed validating 'pattern' in schema['properties']['PropDefinitions']['propertyNames']:
{'$id': '#snake_case_id',
'type': 'string',
'pattern': '^[A-Za-z_][A-Za-z0-9_]*$'}
On instance['PropDefinitions']:
'case.show_node'
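The three validation steps shown above each fail with a different exception type, so a script may prefer one wrapper that reports which step failed. This is a sketch: run_validation is a hypothetical helper, it assumes a validator object exposing the three methods used above, and it catches Exception broadly on purpose instead of naming the jsonschema and yaml error classes.

```python
def run_validation(validator):
    """Run the three validation steps in order.

    Returns (True, None, None) on success, otherwise
    (False, name_of_failed_step, exception).
    """
    steps = (
        "load_and_validate_schema",
        "load_and_validate_yaml",
        "validate_instance_with_schema",
    )
    for name in steps:
        try:
            getattr(validator, name)()
        except Exception as err:  # broad on purpose: each step raises its own type
            return False, name, err
    return True, None, None
```

With a validator constructed with raise_error=True, as in the examples above, ok, step, err = run_validation(validator) gives the name of the first failing step in step.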
from bento_mdf.diff import diff_models
old_model = mdf_dir / "test-model-d.yml"
new_model = mdf_dir / "test-model-e.yml"
old_mdf = MDFReader(old_model, handle="TEST")
new_mdf = MDFReader(new_model, handle="TEST")
diff_models(mdl_a=old_mdf.model, mdl_b=new_mdf.model)
{'nodes': {'changed': {'diagnosis': {'props': {'removed': {'fatal': <bento_meta.objects.Property at 0x1061d73b0>},
'added': None}}},
'removed': None,
'added': {'outcome': <bento_meta.objects.Node at 0x10619cad0>}},
'edges': {'removed': None,
'added': {('end_result',
'diagnosis',
'outcome'): <bento_meta.objects.Edge at 0x10619ffe0>}},
'props': {'removed': {('diagnosis',
'fatal'): <bento_meta.objects.Property at 0x1061d73b0>},
'added': {('outcome',
'fatal'): <bento_meta.objects.Property at 0x1061e0c80>}}}
diff_models has two optional arguments:
- objects_as_dicts: if True, the output will convert bento-meta Entity objects like Node or Edge to dictionaries with get_attr_dict()
- include_summary: if True, the output will include a formatted string summary of the differences between the two models. This can be useful for GitHub changelogs when a model is updated, for example.
diff = diff_models(
old_mdf.model,
new_mdf.model,
objects_as_dicts=True, include_summary=True)
diff["nodes"]["changed"]
{'diagnosis': {'props': {'removed': {'fatal': {'handle': 'fatal',
'model': 'TEST',
'value_domain': 'value_set',
'is_required': 'False',
'is_key': 'False',
'is_nullable': 'False',
'is_strict': 'True'}},
'added': None}}}
print(diff["summary"], sep="\n")
1 node(s) added; 1 edge(s) added; 1 prop(s) removed; 1 prop(s) added; 1 attribute(s) changed for 1 node(s)
- Added node: 'outcome'
- Added edge: 'end_result' with src: 'diagnosis' and dst: 'outcome'
- Removed prop: 'fatal' with parent: 'diagnosis'
- Added prop: 'fatal' with parent: 'outcome'
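For automated checks, it can help to reduce a diff to counts per entity type. The sketch below works on a stand-in dictionary shaped like the output above, with placeholder strings where the real result holds bento-meta objects. count_changes is a hypothetical helper; it assumes that missing keys and None both mean "nothing of this kind", and it skips non-dictionary entries such as the summary string.

```python
def count_changes(diff):
    """Count top-level added/removed/changed entries per entity type."""
    counts = {}
    for entity_type, result in diff.items():
        if not isinstance(result, dict):  # e.g. the 'summary' string
            continue
        counts[entity_type] = {
            kind: len(result.get(kind) or {})
            for kind in ("added", "removed", "changed")
        }
    return counts


# Stand-in shaped like the diff_models() output shown above
diff_standin = {
    "nodes": {
        "changed": {"diagnosis": {"props": {"removed": {"fatal": "Property"}, "added": None}}},
        "removed": None,
        "added": {"outcome": "Node"},
    },
    "edges": {"removed": None, "added": {("end_result", "diagnosis", "outcome"): "Edge"}},
    "props": {
        "removed": {("diagnosis", "fatal"): "Property"},
        "added": {("outcome", "fatal"): "Property"},
    },
    "summary": "1 node(s) added; ...",
}
counts = count_changes(diff_standin)
print(counts)
```

A continuous integration job could, for example, fail when any "removed" count is non-zero.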