Syndrome Ontology: A Dynamic Tool for Diagnosis, Sunshine Weiss

I regret I was unable to be at PAS to present this work in May 2013. 

This page is a summary of poster 119 at session 4500 in Hall D/E.

If you would like to discuss ontologies, diagnostic and teaching tools, or other aspects of this work, contact me at:

Sunshine Weiss, MD

Director of NICU Informatics

Santa Clara Valley Medical Center

San Jose, CA 95128

USA

Sunshine.Weiss@hhs.sccgov.org

415-997-9347

Overview

Background: Diagnosing infants with dysmorphic features requires clinical knowledge and experience. Syndrome finder organizes OMIM data by clinical finding, revealing patterns in the data. This allows users to create evidence-based differential diagnoses and to identify physical exam findings and laboratory tests that can confirm diagnoses.

Methodology: The tool does not contain any patient data or patient health information. The tool is implemented with Splink, a Java GUI that uses SPARQL to query an OpenRDF Sesame triple store running on Apache Tomcat. Test users were presented with the tool and test cases and encouraged to use it in their clinical work environment. They were supported by the author, who helped them enter their queries and showed them how to use the tool. They were asked to create their own test cases and run them through the tool, and to compare the tool to the methods they currently use.

How it works: A user types a search query, and a list of possible diagnoses is generated. Queries can contain strings and logical expressions. Once the output list is generated, the user can click on a diagnosis to see additional features associated with that diagnosis. They can then click on other syndromes or features to see their associations. Features include physical findings, lab anomalies, historical facts, and other types of data from the OMIM database.

The data structure of the ontology is based on the clinical findings organization outlined by Online Mendelian Inheritance in Man (OMIM) and the anatomic categories outlined in Smith's Recognizable Patterns of Human Malformation. The tool allows for a variety of user algorithms, implemented as pattern matching over relationships stored in a graph that the user navigates dynamically.

Results: Syndrome Ontology was validated by six users (five clinicians and a software engineer) using four fictitious sample test cases. Two cases were created by the tool developer and author, and two cases were generated by the users. The users were asked to compare the tool to methods they currently use.

Four of the five health care workers commented that the built-in logic and interactive nature of the tool were useful, though the interface was difficult to navigate. They especially appreciated the ability to enter multiple symptoms and browse a differential diagnosis. The residents also found it helpful to be able to quickly read the list of features associated with several syndromes in the differential diagnosis, rather than having to sift through text in another context. More senior pediatricians found that the tool was easier to use and more useful than a textbook.

Conclusion: Syndrome Ontology empowers pediatricians, nurse practitioners, neonatologists, and geneticists to diagnose rare and common syndromes based on their patients' history, physical exams, and additional medical data. By focusing on the relationships between the dozens of findings and the hundreds of syndromes, we use the knowledge about relevant attributes to match patients with syndromes. This tool makes it less likely that a patient's diagnosis will be missed due to inexperience of the clinician and ensures that the patient's detected findings lead toward an appropriate diagnosis.

Background/Motivation

Genetic syndromes can be difficult to diagnose in infants and children. Pediatricians and geneticists are often called upon to evaluate the history and physical findings of patients with unknown syndromes in order to make predictions about their future medical and developmental needs and to answer questions about whether a condition is heritable or life-limiting. Estimates of the incidence of congenital malformations range between 0.7 and 7 percent of infants (1, 2). Many families who care for children with unidentified syndromes would benefit from a diagnosis so that they can better understand the developmental trajectory and needs of their child. With recent advances in genetic testing, identification of diagnoses is becoming easier. Patterns of malformations can be used to diagnose syndromes.

Newborn infants are routinely assessed by pediatricians, neonatologists, or nurse practitioners, and of course, by their parents. When a child is found to have a congenital malformation, the question of diagnosis arises. There are dozens of categories of malformations, and each may be associated with a number of diagnoses. A particular syndrome, however, typically comprises a number of predictable findings, and patterns of multiple malformations tend to predict a unique syndrome. Many syndromes are difficult to diagnose in the newborn period, before a child develops. Diagnosis is also complicated by the sheer number (thousands) of rare syndromes; many pediatricians, particularly residents and nurse practitioners, do not know which syndromes are associated with their patients' findings. Syndromes that are rare or only recently described are less likely to be recognized because providers may not have sufficient experience with them to comfortably make the diagnoses. This decision support tool is designed to facilitate the diagnosis of infants with dysmorphic features.

There are currently three principal tools used for diagnosing infants and children with congenital malformations:

Online Mendelian Inheritance in Man (OMIM) is a database of human genes and genetic disorders developed by staff at Johns Hopkins University. It is frequently updated and widely used by geneticists and pediatricians searching for genetic disease information. This database supports basic text queries.

"Smith's Recognizable Patterns of Human Malformation" is the widely used core dysmorphology text. This book contains descriptions of several hundred dysmorphic conditions and an index of physical findings and the syndromes with which they are most frequently associated. This is a static reference.

PossumWeb is a searchable dysmorphology database developed in Australia. It runs only on the Windows platform. It does not offer dynamic search or reasoning.

There are currently no validated dysmorphology ontologies capable of inferring diagnoses, prognoses, or recurrence risks. This tool is the first to make inferences about syndromes based on clinical data. It infers a differential diagnosis and suggests additional findings to narrow the diagnosis. It is intended to be used by pediatricians, family practitioners, and other health care providers who seek to identify syndromic diagnoses based on the specific findings they encounter.

Methods: Data structure/Ontology

The data structure for syndrome diagnosis lends itself to two categories: Syndromes and Features. Syndromes can be genetic, environmental, both, other, complex, unknown, or some combination of those. Some syndromes are chromosomal, some are microdeletions, others are mitochondrial, and some are more complicated. As genetics progresses, the taxonomy of syndromes evolves. For the purposes of this pilot project, syndromes were not subclassified. The syndrome data comes from the OMIM database. Each diagnosis from the database is represented as an instance with a unique six-digit identifier, called a MIM number. Each syndrome is entered into the ontology as a MIM number and is associated with a string literal of its name. Syndromes are also associated with alternative names, including alternate eponyms.

Features such as physical exam findings have traditionally been the principal findings relevant to diagnosing an infant with dysmorphic features. In addition to physical findings, populations of patients with a specific syndrome may share traits in maternal and family history, growth and development, laboratory testing (including genetic testing), and imaging studies such as X-rays and ultrasounds. In this ontology, any medical datum that may be associated with a syndrome is represented as a feature. Features belong to medical categories, such as Head, Limb, Lab, Radiology, and Chest. The medical features and categories are taken from OMIM's clinical synopses. Each category is treated as a class, and subclassification was done to reflect semantic relationships. For example, Ears falls in the category Head, and Head falls in the category HeadAndNeck; because the relationship is transitive, Ears also falls in HeadAndNeck. Some features, such as Hemangioma, belong to more than one medical category. Like Syndromes, Features can have names and descriptions.

The ontology defining the syndrome and feature types and their relationships is implemented using a relationship file, which consists of several prefixes and triples, and a data file. The triples relate the instances to the data types and categories. The data file was populated from the OMIM database. The database was queried for all diagnoses that contained clinical synopses, and the output, containing 4,807 disorders and thousands of features, was translated into triples according to the ontology definitions.
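The transitive category relationship (Ears in Head, Head in HeadAndNeck) can be illustrated with a small Python sketch. This is not the relationship file itself: the dictionary layout and the function names `ancestors` and `falls_in` are assumptions made for illustration, and only the three category names come from the text. In RDFS the same effect comes from the transitivity of rdfs:subClassOf.

```python
# Hypothetical child -> parent map. Only the names Ears, Head and
# HeadAndNeck come from the text; the layout is an assumption.
PARENT = {
    "Ears": "Head",
    "Head": "HeadAndNeck",
}


def ancestors(category, parent=PARENT):
    """Return every category that contains `category`, nearest first."""
    found = []
    seen = {category}
    while category in parent:
        category = parent[category]
        if category in seen:  # guard against an accidental cycle
            break
        seen.add(category)
        found.append(category)
    return found


def falls_in(category, container, parent=PARENT):
    """True if `category` is `container` or lies anywhere beneath it."""
    return category == container or container in ancestors(category, parent)


print(ancestors("Ears"))                # ['Head', 'HeadAndNeck']
print(falls_in("Ears", "HeadAndNeck"))  # True
```

The sketch gives each category a single parent. A feature that belongs to more than one category, such as Hemangioma, would need a list of parents per entry.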

The data file contains several types of triples: MIM numbers are defined as syndromes and given string literal names. Then, syndrome aliases are entered; features are defined and given string literal names. Next, features are associated with syndromes. Finally, features are tied to medical categories.
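As a rough illustration of those five groups of triples, the following Python sketch emits them in the same order. The prefixes `mim:` and `so:`, the predicate names `so:alias`, `so:hasFeature` and `so:inCategory`, the placeholder MIM number 000000, and the category names are all hypothetical; the actual relationship and data files may use different terms.

```python
def data_file_triples(syndromes, features):
    """Emit (subject, predicate, object) triples in the order described.

    syndromes: {mim: {"name": str, "aliases": [str], "features": [id]}}
    features:  {id: {"name": str, "categories": [str]}}
    All predicate names below are assumptions made for illustration.
    """
    triples = []
    # 1. MIM numbers are typed as syndromes and given string-literal names.
    for mim, syn in syndromes.items():
        triples.append((f"mim:{mim}", "rdf:type", "so:Syndrome"))
        triples.append((f"mim:{mim}", "rdfs:label", syn["name"]))
    # 2. Syndrome aliases are entered.
    for mim, syn in syndromes.items():
        for alias in syn["aliases"]:
            triples.append((f"mim:{mim}", "so:alias", alias))
    # 3. Features are defined and given string-literal names.
    for fid, feat in features.items():
        triples.append((f"so:{fid}", "rdf:type", "so:Feature"))
        triples.append((f"so:{fid}", "rdfs:label", feat["name"]))
    # 4. Features are associated with syndromes.
    for mim, syn in syndromes.items():
        for fid in syn["features"]:
            triples.append((f"mim:{mim}", "so:hasFeature", f"so:{fid}"))
    # 5. Features are tied to medical categories.
    for fid, feat in features.items():
        for category in feat["categories"]:
            triples.append((f"so:{fid}", "so:inCategory", f"so:{category}"))
    return triples


# Placeholder data: the MIM number, names and categories are invented.
EXAMPLE_SYNDROMES = {
    "000000": {
        "name": "Example syndrome",
        "aliases": ["Example eponym syndrome"],
        "features": ["Hemangioma"],
    }
}
EXAMPLE_FEATURES = {
    "Hemangioma": {"name": "Hemangioma", "categories": ["CategoryA", "CategoryB"]}
}

for triple in data_file_triples(EXAMPLE_SYNDROMES, EXAMPLE_FEATURES):
    print(triple)
```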

The ontology and data files are loaded into a repository with the rdf and rdfs prefixes. From there they are accessed by Splink, the Java GUI that uses SPARQL to query the OpenRDF triple repository.
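To give a sense of what a query against such a repository might look like, the Python sketch below assembles SPARQL text asking for syndromes that carry every one of a list of findings. It is a sketch under stated assumptions, not the query Splink issues: the `so:` namespace URI, the class `so:Syndrome`, the predicate `so:hasFeature`, and the use of rdfs:label for names are hypothetical, and the FILTER relies on the SPARQL 1.1 string functions CONTAINS, LCASE and STR.

```python
# The so: namespace is a placeholder, not the ontology's real URI.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX so: <http://example.org/syndrome#>\n"
)


def escape_literal(text):
    """Collapse whitespace and escape backslashes and double quotes."""
    text = " ".join(text.split())
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(findings):
    """Build a SPARQL query for syndromes that have all given findings."""
    if not findings:
        raise ValueError("at least one finding is required")
    lines = [
        PREFIXES + "SELECT DISTINCT ?syndrome ?name WHERE {",
        "  ?syndrome a so:Syndrome ; rdfs:label ?name .",
    ]
    for i, finding in enumerate(findings):
        term = escape_literal(finding.lower())
        # One feature variable per finding, so every finding must match.
        lines.append(f"  ?syndrome so:hasFeature ?f{i} .")
        lines.append(f"  ?f{i} rdfs:label ?label{i} .")
        lines.append(f'  FILTER(CONTAINS(LCASE(STR(?label{i})), "{term}"))')
    lines.append("}")
    lines.append("ORDER BY ?name")
    return "\n".join(lines)


print(build_query(["Ptosis", "hemangioma"]))
```

Matching on a lowercased substring makes the query forgiving about capitalization, but it does nothing for misspellings or synonyms.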

Algorithm

Physicians typically make diagnoses by gathering historical and physical exam data and then pattern matching: they judge, based on their sense of each finding's sensitivity and specificity, how well the findings they observe match the findings typical of each diagnosis. In diagnosing a newborn with a syndrome, a variety of incomplete information is often available, and many practitioners depend heavily on the physical exam. Upon exam, major and minor malformations may be detected. Just as providers think about the malformations they observe, the tool also focuses on detected anomalies. Users input their findings as text and decide on a query based on what they know and what they seek to find out.

The principal algorithm involves entering the features detected as abnormal as free text into fields in a SPARQL query. The reasoner searches the graph of triples, as outlined in the ontology, for a match to the pattern described in the query. The output is a table listing the syndromes that have all of the features entered. For each syndrome, all of the features associated with it are also displayed. The table can be sorted by feature, by the feature's medical category, or by syndrome, as the user prefers. Then, if the user wants to see the features associated with one of the possible diagnoses, or the diagnoses associated with one of the possible additional features, the user can double-click on the diagnosis or feature, respectively, to generate a new list. The user can then add a feature to the search query to further narrow the diagnosis. There are many more possible pattern matching algorithms, which amount to different ways to traverse the triple space.
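The core matching step can be sketched in plain Python, independent of the triple store. The syndrome names and the associations below are invented placeholders, not OMIM data. The helper `narrowing_features` is an assumption about how "additional features" might be surfaced; it is not a description of what the tool does internally.

```python
from collections import defaultdict

# Invented associations for illustration only; not clinical data.
ASSOCIATIONS = [
    ("Syndrome A", "ptosis"),
    ("Syndrome A", "hemangioma"),
    ("Syndrome A", "macroglossia"),
    ("Syndrome B", "ptosis"),
    ("Syndrome B", "hemangioma"),
    ("Syndrome B", "omphalocele"),
    ("Syndrome C", "ptosis"),
]


def build_index(associations):
    """Map each syndrome to the set of its (lowercased) features."""
    by_syndrome = defaultdict(set)
    for syndrome, feature in associations:
        by_syndrome[syndrome].add(feature.lower())
    return dict(by_syndrome)


def differential(by_syndrome, findings):
    """Keep only syndromes that have ALL of the entered findings."""
    wanted = {f.lower() for f in findings}
    return {s: feats for s, feats in by_syndrome.items() if wanted <= feats}


def narrowing_features(candidates, findings):
    """Features seen in some but not all candidates.

    Looking for one of these on exam would split the differential.
    """
    wanted = {f.lower() for f in findings}
    counts = defaultdict(int)
    for feats in candidates.values():
        for feature in feats - wanted:
            counts[feature] += 1
    return sorted(f for f, n in counts.items() if n < len(candidates))


index = build_index(ASSOCIATIONS)
candidates = differential(index, ["Ptosis", "Hemangioma"])
print(sorted(candidates))                                          # ['Syndrome A', 'Syndrome B']
print(narrowing_features(candidates, ["Ptosis", "Hemangioma"]))    # ['macroglossia', 'omphalocele']
```

Adding one of the suggested features to the findings and calling `differential` again mirrors the narrowing step described above.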

To try out the ontology, follow the instructions in the How To.

Results

The Syndrome Ontology was validated by users. The users were an attending neonatologist, a pediatrics junior resident, a pediatrics intern, a hospitalist, a neonatal nurse practitioner, and a software engineer. The users ran four fictitious sample test cases. Two cases were created by the tool developer (SW), and two were created by the users themselves. The users were asked to compare the tool to methods they currently use.

Four of the five health care workers commented that the built-in logic and interactive nature of the tool were useful, though the interface was difficult to navigate. They especially appreciated the ability to enter multiple symptoms and browse a differential diagnosis. The residents also found it helpful to be able to quickly read the list of features associated with several syndromes in the differential diagnosis, rather than having to sift through text in another context.

In two cases, the query returned no results: once because of a spelling error, and once because no syndromes fit the made-up constellation of features. In each of the other cases, the queries produced useful, relevant responses, and in one case led a care team to consider a diagnosis that had not previously been considered for a patient. The software engineer was able to create more useful queries, generating more focused differential diagnoses. The ontology served as a clinical learning tool for residents, demonstrating associations between syndromes and clinical findings.

The tool was able to successfully diagnose Beckwith-Wiedemann Syndrome, generate a differential diagnosis of ptosis, identify a reasonable differential diagnosis for a case with an unknown diagnosis, and suggest physical findings to look for to narrow a differential diagnosis. In one test case, a provider tried to generate a differential diagnosis that included Down Syndrome and was unsuccessful. (This was discovered to correlate with an inconsistency in the OMIM database output file formatting.)

Residents reported that when compared to OMIM, the tool was more difficult to query, but once the query was completed, it was found to be faster and easier to interpret. It lacked the medical literature references that OMIM's website has, but was found to be more useful as a tool that does not require internet access.

More senior pediatricians found that the tool was easier to use and more useful than a textbook.

Discussion/Future work

The tool has the potential to empower pediatricians, nurse practitioners, neonatologists, and geneticists to diagnose rare syndromes based on their patients' history, physical exams, and basic laboratory testing. With the relationships between the thousands of findings and the hundreds of syndromes encoded in the tool, health care providers can focus on their clinical practice (determining relevant patient attributes) and leave the complex data processing and pattern matching to the computer. This methodology makes it less likely that a patient's diagnosis will be missed because the health care provider has not previously seen a case. It also factors all of a patient's detected findings into the possible diagnosis.

Evaluation revealed two types of failure. First, the data input interface was too difficult for most medical practitioners. They were only able to use the tool after some teaching or interpretation. Second, there are many semantically linked features that are spelled or worded differently, and thus are not detected as the same node in the graph. This is likely due to the fact that OMIM data comes from so many different people, and OMIM does not subscribe to a standard thesaurus for its findings.
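One possible mitigation for the spelling side of this problem is to normalize labels and offer near matches when a search term is not found. The sketch below uses Python's standard difflib module. It was not part of the evaluated tool, and the function names, the 0.8 cutoff, and the sample labels are assumptions. It addresses spelling and formatting variants only; differently worded synonyms would still need a thesaurus.

```python
import difflib


def normalize(label):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


def suggest(term, known_labels, cutoff=0.8, n=3):
    """Return known labels that match `term` exactly or approximately."""
    by_key = {normalize(label): label for label in known_labels}
    key = normalize(term)
    if key in by_key:
        return [by_key[key]]
    # get_close_matches ranks candidates by similarity ratio, best first.
    close = difflib.get_close_matches(key, list(by_key), n=n, cutoff=cutoff)
    return [by_key[k] for k in close]


LABELS = ["Macroglossia", "Ptosis", "Low-set ears"]  # sample labels only
print(suggest("macroglosia", LABELS))   # ['Macroglossia']
print(suggest("low set  ears", LABELS)) # ['Low-set ears']
```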

Another disadvantage of the current implementation is that it does not take sensitivities or specificities into account. Future work would include probabilities for each feature that could be updated as research is done, and perhaps even a way to gather data from populations of instantiated patients whose features and diagnoses could build an ever more accurate dataset. A future version of this work could use a Bayesian belief network to integrate the knowledge base with sensitivities and specificities from the literature, or perhaps even from patient data entered into the model.
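To make the idea concrete, here is a minimal Python sketch of the simplest such model, a naive Bayes ranking in which findings are assumed to be conditionally independent given the syndrome. This is a simplification of a general Bayesian belief network, not a proposed design. Every number is invented for illustration, as are the syndrome names, the `background` rate for unlisted findings, and the function name `rank`.

```python
def rank(priors, sensitivity, findings, background=0.01):
    """Rank syndromes by posterior probability given observed findings.

    priors:      {syndrome: P(syndrome)}
    sensitivity: {syndrome: {finding: P(finding | syndrome)}}
    background:  assumed P(finding | syndrome) when no value is listed
    """
    scores = {}
    for syndrome, prior in priors.items():
        p = prior
        for finding in findings:
            p *= sensitivity.get(syndrome, {}).get(finding, background)
        scores[syndrome] = p
    total = sum(scores.values())
    if total == 0:
        return []
    # Normalize so the posteriors over the listed syndromes sum to 1.
    ranked = [(s, p / total) for s, p in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Invented numbers, for illustration only.
PRIORS = {"Syndrome A": 0.5, "Syndrome B": 0.5}
SENSITIVITY = {
    "Syndrome A": {"ptosis": 0.9, "hemangioma": 0.8},
    "Syndrome B": {"ptosis": 0.3},
}

for syndrome, posterior in rank(PRIORS, SENSITIVITY, ["ptosis", "hemangioma"]):
    print(syndrome, round(posterior, 3))
```

Unlike the all-features match used in the current tool, this kind of ranking keeps a syndrome in the differential even when one finding is atypical for it, only with a lower score.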

References

Open source references are linked in the text above. Additional text references include:

1. McIntosh, et al. (1954) "The incidence of congenital malformations: a study of 5,964 pregnancies." Pediatrics. 14:5 pp. 505-522.

2. Moeschler, et al. (2006) "Committee on Genetics: Clinical Genetic Evaluation of the Child With Mental Retardation or Developmental Delays." Pediatrics. 117:6 pp. 2304-2316.

3. Jones, KL. (2006) Smith's Recognizable Patterns of Human Malformation. Elsevier Saunders.

Syndrome Ontology Links

A: Ontology

B: Relationship file

C: Data file excerpt

D: Screen Shots of a Sample Query

E: Sample Queries

F: How To