Automating the Detection of Adverse Events

Adverse event (AE) reporting in the U.S. is largely voluntary, yet it forms the foundation for the detection of post-marketing safety signals that were not identified in clinical trials. It is generally recognized that only a small percentage of AEs get reported. This is certainly my experience in clinical practice. There simply wasn't time to report them all. I reported only those that were the most serious and clearly not described in labeling.

The deployment of Electronic Health Record (EHR) systems nationwide provides a tremendous opportunity to increase reporting. One area that interests me is the potential for EHRs to automatically detect an AE. This requires an unambiguous and computable definition of an AE. I'm not suggesting that EHRs replace a clinician's role in the process, but the potential to automate many steps that the clinician now performs manually is clearly in the best interest of public health.

In a recent post, I discussed how adverse events (AEs) are defined and modeled in BRIDG. I've been doing more reading on this topic and continue to have discussions with others on this important concept. The existing BRIDG definition closely reflects the definition provided in U.S. federal regulations (see 21 CFR 312.32(a)), which states the following:

Adverse event means any untoward medical occurrence associated with the use of a drug in humans, whether or not considered drug related.

The BRIDG definition is:

Any unfavorable and unintended sign, symptom, disease, or other medical occurrence with a temporal association with the use of a medical product, procedure or other therapy, or in conjunction with a research study, regardless of causal relationship. 

The BRIDG definition is appropriately broader. Both contain the concept "medical occurrence." But what is a medical occurrence? How can an EHR detect "medical occurrences"? We're not quite there yet in establishing a computable definition for an AE, but I think we are close.

The Free Medical Dictionary defines occurrence as "any event or incident." The BRIDG definition includes "unintended sign, symptom," which are clinical observations and, I think, can be considered incidents. Can a clinical observation be an adverse event? I think the answer is No. The observation needs to undergo an assessment by a qualified individual, such as a health care provider, to establish that:

  1. The observation is indicative of the presence of a medical condition
  2. The onset (or worsening) of the medical condition occurs after a medical intervention (e.g. drug administration)
Only when these two criteria are met can one identify an adverse event. What do I mean by a medical condition? The Free Medical Dictionary provides an excellent definition:

medical condition

A disease, illness or injury; any physiologic, mental or psychological condition or disorder (e.g., orthopaedic; visual, speech or hearing impairments; cerebral palsy; epilepsy; muscular dystrophy; multiple sclerosis; cancer; coronary artery disease; diabetes; mental retardation; emotional or mental illness; specific learning disabilities; HIV disease; TB; drug addiction; alcoholism). A biological or psychological state which is within the range of normal human variation is not a medical condition.

Medical condition is a phrase used in documents for physicians applying to licensing agencies (e.g., state medical boards, malpractice insurance carriers, third-party payers, etc.), which is used to determine a physician’s physical “suitability” to practise medicine.

What is a medical intervention? I mean any activity (e.g. drug administration, surgery, radiation, device implantation, etc.) undertaken to treat, prevent, cure, mitigate, diagnose, or induce a medical condition.

It is clear that a temporal association with a medical intervention is necessary to establish an AE, so I expect general agreement with the second criterion. Note that a causal relationship is not necessary. The first criterion is where there may be disagreement. Here are some examples where relying solely on an observation to establish an AE is problematic.

A patient is started on Drug X for a valid indication. The patient has no prior history of hypertension. He also happens to be morbidly obese. A week later, the nurse measures his BP at 155/100 mmHg. Is this an adverse event related to Drug X? I would argue it is not, because an assessment hasn't been done to establish that the patient does indeed have hypertension. In this example, the nurse used a normal sized BP cuff, which is well known to give falsely elevated BP readings in morbidly obese patients. When the BP was repeated using a large BP cuff, the readings were repeatedly within the normal range.

Let's now say that the patient also had a serum chemistry panel and the serum potassium came back elevated at 5.5 mEq/L (normal for the lab is 3.5-5.0 mEq/L). Is this an AE? Again, for the same reason, an assessment is necessary to establish the presence of an underlying medical condition, in this case hyperkalemia. In this example, examination of the biospecimen sent to the lab indicated the presence of hemolysis. It is well known that hemolysis can spuriously raise a serum potassium measurement due to the high concentration of intracellular potassium in erythrocytes. The chemistry panel was repeated, making sure hemolysis was not present in the biospecimen, and the serum potassium was in the normal range.

The following week the patient was involved in a motor vehicle accident (MVA). Is the accident an adverse event? Again, an assessment is needed. Additional observations might reveal that the patient was sleepy at the time. Hypersomnolence, a new onset medical condition, would be a valid AE that may have precipitated the MVA, but the MVA itself is not the adverse event. 

So my computable definition of an adverse event is:

a new onset medical condition that begins after a medical intervention OR a pre-existing medical condition that worsens after a medical intervention

This definition allows a program in an EHR system to identify adverse events automatically if the medical conditions and interventions are appropriately coded. Clearly there is a time interval component that must also be defined. It would be silly to report a new medical condition that happened years after taking a short-acting drug. One can establish guidelines for a reasonable time interval between the intervention and the medical condition. These should take into consideration things like the pharmacokinetic and pharmacodynamic properties of a drug intervention. Longer time intervals between the intervention and the medical condition will increase false positives in the causality assessment, and shorter intervals will increase false negatives, so the interval must be chosen wisely.
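To make the idea concrete, here is a minimal sketch, in Python, of what such automated detection might look like. Everything here is invented for illustration: the function name, the (code, date) record structure, and the 30-day default window are assumptions, not a real EHR interface or a recommended interval.

```python
from datetime import date, timedelta

def detect_candidate_aes(conditions, interventions, max_interval_days=30):
    """Flag candidate AEs: medical conditions whose (new or worsened) onset
    falls after a medical intervention, within a defined time window.
    Inputs are lists of (code, date) pairs; all names and structures here
    are illustrative, not a real EHR interface."""
    candidates = []
    for cond_code, onset in conditions:
        for int_code, given in interventions:
            if given <= onset <= given + timedelta(days=max_interval_days):
                candidates.append((cond_code, int_code, (onset - given).days))
    return candidates

# Hypothetical coded data: hyperkalemia onset 6 days after an ACE inhibitor start
interventions = [("ACE_INHIBITOR", date(2014, 3, 1))]
conditions = [("HYPERKALEMIA", date(2014, 3, 7))]
print(detect_candidate_aes(conditions, interventions))
# -> [('HYPERKALEMIA', 'ACE_INHIBITOR', 6)]
```

Each flagged pair would still be routed to a clinician for assessment; the program only identifies candidates.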

From a modeling perspective, I think an Adverse Event is a subclass of a Medical Condition. Medical conditions are the results of assessments of observations, and the definition of an adverse event should be modified accordingly to reflect these relationships. I think this will help pave the way for automatic detection of AEs within EHRs.  


Therapeutic Area Standards: What are they really?

The fifth reauthorization of the Prescription Drug User Fee Act (PDUFA V) gave rise to the goals that FDA agrees to meet in exchange for the user fees it collects (PDUFA V "goals letter"). Under section XII, part E of the goals letter, FDA agrees to the following:

Clinical Terminology Standards: Using a public process that allows for stakeholder input, FDA shall develop standardized clinical data terminology through open standards development organizations (i.e., the Clinical Data Interchange Standards Consortium (CDISC)) with the goal of completing clinical data terminology and detailed implementation guides by FY 2017.

1. FDA shall develop a project plan for distinct therapeutic indications, prioritizing clinical terminology standards development within and across review divisions. FDA shall publish a proposed project plan for stakeholder review and comment by June 30, 2013. FDA shall update and publish its project plan annually. 

This section of the goals letter has given rise to the development of so-called "Therapeutic Area (TA) Standards." I was involved in numerous activities associated with TA standards development, and I found substantial confusion, or at least a lack of clarity, about what these standards are and how they should be managed. I decided to write this post to explore TA standards from a data standards and regulatory review policy perspective, in the hope of providing useful insight into what these standards are and how best to manage them in the future.

Standard vs. Data Standard

When discussing TA standards, it’s useful to draw a distinction between a standard and a data standard. A standard is defined in dictionary.com as “something considered by an authority or by general consent as a basis of comparison; an approved model.”  There are many different kinds of standards: manufacturing standards, measurement standards, data standards, etc.  To understand the distinction, consider a ruler. The ruler can be marked in inches or centimeters. Which ruler is used to measure length depends on the measurement standard that has been selected for the task. Once a measurement standard is selected, then the data standard provides a consistent approach to document and/or share the measurement. If the measurement standard is inches, and the measurement is 10 inches, then the data standard describes whether it’s 10”, 10 in, or 10 inches. This distinction is important when considering TA standards. How to represent a measurement (i.e. an observation) requires two decisions: a business decision (what to measure, which measurement standard to use), followed by a data standards decision (how to standardize the representation of the measurement, which data standard(s) to use).

What is a Therapeutic Area Standard?

I find there is no widely established “standard definition” for a TA standard. One working (perhaps prevailing?) definition is that a TA standard is a data standard for a therapeutic area or indication. I argue that a TA standard is not a data standard. Let’s examine this definition more closely.

Consider a clinical observation, specifically a clinical laboratory test: glycosylated hemoglobin (HbA1C). The standardization of HbA1C data is straightforward. CDISC provides controlled terminology for the HbA1C lab test (represented by the NCI Enterprise Vocabulary Services code C64849). The CDISC SDTM Implementation Guide describes how to represent lab test data (which includes HbA1C data) using the LB domain. The result is a numeric value, expressed as a percentage of the total hemoglobin in blood. Anyone conducting a clinical trial that includes the collection of HbA1C need only look at the SDTM IG and CDISC controlled terminology to understand how to standardize this information. No additional data standards are needed.
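As an illustration, a single standardized HbA1C record might look like the following sketch, which uses standard SDTM LB variable names; the subject identifier and result value are made up, and the SDTM IG remains the authoritative reference for the variable definitions.

```python
# A minimal sketch of one SDTM LB record for an HbA1C result, using
# standard LB variable names; USUBJID and the result value are invented.
hba1c_record = {
    "DOMAIN": "LB",
    "USUBJID": "STUDY01-0001",   # hypothetical unique subject identifier
    "LBTESTCD": "HBA1C",         # CDISC controlled term (NCI EVS code C64849)
    "LBTEST": "Hemoglobin A1C",
    "LBORRES": "7.2",            # result as originally collected
    "LBORRESU": "%",             # percentage of total hemoglobin in blood
}
print(hba1c_record["LBTEST"], hba1c_record["LBORRES"] + hba1c_record["LBORRESU"])
```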

Consider now a single therapeutic area: Diabetes Mellitus. Let's assume that, for the purpose of determining efficacy of a new diabetes drug, only one outcome measure is necessary: the HbA1C.
So what does a Diabetes Mellitus TA Standard then look like? What are we “standardizing” that isn’t already standardized?

One can envision a separate Diabetes TA Standards document that says, “if you’re studying a new drug to treat diabetes, you should collect HbA1C and here is how you should represent HbA1C data using these existing standards: SDTM + CDISC controlled terminology.” For this document to be truly useful, an independent scientific and/or regulatory body should first decide what design features and clinical observations are relevant for diabetes studies. This can be described as a “good clinical research practice guideline” for diabetes. One could consider this a standard but it is not a data standard. Such a guideline is analogous to a manufacturing or building standard. Just as a builder might say: “A hurricane-resistant building must/should contain these materials: ….,” a clinical researcher would say: “A good diabetes study must/should contain HbA1C testing.” 

FDA publishes such guidelines. These are called indication-specific guidances, which help sponsors design their development programs, including clinical trials to support U.S. approval of new drugs for a given indication. I was involved in the development of a standard template for these guidances so that there is consistency in the content and presentation across therapeutic areas. Other organizations may publish similar guidelines: professional societies, other government agencies (e.g. NIH), consortia, etc.

In this simple example, a researcher would only need to read the clinical research guideline for diabetes, and understand how to represent HbA1C data using existing exchange and terminology standards. No additional documentation is necessary. A Diabetes TA user guide is not needed. The "standard" for a diabetes trial is the clinical research guideline itself plus the existing data standards.

Of course therapeutic areas are much more complicated than this. Each TA has multiple relevant clinical observations, and any given observation may have additional metadata and qualifiers needed to interpret it. In this setting a TA "user guide" is useful to demonstrate how to represent all TA-relevant data and metadata using existing data standards. But the user guide itself is not a data standard. The data standards are the exchange and terminology standards that the user guide references.

So an alternative definition for a TA Standard is a best practice guideline for conducting clinical trials for a specific therapeutic area, with an accompanying illustration (e.g. “user guide”) on how to use existing data standards (exchange and terminology standards) for that TA. If FDA generates the guideline, then it would be an indication-specific guidance, and any available user guide would ideally be incorporated by reference to the guidance.

An analogy would be a best practice specification for designing a kitchen. The kitchen “standard” would say: it must have cabinets, a refrigerator, a sink and faucet, and a stove and oven. It may have a garbage disposal, dishwasher, and trash compactor. The kitchen TA user guide might say, “this is what your kitchen would look like if you use standard Ikea cabinets and General Electric appliances.”

We should all therefore agree that a TA standard is not a data standard. A TA Standard is a use case for existing exchange and terminology standards; the data standards are the exchange and terminology standards used to standardize TA-specific data. The FDA publishes a Data Standards Catalog that lists the data standards the Agency supports for various use cases. Because TA standards are not data standards, I do not think they belong in the FDA Data Standards Catalog as new data standards that FDA supports. FDA does, however, need a new approach to convey that the TA use cases are adequately supported by the data standards listed in the catalog.

So what about the user guides? We should recognize that a TA standard has two components:
  1. A clinical research 'best practice' guideline or standard (i.e. the data requirements)
  2. A description of how available data exchange and terminology standards can represent the TA data requirements. A user guide may be useful (but not necessary) to illustrate how this is done.
In the trivial example described here, there is really no need for a separate Diabetes TA user guide because it is clear how to represent HbA1C information using existing data standards. The problem arises when the data requirements are complex and the existing data standards do not provide a clear and unambiguous representation of the clinical data. Then a user guide is both helpful and necessary.

For a TA standard to be effective for FDA, the Agency needs to ensure that its regulatory data needs are met. FDA needs a process to ensure that:

  1. the TA best practice research guideline is accurate (the TA-specific data requirements; in CDER this is largely a function of the Office of New Drugs (OND)). Ideally this is captured in an indication-specific guidance, and
  2. data standards exist (or have been adequately modified) to represent the data requirements in a standard format (in CDER this is likely a collaboration between OND, the Office of Strategic Programs (OSP), the Office of Translational Sciences (OTS) and data standards development organization(s)).


Mapping Terminologies using RDF

I received an email this morning from a colleague describing how RDF could be used to map local terms in a company's information system to standard terms for regulatory submission. It reminded me of a recent conversation on terminology mapping.

Just before leaving FDA, I was involved in a lengthy conversation about the challenges in the post-marketing world in mapping adverse events to MedDRA. The FDA expends a tremendous amount of resources coding post-marketing adverse event reports using MedDRA. MedDRA is the terminology adopted for this use case by the International Conference on Harmonisation (ICH), of which FDA is a member.

The problem is magnified because Electronic Health Record systems in the U.S. don't use MedDRA. The Office of the National Coordinator for Health Information Technology has adopted SNOMED CT and ICD-9 for medical problems/conditions (which include adverse events; see my recent post on this topic).

Needless to say, the FDA could use a mapping from SNOMED CT and ICD-9 to MedDRA. This is not an easy task, but assuming such a mapping existed, how could one implement it easily? Here the RDF provides a solution.

First, the terminologies must exist in RDF format. I recently came across this web site: the NCBO Bioportal, which makes available common medical terminology as ontologies. I have not evaluated it thoroughly, but it certainly looks promising.

Then comes the hard part...identifying concepts across terminologies that are the same.

Then comes the easy part... making links across ontologies using the owl:sameAs property. Here's an example. Let's assume the terminologies already assert the following in RDF:
meddra:10027599 rdf:type meddra:MedDRAConcept.
meddra:10027599 rdfs:label "Migraine".
snomed:37796009 rdf:type snomed:SNOMEDConcept.
snomed:37796009 rdfs:label "Migraine (Disorder)".

One asserts the following triple in the database:

           meddra:10027599 owl:sameAs snomed:37796009.

Then as MedWatch reports roll in from EHRs with an adverse event coded in SNOMED CT, one simply loads and stores the report with the SNOMED code in the knowledgebase. Any reviewer or analyst who queries the knowledgebase using the MedDRA term will automatically retrieve all the reports with the SNOMED code because the system treats the two as the same.
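The retrieval step just described can be sketched in a few lines of Python. The triple store is emulated here with an in-memory set of tuples, and the codedAs predicate and report identifiers are invented for illustration; a real system would use an RDF store and reasoner.

```python
# Minimal sketch of owl:sameAs-based retrieval over a tiny in-memory
# "triple store"; the codedAs predicate and report IDs are made up.
triples = {
    ("meddra:10027599", "owl:sameAs", "snomed:37796009"),
    ("report:001", "codedAs", "snomed:37796009"),   # report from an EHR
    ("report:002", "codedAs", "meddra:10027599"),   # report coded directly
}

def equivalents(term):
    """All terms linked to `term` by owl:sameAs, in either direction."""
    same = {term}
    for s, p, o in triples:
        if p == "owl:sameAs":
            if s in same:
                same.add(o)
            if o in same:
                same.add(s)
    return same

def reports_for(term):
    """Reports coded with `term` or with any term declared sameAs it."""
    codes = equivalents(term)
    return {s for s, p, o in triples if p == "codedAs" and o in codes}

# Querying by the MedDRA term retrieves both reports
print(sorted(reports_for("meddra:10027599")))
# -> ['report:001', 'report:002']
```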

A benefit of this approach is that one doesn't have to map the entire terminology; it can be done incrementally. A large percentage of reports refer to a relatively small number of concepts. Those can be mapped first, leaving a small manual mapping process for the less common terms as they come in. This would be a huge improvement over today's process.

Another benefit is that one could leverage other organizations' mappings. If those are posted on the web in RDF, they can easily be imported and used. One would need additional metadata, such as provenance information, to help determine whether the mapping is reliable. We all do this now manually pretty routinely when evaluating information on the web. I'm more likely to trust a news report from www.cnn.com than from nationalenquirer.com, for example. An organization could develop a list of "trusted sources" for mapping information, or could conduct multiple searches, using different mappings from different sources, to see how they affect the search results.

The possibilities boggle the mind.


Observations, Assessments, and BRIDG

In a previous post, I discussed my view of how clinical data are generated and used (see Modeling Clinical Data). I discussed the differences between an observation and an assessment. This is an important distinction in clinical medicine.

An observation is a measure of the physical, physiological, or psychological state of an individual. It can be subjective (i.e. patient reported) or objective (reported by a provider or a device). Observations are reported without bias (as much as possible) and without interpretation by the observer. A serum potassium level of 5.5 mEq/L is an example of an objective observation. A pain score of 1, on a scale of 0-3, is an example of a subjective observation.

Observations are used as inputs to assessments. The assessment represents an assessor's interpretation or analysis of the observations. The result of an assessment is generally a medical condition and its properties (e.g. severity, change from last assessment). I would like to say "...is always a medical condition..." but I try never to say never or always, because invariably an exception emerges. Because an assessment includes an assessor's interpretation, bias can be a problem. The same set of observations can be, and not infrequently is, interpreted differently by different assessors. Formal adjudication processes are sometimes put in place in clinical trials to minimize this type of bias. In health care, second opinions are quite commonly solicited to seek a better assessment; one that is closer to the truth.

As a simple example, let's take the serum potassium of 5.5 mEq/L. To do a proper assessment, more information is needed. What is considered the normal range for the laboratory that conducted the test (e.g. 3.0-4.5 mEq/L)? Are there other clinical observations suggesting clinical hyperkalemia (e.g. EKG findings)? Is the patient on medications or does the patient have a known medical condition that can cause hyperkalemia (does the finding make sense)? Could this be due to hemolysis of the sample (a common cause of falsely elevated potassium measurements; it may require calling the lab and getting missing information about the biospecimen)? Could it be laboratory error (is a repeat measurement necessary)? Depending on the assessment, the assessor may determine that a new medical condition, hyperkalemia, is indeed present, may need to collect additional observations to determine its cause, and may need to order an intervention to bring the level down. In this example, the patient was recently placed on an angiotensin converting enzyme (ACE) inhibitor for the treatment of hypertension. ACE inhibitors are associated with hyperkalemia. The hyperkalemia was an adverse event related to ACE inhibitor use.
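To illustrate how an assessment consumes an observation plus contextual information, here is a toy rule in Python. The function, its inputs, and its decision rules are all illustrative assumptions; a real assessment weighs many more observations (EKG findings, medications, repeat measurements) and is made by a qualified clinician.

```python
def assess_potassium(value, normal_low, normal_high, hemolyzed):
    """Toy assessment rule for a serum potassium observation.
    `value` is in the lab's reporting units; the rule set is a
    deliberately simplified sketch, not clinical decision support."""
    if hemolyzed:
        # Hemolysis can spuriously elevate potassium: repeat, don't diagnose
        return "repeat test: possible spurious elevation from hemolysis"
    if value > normal_high:
        return "hyperkalemia"
    if value < normal_low:
        return "hypokalemia"
    return "no condition identified"

# The hemolyzed specimen triggers a repeat rather than a diagnosis
print(assess_potassium(5.5, 3.5, 5.0, hemolyzed=True))
```

The key point the sketch captures is that the same observed value yields different assessment results depending on context.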

An important clinical distinction between observation results and assessment results is that only assessment results get treated and tracked on a patient's problem list. As a medical student, it was ingrained into me: "never treat the lab test or the x-ray; always treat the patient."

Another important conclusion is that an Adverse Event is a Medical Condition; a special type of medical condition: one that is temporally associated with some medical intervention. In this example, the intervention was the administration of an ACE inhibitor. 

So when I look at BRIDG 4.0, I don't see the distinction between observations and assessments. In fact, the results of assessments are modeled as other observations. Specifically, PerformedObservationResult is a generalization of AdverseEvent in the model. I believe this is incorrect. Furthermore, the BRIDG definition of an Adverse Event is:

Any unfavorable and unintended sign, symptom, disease, or other medical occurrence with a temporal association with the use of a medical product, procedure or other therapy, or in conjunction with a research study, regardless of causal relationship. 

I disagree with this definition. A sign or symptom is an observation and, for the reasons I state here, is not an adverse event. I would modify the definition to read:

Any unfavorable and unintended disease or other medical condition with a temporal association with the use of a medical product, procedure or other therapy, or in conjunction with a research study, regardless of causal relationship.

There are other BRIDG classes that have this same issue (e.g. PerformedMedicalConditionResult). I don't attempt to provide a comprehensive list here. 

In discussions with the BRIDG modeling team, my understanding is that observation results and assessment results are handled the same way from a data management perspective, so the current modeling paradigm works in that respect. They propose developing a higher, conceptual presentation layer that draws the distinction between observations and assessments without necessarily changing the underlying model. I am not a modeler, so I don't know if this is the right approach. I'm certainly willing to explore what a more subject-matter-expert-friendly presentation layer for BRIDG might look like and how it would address my concern. But I do have an underlying unease that these two very different clinical concepts, observations and assessments, can be collapsed in this way in an information model without adverse consequences downstream from a computational perspective.

I welcome other thoughts on this issue. 


Using the RDF to Generate (and Validate) an SDTM Demographics Domain

In a previous post, I discussed how the Resource Description Framework could be used to improve the management of biomedical knowledge. That discussion was theoretical, and it was difficult to appreciate how RDF could be used in the short term to provide immediate value. RDF, or any other technology for that matter, will not be adopted or implemented unless it can solve real problems, especially in the short term.

Here I discuss a simple, practical application of the RDF to demonstrate how it can solve a real problem now. In another prior post, I described that the ideal role of the SDTM is a standard report from a database that is used for analysis. This example automates the creation of an SDTM demographics domain from an RDF database (called a triple store). First, I create a simple ontology of a study. Then, I use it to generate sample study data in RDF. I then store the ontology and the data in a simple RDF triple store database (a "knowledgebase") and use SPARQL (the RDF query language) to query the database and generate an SDTM demographics domain. I also discuss how this strategy can be used to validate the data. I used a commercial, off-the-shelf (COTS) product: TopBraid Composer. The RDF file used for this exercise is available for download in Turtle format.

First I created a mini study ontology, containing only the classes and properties needed for this small exercise. You'll recognize many of the classes from BRIDG and the SDTM. I added a new class called SDTM_Domain which will contain a resource for each instance of an SDTM domain/dataset.

Mini Study Ontology - Classes

I then created the properties. First are the object properties (in blue), those that relate two classes to each other, then the datatype properties (in green), those that describe the data:

Mini Study Ontology - Properties

For example, the :conductedAt property relates the Study_Site class with the Country class. It enables asserting that Site 0001 was conducted in Germany, for example. These relationships are captured as Domain and Range information for each property using the standard rdfs:domain and rdfs:range properties. Another example is that the :age property has the :Subject class as its domain and the datatype xsd:integer as its value (i.e. range).

Using this ontology, I populated the triple store using dummy instance data for 10 subjects (you'll see the number 10 next to :Study_Subject in the diagram above to indicate the database has 10 study subject instances). Similarly I entered 4 instances of study sites, 2 instances of Sex (male, female), etc.

Finally, I created a single instance of an SDTM demographics domain, calling it :demographics:
Demographics domain Resource
As a property of this resource, I selected a standard property, "spin:query", from the SPIN (SPARQL Inferencing Notation) ontology to embed the SPARQL query in a triple in the database. My understanding is that the instructions on how to generate the DM dataset (written in SPARQL) thereby become part of the knowledgebase. I need experts to confirm this is correct. Here is what the SPIN query looks like.

SPARQL Query to generate DM Domain
So when I run the query, I get the DM domain of the study as a report out of the knowledgebase.

DM Domain generated from RDF data
The fact that I was able to do this having very limited technical experience speaks, I think, to the power and simplicity of this technology. Another benefit is that the knowledgebase is perfectly functional to generate one domain now, but can easily be expanded iteratively over time to generate all domains from RDF study data. A further advantage is that validation rules in RDF can be added to the knowledgebase to enable reasoning that identifies validation errors. In fact, the existing triple in this knowledgebase,

             :age rdfs:range xsd:integer.

which basically says permissible values for age are integers, represents an executable validation rule in the knowledgebase. If one enters a non-integer value for age, a reasoner can identify and surface the contradiction, which is essentially a validation report. More complex validation rules can be constructed in RDF, for example for positive integers, cardinality constraints, value set constraints, etc.
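As a rough illustration of both ideas, generating a DM-like report and enforcing a range rule, here is a pure-Python sketch in which the triple store is emulated with a list of tuples. The property names, subject identifiers, and values are invented and do not come from the actual ontology.

```python
# Sketch: a tiny in-memory "triple store" of subject data, a query that
# emits DM-like rows, and a type check analogous to rdfs:range validation.
# All property names, subject IDs, and values here are illustrative.
triples = [
    (":subj01", ":sex", "M"), (":subj01", ":age", 34),
    (":subj02", ":sex", "F"), (":subj02", ":age", "forty"),  # invalid value
]

def dm_rows(store):
    """Gather per-subject values into DM-like rows (USUBJID, SEX, AGE)."""
    subjects = {}
    for s, p, o in store:
        subjects.setdefault(s, {})[p] = o
    return [{"USUBJID": s, "SEX": v.get(":sex"), "AGE": v.get(":age")}
            for s, v in sorted(subjects.items())]

def validate_age(store):
    """Flag :age values that violate the declared integer range."""
    return [(s, o) for s, p, o in store
            if p == ":age" and not isinstance(o, int)]

print(dm_rows(triples))
print(validate_age(triples))
# -> [(':subj02', 'forty')]
```

A reasoner over real RDF would surface the same contradiction from the rdfs:range triple itself, without a hand-written check.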

As a next step, it would be useful to redo this example using the publicly available CDISC Foundational Standards in RDF specification. I was going to do this but haven't gotten around to it.

Longer term, these datasets should be "generate-able" from a BRIDG ontology. I believe an OWL representation of BRIDG will pave the way to generate even more useful reports for analysis (see another post: BRIDG as a Computable Ontology). Then one could populate the knowledgebase with, or incorporate by reference, more and more validation rules and even FDA policy statements expressed in the RDF and automate the ability not only to detect invalid data, but also data that doesn't conform to FDA study data submission policies as described in guidances and regulations.

In summary, the RDF provides the capability to implement practical solutions now that offer an alternate mechanism to automatically generate SDTM datasets using simple COTS tools, while at the same time providing the flexibility to increase the capabilities of the knowledgebase to support more and more solutions, such as:
  1. Generate all SDTM datasets 
  2. Validate the data 
  3. Determine conformance with FDA submission policies
  4. Generate more useful views/reports for analysis

Once this capability in the knowledgebase is fully developed, one would not need to exchange the tabular reports but could exchange the RDF data themselves. Or better yet, given the distributed or "linked data" capabilities of the semantic web, the recipient could simply be granted access to the RDF data on the web.