Deciding When to Change the SDTM

I frequently come across proposals to change or upgrade the SDTM. Experience has shown that changes to the SDTM are a big deal. Even minor changes cause a ripple effect that requires substantial resources (e.g. time and money) for stakeholders to implement, and it is often years before a change is fully implemented industry-wide. A change typically triggers updates to the model and the implementation guide (IG), plus significant change management costs: training, upgrades to information systems and tools, and revised business processes. Here we are, well into the second decade of the SDTM lifecycle, and what the industry needs most is stability in the standard. We need to be creative in accommodating new requirements without changing the standard, and we need to avoid changes that are not absolutely necessary. Alternatively, we need to take a critical look at the underlying model and make major changes (SDTM 2.0?) that will, one hopes, result in a more stable standard (see my previous post). Some change is inevitable, but change that can be accomplished through other means should be pursued through those means. Stakeholders need a stable standard.

So here I propose an approach to limit certain types of changes. But first, I discuss fundamental principles about exchange and analysis standards. These principles explain the reasoning behind my suggested approach.

First of all, let's consider what an exchange standard and an analysis standard are. Here are some working definitions:

Exchange Standard
  • A standard way of exchanging data between computer systems.  It describes the information classes/attributes/variables/data elements, and relationships necessary to achieve the unambiguous exchange of information between different information systems (interoperability).

Analysis Standard
  • A standard presentation of the data intended to support analysis. It includes extraction, transformation, and derivations of the exchanged data.

Exchange standards exist to support interoperability. Here I use the HIMSS definition: "The ability of different information technology systems and software applications to communicate, exchange data, and use the information that has been exchanged."

Exchange and analysis standards meet very specific and different needs:
  • A good exchange standard promotes machine readability, understandability, and business process automation at the expense of end-user (human) readability.
  • A good analysis standard promotes human readability and use/analysis of the data with analysis tools. It maintains strict traceability with the exchanged data.

As an analogy, consider data as pieces of furniture. The exchange standard is the truck that moves the furniture from point A to point B. The size and configuration of the truck is dependent on the contents. The furniture is packed in a way so it can be moved efficiently. It cannot readily be used in its packed environment. The analysis standard describes how to organize the furniture at its destination to maximize its usefulness. For example, a dining room table goes in the dining room. A desk goes in the den, etc. Furniture may need to be assembled (i.e. transformed) from its packed state to its useful state. 

Because of the different use cases associated with each type of standard, it comes as no surprise that their requirements are very different. Often they are competing or contradictory. Here are some examples:

| Exchange Standard | Analysis Standard |
| --- | --- |
| No duplication of data/records (minimizes errors, minimizes file size) | Duplicated data (e.g. demographics and treatment assignment in each dataset) |
| No derived data (minimizes errors, avoids data inconsistencies, promotes data integrity) | Derived data (facilitates analysis) |
| Very standard structure (highly normalized; facilitates data validation and storage) | Variable structure to fit the analysis |
| Numeric codes for coded terms (machine readable; facilitates data validation) | No numeric codes (human understandable) |
| Multi-dimensional (facilitates creation of relational databases and reports of interest) | Two-dimensional (tabular), containing only the relationships of interest for a specific analysis (easier for analysis tools) |
| Standard character date fields (ISO 8601 ensures all information systems can understand them) | Numeric date fields (facilitate date calculations, intervals, time-to-event analyses) |
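The date-handling contrast in the last row can be made concrete. Here is a minimal Python sketch (the variable names and sample values are hypothetical, not from any real submission) showing an ISO 8601 character date, as exchanged, being converted to a numeric representation for analysis:

```python
from datetime import date

# An SDTM-style character date, exchanged as an ISO 8601 string
# (unambiguous and machine readable across information systems).
exchanged = "2014-03-15"

# For analysis, convert to a date object so that date arithmetic
# (intervals, time-to-event) becomes a simple subtraction.
visit_date = date.fromisoformat(exchanged)
reference_start = date.fromisoformat("2014-03-01")

# Study day relative to a reference start date (day 1 = the start date).
study_day = (visit_date - reference_start).days + 1
print(study_day)  # 15
```

The exchanged form optimizes for unambiguous transport; the numeric form optimizes for computation — exactly the tension the table describes.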

So where does the SDTM fall? It was originally designed as an exchange standard for study data submissions, and that is certainly its main purpose today. However, because similar data from multiple subjects are grouped together into domains, the datasets are suitable for simple analyses as well. In fact, FDA reviewers have been performing simple analyses using SDTM datasets for years. More recently, the FDA has developed standard analysis panels that use native SDTM datasets as input. So the answer is: SDTM is both an exchange standard and an analysis standard (for simple, standard analyses; ADaM data are used for complex and study-specific analyses).
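To illustrate the kind of simple analysis that native SDTM datasets support, here is a short Python sketch (the records below are hypothetical, shaped like an SDTM AE domain) that counts adverse events by severity straight off the exchanged structure:

```python
from collections import Counter

# Hypothetical adverse-event (AE) records in SDTM-like form:
# one record per reported event, with standard variable names.
ae = [
    {"USUBJID": "001", "AETERM": "HEADACHE", "AESEV": "MILD"},
    {"USUBJID": "001", "AETERM": "NAUSEA",   "AESEV": "MODERATE"},
    {"USUBJID": "002", "AETERM": "HEADACHE", "AESEV": "MILD"},
]

# A simple descriptive summary requires no transformation at all.
severity_counts = Counter(rec["AESEV"] for rec in ae)
print(severity_counts["MILD"])  # 2
```

Summaries like this work directly on the exchange structure; it is only the more complex, study-specific analyses that require the derivations ADaM provides.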

Here is the important observation from the above discussion. Since the requirements for exchange and analysis are different, SDTM is being pulled in two directions and cannot perform either role optimally. We must choose which role is more important. How often have we heard the discussion: should SDTM have derived data? Well, it depends on which use case you think is more important. As an analysis standard, of course it should have derived data. As an exchange standard, this is generally not a good idea.

In an ideal world, the SDTM as an exchange standard would be retired and replaced by a next-generation exchange standard based on a more robust relational data model that strictly meets data exchange requirements (e.g. BRIDG+XML, HL7 FHIR, RDF/OWL). The data would come in, be loaded and stored in a data warehouse, and tools would transform the data into useful, manageable chunks for the analysts. This future state would free the existing SDTM to take on more analysis- and end-user-friendly features (e.g. non-standard variables in parent domains, numeric dates, demographic/treatment arm information in each domain), but these enhanced SDTM views would be generated by tooling.

The reality is that we do not live in an ideal world, and SDTM as an exchange standard is here to stay for the foreseeable future. Therefore, for the foreseeable future, changes to the SDTM should stress data exchange requirements. Changes to make SDTM more end-user friendly for analysis should be resisted, as they invariably make data exchange more complicated and costly. Requirements to make SDTM data more user-friendly should instead be met with better tooling on the recipient's side to perform the post-submission data transformations needed for analysis. Going back to the furniture analogy: it's not efficient to ship the dining table and chairs in the exact arrangement they will occupy at the destination; rather, we need more and better furniture unpackers at the destination.
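One way such recipient-side tooling might work can be sketched in a few lines of Python. The domain names below follow SDTM conventions, but the data values and the `analysis_view` function are hypothetical: the idea is that a denormalized, analysis-friendly view is generated on demand by joining demographics (DM) into a findings domain, rather than duplicating subject-level variables in the exchanged files.

```python
# Exchange-friendly input: DM holds one record per subject,
# so subject-level variables are not duplicated across domains.
dm = [
    {"USUBJID": "001", "AGE": 34, "SEX": "F", "ARM": "Placebo"},
    {"USUBJID": "002", "AGE": 51, "SEX": "M", "ARM": "Drug A"},
]

# Vital-signs findings (VS): one record per measurement.
vs = [
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VSSTRESN": 118},
    {"USUBJID": "002", "VSTESTCD": "SYSBP", "VSSTRESN": 135},
]

def analysis_view(findings, demographics):
    """Attach subject-level variables to each findings record,
    producing the denormalized view an analyst wants."""
    by_subject = {rec["USUBJID"]: rec for rec in demographics}
    return [{**rec, **by_subject[rec["USUBJID"]]} for rec in findings]

view = analysis_view(vs, dm)
print(view[0]["ARM"])  # Placebo
```

The join happens at the destination, on demand; the exchanged datasets stay normalized and lean — the "furniture unpacker" rather than pre-arranged furniture.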

So, I propose the following algorithm to evaluate proposed structural changes to the SDTM (i.e. those that do not add new content). As always, I welcome your thoughts.