2015-09-30

Restriction Classes

As I delve deeper into the world of RDF and OWL, I keep coming across one OWL class that turns out to be critical for computer-assisted reasoning: owl:Restriction. Here's my understanding of what this class is and why it matters so much for reasoning across biomedical data. We need to understand it and use it more in the data we collect and analyze.

In the semantic web, things are grouped into classes, or "groups of things" that share common properties. A member of a class is called an individual (or an instance). Classes can have subclasses (or subsets) whose members share additional properties with each other, but not necessarily with other members of the superclass. So the Apple class is a subclass of the Fruit class, since every Apple is a Fruit (but not vice versa).

So let's define two classes called Automobile and Corolla, and say that Corolla is a subclass of Automobile. In RDF this looks like:

:Automobile rdf:type owl:Class.
:Corolla rdf:type owl:Class.
:Corolla rdfs:subClassOf :Automobile. 

I use Turtle syntax throughout because it's easy for humans to read and understand.

So it turns out my friend Ruth has a Corolla that she calls "Tristan" (she's a big opera lover and names all her cars after Wagnerian characters). So in RDF this would be expressed as:

ruth:Tristan rdf:type :Corolla.

This simply says there is a resource called Tristan and it's of type :Corolla, i.e. a member of the :Corolla class. (I'm intentionally avoiding the namespace issue.)

A reasoner can analyze these triples and conclude that Tristan is an Automobile. The following triple (new knowledge) can be inferred:

ruth:Tristan rdf:type :Automobile. 

Now this is rather trivial new knowledge, but bear with me.
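For readers who want to see the mechanics, here is a minimal Python sketch of the one rule a reasoner applies in this example (the RDFS subclass rule, known as rdfs9). It assumes triples are stored as plain string tuples with prefixes left unexpanded, and the function name infer_subclass_types is hypothetical; a real reasoner works over full IRIs and applies many more rules.

```python
# A minimal sketch of the RDFS subclass rule (rdfs9):
# if ?c rdfs:subClassOf ?d and ?x rdf:type ?c, then ?x rdf:type ?d.
# Assumption: triples are (subject, predicate, object) string tuples and
# prefixes like ":" and "ruth:" are kept as literal text, not expanded IRIs.

def infer_subclass_types(triples):
    """Return the input triples plus every rdf:type triple rdfs9 derives."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears (a fixpoint)
        changed = False
        subclass_pairs = [(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"]
        typings = [(s, o) for (s, p, o) in inferred if p == "rdf:type"]
        for sub, sup in subclass_pairs:
            for individual, cls in typings:
                if cls == sub:
                    new = (individual, "rdf:type", sup)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

triples = {
    (":Automobile", "rdf:type", "owl:Class"),
    (":Corolla", "rdf:type", "owl:Class"),
    (":Corolla", "rdfs:subClassOf", ":Automobile"),
    ("ruth:Tristan", "rdf:type", ":Corolla"),
}

new_knowledge = infer_subclass_types(triples) - triples
print(new_knowledge)  # {('ruth:Tristan', 'rdf:type', ':Automobile')}
```

The only new triple is the one the reasoner infers above; nothing in these triples lets the loop say anything about a Corolla's defining properties.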

The limitation of this example is that if another resource comes along, let's call it Isolde, and someone out there has asserted that Isolde is an Automobile, the computer has no way of working out what type of automobile Isolde might be. The computer doesn't understand which property (or properties) of an automobile make it a Corolla, or any other type of automobile.

How does this apply to clinical trials? Replace "Automobile" with "Subject" and "Corolla" with "EligibleSubject" in my example. From the triples I asserted, I know that Tristan is an EligibleSubject in the trial simply because I said so (in reality, I manually analyzed his screening data and confirmed him to be eligible). But the computer doesn't know why; it just knows that he's an EligibleSubject because someone said so. When the next Subject, Isolde, comes along, the computer is clueless. Is she eligible too? You'd like the computer to be able to figure that out, right?

So how does owl:Restriction help? owl:Restriction allows us to define new classes based on the properties that individuals in that class share. Let's assume that the study in question is an oral contraceptive study in females: to be considered eligible, a Subject must be female. So in RDF we define a property called :sex and we assert:

:Isolde rdf:type :Subject.
:Isolde :sex :Female. 

These triples say Isolde is a Subject and she's female.

But how does the computer know she's eligible for the study? We create a new (nameless) class and "restrict" membership in that class to females only. Then we say that the members of that class are exactly the EligibleSubjects. Before we see how this looks in RDF, we need to review blank nodes.

RDF allows a "nameless" resource, called a blank node or bnode. It has no IRI (no global name), but it can still be the subject of triples. In Turtle it is written in square brackets that list the properties the blank node has. For example:

[ :sex :Female; 
     :wrote :WutheringHeights]

In plain English, a person would read this, and a computer would interpret it, as: "a nameless resource that is female and that wrote Wuthering Heights."
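Under the hood, the brackets are just shorthand. A parser mints a private label for the blank node and emits one ordinary triple per property, with the blank node as the subject. The Python sketch below imitates that step; the label scheme "_:b0", "_:b1", ... and the function expand_blank_node are my own assumptions, since every parser picks its own labels.

```python
import itertools

# Assumption: blank-node labels are "_:b" followed by a counter.
# Real Turtle parsers generate their own (equally arbitrary) labels.
_counter = itertools.count()

def expand_blank_node(pairs):
    """Turn the contents of [ p1 o1; p2 o2 ] into (label, list of triples)."""
    label = f"_:b{next(_counter)}"
    # One ordinary triple per property/value pair, all sharing the bnode subject.
    triples = [(label, prop, value) for prop, value in pairs]
    return label, triples

label, triples = expand_blank_node([(":sex", ":Female"),
                                    (":wrote", ":WutheringHeights")])
for t in triples:
    print(t)
# ('_:b0', ':sex', ':Female')
# ('_:b0', ':wrote', ':WutheringHeights')
```

The blank node ends up as the subject of both triples, even though nobody outside the document can refer to it by name.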

Now let's create a blank node that looks like this:

[a owl:Restriction;
     owl:onProperty :sex;
     owl:hasValue :Female]

In plain English: this is a nameless class defined by a restriction on the property :sex. Its members are exactly the resources that have :Female as a value of :sex. In other words, it defines the class of resources that are female.
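To make "members of this class" concrete: under owl:hasValue, an individual belongs to the anonymous class if at least one of its :sex values is :Female. Here's a minimal Python sketch, assuming triples are plain string tuples; the helper satisfies_has_value and the individual :Siegfried are hypothetical, added only for illustration.

```python
# A minimal sketch of owl:hasValue membership.
# Assumption: triples are (subject, predicate, object) string tuples.
# Hypothetical: the helper name and the individual :Siegfried.

def satisfies_has_value(individual, on_property, has_value, triples):
    """True if has_value is among the individual's values for on_property."""
    return (individual, on_property, has_value) in triples

data = {
    (":Isolde", "rdf:type", ":Subject"),
    (":Isolde", ":sex", ":Female"),
    (":Siegfried", "rdf:type", ":Subject"),  # hypothetical individual
    (":Siegfried", ":sex", ":Male"),
}

print(satisfies_has_value(":Isolde", ":sex", ":Female", data))     # True
# False only means "not derivable from these triples": OWL makes an
# open-world assumption, so missing data does not prove non-membership.
print(satisfies_has_value(":Siegfried", ":sex", ":Female", data))  # False
```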

So now one can assert the following triple in the protocol:

:EligibleSubject owl:equivalentClass [a owl:Restriction;
                                           owl:onProperty :sex;
                                           owl:hasValue :Female].                                                                            

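This says that being an EligibleSubject is equivalent to being female: every eligible subject is female, and every resource whose :sex is :Female is an eligible subject.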

So now when :Isolde comes along and the computer sees the following triple saying she's a female:

:Isolde :sex :Female. 

the computer can infer the following new triple:

:Isolde rdf:type :EligibleSubject. 
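Here's a rough sketch of the two steps behind that inference, loosely modeled on the OWL 2 RL rules cls-hv2 (owl:hasValue) and cax-eqc1/cax-eqc2 (owl:equivalentClass). It's a toy forward-chaining loop, not a real reasoner; "_:r" is a hypothetical label for the restriction's blank node, and the triples are plain string tuples.

```python
# Toy forward chaining over (subject, predicate, object) string tuples.
# "_:r" stands in for the blank node a parser would create for the restriction.

def infer(triples):
    """Apply the rules repeatedly until no new triple is produced."""
    kb = set(triples)
    while True:
        new = set()
        # cls-hv2: ?r owl:hasValue ?v, ?r owl:onProperty ?p, ?u ?p ?v => ?u rdf:type ?r
        for (r, p1, v) in kb:
            if p1 != "owl:hasValue":
                continue
            for (r2, p2, prop) in kb:
                if r2 == r and p2 == "owl:onProperty":
                    for (u, p3, v2) in kb:
                        if p3 == prop and v2 == v:
                            new.add((u, "rdf:type", r))
        # cax-eqc1 / cax-eqc2: membership flows both ways across equivalentClass
        for (c1, p, c2) in kb:
            if p == "owl:equivalentClass":
                for (x, p2, c) in kb:
                    if p2 == "rdf:type" and c == c1:
                        new.add((x, "rdf:type", c2))
                    if p2 == "rdf:type" and c == c2:
                        new.add((x, "rdf:type", c1))
        if new <= kb:  # nothing new: fixpoint reached
            return kb
        kb |= new

protocol_and_data = {
    (":EligibleSubject", "owl:equivalentClass", "_:r"),
    ("_:r", "rdf:type", "owl:Restriction"),
    ("_:r", "owl:onProperty", ":sex"),
    ("_:r", "owl:hasValue", ":Female"),
    (":Isolde", "rdf:type", ":Subject"),
    (":Isolde", ":sex", ":Female"),
}

derived = infer(protocol_and_data) - protocol_and_data
for t in sorted(derived):
    print(t)
# (':Isolde', 'rdf:type', ':EligibleSubject')
# (':Isolde', 'rdf:type', '_:r')
```

Note that the loop classifies anything whose :sex is :Female, whether or not it is a :Subject, because that's exactly what the equivalentClass axiom says. A real protocol definition would likely also require membership in :Subject.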

We've defined the class EligibleSubject in terms of a property its members share, and a computer can now identify new members of that class. If the computer had access to many individuals and their properties, on the web or in EHR systems, this approach could be used for recruitment, which is a hot topic in clinical trials at the moment.

This same strategy can be used in many settings. Consider these triples:

:Drug rdf:type owl:Class.
:EffectiveDrug rdf:type owl:Class.
:EffectiveDrug rdfs:subClassOf :Drug. 

We're asserting that Drug is a class and that EffectiveDrug is a class, which is a subclass of Drug.

By using owl:Restriction and its associated properties owl:onProperty and owl:hasValue, one can tell a computer which properties of a drug make it an effective drug. Computers can then help identify effective drugs for us. That would be a paradigm shift from how we do efficacy evaluations right now, but owl:Restriction makes it possible.

The possibilities are endless. One can imagine many classes such as PureDrug, ExpiredDrug, InvestigationalDrug, and MarketedDrug, each defined as an owl:Restriction class, allowing computers to automatically determine which class(es) any given drug belongs to.
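As a sketch of that idea, here's a small Python classifier that assumes each class is defined by a single owl:hasValue restriction. The properties and values (:regulatoryStatus, :expiryStatus, :Marketed, and so on) and the drugs :DrugA and :DrugB are hypothetical placeholders, not terms from any real ontology. A drug can land in several classes at once, just as described above.

```python
# A minimal sketch, assuming each class is defined by one owl:hasValue
# restriction, written here as class -> (onProperty, hasValue).
# All property, value, and drug names are hypothetical placeholders.

CLASS_DEFINITIONS = {
    ":ExpiredDrug": (":expiryStatus", ":Expired"),
    ":InvestigationalDrug": (":regulatoryStatus", ":Investigational"),
    ":MarketedDrug": (":regulatoryStatus", ":Marketed"),
}

def classify(individual, triples, definitions=CLASS_DEFINITIONS):
    """Return, sorted, every defined class whose restriction the individual meets."""
    return sorted(cls for cls, (prop, value) in definitions.items()
                  if (individual, prop, value) in triples)

drug_data = {
    (":DrugA", ":regulatoryStatus", ":Marketed"),
    (":DrugA", ":expiryStatus", ":Expired"),
    (":DrugB", ":regulatoryStatus", ":Investigational"),
}

print(classify(":DrugA", drug_data))  # [':ExpiredDrug', ':MarketedDrug']
print(classify(":DrugB", drug_data))  # [':InvestigationalDrug']
```

Keeping the definitions as data, rather than hard-coding one check per class, mirrors the point of owl:Restriction: adding a new class means adding a definition, not new logic.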

In a future post, I'll discuss how owl:Restriction can be used to manage study workflow; specifically how it can manage the sequence of study activities as described in the protocol, including support for branching and adaptive designs.