Monday, May 9, 2011

Some thoughts on Canonical Pedigrees

One of my colleagues is working on the HL7 Canonical Pedigree Project.  The point of this project is to develop reference content that could be used to test various representations of the pedigree.

One of the interesting challenges in pedigree representation is being able to look at the genetic information from the perspective of different probands.  Being able to look at a genetic history from different perspectives allows a variety of analysis techniques to be used.

In order to represent the family tree, the HL7 Pedigree model allows for two persons to be represented with a coded relationship between them.  The coded relationship comes from the HL7 Family Relationship Role Type vocabulary.  The essential model is that the patient is related to (at least) one other person, who could in turn be related to other persons, et cetera.

There is no requirement for the "pedigree" graph produced to use any specific relationship (e.g., parent, sibling, spouse), unlike what would be found in a "historical pedigree" such as this one.  The vocabulary allows for exact (natural mother) and inexact (mother) relationships to be represented.  When this sort of vocabulary is used to represent relationships, changing the viewpoint from one subject to another can be challenging.

I argued with my colleague that any canonical representation of a pedigree needs to also include a canonical representation of the relationships, and that the current vocabulary doesn't help at all, since it has a focal point that isn't really reversible.  Changing the proband requires changing the direction of relationships and associated vocabulary.

23andMe has a great video describing the cousin relationship, which helped me work through some of this.  If you want to know whether someone is your Nth cousin, and at how many removes they are, there's a simple answer.  Cousins share a common grandparent (or great-grandparent, et cetera), but not common parents (that would be sibling, nephew, aunt, et cetera).  To find the degree, count the number of greats between each party and their common grandparent.  Take the smaller number and add one.  If you and I share a common grandparent, then the number of greats for both of us is 0, to which I add one, and discover we are first cousins.  Now, suppose it's my grandparent, but your great-grandparent.  To find the number of removes, you are looking at the difference in generations.  Simply take the difference in the "great" count: here that is 1, so we are first cousins once removed.
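
The counting rule can be sketched in a few lines of Python.  This is only an illustration: the function name cousin_relationship and its arguments are made up for this post, and it takes the smaller of the two "great" counts for the degree, which is the usual genealogical convention (my grandparent and your great-grandparent make us first cousins once removed).

```python
from typing import Tuple


def cousin_relationship(greats_a: int, greats_b: int) -> Tuple[int, int]:
    """Return (degree, removes) for two people sharing a *-grandparent.

    greats_a and greats_b are the number of "greats" in front of
    "grandparent" for each person: 0 for a grandparent, 1 for a
    great-grandparent, and so on.
    """
    if greats_a < 0 or greats_b < 0:
        raise ValueError("the count of greats cannot be negative")
    degree = min(greats_a, greats_b) + 1   # first, second, third cousin...
    removes = abs(greats_a - greats_b)     # difference in generations
    return degree, removes


print(cousin_relationship(0, 0))  # shared grandparent
print(cousin_relationship(0, 1))  # my grandparent, your great-grandparent
```

The first call describes plain first cousins, the second first cousins once removed.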


The unifying principle that I worked out is this:
In order to canonically represent relationships, you must only represent those between ancestors and their offspring.  You can take it in either direction, dealing with "begat" or "was begatted by" as the preferred direction.

In the canonical form, then, you wouldn't need to represent the "cousin" relationship directly.  Instead, you'd relate the two subjects to their common *-grandparent.

We can construct a vocabulary to do that.  Just use F to represent natural father, and M to represent natural mother.  FF becomes my father's father, and FM my father's mother.  The vocabulary can be modified to be more concise.  Whenever F or M repeats, just put the repeat count following it.  FF could be represented as F2.
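
As a rough sketch of the shorthand, here is one way to collapse repeated letters into counts and expand them again.  The function names compress and expand are my own invention for illustration, and I'm assuming a count is only written when a letter repeats (so a single F stays F, never F1).

```python
import re
from itertools import groupby


def compress(path: str) -> str:
    """Collapse runs of the same letter: 'FFM' becomes 'F2M'."""
    if not re.fullmatch(r"[FM]+", path):
        raise ValueError("path must be a non-empty string of F and M")
    parts = []
    for letter, run in groupby(path):
        count = sum(1 for _ in run)
        parts.append(letter if count == 1 else f"{letter}{count}")
    return "".join(parts)


def expand(code: str) -> str:
    """Reverse of compress: 'F2M' becomes 'FFM'."""
    if not re.fullmatch(r"(?:[FM](?:[1-9]\d*)?)+", code):
        raise ValueError("not a valid compressed path")
    # Each token is a letter followed by an optional repeat count.
    tokens = re.findall(r"([FM])(\d*)", code)
    return "".join(letter * int(count or 1) for letter, count in tokens)


print(compress("FF"))      # father's father
print(expand("F2MF2"))
```

Reading left to right from the subject, FM still means "my father's mother" in either form.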

To deal with ambiguity of the ancestry (if X is my first cousin, is it through my father's or mother's side?), we could simply use P to represent parent.  So, if X is my first cousin, we have a common grandparent, Y.  X is related to Y by P2, and I'm related to Y by P2.  If you don't know how far back the relationship goes, you could use the + operator.  My cousin (first, second, third or more) and I would be related to a common ancestor Y using P+.
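
One way to think about P and + is as a pattern that a fully specified F/M path either satisfies or doesn't.  The sketch below translates a pattern into a regular expression; the name matches and the exact grammar (a letter optionally followed by a count or a +) are assumptions on my part, not anything defined by HL7.

```python
import re

# A token is F, M or P, optionally followed by "+" or a repeat count.
TOKEN = re.compile(r"([FMP])(\+|[1-9]\d*)?")
VALID = re.compile(r"(?:[FMP](?:\+|[1-9]\d*)?)+")


def pattern_to_regex(pattern: str):
    """Translate e.g. 'P2' into a compiled regex over concrete F/M paths."""
    if not VALID.fullmatch(pattern):
        raise ValueError(f"not a valid relationship pattern: {pattern!r}")
    pieces = []
    for letter, repeat in TOKEN.findall(pattern):
        step = "[FM]" if letter == "P" else letter  # P stands for either parent
        if repeat == "+":
            pieces.append(step + "+")                # one or more generations
        elif repeat:
            pieces.append(step + "{" + repeat + "}")  # exact generation count
        else:
            pieces.append(step)
    return re.compile("".join(pieces))


def matches(pattern: str, path: str) -> bool:
    """True if the concrete path (only F and M) satisfies the pattern."""
    return pattern_to_regex(pattern).fullmatch(path) is not None


print(matches("P2", "FM"))   # father's mother is a grandparent
print(matches("P+", "MFM"))  # some ancestor, depth unknown
print(matches("F2", "FM"))   # father's mother is not father's father
```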

The nice thing about using this form for a canonical relationship is that it doesn't matter who the proband is: the set of relationships in the pedigree doesn't have to be modified when the subject changes.
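
To make that concrete, here is a small sketch that stores only child-to-parent links and derives the paths to common ancestors for any pair of people.  The family in PARENTS and the function names are entirely hypothetical; the point is just that the stored links stay the same no matter which person is treated as the proband.

```python
from typing import Dict, Tuple

# Hypothetical family: each person maps to their known natural parents.
PARENTS: Dict[str, Dict[str, str]] = {
    "me": {"F": "dad", "M": "mom"},
    "dad": {"F": "grandpa", "M": "grandma"},
    "uncle": {"F": "grandpa", "M": "grandma"},
    "cousin": {"F": "uncle"},
    "cousin_kid": {"M": "cousin"},
}


def ancestor_paths(person: str, parents: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each known ancestor of person to the shortest F/M path reaching them."""
    paths: Dict[str, str] = {}
    stack = [(person, "")]
    while stack:
        current, path = stack.pop()
        for role, parent in parents.get(current, {}).items():
            new_path = path + role
            if parent not in paths or len(new_path) < len(paths[parent]):
                paths[parent] = new_path
                stack.append((parent, new_path))
    return paths


def common_ancestors(a: str, b: str, parents: Dict[str, Dict[str, str]]) -> Dict[str, Tuple[str, str]]:
    """For each shared ancestor, give (path from a, path from b)."""
    paths_a = ancestor_paths(a, parents)
    paths_b = ancestor_paths(b, parents)
    return {anc: (paths_a[anc], paths_b[anc]) for anc in paths_a.keys() & paths_b.keys()}


# Either person can be the proband; PARENTS is never rewritten.
print(common_ancestors("me", "cousin_kid", PARENTS))
print(common_ancestors("cousin_kid", "me", PARENTS))
```

Swapping the proband only swaps the order of the two paths in each result.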


While this would seem to capture almost everything needed in a pedigree, there's one genetic relationship that isn't expressed.  See if you can discover which one.  The answer is in the comments below.

3 comments:

  1. There's one genetic relationship in which two subjects share more DNA than any other: Identical twins. To relate identical twins, just use I.

    Note, children share 50% of their DNA with each parent. Siblings can have as little as 0% to as much as 100% of their DNA shared, but average around 50%.

  2. My first reaction upon seeing the list was, "What? Don't these people watch movies? Where's Godfather?" :-)

    More seriously, this will introduce an interesting ethical conundrum. Normal social mores combine the social relationships with the genetic relationships. It is usually the same, e.g., social parent is the genetic parent, but it's often enough different to present difficulties. This is a major ethical issue with the use of genetic identification for unidentified war dead. It can reveal a lot of information that causes great pain for no good purpose.

  3. The degree to which nurture impacts our health, including social aspects, is even harder to quantify than what nature provides with respect to genetic material. I expect to see this data first divided, then reintegrated in subsequent generations. Ethnicity is part culture and part genetics and, either way, an important characteristic in many health events.
