Thursday, November 4, 2010

The Storm inside my Head

A while back I mentioned that I'm a connector.  There must be something in the way my brain works, because information goes in from a number of diverse sources, and out comes sometimes a really good idea, and other times a really gnarly problem.

I was talking over the last week to someone who wanted to send information coming from a patient to an HIE.  Now, I could see how this information could easily be coming from a PHR.

Yesterday, I was updating the part of my book that talks about the XML in CDA, and how that XML could be encoded in UTF-8, EBCDIC, et cetera, and would still result in the same content.

Then John Moehrke published something about Signing CDA Documents (digitally).

A small thunderstorm then erupted in my head.  When you publish a CDA document to a repository, the repository computes a hash code, and sends that and the size of the document as metadata to the registry.  From the perspective of the standard, the content of a CDA document is the same no matter which character encoding its XML uses.  When you sign an XML document, it gets canonicalized (reduced to a fixed format where variations are possible) before it gets hashed.

So, XML can have two different byte representations for the EXACT same content.  The byte length and hash would be different for those byte streams, but the identity of the document is the same.
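To make that concrete, here is a minimal sketch in Python.  The toy document, the `encode` and `describe` helpers, and the choice of SHA-1 for the hash are my assumptions for illustration; a real CDA document would use the HL7 namespace and be far larger.

```python
import hashlib
import xml.etree.ElementTree as ET

# Toy stand-in for a CDA document (hypothetical, not a valid CDA instance).
BODY = "<ClinicalDocument><title>R\u00e9sum\u00e9 of care</title></ClinicalDocument>"


def encode(encoding: str) -> bytes:
    """Serialize the same XML content using the given character encoding."""
    declaration = '<?xml version="1.0" encoding="' + encoding + '"?>'
    return (declaration + BODY).encode(encoding)


def describe(data: bytes) -> tuple:
    """Return (size, hash) for a byte stream, computed over the raw bytes."""
    return len(data), hashlib.sha1(data).hexdigest()


def content(data: bytes) -> tuple:
    """Return what an XML-aware reader sees: the root tag and the title text."""
    root = ET.fromstring(data)
    return root.tag, root.findtext("title")


utf8_bytes = encode("UTF-8")
utf16_bytes = encode("UTF-16")

print(describe(utf8_bytes))
print(describe(utf16_bytes))
print(content(utf8_bytes) == content(utf16_bytes))
```

The two `describe` results differ in both size and hash, while `content` reports the same thing for both byte streams.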

Where this potentially causes a problem in an XDS registry is when a patient shares the same document with two different providers, and the solutions accessing the CDA content read it as XML and then push it back out instead of passing along the original stream of bytes.  There are good reasons why they would read the XML, because the content could identify useful metadata.

The challenge is the notion of identity, because from a CDA perspective (and in fact for any XML content), two different byte streams can have the same identity.

I think the solution to this is to alert folks who are submitting that:  If you have the original byte stream of the CDA document, you SHOULD send that instead of trying to recreate the byte stream from what was read.  That should reduce the chances of getting hash mismatch exceptions.

From a submitter perspective, if you submit a document and get back a fault due to a hash mismatch on documents with the same identity, the solution would be as follows:
Get the document that already existed (the reason the exception was thrown).  Canonicalize it and the document that was attempted to be submitted.  Compare the two.  If they are equal, resubmit using the document that already existed.
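The recovery steps above can be sketched roughly as follows.  This is only an illustration: `canonical_form` and `choose_bytes_to_resubmit` are hypothetical names, fetching the existing document from the repository is left out, and I am assuming the C14N 2.0 canonicalization in Python's standard library (3.8 or later) is close enough for the comparison.  A signing profile may call for a different canonicalization method.

```python
import io
import xml.etree.ElementTree as ET


def canonical_form(data: bytes) -> str:
    """Canonicalize an XML byte stream (C14N 2.0 via the standard library)."""
    return ET.canonicalize(from_file=io.BytesIO(data))


def choose_bytes_to_resubmit(existing: bytes, attempted: bytes):
    """Return the bytes to resubmit, or None if the documents really differ.

    existing:  the byte stream already held by the repository
    attempted: the byte stream whose submission failed on hash mismatch
    """
    if canonical_form(existing) == canonical_form(attempted):
        # Same content, different bytes: reuse the stored bytes so the
        # size and hash match what the registry already has.
        return existing
    return None


# The same toy content serialized two ways: different encoding,
# different attribute order.
existing = '<?xml version="1.0" encoding="UTF-8"?><doc a="1" b="2">text</doc>'.encode("UTF-8")
attempted = '<?xml version="1.0" encoding="UTF-16"?><doc b="2" a="1">text</doc>'.encode("UTF-16")
changed = '<?xml version="1.0" encoding="UTF-8"?><doc a="1" b="2">other</doc>'.encode("UTF-8")

print(choose_bytes_to_resubmit(existing, attempted) is existing)
print(choose_bytes_to_resubmit(existing, changed) is None)
```

Returning `None` when the canonical forms differ keeps the caller from silently overwriting a document that really is different.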

The reason for this particular issue is a layer mismatch: the notion of identity of content sits in a different layer than the storage of it. Borrowing from the OSI model, XML is really the presentation layer used for the application data, and the byte stream is really at the network layer (sequences of bytes).

This particular brainstorm isn't really a big issue for CDA and XDS (see the solution proposed above), but it does bring into question current thinking about the identity and equivalence of data, especially as we move into more XML formats.
