Saturday, August 29, 2015

Stupid XSLT Tricks for OID and UUID recognition

I'm building a FHIR to CDA translator to convert a FHIR Composition in a Bundle to a CDA Document.  One of my challenges is recognizing identifiers that are already in OID or UUID form.

This is a simplified token matching problem.
A UUID is in the form ########-####-####-####-############, where each # is one of the hexadecimal digits in [0-9a-fA-F].  To test for this, I can take the string, translate all hex digits into # characters, and then test for a match to the form.  This test can be used in a choice as follows:

<xsl:when test="translate($value,'0123456789abcdefABCDEF','######################') = '########-####-####-####-############'">
  <!-- ... stuff to do when $value is a UUID -->
</xsl:when>
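
As a minimal sketch, the UUID test could be wrapped in a reusable named template.  The template name is-uuid and its value parameter are hypothetical names chosen for illustration.  The extra not(contains($value,'#')) guard is an addition, not part of the test above: translate leaves a literal # untouched, so a string that already consists of # characters in the canonical 8-4-4-4-12 layout would otherwise be accepted.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: emits 'true' when $value has the shape of a UUID. -->
  <xsl:template name="is-uuid">
    <xsl:param name="value"/>
    <xsl:choose>
      <!-- Guard (an addition): reject input that already contains '#'.
           Then map all 22 hex digit characters to '#' and compare shapes. -->
      <xsl:when test="not(contains($value,'#')) and
                      translate($value,'0123456789abcdefABCDEF','######################')
                      = '########-####-####-####-############'">true</xsl:when>
      <xsl:otherwise>false</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The second and third arguments to translate must be the same length (22 characters here).  If the replacement string is shorter, XPath 1.0 deletes the unmatched characters instead of mapping them, and the comparison quietly stops working.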

Handling OIDs is a little more difficult.  The pattern there is number[.number]*, where number matches the pattern 0|[1-9][0-9]* (ensuring no leading zeros in the number).

First off, we can reject anything that is not solely made up of digits or the . character.  That's an easy task for translate again.  The expression translate($value,'0123456789.','') removes every digit and . from $value, so the result is the empty string exactly when $value contains nothing but those characters.

We also need to make sure that the OID neither starts with, nor ends with a . character.  The first just uses not(starts-with($value,'.')).  It would be nice if XSLT Version 1.0 supported ends-with, but it doesn't.  So we have to find the last character using substring, and check to see that it isn't a . character.  That expression is substring($value,string-length($value))!='.'.
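
The same substring idea can be generalized to suffixes longer than one character.  The following is a sketch only, and the template name ends-with and its parameters value and suffix are hypothetical:

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: emits 'true' when $value ends with $suffix. -->
  <xsl:template name="ends-with">
    <xsl:param name="value"/>
    <xsl:param name="suffix"/>
    <xsl:choose>
      <!-- Take the last string-length($suffix) characters of $value
           and compare them with $suffix. -->
      <xsl:when test="substring($value,
                        string-length($value) - string-length($suffix) + 1)
                      = $suffix">true</xsl:when>
      <xsl:otherwise>false</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

When $suffix is longer than $value, the start position is zero or negative, substring returns all of $value, and the comparison fails as it should because the lengths differ.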

Next, we need to make sure that no sequence of digits starts with 0 except the single-digit sequence 0 itself.  Let's create a new string called testValue as follows:

<xsl:variable name="testValue" select="translate($value,'123456789','#########')"/>

If testValue contains a .0#, then we have a problem, because it contains a number with a leading 0. But we need to go a bit further than that, because two leading zeros are also a problem, so we need to check to see if it contains .00.  That also catches three or more leading zeros, so we've solved that case.  Oh, and we need to check for the case where the first number contains leading zeros, as it won't have a preceding '.'.  We could either check that one separately, or we could force testValue to contain a leading ., and that would let us reuse the previous test.

Leading to this test for OIDs (the extra check for '..' rejects an empty number between two dots):
<!-- testValue must be declared before the enclosing xsl:choose -->
<xsl:variable name="testValue"
  select="translate(concat('.',$value),'123456789','#########')"/>
<xsl:when test="
    string-length(translate($value,'0123456789.',''))=0 and
    not(contains($value,'..')) and
    not(contains($testValue,'.0#') or contains($testValue,'.00')) and
    not(starts-with($value,'.')) and
    substring($value,string-length($value))!='.'">
  <!-- ... stuff to do when $value is an OID -->
</xsl:when>
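
For trying the OID test in isolation, here is a sketch of a complete stylesheet that can be run against any XML input.  The template name is-oid, its value parameter, and the two sample strings are illustrative assumptions.  The string-length($value) &gt; 0 condition is an addition: without it, an empty string satisfies every other condition.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: emits 'true' when $value looks like an OID. -->
  <xsl:template name="is-oid">
    <xsl:param name="value"/>
    <!-- Prefix a '.' so the first number is checked like all the others,
         then map the non-zero digits to '#'. -->
    <xsl:variable name="testValue"
      select="translate(concat('.',$value),'123456789','#########')"/>
    <xsl:choose>
      <xsl:when test="
          string-length($value) &gt; 0 and
          string-length(translate($value,'0123456789.',''))=0 and
          not(contains($value,'..')) and
          not(contains($testValue,'.0#') or contains($testValue,'.00')) and
          not(starts-with($value,'.')) and
          substring($value,string-length($value))!='.'">true</xsl:when>
      <xsl:otherwise>false</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Exercise the helper on two sample strings, ignoring the input document. -->
  <xsl:template match="/">
    <xsl:call-template name="is-oid">
      <xsl:with-param name="value" select="'2.16.840.1.113883'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
    <xsl:call-template name="is-oid">
      <xsl:with-param name="value" select="'00.1'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The first sample becomes .#.##.#40.#.###### in testValue, which contains neither .0# nor .00, so it is accepted.  The second becomes .00.#, which contains .00, so it is rejected.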

Using translate to match character classes can also help with other test patterns, for example matching dates, phone numbers, etc., without needing to rely on an external regular expression library (though such libraries do exist; see EXSLT).
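
As one illustration of the same trick on another pattern, here is a sketch that checks whether a string has the shape YYYY-MM-DD.  The template name looks-like-date and the choice of date layout are assumptions made for the example.  It tests the shape only: a value such as 2015-99-99 passes, so range checks on month and day would have to be added separately.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: emits 'true' when $value is shaped like YYYY-MM-DD. -->
  <xsl:template name="looks-like-date">
    <xsl:param name="value"/>
    <xsl:choose>
      <!-- Reject literal '#' first, since translate would leave it unchanged.
           Then map every digit to '#' and compare against the expected shape. -->
      <xsl:when test="not(contains($value,'#')) and
                      translate($value,'0123456789','##########')
                      = '####-##-##'">true</xsl:when>
      <xsl:otherwise>false</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```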

You have to be careful to get this kind of matching right.  You can see the evolution of my OID pattern, which, if I hadn't written it out, might very well have let patterns like 00.1 through incorrectly.

When I use patterns like these in code targeted for production use, I'm very careful to document what the code is doing, because it sure as hell isn't obvious.  If you use these tricks, do the same for the poor slob who has to maintain your code after you have moved on.

     Keith

P.S.  Why is recognizing OID or UUID important in FHIR translations?  I'll leave that to your imagination until I cover the bigger challenge (FHIR to CDA) in detail.
