Wednesday, February 08, 2006

XSLT performance when mapping large documents in BizTalk

Recently I had to map a document with several thousand rows. I could not split the document up front, because its nodes had to be sorted before any split was possible.

With files this large, you generally test with a small subset to avoid waiting for the map to complete. I built an XSLT that worked great on the subset, or so I thought.

When you use a select filter such as "not(KeyValue=preceding-sibling::row/KeyValue)", every row is compared against all the rows before it, so the performance hit grows steeply as the document gets larger. My map went from 2 seconds for 50 rows to 10 minutes for a few thousand.
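For illustration, here is a minimal sketch of the kind of stylesheet that runs into this problem. It is not my original map: it assumes a simplified input with a top element containing row elements, no namespaces, and a KeyValue child on each row.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="xml" indent="no" />

    <xsl:template match="/">
        <Rows>
            <!-- Assumed simplified input: /top/row, no namespaces. -->
            <!-- Keeps only the first row for each distinct KeyValue. -->
            <!-- The predicate walks all preceding siblings for every row, -->
            <!-- so the total work grows roughly with the square of the row count. -->
            <xsl:for-each select="/top/row[not(KeyValue = preceding-sibling::row/KeyValue)]">
                <Data>
                    <keyValue>
                        <xsl:value-of select="KeyValue" />
                    </keyValue>
                </Data>
            </xsl:for-each>
        </Rows>
    </xsl:template>

</xsl:stylesheet>
```

On 50 rows the repeated sibling scan is barely noticeable, which is why testing on a small subset hides the problem.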

How do you improve performance when you have large XML files to map that you can’t split? Try using xsl:key instead. It builds an index of key values, from which you can select much more efficiently.

Here is a sample XSLT that demonstrates how to use xsl:key:


<?xml version="1.0" encoding="UTF-8" ?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:ns0="http://Conversion.schemas">

    <xsl:output method="xml" indent="no" />

    <xsl:key name="NumberKey" match="/*[local-name()='top' and namespace-uri()='http://biztalk/Conversion.schemas']/*[local-name()='row' and namespace-uri()='']"

        use="keyValue" />

    <xsl:template match="/">

        <ns0:Rows>

            <xsl:for-each select="/*[local-name()='top' and namespace-uri()='http://biztalk/Conversion.schemas']/*[local-name()='row' and namespace-uri()='' and generate-id(.) = generate-id(key('NumberKey', keyValue)[1])]">

                <xsl:variable name="current_Number" select="keyValue" />

                <Data>

                    <keyValue>

                        <xsl:value-of select="$current_Number" />

                    </keyValue>

                    <xsl:for-each select="key('NumberKey', $current_Number)">

                        <Part>

                            <PartID>

                                <xsl:value-of select="nr_data" />

                            </PartID>

                        </Part>

                    </xsl:for-each>

                </Data>

            </xsl:for-each>

        </ns0:Rows>

    </xsl:template>

</xsl:stylesheet>

2 comments:

Patrick Wellink said...

Could you elaborate a little bit more....

What does your input look like, and what does the output look like?

Isaac Ferreira said...

Patrick,

If you copied and pasted this XML it probably wouldn't be an exact fit, but it should illustrate what I had to do:

Input:

<top>
  <row>
    <KeyValue>1</KeyValue>
    <subKey>1</subKey>
    <otherData>input</otherData>
  </row>
  <row>
    <KeyValue>2</KeyValue>
    <subKey>1</subKey>
    <otherData>input</otherData>
  </row>
  <row>
    <KeyValue>1</KeyValue>
    <subKey>2</subKey>
    <otherData>input</otherData>
  </row>
  <row>
    <KeyValue>2</KeyValue>
    <subKey>2</subKey>
    <otherData>input</otherData>
  </row>
  <row>
    <KeyValue>1</KeyValue>
    <subKey>3</subKey>
    <otherData>input</otherData>
  </row>
  ...
</top>

Output:

<Rows>
  <Data>
    <keyValue>1</keyValue>
    <Part>
      <PartID>1</PartID>
    </Part>
    <Part>
      <PartID>2</PartID>
    </Part>
    <Part>
      <PartID>3</PartID>
    </Part>
  </Data>
  <Data>
    <keyValue>2</keyValue>
    <Part>
      <PartID>1</PartID>
    </Part>
    <Part>
      <PartID>2</PartID>
    </Part>
  </Data>
  ...
</Rows>
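
To tie the two together, here is a rough sketch of the same key-based grouping adapted to the simplified input above. It is an illustration only, not the production map: it assumes the input has no namespaces, that PartID is taken from subKey, and the key name RowsByKey is made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="xml" indent="yes" />

    <!-- Index every row by its KeyValue child (RowsByKey is a hypothetical name). -->
    <xsl:key name="RowsByKey" match="/top/row" use="KeyValue" />

    <xsl:template match="/">
        <Rows>
            <!-- Select only the first row of each KeyValue group by comparing -->
            <!-- the row's generated id with that of the first row in the index. -->
            <xsl:for-each select="/top/row[generate-id(.) = generate-id(key('RowsByKey', KeyValue)[1])]">
                <Data>
                    <keyValue>
                        <xsl:value-of select="KeyValue" />
                    </keyValue>
                    <!-- Fetch all rows of the group from the index instead of rescanning the document. -->
                    <xsl:for-each select="key('RowsByKey', KeyValue)">
                        <Part>
                            <PartID>
                                <!-- Assumption: PartID comes from subKey in this simplified input. -->
                                <xsl:value-of select="subKey" />
                            </PartID>
                        </Part>
                    </xsl:for-each>
                </Data>
            </xsl:for-each>
        </Rows>
    </xsl:template>

</xsl:stylesheet>
```

Groups come out in the order in which each KeyValue first appears in the input, and the parts within a group keep their document order.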