2005.01.13 05:08 PM

Create Excel's SpreadsheetML using XSLT

In a prior post, I described a way to normalize Excel's SpreadsheetML XML using XSLT. More accurately, I described a way to flatten SpreadsheetML XML in an effort to illustrate the calculation of absolute rows and columns from SpreadsheetML's relative rows and columns. Anyhow, Andy Elvey left a comment on that post asking whether it was possible to reverse the simple XML I generated to produce SpreadsheetML XML that could be opened by Excel. Apparently, Mr. Elvey doesn't need to flatten existing Excel workbooks, as my post described, but rather needs to programmatically author valid Excel workbooks from scratch, and he doesn't want to (or perhaps can't) target the rather complex SpreadsheetML XML schema directly. (One can assume that he needs to generate something more complex than a CSV, which can only represent a single sheet and cannot contain formulas.) So, what he wants to do is target my simple XML format instead of SpreadsheetML XML, and so was asking whether my simple XML format could be converted back into valid SpreadsheetML XML using XSLT.

Good question. As Reverend Lovejoy might say, "Short answer: No, with an 'if', long answer: Yes, with a 'but'."

No, if you want more than simple workbooks containing static unformatted string values.

Yes, but to do anything more than simple workbooks containing static unformatted string values my simple XML will have to be de-simplified a bit.

First, my simple XML stores sheet names as cell element attributes. While we could, with some nasty XSLT, turn this back into the hierarchical ss:Workbook/ss:Worksheet elements expected by SpreadsheetML, we would have to either assume that the source XML represents the expected order of the sheets within the workbook (a potential drag for the author), or sort the sheet names, which would eliminate any explicit ordering by the author. We could introduce a new element containing explicitly ordered child elements representing the sheets, and then refer to this when evaluating the flattened cell elements, but I think this destroys the simplicity of the original format. So, I instead promoted the sheet attribute to a full element, within the book element, containing cell elements as children. I probably should have done this in the first place.

Second, my simple XML doesn't capture or provide for representing cell value data types. SpreadsheetML XML requires every cell to have an explicit data type, so without them we would be forced to either hardcode a data type (like String) in the XSLT, or evaluate the type of each data value ourselves in the XSLT. The first option clearly won't do, and the latter would be insufficient for handling many situations, such as differentiating Boolean 1s and 0s from simple Numbers, or identifying the data type of formula-containing cell elements for which no value is provided. So, I added a type attribute to the cell element. This new type attribute may contain one of the following Excel data type values: String, DateTime, Boolean, Number, Error. Remember, case matters.

Third, my simple XML doesn't capture or provide for representing cell formulas. This would seem to be a requirement for programmatically authoring non-trivial workbooks, so I added a formula attribute to the cell element. When a cell element has a formula attribute, it does not need a value, as Excel will calculate the value anyway when the SpreadsheetML XML is opened. Note that SpreadsheetML requires formulas to express their range references using R1C1-style addresses.

Finally, my simple XML doesn't capture or provide for representing the visual format of cell data values. After all, it was intended to support workbook analysis, not presentation. But, if one wants to generate reasonable workbooks from scratch without having to perform additional downstream processing on them using Excel, one probably must be able to assert at least a limited amount of formatting on its contents. By limited, I mean specify a NumberFormat and Font (there are hundreds of other styling options, which you should feel free to implement yourself).

I considered a number of ways to support adding limited styling. One way would be to add a style attribute to the cell element and then rely on the XSLT to "hook-up" the new attribute's value with a SpreadsheetML-compliant Style definition (either generated or hardcoded), but that results in an ugly tight coupling between the XML and XSLT (though it's really simple). Alternatively, I considered allowing my limited set of styling options to be specified as cell attributes. However, SpreadsheetML ultimately requires these sets of styling value be uniquely identified and set apart (and referenced by id) from the cells they pertain to, and I couldn't deal with the unnecessary repetition and nasty XSLT required to make this happen. So, I adopted a simple format similar to SpreadsheetML, which is to say, I added support for a uniquely identified style element containing one or more styling elements (font and format), and added a style attribute to the cell element for optionally naming the cell's applicable style set.

Note that my limited style element differs from SpreadhseetML's Style element in a number of ways. First, mine are not required to appear as children of a single Styles element - they can appear anywhere in the document (this may simplify authoring logic, as it isn't necessary to round-up all necessary styles in advance). Second, I only added XSLT support for the NumberFormat (as format) and Font (as font) styling elements, and only added support for a few Font attributes (name, size, bold). And, finally, I assume that the first style element in the document with no name attribute represents the default style for all cells.

So, now the question is whether my XML is still simple enough, yet powerful enough, that one would rather use it instead of SpreadsheetML. Don't know. Anyhow, here's a sample workbook using my new somewhat less simple XML:

<?xml version="1.0" ?>
<book>
  <style>
    <font name="Verdana" size="8" />
  </style>
  <style name="date">
    <format mask="General Date" />
  </style>
  <style name="mydate">
    <format mask="m/d/yyyy" />
  </style>
  <style name="mycurrency">
    <font name="Tahoma" size="10" bold="1" />
    <format mask="&quot;$&quot;#,##0.00" />
  </style>
  <sheet name="Sheet1">
    <cell row="1" col="1" type="Number">1</cell>
    <cell row="2" col="1" type="Boolean">1</cell>
    <cell row="3" col="1" type="DateTime">2005-01-01T13:54:15.000</cell>
    <cell row="3" col="2" type="DateTime" style="mydate">2005-01-02T13:54:15.000</cell>
    <cell row="3" col="3" type="DateTime" style="date">2005-01-03T13:54:15.000</cell>
    <cell row="4" col="1" type="String">abc</cell>
    <cell row="1" col="2" type="Number" formula="=R1C1+10">11</cell>
    <cell row="1" col="3" type="Number" formula="=R1C1+20" />
    <cell row="1" col="4" type="Number" style="mycurrency" formula="=SUM(R1C1:R1C3)" />
    <cell row="10" col="5" type="Error">#N/A</cell>
  </sheet>
  <sheet name="Sheet2">
    <cell row="1" col="1" type="String" style="bold" formula="=Sheet1!R1C1 &amp; &quot; plus text&quot;" />
  </sheet>
  <style name="bold">
    <font name="Verdana" bold="1" />
  </style>
</book>

Notice that my simple XML doesn't care what row-/column-order you use to define the cells within a sheet, again for ease of authoring.

Here's the same workbook in SpreadsheetML XML:

<?xml version="1.0" ?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Styles>
    <Style ss:ID="Default" ss:Name="Normal">
      <Font ss:FontName="Verdana" ss:Size="8" />
    </Style>
    <Style ss:ID="bold">
      <Font ss:FontName="Verdana" ss:Bold="1" />
    </Style>
    <Style ss:ID="date">
      <NumberFormat ss:Format="General Date" />
    </Style>
    <Style ss:ID="mycurrency">
      <Font ss:FontName="Tahoma" ss:Size="10" ss:Bold="1" />
      <NumberFormat ss:Format="&quot;$&quot;#,##0.00" />
    </Style>
    <Style ss:ID="mydate">
      <NumberFormat ss:Format="m/d/yyyy" />
    </Style>
  </Styles>
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row ss:Index="1">
        <Cell ss:Index="1">
          <Data ss:Type="Number">1</Data>
        </Cell>
        <Cell ss:Index="2" ss:Formula="=R1C1+10">
          <Data ss:Type="Number">11</Data>
        </Cell>
        <Cell ss:Index="3" ss:Formula="=R1C1+20">
          <Data ss:Type="Number"></Data>
        </Cell>
        <Cell ss:Index="4" ss:Formula="=SUM(R1C1:R1C3)" ss:StyleID="mycurrency">
          <Data ss:Type="Number"></Data>
        </Cell>
      </Row>
      <Row ss:Index="2">
        <Cell ss:Index="1">
          <Data ss:Type="Boolean">1</Data>
        </Cell>
      </Row>
      <Row ss:Index="3">
        <Cell ss:Index="1">
          <Data ss:Type="DateTime">2005-01-01T13:54:15.000</Data>
        </Cell>
        <Cell ss:Index="2" ss:StyleID="mydate">
          <Data ss:Type="DateTime">2005-01-02T13:54:15.000</Data>
        </Cell>
        <Cell ss:Index="3" ss:StyleID="date">
          <Data ss:Type="DateTime">2005-01-03T13:54:15.000</Data>
        </Cell>
      </Row>
      <Row ss:Index="4">
        <Cell ss:Index="1">
          <Data ss:Type="String">abc</Data>
        </Cell>
      </Row>
      <Row ss:Index="10">
        <Cell ss:Index="5">
          <Data ss:Type="Error">#N/A</Data>
        </Cell>
      </Row>
    </Table>
  </Worksheet>
  <Worksheet ss:Name="Sheet2">
    <Table>
      <Row ss:Index="1">
        <Cell ss:Index="1" ss:Formula="=Sheet1!R1C1 &amp; &quot; plus text&quot;" ss:StyleID="bold">
          <Data ss:Type="String"></Data>
        </Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>

And, finally, here's the XSLT style sheet that turned the former into the latter:

<?xml version="1.0"?>
<!--
This product may incorporate intellectual property owned by Microsoft Corporation. The 
terms and conditions upon which Microsoft is licensing such intellectual property may be 
found at http://msdn.microsoft.com/library/en-us/odcXMLRef/html/odcXMLRefLegalNotice.asp.
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="urn:schemas-microsoft-com:office:spreadsheet"
                xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/book">
  
    <xsl:processing-instruction name="mso-application">
      <xsl:text>progid="Excel.Sheet"</xsl:text>
    </xsl:processing-instruction>

    <Workbook>
    
      <!-- Write style definitions, if any exist -->
    
      <xsl:if test="count(//style) != 0">
      
        <Styles>
        
          <!-- Default style -->
        
          <xsl:apply-templates select="//style[not(@name)][position() = 1]"/>

          <!-- All other styles -->
      
          <xsl:apply-templates select="//style[@name]">
            <xsl:sort select="@name" data-type="text"/>
          </xsl:apply-templates>
      
        </Styles>
      
      </xsl:if>
    
      <!-- Write each sheet -->

      <xsl:for-each select="sheet">
      
        <Worksheet ss:Name="{@name}">
          <Table>
          
            <!-- Write each row. -->
            
            <xsl:apply-templates select="cell" mode="rows">
              <xsl:sort select="@row" data-type="number"/>
            </xsl:apply-templates>
            
          </Table>
        </Worksheet>
      </xsl:for-each>

    </Workbook>

  </xsl:template>
  
  <!-- Style template -->
  
  <xsl:template match="style">
  
    <Style>
    
      <xsl:attribute name="ss:ID">
        <xsl:choose>
          <xsl:when test="not(@name)"><xsl:value-of select="'Default'"/></xsl:when>
          <xsl:otherwise><xsl:value-of select="@name"/></xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      
      <xsl:if test="not(@name)">
        <xsl:attribute name="ss:Name">
          <xsl:value-of select="'Normal'"/>
        </xsl:attribute>      
      </xsl:if>
      
      <xsl:apply-templates select="child::*"/>

    </Style>  
  
  </xsl:template>
  
  <!-- Style font template -->
  
  <xsl:template match="font">  
    <Font ss:FontName="{@name}">
      <xsl:if test="@size">
        <xsl:attribute name="ss:Size">
          <xsl:value-of select="@size"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="@bold">
        <xsl:attribute name="ss:Bold">
          <xsl:value-of select="@bold"/>
        </xsl:attribute>
      </xsl:if>
    </Font>  
  </xsl:template>

  <!-- Style format template -->
  
  <xsl:template match="format">  
    <NumberFormat ss:Format="{@mask}"/>  
  </xsl:template>
  
  <!-- Row template -->
  
  <xsl:template match="cell" mode="rows">

    <xsl:if test="count(preceding-sibling::cell[@row = current()/@row]) = 0">

      <Row ss:Index="{@row}">
      
        <xsl:apply-templates select="/book/sheet[@name = current()/parent::sheet/@name]/cell[@row = current()/@row]" mode="cols">
          <xsl:sort select="@col" data-type="number"/>
        </xsl:apply-templates>

      </Row>

    </xsl:if>

  </xsl:template>
  
  <!-- Cell (i.e., column) template -->
  
  <xsl:template match="cell" mode="cols">

    <Cell ss:Index="{@col}">

      <xsl:if test="@formula">
        <xsl:attribute name="ss:Formula">
          <xsl:value-of select="@formula"/>
        </xsl:attribute>
      </xsl:if>

      <xsl:if test="@style">
        <xsl:attribute name="ss:StyleID">
          <xsl:value-of select="@style"/>
        </xsl:attribute>
      </xsl:if>

      <Data ss:Type="{@type}">
        <xsl:value-of select="."/>
      </Data>

    </Cell>

  </xsl:template>

</xsl:stylesheet>

Please keep in mind that this was cranked out in short order and has not been thoroughly tested. There's a good chance it's got a bug or two. Also, it was only tested using MSXML 4.0, not with .NET's (buggy) System.Xml.Xsl.XslTransform.

Here are some things I might consider adding:

  • An XSD schema (duh)
  • Support for named ranges (in SpreadsheetML these are just formulas that are given a name and stored in a separate part of the XML)
  • Some advanced formatting features, including Alignment and row/column spans
  • Array formulas
  • Workbook properties

Alternatively, you might want to consider using any one of a million other approachs for writing Excel files. Personally, I prefer using XML, but that's just me.

Anyhow, Mr. Elvey, I hope that answers your question. Leave me a comment if you'd like to know more.


Comments

This is ***GREAT***! :-)
Thank you **very very much** for doing this!
Just awesome - I'm sure that this will meet my requirements perfectly. :-)
Was just wondering - can I send you some money (via PayPal) - won't be a huge amount, but enough to give some recompense on the time that you've spent!
Looking forward to hearing back, and again -
**Thanks!!!*** Bye for now -
- Andy


Andy Elvey | 2005.01.14 11:32 AM

Hi Andy,

Glad it helped! The offer of money is nice, thank you, but I'm doing this as much for my own edification (and exposure, and enjoyment) as for profit.

In your email you asked about the possibility of me popping these on SourceForge. I probably won't be doing it (too many other priorities), but anyone who would like to should feel free to - these scripts, and all the other samples provided on this blog, are free for the taking, no strings attached. Perhaps if that happens, though, someone will follow through with the improvements I suggested. ;)

Good luck!

ewbi.develops | 2005.01.14 01:14 PM

Hi again!
Sure - understood ( the education and enjoyment :-) ). I've learned a huge amount as well in the last couple of days. This is really the first time I've looked at XSLT at a "deepish" level, and very interesting it is too!
Bye for now, and thanks again -
- Andy

Andy Elvey | 2005.01.14 10:57 PM

Hi all,

i have used this example to create excel
http://ewbi.blogs.com/develops/2005/01/create_excels_s.html

when Excel opens ..it shows nothing but blank sheet.
i saw one thing version header is not coming in tranformed xml..if i hardcore version xml in transformed xml then perfect output is being shown in excel 2003..

plz help me in this regard if possible.

Thanx

CS Gupta

CS Gupta | 2005.05.04 11:56 PM

CS Gupta,

What parser/XSLT engine did you use to perform the transform?

ewbi.develops | 2005.05.05 08:39 AM

More info and examples in my latest post:

http://ewbi.blogs.com/develops/2006/01/roundtripping_s.html

ewbi.develops | 2006.01.05 06:46 AM

Hi Folks,
I m new to this groups.I would like share all my knowledge and doubts .I would like to create the excel file through xsl file.Can you any one help me

rajendran | 2006.11.09 10:49 PM

Hi Rajendran,

Feel free to share what you've got. If you've got any specific questions, feel free to ask them. We'll see if we can help.

ewbi.develops | 2006.11.10 07:15 AM

Hi,

I created one simple XSLT to rename the sheet1, sheet2 to my name but it doesnt apply to the excel.

Using it in SSRS:
I have a report which has worksheets. When i create the XSLT and my report takes this as style sheet. It doesnt change the sheet1,sheet2...

XSLT Code:






False
False




/ -->




Please help me out.

Sheetal | 2008.02.08 09:44 PM

Hi Sheetal,

As you've probably seen by now, TypePad eats HTML/XML posted in the comments here, so please attach your XSLT to an email and send it to the address listed near the bottom of the left column (just above the copyright). Also, I could use some additional explanation regarding your use of XSLT on a SSRS report - how is that configured, what is it targeting, when is it applied - client- or server-side?

ewbi.develops | 2008.02.08 09:59 PM

Very nice Introduction

jasper | 2008.04.01 03:51 PM

Hello. This is great example. I have a question: how can I specify a file's name that was generated? I use System.Xml.Xsl.XslTransform classes. Thank you.

Victor | 2008.09.29 05:30 AM

Hi Victor,

Not sure I follow exactly what you're asking. If you're asking how to specify the name of a file that you wish to target with the results of an XlsTransform.Transform method call, you're going to want to look closer at file IO with streams in .NET, or just stick with one of the Transform method overloads that makes it easy to target a file in one step:

http://msdn.microsoft.com/en-us/library/x6e130yd.aspx
http://msdn.microsoft.com/en-us/library/5yk6w0ay.aspx

Good luck!

ewbi.develops | 2008.09.29 07:51 PM

Thanks!

Victor | 2008.09.29 10:10 PM

Hi,
I'm using XSLT to transform XML into XML Spreadsheet(Excel). In Excel, to insert new line with in a Cell it is required to use "".While Converting XML to XML Spreadsheet i am using the same string constant in XSLT. However, i am not able new line in Excel after conversion.Please help on this.

skc | 2009.03.20 01:38 AM

Hi,
I'm using XSLT to transform XML into XML Spreadsheet(Excel). In Excel, to insert new line with in a Cell it is required to use "#10;".While Converting XML to XML Spreadsheet i am using the same string constant in XSLT. However, i am not able new line in Excel after conversion.Please help on this.

skc | 2009.03.20 01:43 AM

Hi,
I want to insert a new line character in a Cell of Excel through XSLT, while converting XML to SpreadsheetML.

Please help me out in this....

Thnx
SKC

skc | 2009.03.20 02:37 AM

Hi:

I am new at xml/xslt, and I neet to put some xml data into a excel spreadsheet using xslt, I tried to use the files above but no luck, they just appear in IE as >>
Not sure if you can help me letting me know how I can test them.

Maria | 2009.08.05 09:08 AM

Hi Maria,

I'm afraid helping you actually use the XML/XSLT described in this post may prove more difficult than authoring the post itself, as there are a number of ways to do it and it depends on so many factors that are hard to communicate via these comments.

Minimally, you need to organize your XML data into the format described above. Then, either by referencing the XSLT in the XML directly (thus allowing the transform to occur when the XML is opened in IE, which seems to be your approach), or by using an XSL/Transform utility of some sort, or perhaps coding the transform yourself in VBA/.NET or whatever, you need to apply the XSLT above to your XML. Assuming you've organized your XML correctly, the end result should be valid SpreadsheetML, which can be saved and opened in Excel. If after doing that it doesn't work, then you'll need to debug the transformation to determine where it (or, more likely, the source XML data) is going wrong.

I sure hope this helps. If you need further help, I do offer consulting services - the contact email address can be found at the bottom of the margin on the left.

Good luck!

ewbi.develops | 2009.08.05 10:36 AM

Hi I wonder if some one Can help me to create XSLT file, I just want to rename sheet1(Table1) and sheet(Table2)

simam | 2009.11.16 05:48 PM

simam, do you mean that you have an Excel workbook in SpreadsheetML format, and you are interested in XSLT that you can use to programmatically rename one or more worksheets within the workbook?

ewbi.develops | 2009.11.16 06:35 PM

sorry for some reason I am not able to paste my XSLT style sheet

simam | 2009.12.06 05:53 PM

simam, you can attach it to an email and send it to me at info@ewbi.com.

ewbi.develops | 2009.12.06 06:03 PM

Hi,

I wonder if you have received my email.

Thanx

simam | 2009.12.13 05:52 PM

Yes, I did, no time to look at it today, though. Perhaps tomorrow or Wednesday.

ewbi.develops | 2009.12.14 01:20 PM

I need some example to generate a print report document from the xml document using special attributes, xsl stylesheet(I have it), and want to bold (high lite) each column headeras well as at the end of each error code, break to the new page.

Thank You
Please reply me asap.
gsp

gsp | 2010.12.10 08:31 AM

Sorry, gsp, but I've no time to help. Perhaps someone else will see your comment and offer advice, but I wouldn't expect it to happen soon. Good luck.

ewbi.develops | 2010.12.10 09:00 AM

Your code renders well and everything but I have a problem regarding the formulas option. How can I render a column with formulas embedded written in A1 notation and not in R1C1 notation? My user dislikes R1C1 notation; I have tried rendering the formula as a string but in excel it ends up treated as a text and formula is not calculated. Thanks.

OlivG | 2011.03.02 03:11 PM

Hi OlivG,

I'm afraid SpreadsheetML XML requires range addresses in its formulas be in R1C1 format. This means that if you are going from SpreadsheetML to my simplified XML, you'll see R1C1-style formulas. Note, though, that when going back to SpreadsheetML from my simplified XML the formula ranges should be represented in A1 style in Excel (and if they aren't you can switch them pretty easily).

Hope this helps. Good luck.

ewbi.develops | 2011.03.02 04:16 PM



Post a Comment

 
  (optional)
  (no html)
 
   


TrackBack

TrackBack URL:  http://www.typepad.com/services/trackback/6a00d8341c7bd453ef00d83435544a53ef

Listed below are links to weblogs that reference Create Excel's SpreadsheetML using XSLT: