Tuesday, August 16, 2011

about MsOffice::Word::HTML::Writer

Yesterday I gave a lightning talk on MsOffice::Word::HTML::Writer, a module that produces documents for MsWord from HTML content. For anybody interested, here is some more info:


  • the module only needs Perl, and therefore can run on a server; it doesn't need MsWord to be installed (actually, that's the whole purpose of that module).

  • it was written for the Geneva courts of law, where we needed to generate thousands of documents every day, from hundreds of models, with libraries of reusable content parts and merging of complex data structures ... but the resulting docs must be editable in MsWord.

Other technologies considered, but rejected, were:



  • remote control of an MsWord instance, through an OLE connection. That's what most other courts do, but it only works on a local workstation: you could hardly do the same on a server, first because you would need to install MsOffice, and second because this architecture is not reliable enough. The MsWord instance is a bottleneck, it might break, or it might suddenly pop up a dialog to ask the user for something ... except there is nobody to answer if it is running on a server. So this approach can be considered if you have a heavy application, deployed on everybody's PC, but not for a Web application.

  • generating RTF, using some templating tool: this works on a server, but authoring models can be tricky (in particular, it is not obvious how to factor out reusable parts).

  • generating XML, either in ODF (the open format used by OpenOffice) or in OOXML (Microsoft's format): here the main problem is authoring, because both XML specs are huge, so you really need specialists to produce your models.

  • generating ODF with OpenOffice::OODoc (http://search.cpan.org/dist/OpenOffice-OODoc/): well, ODF can be read by MsWord, but many features are missing. Besides, OODoc uses a programming approach, not a templating approach, so it's harder to delegate authoring of models to people who are not familiar with Perl.

MsOffice::Word::HTML::Writer works by assembling HTML content (the main document, the headers/footers, and other resources like images or CSS stylesheets), and then packaging the whole thing as a single MIME multipart file. Model authors just need to know HTML, so it's easy to build an authoring team; and if you have a good templating framework, it's also easy to build a library of reusable content parts.
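To make the packaging idea more concrete, here is a minimal sketch of bundling an HTML document and its resources into one MIME multipart file. It is written in Python with the standard `email` package, purely as an illustration of the technique; it is not the module's actual code, and the function name `package_as_single_file`, the part name `main.htm`, and the sample resource are all hypothetical.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def package_as_single_file(html_body, resources):
    """Bundle an HTML document and its resources into one MIME string.

    `resources` maps a (hypothetical) part name to a tuple
    (content_type, raw_bytes), e.g. {"style.css": ("text/css", b"...")}.
    """
    # multipart/related: the first part is the root document,
    # the following parts are resources that the root refers to.
    root = MIMEMultipart("related", type="text/html")

    # Main HTML document (assumed name "main.htm").
    main = MIMEText(html_body, "html", "utf-8")
    main.add_header("Content-Location", "main.htm")
    root.attach(main)

    # Every other resource (image, stylesheet, header/footer file, ...)
    # becomes a base64-encoded part identified by its Content-Location.
    for name, (content_type, data) in resources.items():
        maintype, subtype = content_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part.add_header("Content-Location", name)
        root.attach(part)

    return root.as_string()


# Usage: one HTML page plus one stylesheet, packaged as a single string
# that could then be written to disk.
packaged = package_as_single_file(
    "<html><body><p>Hello</p></body></html>",
    {"style.css": ("text/css", b"p { color: black }")},
)
```

The point of the design is that everything an author touches stays plain HTML and CSS; only the final packaging step knows about MIME.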

6 comments:

  1. I have to say that all the information you have provided here is very useful. I had no idea about this concept before reading your post. This post is brilliantly written. Your effort is clearly visible in this post. This is an excellent post. Keep it up.

  2. This module has been really useful. Thanks man.
