How can I style the resulting Microsoft Word output?

Tip
Summary
  • There is no quick way of doing this.

  • Everything you can do in Word, you can do in Word HTML. Save Word documents as Word HTML to see how.

  • Clone the metanorma-sample gem: https://github.com/riboseinc/metanorma-sample.

  • Edit the word_sample_titlepage.html and word_sample_intro.html pages to match your organization’s branding. With lots of iterations of saving Word documents as HTML, for trial and error.

    • Leave the Liquid Template instructions alone ({{, {%) unless you know what you’re doing with them: they are how the pages are populated with metadata.

  • Edit the default_fonts() method in your IsoDoc::…​::WordConvert class, to match your desired fonts.

  • Edit the default_file_locations() method in your IsoDoc::…​::WordConvert class, to match your desired stylesheets and file templates.

  • Edit the wordstyle.scss and sample.scss stylesheets to match your organization’s branding. With lots of iterations of saving Word documents as HTML, for trial and error.

Word output in the document toolset is generated through Word HTML, the variant of HTML that you get when you save a Word document as HTML. (That is why documents are saved in .doc, not .docx.) This has the advantage over OOXML, the native markup of DOCX, of using a well-known markup language, with a low barrier to entry: if you want to work out how to do something in Word HTML, do it in Word, save the document as HTML, and open up the HTML in a text editor. (For more on the choice of using Word HTML, see https://github.com/riboseinc/html2doc/wiki/Why-not-docx%3F.)

However Word HTML is not quite the HTML you are used to: it is a restricted, syntactically idiosyncratic variant of HTML 4, with a non-standard and weakened form of CSS. Doing any styling in Word HTML involves lots of trial and error, and paying close attention to how Word HTML does things in its CSS. We have documented a few of the clearer gotchas in https://github.com/riboseinc/html2doc/blob/master/README.adoc.

It’s still better than learning OOXML.

The process for generating Word output is fairly similar to that for generating HTML, since both processes are generating a form of HTML; as we already noted, the two processes share a substantial amount of code. The main differences are in the handling of page-media features that CSS has lagged in (footnotes, pagination, headers and footers), and in the styling of lists, for which Word HTML uses custom (and undocumented) CSS classes prefixed with @, specifying inter alia the numbering for nine levels of nesting of the same list.

  • Styling information is stored in the …​/lib/isodoc/html folder of the gem, and applies to both Word and HTML content. For Word content, the relevant files are word_…​titlepage.html (title page HTML template), word…​_intro.html (introductory HTML template, typically restricted to Table of Contents), wordstyle.scss and {name_of_standard}.scss (the Word stylesheets), and header.html (document headers, footers, and endnote/footnote separators, referenced from the stylesheets).

  • The title and introductory files may contain VML graphics, the tags for which are prefixed in Word HTML by v:; this applies in particular to text boxes (e.g. <v:textbox>). For any such prefixed tags to be recognized properly by the XML processor in Metanorma, the HTML content of the files must be in XHTML. Unfortunately, the Word output is anything but XHTML. To make it compliant:

    • Any HTML escapes must be replaced by Unicode entities; e.g.   becomes  

    • Any standalone tags must be closed; e.g. <br> becomes <br/>

    • Any attributes must be quoted; e.g. <p class=MsoNormal> becomes <p class="MsoNormal">

    • Idiosyncratic Word HTML comments, used for fallback HTML if the HTML is not rendered in Word, are not well-formed XML and must be removed (e.g. <![if !mso]>, <![endif]>)

  • The styling files to be loaded in are set in the default_file_locations() method of IsoDoc::…​::WordConvert.

  • As with HTML generation, additional files (e.g. logos) can be loaded in the initialize() method of IsoDoc::…​::WordConvert. The initialize() method also sets the @ styles in the stylesheet to be used for unordered and ordered lists; a single such style is intended to capture the behaviour of all levels of indentation.

  • As with HTML output, the HTML templates are populated through Liquid Templates: variables in {{ correspond to the hash keys for metadata extracted in IsoDoc::…​::Metadata, and its superclass IsoDoc::Metadata in the isodoc gem.

  • As with HTML, the SCSS stylesheets treat fonts as variables, and are set in the default_fonts() method of IsoDoc::…​::WordConvert.

  • Document headers and footers are set in the header.html file. This is also an HTML template, which is populated with metadata attributes through Liquid Template. The structure of header.html is determined by Word, and elements of header.html need to be crossreferenced from the Word stylesheet. To inspect Word header.html files, save a Word document as HTML, and look inside the {document_name}.fld folder generated alongside the HTML output.

A Word HTML document is populated as follows: * HTML Head wrapper (in IsoDoc::WordConvert) @wordstylesheet CSS stylesheet (generated from SCSS through the generate_css() method of Isodoc::WordConvert); corresponds to wordstyle.scss. @standstylesheet CSS stylesheet (generated from SCSS through the generate_css() method of Isodoc::WordConvert); intended to override any generic CSS in @wordstylesheet. Optional, corresponds to {name_of_standard}.scss. * HTML Body @wordcoverpage HTML template (optional, corresponds to word_…​titlepage.html). Included in <div class="WordSection1">. @htmlintropage HTML template (optional, corresponds to word…​_intro.html). Included in <div class="WordSection2">. In the existing gems, WordSection2 is paginated with roman numerals. ** Document proper (converted from Metanorma XML). Included in <div class="WordSection2"> (prefatory material) and <div class="WordSection3"> (main document). In the existing gems, WordSection3 is paginated with roman numerals.