Outputs

The metanorma toolset currently outputs documents in four formats.

Metanorma XML

The Metanorma XML output is the intermediate format which marks up the semantic content of the standards document, and is used to drive the other formats. The Metanorma XML file is also the file which is used for validation of the standards document: line numbers in the validation output refer to this file.

HTML

The HTML output is in HTML 5. It has optional Data-URI encoding of local images; if images in the output are are not Data-URI encoded, they are moved to a folder called {filename}_images, and renamed with GUID names, to prevent collisions. Audio and video files are not supported. All HTML output has a sidebar with a Javascript-generated Table of Contents, which is two section levels deep.

PDF

PDF output, for those standards gems that support it, is generated from the HTML output via Google Puppeteer (which runs in Node.js). The PDF output generation takes advantage of the print mode in the HTML CSS stylesheet, so much of the browser-like styling of the HTML is rendered as a more print-like document. Because it is generated from HTML, the PDF output does not support page numbers in its Table of Contents. Nor does it support advanced paragraph formatting, such as Keep With Next or Widow/Orphan control.

Word

The Word output is output as a DOC format rather than DOCX (i.e. the pre-2007 version of Word), and it is generated using the Microsoft Office flavour of HTML 4, as a Multipart HTML Word Document (MHT). (This is a MIME-encoded counterpart to the HTML obtained when you save a Word document as HTML.)

Using DOC HTML makes it much easier to generate documents with the advanced formatting requirements of Metanorma (including complex tables, formulas, footnotes, headers and footers, nested list numbering and crossreferences) than generating either native DOCX (in OOXML), or the DOCX flavour of MHT. For more on the choice to use DOC, see https://github.com/metanorma/html2doc/wiki/Why-not-docx%3F

The constraint on using DOC, however, imposes some constraints.

  • SVG images are not supported. (Word internally converts them into PNG files to render them in Word HTML.)

  • DOC files are a legacy format of Word.

  • DOC files cannot be processed by Pages or LibreOffice: they can only be processed by Microsoft Word. To open the Word output in LibreOffice in particular, you will need to convert the DOC file as a DOCX file, taking the following steps.

    • Open with MS Word

    • Save it once by changing the Extension to .doc. (When it asks you to overwrite, say Yes.)

  • "Save As" as a .docx file.