HTML and MS Word HTML Output

In order to create CSS stylesheets for the HTML and Microsoft Word HTML output of Metanorma, it is necessary to understand the structure of the HTML it generates.

HTML

Top-Level Structure

The head of the HTML document contains a single stylesheet (the :htmlstylesheet parameter of HtmlConvert.new()), and a few script calls that are embedded in the Ruby code (initialising jQuery, including webfonts).

The body of the HTML document is divided into the following parts:

  • A title section (<div class="title-section">), comprising identifying information about the document, such as appears in a title page in print.

    • The section is populated with an HTML template (the :htmlcoverpage parameter of HtmlConvert.new()). The information in this section is sourced from document metadata, rather than document content proper; the gem uses Liquid Template to populate the HTML template. Different fields usually have distinct class names for CSS styling; these can vary by gem.

    • For example, ISO documents have coverpage_docnumber (for the document ID), coverpage_techcommittee (for the technical committee responsible for the document), doctitle-en (for the English-language title of the document), doctitle-fr (for the French title), title, subtitle, part (for the three components of the document title), and coverpage_docstage (for the stage of publication of the document).

  • A prefatory section (<div class="prefatory-section">), comprising boilerplate information which also does not come from document content proper. This is typically restricted to a copyright statement (<div class="copyright">), contact details, and a table of contents <div id="toc">.

    • The section is also populated with a Liquid HTML template (the :htmlintropage parameter of HtmlConvert.new()).

    • The table of contents in the HTML template is a placeholder; it is populated by a table of contents script included among the scripts loaded into the HTML body.

  • The main section of the document (<main class="main-section">), which is populated with the document content.

  • Optionally, a colophon (<div class="colophon">), which is populated with boilerplate information and/or document metadata. (Currently colophons in Metanorma gems appear only in Word output.)

  • Scripts. These are populated from a static file (the :scripts parameter of HtmlConvert.new()). These are expected to include MathJax, a Table of Contents generator, and a script for handling footnotes.

Body markup

Within the body of the document, different blocks and inline spans of the Metanorma document model (Standoc XML, BasicDoc XML) are represented by different CSS classes, as follows:

Sections

Symbols and abbreviated terms

<div class="Symbols"> (contents are a definition list)

Appendix title

<h1 class="Annex">

Appendix, Bibliography, Introduction

<div class="Section3">

Introduction title

<h1 class="IntroTitle">

Foreword title

<h1 class="ForewordTitle">

Deprecated term

<p class="DeprecatedTerms">

Alternative term

<p class="AltTerms">

Primary term

<p class="Terms">

Term header

<p class="TermNum">

Document title (in body)

<p class="zzSTDTitle1">

Blocks

Note

<div class="Note">

Note label

<span class="note_label">

Figure

<div class="figure">

Figure title

<span class="FigureTitle">

Example

<table class="example"> or <div class="example">

Example label

<span class="example_label">

Sourcecode

<p class="Sourcecode">

Admonition

<div class="Admonition">

Formula

<div class="formula">

Blockquote

<div class="Quote">

Blockquote attribution

<p class="QuoteAttribution">

Footnote

<aside class="footnote">

Ordered list

<ol>

Unordered list

<ul>

Definition list

<dl>

Normative reference

<p class="NormRef">

Informative reference

<p class="Biblio">

Table

<table>

Table title

<p class="TableTitle">

Table head

<thead>

Table body

<tbody>

Table foot

<tfoot>

Inline

Hyperlink

<a>

Cross-Reference

<a>

Stem expression

<span class="stem">

Small caps

<span style="font-variant:small-caps;">

Emphasis

<i>

Strong

<b>

Superscript

<sup>

Subscript

<sub>

Monospace

<tt>

Strikethrough

<s>

Line Break

<br>

Horizontal Rule

<hr>

Page Break

<br> (realised as page break in Word HTML)

Images

All images for an HTML document {filename}.html are moved to the folder {filename}_htmlimages, and renamed with GUIDs. This is to ensure that all images are available in the one location, making it easier to package the HTML output and upload it elsewhere.

Word HTML

Read a general intro to Microsoft Word output format.

Top-Level Structure

The headers and footers of a Word document are defined in Word HTML in a separate file, header.html (the :header parameter of WordConvert.new()), which is included in the file manifest for the document. The header.html file is cross-referenced to the Word HTML CSS file, and contains a separate div for each header and footer type; refer to the instances in the gems for illustration.

The head of the Word HTML document contains two stylesheets (the :wordstylesheet and :standardsheet parameter of WordConvert.new()). The :wordstylesheet is intended as generic Word markup, while :standardsheet is intended to contain styling specific to the standard. No scripts are supported in Word HTML.

The other elements of the Word HTML head are populated by the html2doc gem: a reference to a manifest of included files (specifically images and the header file), and settings to open the document in Print View at 100% magnification.

The body of the Word HTML document is divided into the following parts:

  • A title section (<div class="WordSection1">), comprising identifying information about the document, such as appears in a title page in print.

    • The section is populated with an HTML template (the :wordcoverpage parameter of WordConvert.new()). As with HTML, the information in this section is sourced from document metadata, rather than document content proper; and the gem uses Liquid Template to populate the HTML template.

  • A prefatory section (<div class="WordSection2">), comprising boilerplate information which does not come from document content proper (such as a Table of Contents shell), as well as prefatory material from the document content. The prefatory section is set in the CSS stylesheet to have Roman numerals for its pagination.

    • Because of the requirement for Roman numerals, prefatory material from the document is sent to this section, whereas all document content in the HTML document is sent to the main section.

  • The main section of the document (<div class="WordSection3">), which is populated with the remaining document content. The main section is set in the CSS stylesheet to have Arabic numerals for its pagination.

  • Optionally, a colophon (<div class="colophon">), which is populated with boilerplate information and/or document metadata.

Body markup

With the exception of the top-level document sections, discussed above, the Word HTML generated by the gem use the same CSS classes as the HTML proper. As already noted, the quirks of Word HTML CSS mean that classes need to be repeated on descendant elements that are not required in normal CSS.

The handling of footnotes and comments in Word HTML uses idiosyncratic Word HTML markup, including custom CSS, and is generated separately their the HTML counterparts in the gems.