Metadata and Boilerplate

The front material of documents generated by Metanorma routinely involves templated text: both the front page and the “boilerplate” stating the legal and other obligations surrounding the document. Those text templates are in turn routinely populated using metadata extracted from the document.

Metadata

The bibdata element in a Metanorma document contains various metadata elements about the document, as a bibliographic description.

These elements are populated either from the document attributes in the Metanorma AsciiDoc input, or with default values.

Specifically, the bibdata element is populated through the Asciidoctor::Standoc::Converter.metadata method, and its inheritors.

The bibdata element is not rendered directly as the document front page. Instead, the document front page and other templated texts are populated with elements extracted from the bibdata element. That extraction takes place using the Isodoc.info method and its inheritors, which invoke the Isodoc::Metadata class and its inheritors. The extraction results in a Hash of metadata keys and values, which is used to populate any templated text.

For example, in the Metanorma ISO flavour, the document header

= This title is overridden by :title-main-en:
:docnumber: 33032
:edition: 1
:technical-committee: TC
:technical-committee-number: 399
:technical-committee-type: TC
:docstage: 10
:docsubstage: 20
:title-intro-en: Cybernetics
:title-main-en: Neuro-information interchange interface

generates the following bibdata element:

<bibdata type="standard">
  <title language="en" format="text/plain" type="main">Cybernetics — Neuro-information interchange interface</title>
  <title language="en" format="text/plain" type="title-intro">Cybernetics</title>
  <title language="en" format="text/plain" type="title-main">Neuro-information interchange interface</title>
  <docidentifier type="iso">ISO/NWIP 33032</docidentifier>
  <docidentifier type="iso-with-lang">ISO/NWIP 33032 (E)</docidentifier>
  <docnumber>33032</docnumber>
  <contributor>
    <role type="author"/>
    <organization>
      <name>International Organization for Standardization</name>
      <abbreviation>ISO</abbreviation>
    </organization>
  </contributor>
  <contributor>
    <role type="publisher"/>
    <organization>
      <name>International Organization for Standardization</name>
      <abbreviation>ISO</abbreviation>
    </organization>
  </contributor>
  <edition>1</edition>
  <language>en</language>
  <script>Latn</script>
  <status>
    <stage>10</stage>
    <substage>20</substage>
  </status>
  <copyright>
    <from>2020</from>
    <owner>
      <organization>
        <name>International Organization for Standardization</name>
        <abbreviation>ISO</abbreviation>
      </organization>
    </owner>
  </copyright>
  <ext>
    <doctype>article</doctype>
    <editorialgroup>
      <technical-committee number="399" type="TC">TC</technical-committee>
      <subcommittee/>
      <workgroup/>
    </editorialgroup>
    <structuredidentifier>
      <project-number>ISO 33032</project-number>
    </structuredidentifier>
  </ext>
</bibdata>

In turn, that generates the following metadata Hash:

{
  :agency => "ISO",
  :authors => [],
  :authors_affiliations => {},
  :docnumber => "ISO/NWIP 33032",
  :docnumeric => "33032",
  :docsubtitle => "",
  :docsubtitlemain => "",
  :docsubtitlepartlabel => "Partie&nbsp;",
  :doctitle => "Cybernetics&#x2009;&#x2014;&#x2009;Neuro-information interchange interface",
  :doctitlemain => "Neuro-information interchange interface",
  :doctitlepartlabel => "Part&nbsp;",
  :doctype => "Article",
  :docyear => "2020",
  :draft => nil,
  :draftinfo => "",
  :edition => "1",
  :editorialgroup => ["TC 399"],
  :ics => "XXX",
  :obsoletes => nil,
  :obsoletes_part => nil,
  :revdate => nil,
  :sc => "XXXX",
  :secretariat => "XXXX",
  :stage => "10",
  :stage_int => 10,
  :statusabbr => "NWIP",
  :tc => "TC 399",
  :tc_docnumber => [],
  :unpublished => true,
  :wg => "XXXX"
}

Some metadata hash values are normalized, especially as the contents of the hash are intended for display; dates, for example, are often resolved from the ISO 8601-1 and ISO 8601-2 formats to formats with the month spelled out.
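
The date normalization described above can be sketched as follows. This is a minimal illustration, not the Isodoc::Metadata implementation: the helper name display_date and the exact display formats ("March 5, 2020", "March 2020") are assumptions made for this example.

```ruby
require "date"

# Hypothetical helper (not part of Isodoc): spell out the month of an
# ISO 8601 date. Accepts "YYYY-MM-DD" or "YYYY-MM"; any other value,
# such as a bare year, is returned unchanged.
def display_date(iso)
  case iso
  when /\A(\d{4})-(\d{2})-(\d{2})\z/
    Date.new($1.to_i, $2.to_i, $3.to_i).strftime("%B %-d, %Y")
  when /\A(\d{4})-(\d{2})\z/
    Date.new($1.to_i, $2.to_i, 1).strftime("%B %Y")
  else
    iso
  end
end

puts display_date("2020-03-05") # prints "March 5, 2020"
puts display_date("2020-03")    # prints "March 2020"
puts display_date("2020")       # prints "2020"
```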

Default metadata values

Each gem can customise its own metadata values.

These are the default metadata values extracted by the base Isodoc::Metadata class, and the corresponding Metanorma XML locations they are populated from:

authors

An array of personal author names, each name extracted from //bibdata/contributor[role/@type = 'author' or xmlns:role/@type = 'editor']/person, and being either ./name/completename or ./name/forename + " " + ./name/surname.

authors_affiliations

A hash of the affiliations of personal authors, mapping each affiliation to the array of names of the personal authors working there. The affiliations are extracted from the same person elements as the personal author names (see above), as ./affiliation/organization/name plus ./affiliation/organization/address/formattedAddress, comma-delimited; if only one of the two is present, that one is used on its own. So for example, { "CSIRO" ⇒ ["Fred Nerk", "Joe Bloggs"], "University of Auckland" ⇒ ["John Doe"] }.
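
The authors and authors_affiliations extractions can be sketched as below. This is an illustration under stated assumptions, not the Isodoc code: it uses REXML rather than the parser Metanorma uses, the sample XML is made up for the example and has no namespace, the helper names are hypothetical, and authors without any affiliation are simply skipped.

```ruby
require "rexml/document"

# Sample input made up for this sketch; not generated by Metanorma.
SAMPLE_BIBDATA = <<~XML
  <bibdata>
    <contributor>
      <role type="author"/>
      <person>
        <name><completename>Fred Nerk</completename></name>
        <affiliation><organization><name>CSIRO</name></organization></affiliation>
      </person>
    </contributor>
    <contributor>
      <role type="editor"/>
      <person>
        <name><forename>Joe</forename><surname>Bloggs</surname></name>
        <affiliation><organization><name>CSIRO</name></organization></affiliation>
      </person>
    </contributor>
    <contributor>
      <role type="author"/>
      <person>
        <name><completename>John Doe</completename></name>
        <affiliation><organization><name>University of Auckland</name></organization></affiliation>
      </person>
    </contributor>
  </bibdata>
XML

# Either the complete name, or forename and surname joined by a space.
def person_name(person)
  complete = person.elements["name/completename"]
  return complete.text if complete
  [person.elements["name/forename"], person.elements["name/surname"]]
    .compact.map(&:text).join(" ")
end

# Organization name plus formatted address, comma-delimited, or whichever exists.
def affiliation_label(person)
  name = person.elements["affiliation/organization/name"]&.text
  address = person.elements["affiliation/organization/address/formattedAddress"]&.text
  label = [name, address].compact.join(", ")
  label.empty? ? nil : label
end

# Map each affiliation to the names of the authors and editors working there.
def authors_affiliations(bibdata)
  bibdata.elements.to_a("contributor").each_with_object({}) do |contributor, acc|
    role = contributor.elements["role"]
    next unless role && %w[author editor].include?(role.attributes["type"])
    contributor.elements.each("person") do |person|
      label = affiliation_label(person)
      next if label.nil? # assumption: unaffiliated authors are left out
      (acc[label] ||= []) << person_name(person)
    end
  end
end

p authors_affiliations(REXML::Document.new(SAMPLE_BIBDATA).root)
```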

{type}date

The date at which the {type} event occurred. The {type} is the name of a lifecycle event modelled by Relaton, including published, accessed, created, implemented, obsoleted, confirmed, updated, issued, received, transmitted, copied, unchanged, and circulated. The date is extracted from //bibdata/date[@type = {type}].

doctype

Flavour-specific document type, from //bibdata/ext/doctype.

agency

A concatenation of the abbreviations (or, where an abbreviation is unavailable, the names) of all the agencies responsible for publishing the document. Extracted from //bibdata/contributor[xmlns:role/@type = 'publisher']/organization, using either ./abbreviation or ./name. E.g. "ISO/IEC".
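
A minimal sketch of the agency concatenation follows. It assumes REXML and namespace-free sample XML, and the method and constant names are chosen for this example only; the "/" separator is taken from the "ISO/IEC" example above.

```ruby
require "rexml/document"

# Sample input made up for this sketch.
PUBLISHERS_XML = <<~XML
  <bibdata>
    <contributor>
      <role type="publisher"/>
      <organization>
        <name>International Organization for Standardization</name>
        <abbreviation>ISO</abbreviation>
      </organization>
    </contributor>
    <contributor>
      <role type="publisher"/>
      <organization>
        <name>International Electrotechnical Commission</name>
        <abbreviation>IEC</abbreviation>
      </organization>
    </contributor>
  </bibdata>
XML

# Join the abbreviation (or, failing that, the name) of every publisher.
def agency(bibdata)
  labels = []
  bibdata.elements.each("contributor") do |contributor|
    role = contributor.elements["role"]
    next unless role && role.attributes["type"] == "publisher"
    contributor.elements.each("organization") do |org|
      label = org.elements["abbreviation"]&.text || org.elements["name"]&.text
      labels << label if label
    end
  end
  labels.join("/")
end

puts agency(REXML::Document.new(PUBLISHERS_XML).root) # prints "ISO/IEC"
```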

unpublished

Boolean value of whether the document is considered to be an unpublished draft or published, based on the status of the document.

stage

The stage of the document, extracted from //bibdata/status/stage.

stageabbr

The abbreviation of the stage of the document, as extracted from //bibdata/status/stage. By default, this is the initials of the stage if the document is unpublished, and nil if the document is published.
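
The default rule for stageabbr might look roughly like this. The helper name, its signature, and the sample stage names are assumptions for illustration; flavours are free to override the rule.

```ruby
# Hypothetical helper: initials of the stage name for unpublished
# documents, nil for published ones.
def stage_abbr(stage, unpublished)
  return nil unless unpublished
  stage.split.map { |word| word[0].upcase }.join
end

p stage_abbr("working draft", true) # prints "WD"
p stage_abbr("published", false)    # prints nil
```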

substage

The substage of the document, extracted from //bibdata/status/substage.

iteration

The iteration of the document stage, extracted from //bibdata/status/iteration.

docnumber

The first document identifier given in the XML for the document, extracted from //bibdata/docidentifier.

docnumeric

The numeric identifier for the document, extracted from //bibdata/docnumber. The canonical document identifier in docnumber is typically the docnumeric value, preceded by an agency abbreviation and/or a document type.

edition

The document edition, extracted from //bibdata/edition.

docyear

The document copyright year, extracted from //bibdata/copyright/from.

draft

The document draft number, extracted from //bibdata/version/draft.

revdate

The document revision date, extracted from //bibdata/version/revision-date.

draftinfo

The draft number and revision date, preceded with the local label for DRAFT.

title

The document title, extracted from the first //bibdata/title[@language='en'] found in the document.

partof

The identifier of the document this document is part of, extracted from //bibdata/relation[@type = 'partOf']//docidentifier.

obsoletes

The identifier of the document this document obsoletes, extracted from //bibdata/relation[@type = 'obsoletes']//docidentifier.

obsoletes_part

The part of this document that has been obsoleted, extracted from //bibdata/relation[@type = 'obsoletes']//locality.

html

The URL for an HTML version of this document, extracted from //bibdata/uri[@type = 'html'].

xml

The URL for an XML version of this document, extracted from //bibdata/uri[@type = 'xml'].

pdf

The URL for a PDF version of this document, extracted from //bibdata/uri[@type = 'pdf'].

doc

The URL for a DOC version of this document, extracted from //bibdata/uri[@type = 'doc'].

url

The URL for an unspecified version of this document, extracted from //bibdata/uri[not(@type)].

Boilerplate processing

The metadata hash is used by the Isodoc::Convert.populate method to populate all templated text. Templated text is expected to be written in the Liquid template language.

The keys of the metadata hash are the variable names passed into Liquid.

Given the metadata Hash above, the following templated text:

<div class="doctitle-en">
  <div>
    <span class="title">{{ doctitleintro }}{% if doctitleintro and doctitlemain %} — {% endif %}</span><span class="subtitle">{{ doctitlemain }}{% if doctitlemain and doctitlepart %} —{% endif %}</span>
{% if doctitlepart %}
  </div>
  <div class="doctitle-part">
    {% if doctitlepartlabel %}
    <span class="partlabel">{{ doctitlepartlabel }}:</span>
    {% endif %}
    <span class="part">{{ doctitlepart }}</span>
{% endif %}
  </div>
</div>

is populated as:

<div class="doctitle-en">
  <div>
    <span class="title"></span><span class="subtitle">Neuro-information interchange interface</span>
  </div>
</div>

and all the conditional output is omitted, because the metadata hash has neither a part component nor an introductory component for the title: only {{ doctitlemain }} ends up populated.

The Isodoc::Convert.populate method merges the metadata Hash with the @labels hash used for internationalisation (see Localization how-to guide). This is so that any templated text can also access localised labels defined for the current language.
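
The merge of metadata and labels can be sketched as below. To stay self-contained, this sketch does not use Liquid: a small regular expression that only substitutes {{ name }} variables stands in for it, so tags such as {% if %} are not handled. The draft_label key, the method name, and the rule that metadata wins on a key clash are all assumptions of the example.

```ruby
# Stand-in for Liquid rendering: substitutes {{ name }} variables only.
def populate_sketch(template, metadata, labels)
  # Labels and metadata share one variable namespace; in this sketch,
  # metadata takes precedence if both define the same key (assumption).
  vars = labels.merge(metadata).transform_keys(&:to_s)
  # Unknown variables render as empty strings.
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) { vars[Regexp.last_match(1)].to_s }
end

metadata = { doctitlemain: "Neuro-information interchange interface", docyear: "2020" }
labels = { "draft_label" => "DRAFT" } # hypothetical localised label

puts populate_sketch("<p>{{ draft_label }}: {{ doctitlemain }} ({{ docyear }})</p>",
                     metadata, labels)
# prints "<p>DRAFT: Neuro-information interchange interface (2020)</p>"
```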

The metadata hash for a flavour is also populated with the absolute file locations of the gem’s copy of any logo images. That means that any logos are populated in templated text using the metadata hash.

For example, the HTML and Word logo images for the Metanorma M3D flavour are defined in IsoDoc::M3d::Metadata.initialize as:

def initialize(lang, script, labels)
  super
  here = File.dirname(__FILE__)
  set(:logo_html,
      File.expand_path(File.join(here, "html", "m3-logo.png")))
  set(:logo_word,
      File.expand_path(File.join(here, "html", "logo.jpg")))
end

That means that the HTML logo image is populated in the HTML cover page for M3D through a Liquid variable:

<img src="{{ logo_html }}" alt="m3 logo"/>

Note: Although the absolute file location of the image inside the gem is used, postprocessing replaces this with either a local copy or a Data URI, in the case of HTML, and a MIME embedded attachment containing the image, in the case of Word.
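
As an illustration of the Data URI case, the sketch below replaces the src of any img element that points at an existing local file. It is not the Metanorma postprocessing code: the helper names, the MIME type table, and the regular-expression approach are assumptions made for this example.

```ruby
# Hypothetical MIME type lookup by file extension.
MIME_TYPES = {
  ".png" => "image/png",
  ".jpg" => "image/jpeg",
  ".jpeg" => "image/jpeg",
  ".gif" => "image/gif",
}.freeze

# Encode the file at path as a Data URI.
def data_uri(path)
  mime = MIME_TYPES.fetch(File.extname(path).downcase, "application/octet-stream")
  encoded = [File.binread(path)].pack("m0") # Base64 without line breaks
  "data:#{mime};base64,#{encoded}"
end

# Replace local image paths in src attributes with Data URIs;
# img elements whose src is not an existing file are left untouched.
def inline_logos(html)
  html.gsub(/(<img[^>]*\ssrc=")([^"]+)(")/) do
    match = Regexp.last_match
    next match[0] unless File.file?(match[2])
    "#{match[1]}#{data_uri(match[2])}#{match[3]}"
  end
end
```

With this sketch, inline_logos('<img src="/path/to/logo.png" alt="m3 logo"/>') would return the same element with the file contents embedded, provided the file exists.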

The templated text populated through metadata can include:

  • Under the isodoc/*/html directory of the gem:

    • The HTML cover page (html_*_titlepage.html) and Word cover page (word_*_titlepage.html), which are the main destination for bibdata metadata.

    • The introductory page for HTML and Word (html_*_intro.html, word_*_intro.html), although this is usually populated instead via Metanorma boilerplate (see below).

    • The Word header (header.html).

    • The HTML and Word stylesheets (*.scss). Variables can be used there either to populate the stylesheet or to conditionally include text; for example, NIST and IEC use the current document status to turn line numbering on or off in the Word stylesheet. (Draft documents are line-numbered, and whether a document is in draft depends on the value of bibdata/status.)

  • Under the asciidoctor/* directory of the gem:

    • The Metanorma boilerplate file (boilerplate.xml)

Boilerplate

The boilerplate element in Metanorma XML follows bibdata, and contains text that is repeated in each instance of the document class, and that outlines the rules under which the document may be used.

By default, the boilerplate element contains up to four elements:

  • copyright-statement,

  • license-statement,

  • legal-statement, and

  • feedback-statement.

Each of those statements is a Metanorma clause, which can contain a title, multiple paragraphs, and subclauses.

Because the boilerplate content is repeated for each document in its class, it is not expected to be supplied by the user (although the user can supply their own boilerplate file using the :boilerplate-authority: document attribute). Instead, the boilerplate content is included as a Metanorma XML file within the gem; by default, it is called boilerplate.xml.

Some of the boilerplate may be populated with metadata specific to the current document, so the boilerplate file is a Liquid template, populated with variables from the current flavour metadata Hash as with other templated text.

The content in the boilerplate element is processed as part of the document preface, and converted to HTML or Word like the rest of the Metanorma XML. However, boilerplate content usually ends up in the cover page or introductory page of the document instead. The following are the default conventions in Metanorma, although they can be overridden in the IsoDoc::*::Converter.authority_cleanup method (as is currently done in NIST):

  • Content in the copyright-statement element is rendered in a <div class="boilerplate-copyright"> container.

  • The authority_cleanup method, defined in postprocessing for both the HTML and the Word converters, looks for a single element with id attribute boilerplate-copyright-destination.

  • If it finds such an element, it moves the <div class="boilerplate-copyright"> container and its contents to replace that element. This is how boilerplate content can populate the cover page or introductory page, instead of occurring within the document body.

  • This is repeated for each of license-statement, legal-statement, and feedback-statement.
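
The relocation steps above can be sketched as follows. REXML stands in here for the HTML parser actually used, the sample page is made up, and the method body is an assumption about how the move could be done rather than a copy of IsoDoc::*::Converter.authority_cleanup.

```ruby
require "rexml/document"

STATEMENTS = %w[copyright license legal feedback].freeze

# For each statement type, move the rendered boilerplate container into the
# place of its destination placeholder, if both are present.
def authority_cleanup_sketch(doc)
  STATEMENTS.each do |kind|
    dest = REXML::XPath.first(doc, "//*[@id='boilerplate-#{kind}-destination']")
    source = REXML::XPath.first(doc, "//div[@class='boilerplate-#{kind}']")
    next unless dest && source # no placeholder: content stays in the body
    source.remove             # detach the container from the document body
    dest.replace_with(source) # put it where the placeholder was
  end
  doc
end

# Sample page made up for this sketch.
SAMPLE_PAGE = <<~HTML
  <html>
    <div id="cover"><div id="boilerplate-copyright-destination"/></div>
    <div id="body">
      <div class="boilerplate-copyright"><p>Copyright notice</p></div>
      <div class="boilerplate-license"><p>Draft warning</p></div>
      <p>Body text</p>
    </div>
  </html>
HTML

puts authority_cleanup_sketch(REXML::Document.new(SAMPLE_PAGE))
```

In this sample the copyright container ends up inside the cover div, while the license container stays in the body because the page has no boilerplate-license-destination element.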

For example, in the Metanorma ISO flavour:

  • the copyright statement for ISO occurs on the second page:

    • <div id="boilerplate-copyright-destination"/> appears accordingly in the introductory page template;

  • the license statement is the warning shown when the document is in draft:

    • <div id="boilerplate-license-destination"/> appears in the title page template for the flavour;

    • the front page draft warning is styled in CSS as boilerplate-license.