GLEP:68

Abstract
This GLEP specifies the format of files used to describe category and package metadata (metadata.xml).

Motivation
At the time of writing this GLEP, category and package metadata.xml files lacked a proper specification. PMS Appendix A stated that the format of these files is beyond its scope, deferring the specification to the DTD file.

The original metadata.dtd file (the version before cleanups related to this spec) did not serve well as a specification. Due to technical limitations of the DTD format, it could neither enforce the specification fully nor explain it in a readable form. Furthermore, it lacked some important details such as the format of   entries.

Besides that, there were numerous alterations to the format. GLEP 34 added metadata files for category descriptions, GLEP 46 added upstream information, GLEP 56 added USE flag descriptions, and GLEP 67 altered the maintainer descriptions. Furthermore, some additions and removals were made without a formal specification, e.g. the addition of slot descriptions.

Sadly, some of those GLEPs partially conflict with other specifications: for example, the   element as described in GLEP 56 differs from the one originally proposed and used in metadata.xml.

Therefore, the motivation for this GLEP is to provide a unified, clear and complete specification for both category-wide and package-wide metadata.xml files. It combines previous GLEPs, relevant discussions and the existing implementation in order to provide the specification that is closest to the originally intended meaning while preserving the best possible compatibility with existing tools and data.

Metadata files
This specification defines two kinds of metadata files: category metadata files and package metadata files. Both kinds use the XML file format, with the structure defined in this GLEP. The XML structure does not use a namespace and must not contain any elements outside the scope of this specification.

Category metadata files are named metadata.xml and located inside category directories in an ebuild repository. Their structure is described in the category metadata section.

Package metadata files are named metadata.xml and located inside package directories in an ebuild repository. Their structure is described in the package metadata section.

Text data
The following text data types are used:
 * text data,
 * multi-line text data.

In case of text data, all whitespace inside the element is normalized (consecutive whitespace sequences are replaced by a single SP). Trailing and leading whitespace is stripped.

In case of multi-line text data, all whitespace except for newline characters is normalized. Newlines are used to delimit lines of text. Leading and trailing lines of text that are either empty or consist purely of whitespace are stripped. Afterwards, the whitespace belonging to the indentation common to all non-empty lines of text is stripped.
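The two normalization rules above can be illustrated with a short Python sketch. It is not a reference implementation: the function names are hypothetical, and it assumes that "normalizing" whitespace means collapsing each run of whitespace into a single space, applying the multi-line steps in the order the specification lists them.

```python
import re

def normalize_text(raw: str) -> str:
    """Text data: collapse every whitespace run into one space, strip both ends."""
    return re.sub(r"\s+", " ", raw).strip()

def normalize_multiline(raw: str) -> str:
    """Multi-line text data, applying the steps in the order the spec lists them."""
    # 1. Normalize whitespace other than newlines. Assumption: every run of
    #    non-newline whitespace is collapsed into a single space.
    lines = re.sub(r"[^\S\n]+", " ", raw).split("\n")
    # 2. Drop leading and trailing lines that are empty or whitespace-only.
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    # 3. Strip the indentation common to all non-empty lines.
    indents = [len(line) - len(line.lstrip(" ")) for line in lines if line.strip()]
    common = min(indents, default=0)
    return "\n".join(line[common:] for line in lines)

print(repr(normalize_multiline("\n    First line\n    second  line\n\n")))
# 'First line\nsecond line'
```

Under this literal reading, deeper relative indentation is collapsed to a single space before the common indentation is removed, so nested indentation is not preserved.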

Optionally, text may be interspersed with   and   elements. In this case, the   element is used to reference a category inside the repository and must contain a valid category name, while   is used to reference a package and must contain a valid qualified package name.
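A tool checking category and package references could validate their contents roughly as below. This sketch assumes the PMS naming rules as we paraphrase them (category names use [A-Za-z0-9+_.-] and must not start with a hyphen, dot or plus sign; package names use [A-Za-z0-9+_-], must not start with a hyphen or plus sign, and must not end in a hyphen followed by a valid version). The function names are hypothetical.

```python
import re

# Assumed PMS character and first-character rules for names.
CATEGORY_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9+_.-]*")
PACKAGE_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9+_-]*")
# Assumed PMS version syntax, used to reject names such as "bar-1".
VERSION_RE = re.compile(
    r"[0-9]+(?:\.[0-9]+)*[a-z]?(?:_(?:alpha|beta|pre|rc|p)[0-9]*)*(?:-r[0-9]+)?"
)

def is_valid_category(value: str) -> bool:
    return CATEGORY_RE.fullmatch(value) is not None

def is_valid_package_name(value: str) -> bool:
    if PACKAGE_RE.fullmatch(value) is None:
        return False
    parts = value.split("-")
    # Reject the name if the text after any hyphen is a complete version.
    return not any(
        VERSION_RE.fullmatch("-".join(parts[i:])) for i in range(1, len(parts))
    )

def is_valid_qualified_name(value: str) -> bool:
    """Check a qualified package name of the form category/package."""
    category, sep, package = value.partition("/")
    return bool(sep) and is_valid_category(category) and is_valid_package_name(package)
```

Checking the version suffix separately is what lets this sketch reject app-foo/bar-1, which a single simple regular expression would accept.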

Common attributes
The following common attributes are allowed on multiple elements:
 * language specifiers,
 * restriction specifiers.

Language specifiers are used whenever an element supports variants in different languages. In this case, each occurrence of the element may contain an optional lang="" attribute that contains an ISO 639-1 language code. If no lang="" attribute is provided, an implicit default of en is assumed.
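For elements allowed at most once per language, a consumer might group occurrences by language as in the sketch below, which applies the implicit en default and rejects duplicates. The element name desc and the function name are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

def by_language(elements):
    """Map each language code to its element, applying the implicit 'en' default.

    Intended for elements allowed at most once per language; raises
    ValueError on a duplicate.
    """
    result = {}
    for el in elements:
        lang = el.get("lang", "en")  # a missing lang="" attribute means English
        if lang in result:
            raise ValueError(f"duplicate entry for language {lang!r}")
        result[lang] = el
    return result

# Hypothetical <desc> elements, used only for illustration.
root = ET.fromstring('<root><desc>Hello</desc><desc lang="de">Hallo</desc></root>')
texts = {lang: el.text for lang, el in by_language(root.findall("desc")).items()}
print(texts)  # {'en': 'Hello', 'de': 'Hallo'}
```

Note that an element without lang="" and another with lang="en" count as duplicates under this rule.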

Restriction specifiers are used whenever an element supports restricting to specific package versions. In this case, each occurrence of the element may contain an optional restrict="" attribute that contains an EAPI 0 dependency specification that must match one or more versions of the package. The metadata provided by such an element then applies only to the package versions matching the restriction.

Category metadata
The category metadata file uses the   top-level element. This element can contain, in any order:
 * zero or more   elements containing category descriptions in different languages (at most one for each language). The category description is formed of multi-line text, optionally interspersed with   and   elements.

Package metadata
Top-level structure
The package metadata file uses the   top-level element. This element can contain, in any order:
 * zero or more   elements containing package descriptions in different languages, possibly restricted to specific package versions (at most one for each combination of language and package version). The package description is formed of multi-line text, optionally interspersed with   and   elements.
 * zero or more   elements listing package maintainers, optionally restricted to specific package versions. The maintainer format is detailed in maintainer descriptions.
 * zero or more   elements containing slot descriptions in different languages (at most one for each language), as detailed in slot descriptions.
 * zero or more   elements containing USE flag descriptions in different languages (at most one for each language), as detailed in USE flag descriptions.
 * at most one   element providing information on upstream of the package, as detailed in upstream descriptions.

Maintainer descriptions
Each   element describes a single maintainer.

The   element has an obligatory type="" attribute whose value can be either person or project.

The   element contains the following elements, in any order:
 * exactly one   element that contains the maintainer's e-mail address (used as a unique identifier),
 * at most one   element that contains the maintainer's human-readable name (real name or nickname),
 * zero or more   elements that explain the role of the maintainer in different languages (at most one   for each language).

Slot descriptions
Each   element describes the slots of a package (in a specific language).

The   element can contain the following elements:
 * zero or more   elements describing specific ebuild slots (at most one for each slot name). The   element contains an obligatory name="" attribute stating the slot to which the description applies, and contains the slot description as text. Alternatively, a slot name of * can be used to provide a single description applying to all slots (no other   elements may be used in this case).
 * at most one   element describing the role of subslots (all of them) as text.
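The constraints on slot description names can be checked on the list of name="" values found in one language's slot descriptions. The sketch below assumes those values have already been extracted; the function name is hypothetical.

```python
def check_slot_names(names: list[str]) -> None:
    """Validate the name="" values of the slot descriptions in one language."""
    if len(set(names)) != len(names):
        raise ValueError("at most one description is allowed per slot name")
    if "*" in names and len(names) > 1:
        raise ValueError("a '*' description must be the only slot description")

check_slot_names(["0", "2"])  # distinct slot names: accepted
check_slot_names(["*"])       # a single catch-all description: accepted
```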

USE flag descriptions
Each   element describes the USE flags of a package (in a specific language).

The   element can contain the following elements:
 * zero or more   elements describing specific USE flags, optionally restricted to specific package versions (at most one entry for each combination of USE flag name and package version). The   element contains an obligatory name="" attribute stating the name of the USE flag to which the description applies, and contains text, optionally interspersed with   and   elements.
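A simple duplicate check for USE flag entries can compare (flag name, restrict="" value) pairs, as sketched below. This is an approximation: it compares restriction strings, not the sets of package versions they match. The input format and function name are assumptions.

```python
from collections import Counter

def duplicate_flag_entries(entries):
    """Return (flag, restrict) pairs that occur more than once.

    entries: iterable of (flag_name, restrict_value_or_None) pairs taken from
    one language's USE flag descriptions.
    """
    counts = Counter(entries)
    return [key for key, n in counts.items() if n > 1]

print(duplicate_flag_entries([("ssl", None), ("ssl", ">=app-foo/bar-2"), ("ssl", None)]))
# [('ssl', None)]
```

Entries restricted by >=app-foo/bar-1 and >=app-foo/bar-2 both apply to bar-2, yet this check does not report them, because their restriction strings differ.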

Upstream descriptions
The   element provides information on the upstream of a package. It contains the following elements:
 * zero or more   elements listing the package's upstream maintainers, as described in upstream maintainer descriptions,
 * at most one   element containing the URL of an on-line copy of the upstream changelog,
 * zero or more   elements containing URLs of on-line copies of the upstream documentation in different languages (at most one for each language),
 * at most one  element containing the upstream bug-reporting URL, which can optionally be a mailto: URL,
 * zero or more  elements listing package identities on package identification trackers. Each of those elements has an obligatory type="" attribute that matches a predefined name of a package identification tracker, and a value that is an identifier specific to that tracker. The list of available trackers and their specific identifiers is outside the scope of this specification.
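A checker might classify the bug-reporting URL as follows. The specification only states that a mailto: URL is allowed. This sketch additionally assumes that any other URL must use http or https. The function name is hypothetical.

```python
from urllib.parse import urlsplit

def classify_bug_url(value: str) -> str:
    """Return 'mailto' or 'web' for an upstream bug-reporting URL (sketch)."""
    parts = urlsplit(value.strip())
    if parts.scheme == "mailto":
        if "@" not in parts.path:
            raise ValueError("mailto: URL without an e-mail address")
        return "mailto"
    # Assumption: non-mailto bug trackers are expected to be http(s) URLs.
    if parts.scheme in ("http", "https") and parts.netloc:
        return "web"
    raise ValueError(f"unsupported bug-reporting URL: {value!r}")

print(classify_bug_url("mailto:bugs@example.org"))  # mailto
```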

Upstream maintainer descriptions
Each   element inside   describes a single upstream maintainer.

The   element has an optional status="" attribute whose value can be either active or inactive. If not specified, an implicit unknown value is assumed.

The   element contains the following elements, in any order:
 * at most one   element that contains the maintainer's e-mail address,
 * exactly one   element that contains the maintainer's human-readable name (real name or nickname).
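Reading the status="" attribute with its implicit default could look like the sketch below. Note that unknown is only the implied value and is not accepted as an explicit attribute value. The element name m in the example and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

ALLOWED_STATUS = {"active", "inactive"}

def upstream_status(el: ET.Element) -> str:
    """Return the upstream maintainer status, defaulting to 'unknown'."""
    status = el.get("status")
    if status is None:
        return "unknown"  # implicit value when the attribute is absent
    if status not in ALLOWED_STATUS:
        raise ValueError(f"invalid status={status!r}")
    return status

print(upstream_status(ET.fromstring("<m/>")))  # unknown
```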

Information sources
The basic source of information on the current metadata.xml format was metadata.dtd as of 2016-03-02. Whenever the DTD was unclear, the appropriate GLEPs were consulted in order to deduce the original intent. Whenever the GLEPs were unclear or no GLEP covered the elements in question, the original mailing list discussions were consulted.

Removed elements
Compared to the original DTD, the following elements were removed (both in the spec and in the updated DTD file):
 * package-scope   element was removed. It dates back to the original metadata.xml proposal but was never implemented; plain text ChangeLogs were used instead. Furthermore, GLEP 46 introduced   inside   with a different type, which collided with the global declaration due to DTD limitations.
 * package-scope  element was removed. It was available for 1.5 years, during which only four packages adopted it and no known tool supported or used it. It was used only to provide a copy of the package name with the correct case (e.g. libressl -> LibreSSL), so the information it provided was considered redundant.
 * top-level   variant was removed. It was never used and its purpose was unclear. Removing it also simplified the DTD.

value format
A debate on the valid format of   element values preceded the writing of this GLEP. The DTD did not restrict the value format, and only suggested that the element be used for cross-linking. Later, GLEP 56 redefined its value as a valid CP or CPV. In practice, CPV values were not used; however, it was common to append EAPI 1 slot specifiers or even EAPI 5 slot operators to the qualified package names.

After finding Doug Goldstein's blog post on the introduction of these elements, it turned out that the original intent was to allow cross-linking/referencing from packages.gentoo.org. Since the latter uses qualified package names as identifiers, it was decided to restrict   elements to referencing those. For entries that include slot specifiers, it is recommended to move the slot specifiers out of the   element.
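When migrating existing entries, the slot part can be split off mechanically, as sketched below. This assumes the slot specifier or slot operator always begins at the first colon, which holds for qualified package names since they cannot contain colons. The function name is hypothetical.

```python
def split_slot_suffix(value: str) -> tuple[str, str]:
    """Split 'cat/pkg:slot' into ('cat/pkg', ':slot'); the suffix is '' if absent."""
    name, sep, slot = value.strip().partition(":")
    return name, sep + slot

print(split_slot_suffix("dev-lang/python:2.7"))  # ('dev-lang/python', ':2.7')
```

The returned suffix can then be kept in the surrounding text, leaving only the qualified package name inside the element.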

Language identifiers
Originally, the DTD used an implicit default value of C. However, this value was not in line with the real language specifiers found in metadata.xml files. The latter usually took the form of ISO 639-1 language codes, which are not valid (complete) locale identifiers, while C is not a valid language identifier in any of the considered standards. Furthermore, since en was commonly used to identify English in metadata.xml files, and no tools relied on the implicit default defined in the DTD, it was decided to change the implicit default to en.

Package restrictions
Originally, the DTD described the restrict="" attribute as: the format of this attribute is equal to the format of DEPEND lines in ebuilds. This specification is based upon this definition. However, for practical reasons, it adds three clarifications:
 * 1) only package dependency specifications are allowed (i.e. no USE-conditionals or multiple dependency specifications),
 * 2) only EAPI=0 dependency specifications are allowed, since metadata.xml provides no EAPI identification mechanism and it predates EAPI,
 * 3) only dependencies referencing the same package are allowed.

Furthermore, the DTD defined a special * value that applied if no other tags applied. This behavior was not used at all and was confusing (given the common use of * to mean matching everything), so it was removed.
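The rules above can be approximated by a validator such as the following sketch. It accepts a single EAPI 0 package dependency (optional version operator, optional version, and a trailing * only with =), rejects slot, USE and blocker syntax as well as the removed * value, and requires the atom to name the package itself. The regular expressions paraphrase the PMS name and version syntax and are an assumption, not a complete EAPI 0 grammar; the function name is hypothetical.

```python
import re

# Simplified EAPI 0 atom: [op]category/package[-version][*]
ATOM_RE = re.compile(
    r"(?P<op><=|>=|<|>|=|~)?"
    r"(?P<cat>[A-Za-z0-9_][A-Za-z0-9+_.-]*)/"
    r"(?P<pn>[A-Za-z0-9_][A-Za-z0-9+_-]*?)"
    r"(?:-(?P<ver>[0-9]+(?:\.[0-9]+)*[a-z]?"
    r"(?:_(?:alpha|beta|pre|rc|p)[0-9]*)*(?:-r[0-9]+)?))?"
    r"(?P<glob>\*)?"
)

def check_restrict(value: str, category: str, package: str) -> None:
    """Raise ValueError if value is not a valid restrict="" for category/package."""
    if value == "*":
        raise ValueError("the special '*' value is no longer allowed")
    m = ATOM_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a single EAPI 0 package dependency: {value!r}")
    if (m["op"] is None) != (m["ver"] is None):
        raise ValueError("a version requires an operator and vice versa")
    if m["glob"] and m["op"] != "=":
        raise ValueError("a '*' suffix is only valid with the '=' operator")
    if (m["cat"], m["pn"]) != (category, package):
        raise ValueError("the restriction must reference the package itself")

check_restrict(">=app-foo/bar-1.2", "app-foo", "bar")  # accepted
```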

Upstream block
The upstream block was defined by GLEP 46. However, that GLEP is ambiguous at best. Tiziano Müller (one of the original authors) has explained the intent behind most of the elements of the GLEP.

In particular, he confirmed that the GLEP lists all elements that are allowed explicitly, and no implicit inclusions were meant to be allowed. This means that the   element does not allow a  .

He also confirmed that, unless noted otherwise, elements were not allowed to be used more than once. This affects  and   elements. Repetitions of   were only allowed because the DTD could not technically limit them to one per language while still allowing different languages.

At the time of writing this GLEP, only a single Gentoo package was using multiple  elements, and no packages were using multiple   or   elements (or non-English docs). For this reason, this GLEP enforces the original intent of at most one element.

Upstream maintainer descriptions
The proper contents of the   elements in   blocks were unclear in the DTD, since a technical limitation of the file format meant that all elements and attributes added for Gentoo maintainers also applied to upstream maintainers, and vice versa.

The comments in the DTD clearly separated attributes between the two, i.e. they stated that the type attribute is used only for Gentoo maintainers, while the status attribute is used only for upstream maintainers. However, package version restrictions and maintainer descriptions were also implicitly allowed on them. Since neither of the two was allowed by GLEP 46, this specification disallows them.

Backwards Compatibility
This specification does not introduce any new elements or attributes compared to the current DTD. Therefore, all metadata.xml files that comply with it will be read correctly by the existing tools and will conform to the current DTD.

However, this specification is stricter than the rules enforced by the DTD. Therefore, not all existing metadata.xml files conform to the spec, even though they are valid according to the DTD. New tools will consider such files incorrect and ask developers to fix them.

Parsing metadata.xml
Since the metadata.xml format defined by this specification is compatible with existing tools, no new implementation is required for reading those files.

Checking metadata.xml validity
To provide stricter checking of metadata.xml files, an XML schema file is provided in the gentoo-xml-schema repository. This schema provides:
 * element structure checks,
 * data duplication checks (e.g. multiple descriptions for the same flag; but see below),
 * partial value correctness checks.

The limitations of the schema are:
 * values are verified using simple regular expressions, so not all format violations will be caught (e.g. the rule will consider app-foo/bar-1 a valid qualified package name even though a version suffix is not allowed),
 * cross-references cannot be checked (package references, category references, URLs, project identifiers),
 *   correctness cannot be checked,
 * data duplication checks are done per restrict="" value rather than per package version matched by the restriction. Therefore, multiple definitions that apply to a single package version through two different restrict="" rules will not be caught.

Example metadata.xml file
German translations provided by User:Tamiko.

Copyright
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.