GLEP:68

Abstract
This GLEP specifies the format of files used to describe category and package metadata (metadata.xml).

Motivation
At the moment of writing this GLEP, category and package metadata.xml lacked proper specification. PMS Appendix A specified that the format of this file is beyond its scope, deferring the specification to the DTD file.

The original metadata.dtd file (the version before cleanups related to this spec) did not serve well as the specification. Due to the technical limitations of the DTD format, it could neither enforce the specification fully nor explain it in a readable form. Furthermore, it lacked some important details such as the format of <pkg/> entries.

Besides that, the format was altered numerous times: GLEP 34 added metadata files for category descriptions, GLEP 46 added upstream information, GLEP 56 added USE flag descriptions, and GLEP 67 altered the maintainer descriptions. Furthermore, some additions and removals were made without a formal specification, e.g. the addition of slot descriptions.

Sadly, some of those GLEPs are partially in conflict with other specifications. For example, the <pkg/> element as described in GLEP 56 is different from the one originally proposed and used in metadata.xml.

Therefore, the motivation for this GLEP is to provide a unified, clear and complete specification for both category-wide and package-wide metadata.xml files. It is meant to combine previous GLEPs, relevant discussions and implementation in order to provide the specification that is closest to the originally intended meaning while preserving the best compatibility with existing tools and data.

Metadata files
This specification provides two kinds of metadata files: category metadata files and package metadata files. Both kinds of files use the XML file format with the structure defined in this GLEP. The XML structure does not use a namespace and must not contain any elements outside the scope of this specification.

Category metadata files are named metadata.xml and located inside category directories in an ebuild repository. Their structure is described in the category metadata section.

Package metadata files are named metadata.xml and located inside package directories in an ebuild repository. Their structure is described in the package metadata section.

Text data
The following text data types are used:
 * text data,
 * multi-line text data.

In case of text data, all whitespace inside the element is normalized (consecutive whitespace sequences are replaced by a single SP). Trailing and leading whitespace is stripped.
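The normalization of plain text data can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name normalize_text is hypothetical, and "whitespace" is assumed to mean the XML whitespace characters (space, tab, carriage return, newline), which this specification does not state explicitly.

```python
import re


def normalize_text(raw: str) -> str:
    """Normalize 'text data' (hypothetical helper name).

    Assumption: whitespace means the XML whitespace characters
    (space, tab, CR, LF).
    """
    # Replace every run of whitespace with a single space (SP).
    collapsed = re.sub(r"[ \t\r\n]+", " ", raw)
    # Strip the leading and trailing whitespace.
    return collapsed.strip(" ")


example = normalize_text("  Fast\n\t  XML   parser \n")
```

Here example holds the single-line string "Fast XML parser".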

In case of multi-line text data, all whitespace except for newline characters is normalized. Newlines are used to delimit lines of text. Leading and trailing lines of text that are either empty or consist purely of whitespace are stripped. Afterwards, the whitespace belonging to the indentation common to all non-empty lines of text is stripped.
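The three steps for multi-line text data can be sketched in the same way. The sketch below follows a literal reading of the paragraph above, in the order given: whitespace is collapsed first, so the common indentation that remains is at most one space per line. Other readings (for example, measuring indentation before collapsing) are conceivable. The function name normalize_multiline is hypothetical.

```python
import re


def normalize_multiline(raw: str) -> str:
    """Normalize 'multi-line text data' (hypothetical helper, literal reading)."""
    # Step 1: collapse runs of non-newline whitespace to a single space.
    lines = [re.sub(r"[ \t\r]+", " ", line) for line in raw.split("\n")]
    # Step 2: drop leading and trailing lines that are empty or whitespace-only.
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    # Step 3: strip the indentation common to all non-empty lines.
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    common = min(indents, default=0)
    return "\n".join(l[common:] if l.strip() else "" for l in lines)


raw = "\n    First line.\n      Second line.\n\n    Third.\n  "
result = normalize_multiline(raw)
```

With this input, result consists of the lines "First line.", "Second line.", an empty line, and "Third.".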

Optionally, interspersing text with <cat/> and <pkg/> elements can be allowed. In this case, the <cat/> element is used to reference a category inside the repository, and must contain a valid category name. The <pkg/> element is used to reference a package, and must contain a valid qualified package name.

Common attributes
The following common attributes are allowed on multiple elements:
 * language specifiers,
 * restriction specifiers.

Language specifiers are used whenever an element supports variants in different languages. In this case, each occurrence of the element may contain an optional lang="" attribute that contains an ISO 639-1 language code. If no lang="" attribute is provided, an implicit default of en is assumed.

Restriction specifiers are used whenever an element supports restricting to specific package versions. In this case, each occurrence of the element may contain an optional restrict="" attribute that contains an EAPI 0 dependency specification that has to match one or more versions of the package. The metadata provided by the element then applies only to the package versions matching the restriction.

Category metadata
The category metadata file uses the <catmetadata/> top-level element. This element can contain, in any order:
 * zero or more <longdescription/> elements containing category descriptions in different languages (at most one for each language). The category description is formed of multi-line text, optionally interspersed with <cat/> and <pkg/> elements.
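As an illustration, the sketch below parses a small category metadata file and reports languages that carry more than one description. The category name, the descriptions and the helper name duplicated_languages are made up for the example. The sample deliberately violates the "at most one for each language" rule so that the check has something to report.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative category metadata; the contents are invented for this example.
category = ET.fromstring("""
<catmetadata>
  <longdescription>
    The example-cat category contains sample packages.
  </longdescription>
  <longdescription lang="de">
    Die Kategorie example-cat umfasst Beispielpakete.
  </longdescription>
  <longdescription lang="en">A second English description.</longdescription>
</catmetadata>
""")


def duplicated_languages(root):
    """Return the languages that have more than one <longdescription/>."""
    # A missing lang attribute counts as the implicit default "en".
    langs = Counter(d.get("lang", "en") for d in root.findall("longdescription"))
    return sorted(lang for lang, n in langs.items() if n > 1)


duplicates = duplicated_languages(category)
```

The first description has no lang attribute and therefore collides with the explicit lang="en" one, so duplicates contains "en".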

Top-level structure
The package metadata file uses the <pkgmetadata/> top-level element. This element can contain, in any order:
 * zero or more <longdescription/> elements containing package descriptions in different languages, possibly restricted to specific package versions (at most one for each combination of language and package version). The package description is formed of multi-line text, optionally interspersed with <cat/> and <pkg/> elements.
 * zero or more <maintainer/> elements listing package maintainers, optionally restricted to specific package versions. The maintainer format is detailed in maintainer descriptions.
 * at most one <natural-name/> element describing the package's natural name, as text.
 * zero or more <slots/> elements containing slot descriptions in different languages (at most one for each language), as detailed in slot descriptions.
 * zero or more <use/> elements containing USE flag descriptions in different languages (at most one for each language), as detailed in USE flag descriptions.
 * at most one <upstream/> element providing information on upstream of the package, as detailed in upstream descriptions.
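The top-level rules above can be checked mechanically. The following sketch is only an illustration: the helper name check_top_level and the <bogus/> element are hypothetical, and the element names in ALLOWED (in particular natural-name) reflect this editor's reading of the list above and of the historical DTD, so treat them as assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed element names for the top-level children of <pkgmetadata/>.
ALLOWED = {"longdescription", "maintainer", "natural-name", "slots", "use", "upstream"}
AT_MOST_ONE = {"natural-name", "upstream"}


def check_top_level(root):
    """Return a list of problems found among the top-level children."""
    problems = []
    if root.tag != "pkgmetadata":
        problems.append(f"unexpected root element <{root.tag}>")
    counts = Counter(child.tag for child in root)
    # No elements outside the scope of the specification are allowed.
    for tag in sorted(set(counts) - ALLOWED):
        problems.append(f"unknown element <{tag}>")
    for tag in sorted(AT_MOST_ONE):
        if counts[tag] > 1:
            problems.append(f"more than one <{tag}>")
    return problems


# Deliberately invalid sample: a duplicate <upstream/> and an unknown element.
sample = ET.fromstring("""
<pkgmetadata>
  <maintainer type="person">
    <email>larry@example.org</email>
  </maintainer>
  <longdescription>An example package.</longdescription>
  <upstream/>
  <upstream/>
  <bogus>not part of the format</bogus>
</pkgmetadata>
""")
problems = check_top_level(sample)
```

The per-language and per-version uniqueness rules are not covered here; they would need the language default and restriction matching described under common attributes.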

Maintainer descriptions
Each <maintainer/> element describes a single maintainer.

The <maintainer/> element has an obligatory type="" attribute whose value can be either person or project.

The <maintainer/> element contains the following elements, in any order:
 * exactly one <email/> element that contains the maintainer's e-mail address (used as unique identifier),
 * at most one <name/> element that contains the maintainer's human-readable name (real name or nickname),
 * zero or more <description/> elements that explain the role of the maintainer in different languages (at most one <description/> for each language).
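The maintainer rules translate into a short check. This is a sketch under the assumption that the child elements are named email, name and description. The helper name check_maintainer and the sample addresses are invented.

```python
import xml.etree.ElementTree as ET


def check_maintainer(m):
    """Return a list of rule violations for one <maintainer/> element."""
    problems = []
    if m.get("type") not in ("person", "project"):
        problems.append("type must be 'person' or 'project'")
    if len(m.findall("email")) != 1:
        problems.append("exactly one <email/> is required")
    if len(m.findall("name")) > 1:
        problems.append("at most one <name/> is allowed")
    # A missing lang attribute counts as the implicit default "en".
    langs = [d.get("lang", "en") for d in m.findall("description")]
    if len(langs) != len(set(langs)):
        problems.append("at most one <description/> per language")
    return problems


good = ET.fromstring(
    '<maintainer type="person">'
    "<email>larry@example.org</email><name>Larry the Cow</name>"
    "</maintainer>"
)
bad = ET.fromstring("<maintainer><name>Nobody</name></maintainer>")
good_problems = check_maintainer(good)
bad_problems = check_maintainer(bad)
```

The second sample lacks both the type attribute and the e-mail address, so it yields two problems, while the first yields none.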

Slot descriptions
Each <slots/> element describes slots of a package (in a specific language).

The <slots/> element can contain the following elements:
 * zero or more <slot/> elements describing specific ebuild slots, optionally restricted to specific package versions (at most one entry for a combination of slot specification and package version). The <slot/> element contains an obligatory name="" attribute stating the slot to which the description applies, and contains the slot description as text.
 * at most one <subslots/> element describing the role of subslots (all of them) as text.
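A possible way to read slot descriptions is sketched below. The slot names and descriptions are invented, and slot_descriptions is a hypothetical helper. The description text is normalized as plain text data (whitespace runs collapsed, ends stripped).

```python
import xml.etree.ElementTree as ET

# Illustrative <slots/> element; the slot names and texts are made up.
slots = ET.fromstring("""
<slots>
  <slot name="0">Legacy API series.</slot>
  <slot name="2">Current   API
     series.</slot>
  <subslots>Reflect the library soname.</subslots>
</slots>
""")


def slot_descriptions(slots_el):
    """Map each slot name to its normalized description."""
    return {
        s.get("name"): " ".join("".join(s.itertext()).split())
        for s in slots_el.findall("slot")
    }


described = slot_descriptions(slots)
subslots_text = slots.find("subslots").text
```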

USE flag descriptions
Each <use/> element describes USE flags of a package (in a specific language).

The <use/> element can contain the following elements:
 * zero or more <flag/> elements describing specific USE flags, optionally restricted to specific package versions (at most one entry for a combination of USE flag name and package version). The <flag/> element contains an obligatory name="" attribute stating the name of the USE flag to which the description applies, and contains text, optionally interspersed with <cat/> and <pkg/> elements.
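The sketch below shows how flag descriptions with interspersed package references might be read. The flag names, package names and the helper flag_text are illustrative only. Note that the restriction value escapes the > character as &gt; inside the XML attribute.

```python
import xml.etree.ElementTree as ET

# Illustrative <use/> element; flags and packages are invented.
use = ET.fromstring("""
<use>
  <flag name="ssl">Enable TLS support via <pkg>dev-libs/openssl</pkg></flag>
  <flag name="gui" restrict="&gt;=app-misc/example-2">Build the graphical
    front end</flag>
</use>
""")


def flag_text(flag):
    """Flatten a flag description, keeping the text of inline elements."""
    return " ".join("".join(flag.itertext()).split())


flags = {f.get("name"): flag_text(f) for f in use.findall("flag")}
referenced = [p.text for p in use.iter("pkg")]
gui_restrict = use.findall("flag")[1].get("restrict")
```

A tool that renders descriptions could use the referenced list to turn package names into links, while flags holds the flattened text.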

Upstream descriptions
The <upstream/> element provides information on the upstream of a package. It contains the following elements:
 * zero or more <maintainer/> elements listing the package's upstream maintainers, as described in upstream maintainer descriptions,
 * zero or more <changelog/> elements containing a URL to an on-line copy of the upstream changelog,
 * zero or more <doc/> elements containing URLs to on-line copies of upstream documentation in different languages,
 * zero or more <bugs-to/> elements containing the upstream bug reporting URL, which can optionally be a mailto: URL,
 * zero or more <remote-id/> elements listing package identities on package identification trackers. Each of those elements has an obligatory type="" attribute that matches a pre-defined name of a package identification tracker, and a value that is an identifier specific to the tracker. The list of available trackers and their specific identifiers are outside the scope of this specification.

Upstream maintainer descriptions
Each <maintainer/> element inside <upstream/> describes a single upstream maintainer.

The <maintainer/> element has an optional status="" attribute whose value can be either active or inactive. If not specified, an implicit unknown value is assumed.

The <maintainer/> element contains the following elements, in any order:
 * at most one <email/> element that contains the maintainer's e-mail address,
 * at most one <name/> element that contains the maintainer's human-readable name (real name or nickname).
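Putting the upstream rules together, a sample <upstream/> element and two simple queries could look as follows. All URLs, names and addresses are placeholders. The tracker name github is used purely as an example, since the list of trackers is outside the scope of this specification.

```python
import xml.etree.ElementTree as ET

# Illustrative <upstream/> element; every value below is a placeholder.
upstream = ET.fromstring("""
<upstream>
  <maintainer status="active">
    <email>dev@example.org</email>
    <name>Example Developer</name>
  </maintainer>
  <maintainer>
    <name>Former Developer</name>
  </maintainer>
  <changelog>https://example.org/changelog.txt</changelog>
  <doc lang="en">https://example.org/docs/</doc>
  <bugs-to>mailto:bugs@example.org</bugs-to>
  <remote-id type="github">example/example</remote-id>
</upstream>
""")

# Each remote-id is a (tracker name, tracker-specific identifier) pair.
remote_ids = [(r.get("type"), r.text) for r in upstream.findall("remote-id")]
# A missing status attribute means the implicit value "unknown".
statuses = [m.get("status", "unknown") for m in upstream.findall("maintainer")]
```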

Information sources
The basic source of information on the current metadata.xml format was the metadata.dtd as of 2016-03-02. Whenever the DTD was unclear, the appropriate GLEPs were consulted in order to deduce the original intent. Whenever the GLEPs were unclear or no GLEP covered an element, the original mailing list discussions were consulted.

Removed elements
Compared to the original DTD, the following elements were removed (both in the spec and in the updated DTD file):
 * the global-scope <changelog/> element was removed. It dates back to the original metadata.xml proposal but was never implemented; instead, plain text ChangeLogs were used. Furthermore, GLEP 46 introduced <changelog/> inside <upstream/> with a different type, which collided with the global declaration due to DTD limitations.
 * the top-level <packages/> variant was removed. It was never used and it was unclear what its use would be. In any case, removing it made the DTD simpler.

<pkg/> value format
A debate on the valid format of <pkg/> element values preceded the writing of this GLEP. The DTD did not restrict the value format; it only suggested that the element is used for cross-linking. Later, GLEP 56 redefined its value to a valid CP or CPV. The practical uses did not include the latter case; however, it was common to include EAPI 1 slot specifiers or even EAPI 5 slot operators following the qualified package names.

After finding Doug Goldstein's blog post on the introduction of <pkg/> elements, it turned out that the original intent was to allow cross-linking/referencing from packages.gentoo.org. Since the latter uses qualified package names as identifiers, it was decided to restrict <pkg/> elements to reference those. For entries that include slot specifiers, it is recommended to move the slot specifiers out of the <pkg/> element.
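A rough check for this restriction, together with a helper that separates a trailing slot specifier so it can be moved outside the element, might look like this. The character classes are a simplified approximation of the PMS naming rules as this editor understands them, and both function names are hypothetical. In particular, the sketch does not reject package names that end in a hyphen followed by a version-like suffix.

```python
import re

# Simplified approximation of category and package name syntax (assumption).
CATEGORY = r"[A-Za-z0-9_][A-Za-z0-9+_.-]*"
PACKAGE = r"[A-Za-z0-9_][A-Za-z0-9+_-]*"
QUALIFIED = re.compile(rf"{CATEGORY}/{PACKAGE}")


def is_qualified_name(value):
    """True if value looks like category/package with nothing appended."""
    return QUALIFIED.fullmatch(value) is not None


def split_slot_suffix(value):
    """Split 'cat/pkg:slot' into ('cat/pkg', ':slot'); suffix may be empty."""
    name, sep, rest = value.partition(":")
    return name, sep + rest


plain_ok = is_qualified_name("dev-libs/openssl")
slotted_ok = is_qualified_name("dev-libs/openssl:0=")
name, suffix = split_slot_suffix("dev-libs/openssl:0=")
```

An entry written with a slot operator would thus keep dev-libs/openssl inside the element and place the :0= part in the surrounding text.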

Language identifiers
Originally, the DTD used an implicit default value of C. However, this value was not in line with the real language specifiers found in metadata.xml. The latter usually took the form of ISO 639-1 language codes, which do not form valid (complete) locale identifiers, while the former is not a valid language identifier in any of the considered standards. Furthermore, since en was commonly used to identify English in metadata.xml files, and no tools relied on the implicit default defined in the DTD, it was decided to change the implicit default to en.

Package restrictions
Originally, the DTD described the restrict="" attribute as: "the format of this attribute is equal to the format of DEPEND lines in ebuilds". This specification is based upon this definition. However, for practical reasons it adds three clarifications to it:
 * only package dependency specifications are allowed (i.e. no USE-conditionals or multiple dependency specifications),
 * only EAPI=0 dependency specifications are allowed, since metadata.xml provides no EAPI identification mechanism and it predates EAPI,
 * only dependencies referencing the same package are allowed.
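The three clarifications can be approximated with a regular expression. The sketch below is an assumption-laden illustration rather than a full dependency parser: the version pattern follows the PMS version syntax as this editor understands it, blockers are not considered, and is_valid_restrict is a hypothetical helper that takes the package's own qualified name as its second argument.

```python
import re

# Version syntax approximation (assumption): 1.2.3a_rc1-r2 and similar.
VERSION = r"[0-9]+(\.[0-9]+)*[a-z]?(_(alpha|beta|pre|rc|p)[0-9]*)*(-r[0-9]+)?"


def is_valid_restrict(value, qualified_name):
    """Check a restrict value against the three clarifications (sketch)."""
    # A bare reference to the same package is accepted.
    if value == qualified_name:
        return True
    name = re.escape(qualified_name)
    # One operator, the same package, one version, optional trailing '*'.
    # Slot or USE dependencies (later EAPIs) and USE-conditionals do not match.
    pattern = rf"(<=|>=|<|>|~|=){name}-{VERSION}(\*)?"
    if re.fullmatch(pattern, value) is None:
        return False
    # The trailing '*' is only accepted together with the '=' operator.
    return not value.endswith("*") or value.startswith("=")


pkg = "dev-libs/foo"
results = {
    v: is_valid_restrict(v, pkg)
    for v in (
        ">=dev-libs/foo-1.2",
        "=dev-libs/foo-1*",
        ">=dev-libs/foo-1.2:2",
        ">=dev-libs/bar-1.2",
        "ssl? ( >=dev-libs/foo-1 )",
    )
}
```

Only the first two values are accepted. The third uses a slot dependency, the fourth references a different package, and the fifth is a USE-conditional.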

Furthermore, the DTD added a special case for the * value, which applied only if no other tags applied. This behavior was not used at all and was somewhat confusing (compared to the common use of * to imply matching everything), so it was removed.

Copyright
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.