Gesmes/TS-XML Proposal

Reference	ECB/SIS/2003/063
	http://groups.yahoo.com/group/cbXML/files/proposal/proposal.html
	https://forum.europa.eu.int/Public/irc/dsis/eeg6/library?l=/xml_uml/gesmes_cb_xml/proposal_html/_EN_1.0_&a=d
	https://www-vox.imf.org/eRoom/SDMX/GESMESTSWorkingGroupGMWG/Task%204.6~2FXML%20Proposal%20for%20GESMES~2FCB/XML/proposal.html
Date	1 October 2003
Status	Draft 8.0
Author	Bernhard Bodenstorfer

Content

Scope and Terms of Reference
Aims
1. User requirements
2. Metrics
XML Terminology Primer
Gesmes/TS Terminology Primer
1. Data model
2. EDIFACT syntax
Implementation Model
Syntax implementation
Process model
1. Parsing and generating Gesmes/TS objects
  1. Pushdown model for processing Cubes
  2. Overriding
2. Transactions
Alternatives
Profiles
1. EDIFACT
  1. Additional information
  2. Compatibility restrictions
2. SDMX-ML
Recommended URI structures
1. Directories
2. Files
Comparison with other formats
Open Issues and Questions
References
Change History
1. Draft 8.0
2. Draft 7.0

1. Scope and Terms of Reference

1.1. Purpose

This document proposes a compact implementation of the Gesmes/TS data model in XML. It is designed for efficient validation using DTD, XML Schema, or XSLT techniques.

The proposal reflects the current state of discussion in the group cbXML. It should guide the further debate and is envisaged to evolve to become the Gesmes/TS-XML format specification. Still, it has no formal status.

1.2. Copyright

Redistribution and use of this document are permitted provided that the above copyright notice, this list of conditions, and the following disclaimer are retained.

1.3. Disclaimer

This document is provided as is and without any warranties. In no event can the author or an institution he is affiliated with be made responsible for any kind of damages connected to the use of this document, however such may be caused.

This document contains links to destinations which are not under the control of the author. Neither the author nor any institution that he is affiliated with can take responsibility for the content of any resources linked to. It is the plight of the user of this document to make a deliberate and appropriate selection when following links and be aware of the risk that the content linked to does not live up to the expectations.

This document may contain registered words and trademarks without further mentioning. This fact cannot be construed as a claim that these expressions are free of rights of third parties.

You must not use this document and must immediately delete it unless you accept this disclaimer.

2. Aims

2.1. User requirements

User requirements have been collected using a Questionnaire, which was made available to an open range of users of Gesmes/TS. All responses were collected in the Stories Database. Knowing the audience, it is not surprising that most emphasis was given to requirements which help leveraging the benefits of Gesmes/TS-XML in the context of the existing business process, which can be summarised as the batch data exchange model. The aims derived from these requirements are:

Allow translation from and to Gesmes/TS-EDIFACT.
Avoid an explosion of file sizes.
Allow easy access to envelope information also to non-Gesmes/TS applications.

EEG6/WG1 expressed their interest in the development of Gesmes/TS-XML having in mind an XML version of the Gesmes standard. This calls for:

Design for compatibility with possible XML version of "plain" Gesmes.

XML nowadays is a very popular data exchange format (the second most prominent next to plain text maybe) and is supported by a diverse set of tools.

Fit existing XML tools where appropriate (e.g. XML-Schema validation).

XML is widely renowned as the data format for the Web. Requirements to be prepared for the challenges from this side are:

Allow easy processing by XSLT.
Exploit XML mechanisms for cross-reference (to structural definitions and style sheets).
Support national languages and character sets.

XML has earned the reputation of being readable for humans, at least better in this respect than e.g. EDIFACT. Once a document is readable and can be understood, users may even be tempted to modify the contents here and there in a text editor. Note that with Gesmes/TS-EDIFACT, one has among other things to be aware of the segment count, matching time range and observation count in ARR segments, and splitting strings into components of lengths not exceeding 70 characters. This motivates the requirements to:

Be human readable.
Allow easy modifications.
Avoid redundancies and unnecessary dependencies.

Finally, many people are deterred from Gesmes/TS because it is infamous for its apparent complexity; therefore:

Be simple.
Provide readable documentation.

2.2. Metrics

To judge proposals, the following metrics have been proposed:

File Size
File Complexity
Processing Time
Processing Complexity
Memory consumption while processing
Compatibility with Gesmes/TS-EDIFACT
Flexibility for future evolution
Ease of processing with XSLT
Level of validation possible using DTD and XML-Schema
Human readability: Can business people understand Gesmes/TS-XML?
Human modifiability: Can business people easily modify existing Gesmes/TS-XML files without risking to break the rules?
Can Gesmes/TS-EDIFACT native speakers understand Gesmes/TS-XML?
Can XML native speakers understand Gesmes/TS-XML?
Comparison with ebXML
Comparison with MDDL
Comparison with XBRL
Comparison with ABSXML
Comparison with CWM
Compatibility with Dublin Core
Compatibility with NAWWE

Other known resources dealing with XML for meta data or multi-dimensional data:

3. XML Terminology Primer

XML, the eXtensible Markup Language, is a recommendation of the World Wide Web Consortium (W3C), where abundant material about XML and related technologies is held available. In this section you find a very brief description of some basic terminology used in the context of XML.

A document is a stream of characters, typically the text contained in a file, but there are transient documents, too, e.g. a message between networked applications.

A tag is a piece of markup in a document. In XML, all markup is enclosed in angle brackets "<" and ">". There are other kinds of markup than tags, too, but those are of less interest here.

An element is denoted by a pair of tags, one start tag and one end tag: <prepared>...</prepared>. The token immediately following the opening "<" is the element name (here "prepared"). The text between opening and closing tags (here symbolised by three dots) is the element content. If an element has no content (i.e. is empty), an abbreviation using only one tag can be written in place of the two tags: <test/>.

The document element is the outermost element in a document. E.g. XHTML defines the document element html, which corresponds to the well-known tags <html> and </html> at the beginning and end of an HTML-document, respectively.

An attribute is a name-value pair provided inside an opening element tag: <Envelope id="IREF000001">...</Envelope>. The attribute value must be enclosed in double or single quotes. Empty elements may still have attributes.

To avoid ambiguity, some syntax characters must be escaped when they appear as normal data. This is done using predefined entities which represent them.

A namespace collects a set of elements and attributes. It is represented by a URI, which can be, but not necessarily is a URL. The important thing about the URI is its global uniqueness and thus it is a convention for an organisation to use a URL in the web-space it owns. Within a document, a namespace is assigned to a namespace prefix by a namespace declaration, which comes in a form syntactically similar to an attribute: xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01". Once declared, a namespace prefix can label element and attribute names: <gesmes:Envelope>...</gesmes:Envelope>. A namespace declared without prefix (e.g. xmlns="http://www.ecb.int/vocabulary/2002-08-03/microbop") becomes the default namespace for the respective element and its content. It applies recursively to elements (but not attributes!) which carry no namespace prefix as far as the default is not overridden.

The XSLT (Extensible Stylesheet Language Translation) provides a standard mechanism to translate XML documents from one format to another. The target format need not be XML, but often it is. Particularly popular is the translation to XHTML, which allows viewing XML documents in an HTML browser. The Gesmes/TS-XML Proposal was written in XML and an XSLT stylesheet allows viewing it in a browser. The table of contents is generated only by the stylesheet.

An XML document can contain a document type declaration, which specifies the document's grammar. This grammar is referred to as the DTD (Document Type Definition). A DTD can restrict allowed nesting relationships between elements and the available attributes for an element. It can specify types for element content and attribute values. However, DTDs can only express a very limited set of built-in primitive types.

An XML Schema, like a DTD, specifies the structure of an XML document. Actually, XML Schema is a modern alternative to DTD and overcomes many of DTD's weaknesses. Schemas are themselves XML documents and can therefore be processed using standard XML tools. In contrast to DTDs, Schemas can handle namespaces. They provide also more built-in data types, e.g. dateTime.

A document is well-formed if (among other requirements) every opening tag is closed by a suitable closing tag and the nesting order is maintained: Last opened, first closed.

A document is valid if it is structured according to the DTD specified in its DOCTYPE declaration at the start. Similarly, a document can be (Schema-)valid with respect to a relevant XML-Schema (e.g. indicated by the schemaLocation mechanism).

4. Gesmes/TS Terminology Primer

Gesmes, the Generic Statistical Message, is an EDIFACT standard format maintained by the UN/ECE. The format Gesmes/TS was defined as a profile of Gesmes. This means that Gesmes/TS data is also valid Gesmes but does not use all features available in Gesmes. Therefore, processing Gesmes/TS is significantly simpler than processing plain Gesmes. Additionally, Gesmes/TS uses a well-defined data model on top of the bare message syntax. It is this data model which is most interesting in the context of Gesmes/TS-XML since it describes the objects that Gesmes/TS deals with and their relationships.

In this section you find a very brief description of some basic terminology used in the context of Gesmes/TS, particularly focusing on the data model.

4.1. Data model

The analysis model of Gesmes/TS was formulated in the UML (Unified Modelling Language). The latest available such model is called ResultDraft5. That document is quite technical, however. The core of the model is also presented in a pattern-oriented fashion at http://www.unece.org/stats/documents/ces/sem.47/22.e.pdf. This document is easier to read and close in its presentation to this Gesmes/TS-XML proposal.

One important paradigm of Gesmes/TS is the split between structure and data. Structure tells about the schematic organisation of data; data specifies the actual values.

4.1.1. Operational level: data

The fundamental mechanism behind Gesmes/TS data can be derived from the minimalist idea of name-value-pairs that a name identifies a value. Only, since Gesmes/TS is about multi-dimensional data, it is several names together (called Dimension Values) which identify one or more values (called Characteristic Values). (For Dimension and Characteristic also see Knowledge level: structure.) The following diagram illustrates these relationships: Three Dimension Values on the left together identify one Characteristic Value for each Characteristic on the far right.

Figure 1: Fundamental Gesmes/TS data model

The data model expresses this simple but fundamental relationship in three steps. The Dimension Values together form a Key. This Key then identifies a Cube. Finally, the Cube carries the Characteristic Values.

Figure 2: Gesmes/TS data model using Key and Cube

This division into steps structures the relationships more orderly because the identification relationships are now centrally handled between Key and Cube. Nevertheless, Key and Cube remain imaginary notions only introduced to ease thought, communication, and possibly processing. Be reminded of the well-known observation from mathematics:

Jacques Hadamard: The shortest path between two truths in the real domain passes through the complex domain.

Gesmes/TS defines four classes of Cubes:

An Observation is a Cube with a Key with Dimension Values specified for all available Dimensions.
A Time Series is a Cube with a Key with Dimension Values specified for all available Dimensions except Time. No Dimension Value for Time being specified in a certain sense means Time is wildcarded.
A Sibling Group is a Cube with a Key with Dimension Values specified for all available Dimensions except Time and Frequency.
A Data Set is a Cube with an empty Key.

In a certain sense, one can say that a Time Series includes a number of Observations, a Sibling Group includes a number of Time Series, and a Data Set includes a number of Sibling Groups. Summing up, there can be an inclusion relationship between Cubes.

Characteristics can take values only for one class of Cubes each. For instance, the "Observation Value" (a Characteristic usually called OBS_VALUE) can take values only for Observations while the TITLE may only apply to Sibling Groups. This restriction is expressed as the attachment level of a Characteristic.

Data is grouped into Data Sets. The idea is that a Key identifies a Cube from within a Data Set, so a Data Set provides a mapping from Keys to Cubes. Physically, a Data Set could be a Gesmes/TS-EDIFACT file or a database, even a database view. The structure of a Data Set is determined by a Key Family. This Key Family specifies the available Dimensions, Characteristics, and admissible Values for these.

4.1.2. Knowledge level: structure

Structural information is expressed in terms of three kinds of Structural Definitions:

Code Lists define a list of strings, the Codes, which then can be used as the only admissible values for data. Each Code can carry a textual description.
Statistical Concepts document a notion. They can be used as Dimensions or Characteristics. Dimensions are used to identify objects upon which Characteristics then take values. In a certain sense, Statistical Concepts describe input and output channels in the context of multi-dimensional data. Gesmes/TS-EDIFACT defines two subtypes of Characteristics: Array Cells and Attributes. In Gesmes/TS-XML there is no such difference and all Characteristics are referred to as "attributes", (which, however, means something different than XML attributes, because the latter are also used for dimensions).
Key Families determine the Dimensions and Characteristics available within one multi-dimensional data aggregate ("Cube", see Operational level: data). When assigning Statistical Concepts as Dimensions or Characteristics, a Key Family can also specify restrictions on the allowed values for them in terms of Code Lists and Text Formats. A Text Format restricts the range of characters and string lengths.

4.1.3. Transport level: envelope

Gesmes/TS originally was conceived as a message format and thus specifies envelope information like about sender, receiver, subject, preparation time stamp. A Gesmes/TS-EDIFACT data therefore always comes in an Interchange carrying basic such information and wrapping one or more Messages carrying more such information but also Structural Definitions or Cubes depending on the message type. A significant part of the data in the envelope is just textual. But there are also structured data types used to transport sender contact information.

4.2. EDIFACT syntax

Like every EDIFACT message, Gesmes/TS-EDIFACT comes as a sequence of segments. Each segment starts with a three letter tag and continues until a closing apostrophe. The tag defines the syntax and basic meaning of the segment. In Gesmes/TS-EDIFACT, the most important tag is ARR. An ARR-segment could look like

ARR++M:TT:G:200007200009:710:13.6:A+14.8:A+15.1:A'

In this example, the tag is followed by two +-signs, which function as separators. The sequence M:TT:G codes the (Time Series-) Key, i.e. the Dimension Values, separated by colons. The sub-string 200007200009:710 specifies the Time as a range of months. Then follow tuples, in which again the colon is used as a separator. There is one tuple for each Observation: 13.6:A, 14.8:A, and 15.1:A. The portion before the colon is the Characteristic Value for the Characteristic usually called OBS_VALUE. The string A is the Characteristic Value for the Characteristic usually called OBS_STATUS. The tuples could also contain optional confidentiality and pre-break values. These four Characteristics (usually called OBS_VALUE, OBS_STATUS, OBS_CONF, OBS_PRE_BREAK) are referred to as Array Cells because the values for them are transported in ARR-segments (this is in the array section of a Message).

The values for all other Characteristics are transported in the footnote section of a Message, which consists of segments following the FNS-segment. There can only be one FNS segment per Message. The footnote section contains further ARR-segments, which carry the Keys, IDE-segments which identify the Characteristics, and either CDV or FTX segments which carry the Characteristic Values for these Characteristics:

FNS+Attributes:10'
...
ARR+3+M:TT:G'
IDE+Z10+UNIT'
CDV+KILO'
...

These lines would mean that the Time Series identified by the Key M.TT.G has a Characteristic Value KILO for the Characteristic UNIT; in short: "M.TT.G's UNIT is KILO." If you remember the Fundamental Gesmes/TS data model, then you see that this syntax follows quite closely the pattern indicated: Specify Dimension Values, Characteristic, and the Characteristic Value for it. Only the Dimensions are not explicitly mentioned. Gesmes/TS-EDIFACT uses an ordered set of Dimensions and an ordered Key. The orders match and thus it is not necessary to refer to the Dimensions explicitly when a Key like M.TT.G is written. The reference is implicit by the position in the Key. In the same fashion, Array Cells are ordered so that there the Characteristics need not be made explicit in the array section, either.

More information and links about Gesmes/TS-EDIFACT can be found in the Gesmes/TS User Guide and at http://forum.europa.eu.int/irc/dsis/eeg6/info/data/bop/gesmescb.htm.

5. Implementation Model

A reader less acquainted with the UML or less interested in modelling issues may choose to skip this section and come back after reading the Section Syntax implementation.

The implementation model of Gesmes/TS-XML is based on the analysis model as laid down in ResultDraft5.

While the analysis model of Gesmes/TS reflects human perception of Gesmes/TS and describes the objects of Gesmes/TS independent of the syntax, the implementation model of Gesmes/TS-XML defines the classes used in the XML syntax implementation. Since performance and ease of processing are issues constraining the implementation model, the two models differ in a few points. Most differences simplify the implementation model; for instance, the implementation model makes no explicit distinction between the subclasses of Cube: Hence one class to replace four. Also see Differences from the analysis model.

5.1. Package overview

The implementation model of Gesmes/TS-XML consists of five packages: data package, structure package, activity package, envelope package, and extension package. The following diagram shows their dependencies by dashed arrows. E.g. the data package relies on the structure package.

Figure 3: Overview of all packages

The data package provides the class Cube, which is at the heart of the mechanism for transport of multi-dimensional data. The structure package furnishes the data package with the necessary structural (meta) information. The activity package captures some information from the Process model. The envelope package contains the class Envelope for the XML document element, which can be wrapped around the content to be transferred. The extension package defines a simple mechanism to furnish model elements in the other packages with additional information needed in particular environments.

The following sections describe the packages of the implementation model of Gesmes/TS-XML in detail.

5.2. data package

The data package defines the skeleton necessary for the exchange of Cubes: The classes Cube and Value. The small size of the data package reflects the desire to allow an easy implementation of Gesmes/TS-XML for all parties interested in the exchange of multi-dimensional data.

Figure 4: The data package

The essence of the package is the mechanism that Cubes carry Values for Concepts. A Key Family determines which Concepts are available. This Key Family also specifies which of these Concepts are used as dimensions and which as attributes. The combination of Values for the dimensions (the key) then identifies a Cube. The Values for attributes provide the information to this Cube.

XML has nesting as its foremost built-in association mechanism. This is modelled using the nesting association. Nesting is interpreted to imply inheritance of Values from parent to child. For more on the treatment of nesting see Pushdown model for processing Cubes.

Though in principle, the data package is simple, certain constraints are applicable. The Cube's data layout is defined by a Key Family which specifies all available Concepts. In addition to that, not all Cubes can take Values for all Concepts. When a Concept is assigned as an attribute, then it may have a Value for a Cube only if this Cube has (or inherits) Values exactly for the dimensions that the Concept depends on. This is meant by the Cube's type being "compatible" to the attribute's dimension dependence. For those familiar with Gesmes/TS-EDIFACT: The dimension dependence codes an "attachment level". The dimensions for which a Cube has Values determine the "type" of Cube (e.g. Time Series or Observation).

The Gesmes/TS model of multi-dimensional data is compatible with, though not identical to, the multi-dimensional package of OMG's Common Warehouse Metamodel (CWM). What in CWM are the values of the dimension "measure", corresponds to the Concepts used as attributes in Gesmes/TS-XML.

5.3. structure package

Gesmes/TS owes its flexibility to its capacity to transport not only data but also structural definitions; these define the data layout. The structure package assembles the classes which provide this functionality: Code List, Concept, Key Family, and a couple of others used by these classes.

It is common in Gesmes/TS to define the basic properties of a Concept in one place, and later, when a Key Family assigns the Concept, refer to this prototype and fill in any necessary additional information, particularly about allowed values. In general, prototyping allows building Structural Definitions on the basis of previously defined ones of the same class. Also references are implemented using the prototyping relationship: Then simply no information is added to the referenced prototype's.

Figure 5: Types of Structural Definitions

From the user perspective, the entry point into the structure package is the class Key Family. It tells an application about the Concepts which are available for building data Cubes and whether this is as dimensions or as attributes.

Figure 6: A Key Family assigns Concepts

Concepts can play two roles. When selected as dimensions, they participate in the mechanism to identify a Cube. When selected as attributes, they help to furnish Cubes with descriptive Values.

An attribute Value may be logically determined by only a subset of all available dimensions. For instance, a time series title would typically not vary along the time dimension. The class Level describes this type of restriction. Each Level in a Key Family selects a bunch of dimensions and a bunch of attributes; and this means that within a Data Set, to have Values for the selected dimensions is necessary and sufficient to uniquely address the Value of each of the selected attributes.

The structure package also provides a simple framework to validate and interpret Values. To this end, a Concept can choose a Value Set from a multitude of possibilities.

Figure 7: Restricting Values

A Real Number Set restricts the allowed value range to double precision floating point numbers.

A Time Point Set restricts to dateTime values or to truncated such values where truncation may only proceed from the least significant end, i.e. the seconds, towards the years. The interpretation of truncated values is by filling in the "default value". E.g. 2002-06-15 would be interpreted as 2002-06-05T00:00:00 local time. (If it is not sufficiently clear which local time to use in a distributed system, the use of default values for the time zone should be avoided.) In this way date values are allowed. But note that plain time values do not permit a canonical extension to dateTime values and so cannot be used.

To support time series with a regular time grid, the Time Point Set can determine its periodicity from a Concept which has a valuation restriction to a Time Duration Set. The duration Values for this Concept then will determine the intervals between the available points in time. As you see, time intervals are determined by the values of two Concepts: The first to determine the starting time, the second for the duration.

A Time Duration Set restricts to duration values.

A Text Format restricts the set of allowed characters and string length.

5.4. activity package

The activity package tries to camouflage the absence of an agreed-upon overall process model for Gesmes/TS. Therefore, it just provides the minimum which is necessary to fulfill the requirements of Gesmes/TS-EDIFACT but after that waits for future elaboration.

Figure 8: The activity package

The Activity class has two properties. The data set identifier identifies the relevant data resource for a Party receiving data, e.g. the target database. The action then tells about the kind of operation which should be performed on the target. The available action commands so far are "update" and "delete".

The Usage Status determines whether a Value for an attribute must be present in a stable database if the Cube carries any Values at all (meaning it is present).

5.5. envelope package

XML documents need a single document element. Moreover, current Gesmes/TS-EDIFACT is a message format and thus defines information relevant for this type of data exchange. In principle, Gesmes/TS-XML is designed to be useful also outside this traditional pattern. It should be useable also as data or structure information alone without wrapping or with a wrapping coming from a different tradition (e.g. SOAP) and serving different needs.

However, one requirement for Gesmes/TS-XML is to provide all features known from Gesmes/TS-EDIFACT so as to be useful already in the established business processes. Therefore, together with the core Gesmes/TS-XML packages data and structure, a basic enveloping functionality is proposed, too. This is the purpose of the envelope package.

The central class in the envelope package is the class Envelope, which can be used to represent the document element. An Envelope can be wrapped around data and structure objects and even around nested Envelopes. All of its children are optional.

Figure 9: The envelope package

An Envelope may specify a sending Party and a number of receiving Parties together with Contact information for these.

5.6. extension package

As already mentioned, one requirement for Gesmes/TS-XML was to provide all features known from Gesmes/TS-EDIFACT. However, a couple of facets only related to EDIFACT have been dropped already in the analysis model of Gesmes/TS, most importantly the distinction between array cell and footnote section attribute.

To offer a possibility to also transport such information, though not used in XML, a simple extension mechanism is provided by the extension package. It contains a single class Extension, which can carry any additional information for particular environments. The principle is similar to the concept of TaggedValues for ModelElements in the CWM, but values may be more than just strings.

Figure 10: The extension package

The class Extension can carry an arbitrary list of objects of application specific type. Gesmes/TS applications may ignore these application-specific objects.

5.7. Differences from the analysis model

To aid experts who are familiar with the analysis model, this section tries to list all differences between the analysis model of Gesmes/TS and the implementation model of Gesmes/TS-XML. Most changes are merges of classes with similar structure. These merges reduce the number of used classes and hence the complexity of Gesmes/TS-XML applications. Also note that the analysis model formulates the constraints in quite some detail. The implementation model for brevity does not attempt to do so.

The classes Cube, Key and their subclasses have been merged into the single class Cube. Mapping between the analysis model and the implementation model is possible:
- The semantics of a Key for a Cube is captured by the set of dimension Values for the Cube.
- The type of Key for a Cube is determined by the set of dimensions for which the Cube has Values.
- The type of Cube is determined by the set of dimensions for which the Cube has Values.
The class Key Family Component and its subclasses have been merged with the class Statistical Concept into the class Concept. Mapping between the analysis model and the implementation model is possible:
- The former assignment association is captured by prototyping.
- Being a Dimension or Characteristic is decided by the Level objects in a Key Family.
- The distinction between the Dimension subclasses is semantically possible via the valuation restriction.
- The distinction between the Characteristic subclasses is purely semantic and thus should be made by the interpreting application, if required at all. In the context of Gesmes/TS-EDIFACT, the distinction could also be guided by the valuation restriction to Real Number Set. Alternatively, an Extension may be considered.
The classes Dimension Value and Characteristic Value have been merged into the single class Value. Their semantic distinction is possible via the Concept used and its role as dimension or attribute.
Attachment levels are explicitly defined by the Level objects, which refer to the determining dimensions.
The clocking association has been replaced by periodicity lookup which associates a Time Point Set (instead of a Time Concept) with the Concept used for base period duration.
A Key Family has a URI associated with it.
A Contact has a URI associated with it in place of an email-address. An e-mail address can be written there with the prefix "mailto:".
The class Value Set has new subclasses Time Duration Set and Real Number Set.
The class Period Set has been renamed to Time Point Set.
The property mnemonic of the class Structural Definition has been renamed to identifier.
The properties type and isLimit of the class Text Format have been renamed to character type and variable length, respectively.
A couple of classes from the administrative part of Gesmes/TS have not been included in the implementation model: ETS, Party, Data Exchange Party, Maintenance Agency, Data Exchange Context. The reason is that these classes mean physical real-world objects and not message content. Note that the class Party in the envelope package is different from the class Party in the analysis model in this sense.
The activity and extension packages are new and the envelope package has been re-engineered.

6. Syntax implementation

To specify the syntactic representation of the objects of Gesmes/TS-XML, it suffices to give a syntactic expression for the classes, their properties and associations in the Implementation model.

The namespace URI for all Gesmes/TS-XML elements is http://www.gesmes.org/xml/2002-08-01 unless something else is stated (which is particularly the case for the Key Family specific namespaces for Cubes). All attributes except xml:lang are defined in the default namespace, i.e. without namespace prefix.

The naming convention is that all elements (and attributes anyway) which represent simple types (void, id, string, boolean, number, date) have lowercase names. The other elements have names with uppercase first letter and contain element content. There are no elements defined with mixed content. Also the Extension element (see extension implementation) should carry only element content, though in theory it could carry any.

An exception to the uppercase/lowercase naming convention may be the Value attributes since their names are derived from Concept identifiers. In Gesmes/TS-EDIFACT, these identifiers are traditionally capitalised.

Another exception is the value-of element, which contains a reference to a Concept but still is written in lower case because the semantics are that it evaluates to a string derived from data.

6.1. data implementation

6.1.1. Value

A Value is represented by an XML attribute which holds its value as a string. The attribute's name is the Concept identifier of the evaluating Concept. The attribute's namespace is the void default.

6.1.1.1. Example

Values for the Concepts with identifiers "TIME_PERIOD" and "OBS_VALUE" could be represented by the following attributes:

TIME_PERIOD="2002-07-26" OBS_VALUE="72.7"

6.1.2. Cube

A Cube is represented by an element with an arbitrary name from a namespace which fulfills the Cube contract. It is possible to use "Cube" as the only element name. However, see Section SDMX-ML for guidance with naming. The Cube contract postulates that nested elements except Extension from the extension package are Cubes, their attributes hold Values, and Values are semantically inherited down the nesting hierarchy. All Cube elements should be based on a single base type provided by the Gesmes/TS-XML namespace.

The structure selection association is implemented via the Cube element's namespace. This namespace is bound to the structuring Key Family's URI. It is recommended to set the default namespace to this in the outermost Cube element.

Cube nesting parallels the element nesting in XML. The Values associated with a Cube are represented as XML attributes to the Cube element.

6.1.2.1. Example

The following shows some data concerning a time series which would be called M.TT.G in Gesmes/TS-EDIFACT. The outermost DataSet binds the default namespace to http://www.ecb.int/vocabluary/2002-08-03/microbop, which is the URI for ECB's (imagined) Key Family "microbop".

<DataSet xmlns="http://www.ecb.int/vocabluary/2002-08-03/microbop" OBS_STATUS="A">
  <Series FREQ="M" REF_AREA="TT" ITEM="G" TIME_FORMAT="P1M" BREAKS="Jan 2000 Redefinition of time">
    <Obs TIME_PERIOD="2000-07" OBS_VALUE="1348"/>
    <Obs TIME_PERIOD="2000-08" OBS_VALUE="1259"/>
  </Series>
  <Series FREQ="M" REF_AREA="TT" ITEM="S" BTIME_FORMAT="P1M" BREAKS="Jan 2000 Redefinition of time">
    <Obs TIME_PERIOD="2000-07" OBS_VALUE="4577"/>
    <Obs TIME_PERIOD="2000-08" OBS_VALUE="4580"/>
  </Series>
</DataSet>

Note that OBS_STATUS is specified only once, in the outer Cube (the element DataSet) for all observations of the whole time series. The inheritance of Values along the nesting hierarchy allows saving space this way. The same applies to the dimension Values for FREQ, REF_AREA, and ITEM so that the inner Cubes (the Obs elements) must only specify the Values for the time dimension together with those for the observation value. However, see Section SDMX-ML for guidance on using this freedom.

To interpret this XML fragment, it is eventually necessary to know which elements denote dimensions and which attributes. This information is present in the referenced Key Family. It would be redundant in the data file itself. See Represent dimension Values and attribute Values differently.

For those less familiar with Gesmes/TS-EDIFACT, it should be noted that the information about the relevant time period is split there into TIME_PERIOD holding the position on the time line and TIME_FORMAT holding information about the duration of the time period. Safe translation between Gesmes/TS-XML and Gesmes/TS-EDIFACT may well require a similar split to be made. In principle, there is nothing wrong about having just one concept TIME holding a dateTime and nothing to contain duration information. That is likely more readable; it may just afterwards not cleanly map to EDIFACT syntax. Also see EDIFACT.

6.2. structure implementation

This package is the most complex one. Fortunately, exchange of multi-dimensional data is already possible with only a limited understanding of the classes Key Family and Concept. In the end, the Key Families decide whether a Concept is used as a dimension or as an attribute.

6.2.1. Structural Definition

The identifier is expressed by the attribute id.

Name and description are expressed by elements name and description, of which there can be zero or more occurrences, but more than one only if all occurrences of each such element carry xml:lang attributes with different values.

The prototyping relationship is expressed by the attribute ref. This attribute contains the identifier of the prototype to use. In front of the identifier there can be a URI and a hash mark. This gives a simple XPointer expression of the form URI#identifier. Such expressions are well known from HTML linking. The possibility to use a URI allows linking to Structural Definitions in other documents. Applications can distinguish the plain identifier and the simple XPointer case by the hash mark, which is not allowed in IDREFs and must not be used in identifiers, which are at least NMTOKENs.

The semantics is that the referencing Structural Definition takes all values from its prototype as default. Prototyping can forward defaults along multiple generations. Also the identifier is inherited from the prototype by default if it is not specified for the referencing Structural Definition.

The context of Gesmes/TS-EDIFACT requires real prototyping only once: When a Concept is assigned by a Key Family, it refers to a globally defined Concept to take the name and description from there. All other applications of the attribute ref are simple references to a previously defined element.

If in future the need arises to allow multiple prototypes, then the logical move could be to allow space-separated lists of identifiers. Possible ambiguities would, however, then need resolution; probably this would be achieved based on the ordering. In this case proxies for Structural Definition prototypes may be a handy tool to control the ordering or ambiguities.

After all name and description elements and other elements added by the subclasses of Structural Definition, there can be an optional Extension element.

6.2.2. Code List

A Code List is a Structural Definition which can also carry Codes. It is implemented as CodeList element, which after the elements specified for its superclass Structural Definition can carry any number of Code elements.

Each Code is represented by an element Code, which has a required attribute value and an optional text element description. The latter can occur zero to more times, but more than once only if occurrences are with different values for the attribute xml:lang.

The values of all Codes must be unique within one CodeList

6.2.2.1. Example

<CodeList id="CL_OBS_STATUS">
  <name>Observation status codes</name>
  <Code value="A">
    <description>Normal value</description>
  </Code>
  <Code value="B">
    <description>Break</description>
  </Code>
</CodeList>
<CodeList id="CL_OBS_CONF">
  <name>Observation confidentiality codes</name>
  <Code value="F">
    <description>Free</description>
  </Code>
  <Code value="C">
    <description>Confidential</description>
  </Code>
</CodeList>
<CodeList id="CL_REF_AREA">
  <name>Area codes</name>
  <description>ISO two letter codes<description>
  <Code value="AT">
    <description xml:lang="en">Austria</description>
    <description xml:lang="de">Österreich</description>
  </Code>
  <Code value="BB">
    <description>Barbados</description>
  </Code>
  <Code value="CH">
    <description xml:lang="en">Switzerland</description>
  </Code>
  <Code value="EE">
    <description xml:lang="en">Estonia</description>
    <description xml:lang="de">Estland</description>
    <description xml:lang="ee">Eesti</description>
  </Code>
  <Code value="TT">
    <description>Trinidad and Tobago</description>
  </Code>
</CodeList>

6.2.3. Concept

A Concept is a Structural Definition which documents the meaning of Values and can also restrict allowed Values to a domain. It is implemented as a Concept element, which can carry after the elements specified for its superclass Structural Definition zero or one element usage and then elements describing allowed Values.

The element usage is part of the activity package; see Usage Status for details.

After the element usage, an arbitrary number of elements CodeList, TextFormat, TimePointSet, TimeDurationSet, and RealNumberSet follows. The range of allowed Values for the Concept then is the intersection of the allowed Values for each of the listed Value Sets.

The elements TimeDurationSet and RealNumberSet must be empty.

A TextFormat can have two elements which restrict the allowed Values: The optional element ctype can contain a code "numeric" or "alphabetic" specifying the allowed character type. If ctype is missing, then no such restriction applies. The optional element length contains a nonnegative integer specifying the maximum string length. If its required boolean attribute exact is true, the string length must match exactly rather than be only bounded. If no element length is present, no length restriction applies. In the presence of named entities, the string length is measured after their expansion into the replacement text. As to escaping special characters like "<" and "&" this means the string before escaping. Also see XML syntax.

A TimePointSet can carry a periodicity element, which must contain a value-of element, which must have a concept attribute containing the identifier of the Concept used to determine the time grid width. Such a Concept should have a valuation restriction to a Time Duration Set. The value-of element must not be used outside a KeyFamily element (see Key Family. This somewhat intricate nesting structure is due to the indirection involved in the relationship between time and time format as used in Gesmes/TS.

6.2.3.1. Example

<Concept id="FREQ">
  <name>Frequency</name>
  <name xml:lang="fr">Fréquence</name>
  <name xml:lang="de">Frequenz</name>
  <CodeList ref="CL_FREQ"/>
</Concept>
<Concept id="REF_AREA">
  <name>Reference Area</name>
  <description>Geographical area of the observed phenomenon</description>
</Concept>
<Concept id="TIME_PERIOD">
  <name>Time</name>
  <name xml:lang="fr">Temps</name>
  <name xml:lang="de">Zeit</name>
  <TimePointSet>
    <periodicity>
      <value-of concept="TIME_FORMAT"/>
    </periodicity>
  </TimePointSet>
</Concept>
<Concept id="TIME_FORMAT">
  <name>Duration</name>
  <TimeDurationSet/>
</Concept>
<Concept id="OBS_VALUE">
  <name>Observation value</name>
  <usage>mandatory</usage>
  <RealNumberSet/>
</Concept>
<Concept id="OBS_STATUS">
  <name>Observation status</name>
  <usage>mandatory</usage>
  <CodeList ref="CL_OBS_STATUS"/>
</Concept>
<Concept id="OBS_CONF">
  <name>Observation confidentiality</name>
  <usage>conditional</usage>
  <CodeList ref="CL_OBS_CONF"/>
</Concept>
<Concept id="OBS_PRE_BREAK">
  <name>Pre break observation value</name>
  <usage>conditional</usage>
  <RealNumberSet/>
</Concept>
<Concept id="OBS_COM">
  <name>Observation comment</name>
  <usage>conditional</usage>
  <TextFormat>
    <length exact="false">350</length>
  </TextFormat>
</Concept>
<Concept id="BREAKS">
  <name>Series breaks</name>
  <usage>conditional</usage>
  <description>Overview list of historic breaks</description>
</Concept>

6.2.4. Key Family

A Key Family is a Structural Definition which can also assign Concepts as dimensions or attributes. It is implemented as the KeyFamily element, which must carry after the elements specified for its superclass Structural Definition a uri element to determine the desired namespace for referring Cube elements. Next follows an arbitrary number of Concept elements and then an arbitrary number of Level elements. The optional Extension element inherited from Structural Definition follows after this.

The Level element has two required attributes listing space-separated sets of Concept identifiers: dimensions and attributes. The Concepts listed in dimensions are the dimensions necessary to navigate to the Values of the Concepts listed in attributes. In other words, for each Level, the listed attributes vary along the listed dimensions.

To ensure that the Key Family remains operational, a few restrictions are imposed:

Concept identifiers must be unique within a Key Family.
For each Level, the list in dimensions must be disjoint with the list in attributes.
For different Levels, the Concept lists in the attributes must be disjoint. (To express multiple primary keys, this requirement could in future be relaxed to: If a Concept identifier appears in the attributes of two Levels, then the union sets of dimensions plus attributes must be equal for both Levels.)

6.2.4.1. Example

The following example shows a Key Family which defines five dimensions and six attributes. The Level elements specify the attachment levels as follows: The attributes OBS_VALUE, OBS_STATUS, OBS_CONF, OBS_PRE_BREAK, and OBS_COM vary with all dimensions, whereas BREAKS is constant along the TIME_PERIOD dimension. (This is an example and should not be misunderstood as a recommendation to use such a setting when series can have many breaks.)

<KeyFamily id="MICROBOP">
  <name>Exemplary Key Family</name>
  <description>This is an unrealistically tiny example</description>
  <uri>http://www.ecb.int/vocabulary/2002-08-03/microbop</uri>
  <Concept ref="FREQ"/>
  <Concept ref="REF_AREA">
    <CodeList ref="CL_REF_AREA"/>
  </Concept>
  <Concept id="ITEM">
    <name>Economic item category</name>
    <CodeList ref="CL_ITEM"/>
  </Concept>
  <Concept ref="TIME_PERIOD">
    <TimePointSet>
      <periodicity>
        <value-of concept="TIME_FORMAT"/>
      </periodicity>
    </TimePointSet>
  </Concept>
  <Concept ref="TIME_FORMAT">
    <TimeDurationSet/>
  </Concept>
  <Concept ref="OBS_VALUE"/>
  <Concept ref="OBS_STATUS"/>
  <Concept ref="OBS_CONF"/>
  <Concept ref="OBS_PRE_BREAK"/>
  <Concept ref="OBS_COM"/>
  <Concept id="BREAKS">
    <name>Breaks in series</name>
    <TextFormat>
      <length exact="false">350</length>
    </TextFormat>
  </Concept>
  <Level dimensions="FREQ REF_AREA ITEM TIME_PERIOD TIME_FORMAT" attributes="OBS_VALUE OBS_STATUS OBS_CONF OBS_PRE_BREAK OBS_COM"/>
  <Level dimensions="FREQ REF_AREA ITEM TIME_FORMAT" attributes="BREAKS"/>
</KeyFamily>

6.3. activity implementation

Given the absence of an agreed process model, future evolution may likely affect the activity package. This is the reason why only very basic syntax constructs are used for its currently proposed implementation.

The class Activity has two properties, which are implemented as separate elements.

6.3.1. data set identifier

The data set identifier is represented by the element dataSetId, which carries a string. In future, the association to a data set may be redesigned and become more powerful once Data Sets have a stronger semantics. Actually, a clean design might also use URIs for data sets.

6.3.1.1. Example

<dataSetId>MICROBOP</dataSetId>

6.3.2. action

The class Action is represented by a textual element action, which can take either the value "update" or "delete". In future, this element could be replaced by an enhanced mechanism to specify operations, perhaps an element called Action with first letter upper case. Also see activity alternatives.

6.3.2.1. Example

<action>update</action>

6.3.3. Usage Status

The class Usage Status is implemented as an element usage. It may contain either one of the strings "mandatory" or "conditional". It is only relevant when the Concept is assigned as an attribute. The element usage is used in the implementation of Concept.

6.4. envelope implementation

6.4.1. Envelope

The main element of this package is Envelope. This is a possible document element. It can carry an id attribute.

The content of an Envelope element can consist of a variety of child elements, all of which are optional and some of which can occur several times. An Envelope can contain in this ordering:

An optional empty element test to flag test data.
Zero or more name elements. If more than one, then these must be differentiated amongst each other by different values of the attribute xml:lang.
Zero or one prepared element containing a dateTime.
Zero or one Sender element.
Zero or more Receiver elements.
Zero or one dataSetId element.
Zero or one action element.
Zero or one reportingBegin element containing a dateTime.
Zero or one reportingEnd element containing a dateTime.
Zero or one Extension element.
Any number of CodeList elements.
Any number of Concept elements.
Any number of KeyFamily elements.
Any number of Cube elements from Key Family specific namespaces.
Any number of Envelope elements.
Zero or one Extension element.

The reporting time interval extends from the time in reportingBegin, inclusive, to the time in reportingEnd, exclusive, if both elements are present. This way, a sequence of documents can take reporting responsibility for a set of adjacent and non-overlapping time intervals.

6.4.2. Sender and Receiver

The elements Sender and Receiver have the same type: They can have an id attribute and can contain an arbitrary number of Contact elements. Each of those must have elements name, department, and function, followed by any number of elements telephone, fax, x400, and uri in order of preference.

6.4.3. Example

This example sketches a test message carrying a Cube element called DataSet which carries multi-dimensional data (not elaborated).

<Envelope id="IREF000001">
  <test/>
  <name>Micro dissemination</name>
  <prepared>2002-08-28T16:36+01:00</prepared>
  <Sender id="4F0">
    <Contact>
      <name>SIS External Hotline</name>
      <department>SIS</department>
      <function>Responsible for data processing</function>
      <uri>mailto:sis.external@ecb.int</uri>
    </Contact>
  </Sender>
  <Receiver id="U42"/>
  <dataSetId>MICROBOP</dataSetId>
  <action>update</action>
  <reportingBegin>2001-01-01T00:00+01:00</reportingBegin>
  <reportingEnd>2001-04-01T00:00+01:00</reportingEnd>
  <DataSet xmlns="http://www.ecb.int/vocabulary/2002-08-03/microbop">
    ...
  </DataSet>
</Envelope>

6.5. extension implementation

This package defines only the element Extension, which can hold any content. This provides inside Gesmes/TS-XML messages a means of transport for additional data needed by specific environments.

In general, the elements contained in Extension will not be Gesmes/TS-XML elements but from some other namespace. An application which is aware of the namespace can then use the additional information. It may occur that within one Extension there are elements from several different namespaces. This situation may (but need not) emerge if information should be supplied to a variety of different applications.

6.5.1. Example

The following example shows extension information given for a Concept, as an example of a Structural Definition.

<Concept ref="OBS_VALUE">
  <name>Observation value</name>
  <RealNumberSet/>
  <Extension xmlns:edifact="http://www.gesmes.org/xml/edifact/2002-08-01">
    <edifact:conceptType>array cell</edifact:conceptType>
  </Extension>
</Concept>

6.5.2. Example

The following example shows how it is possible to extend the Gesmes/TS-XML model in order to share ad-hoc comments for Time Series. Comments which are not ad-hoc are better collected in a code list. Then they can be transferred using the Gesmes/TS model without a need of extensions.

<Envelope id="IREF000001" xmlns="http://www.gesmes.org/xml/2002-08-01" xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns:linkednotes="http://www.gesmesuser.org/xml/link/2002-08-06">
  <name xml:lang="en">MICROBOP report update</name>
  <dataSetId>MICROBOP</dataSetId>
  <action>update</action>
  <DataSet xmlns="http://www.ecb.int/vocabulary/2002-08-03/microbop">
    <Series FREQ="A" REF_AREA="TT" ITEM="G" TIME_FORMAT="P1Y" BREAKS="note1">
      <Obs TIME_PERIOD="2000" OBS_VALUE="17442"/>
      <Obs TIME_PERIOD="2001" OBS_VALUE="19609"/>
    </Series>
    <Series FREQ="A" REF_AREA="TT" ITEM="S" TIME_FORMAT="P1Y" BREAKS="note1 note2">
      <Obs TIME_PERIOD="2000" OBS_VALUE="94260"/>
      <Obs TIME_PERIOD="2001" OBS_VALUE="91366"/>
    </Series>
    ...
    <gesmes:Extension>
      <linkednotes:Note id="note1">
        <linkednotes:content xml:lang="en">Switch from national calculation to OECD method since 2001.</linkednotes:content>
      </linkednotes:Note>
      <linkednotes:Note id="note2">
        <linkednotes:content xml:lang="en">Governor commented in the press release from 2002-02-14.</linkednotes:content>
      </linkednotes:Note>
      <linkednotes:Note id="note3">
        <linkednotes:content xml:lang="en">Published in the Monthly Bulletin.</linkednotes:content>
      </linkednotes:Note>
      ...
    </gesmes:Extension>
  </DataSet>
</Envelope>

It is clear that standard Gesmes/TS applications will not be able to understand the link semantics used to refer from BREAKS entries to the linked notes. However, tailor-made stylesheets and applications can easily present these links to footnotes as such.

6.5.3. Aliases

Gesmes/TS-XML uses Concept identifiers as attribute names for Values. This fosters readability, but can create a problem when translating Gesmes/TS-EDIFACT to Gesmes/TS-XML, since the former does not restrict the allowable identifiers to valid XML attribute names. E.g. XML forbids attribute names to begin with a number and reserves those beginning with "xml".

Though the problem can be assumed to come up only in very rare cases, there is a simple solution: Use a different identifier in XML and provide the original one in an Extension for the EDIFACT environment.

6.5.3.1. Example

<Concept id="mxl">
  <name>Xion mean lifespan</name>
  <Extension xmlns:edifact="http://www.gesmes.org/xml/edifact/2002-08-01">
    <edifact:id>xml</edifact:id>
  </Extension>
</Concept>

6.6. Differences from EDIFACT syntax

To aid experts who are familiar with Gesmes/TS-EDIFACT, this section tries to list all differences between the syntax implementations of Gesmes/TS-EDIFACT and of Gesmes/TS-XML. The consequences of differences in the implementation models are not repeated here, e.g. that a KeyFamily element has a uri for Gesmes/TS-XML, but not so for Gesmes/TS-EDIFACT. For these see ResultDraft5 and Differences from the analysis model.

Time is represented using the XML Schema type dateTime.
Time interval length is represented using the XML Schema type duration. The duration length specifies the sampling interval length, which is indirectly related to the frequency. Weekly, quarterly, and semi-annual lengths are represented in terms of days or months: P7D, P3M, and P6M, respectively. Such duration expressions replace the three-digit codes in the time format code list of Gesmes/TS-EDIFACT.
If an Interchange contains only one Message, it often can be represented as only one Envelope element, which directly contains the information, rather than an Envelope holding another nested Envelope for the single Message. This reduces not only nesting depth but may also help to diminish application complexity in the long run.
Gesmes/TS-EDIFACT had the following redundancies, which have been removed:
- Specification of syntax characters. Gesmes/TS-EDIFACT uses the usual XML escaping rules. The decimal separator is always the dot as prescribed by XML for double precision floating point.
- Repetition of the interchange reference number.
- The message count in interchanges.
- Repetition of the message reference number.
- The segment count in messages.
- A Code indicating the message content (Structural Definition or Cubes).
- In data, when specifying an attribute key, the position of the last dimension.
- In data, when listing attributes, any statement about the attachment level.
- The distinction between coded and uncoded attributes when reporting attribute Values: The object type in the IDE segment and then the use of CDV or FTX are determined by the Concept's properties.
No exceptions when assigning the first array cell (usually called OBS_VALUE) in a Key Family. Gesmes/TS-EDIFACT does not require values for usage status and attachment level when assigning this Concept.
The reference to the maintenance agency and Key Family is implemented using the namespace URI of Cube elements instead of special codes. But see EDIFACT.
Colon ":", plus sign "+", and apostrophe "'" need not be escaped. XML syntax prescribes escaping of the less-than sign "<" as < and of the ampersand "&" as & in element content. In attributes, on top of these, escaping of quotation """ as ", or apostrophe "'" as ' are prescribed when otherwise a conflict would ensue with the quotation used as border of the attribute value. The greater-than sign ">" can be escaped as > but this is required only in rare cases if CDATA sections are used.
No truncation rules are used. In Gesmes/TS-EDIFACT, variable-length strings needed to be stripped of trailing whitespace.

7. Process model

There is no established process model for Gesmes/TS which could be compared with the data model formulated in ResultDraft5. Actually, it is not at all clear whether there is one such model fitting all existing business processes. This section does not provide one either, but it points out a few components which a process model could deal with.

7.1. Parsing and generating Gesmes/TS objects

Reading and parsing Gesmes/TS-XML to the objects that Gesmes/TS describes is certainly one part of processing Gesmes/TS-XML data. Due to the hierarchical nesting provided by XML and not known from EDIFACT, it is worthwhile to have a look on how to read and interpret Gesmes/TS-XML data. (Parsing Structural Definitions is straightforward and not deemed worth any extra consideration.)

The recommended API to process Gesmes/TS-XML is the SAX (Simple API for XML) in its current version 2 (SAX2). All major XML libraries support this open standard API. Also compare Sun's popular JAXP (Java API for XML Processing). In contrast to the DOM (Document Object Model), SAX does not require an image of the whole document to be held in memory during processing, but reads the incoming data stream piece-wise and is therefore much better suited to cope with large files.

In short, SAX translates a document into a sequence of events like "element Cube open", "attribute TIME_PERIOD with value 2002-07-26", "attribute OBS_VALUE with value 72.7", "element Cube close", and so on. The application will use each of these events to modify its internal state and possibly generate some output. In short, it can behave as an automaton.

7.1.1. Pushdown model for processing Cubes

This section specifies the reference model for how to convert the sequence of SAX events triggered by Cube data input to Gesmes/TS objects. It is inspired by pushdown automata and dictionary stacks. The main task is to support inheritance of Values by Cubes from the respective parent Cubes. This is solved using a stack to store Values of parent Cubes. The stack depth parallels the nesting depth of Cubes when the nesting hierarchy is parsed.

Every level of the stack can hold a map mapping Concepts to Values. Processing now goes as sketched in the following:

Upon entry, the stack is empty.
When a Cube element is opened, then push a clone of the stack's top element or, if there is none, an empty map to the stack.
All attributes of the Cube element in the default namespace have names matching a Concept identifier and carry as content a Value for this Concept. Store in the current top map the mapping from the Concept to the Value.
When a Cube element is closed, then pop the current top map off the stack.
When an Extension element is encountered, ignore it and all its content or use application specific logic to extract needed information.

It is easy to see that if the parsed Cube element is well-formed XML, then upon exit the stack will be empty again.

At any point in time, the top map holds all the Values so far associated with the current Cube. These values can be looked up and used to build Gesmes/TS objects.

The automaton can produce output each time all attributes of a Cube element have been read, i.e. after an opening Cube element tag has been processed. At these times, the reader checks all of the Key Family's Levels for which a Concept in the attributes list appears with a Value in the current top map. For each such Level, it compares the dimensions list with the set of dimensions for which there are Values in the map.

If that set contains dimensions which are not in the dimensions, the reader signals an error that a dimension is set from which a reported attribute does not depend.
If that set contains exactly the same dimensions as listed in dimensions, the reader reports the dimension Values and the attribute Value. Then it marks the mapping for the attribute in the stack's top map not to be reported again.
Otherwise, i.e. when that set is a proper subset of the listed dimensions, the reader does nothing and keeps the mapping for the attribute in the map for later reporting in descendant Cube elements.

To set up robust checking, the parsing automaton should warn about attribute Values contained in the input which were never reported to the rest of the application. This happens if an outer Cube element sets attribute Values but contains no descendants suiting the attribute's dimension dependence. This checking can be implemented if the maps on the stack store and maintain information about when a Value was read when it was reported in output. When a value is reported, it is then necessary to update all maps from the top of the stack downwards until the map where the Value was originally read into.

If the current action is delete, at each closing Cube element tag, it is checked whether the Cube element contained other Cube elements. If not, it is deleted. Note that this is a very basic deletion strategy. Ideas for possible future finer selection can be found in activity alternatives.

7.1.2. Overriding

Descendant Cube elements usually would not carry Values for Concepts that have already been set by an ancestor Cube element. Actually, this could be forbidden by some data exchange contexts, see Profiles. Nevertheless, changing already set values is possible in principle and will be processed correctly by the Pushdown model for processing Cubes. This technique is called overriding.

In order to maintain the correspondence between logical Cube inclusion and syntactic Cube element nesting, dimension Values should never be overridden. For attribute Values, however, overriding can be a powerful technique to reduce the file size. Still some end users may not be able to process overridden Values easily. It may be best to include a configuration option for reader and writer modules which tells which restrictions should be enforced/obeyed.

7.1.2.1. Example overriding

To express one exceptional OBS_STATUS code, the following example uses overriding. From the perspective of inner Cube elements (Obs), the Value for OBS_STATUS set for the outer Cube element (Series) becomes the default value.

<Series xmlns="http://www.ecb.int/vocabulary/2002-08-03/microbop" FREQ="M" REF_AREA="TT" ITEM="G" TIME_FORMAT="P1M" OBS_STATUS="A">
  <Obs TIME_PERIOD="1999-12" OBS_VALUE="1448"/>
  <Obs TIME_PERIOD="2000-01" OBS_VALUE="1545" OBS_STATUS="B" OBS_PRE_BREAK="1544" OBS_COM="Redefinition of time"/>
  <Obs TIME_PERIOD="2000-02" OBS_VALUE="1467"/>
  <Obs TIME_PERIOD="2000-03" OBS_VALUE="1459"/>
</Series>

7.2. Transactions

In the current batch data exchange of Gesmes/TS-EDIFACT, the transactions on databases are the interchanges, which coincide with the files transferred. To be a transaction means that if an interchange is corrupted, it is rejected as a whole and none of its data is loaded into the target database.

Gesmes/TS-XML should meet all requirements of the currently used (not yet normatively specified) batch data exchange process model. It is the easiest choice to let transactions coincide with files (documents) like in Gesmes/TS-EDIFACT.

What follows next is a major digression of a mathematician, please ignore it and forgive that it is here:

Gesmes/TS transactions are idempotent, meaning that twice application in sequence of the same interchange to the database leaves the database in the same state as it was after the first application.

Gesmes/TS transactions unfortunately are not commutative, which is the background behind the infamous "dependent files problem": If a file B intended to be loaded only after a file A somehow surpasses that so that A is loaded after B, A can overwrite more recent data previously loaded from B.

One solution to this problem is to avoid surpassing, another is to make transactions commutative. The extraction time stamp transported in the prepared element could be used to distinguish more recent data from older one and overwrite or delete only data extracted earlier. Transactions would then become commutative. However, then each entry in the database needs to be associated with the respective extraction timestamp. Also, getting rid of values loaded with an erroneous future prepared-time might need special procedures or potentially much time. Therefore, receiving applications should reliably check that the prepared value must not be in the future. (Or at least not too far ahead; a little tolerance could be necessary to avoid problems with poorly synchronised clocks.)

8. Alternatives

Not only one design of Gesmes/TS-XML can be thought of. This section gives a brief overview about possible alternative approaches and the considerations which have led to the choices made.

The syntaxes presented in the examples are in many cases not in line or not completely in line with the current proposal. This is necessary to illustrate options which were not pursued in the end. In other cases, variations become clearer when the overall style does not match the final one.

8.1. data alternatives

8.1.1. Represent dimension Values and attribute Values differently

There are many ways to do so. A few of them are highlighted as "mixed" alternatives in the following sections.

uniform representation	different representation
Structural information needed when reading data. No redundancy because distinction is made only in Key Family. Simpler for generic Gesmes/TS applications. Open evolution path for future mixed use of Concepts as both, dimensions and attributes, within one Key Family. (E.g. to attach footnotes to Code List codes.)	Structural information needed when writing data. A parsing application can do more without knowing the Structural Definitions. Can look more intuitive to humans.

The choice was made for uniform representation because this is structurally simpler and thus leaves more freedom for evolution.

8.1.2. Transport Values in elements or attributes

	elements	attributes	mixed
Example	<Cube> <REF_AREA>AT</REF_AREA> <ITEM>S</ITEM> <Cube> <FREQ>M</FREQ> <Cube> <TIME_PERIOD>2002-10</TIME_PERIOD> <OBS_VALUE>80.4</OBS_VALUE> </Cube> </Cube> </Cube>	<Cube REF_AREA="AT" ITEM="S"> <Cube FREQ="M"> <Cube TIME_PERIOD="2002-10" OBS_VALUE="80.4"/> </Cube> </Cube>	<Cube REF_AREA="AT" ITEM="S"> <Cube FREQ="M"> <Cube TIME_PERIOD="2002-10"> <OBS_VALUE>80.4</OBS_VALUE> </Cube> </Cube> </Cube>
Arguments	Can transport more than string values. Could use the attribute `xml:lang` to specify Values in different languages. Allows sequential Overriding; that means overriding within one Cube element. No long Cube element in case of long Values. Needs less escaping. Simple applications may not need to parse XML attributes. The Key Family specific namespace must contain also elements for the Concepts; therefore, conflicts with Concept identifiers could necessitate Aliases.	Significantly smaller files. Allows better validation with DTD. No ordering of XML attributes. This is closer to the Gesmes/TS model and less redundant with the Key Family. However, some reading applications may like ordering. A concept name `Cube` does not need escaping if specific naming is used; see Specific or generic syntax. The structure of data is simpler and thus more flexible. E.g. using Cube elements with arbitrary names would not be so easy otherwise. Interpretation of parsed data is simpler. Values in different languages can still be transported if `xml:lang` is an attribute to the Cube element, similar to a dimension.	Similar to the use of elements, but file size could be smaller.

After having seen the dramatic difference in file sizes on real-world files, attributes became the choice. Size reduction in tests was about 40-50% when no overriding is used, and still 25-30% with overriding.

Much simpler processing and more freedom in terms of other options is another decisive argument.

8.1.3. Use a Cube element or nested Value elements

This would be only relevant if elements had been favoured in Transport Values in elements or attributes.

	Use Cube element	Use nested Values
Example	<Cube> <REF_AREA>AT</REF_AREA> <ITEM>S</ITEM> <Cube> <FREQ>M</FREQ> <Cube> <TIME>2002-10</TIME> <OBS_VALUE>80.4</OBS_VALUE> </Cube> </Cube> </Cube>	<REF_AREA><value>AT</value> <ITEM><value>S</value> <FREQ><value>M</value> <TIME><value>2002-10</value> <OBS_VALUE> <value>80.4</value> </OBS_VALUE> </TIME> </FREQ> </ITEM> </REF_AREA>
Arguments	Allows better optimisation for file size. Lower nesting levels. Restricting order, attachment level, and overriding is only possible when Cube elements with different names than only `Cube` are used. To distinguish them, the Cube element contract for processing must be like: An element which allows element content is a Cube element. All others carry Values.	Can restrict the order of Dimensions. Can validate attachment levels with DTD and XML Schema. Can restrict Overriding.

8.1.4. Specific or generic syntax

XML syntax constructs like elements and attributes could be defined depending on the Concepts assigned by a Key Family; or a generic syntax construct could be defined which is suitable for all Concepts. This topic is often raised in the question: Many Key family specific XML Schemas or one meta-schema which fits all Key Families?

To transfer Values in a generic way, dedicated elements must be used inside Cube elements to pair up a Concept id with the associated Value. Therefore, the below example compares alternatives based on elements rather than attributes.

	specific	generic	generic mixed
Example	<Cube> <REF_AREA>AT</REF_AREA> <ITEM>S</ITEM> <Cube> <FREQ>M</FREQ> <Cube> <TIME>2002-10</TIME> <OBS_VALUE>80.4</OBS_VALUE> </Cube> </Cube> </Cube>	<Cube> <value id="REF_AREA">AT</value> <value id="ITEM">S</value> <Cube> <value id="FREQ">M</value> <Cube> <value id="TIME">2002-10</value> <value id="OBS_VALUE">80.4</value> </Cube> </Cube> </Cube>	<Cube> <dim id="REF_AREA">AT</dim> <dim id="ITEM">S</dim> <Cube> <dim id="FREQ">M</dim> <Cube> <dim id="TIME">2002-10</dim> <att id="OBS_VALUE">80.4</att> </Cube> </Cube> </Cube>
Arguments	Allows data validation with XML Schema. Smaller files. Can be used also if values are transported in XML attributes. See Transport Values in elements or attributes. This yields much smaller files and allows data validation with DTD, too. Faster (but two-step) validation possible: First compile a DTD or XML Schema from Structural Definitions from and then apply this to the data file. The first step is performed centrally by the maintenance agency once for all data users. Simpler for data exchange contexts which use constant Structural Definitions because those probably need to know only one constant DTD or XML Schema, which is specific for the domain. Better tool support for end users. Reads better for humans.	Neither DTD nor XML Schema can directly provide data validation. So applications must implement validation explicitly. In other words, standard XML tools will not validate data. Needs only one namespace and thus probably integrates better with legacy Gesmes/TS systems. Probably easier to write huge generic applications in a traditional XML programming style and thus less exposed to changes in XML standards.	Mostly like the generic approach. Just some redundancy is added for the human eye.

To leverage XML standard validation support and since only the specific approach allows the use of attributes for Values, this was chosen.

8.1.5. Use Statistical Concept identifiers as names

	identifiers as names	synthetic names	synthetic mixed names
Example	<Cube REF_AREA="AT" ITEM="S"> <Cube FREQ="M"> <Cube TIME_PERIOD="2002-10" OBS_VALUE="80.4"/> </Cube> </Cube>	<Cube value2="AT" value3="S"> <Cube value1="M"> <Cube value4="2002-10" value5="80.4"/> </Cube> </Cube>	<Cube dim2="AT" dim3="S"> <Cube dim1="M"> <Cube dim4="2002-10" att1="80.4"/> </Cube> </Cube>
Arguments	Readable for humans. Simple applications may not need to refer to Structural Definitions.	Needs no Aliases. Ordering concepts can be built into the naming convention. Some applications may like this in order to circumvent accessing the Structural Definitions. Smaller files when Concept identifiers are long.

Human readability would be almost completely lost if synthetic names were used. Therefore, identifiers are proposed.

8.2. structure alternatives

8.2.1. Use IDREF or XPointer when prototyping

IDREF and IDREFS provide a simple linking mechanism supported already by DTDs. One element in a document can refer to another element there with given ID attribute. IDs must therefore be unique within the document.

XPointer is a modern linking mechanism which does not rely on a unique ID attribute and can also link to elements stored in other documents. XPointer uses the hash mark "#" to separate the document URI from the XPointer expression.

	IDREF	XPointer
Example	<Concept id="FREQ"> <name>Frequency</name> </Concept> ... <KeyFamily id="MICROBOP"> ... <Concept ref="FREQ"/> ... </KeyFamily>	<Concept id="FREQ"> <name>Frequency</name> </Concept> ... <KeyFamily id="MICROBOP"> ... <Concept ref="#FREQ"/> ... </KeyFamily>
Arguments	IDREFs are already supported by applications based on XML, but to profit from this support, IDs must be unique in the document. Appearance of `ref` attribute is consistent with that of the `dimensions` and `attributes` attributes. Simpler to implement than a full XPointer solution. But a constrained one may be not much more difficult than IDREFs. Links to other documents can still be planned for. E.g. in future an attribute `xlink:href` could be introduced as an addressing mechanism parallel to `ref`. Also dedicated proxy elements could specialise on cross-linking documents.	Allows the use of Structural Definitions in other documents, e.g. from other maintenance agencies. The look can be familiar from HTML hyperlinks. In principle, XPointer allows prototyping over more than one step without forcing to re-state the `id`. Actually, full XPointer can do without `id`s.

Note that in the Level attributes dimensions and attributes, XPointer makes little sense because the Concepts must be chosen from those assigned by the respective Key Family.

The decisive argument against full XPointer was the poor support in nowadays' tools, particularly XSLT. Future may still go for more XPointer, XLink, or some other linking mechanism. The present solution, however, is tailored to fit present tools.

The decisive argument against the pure IDREF solution was that linking across documents is a feature highly desirable to share Structural Definitions. Given all that, a mixed approach is proposed: Some simple forms of XPointers are also permitted. See Structural Definition.

Finally, note that the current proposal technically does not require to truly use IDREFs referring to IDs, but NMTOKENs. As the main difference note that IDREFs must point to IDs which are unique in the whole document.

8.2.1.1. Example

To illustrate how simple XPointer can be used, the following example uses three different documents scattered across three different organisations. The example shows a prototyping chain of length 2.

<!-- SDMX Project 1 vocabulary for Gesmes/TS -->
<!-- http://www.sdmx.org/vocabulary/2002-08-02/gesmes.xml -->
<Concept id="REF_AREA">
  <name>Reference area</name>
  <description>Geographical area that the data refer to</description>
</Concept>

<!-- ECB Code Lists -->
<!-- http://www.ecb.int/vocabulary/2002-08-03/gesmes.xml -->
<CodeList id="CL_AREA_EE">
  <name xml:lang="en">Geographical area codes</name>
  ...
  <Code value="EE"/>
    <description xml:lang="en">Estonia</description>
  </Code>
  <Code value="LT"/>
    <description xml:lang="en">Lithuania</description>
  </Code>
  <Code value="LV"/>
    <description xml:lang="en">Latvia</description>
  </Code>
  ...
</Concept>

<!-- BIS Concepts -->
<!-- http://www.bis.org/vocabulary/2002-08-04/gesmes.xml -->
<!-- Extends SDMX core concept and uses ECB's Code List CL_AREA_EE -->
<Concept ref="http://www.sdmx.org/vocabulary/2002-08-02/gesmes.xml#REF_AREA">
  <CodeList ref="http://www.ecb.int/vocabulary/2002-08-03/gesmes.xml#CL_AREA_EE"/>
</Concept>

<!-- Eurostat Key Families -->
<!-- http://europa.eu.int/comm/eurostat/vocabulary/2002-08-05/gesmes.xml -->
<KeyFamily id="MINIBOP">
  <uri>http://europa.eu.int/comm/eurostat/vocabulary/2002-08-05/minibop</uri>
  <Concept ref="http://www.gesmes.org/vocabulary/2002-08-01/gesmes.xml#FREQ"/>
  <Concept ref="http://www.bis.org/vocabulary/2002-08-04/gesmes.xml#REF_AREA"/>
  <Concept ref="ITEM"/>
  <Concept ref="http://www.gesmes.org/vocabulary/2002-08-01/gesmes.xml#TIME_PERIOD"/>
  <Concept ref="http://www.gesmes.org/vocabulary/2002-08-01/gesmes.xml#TIME_FORMAT"/>
  ...
</KeyFamily>

<!-- Some user's data -->
<!-- file:/home/user1/data.xml -->
<!-- Uses definitions as packaged by Eurostat -->
<Envelope xmlns="http://www.gesmes.org/xml/2002-08-01"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.gesmes.org/xml/2002-08-01 http://www.gesmes.org/xml/2002-08-01/gesmes.xsd
    http://europa.eu.int/comm/eurostat/vocabulary/2002-08-05/minibop http://europa.eu.int/comm/eurostat/vocabulary/2002-08-05/minibop/gesmes.xsd"
>
  <name>End-year summary of quarters</name>
  <DataSet xmlns="http://europa.eu.int/comm/eurostat/vocabulary/2002-08-05/minibop">
    <Series FREQ="Q" REF_AREA="EE" ITEM="100" TIME_FORMAT="P3M">
      <Obs TIME_PERIOD="2002-01" OBS_VALUE="1" OBS_STATUS="A"/>
      <Obs TIME_PERIOD="2002-04" OBS_VALUE="2" OBS_STATUS="A"/>
      <Obs TIME_PERIOD="2002-07" OBS_VALUE="3" OBS_STATUS="A"/>
      <Obs TIME_PERIOD="2002-10" OBS_VALUE="4" OBS_STATUS="A"/>
    </Series>
    ...
  </DataSet>
</Envelope>

8.2.2. Use container elements

In the below alternative "containers", the elements Codes, Domain, and Dimensions do nothing more than contain their children. Without loss of information, they could be dropped and replaced by just their content. The Attributes element is similar to Dimensions, but it declares the determining dimensions for the contained Concepts. However, this information could also be carried elsewhere, in which case the Attributes can be replaced by its content like the Dimensions.

	no containers	containers
Example 1	<CodeList id="CL_ITEM"> <name>Item codes</name> <Code value="G"> <description>Goods</description> </Code> <Code value="S"> <description>Services</description> </Code> </CodeList>	<CodeList id="CL_ITEM"> <name>Item codes</name> <Codes> <Code value="G"> <description>Goods</description> </Code> <Code value="S"> <description>Services</description> </Code> </Codes> </CodeList>
Example 2	<Concept ref="ITEM"> <CodeList ref="CL_ITEM"/> <TextFormat> <length exact="true">1</length> </TextFormat> </Concept>	<Concept ref="ITEM"> <Domain> <CodeList ref="CL_ITEM"/> <TextFormat> <length exact="true">1</length> </TextFormat> </Domain> </Concept>
Example 3	<KeyFamily id="MICROBOP"> <name>Exemplary Key Family</name> <uri>http://www.ecb.int/vocabulary/2002-08-03/microbop</uri> <Concept ref="FREQ"/> <Concept ref="REF_AREA"> <CodeList ref="CL_REF_AREA"/> </Concept> ... <Concept ref="OBS_VALUE" dimensions="FREQ REF_AREA ITEM TIME"/> <Concept ref="OBS_STATUS" dimensions="FREQ REF_AREA ITEM TIME"/> ... <Concept id="BREAKS" dimensions="FREQ REF_AREA ITEM"> <name>Breaks in series</name> <TextFormat> <length exact="false">350</length> </TextFormat> </Concept> ... <Level dimensions="FREQ REF_AREA ITEM" attributes="BREAKS ..."/> <Level dimensions="FREQ REF_AREA ITEM TIME" attributes="OBS_VALUE OBS_STATUS ..."/> </KeyFamily>	<KeyFamily id="MICROBOP"> <name>Exemplary Key Family</name> <uri>http://www.ecb.int/vocabulary/2002-08-03/microbop</uri> <Dimensions> <Concept ref="FREQ"/> <Concept ref="REF_AREA"> <Domain> <CodeList ref="CL_REF_AREA"/> </Domain> </Concept> ... </Dimensions> <Attributes dimensions="FREQ REF_AREA ITEM TIME"> <Concept ref="OBS_VALUE"/> <Concept ref="OBS_STATUS"/> ... </Attributes> <Attributes dimensions="FREQ REF_AREA ITEM"> <Concept id="BREAKS"> <name>Breaks in series</name> <Domain> <TextFormat> <length exact="false">350</length> </TextFormat> </Domain> </Concept> ... </Attributes> ... </KeyFamily>
Arguments	Simpler implementation model. This may be easier to understand since less types are used. Smaller structural definition files. But ESCB structural definition size is reduced by only about 2%. Lower nesting hierarchy. Use of attributes as dimensions for other attributes is possible in principle since the role of a Concept is only determined by its own and other Concepts' `dimensions` attribute. Place to factor out commonalties and for future extensions must be added in additional elements like the `Level`.	Keeps place for future extensions. E.g. `Codes` could be handy when hierarchical Code Lists need to contain more information than Codes. Provides a place to factor out common properties of the contained elements, like `Attributes` identifies the determining dimensions. Having elements `Domain`, `Dimensions`, and `Attributes` may be easier to understand for a human reader. May be a bit faster to process with XSLT and reading applications.

8.2.3. Define Code values in elements or attributes

	elements	attributes
Example	<CodeList id="CL_ITEM"> <name>Item codes</name> <Codes> <Code> <value>G<value> <description>Goods</description> <description xml:lang="de">Waren</description> </Code> <Code> <value>S<value> <description>Services</description> <description xml:lang="de">Dienstleistungen</description> </Code> </Codes> </CodeList>	<CodeList id="CL_ITEM"> <name>Item codes</name> <Codes> <Code value="G"> <description>Goods</description> <description xml:lang="de">Waren</description> </Code> <Code value="S"> <description>Services</description> <description xml:lang="de">Dienstleistungen</description> </Code> </Codes> </CodeList>
Arguments	If needed in future, there could be more than one value per Code. This could be linked to multiple coding schemes. Codes could be more than string values.	Smaller structural definition files. If ever needed that values be other than strings, a linking mechanism could still be used. Since everything is read-only, this is no loss of generality, only it may be a bit more complicated. Since code values only provide links to the descriptions, there is most likely no use of multiple values per code, anyway.

elements

attributes

Example

<CodeList id="CL_ITEM">
  <name>Item codes</name>
  <Codes>
    <Code>
      <value>G<value>
      <description>Goods</description>
      <description xml:lang="de">Waren</description>
    </Code>
    <Code>
      <value>S<value>
      <description>Services</description>
      <description xml:lang="de">Dienstleistungen</description>
    </Code>
  </Codes>
</CodeList>

<CodeList id="CL_ITEM">
  <name>Item codes</name>
  <Codes>
    <Code value="G">
      <description>Goods</description>
      <description xml:lang="de">Waren</description>
    </Code>
    <Code value="S">
      <description>Services</description>
      <description xml:lang="de">Dienstleistungen</description>
    </Code>
  </Codes>
</CodeList>

Arguments

If needed in future, there could be more than one value per Code. This could be linked to multiple coding schemes.
Codes could be more than string values.

Smaller structural definition files.
If ever needed that values be other than strings, a linking mechanism could still be used. Since everything is read-only, this is no loss of generality, only it may be a bit more complicated.
Since code values only provide links to the descriptions, there is most likely no use of multiple values per code, anyway.

8.2.4. Sharing Text Formats

When specifying the domain of a Concept, Text Formats cannot be referred to like Code Lists but must be written out. It would save space in Structural Definition files if Text Formats, too, get an id and a ref attribute so as to allow prototyping relationships like for Structural Definitions.

This has not been included in the present proposal for simplicity.

8.2.5. Regular expressions in Text Format

XML Schema can restrict string formats by regular expressions. Particularly, these can provide all restrictions that Text Formats can define. Nevertheless, readability, simplicity, compatibility with other syntaxes, and maybe processing speed are issues which have led to the decision not to use them in Text Formats.

8.2.6. Conditions in Real Number Set

In future, a Real Number Set could constrain the values to nonnegative, for instance. At the moment this is not possible for simplicity.

Some contemporary Key Families use a concept called DECIMALS to state the precision of OBS_VALUE Values. This relationship could in future be expressed using a value-of construct similar to that in the TimePointSet. See Concept.

8.3. activity alternatives

8.3.1. Actions on Cube level

In Gesmes/TS-EDIFACT, updates and deletions cannot be mixed in one message. Gesmes/TS-XML could provide a choice on a per Cube basis. This would e.g. make it possible that within one Envelope, a number of time series would be updated while another few time series were deleted.

Even more fine-grained than actions per Cube would be actions per Value. However, these become only powerful in the context of Explicit action mode.

In the following comparison assume that the namespace prefix gesmes is bound to the Gesmes/TS-XML namespace URI.

	Envelope level	Cube level	Cube level using an attribute
Example	<gesmes:action>update</gesmes:action> <Cube REF_AREA="AT" ITEM="G" UNIT="KILO"/> ... <!-- next Envelope --> <gesmes:action>delete</gesmes:action> <Cube REF_AREA="AT" ITEM="G" FREQ="M"/>	<Cube REF_AREA="AT" ITEM="G" UNIT="KILO"> <gesmes:action>update</gesmes:action> <Cube FREQ="M"> <gesmes:action>delete</gesmes:action> </Cube> </Cube>	<Cube gesmes:action="update" REF_AREA="AT" ITEM="G" UNIT="KILO"> <Cube gesmes:action="delete" FREQ="M"/> </Cube>
Arguments	There is no agreed process model yet. Hence a complicated implementation solution at this stage may completely go into the wrong direction and block future progress more than serving it. Anyway, present applications would probably not become able to support per Cube deletions in a reasonable time. Using Profiles, this is therefore likely to be forbidden from the very start.	Deletions and updates could be grouped into one message if they semantically belong together. A number of semantic issues need to be resolved: What does it mean to delete a time series, but within the deletion update one of its observations? Should this be rejected as an error? In the end, the wrapping interchange known from Gesmes/TS-EDIFACT would be superfluous and could be discontinued. This would reduce application complexity.	Reads more intuitive. Needs attributes in non-empty namespace.

The concern about hindering a future process model is the decisive argument that this proposal does not define per Cube actions.

8.3.2. More action commands

At present, two kinds of action are proposed: update and delete. However there may be a reason to consider further options.

Possible further action commands

value	meaning
`new`	Register a new Cube. Error if the Cube already exists.
`change`	Update the information of an existing Cube. Error if the Cube does not yet exist.
`empty`	Delete all data stored in an existing Cube. Error if the Cube does not yet exist.
`replace`	Combination of `empty` and `change`.

The additional options are exemplary (and not complete) attempts to make registration and removal of a Cube explicit and independent of filling it with data or resetting the data. The distinction between change and update could serve in simple cases as a shield against unexpected objects. Change would only be possible on time series which are already existing. However, a prerequisite to make this useful could be support for Actions on Cube level; without this it could be difficult to modify a time series by inserting new observations unless insertion does not count as change or there is a special action insert.

But note that the plain multi-dimensional model has no notion of "existence" of a Cube. There, a Cube can be treated like the a-priori Cartesian space. So actions actually may only apply to some database implementations. This and cautiousness not to restrain a future process model too much by premature solutions, is the reason that no more action commands than "update" and "delete" are proposed here. And I think these alone are already more than what is healthy.

8.3.3. Explicit action mode

The deletion of a Cube primarily means the deletion of all Values stored for it. What happens to included Cubes is not clear so far. In the present understanding, though implementations could fall short of this expectation, deletion of a time series implies the deletion of all its observations i.e. all values stored for them (be it array cells or attributes in the footnote section in Gesmes/TS-EDIFACT). This could be called "recursive" mode.

On the other hand, the update of the Values of some Concept for all subcubes must be done by explicit iteration through all of them. This mode could be called "individual".

However, there is no compelling reason not to allow recursive updates and individual deletions. Actually, together with the action command, its mode could be stated to specify the behaviour concerning included Cubes.

mode	How to treat existing subcubes
`individual`	Do not touch subcubes
`recursive`	Execute same action on all included Cubes

	Implicit mode	Explicit mode
Example 1	<DataSet gesmes:dsi="MICROBOP" gesmes:action="update"> ... </DataSet>	<DataSet gesmes:dsi="MICROBOP" gesmes:action="update" gesmes:mode="individual"> ... </DataSet>
Example 2	<DataSet gesmes:dsi="MICROBOP" gesmes:action="delete"> ... </DataSet>	<DataSet gesmes:dsi="MICROBOP" gesmes:action="delete" gesmes:mode="recursive"> ... </DataSet>
Arguments	Closer to EDIFACT. Cheaper to implement.	More control over what should happen.

Implicit mode

Explicit mode

Example 1

<DataSet gesmes:dsi="MICROBOP" gesmes:action="update">
  ...
</DataSet>

<DataSet gesmes:dsi="MICROBOP" gesmes:action="update" gesmes:mode="individual">
  ...
</DataSet>

Example 2

<DataSet gesmes:dsi="MICROBOP" gesmes:action="delete">
  ...
</DataSet>

<DataSet gesmes:dsi="MICROBOP" gesmes:action="delete" gesmes:mode="recursive">
  ...
</DataSet>

Arguments

Closer to EDIFACT.
Cheaper to implement.

More control over what should happen.

The mode chooses the action on subcubes already stored in the database. Cubes which are individually specified in the message are always acted on. So even in individual mode the example given in Section Cube would correctly set the Value for OBS_STATUS to A in the observations for July and August 2000; but in contrast to recursive mode, individual mode would not touch other observations of the considered time series.

8.3.3.1. Example recursive update

Recursive updates could be a simple shortcut to set Values which are the same for a large number of Cubes. The update of OBS_CONF to free throughout a whole Data Set could be written as:

<DataSet xmlns="http://www.ecb.int/vocabulary/2002-08-03/microbop" gesmes:dsi="MICROBOP" gesmes:action="update" gesmes:mode="recursive" OBS_CONF="F"/>

8.3.4. Use processing instructions

To transport application-specific processing information, XML defines a syntax for processing instructions. Data Set and Action can be viewed as hints to Gesmes/TS-loaders. They are irrelevant for many other applications. Therefore, it could seem unreasonable to transport this information in the core data. It may be nicer to put it into processing instructions.

	Attributes	Elements	Processing instructions
Example	<DataSet gesmes:dsi="MICROBOP" gesmes:action="update"> ... </DataSet>	<gesmes:action>update</gesmes:action> <gesmes:dsi>MICROBOP</gesmes:dsi> <DataSet> ... </DataSet>	<?gesmesloader dsi="MICROBOP" action="update"?> <DataSet> ... </DataSet>
Arguments	Actions can be specified inside tags. This can in future be more readable for Actions on Cube level. Allows checking of action and data set identifier with DTD or XML Schema.	Allows checking of action and data set identifier with XML Schema but not DTD. Avoids attributes in the `gesmes` namespace.	Highlights the application-specific nature. Keeps the field completely open for future development of elements to represent activity. Uses a less common syntax construct, thus parsing may be more difficult. Uses no namespaces and hence conflicts between applications could in principle occur.

9. Profiles

A profile is the syntactic manifestation of a restriction environment in the sense of ResultDraft5.

Not all environments need to support all features of Gesmes/TS-XML. Every data exchange context is allowed to restrict the features used in the documents exchanged. Note that this is how Gesmes/TS was derived from plain Gesmes. Features in Gesmes/TS-XML which are particularly prone to be restricted are:

The allowed types and nesting relationships of Cube elements. See SDMX-ML.
Overriding, since this can make the data files close to useless for online data publishing.
Aliases, which only are necessary in rare occasions so that it may not pay off to develop application logic for these cases.
Processing Structural Definitions is simpler if the id attribute is of type ID. However, this is only possible if it can be guaranteed that all ids are unique within each XML document.
CDATA sections, because their content could mislead text-based applications like reception scripts which try to extract data from XML files by context free pattern matching like from flat text files.
Use of named entities (see Entity Declarations). These can seriously obfuscate the document content for human readers and simple text-based applications. A certain reduction in file size is possible by exploiting named entities, but after compression, this difference can be expected to go away. To supply large Gesmes/TS-XML documents, web servers should anyway use a Content-Encoding like gzip or x-gzip if the client's Accept-Encoding setting allows such compression.
Optional properties of some elements may be required or forbidden according to the needs of a data exchange context.
Deletions, since they may simply not be needed in a context; e.g. think of data for online viewing.
Support for Structural Definitions. Many primitive business contexts do not require the flexibility of Gesmes/TS to dynamically configure data base tables from Structural Definitions. If a data exchange context is interested only in one stable Key Family, applications may be hard-coded for this particular Key Family. Examples of such applications can be existing XML-enabled databases, which must be configured for each Key Family separately.
Support for xml:lang.
The set of allowed characters in strings. Many current applications only support single byte characters, for example iso-8859-1, "Latin-1".
Cube elements "hidden" in Extension elements. These can make stylesheets considerably more complex and significantly slower.
Allowed prototyping relationships for Structural Definitions.
The use of XPointer in the ref attribute when prototyping.
The use of XML Schema's capabilities to extend data types.

On the other hand, some data exchange contexts may need to exchange more data than specified in the Gesmes/TS-XML proposal. In this case, the extension package or processing instructions can be used.

The use of profiles will highlight the needs of the user community and is expected to fertilise the evolution of Gesmes/TS-XML.

9.1. EDIFACT

Adherence to the requirements of this profile makes it possible to run a mixed system using Gesmes/TS-XML and Gesmes/TS-EDIFACT. Particularly, it enforces that Gesmes/TS-XML transports all information necessary to allow reasonably easy translation into Gesmes/TS-EDIFACT.

9.1.1. Additional information

Gesmes/TS-EDIFACT requires a few pieces of information which normally are not supplied by Gesmes/TS-XML. To transport these, dedicated children of the Extension element are used.

The EDIFACT profile has the namespace URI http://www.gesmes.org/xml/edifact/2002-08-01. This namespace contains the following extension elements: id and conceptType may appear in the Extension element of a Concept; maintenanceAgencyId and keyFamilyId may appear in the first Extension element of an Envelope before any Structural Definitions or Cubes.

The element id is used for Aliases in the unlikely situation when the genuine identifier from Gesmes/TS-EDIFACT is not suitable as an identifier for Gesmes/TS-XML.

The element conceptType, if it appears, may contain either "frequency" or "array cell". The first possibility flags the frequency dimension. The second one flags an attribute where the Values should be transported in the main ARR section rather than in the FNS section of a Gesmes/TS-EDIFACT file. See the example in extension implementation.

Both elements, keyFamilyId and maintenanceAgencyId, have string content; the content is an identifier. This identifier codes a reference to the Key Family used in successive Cube elements or to its maintenance agency, respectively. Note that both information items are redundant in the sense that both can be inferred from the namespace URI of the used Cube elements. However, it may be too difficult for applications to extract this information from this URI. Practical use will show whether these extension elements are actually useful.

<Envelope id="IREF000001" xmlns="http://www.gesmes.org/xml/2002-08-01" xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns:edifact="http://www.gesmes.org/xml/edifact/2002-08-01">
  <name>Micro reporting June 2002</name>
  <prepared>2002-08-13T17:44+01:00</prepared>
  <Sender id="TT2"/>
  <Receiver id="4F0"/>
  <dataSetId>MICROBOP</dataSetId>
  <action>update</action>
  <Extension>
    <edifact:maintenanceAgencyId>ECB</edifact:maintenanceAgencyId>
    <edifact:keyFamilyId>MICROBOP</edifact:keyFamilyId>
  </Extension>
  <DataSet xmlns="http://www.ecb.int/vocabulary/2002-08-03/microbop">
    ...
  </DataSet>
</Envelope>

Gesmes/TS-EDIFACT specifies an ordering relationship of Concepts assigned as dimensions or array cells in a Key Family. The convention for translation is to use the sequential ordering of the of the Concept elements in the KeyFamily element.

To transport time intervals in a clean way, it is required to express time using two dimensions: One (usually called TIME_PERIOD) with validation restriction to a Time Point Set expresses the position of the interval in time, another (usually called TIME_FORMAT) with validation restriction to a Time Duration Set expresses the duration of the interval.

9.1.2. Compatibility restrictions

Translation from Gesmes/TS-XML to Gesmes/TS-EDIFACT requires consideration of some restrictions.

Gesmes/TS-EDIFACT defines upper bounds for the occurrences of all information items. E.g. there may only be up to three Contacts in the Sender and up to five communication channels for each of them. For details see the Gesmes/TS User Guide.

Generally, Gesmes/TS-EDIFACT accepts only limited string lengths. These vary according to the function of the string, e.g. whether it is an identifier or an attribute Value. For details see again the Gesmes/TS User Guide.

Also note the surprising constraint that Gesmes/TS-EDIFACT cannot transport attribute values with 350 or more consecutive whitespace characters.

Of the Envelope element, the children name, prepared, Sender, and Receiver are required.

There may be either Structural Definition or Cube elements, but not both inside one Envelope.

The Contact function may contain only one of four strings:

Responsible for Information Production
Head of Information Production
Responsible for Data Processing
Head of Data Processing

The Contact URI may only use the mailto: protocol.

9.2. SDMX-ML

For simplified processing of time series data, particularly moderately sized exchanges, the following restrictions are imposed in the SDMX-ML profile:

There are only four types of Cube element: DataSet, Group, Series, and Obs.
The Cube elements allow only the following nesting relationships: DataSet elements can contain Group elements, Group elements can contain Series elements, Series elements can contain Obs elements, and Obs elements are empty.
The DataSet element carries the namespace declaration. The Group element carries attributes for all dimensions which are not related to time and frequency. The Series element carries attributes for the dimensions related to frequency. (Depending on the Key Family this could be FREQ and perhaps TIME_FORMAT.) The Obs element carries attributes for the time dimension.
Inheritance of Values to nested Cubes must not be used for attributes.
The Envelope element must not contain nested Envelopes.

These restrictions are inspired by the findings of the SDMX Case Study Project.

10. Recommended URI structures

The use of URIs is new to Gesmes/TS. Since cool URIs don't change, considerable care should be taken by maintenance agencies and also the Gesmes/TS reference organisation when defining namespace URIs. Having longevity established as the foremost objective when designing URIs, it should also be easy to use for humans. To sum up: The URIs should be logical, independent of organisation structure (prone to change), and easy to recall for humans.

This section proposes a pattern to use when setting up Gesmes/TS-XML maintenance agency sites. If the Gesmes/TS user community accepts these or similar best practices, it may become significantly simpler to install and maintain data exchange contexts.

10.1. Directories

It is assumed that the Gesmes/TS reference organisation and all maintenance agencies own a web-site or at least a URI with a protocol like http that models a directory tree; i.e. https and ftp are fine, mailto will not work that well. All URIs of the organisation are then in its URI tree.

The pattern proposed here to define directory URIs is as follows. The Gesmes/TS reference organisation uses:

`http://www.gesmes.org`	Root URI for the syntax-independent data model.
`http://www.gesmes.org/xml`	Root URI for XML syntax.
`http://www.gesmes.org/xml/edifact`	Root URI for the EDIFACT profile in XML syntax.

All root URIs are then extended by a date to allow versioning. E.g. http://www.gesmes.org/xml/2002-08-01.

Similarly, a maintenance agency with, for example, the homepage URI http://www.sdmx.org, would use:

`http://www.sdmx.org/vocabulary`	Root URI for business vocabulary.
`http://www.sdmx.org/vocabulary/2002-08-02`	Root URI for business vocabulary in a particular version.
`http://www.sdmx.org/vocabulary/2002-08-02/bop`	Root URI for a business term in a particular version. The business terms used in Gesmes/TS-XML are essentially Key Families, but in general also Code Lists, for example, could be exhibited.

10.2. Files

In principle, Gesmes/TS-XML uses a Key Family's URI only to bind the namespace for Cube elements structured according to the Key Family. However, when the URI actually can be dereferenced, this can be exploited further for increased benefit of users of Gesmes/TS-XML. So the URIs dealt with in the section Directories should be populated with files.

The URI of a Key Family can be seen as pointing to the Key Family's homepage. Business terms like Code Lists could get similar URIs, but let us focus on Key Families here. Such a homepage should contain the following files:

(no filename)	Redirect to `index.html`. This would be a starting page to find documentation about the Key Family for humans and/or applications, particularly links to the following files in the directory.
`gesmes.xml`	The Gesmes/TS-XML representation of the Key Family.
`gesmes.html`	For the web-user's convenience, an HTML-representation of the Key Family. It may be the result of simply applying a style sheet to `gesmes.xml` or may have added fancy features.
`gesmes.dtd`	A DTD for Gesmes/TS-XML data files which contain data for the Key Family. Such a DTD is very useful for applications which are only aware of one Key Family. They can use this DTD in the `DOCTYPE` statement to perform standard documentation and validation of data files.
`gesmes.xsd`	An XML Schema for Gesmes/TS-XML data files which contain data for the Key Family. Applications can use this schema to perform standard validation of data files.
`gesmes.xsl`	An XSLT style sheet for Gesmes/TS-XML data files which contain data for the Key Family to convert them to HTML. XML data files could reference this style sheet in order to enable users to view data files with their browser like an HTML page. This could prove very practical. Dynamic linking to structural information is possible.

Look at the exemplary URI

http://www.sdmx.org/vocabulary/2002-08-02/bop/gesmes.xml

Read from left to right, this is SDMX's understanding as of 2 August 2002 of what bop means formulated in terms of the Gesmes/TS data model and in XML syntax. Longevity of the naming schema is achieved by putting the technology dependent part to the far right. If in future a different innovative data format would be used, the URI could become

http://www.sdmx.org/vocabulary/2022-12-24/bop/gosmos3.xom

for instance, and still be consistent. Of course, if the web-site itself changes its address, then too bad.

To aid maintenance agencies, www.gesmes.org as the Gesmes/TS reference organisation would make available in the Gesmes/TS-XML namespace URI on its site a similar directory containing:

(no filename)	Redirect to `index.html`. As a starting page to find documentation about Gesmes/TS-XML for humans and/or applications, particularly links to the following files in the directory.
`gesmes.dtd`	A DTD for Gesmes/TS-XML Structural Definitions files and a parameter entity as place holder for Cubes.
`gesmes.xsd`	An XML Schema for Gesmes/TS-XML Structural Definitions files and also the definition of the basic Cube type(s).
`gesmes.xsl`	An XSLT style sheet to translate Gesmes/TS-XML files to HTML. XML Structural Definitions files could reference this style sheet in order to enable users to view data files with their browser like an HTML page. Also data files could do so, but then the stylesheet for better presentation needs the Structural Definitions and thus would rely on a convention like the one described in this section concerning resources behind namespace URIs. Following this convention, the stylesheet could find what it needs on the maintenance agency's site.
`gesmes2xsd.xsl`	An XSLT style sheet to translate Gesmes/TS-XML Structural Definitions files into an XML Schema like `gesmes.xsd` in maintenance agencies' sites for validating data files.
`gesmes2xsl.xsl`	An XSLT style sheet to translate Gesmes/TS-XML Structural Definitions files into an XSLT style sheet like `gesmes.xsl` in maintenance agencies' sites for efficient viewing of data files.

11. Comparison with other formats

Here the proposed format Gesmes/TS-XML is compared to known formats from similar problem domains.

11.1. Gesmes

The Generic Statistical Message provided the starting point for the development of Gesmes/TS and thus also for Gesmes/TS-XML.

"Plain" Gesmes associated structural definitions with links and expiration dates. In Gesmes/TS-XML, links to other objects can easily be included in an Extension. Versioning of structural definitions is recommended to be done using the pattern proposed as Recommended URI structures.

Plain Gesmes allows data to be accompanied by "ad hoc" structural definitions. In a sense this is similar to an internal DTD within XML documents. However, Gesmes/TS allows to strictly separate operational level and knowledge level and thus does not encourage this possibility. This is more in line with the philosophy of XML Schema. But in principle, the uri of a KeyFamily can match the namespace URI in a Cube element in the same message. Well prepared applications could then grasp that data and structure come together that time. Using an exceptional namespace URI such as an empty one may be a more aggressive indication.

11.2. UTILTS

The Utilities Time Series Message seems not to foresee a multi-dimensional structure which would necessitate a separated knowledge level. On the other hand, a couple of segments and pre-defined codes provide semantics useful for the particular domain.

11.3. MDDL

The Market Data Definition Language does not know anything like structural definitions. It has only one level of data, the meaning of the elements and attributes is fixed by the specification. This makes MDDL less flexible, actually restricts its use to pre-defined domains. On the other hand, MDDL can express more semantic information because the notions are fixed and can be relied upon by applications.

A similarity that Gesmes/TS-XML Cube elements and MDDL share is what MDDL calls property inheritance. An outer product or compound element inherits values to the contained products and compounds. Similarly, in Gesmes/TS-XML an outer Cube element can carry a Value meant for its contained Cube elements.

11.4. XBRL

The Extensible Business Reporting Language has two levels, like Gesmes/TS. In XBRL, a taxonomy collects information about the structure of the transferred data. This is exactly what Gesmes/TS's structural definitions do.

However, XBRL taxonomies come in XML Schema syntax. This implies important differences between XBRL and Gesmes/TS-XML:

XBRL in some respects allows tighter control over the data exchanged. For instance, XML Schema can restrict string formats in terms of regular expressions.
On the downside, XML Schema is difficult to parse for reading applications which, for instance, need to build a column in a database table for each Concept used in a Key Family.
XBRL taxonomies can freely define complex user types. Of course, these types then require specific treatment by the application.
Gesmes/TS-XML structural definitions basically specify multi-dimensional Cube layouts. It is comparatively easy to write generic applications which can handle Cubes of arbitrary data layout. Hence, generic applications can already reach far in processing Gesmes/TS-XML.

XBRL uses XLink to link taxonomies to form a hierarchy of re-use. Such a rich mechanism is not available in Gesmes/TS-XML.

XBRL can express multi-dimensional data using the segment child of item elements. However, XBRL has no property inheritance like MDDL and thus multi-dimensional data tend to be file size monsters.

12. Open Issues and Questions

There are a number of issues to be addressed in future discussion. This document tries to provide a collection of these, possibly with links to the group cbXML.

Explicitly model a new package "edifact" where elements specific to the EDIFACT-environment are defined. Since there is a syntax implementation of this package in EDIFACT, there is naturally also a package in the model. However, the simplicity of the package hopefully does not require an explicit formulation in UML.
Think about a process model. This is anyway a good idea and may lead to some evolution concerning the activity package.
Clarify the default time zone. Is it always sender's local time? Or can it be agreed in the data exchange context? Or should there be some element to set the default local time for a document? Maybe TimePointSet could carry such default information. Actually this would facilitate sampling in the middle of a month and still not state the day for every observation. But what about prepared and other time data.
Should Sender come after Receiver element? This is the way it is done in EDIFACT. It may be easier to read if the sender has many contacts. But less so if there would once be many Receivers.

13. References

This section collects references to some relevant literature.

13.1. ResultDraft5

The UML formulation of Gesmes/TS can be retrieved from cbXML, Circa, or eRoom depending on your membership conditions.

13.2. Gesmes/TS User Guide

At the time of writing, electronic copies of the release 3.0 are available at SDMX and the ECB; gesmes.org holds the previous Gesmes/CB User Guide release 2.0.

14. Change History

This section points out noteworthy changes since the draft version 6.0 of this document.

14.1. Draft 8.0

The data model changed its name from Gesmes/CB to Gesmes/TS since its version 3.00. This change was taken over. What so far was called Gesmes/CB-XML now is Gesmes/TS-XML.
Split the type string into identifier (certain restrictions), string (printable character sequence), and text (possibly multi-lingual bundle). There is also the string type code (values chosen from some fixed finite list).
The terminus "Data Set" replaces "Data Family".
The elements name, department, and function now are required in Contact.
Without effect on the syntax implementation, the activity package is implemented using a class Activity instead of separate classes for its two properties.
Concept assignment no longer uses container elements Dimensions and Attributes. Instead, dedicated Level elements express the dimension dependencies of attributes.
The container elements Codes and Domain were dropped.
The Envelope property "subject" was renamed to "name".
Cubes can be represented by elements with arbitrary names, particularly DataSet, Group, Series, Obs.
The Extension element comes last when it is present. Only the Envelope allows an earlier Extension element for easier translation from and to EDIFACT.
On top of normal reference by identifier (or key), also some very simple XPointer expressions are permitted for the prototyping link in the attribute ref. Compare Use IDREF or XPointer when prototyping.
The element TextFormat was re-structured. It now has only one child: length.
The element usage moved out of the EDIFACT extension to the core namespace.
The element conceptType can also carry the code "frequency".

14.2. Draft 7.0

A normal reference by identifier (or key) is used instead of an XPointer expression for the prototyping link: Attribute refs was replaced by an attribute ref holding an identifier without hash mark. Compare Use IDREF or XPointer when prototyping but note that formally no IDREFs are required since identifiers could be re-used. See Profiles for the possibility to agree on the strict use of IDs and IDREFs; this may improve simplicity and performance.
The element frequency has been renamed to periodicity.