5 Data Structure Definition and Dataset

Last modified by Artur on 2025/07/14 10:19

5.1 Introduction

The DataStructureDefinition is the class name for a structure definition for data. Some organisations know this type of definition as a “Key Family”, so the two names are synonymous. The term Data Structure Definition (also referred to as DSD) is used in this specification.

Many of the constructs in this layer of the model inherit from the SDMX Base Layer. Therefore, it is necessary to study both the inheritance and the relationship diagrams to understand the functionality of individual packages. In simple sub models these are shown in the same diagram, but are omitted from the more complex sub models for the sake of clarity. In these cases, the inheritance diagram below shows the full inheritance tree for the classes concerned with data structure definitions.

There are very few additional classes in this sub model other than those shown in the inheritance diagram below. In other words, the SDMX Base gives most of the structure of this sub model both in terms of associations and in terms of attributes. The relationship diagrams in this section show clearly when these associations are inherited from the SDMX Base (the Appendix “A Short Guide to UML in the SDMX Information Model” explains the diagrammatic notation used to depict this).

The actual SDMX Base construct from which the concrete classes inherit depends upon the requirements of the class for:

  • Annotation - AnnotableArtefact
  • Identification - IdentifiableArtefact
  • Naming - NameableArtefact
  • Versioning - VersionableArtefact
  • Maintenance - MaintainableArtefact

5.2 Inheritance View

5.2.1 Class Diagram

SDMX_2-1_SECTION_2_InformationModel_2020-07_aa1923f1.png

Figure 22 Class inheritance in the Data Structure Definition and Data Set Packages

5.2.2 Explanation of the Diagram

5.2.2.1 Narrative

Those classes in the SDMX metamodel which require annotations inherit from AnnotableArtefact. These are:

  • IdentifiableArtefact
  • DataSet (and therefore StructureSpecificDataSet, GenericDataSet, GenericTimeSeriesDataSet, StructureSpecificTimeSeriesDataSet)
  • Key (and therefore SeriesKey and GroupKey)

Those classes in the SDMX metamodel which require annotations and global identity are derived from IdentifiableArtefact. These are:

  • NameableArtefact
  • ComponentList
  • Component

Those classes in the SDMX metamodel which require annotations, global identity, multilingual name and multilingual description are derived from NameableArtefact. These are:

  • VersionableArtefact
  • Item

The classes in the SDMX metamodel which require annotations, global identity, multilingual name and multilingual description, and versioning are derived from VersionableArtefact. These are:

  • MaintainableArtefact

Abstract classes which represent information that is maintained by Maintenance Agencies all inherit from MaintainableArtefact and therefore also inherit all the features of a VersionableArtefact. These are:

  • StructureUsage
  • Structure
  • ItemScheme

All the above classes are abstract. The key to understanding the class diagrams presented in this section is the set of concrete classes that inherit from these abstract classes.

Those concrete classes in the SDMX Data Structure Definition and Dataset packages of the metamodel which need to be maintained by Agencies all inherit (via other abstract classes) from MaintainableArtefact. These are:

  • DataflowDefinition
  • DataStructureDefinition

The component structures that are lists of lists inherit directly from Structure. A Structure contains several lists of components. The concrete class that inherits from Structure is:

  • DataStructureDefinition

A DataStructureDefinition contains a list of dimensions, a list of measures and a list of attributes.

The concrete classes which inherit from ComponentList and are sub components of the DataStructureDefinition are:

  • DimensionDescriptor – content is Dimension, MeasureDimension, and TimeDimension
  • DimensionGroupDescriptor – content is an association to Dimension, MeasureDimension, and TimeDimension
  • MeasureDescriptor – content is PrimaryMeasure
  • AttributeDescriptor – content is DataAttribute

The classes that inherit from Component are:

  • PrimaryMeasure
  • DimensionComponent and thereby its sub classes of Dimension, MeasureDimension, and TimeDimension
  • DataAttribute

The class that inherits from DataAttribute is:

  • ReportingYearStartDay

The concrete classes identified above are the majority of the classes required to define the metamodel for the DataStructureDefinition. The diagrams and explanations in the rest of this section show how these concrete classes are related in order to support the functionality required.
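The inheritance chain described above can be sketched as plain Python classes. This is only an illustration: the class names come from the text, while the empty class bodies and the `lineage` helper are assumptions made for the example. Placing Component directly under IdentifiableArtefact follows the SDMX Base rather than this section, and MeasureDimension and TimeDimension are modelled as sub classes of DimensionComponent, as in the list above.

```python
# Abstract SDMX Base classes, each adding one capability to its parent.
class AnnotableArtefact: ...
class IdentifiableArtefact(AnnotableArtefact): ...
class NameableArtefact(IdentifiableArtefact): ...
class VersionableArtefact(NameableArtefact): ...
class MaintainableArtefact(VersionableArtefact): ...

# Abstract maintainable classes.
class StructureUsage(MaintainableArtefact): ...
class Structure(MaintainableArtefact): ...

# Concrete maintainable classes of this package.
class DataflowDefinition(StructureUsage): ...
class DataStructureDefinition(Structure): ...

# Components (assumed here to be identifiable but not nameable).
class Component(IdentifiableArtefact): ...
class DimensionComponent(Component): ...
class Dimension(DimensionComponent): ...
class MeasureDimension(DimensionComponent): ...
class TimeDimension(DimensionComponent): ...
class PrimaryMeasure(Component): ...
class DataAttribute(Component): ...
class ReportingYearStartDay(DataAttribute): ...


def lineage(cls):
    """Return the class name followed by the names of all its ancestors."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(lineage(DataStructureDefinition))
```

Calling `lineage` on any class lists the capabilities it inherits, for example versioning and maintenance for DataStructureDefinition but only identification and annotation for a Dimension.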

5.3 Data Structure Definition – Relationship View

5.3.1 Class Diagram

1747897165713-725.png

Figure 23 Relationship class diagram of the Data Structure Definition excluding representation

5.3.2 Explanation of the Diagrams

5.3.2.1 Narrative

A DataStructureDefinition defines the Dimensions, MeasureDimension, TimeDimension, DataAttributes, and PrimaryMeasure, and associated Representation that comprise the valid structure of data and related attributes that are contained in a DataSet, which is defined by a DataflowDefinition.

The DataflowDefinition may also have additional metadata attached that defines qualitative information and Constraints on the use of the DataStructureDefinition such as the sub set of Codes used in a Dimension (this is covered later in this document – see “Data Constraints and Provisioning” section 9). Each DataflowDefinition has a maximum of one DataStructureDefinition specified which defines the structure of any DataSets to be reported/disseminated.

There are three types of dimension, each having a common association to Concept:

  • Dimension
  • MeasureDimension
  • TimeDimension

Note that in the description here DimensionComponent can be any or all of its sub classes, i.e. Dimension, MeasureDimension, TimeDimension, and the term “DataAttribute” refers to both DataAttribute and its sub class ReportingYearStartDay.

The DimensionComponent, DataAttribute, and PrimaryMeasure each link to the Concept that defines their name and semantic (/conceptIdentity association to Concept). The DataAttribute, Dimension, and MeasureDimension (but not TimeDimension) can optionally have a +conceptRole association with a Concept that identifies its role in the DataStructureDefinition. Therefore, the allowable roles of a Concept are maintained in a ConceptScheme. Examples of roles are: geography, entity, count, unit of measure. The use of these roles is to enable applications to process the data in a meaningful way (e.g. relating a dimension value to a mapping vector). It is expected that communities (such as the official statistics community) will harmonise these roles within their community so that data can be exchanged and shared in a meaningful way in the community.

The valid values for a DimensionComponent, PrimaryMeasure, or DataAttribute, when used in this DataStructureDefinition, are defined by the Representation. This Representation is taken from the Concept definition (coreRepresentation) unless it is overridden in this DataStructureDefinition (localRepresentation) – see Figure 23. Note that for the MeasureDimension the Representation must be a ConceptScheme and this must always be referenced from the MeasureDimension and cannot therefore be defaulted to the Representation of the Concept associated by the /conceptIdentity. Note also that TimeDimension and ReportingYearStartDay are constrained to specific FacetValueTypes.
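The override rule above can be sketched as a small lookup. This is a minimal illustration under stated assumptions: a Representation is reduced to a string label, the `Concept` and `Component` dataclasses and the `effective_representation` function are hypothetical names, and the identifiers `REF_AREA`, `CL_AREA`, and `CL_EU_AREA` are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    id: str
    core_representation: Optional[str] = None  # e.g. the id of a Codelist


@dataclass
class Component:
    concept_identity: Concept
    local_representation: Optional[str] = None
    is_measure_dimension: bool = False


def effective_representation(component: Component) -> Optional[str]:
    """Return localRepresentation if present, else the Concept's coreRepresentation."""
    if component.is_measure_dimension and component.local_representation is None:
        # A MeasureDimension may not default to the Concept's representation.
        raise ValueError("a MeasureDimension must reference its ConceptScheme directly")
    if component.local_representation is not None:
        return component.local_representation
    return component.concept_identity.core_representation


ref_area = Concept("REF_AREA", core_representation="CL_AREA")
inherited = Component(ref_area)
overridden = Component(ref_area, local_representation="CL_EU_AREA")
print(effective_representation(inherited), effective_representation(overridden))
```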

There will always be a DimensionDescriptor grouping that identifies all of the Dimensions comprising the full key. Together the Dimensions specify the key of an Observation.

The DimensionComponent can optionally be grouped by multiple GroupDimensionDescriptors each of which identifies the group of Dimensions that can form a partial key. The GroupDimensionDescriptor must be identified (GroupDimensionDescriptor.id) and this is used in the GroupKey of the DataSet to declare which DataAttributes are reported at this group level in the DataSet.

There may be a maximum of one MeasureDimension specified in the DimensionDescriptor. The purpose of a MeasureDimension is to specify formally the meaning of the measures (because the PrimaryMeasure typically has a generic meaning e.g. observation value) and to enable multiple measures to be defined and reported in a StructureSpecificDataSet. Note that the MeasureDimension references a ConceptScheme as its Representation (see later) whereas a Dimension can have either an enumerated (Codelist) or non-enumerated (Facet) representation. For a MeasureDimension the Concepts in the ConceptScheme comprise the list of allowable measures. This enables the representation for each individual measure (Concept) to be declared as the coreRepresentation of the Concept, thus overriding the Representation specified for the PrimaryMeasure for the observation value of this MeasureDimension Concept.
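One way to read the last point is as a lookup that picks the representation of an observation value. The sketch below is an illustration only: the function name, the string labels used as representations, and the measure identifiers are hypothetical, and falling back to the PrimaryMeasure representation when a measure Concept declares no coreRepresentation is an assumption rather than something stated in the text.

```python
from typing import Dict, Optional


def observation_representation(
    primary_measure_repr: str,
    measure_concepts: Dict[str, Optional[str]],
    measure_concept_id: Optional[str] = None,
) -> str:
    """Pick the representation that applies to one observation value.

    measure_concepts maps each Concept id in the MeasureDimension's
    ConceptScheme to its coreRepresentation (None if not declared).
    """
    if measure_concept_id is None:
        # No MeasureDimension in play: the PrimaryMeasure representation applies.
        return primary_measure_repr
    core = measure_concepts[measure_concept_id]  # KeyError: not an allowable measure
    return core if core is not None else primary_measure_repr


measures = {"TURNOVER": "Decimal", "EMPLOYEES": "Integer", "STATUS": None}
print(observation_representation("String", measures, "EMPLOYEES"))
```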

There can be a maximum of one TimeDimension specified in the DimensionDescriptor. The TimeDimension is used to specify the Concept used to convey the time period of the observation in a data set. The TimeDimension must contain a valid representation of time and cannot be coded.
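The two cardinality rules for the DimensionDescriptor (at most one MeasureDimension and at most one TimeDimension) lend themselves to a simple check. The sketch below is hypothetical: it assumes the descriptor is given as a list of component type names in key order, and the function name and message wording are invented for the example.

```python
from collections import Counter
from typing import List

ALLOWED_KINDS = ("Dimension", "MeasureDimension", "TimeDimension")


def check_dimension_descriptor(kinds: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the rules are met."""
    counts = Counter(kinds)
    problems = [
        f"unknown component type: {kind}"
        for kind in sorted(set(kinds) - set(ALLOWED_KINDS))
    ]
    for kind in ("MeasureDimension", "TimeDimension"):
        if counts[kind] > 1:
            problems.append(f"at most one {kind} is allowed, found {counts[kind]}")
    return problems


print(check_dimension_descriptor(["Dimension", "TimeDimension", "TimeDimension"]))
```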

The PrimaryMeasure is the observable phenomenon, and, although there can be only one PrimaryMeasure, for consistency with the ComponentList/Component pattern it is grouped by a MeasureDescriptor.

The DataAttribute defines a characteristic of data that are collected or disseminated and is grouped in the DataStructureDefinition by a single AttributeDescriptor. The DataAttribute can be specified as being mandatory, or conditional, as defined in usageStatus. The DataAttribute may play a specific role in the structure and this is specified by the +role association to the Concept that identifies its role.

A DataAttribute is specified as being +relatedTo an AttributeRelationship, which defines the construct to which the DataAttribute is attached when it is reported in a DataSet. The DataAttribute can be specified as being related to one of the following artefacts:

  • DataSet (NoSpecifiedRelationship)
  • Dimension or set of Dimensions (DimensionRelationship)
  • Set of Dimensions specified by a GroupKey (GroupRelationship – this is retained for compatibility reasons – or +groupKey of the DimensionRelationship)
  • Observation (PrimaryMeasureRelationship)

1747896950895-141.png

Figure 24: Attribute Attachment Defined in the Data Structure Definition

The following table details the possible relationships a DataAttribute may specify. Note that these relationships are mutually exclusive, and therefore only one of the following is possible.

| Relationship | Meaning | Location in Data Set at which the Attribute is reported |
|---|---|---|
| None | The value of the attribute does not vary with the values of any other Component. | The attribute is reported at the level of the Dataset Attribute. |
| Dimension (1..n) | The value of the attribute will vary with the value(s) of the referenced Dimension(s). In this case, Group(s) to which the attribute should be attached may optionally be specified. | The attribute is reported at the lowest level of the Dimension to which the Attribute is related, otherwise at the level of the Group if Attachment Group(s) is specified. |
| Group | The value of the Attribute varies with the combination of values for all of the Dimensions contained in the Group. This is added as a convenience to listing all Dimensions and the attachment Group, but should only be used when the Attribute value varies based on all Group Dimension values. | The attribute is reported at the level of Group. |
| Primary Measure | The value of the Attribute varies with the observed value. | The attribute is reported at the level of Observation. |
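The table can be read as a mapping from relationship type to reporting level. The sketch below is one possible reading, not a normative rule: the function name and the level labels are hypothetical, and interpreting “the lowest level of the Dimension” as the Observation when the related Dimensions include the “Dimension at the Observation Level”, and the Series otherwise, is an assumption.

```python
from typing import Optional, Sequence


def attachment_level(
    relationship: str,
    dimensions: Sequence[str] = (),
    groups: Sequence[str] = (),
    observation_dimension: Optional[str] = None,
) -> str:
    """Return the level at which an attribute value would be reported."""
    if relationship == "None":
        return "DataSet"
    if relationship == "PrimaryMeasure":
        return "Observation"
    if relationship == "Group":
        return "Group"
    if relationship == "Dimension":
        if not dimensions:
            raise ValueError("a Dimension relationship needs at least one Dimension")
        if groups:
            # An attachment group was specified alongside the Dimensions.
            return "Group"
        if observation_dimension in dimensions:
            return "Observation"
        return "Series"
    raise ValueError(f"unknown relationship: {relationship}")


print(attachment_level("Dimension", ["FREQ", "REF_AREA"], observation_dimension="TIME_PERIOD"))
```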

Figure 25: Representation of DSD Components

Each of Dimension, MeasureDimension, TimeDimension, PrimaryMeasure, and DataAttribute can have a Representation specified (using the localRepresentation association). If this is not specified in the DataStructureDefinition then the representation specified for Concept (coreRepresentation) is used. For the MeasureDimension the representation for the individual measures is specified for the Concept in the ConceptScheme referenced by the MeasureDimension.

A DataStructureDefinition can be extended to form a derived DataStructureDefinition. This is supported in the StructureMap.

5.3.2.2 Definitions

| Class | Feature | Description |
|---|---|---|
| StructureUsage | | See “SDMX Base”. |
| DataflowDefinition (inherits from StructureUsage) | | Abstract concept (i.e. the structure without any data) of a flow of data that providers will provide for different reference periods. |
| | /structure | Associates a Dataflow Definition to the Data Structure Definition. |
| DataStructureDefinition | | A collection of metadata concepts, their structure and usage when used to collect or disseminate data. |
| | /grouping | An association to a set of metadata concepts that have an identified structural role in a Data Structure Definition. |
| GroupDimensionDescriptor (inherits from ComponentList) | | A set of metadata concepts that define a partial key derived from the Dimension Descriptor in a Data Structure Definition. |
| | +constraint | Identifies an Attachment Constraint that specifies the sub set of Dimension, Measure, or Attribute values to which an Attribute can be attached. |
| | /components | An association to the Dimension and Measure Dimension components that comprise the group. |
| DimensionDescriptor (inherits from ComponentList) | | An ordered set of metadata concepts that, combined, classify a statistical series, and whose values, when combined (the key) in an instance such as a data set, uniquely identify a specific observation. |
| | /components | An association to the Dimension, Measure Dimension, and Time Dimension comprising the Key Descriptor. |
| AttributeDescriptor (inherits from ComponentList) | | A set of metadata concepts that define the attributes of a Data Structure Definition. |
| | /components | An association to a Data Attribute component. |
| MeasureDescriptor (inherits from ComponentList) | | A metadata concept that defines the measure of a Data Structure Definition. |
| | /components | An association to a measure component. |
| Dimension (inherits from Component) | | A metadata concept used (most probably together with other metadata concepts) to classify a statistical series, e.g. a statistical concept indicating a certain economic activity or a geographical reference area. |
| | /role | Association to the Concept that specifies the role that the Dimension plays in the Data Structure Definition. |
| | /conceptIdentity | An association to the metadata concept which defines the semantic of the Dimension. |
| MeasureDimension (inherits from Dimension) | | A statistical concept that identifies the component in the key structure that has an enumerated list of measures. This dimension has, as its representation, the Concept Scheme that enumerates the measure concepts. |
| DataAttribute (inherits from Component; sub class: ReportingYearStartDay) | | A characteristic of an object or entity. |
| | /role | Association to the Concept that specifies the role that the Data Attribute plays in the Data Structure Definition. |
| | usageStatus | Defines the usage status which is constrained by the data type Usage Status. |
| | +relatedTo | Association to an Attribute Relationship. |
| | /conceptIdentity | An association to the Concept which defines the semantic of the component. |
| TimeDimension (inherits from Dimension) | | A metadata concept that identifies the component in the key structure that has the role of “time”. |
| ReportingYearStartDay (inherits from DataAttribute) | | A specialised Data Attribute whose value is used in conjunction with the predefined reporting periods in the Time Dimension. If this is not present, then by default all reporting period values for the Time Dimension will be assumed to be based on a reporting year start day of January 1. |
| PrimaryMeasure (inherits from Component) | | The metadata concept that is the phenomenon to be measured in a data set. In a data set the instance of the measure is often called the observation. |
| | /conceptIdentity | An association to the Concept which carries the values of the measures. |
| AttributeRelationship (abstract class; sub classes: NoSpecifiedRelationship, PrimaryMeasureRelationship, GroupRelationship, DimensionRelationship) | | Specifies the type of artefact to which a Data Attribute can be attached in a Data Set. |
| NoSpecifiedRelationship | | The Data Attribute is not related to any specific construct. |
| PrimaryMeasureRelationship | | The Data Attribute is related to the Primary Measure construct. |
| GroupRelationship | | The Data Attribute is related to a Group Dimension Descriptor construct. |
| | +groupKey | An association to the Group Dimension Descriptor. |
| DimensionRelationship | | The Data Attribute is related to a set of Dimensions. |
| | +dimensions | Association to the set of Dimensions to which the Data Attribute is related. |
| | +groupKey | Association to the Group Dimension Descriptor which specifies the set of Dimensions to which the Data Attribute is attached. |

The classes, attributes, and associations comprising the Representation are explained in the section on the SDMX Base.

5.4 Data Set – Relationship View

5.4.1 Context

A data set comprises the collection of data values and associated metadata that are collected or disseminated according to a known DataStructureDefinition.

5.4.2 Class Diagram

1747897072125-219.png

Figure 26 Class Diagram of the Data Set

5.4.3 Explanation of the Diagram

5.4.3.1 Narrative – Data Set

Note that the DataSet must conform to the DataStructureDefinition associated to the DataflowDefinition for which this DataSet is an “instance of data”. Whilst the model shows the association to the classes of the DataStructureDefinition, this is for conceptual purposes to show the link to the DataStructureDefinition. In the actual DataSet as exchanged there must, of course, be a reference to the DataStructureDefinition and optionally a DataflowDefinition, but the DataStructureDefinition is not necessarily exchanged with the data. Therefore, the DataStructureDefinition classes are shown in the grey areas, as these are not a part of the DataSet when the DataSet is exchanged. However, the structural metadata in the DataStructureDefinition can be used by an application to validate the contents of the DataSet in terms of the valid content of a KeyValue as defined by the Representation in the DataStructureDefinition.

An organisation playing the role of DataProvider can be responsible for one or more DataSets.

A DataSet can be formatted either as a generic data set (GenericDataSet, GenericTimeSeriesDataSet) or a DataStructureDefinition specific data set (StructureSpecificDataSet, StructureSpecificTimeSeriesDataSet). The generic data set is structured in exactly the same way no matter which DataStructureDefinition the DataSet expresses. The structured data set is structured according to one specific DataStructureDefinition. Depending on the syntax chosen for the implementation, the structured data set should support better validation at the syntax level.

A DataSet is a collection of a set of Observations that share the same dimensionality, which is specified by a set of unique components (Dimension, MeasureDimension, TimeDimension) defined in the DimensionDescriptor of the DataStructureDefinition, together with associated AttributeValues that define specific characteristics about the artefact to which they are attached (DataSet, Observation, or set of Dimensions). It is structured in terms of a SeriesKey to which Observations are reported.

The Observation can be the value of the variable being measured for the Concept associated to the PrimaryMeasure in the MeasureDescriptor of the DataStructureDefinition. This is true when there is no MeasureDimension that specifies the precise meaning of each Observation. Each Observation associates an ObservationValue with a KeyValue (+observationDimension) which is the value for the “Dimension at the Observation Level”. Any dimension can be specified as being the “Dimension at the Observation Level”, and this specification is made at the level of the DataSet (i.e. it must be the same dimension for the entire DataSet).

If the “Dimension at the Observation Level” is the MeasureDimension it is possible (but not mandatory) that an Observation can be reported with an explicit identification of one or more Concepts in the ConceptScheme referenced by the MeasureDimension as its Representation. In other words, the actual Concepts are explicitly stated in the Observation.

If it is required to specify explicitly that the DataSet is a time series then one of GenericTimeSeriesDataSet or StructureSpecificTimeSeriesDataSet is used and the KeyValue for the +observationDimension must be a TimeKeyValue. In a GenericDataSet and a StructureSpecificDataSet it is permissible to have any dimension as the +observationDimension, including the TimeDimension.
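The role of the “Dimension at the Observation Level” can be sketched with two small helpers: one that checks the time series rule above, and one that splits a full key into a series key plus the value of the observation dimension. This is an illustration under stated assumptions: keys are modelled as plain dictionaries, the function names are hypothetical, and the dimension identifiers `FREQ`, `REF_AREA`, and `TIME_PERIOD` are invented for the example.

```python
from typing import Dict, Tuple

TIME_SERIES_TYPES = {"GenericTimeSeriesDataSet", "StructureSpecificTimeSeriesDataSet"}


def observation_dimension_is_valid(
    dataset_type: str, observation_dimension: str, dimension_kinds: Dict[str, str]
) -> bool:
    """Time series data sets require the TimeDimension at the observation level."""
    kind = dimension_kinds[observation_dimension]  # KeyError: not in the DSD
    if dataset_type in TIME_SERIES_TYPES:
        return kind == "TimeDimension"
    return True  # any dimension is permitted in the other data set types


def split_key(full_key: Dict[str, str], observation_dimension: str) -> Tuple[Dict[str, str], str]:
    """Split a full key into the series key and the observation dimension value."""
    series_key = {dim: value for dim, value in full_key.items() if dim != observation_dimension}
    return series_key, full_key[observation_dimension]


kinds = {"FREQ": "Dimension", "REF_AREA": "Dimension", "TIME_PERIOD": "TimeDimension"}
full_key = {"FREQ": "A", "REF_AREA": "FR", "TIME_PERIOD": "2020"}
print(split_key(full_key, "TIME_PERIOD"))
```

The same full key can be split on `REF_AREA` instead, which corresponds to a non time series view of the data in which the area is reported at the observation level.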

The KeyValue is a value for one of MeasureDimension, TimeDimension, or Dimension specified in the DataStructureDefinition. If it is a Dimension it can be coded (CodedKeyValue) or uncoded (UncodedKeyValue). If it is a MeasureDimension then it is a MeasureKeyValue. If it is a TimeDimension then it is a TimeKeyValue. The actual value that the CodedKeyValue can take must be one of the Codes in the Codelist specified as the Representation of the Dimension in the DataStructureDefinition. The actual value that the MeasureKeyValue can take must be a valid representation specified for the Concept in the ConceptScheme to which this MeasureKeyValue is related (+valueFor).
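The Codelist rule for coded key values can be sketched as a membership check. This is a minimal illustration, assuming a key is a dictionary of dimension id to value and each Codelist is reduced to a set of code ids; the function name and the identifiers `FREQ`, `REF_AREA`, and `SERIES_NOTE` are hypothetical.

```python
from typing import Dict, List, Set


def validate_key(key: Dict[str, str], codelists: Dict[str, Set[str]]) -> List[str]:
    """Return an error for every coded key value that is not in its Codelist.

    Dimensions without an entry in codelists are treated as uncoded and
    are not checked here.
    """
    errors = []
    for dimension, value in key.items():
        allowed = codelists.get(dimension)
        if allowed is not None and value not in allowed:
            errors.append(f"{dimension}={value!r} is not in the Codelist")
    return errors


codelists = {"FREQ": {"A", "Q", "M"}, "REF_AREA": {"FR", "DE"}}
print(validate_key({"FREQ": "W", "REF_AREA": "FR", "SERIES_NOTE": "free text"}, codelists))
```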

The ObservationValue can be coded (CodedObservation) or uncoded (UncodedObservation).

The GroupKey is a sub unit of the Key that has the same dimensionality as the SeriesKey, but defines a subset of the KeyValues of the SeriesKey. Its sub dimension structure is defined in the GroupDimensionDescriptor of the DataStructureDefinition identified by the same id as the GroupKey. The id identifies a “type” of group and the purpose of the GroupKey is to report one or more AttributeValues that are contained at this group level. The GroupKey is present when the GroupDimensionDescriptor is related to the GroupRelationship in the DataStructureDefinition. There can be many types of groups in a DataSet. If the Group is related to the DimensionRelationship in the DataStructureDefinition then the AttributeValue will be reported with the appropriate dimension in the SeriesKey or Observation.
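Because a GroupKey carries only a subset of the KeyValues of a SeriesKey, deciding which series a group level AttributeValue applies to reduces to a partial key match. The sketch below assumes keys are plain dictionaries; the function name and the identifiers `FREQ`, `REF_AREA`, and `INDICATOR` are hypothetical.

```python
from typing import Dict


def series_in_group(series_key: Dict[str, str], group_key: Dict[str, str]) -> bool:
    """True if every KeyValue of the group key also appears in the series key."""
    return all(series_key.get(dim) == value for dim, value in group_key.items())


# A group whose GroupDimensionDescriptor omits FREQ, so it spans all frequencies.
group_key = {"REF_AREA": "FR", "INDICATOR": "GDP"}
annual = {"FREQ": "A", "REF_AREA": "FR", "INDICATOR": "GDP"}
quarterly = {"FREQ": "Q", "REF_AREA": "FR", "INDICATOR": "GDP"}
other_area = {"FREQ": "A", "REF_AREA": "DE", "INDICATOR": "GDP"}
print([series_in_group(s, group_key) for s in (annual, quarterly, other_area)])
```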

In this way each of DataSet, SeriesKey, GroupKey, and Observation can have zero or more AttributeValues that define some metadata about the object to which they are associated. The allowable Concepts and the objects to which these metadata can be associated (attached) are defined in the DataStructureDefinition.

The AttributeValue links to the object type (DataSet, SeriesKey, GroupKey, Observation) to which it is associated.

5.4.3.2 Definitions

| Class | Feature | Description |
|---|---|---|
| DataSet (abstract class; sub classes: GenericDataSet, StructureSpecificDataSet, GenericTimeSeriesDataSet, StructureSpecificTimeSeriesDataSet) | | An organised collection of data. |
| | reportingBegin | A specific time period in a known system of time periods that identifies the start period of a report. |
| | reportingEnd | A specific time period in a known system of time periods that identifies the end period of a report. |
| | dataExtractionDate | A specific time period that identifies the date and time that the data are extracted from a data source. |
| | validFrom | Indicates the inclusive start time indicating the validity of the information in the data set. |
| | validTo | Indicates the inclusive end time indicating the validity of the information in the data set. |
| | publicationYear | Specifies the year of publication of the data or metadata in terms of whatever provisioning agreements might be in force. |
| | publicationPeriod | Specifies the period of publication of the data or metadata in terms of whatever provisioning agreements might be in force. |
| | setId | Provides an identification of the data set. |
| | action | Defines the action to be taken by the recipient system (update, append, delete). |
| | describedBy | Associates a data flow definition and thereby a Data Structure Definition to the data set. |
| | +structuredBy | Associates the Data Structure Definition that defines the structure of the Data Set. Note that the Data Structure Definition is the same as that associated (non-mandatory) to the Dataflow Definition. |
| | +publishedBy | Associates the Data Provider that reports/publishes the data. |
| | +attachedAttribute | Association to the Attribute Values relating to the Data Set. |
| GenericDataSet | | A data format structure that is able to contain data corresponding to any Data Structure Definition. |
| StructureSpecificDataSet | | A data format structure that contains data corresponding to one specific Data Structure Definition. |
| GenericTimeSeriesDataSet | | A data format structure that is able to contain time series data corresponding to any Data Structure Definition. |
| StructureSpecificTimeSeriesDataSet | | A data format structure that contains time series data corresponding to one specific Data Structure Definition. |
| Key (abstract class; sub classes: SeriesKey, GroupKey) | | Comprises the cross product of values of dimensions that identify uniquely an Observation. |
| | keyValues | Association to the individual Key Values that comprise the Key. |
| | +attachedAttribute | Association to the Attribute Values relating to the Series Key or Group Key. |
| KeyValue (abstract class; sub classes: MeasureKeyValue, TimeKeyValue, CodedKeyValue, UncodedKeyValue) | | The value of a component of a key such as the value of the instance of a Dimension in a Dimension Descriptor of a Data Structure Definition. |
| | +valueFor | Association to the key component in the Data Structure Definition for which this Key Value is a valid representation. Note that this is a conceptual association as the key component is identified explicitly in the data set. |
| MeasureKeyValue (inherits from KeyValue) | | The value of the Measure Dimension component of the key. The value is the Concept to which this class is associated. |
| | +value | Association to the Concept. Note that this is a conceptual association showing that the Concept must exist in the Concept Scheme associated with the Measure Dimension in the Data Structure Definition. In the actual Data Set the value of the Concept is placed in the Key Value. |
| TimeKeyValue (inherits from KeyValue) | | The value of the Time Dimension component of the key. |
| CodedKeyValue (inherits from KeyValue) | | The value of a coded component of the key. The value is the Code to which this class is associated. |
| | +value | Association to the Code. Note that this is a conceptual association showing that the Code must exist in the Code list associated with the Dimension in the Data Structure Definition. In the actual Data Set the value of the Code is placed in the Key Value. |
| UncodedKeyValue (inherits from KeyValue) | | The value of an uncoded component of the key. |
| | value | The value of the key component. |
| | startTime | This attribute is only used if the textFormat of the attribute is of the Timespan type in the Data Structure Definition (in which case the value field takes a duration). |
| | +valueFor | Associates Dimension, Measure Dimension, or Time Dimension to the Key Value, and thereby to the Concept that is the semantic of the Dimension, or Time Dimension. |
| GroupKey (inherits from Key) | | A set of Key Values that comprise a partial key, of the same dimensionality as the Time Series Key for the purpose of attaching Data Attributes. |
| | +describedBy | Associates the Group Dimension Descriptor defined in the Data Structure Definition. |
| SeriesKey (inherits from Key) | | Comprises the cross product of values of all the Key Values that, together with the Key Value of the +observationDimension, identify uniquely an Observation. |
| | +describedBy | Associates the Dimension Descriptor defined in the Data Structure Definition. |
| Observation | | The value of the observed phenomenon in the context of the Key Values comprising the key. |
| | +valueFor | Associates the Primary Measure defined in the Data Structure Definition. |
| | +attachedAttribute | Association to the Attribute Values relating to the Observation. |
| | +observationDimension | Association to the Key Value that holds the value of the “Dimension at the Observation Level”. |
| ObservationValue (abstract class; sub classes: UncodedObservation, CodedObservation) | | |
| UncodedObservation (inherits from ObservationValue) | | An observation that has a text value. |
| | value | The value of the Uncoded Observation. |
| CodedObservation (inherits from ObservationValue) | | An Observation that takes its value from a code in a Code list. |
| | +value | Association to the Code that is the value of the Observation. Note that this is a conceptual association showing that the Code must exist in the Code list associated with the Primary Measure or the Concept of the Measure Dimension in the Data Structure Definition. In the actual Data Set the value of the Code is placed in the Observation. |
| AttributeValue (abstract class; sub classes: UncodedAttributeValue, CodedAttributeValue) | | The value of an attribute, such as the instance of a Coded Attribute or of an Uncoded Attribute in a structure such as a Data Structure Definition. |
| | value | The value of the attribute. |
| | +valueFor | Association to the Data Attribute defined in the Data Structure Definition. Note that this is a conceptual association as the Concept is identified explicitly in the data set. |
| UncodedAttributeValue (inherits from AttributeValue) | | An attribute value that has a text value. |
| | startTime | This attribute is only used if the textFormat of the attribute is of the Timespan type in the Data Structure Definition (in which case the value field takes a duration). |
| CodedAttributeValue (inherits from AttributeValue) | | An attribute that takes its value from a Code in a Code list. |
| | +value | Association to the Code that is the value of the Attribute Value. Note that this is a conceptual association showing that the Code must exist in the Code list associated with the Data Attribute in the Data Structure Definition. In the actual Data Set the value of the Code is placed in the Attribute Value. |