5 Data Structure Definition and Dataset
5.1 Introduction
The DataStructureDefinition is the class name for a structure definition for data. Some organisations know this type of definition as a “Key Family”, so the two names are synonymous. The term Data Structure Definition (also referred to as DSD) is used in this specification.
Many of the constructs in this layer of the model inherit from the SDMX Base Layer. Therefore, it is necessary to study both the inheritance and the relationship diagrams to understand the functionality of individual packages. In simple sub models these are shown in the same diagram, but are omitted from the more complex sub models for the sake of clarity. In these cases, the inheritance diagram below shows the full inheritance tree for the classes concerned with data structure definitions.
There are very few additional classes in this sub model other than those shown in the inheritance diagram below. In other words, the SDMX Base gives most of the structure of this sub model both in terms of associations and in terms of attributes. The relationship diagrams shown in this section show clearly when these associations are inherited from the SDMX Base (see the Appendix “A Short Guide to UML in the SDMX Information Model” to see the diagrammatic notation used to depict this).
The actual SDMX Base construct from which the concrete classes inherit depends upon the requirements of the class for:
- Annotation - AnnotableArtefact
- Identification - IdentifiableArtefact
- Naming - NameableArtefact
- Versioning - VersionableArtefact
- Maintenance - MaintainableArtefact
5.2 Inheritance View
5.2.1 Class Diagram
Figure 22 Class inheritance in the Data Structure Definition and Data Set Packages
5.2.2 Explanation of the Diagram
5.2.2.1 Narrative
Those classes in the SDMX metamodel which require annotations inherit from AnnotableArtefact. These are:
- IdentifiableArtefact
- DataSet (and therefore StructureSpecificDataSet, GenericDataSet, GenericTimeSeriesDataSet, StructureSpecificTimeSeriesDataSet)
- Key (and therefore SeriesKey and GroupKey)
Those classes in the SDMX metamodel which require annotations and global identity are derived from IdentifiableArtefact. These are:
- NameableArtefact
- ComponentList
- Component
Those classes in the SDMX metamodel which require annotations, global identity, multilingual name and multilingual description are derived from NameableArtefact. These are:
- VersionableArtefact
- Item
The classes in the SDMX metamodel which require annotations, global identity, multilingual name and multilingual description, and versioning are derived from VersionableArtefact. These are:
- MaintainableArtefact
Abstract classes which represent information that is maintained by Maintenance Agencies all inherit from MaintainableArtefact and thereby also inherit all the features of a VersionableArtefact. These are:
- StructureUsage
- Structure
- ItemScheme
All the above classes are abstract. The key to understanding the class diagrams presented in this section is the set of concrete classes that inherit from these abstract classes.
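The inheritance chain described above can be sketched in a few lines of Python. This is only an illustration: the attribute names (`annotations`, `id`, `name`, `description`, `version`, `maintainer`) and the example values are assumptions made for the sketch, not features taken from the SDMX Base definitions, and the sketch does not enforce that the base classes are abstract.

```python
class AnnotableArtefact:
    """Start of the chain: carries annotations only."""

    def __init__(self, annotations=None):
        self.annotations = list(annotations or [])


class IdentifiableArtefact(AnnotableArtefact):
    """Adds identity on top of annotations."""

    def __init__(self, id, **kwargs):
        super().__init__(**kwargs)
        self.id = id


class NameableArtefact(IdentifiableArtefact):
    """Adds a multilingual name and description (language code -> text)."""

    def __init__(self, id, name, description=None, **kwargs):
        super().__init__(id, **kwargs)
        self.name = dict(name)
        self.description = dict(description or {})


class VersionableArtefact(NameableArtefact):
    """Adds versioning."""

    def __init__(self, id, name, version="1.0", **kwargs):
        super().__init__(id, name, **kwargs)
        self.version = version


class MaintainableArtefact(VersionableArtefact):
    """Adds the maintaining agency."""

    def __init__(self, id, name, maintainer, **kwargs):
        super().__init__(id, name, **kwargs)
        self.maintainer = maintainer


class Structure(MaintainableArtefact):
    pass


class DataStructureDefinition(Structure):
    pass


# Hypothetical identifiers, used only to exercise the chain.
dsd = DataStructureDefinition(
    "DSD_EXAMPLE", {"en": "Example structure"}, maintainer="AGENCY_X", version="2.0"
)
# Class names from the concrete class back to AnnotableArtefact.
lineage = [cls.__name__ for cls in type(dsd).__mro__[:-1]]
```

Here `lineage` lists every class the concrete DataStructureDefinition passes through, which mirrors the order in which the narrative adds annotation, identity, naming, versioning and maintenance.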
Those concrete classes in the SDMX Data Structure Definition and Dataset packages of the metamodel which need to be maintained by Agencies all inherit (via other abstract classes) from MaintainableArtefact. These are:
- DataflowDefinition
- DataStructureDefinition
The component structures that are lists of lists inherit directly from Structure. A Structure contains several lists of components. The concrete class that inherits from Structure is:
- DataStructureDefinition
A DataStructureDefinition contains a list of dimensions, a list of measures and a list of attributes.
The concrete classes which inherit from ComponentList and are sub components of the DataStructureDefinition are:
- DimensionDescriptor – content is Dimension, MeasureDimension and TimeDimension
- DimensionGroupDescriptor – content is an association to Dimension, MeasureDimension, TimeDimension
- MeasureDescriptor – content is PrimaryMeasure
- AttributeDescriptor – content is DataAttribute
The classes that inherit from Component are:
- PrimaryMeasure
- DimensionComponent and thereby its sub classes of Dimension, MeasureDimension, and TimeDimension
- DataAttribute
The class that inherits from DataAttribute is:
- ReportingYearStartDay
The concrete classes identified above are the majority of the classes required to define the metamodel for the DataStructureDefinition. The diagrams and explanations in the rest of this section show how these concrete classes are related in order to support the functionality required.
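As a rough illustration of how these concrete classes fit together, the sketch below models a DataStructureDefinition as three component lists. The field names (`dimensions`, `measures`, `attributes`, `components`) and the component identifiers are assumptions chosen for the example, and the SDMX Base features (identity, naming, versioning) are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    id: str


class PrimaryMeasure(Component):
    pass


class DataAttribute(Component):
    pass


class ReportingYearStartDay(DataAttribute):
    pass


class DimensionComponent(Component):
    pass


class Dimension(DimensionComponent):
    pass


class MeasureDimension(DimensionComponent):
    pass


class TimeDimension(DimensionComponent):
    pass


@dataclass
class ComponentList:
    components: List[Component] = field(default_factory=list)


class DimensionDescriptor(ComponentList):
    pass


class MeasureDescriptor(ComponentList):
    pass


class AttributeDescriptor(ComponentList):
    pass


@dataclass
class DataStructureDefinition:
    dimensions: DimensionDescriptor
    measures: MeasureDescriptor
    attributes: AttributeDescriptor


# Hypothetical component identifiers, for illustration only.
dsd = DataStructureDefinition(
    dimensions=DimensionDescriptor(
        [Dimension("FREQ"), Dimension("REF_AREA"), TimeDimension("TIME_PERIOD")]
    ),
    measures=MeasureDescriptor([PrimaryMeasure("OBS_VALUE")]),
    attributes=AttributeDescriptor([DataAttribute("UNIT")]),
)
```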
5.3 Data Structure Definition – Relationship View
5.3.1 Class Diagram
Figure 23 Relationship class diagram of the Data Structure Definition excluding representation
5.3.2 Explanation of the Diagrams
5.3.2.1 Narrative
A DataStructureDefinition defines the Dimensions, MeasureDimension, TimeDimension, DataAttributes, and PrimaryMeasure, and associated Representation that comprise the valid structure of data and related attributes that are contained in a DataSet, which is defined by a DataflowDefinition.
The DataflowDefinition may also have additional metadata attached that defines qualitative information and Constraints on the use of the DataStructureDefinition such as the sub set of Codes used in a Dimension (this is covered later in this document – see “Data Constraints and Provisioning” section 9). Each DataflowDefinition has a maximum of one DataStructureDefinition specified which defines the structure of any DataSets to be reported/disseminated.
There are three types of dimension each having a common association to Concept:
- Dimension
- MeasureDimension
- TimeDimension
Note that in the description here DimensionComponent can be any or all of its sub classes, i.e. Dimension, MeasureDimension, TimeDimension, and the term “DataAttribute” refers to both DataAttribute and its sub class ReportingYearStartDay.
The DimensionComponent, DataAttribute, and PrimaryMeasure each link to the Concept that defines their name and semantic (/conceptIdentity association to Concept). The DataAttribute, Dimension, and MeasureDimension (but not TimeDimension) can optionally have a +conceptRole association with a Concept that identifies its role in the DataStructureDefinition. The allowable roles of a Concept are therefore maintained in a ConceptScheme. Examples of roles are: geography, entity, count, unit of measure. These roles enable applications to process the data in a meaningful way (e.g. relating a dimension value to a mapping vector). It is expected that communities (such as the official statistics community) will harmonise these roles within their community so that data can be exchanged and shared in a meaningful way.
The valid values for a DimensionComponent, PrimaryMeasure, or DataAttribute, when used in this DataStructureDefinition, are defined by the Representation. This Representation is taken from the Concept definition (coreRepresentation) unless it is overridden in this DataStructureDefinition (localRepresentation) – see Figure 23. Note that for the MeasureDimension the Representation must be a ConceptScheme and this must always be referenced from the MeasureDimension and cannot therefore be defaulted to the Representation of the Concept associated by the /conceptIdentity. Note also that TimeDimension and ReportingYearStartDay are constrained to specific FacetValueTypes.
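The override rule can be illustrated with a small sketch. It assumes, purely for illustration, that a Representation is reduced to a descriptive string and that the component type is carried in a `kind` field; neither simplification is part of the model, and the codelist names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    id: str
    core_representation: Optional[str] = None


@dataclass
class Component:
    kind: str  # e.g. "Dimension", "MeasureDimension", "DataAttribute"
    concept_identity: Concept
    local_representation: Optional[str] = None


def effective_representation(component: Component) -> Optional[str]:
    """Return the representation that applies to a component in a DSD."""
    if component.kind == "MeasureDimension":
        # The ConceptScheme must be referenced from the MeasureDimension
        # itself; there is no fall-back to the Concept.
        if component.local_representation is None:
            raise ValueError("MeasureDimension must reference a ConceptScheme")
        return component.local_representation
    if component.local_representation is not None:
        # localRepresentation overrides coreRepresentation.
        return component.local_representation
    return component.concept_identity.core_representation


# Hypothetical concept and codelist names.
freq = Concept("FREQ", core_representation="Codelist CL_FREQ")
defaulted = effective_representation(Component("Dimension", freq))
overridden = effective_representation(
    Component("Dimension", freq, "Codelist CL_FREQ_SHORT")
)
```

In this sketch `defaulted` comes from the Concept while `overridden` comes from the component itself.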
There will always be a DimensionDescriptor grouping that identifies all of the Dimensions comprising the full key. Together the Dimensions specify the key of an Observation.
The DimensionComponent can optionally be grouped by multiple GroupDimensionDescriptors each of which identifies the group of Dimensions that can form a partial key. The GroupDimensionDescriptor must be identified (GroupDimensionDescriptor.id) and this is used in the GroupKey of the DataSet to declare which DataAttributes are reported at this group level in the DataSet.
There may be a maximum of one MeasureDimension specified in the DimensionDescriptor. The purpose of a MeasureDimension is to specify formally the meaning of the measures (because the PrimaryMeasure typically has a generic meaning e.g. observation value) and to enable multiple measures to be defined and reported in a StructureSpecificDataSet. Note that the MeasureDimension references a ConceptScheme as its Representation (see later) whereas a Dimension can have either an enumerated (Codelist) or non-enumerated (Facet) representation. For a MeasureDimension the Concepts in the ConceptScheme comprise the list of allowable measures. This enables the representation for each individual measure (Concept) to be declared as the coreRepresentation of the Concept, thus overriding the Representation specified for the PrimaryMeasure for the observation value of this MeasureDimension Concept.
There can be a maximum of one TimeDimension specified in the DimensionDescriptor. The TimeDimension is used to specify the Concept used to convey the time period of the observation in a data set. The TimeDimension must contain a valid representation of time and cannot be coded.
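The cardinality rules for the DimensionDescriptor (at most one MeasureDimension and at most one TimeDimension) lend themselves to a simple check. The sketch below assumes the descriptor is given as a list of `(component id, component type)` pairs, which is a simplification made for the example, and the identifiers are hypothetical.

```python
from collections import Counter


def check_dimension_descriptor(components):
    """Return a list of rule violations for a DimensionDescriptor.

    `components` is a list of (component_id, kind) tuples where kind is
    "Dimension", "MeasureDimension" or "TimeDimension".
    """
    counts = Counter(kind for _, kind in components)
    errors = []
    for kind in ("MeasureDimension", "TimeDimension"):
        if counts[kind] > 1:
            errors.append(f"at most one {kind} is allowed, found {counts[kind]}")
    return errors


# Hypothetical component identifiers.
valid = [
    ("FREQ", "Dimension"),
    ("INDICATOR", "MeasureDimension"),
    ("TIME_PERIOD", "TimeDimension"),
]
invalid = valid + [("TIME_2", "TimeDimension")]

valid_errors = check_dimension_descriptor(valid)
invalid_errors = check_dimension_descriptor(invalid)
```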
The PrimaryMeasure is the observable phenomenon, and, although there can be only one PrimaryMeasure, for consistency with the ComponentList/Component pattern it is grouped by a MeasureDescriptor.
The DataAttribute defines a characteristic of data that are collected or disseminated and is grouped in the DataStructureDefinition by a single AttributeDescriptor. The DataAttribute can be specified as being mandatory, or conditional, as defined in usageStatus. The DataAttribute may play a specific role in the structure and this is specified by the +role association to the Concept that identifies its role.
A DataAttribute is specified as being +relatedTo an AttributeRelationship, which defines the construct to which the DataAttribute is attached when it is reported in a DataSet. The DataAttribute can be specified as being related to one of the following artefacts:
- DataSet (NoSpecifiedRelationship)
- Dimension or set of Dimensions (DimensionRelationship)
- Set of Dimensions specified by a GroupKey (GroupRelationship – this is retained for compatibility reasons – or +groupKey of the DimensionRelationship)
- Observation (PrimaryMeasureRelationship)
Figure 24: Attribute Attachment Defined in the Data Structure Definition
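The attachment options listed above can be sketched as a small class hierarchy with a function that derives where an attribute value is reported. The field names (`group_key`, `dimensions`) and the returned labels are assumptions for the sketch; only the four relationship class names come from the model.

```python
from dataclasses import dataclass
from typing import List, Optional


class AttributeRelationship:
    """Abstract parent of the four relationship types."""


class NoSpecifiedRelationship(AttributeRelationship):
    pass


class PrimaryMeasureRelationship(AttributeRelationship):
    pass


@dataclass
class GroupRelationship(AttributeRelationship):
    group_key: str  # id of the group descriptor


@dataclass
class DimensionRelationship(AttributeRelationship):
    dimensions: List[str]
    group_key: Optional[str] = None  # optional attachment group


def attachment_level(rel: AttributeRelationship) -> str:
    """Describe where an attribute with this relationship is reported."""
    if isinstance(rel, NoSpecifiedRelationship):
        return "DataSet"
    if isinstance(rel, PrimaryMeasureRelationship):
        return "Observation"
    if isinstance(rel, GroupRelationship):
        return f"Group {rel.group_key}"
    if isinstance(rel, DimensionRelationship):
        if rel.group_key is not None:
            return f"Group {rel.group_key}"
        return "Dimensions " + ", ".join(rel.dimensions)
    raise TypeError(f"unknown relationship: {rel!r}")


# Hypothetical dimension and group identifiers.
levels = [
    attachment_level(NoSpecifiedRelationship()),
    attachment_level(PrimaryMeasureRelationship()),
    attachment_level(GroupRelationship("SIBLING")),
    attachment_level(DimensionRelationship(["FREQ", "REF_AREA"])),
    attachment_level(DimensionRelationship(["REF_AREA"], group_key="SIBLING")),
]
```

Because each DataAttribute holds exactly one relationship object, the mutual exclusivity of the options is enforced by construction in this sketch.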
The following table details the possible relationships a DataAttribute may specify. Note that these relationships are mutually exclusive, and therefore only one of the following is possible.
| Relationship | Meaning | Location in Data Set at which the Attribute is reported |
| --- | --- | --- |
| None | The value of the attribute does not vary with the values of any other Component. | The attribute is reported at the level of the Dataset Attribute. |
| Dimension (1..n) | The value of the attribute will vary with the value(s) of the referenced Dimension(s). In this case, Group(s) to which the attribute should be attached may optionally be specified. | The attribute is reported at the lowest level of the Dimension to which the Attribute is related, otherwise at the level of the Group if Attachment Group(s) is specified. |
| Group | The value of the Attribute varies with combination of values for all of the Dimensions contained in the Group. This is added as a convenience to listing all Dimensions and the attachment Group, but should only be used when the Attribute value varies based on all Group Dimension values. | The attribute is reported at the level of Group. |
| Primary Measure | The value of the Attribute varies with the observed value. | The attribute is reported at the level of Observation. |
Figure 25: Representation of DSD Components
Each of Dimension, MeasureDimension, TimeDimension, PrimaryMeasure, and DataAttribute can have a Representation specified (using the localRepresentation association). If this is not specified in the DataStructureDefinition then the representation specified for Concept (coreRepresentation) is used. For the MeasureDimension the representation for the individual measures is specified for the Concept in the ConceptScheme referenced by the MeasureDimension.
A DataStructureDefinition can be extended to form a derived DataStructureDefinition. This is supported in the StructureMap.
5.3.2.2 Definitions
| Class | Feature | Description |
| --- | --- | --- |
| StructureUsage | | See “SDMX Base”. |
| DataflowDefinition | Inherits from StructureUsage | Abstract concept (i.e. the structure without any data) of a flow of data that providers will provide for different reference periods. |
| | /structure | Associates a Dataflow Definition to the Data Structure Definition. |
| DataStructureDefinition | | A collection of metadata concepts, their structure and usage when used to collect or disseminate data. |
| | /grouping | An association to a set of metadata concepts that have an identified structural role in a Data Structure Definition. |
| GroupDimensionDescriptor | Inherits from ComponentList | A set of metadata concepts that define a partial key derived from the Dimension Descriptor in a Data Structure Definition. |
| | +constraint | Identifies an Attachment Constraint that specifies the sub set of Dimension, Measure, or Attribute values to which an Attribute can be attached. |
| | /components | An association to the Dimension and Measure Dimension components that comprise the group. |
| DimensionDescriptor | Inherits from ComponentList | An ordered set of metadata concepts that, combined, classify a statistical series, and whose values, when combined (the key) in an instance such as a data set, uniquely identify a specific observation. |
| | /components | An association to the Dimension, Measure Dimension, and Time Dimension comprising the Key Descriptor. |
| AttributeDescriptor | Inherits from ComponentList | A set of metadata concepts that define the attributes of a Data Structure Definition. |
| | /components | An association to a Data Attribute component. |
| MeasureDescriptor | Inherits from ComponentList | A metadata concept that defines the measure of a Data Structure Definition. |
| | /components | An association to a measure component. |
| Dimension | Inherits from Component | A metadata concept used (most probably together with other metadata concepts) to classify a statistical series, e.g. a statistical concept indicating a certain economic activity or a geographical reference area. |
| | /role | Association to the Concept that specifies the role that the Dimension plays in the Data Structure Definition. |
| | /conceptIdentity | An association to the metadata concept which defines the semantic of the Dimension. |
| MeasureDimension | Inherits from | A statistical concept that identifies the component in the key structure that has an enumerated list of measures. This dimension has, as its representation, the Concept Scheme that enumerates the measure concepts. |
| DataAttribute | Inherits from Component. Sub class: ReportingYearStartDay | A characteristic of an object or entity. |
| | /role | Association to the Concept that specifies the role that the Data Attribute plays in the Data Structure Definition. |
| | usageStatus | Defines the usage status which is constrained by the data type Usage Status. |
| | +relatedTo | Association to an Attribute Relationship. |
| | /conceptIdentity | An association to the Concept which defines the semantic of the component. |
| TimeDimension | Inherits from Dimension | A metadata concept that identifies the component in the key structure that has the role of “time”. |
| ReportingYearStartDay | Inherits from DataAttribute | A specialised Data Attribute whose value is used in conjunction with the predefined reporting periods in the Time Dimension. If this is not present, then by default all reporting period values for the Time Dimension will be assumed to be based on a reporting year start day of January 1. |
| PrimaryMeasure | Inherits from Component | The metadata concept that is the phenomenon to be measured in a data set. In a data set the instance of the measure is often called the observation. |
| | /conceptIdentity | An association to the Concept which carries the values of the measures. |
| AttributeRelationship | Abstract class. Sub classes: NoSpecifiedRelationship, PrimaryMeasureRelationship, GroupRelationship, DimensionRelationship | Specifies the type of artefact to which a Data Attribute can be attached in a Data Set. |
| NoSpecifiedRelationship | | The Data Attribute is not related to any specific construct. |
| PrimaryMeasureRelationship | | The Data Attribute is related to the Primary Measure construct. |
| GroupRelationship | | The Data Attribute is related to a Group Dimension Descriptor construct. |
| | +groupKey | An association to the Group Dimension Descriptor. |
| DimensionRelationship | | The Data Attribute is related to a set of Dimensions. |
| | +dimensions | Association to the set of Dimensions to which the Data Attribute is related. |
| | +groupKey | Association to the Group Dimension Descriptor which specifies the set of Dimensions to which the Data Attribute is attached. |
The explanation of the classes, attributes, and associations comprising the Representation is described in the section on the SDMX Base.
5.4 Data Set – Relationship View
5.4.1 Context
A data set comprises the collection of data values and associated metadata that are collected or disseminated according to a known DataStructureDefinition.
5.4.2 Class Diagram
Figure 26 Class Diagram of the Data Set
5.4.3 Explanation of the Diagram
5.4.3.1 Narrative – Data Set
Note that the DataSet must conform to the DataStructureDefinition associated to the DataflowDefinition for which this DataSet is an “instance of data”. Whilst the model shows the association to the classes of the DataStructureDefinition, this is for conceptual purposes to show the link to the DataStructureDefinition. In the actual DataSet as exchanged there must, of course, be a reference to the DataStructureDefinition and optionally a DataflowDefinition, but the DataStructureDefinition is not necessarily exchanged with the data. Therefore, the DataStructureDefinition classes are shown in the grey areas, as these are not a part of the DataSet when the DataSet is exchanged. However, the structural metadata in the DataStructureDefinition can be used by an application to validate the contents of the DataSet in terms of the valid content of a KeyValue as defined by the Representation in the DataStructureDefinition.
An organisation playing the role of DataProvider can be responsible for one or more DataSets.
A DataSet can be formatted either as a generic data set (GenericDataSet, GenericTimeseriesDataSet) or a DataStructureDefinition specific data set (StructureSpecificDataSet, StructureSpecificTimeseriesDataSet). The generic data set is structured in exactly the same way no matter which DataStructureDefinition the DataSet expresses. The structured data set is structured according to one specific DataStructureDefinition. Depending on the syntax chosen for the implementation the structured data set should support better validation at the syntax level.
A DataSet is a collection of Observations that share the same dimensionality, which is specified by a set of unique components (Dimension, MeasureDimension, TimeDimension) defined in the DimensionDescriptor of the DataStructureDefinition, together with associated AttributeValues that define specific characteristics of the artefact to which they are attached (DataSet, Observation, or set of Dimensions). It is structured in terms of a SeriesKey to which Observations are reported.
The Observation can be the value of the variable being measured for the Concept associated to the PrimaryMeasure in the MeasureDescriptor of the DataStructureDefinition. This is true when there is no MeasureDimension that specifies the precise meaning of each Observation. Each Observation associates an ObservationValue with a KeyValue (+observationDimension) which is the value for the “Dimension at the Observation Level”. Any dimension can be specified as being the “Dimension at the Observation Level”, and this specification is made at the level of the DataSet (i.e. it must be the same dimension for the entire DataSet).
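To illustrate the “Dimension at the Observation Level”, the sketch below groups flat records into series: the series key is built from every dimension except the one chosen for the observation level, and each observation is keyed by the value of that dimension. The record layout, the `OBS_VALUE` field name, the dimension identifiers and the numeric values are all invented for the example.

```python
from collections import defaultdict


def group_into_series(records, dimension_ids, observation_dimension):
    """Group flat records into {series_key: {obs_dimension_value: value}}.

    The same observation dimension is applied to every record, mirroring
    the rule that it is fixed for the entire data set.
    """
    series_dims = [d for d in dimension_ids if d != observation_dimension]
    series = defaultdict(dict)
    for rec in records:
        series_key = tuple(rec[d] for d in series_dims)
        series[series_key][rec[observation_dimension]] = rec["OBS_VALUE"]
    return dict(series)


# Illustrative data only.
records = [
    {"FREQ": "M", "CURRENCY": "USD", "TIME_PERIOD": "2010-01", "OBS_VALUE": 1.43},
    {"FREQ": "M", "CURRENCY": "USD", "TIME_PERIOD": "2010-02", "OBS_VALUE": 1.37},
    {"FREQ": "M", "CURRENCY": "GBP", "TIME_PERIOD": "2010-01", "OBS_VALUE": 0.88},
]
dims = ["FREQ", "CURRENCY", "TIME_PERIOD"]

by_time = group_into_series(records, dims, "TIME_PERIOD")
by_currency = group_into_series(records, dims, "CURRENCY")
```

With `TIME_PERIOD` at the observation level the result is a conventional time-series view; with `CURRENCY` at the observation level the same records form a cross-sectional view.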
If the “Dimension at the Observation Level” is the MeasureDimension it is possible (but not mandatory) that an Observation can be reported with an explicit identification of one or more Concepts in the ConceptScheme referenced by the MeasureDimension as its Representation. In other words, the actual Concepts are explicitly stated in the Observation.
If it is required to specify explicitly that the DataSet is a time series then one of GenericTimeSeriesDataSet or StructureSpecificTimeSeriesDataSet is used and the KeyValue for the +observationDimension must be a TimeKeyValue. In a GenericDataSet and a StructureSpecificDataSet it is permissible to have any dimension as the +observationDimension including the TimeDimension.
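This rule can be expressed as a small predicate. The sketch assumes that the data set class and the type of the observation dimension are passed as plain strings, which is a convenience for the example rather than anything prescribed by the model.

```python
TIME_SERIES_DATA_SETS = {
    "GenericTimeSeriesDataSet",
    "StructureSpecificTimeSeriesDataSet",
}
DIMENSION_KINDS = {"Dimension", "MeasureDimension", "TimeDimension"}


def observation_dimension_allowed(data_set_class, dimension_kind):
    """Check whether a dimension type may serve as +observationDimension."""
    if dimension_kind not in DIMENSION_KINDS:
        return False
    if data_set_class in TIME_SERIES_DATA_SETS:
        # Time series data sets require a TimeKeyValue at observation level.
        return dimension_kind == "TimeDimension"
    # The other data set classes accept any dimension.
    return True
```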
The KeyValue is a value for one of MeasureDimension, TimeDimension, or Dimension specified in the DataStructureDefinition. If it is a Dimension it can be coded (CodedKeyValue) or uncoded (UncodedKeyValue). If it is a MeasureDimension then it is a MeasureKeyValue. If it is a TimeDimension then it is a TimeKeyValue. The actual value that the CodedKeyValue can take must be one of the Codes in the Codelist specified as the Representation of the Dimension in the DataStructureDefinition. The actual value that the MeasureKeyValue can take must be a valid representation specified for the Concept in the ConceptScheme to which this MeasureKeyValue is related (+valueFor).
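A minimal sketch of the coded-value check follows. It assumes a key is a mapping from dimension id to value and that the codelists of the coded dimensions are available as sets of code ids; dimensions absent from `codelists` are treated as uncoded and are not checked. All identifiers are hypothetical.

```python
def validate_coded_key_values(key, codelists):
    """Return an error message for each coded value not found in its Codelist.

    key:       dict mapping dimension id -> reported value
    codelists: dict mapping dimension id -> set of allowed codes
               (coded dimensions only)
    """
    errors = []
    for dimension_id, value in key.items():
        allowed = codelists.get(dimension_id)
        if allowed is not None and value not in allowed:
            errors.append(f"{dimension_id}: {value!r} is not in the Codelist")
    return errors


# Hypothetical codelists and keys.
codelists = {"FREQ": {"A", "Q", "M"}, "REF_AREA": {"DE", "FR"}}

good = validate_coded_key_values(
    {"FREQ": "M", "REF_AREA": "DE", "TIME_PERIOD": "2010-01"}, codelists
)
bad = validate_coded_key_values({"FREQ": "W", "REF_AREA": "DE"}, codelists)
```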
The ObservationValue can be coded - this is the CodedObservation – or it can be uncoded – this is the UncodedObservation.
The GroupKey is a sub unit of the Key that has the same dimensionality as the SeriesKey, but defines a subset of the KeyValues of the SeriesKey. Its sub dimension structure is defined in the GroupDimensionDescriptor of the DataStructureDefinition identified by the same id as the GroupKey. The id identifies a “type” of group and the purpose of the GroupKey is to report one or more AttributeValues that are contained at this group level. The GroupKey is present when the GroupDimensionDescriptor is related to the GroupRelationship in the DataStructureDefinition. There can be many types of groups in a DataSet. If the Group is related to the DimensionRelationship in the DataStructureDefinition then the AttributeValue will be reported with the appropriate dimension in the SeriesKey or Observation.
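One way to picture a GroupKey is as a partial key that selects every SeriesKey sharing its KeyValues. The sketch below assumes both kinds of key are plain mappings from dimension id to value; the identifiers are hypothetical.

```python
def series_in_group(group_key, series_keys):
    """Return the series keys whose values match every value in the group key."""
    return [
        series_key
        for series_key in series_keys
        if all(series_key.get(dim) == value for dim, value in group_key.items())
    ]


# Hypothetical keys.
series_keys = [
    {"FREQ": "M", "CURRENCY": "USD"},
    {"FREQ": "A", "CURRENCY": "USD"},
    {"FREQ": "M", "CURRENCY": "GBP"},
]
# A group that fixes only CURRENCY.
usd_group = {"CURRENCY": "USD"}
usd_series = series_in_group(usd_group, series_keys)
```

An AttributeValue reported against `usd_group` would then apply to both series in `usd_series`.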
In this way each of DataSet, SeriesKey, GroupKey, and Observation can have zero or more AttributeValues that define some metadata about the object to which they are associated. The allowable Concepts and the objects to which these metadata can be associated (attached) are defined in the DataStructureDefinition.
The AttributeValue links to the object type (DataSet, SeriesKey, GroupKey, Observation) to which it is associated.
5.4.3.2 Definitions
| Class | Feature | Description |
| --- | --- | --- |
| DataSet | Abstract class. Sub classes: GenericDataSet, StructureSpecificDataSet, GenericTimeSeriesDataSet, StructureSpecificTimeSeriesDataSet | An organised collection of data. |
| | reportingBegin | A specific time period in a known system of time periods that identifies the start period of a report. |
| | reportingEnd | A specific time period in a known system of time periods that identifies the end period of a report. |
| | dataExtractionDate | A specific time period that identifies the date and time that the data are extracted from a data source. |
| | validFrom | Indicates the inclusive start time indicating the validity of the information in the data set. |
| | validTo | Indicates the inclusive end time indicating the validity of the information in the data set. |
| | publicationYear | Specifies the year of publication of the data or metadata in terms of whatever provisioning agreements might be in force. |
| | publicationPeriod | Specifies the period of publication of the data or metadata in terms of whatever provisioning agreements might be in force. |
| | setId | Provides an identification of the data set. |
| | action | Defines the action to be taken by the recipient system (update, append, delete). |
| | describedBy | Associates a data flow definition and thereby a Data Structure Definition to the data set. |
| | +structuredBy | Associates the Data Structure Definition that defines the structure of the Data Set. Note that the Data Structure Definition is the same as that associated (non-mandatory) to the Dataflow Definition. |
| | +publishedBy | Associates the Data Provider that reports/publishes the data. |
| | +attachedAttribute | Association to the Attribute Values relating to the Data Set. |
| GenericDataSet | | A data format structure that is able to contain data corresponding to any Data Structure Definition. |
| StructureSpecificDataSet | | A data format structure that contains data corresponding to one specific Data Structure Definition. |
| GenericTimeseriesDataSet | | A data format structure that is able to contain timeseries data corresponding to any Data Structure Definition. |
| StructureSpecificTimeseriesDataSet | | A data format structure that contains timeseries data corresponding to one specific Data Structure Definition. |
| Key | Abstract class. Sub classes: SeriesKey, GroupKey | Comprises the cross product of values of dimensions that identify uniquely an Observation. |
| | keyValues | Association to the individual Key Values that comprise the Key. |
| | +attachedAttribute | Association to the Attribute Values relating to the Series Key or Group Key. |
| KeyValue | Abstract class. Sub classes: MeasureKeyValue, TimeKeyValue, CodedKeyValue, UncodedKeyValue | The value of a component of a key such as the value of the instance of a Dimension in a Dimension Descriptor of a Data Structure Definition. |
| | +valueFor | Association to the key component in the Data Structure Definition for which this Key Value is a valid representation. Note that this is a conceptual association as the key component is identified explicitly in the data set. |
| MeasureKeyValue | Inherits from KeyValue | The value of the Measure Dimension component of the key. The value is the Concept to which this class is associated. |
| | +value | Association to the Concept. Note that this is a conceptual association showing that the Concept must exist in the Concept Scheme associated with the Measure Dimension in the Data Structure Definition. In the actual Data Set the value of the Concept is placed in the Key Value. |
| TimeKeyValue | Inherits from KeyValue | The value of the Time Dimension component of the key. |
| CodedKeyValue | Inherits from KeyValue | The value of a coded component of the key. The value is the Code to which this class is associated. |
| | +value | |
| UncodedKeyValue | Inherits from KeyValue | The value of an uncoded component of the key. |
| | value | The value of the key component. |
| | startTime | This attribute is only used if the textFormat of the attribute is of the Timespan type in the Data Structure Definition (in which case the value field takes a duration). |
| | +valueFor | Associates Dimension, Measure Dimension, or Time Dimension to the Key Value, and thereby to the Concept that is the semantic of the Dimension, or Time Dimension. |
| GroupKey | Inherits from Key | A set of Key Values that comprise a partial key, of the same dimensionality as the Time Series Key for the purpose of attaching Data Attributes. |
| | +describedBy | Associates the Group Dimension Descriptor defined in the Data Structure Definition. |
| SeriesKey | Inherits from Key | Comprises the cross product of values of all the Key Values that, together with the Key Value of the +observationDimension, identify uniquely an Observation. |
| | +describedBy | Associates the Dimension Descriptor defined in the Data Structure Definition. |
| Observation | | The value of the observed phenomenon in the context of the Key Values comprising the key. |
| | +valueFor | Associates the Primary Measure defined in the Data Structure Definition. |
| | +attachedAttribute | Association to the Attribute Values relating to the Observation. |
| | +observationDimension | Association to the Key Value that holds the value of the “Dimension at the Observation Level”. |
| ObservationValue | Abstract class. Sub classes: UncodedObservation, CodedObservation | |
| UncodedObservation | Inherits from ObservationValue | An observation that has a text value. |
| | value | The value of the Uncoded Observation. |
| CodedObservation | Inherits from ObservationValue | An Observation that takes its value from a code in a Code list. |
| | +value | Association to the Code that is the value of the Observation. Note that this is a conceptual association showing that the Code must exist in the Code list associated with the Primary Measure or the Concept of the Measure Dimension in the Data Structure Definition. In the actual Data Set the value of the Code is placed in the Observation. |
| AttributeValue | Abstract class. Sub classes: UncodedAttributeValue, CodedAttributeValue | The value of an attribute, such as the instance of a Coded Attribute or of an Uncoded Attribute in a structure such as a Data Structure Definition. |
| | value | The value of the attribute. |
| | +valueFor | Association to the Data Attribute defined in the Data Structure Definition. Note that this is a conceptual association as the Concept is identified explicitly in the data set. |
| UncodedAttributeValue | Inherits from AttributeValue | An attribute value that has a text value. |
| | startTime | This attribute is only used if the textFormat of the attribute is of the Timespan type in the Data Structure Definition (in which case the value field takes a duration). |
| CodedAttributeValue | Inherits from AttributeValue | An attribute that takes its value from a Code in a Code list. |
| | +value | Association to the Code that is the value of the Attribute Value. Note that this is a conceptual association showing that the Code must exist in the Code list associated with the Data Attribute in the Data Structure Definition. In the actual Data Set the value of the Code is placed in the Attribute Value. |