10 Constraints
10.1 Introduction
Constraints are used to restrict what data can be reported, or to report what data exists in a given context. There are three types of Constraint, each serving a different purpose:
- Availability Constraint
- Dimension Constraint
- Reporting Constraint
An Availability Constraint defines the data that exists in the context of a data query.
They form part of the response message from the Availability REST API. Availability Constraints are dynamically generated by a system based on the data that exists and the query context. Availability Constraints are therefore not Identifiable structures (they have no URN).
A Dimension Constraint is a property of a Dataflow. It fixes the Dimensions that the Dataflow uses from the Data Structure Definition it references. Dimension Constraints enable Data Structure Definitions to evolve over time by having new Dimensions added, without having to undergo a major version change.
A Reporting Constraint is used to define the set of allowed and/or disallowed values that can be reported in a data or metadata set.
10.2 Availability Constraint
An Availability Constraint is not a maintained structure; instead, it is generated dynamically as a response to the availability REST API. Its purpose is to define the distinct set of values that have data over one or more Dimensions. Unlike Data and Metadata Constraints, which can attach to multiple Constrainable structures (of the same type), an Availability Constraint can attach to only one structure. The attachment defines the context of the response (data exists for components in the context of the attached structure). The Constrainable structures that an Availability Constraint can attach to are:
- Data Structure Definition
- Dataflow
- Provision Agreement
10.3 Dimension Constraint
A Dimension Constraint is a property of a Dataflow; its purpose is to explicitly list the Dimensions from the corresponding DSD that are being used by the Dataflow.
Dimension Constraints were introduced in SDMX 3.1. They are not required for most Dataflows, where the dataset must always contain the full complement of Dimensions defined by the corresponding DSD. However, for some complex data collections, which may span long periods and where the full complement of required Dimensions is not necessarily known at design time, the DSD may need to increase its dimensionality over time.
In this scenario it is possible to define the DSD as an evolving structure. This property tells the user that the DSD can have new Dimensions added without undergoing a major version change; a DSD at version 1.0.0, for example, would be able to add a new Dimension and move to version 1.1.0, a change that would not ordinarily be allowed. A minor version change on the addition of a new Dimension is only possible if the DSD defines itself as an evolving structure. This property of the DSD was introduced in version 3.1 to satisfy this use case.
The evolving structure property is either true or false, defaulting to false if not specified. Setting it to true requires a major version change, and it can therefore only be introduced on an x.0.0 release (e.g. 1.0.0). The property can be set to false to indicate that no additional Dimensions will be added to the Data Structure under the same major version number; setting it to false does not require a major version change on the Data Structure.
When a Dataflow references a DSD with late binding on the minor release, and the DSD has the evolving structure property set to true, the Dataflow must contain a Dimension Constraint to protect its dimensionality from changing over time without a version change.
The Dimension Constraint provides the explicit list of Dimensions that the Dataflow uses from the DSD that it references. This enables the DSD to evolve over time without breaking the compatibility of datasets against the Dataflow.
Rules for a Dimension Constraint
- A Dataflow must contain a Dimension Constraint if the DSD which it uses states that it is an evolving structure and the Dataflow is late binding on the minor release (latest minor release of a given major version, e.g. 1.0+.0)
- The Dimension Constraint can only include Dimensions from the DSD that is referenced by the Dataflow.
- A Dimension Constraint can only be changed if the Dataflow undergoes a major version change
- Datasets reported against the Dataflow must only contain reported values for the Dimensions specified in the Dimension Constraint.
- When exporting data for the Dataflow, the dataset should only include the Dimensions specified by the Dimension Constraint.
- When exporting data for the DSD the dataset must contain the full set of Dimensions as specified by the DSD. The tilde ‘~’ character is used to represent a value which is not present due to the Dimension not being included in the corresponding Dataflow.
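The export rule for the DSD can be illustrated with a minimal Python sketch. The function name `export_for_dsd`, the dictionary-based series representation and the Dimension names are assumptions made for illustration only; they are not part of SDMX.

```python
from typing import Dict, List, Optional

NOT_PRESENT = "~"  # marks a Dimension that the Dataflow does not use


def export_for_dsd(
    series: Dict[str, str],
    dsd_dimensions: List[str],
    dimension_constraint: Optional[List[str]],
) -> Dict[str, str]:
    """Return a series key carrying the full set of DSD Dimensions.

    `dimension_constraint` is the list of Dimensions used by the Dataflow,
    or None when the Dataflow has no Dimension Constraint (hypothetical model).
    """
    used = dsd_dimensions if dimension_constraint is None else dimension_constraint

    # A Dimension Constraint may only list Dimensions of the referenced DSD.
    unknown = set(used) - set(dsd_dimensions)
    if unknown:
        raise ValueError(f"Dimensions not in the DSD: {sorted(unknown)}")

    # Reported values must exist for exactly the Dimensions in use.
    if set(series) != set(used):
        raise ValueError("series must report exactly the Dimensions in use")

    return {
        dim: series[dim] if dim in used else NOT_PRESENT
        for dim in dsd_dimensions
    }


dsd = ["FREQ", "REF_AREA", "SEX", "AGE"]
print(export_for_dsd({"FREQ": "A", "REF_AREA": "UK"}, dsd, ["FREQ", "REF_AREA"]))
```

Under these assumptions, a Dataflow that is constrained to FREQ and REF_AREA exports ‘~’ for SEX and AGE, while a Dataflow without a Dimension Constraint exports every Dimension unchanged.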
Example Datasets with Evolving Structures
A dataset is built against a Data Structure Definition. The dataset contains data for two Dataflows. Dataflow ‘DF_POP’ uses a Dimension Constraint which fixes its Dimensions to FREQ and REF_AREA. Dataflow ‘DF_POP_SA’ does not reference a Dimension Constraint, and as such includes all Dimensions as specified by the DSD.
The resulting dataset contains the value ‘~’ for both the SEX and AGE Dimensions for the series related to DF_POP.
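| Dataflow | FREQ | REF_AREA | SEX | AGE | OBS_VALUE | TIME_PERIOD | UNIT |
|---|---|---|---|---|---|---|---|
| DF_POP | A | UK | ~ | ~ | 65 | 2022 | 6 |
| DF_POP | A | FR | ~ | ~ | 50 | 2022 | 6 |
| DF_POP_SA | A | UK | M | 1 | 1.2 | 2022 | 6 |

10.4 Reporting Constraints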
A Reporting Constraint is a Maintainable Artefact which restricts the values that can be reported in a dataset or metadata set based on one or more inclusion or exclusion rules.
A Reporting Constraint is one of the following concrete types:
- Data Constraint
- Metadata Constraint
10.4.1 Data Constraint
A Data Constraint is used to add additional restrictions to the allowable values reported in a dataset. Data Constraints can be applied to the following structures, which are collectively known as Constrainable structures:
- Data Structure Definition
- Dataflow
- Provision Agreement
- Data Provider
Note that, regardless of the Constrainable structure, the restricted values relate to the allowable content for the Component of the DSD to which the constrained object relates.
10.4.2 Metadata Constraint
A Metadata Constraint is used to add additional restrictions to the allowable values reported in a metadata set. Metadata Constraints can be applied to the following structures, which are collectively known as Constrainable structures:
- Metadata Structure Definition
- Metadataflow
- Metadata Provision Agreement
- Metadata Provider
Note that, regardless of the Constrainable structure, the restricted values relate to the allowable content for the Component of the MSD to which the constrained object relates.
10.4.3 Scope of a Constraint
A Constraint is used to specify the content of a data or metadata source in terms of the component values or the keys.
In terms of data the components are:
- Dimension
- Time Dimension
- Data Attribute
- Measure
- Metadata Attribute
- DataKeySets: the keys are the content of the KeyDescriptor – i.e., the series keys composed, for each key, by a value for each Dimension.
In terms of reference metadata the components are:
- Metadata Attribute
For a Constraint based on a DSD the Constraint can reference one or more of:
- Data Structure Definition
- Dataflow
- Provision Agreement
- Data Provider
For a Constraint based on an MSD the Constraint can reference one or more of:
- Metadata Structure Definition
- Metadataflow
- Metadata Provision Agreement
- Metadata Provider
- Metadata Set
Furthermore, there can be more than one Constraint specified for a specific object e.g., more than one Constraint for a specific DSD.
In view of the flexibility of constraints attachment, clear rules on their usage are required. These are elaborated below.
10.4.4 Multiple Constraints
There can be many Constraints for any Constrainable Artefact (e.g., DSD), subject to the following restrictions:
10.4.4.1 Cube Region
A Constraint can contain multiple Member Selections (e.g., Dimensions).
- A specific Member Selection (e.g., Dimension FREQ) can only be contained in one Cube Region for any one attached object (e.g., a specific DSD or specific Dataflow).
- Component values within a Member Selection may define a validity period. Otherwise, the value is valid for the whole validity of the Cube Region.
- For partial reference resolution purposes (as per the SDMX REST API), the latest non-draft Constraint must be considered.
- A Member Selection may include wildcarding of values (using the character ‘%’ to represent zero or more occurrences of any character), as well as cascading through hierarchic structures (e.g., parents in a Codelist), or localised values (e.g., text for English only). Lack of locale means any language may match. Cascading values are mutually exclusive with localised values, as the former refer to coded values, while the latter refer to uncoded values.
- Any values included in a Member Selection for Components with an array data type (i.e., Measures, Attributes or Metadata Attributes), will be applied as single values and will not be assessed combined with other values to match all possible array values. For example, including the Code ‘A’ for an Attribute will allow any instance of the Attribute that includes ‘A’, like [‘A’, ‘B’] or [‘A’, ‘C’, ‘D’]. Similarly, if Code ‘A’ was excluded, all those arrays of values would also be excluded.
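The wildcard and array rules above can be sketched in Python as follows. The helper names `wildcard_to_regex`, `value_matches` and `is_allowed` are hypothetical, and the sketch covers only the ‘%’ wildcard and single-value matching against arrays; cascading and locale handling are left out.

```python
import re
from typing import Iterable, List, Union


def wildcard_to_regex(pattern: str) -> "re.Pattern":
    """Translate a '%' wildcard pattern into an anchored regular expression."""
    parts = (".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.compile("".join(parts), re.DOTALL)


def value_matches(patterns: Iterable[str], reported: Union[str, List[str]]) -> bool:
    """True if any constrained value matches any single reported value.

    An array-valued component is checked one value at a time, so a
    selection containing 'A' matches ['A', 'B'] as well as ['A', 'C', 'D'].
    """
    values = [reported] if isinstance(reported, str) else list(reported)
    regexes = [wildcard_to_regex(p) for p in patterns]
    return any(rx.fullmatch(v) is not None for rx in regexes for v in values)


def is_allowed(patterns: Iterable[str], reported: Union[str, List[str]], include: bool) -> bool:
    """Apply the selection as an inclusion or as an exclusion."""
    matched = value_matches(patterns, reported)
    return matched if include else not matched


print(is_allowed(["M%"], "MX", include=True))        # wildcard on a single value
print(is_allowed(["A"], ["A", "B"], include=True))   # array containing 'A'
print(is_allowed(["A"], ["A", "C", "D"], include=False))
```

Treating each array element independently keeps the check simple: the selection never has to enumerate every possible combination of array values.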
10.4.4.2 Key Set
Key Sets will be processed in the order they appear in the Constraint and wildcards can be used (e.g., any key position not referenced explicitly is deemed to be "all values").
As the Key Sets can be "included" or "excluded", it is recommended that Key Sets with wildcards are declared before Key Sets with specific series keys. This will minimize the risk that keys are inadvertently included or excluded.
In addition, Attribute, Measure and Metadata Attribute constraints may accompany Key Sets, in order to specify the allowed values per Key. Those are expressed following the rules for Cube Regions, as explained above.
Finally, a validity period may be specified per Key.
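The ordering recommendation can be illustrated with a small Python sketch. The text does not spell out how overlapping Key Sets are combined, so this sketch assumes that the last matching Key Set decides and that a key matching no Key Set is not allowed. The `KeySet` type and the function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class KeySet(NamedTuple):
    included: bool              # True for "included", False for "excluded"
    keys: List[Dict[str, str]]  # Dimensions left out of a key act as wildcards


def key_matches(pattern: Dict[str, str], series_key: Dict[str, str]) -> bool:
    """A key position that is not referenced explicitly means "all values"."""
    return all(series_key.get(dim) == value for dim, value in pattern.items())


def is_key_allowed(key_sets: List[KeySet], series_key: Dict[str, str]) -> bool:
    """Process Key Sets in declaration order; the last match decides (assumption)."""
    allowed = False  # assumption: keys matching no Key Set are not allowed
    for key_set in key_sets:
        if any(key_matches(p, series_key) for p in key_set.keys):
            allowed = key_set.included
    return allowed


# Wildcarded Key Set first, specific series keys afterwards.
key_sets = [
    KeySet(included=True, keys=[{"FREQ": "M"}]),                    # all monthly keys
    KeySet(included=False, keys=[{"FREQ": "M", "VIS_CTY": "DE"}]),  # except Germany
]
print(is_key_allowed(key_sets, {"FREQ": "M", "VIS_CTY": "MX"}))
print(is_key_allowed(key_sets, {"FREQ": "M", "VIS_CTY": "DE"}))
```

Under this reading, declaring the wildcarded Key Set last would re-include the monthly key for Germany, which is the kind of inadvertent inclusion the recommendation aims to avoid.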
10.4.5 Versioning
When Data and Metadata Constraints are versioned, the latest version of the Constraint is used to generate the reporting restriction rules; all previous versions are for historical information only.
If restrictions are applicable to certain periods in time, the validFrom and validTo properties can be set on the specific values. This allows Constraints to evolve over time, increasing their version number as they do so, whilst being able to maintain a complete set of reporting restrictions for current and past datasets.
Example:
Data Constraint 1.0.0
| Component | Valid Value | Valid from | Valid to |
|---|---|---|---|
| COUNTRY | UK | | |
| | FR | | |
| | DE | | |

Data Constraint 1.1.0

| Component | Valid Value | Valid from | Valid to |
|---|---|---|---|
| COUNTRY | UK | | |
| | FR | | 2012 |
| | DE | | |

When both versions of the Data Constraint are in a system, an observation value reported against COUNTRY FR for time period 2013 would be deemed invalid as the 1.1.0 rule would be applied.
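The versioning example can be sketched in Python as below. The data layout and the function `is_valid` are hypothetical. The sketch assumes that 2012 is the valid-to bound for FR in version 1.1.0 (consistent with the conclusion above), that the bounds are inclusive, and that time periods are plain years.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]
Validity = Tuple[Optional[int], Optional[int]]  # (validFrom, validTo); None = open-ended
Constraint = Dict[str, Dict[str, Validity]]     # component -> value -> validity

constraint_versions: Dict[Version, Constraint] = {
    (1, 0, 0): {"COUNTRY": {"UK": (None, None), "FR": (None, None), "DE": (None, None)}},
    (1, 1, 0): {"COUNTRY": {"UK": (None, None), "FR": (None, 2012), "DE": (None, None)}},
}


def is_valid(versions: Dict[Version, Constraint], component: str, value: str, period: int) -> bool:
    """Validate a reported value against the latest Constraint version only."""
    latest = versions[max(versions)]  # earlier versions are historical information
    allowed = latest.get(component)
    if allowed is None:
        return True  # the component is not constrained
    if value not in allowed:
        return False
    valid_from, valid_to = allowed[value]
    # Inclusive bounds are an assumption of this sketch.
    after_start = valid_from is None or period >= valid_from
    before_end = valid_to is None or period <= valid_to
    return after_start and before_end


print(is_valid(constraint_versions, "COUNTRY", "FR", 2013))
print(is_valid(constraint_versions, "COUNTRY", "FR", 2012))
```

Because validity periods are stored on the values, the single latest version is enough to validate both current and past periods.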
10.4.6 Inheritance
10.4.6.1 Attachment levels of a Constraint
There are three levels of constraint attachment for which these inheritance rules apply:
- DSD/MSD – top level
- Dataflow/Metadataflow – second level
- Provision Agreement – third level
It is not necessary for a Constraint to be attached to a higher-level artefact; e.g., it is valid to have a Constraint for a Provision Agreement where there are no Constraints attached to the relevant Dataflow or DSD.
10.4.6.2 Cascade rules for processing Constraints
The processing of the constraints on either Dataflow/Metadataflow or Provision Agreement must take into account the constraints declared at higher levels. The rules for the lower-level constraints (attached to Dataflow/Metadataflow and Provision Agreement) are detailed below.
Note that there can be a situation where a constraint is specified at a lower level before a constraint is specified at a higher level. Therefore, it is possible that a higher-level constraint makes a lower-level constraint invalid. SDMX makes no rules on how such a conflict should be handled when processing the constraint for attachment. However, the cascade rules on evaluating constraints for usage are clear – the higher-level constraint takes precedence in any conflicts that result in a less restrictive specification at the lower level.
10.4.6.3 Cube Region
It is not necessary to have a Constraint on the higher-level artefact (e.g., DSD referenced by the Dataflow), but if there is such a Constraint at the higher level(s) then:
- The lower-level Constraint cannot be less restrictive than the Constraint specified for the same Member Selection (e.g. Dimension) at the next higher level, which constrains that Member Selection. For example, if the Dimension FREQ is constrained to A, Q in a DSD, then the Constraint at the Dataflow or Provision Agreement cannot be A, Q, M or even just M – it can only further constrain A, Q.
- The Constraint at the lower level for any one Member Selection further constrains the content for the same Member Selection at the higher level(s).
- Any Member Selection which is not referenced in a Constraint is deemed to be constrained according to the Constraint specified at the next higher level which constrains that Member Selection.
- If there is a conflict when resolving the Constraint in terms of a lower-level Constraint being less restrictive than a higher-level Constraint, then the Constraint at the higher-level is used.
Note that it is possible for a Constraint at a higher level to constrain, say, four Dimensions in a single Constraint, and a Constraint at a lower level to constrain the same four in two, three, or four Constraints.
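The Cube Region cascade rules above can be sketched in Python as follows. This is a simplified model that handles only inclusive value sets; the function `resolve` and the dictionary layout are hypothetical. A conflict is resolved by keeping the higher-level Constraint, as the last rule states.

```python
from typing import Dict, List, Optional, Set

Constraint = Dict[str, Set[str]]  # component id -> allowed values (inclusions only)


def resolve(levels: List[Constraint], component: str) -> Optional[Set[str]]:
    """Resolve the effective allowed values for one component.

    `levels` is ordered from the top level (DSD) down to the lowest level
    (Provision Agreement). None means the component is unconstrained.
    """
    effective: Optional[Set[str]] = None
    for level in levels:
        local = level.get(component)
        if local is None:
            continue  # not referenced here: inherit from the higher level
        if effective is None or local <= effective:
            effective = set(local)  # the lower level further constrains
        # Otherwise the lower level is less restrictive: keep the higher level.
    return effective


dsd = {"FREQ": {"A", "Q"}}
dataflow_ok = {"FREQ": {"A"}}
dataflow_conflict = {"FREQ": {"A", "Q", "M"}}
provision_agreement = {"GEO": {"IT"}}

print(resolve([dsd, dataflow_ok, provision_agreement], "FREQ"))
print(resolve([dsd, dataflow_conflict, provision_agreement], "FREQ"))
```

In the first call the Dataflow narrows FREQ to A and the Provision Agreement inherits that value. In the second call the Dataflow tries to add M, so the DSD-level values A and Q remain in force.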
10.4.6.4 Key Set
It is not necessary to have a Constraint on the higher-level artefact (e.g., DSD referenced by the Dataflow), but if there is such a Constraint at the higher level(s) then:
- The lower-level Constraint cannot be less restrictive than the Constraint specified at the higher level.
- The Constraint at the lower level for any one Member Selection further constrains the keys specified at the higher level(s).
- Any Member Selection which is not referenced in a Constraint is deemed to be constrained according to the Constraint specified at the next higher level which constrains that Member Selection.
- If there is a conflict when resolving the keys in the Constraint at two levels, in terms of a lower-level constraint being less restrictive than a higher-level Constraint, then the offending keys specified at the lower level are not deemed part of the Constraint.
Note that a Key in a Key Set can have wildcarded Components. For instance, the Constraint may simply constrain the Dimension FREQ to "A", and all keys where the FREQ="A" are therefore valid.
The following logic explains how the inheritance mechanism works. Note that this is conceptual logic and actual systems may differ in the way this is implemented.
- Determine all possible keys that are valid at the higher level.
- These keys are deemed to be inherited by the lower-level constrained object, subject to the Constraints specified at the lower level.
- Determine all possible keys that are possible using the Constraints specified at the lower level.
- At the lower level inherit all keys that match with the higher-level Constraint.
- If there are keys in the lower-level Constraint that are not inherited then the key is invalid (i.e., it is less restrictive).
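The conceptual steps above can be sketched in Python. The code lists, the function names `possible_keys` and `inherit`, and the representation of a key as a tuple of Dimension values are assumptions for illustration; as the text notes, real systems may implement this differently.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Key = Tuple[str, ...]


def possible_keys(codelists: Dict[str, List[str]], allowed: Dict[str, Set[str]]) -> Set[Key]:
    """Enumerate every key permitted by a Constraint.

    A Dimension that is not referenced in `allowed` is wildcarded,
    so all of its codes are permitted.
    """
    per_dimension = [
        [code for code in codes if code in allowed.get(dim, set(codes))]
        for dim, codes in codelists.items()
    ]
    return set(product(*per_dimension))


def inherit(higher: Set[Key], lower: Set[Key]) -> Tuple[Set[Key], Set[Key]]:
    """Split the lower-level keys into inherited (valid) and invalid keys."""
    valid = lower & higher    # keys that match the higher-level Constraint
    invalid = lower - higher  # keys that are less restrictive than the higher level
    return valid, invalid


codelists = {"FREQ": ["A", "Q", "M"], "VIS_CTY": ["DE", "MX"]}
higher_keys = possible_keys(codelists, {"FREQ": {"A", "Q"}})
lower_keys = possible_keys(codelists, {"FREQ": {"A", "M"}, "VIS_CTY": {"MX"}})

valid, invalid = inherit(higher_keys, lower_keys)
print(sorted(valid))    # [('A', 'MX')]
print(sorted(invalid))  # [('M', 'MX')]
```

Enumerating every key is only practical for small code lists; it is used here because it mirrors the conceptual steps one to one.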
10.4.7 Constraints Examples
10.4.7.1 Data Constraint and Cascading
The following scenario is used.
A DSD contains the following Dimensions:
- GEO – Geography
- SEX – Sex
- AGE – Age
- CAS – Current Activity Status
In the DSD, common code lists are used and the requirement is to restrict these at various levels to specify the actual codes that are valid for the object to which the Constraint is attached.
Figure 20: Example Scenario for Constraints
Constraints are declared as follows:
Figure 21: Example Constraints
Notes:
- AGE is constrained for the DSD and is further restricted for the Dataflow CENSUS_CUBE1.
- The same Constraint applies to both Provision Agreements.
The cascade rules elaborated above result as follows:
Dataflow CENSUS_CUBE1
- Constrained by restricting the code list for the AGE Dimension to codes 002 and 003 (note that this is a more restrictive constraint than that declared for the DSD which specifies all codes except code 001).
- Restricts the CAS codes to 003 and 004.
Dataflow CENSUS_CUBE2
- Restricts the code list for the CAS Dimension to codes TOT and NAP.
- Inherits the AGE constraint applied at the level of the DSD.
Provision Agreement CENSUS_CUBE1_IT
- Restricts the codes for the GEO Dimension to IT and its children.
- Inherits the constraints from Dataflow CENSUS_CUBE1 for the AGE and CAS Dimensions.
Provision Agreement CENSUS_CUBE2_IT
- Restricts the codes for the GEO Dimension to IT and its children.
- Inherits the constraints from Dataflow CENSUS_CUBE2 for the CAS Dimension.
- Inherits the AGE constraint applied at the level of the DSD.
The Constraints are defined as follows:
DSD Constraint
Dataflow Constraints
Provision Agreement Constraint
10.4.7.2 Combination of Constraints
The possible combinations of constraining terms are explained in this section through a few examples.
Let’s assume a DSD with the following Components:
- Dimension FREQ
- Dimension JD_TYPE
- Dimension JD_CATEGORY
- Dimension VIS_CTY
- TimeDimension TIME_PERIOD
- Attribute OBS_STATUS
- Attribute UNIT
- Attribute COMMENT
- MetadataAttribute CONTACT
- Measure MULTISELECT
- Measure CHOICE
On the above, let’s assume the following use cases with their constraining requirements:
Use Case 1: A Constraint on allowed values for some Dimensions
R1: Allow monthly and quarterly data
R2: Allow Mexico for vis-à-vis country
This is expressed with the following CubeRegion:
- FREQ: M, Q
- VIS_CTY: MX
Use Case 2: A Constraint on allowed combinations for some Dimensions
R1: Allow monthly data for Germany
R2: Allow quarterly data for Mexico
This is expressed with the following DataKeySet:
- Key1: FREQ = M; VIS_CTY = DE
- Key2: FREQ = Q; VIS_CTY = MX
Use Case 3: A Constraint on allowed values for some Dimensions combined with allowed values for some Attributes
R1: Allow monthly and quarterly data
R2: Allow Mexico for vis-à-vis country
R3: Allow present for status
This may be expressed with the following CubeRegion:
- FREQ: M, Q
- VIS_CTY: MX
- OBS_STATUS: A
Use Case 4: A Constraint on allowed combinations for some Dimensions combined with specific Attribute values
R1: Allow monthly data, for Germany, with unit euro
R2: Allow quarterly data, for Mexico, with unit usd
This may be expressed with the following DataKeySet:
- Key1: FREQ = M; VIS_CTY = DE; UNIT = EUR
- Key2: FREQ = Q; VIS_CTY = MX; UNIT = USD
Use Case 5: A Constraint on allowed values for some Dimensions together with some combination of Dimension values
R1: For annually and quarterly data, for Mexico and Germany, only A status is allowed
R2: For monthly data, for Mexico and Germany, only F status is allowed
Considering the above examples, the following CubeRegions would be created:
CubeRegion1:
- FREQ: Q, A
- VIS_CTY: MX, DE
- OBS_STATUS: A
CubeRegion2:
- FREQ: M
- VIS_CTY: MX, DE
- OBS_STATUS: F
The problem with this approach is that, according to the business rule for Constraints, only one Member Selection should be specified per Component. Thus, software performing conflict resolution would end up with empty sets for FREQ and OBS_STATUS (as the two regions do not share any values for them).
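The empty sets mentioned above can be reproduced with a short Python sketch. It assumes, purely for illustration, that conflict resolution means intersecting the values given for the same Component in both Cube Regions; the text does not prescribe a specific algorithm.

```python
# Two Cube Regions attached to the same object, as in Use Case 5.
cube_region_1 = {"FREQ": {"Q", "A"}, "VIS_CTY": {"MX", "DE"}, "OBS_STATUS": {"A"}}
cube_region_2 = {"FREQ": {"M"}, "VIS_CTY": {"MX", "DE"}, "OBS_STATUS": {"F"}}

# Hypothetical conflict resolution: intersect the values per Component.
merged = {
    component: cube_region_1[component] & cube_region_2[component]
    for component in cube_region_1.keys() & cube_region_2.keys()
}

for component in sorted(merged):
    print(component, sorted(merged[component]))
# FREQ []
# OBS_STATUS []
# VIS_CTY ['DE', 'MX']
```

Only VIS_CTY survives the intersection, so no observation could ever satisfy the merged Constraint.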
Nevertheless, there is a much simpler approach: the cascading mechanism of Constraints (as shown in 10.4.7.1). These rules can be expressed at two levels of Constraints, e.g., DSD and Dataflows:
DSD CubeRegion:
- FREQ: M, Q, A
- VIS_CTY: MX, DE
- OBS_STATUS: A, F
Dataflow1 CubeRegion:
- FREQ: Q, A
- VIS_CTY: MX, DE
- OBS_STATUS: A
Dataflow2 CubeRegion:
- FREQ: M
- VIS_CTY: MX, DE
- OBS_STATUS: F
Use Case 6: A Constraint on allowed values for some Dimensions combined with allowed values for Measures
R1: Allow monthly data, for Germany, with unit euro, and measure choice is 'A'
R2: Allow quarterly data, for Mexico, with unit usd, and measure choice is 'B'
This may be expressed with the following DataKeySet:
- Key1: FREQ = M; VIS_CTY = DE; UNIT = EUR; CHOICE = A
- Key2: FREQ = Q; VIS_CTY = MX; UNIT = USD; CHOICE = B
Use Case 7: A Constraint with wildcards for Codes and removePrefix property
For this example, we assume that the VIS_CTY representation has been prefixed with prefix ‘AREA_’. In this Constraint, we need to remove the prefix.
R1: Allow monthly and quarterly data
R2: Allow vis-à-vis countries that start with M
R3: Remove the prefix ‘AREA_’
This may be expressed with the following CubeRegion:
- FREQ: M, Q
- VIS_CTY (removePrefix=‘AREA_’): M%
Use Case 8: A Constraint with multilingual support on Attributes
R1: Allow monthly and quarterly data
R2: Allow Mexico for vis-à-vis country
R3: Allow a comment, in English, which includes the term adjusted for status
This may be expressed with the following CubeRegion:
- FREQ: M, Q
- VIS_CTY: MX
- COMMENT (lang=‘en’): %adjusted%
Use Case 9: A Constraint on allowed values for Dimensions combined with allowed values for Metadata Attributes
R1: Allow monthly and quarterly data
R2: Allow Mexico for vis-à-vis country
R3: Allow John Doe for contact
This may be expressed with the following CubeRegion:
- FREQ: M, Q
- VIS_CTY: MX
- CONTACT: John Doe