13 Structure Mapping

Last modified by Helena on 2025/07/20 12:47

13.1 Introduction

The purpose of SDMX structure mapping is to transform datasets from one dimensionality to another. In practice, this means that the input and output datasets conform to different Data Structure Definitions.

Structure mapping does not alter the observation values and is not intended to perform any aggregations or calculations.

An input series maps to:

  1. Exactly one output series; or
  2. Multiple output series with different Series Keys, but the same observation values; or
  3. Zero output series where no source rule matches the input Component values.

Typical use cases include:

  • Transforming received data into a common internal structure;
  • Transforming reported data into the data collector's preferred structure;
  • Transforming unidimensional datasets[1] to multi-dimensional datasets; and
  • Transforming internal datasets with a complex structure to a simpler structure with fewer dimensions suitable for dissemination.

13.2 1-1 structure maps

1-1 (pronounced 'one to one') mappings support the simple use case where the value of a Component in the source structure is translated to a different value in the target, usually where different classification schemes are used for the same Concept.

In the example below, ISO 2-character country codes are mapped to their ISO 3-character equivalents.

Country         Alpha-2 code  Alpha-3 code
Afghanistan     AF            AFG
Albania         AL            ALB
Algeria         DZ            DZA
American Samoa  AS            ASM
Andorra         AD            AND
etc…

Different source values can also map to the same target value, for example when deriving regions from country codes.

Source Component: REF_AREA  Target Component: REGION
FR                          EUR
DE                          EUR
IT                          EUR
ES                          EUR
BE                          EUR
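
As a minimal sketch, both 1-1 tables above can be treated as plain lookups. The Python below is illustrative only: the dictionary names and the `map_value` helper are assumptions made for this example and are not part of SDMX.

```python
# Illustrative 1-1 lookups; the names are hypothetical, not SDMX artefacts.
ALPHA2_TO_ALPHA3 = {"AF": "AFG", "AL": "ALB", "DZ": "DZA", "AS": "ASM", "AD": "AND"}

# Several source values may share one target value.
REF_AREA_TO_REGION = {"FR": "EUR", "DE": "EUR", "IT": "EUR", "ES": "EUR", "BE": "EUR"}


def map_value(source_value, rules):
    """Return the mapped target value, or None when no rule matches."""
    return rules.get(source_value)
```

Returning `None` for an unmatched value mirrors the case where no source rule matches the input Component value.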

13.3 N-n structure maps

N-n (pronounced 'N to N') mappings describe rules where a specified combination of values in multiple source Components maps to specified values in one or more target Components. An example is mapping a partial Series Key from a highly multidimensional cube (like Balance of Payments) to a single 'Indicator' Dimension in a target Data Structure.

Example:

Rule  Source                                                     Target
1     If FREQUENCY=A; and ADJUSTMENT=N; and MATURITY=L.          Set INDICATOR=A_N_L
2     If FREQUENCY=M; and ADJUSTMENT=S_A1; and MATURITY=TY12.    Set INDICATOR=MON_SAX_12

N-n rules can also set values for multiple target Components.

Rule  Source                                                     Target
1     If FREQUENCY=A; and ADJUSTMENT=N; and MATURITY=L.          Set INDICATOR=A_N_L, STATUS=QXR15, NOTE="Unadjusted".
2     If FREQUENCY=M; and ADJUSTMENT=S_A1; and MATURITY=TY12.    Set INDICATOR=MON_SAX_12, STATUS=MPM12, NOTE="Seasonally Adjusted"
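
The rule tables above could be evaluated along the following lines. This is a sketch under assumptions: the `RULES` structure and the `apply_rules` function are hypothetical names chosen for illustration, and an actual implementation would read the rules from a Structure Map.

```python
# Each rule pairs the source conditions ("If") with the target values ("Set").
RULES = [
    ({"FREQUENCY": "A", "ADJUSTMENT": "N", "MATURITY": "L"},
     {"INDICATOR": "A_N_L", "STATUS": "QXR15", "NOTE": "Unadjusted"}),
    ({"FREQUENCY": "M", "ADJUSTMENT": "S_A1", "MATURITY": "TY12"},
     {"INDICATOR": "MON_SAX_12", "STATUS": "MPM12", "NOTE": "Seasonally Adjusted"}),
]


def apply_rules(source, rules=RULES):
    """Return the target values of every rule whose conditions all match."""
    return [
        target
        for conditions, target in rules
        if all(source.get(name) == value for name, value in conditions.items())
    ]
```

The function returns a list so that zero, one, or several matching rules can be represented, in line with the three outcomes listed in the introduction.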

13.4 Ambiguous mapping rules

A structure map is ambiguous if the rules result in a dataset containing multiple series with the same Series Key.

A simple example mapping a source dataset with a single dimension to one with multiple dimensions is shown below:

Source                 Target                                             Output Series Key
SERIES_CODE=XMAN_Z_21  Dimensions: INDICATOR=XM, FREQ=A, ADJUSTMENT=N     XM:A:N
                       Attributes: UNIT_MEASURE=_Z, COMP_ORG=21
SERIES_CODE=XMAN_Z_34  Dimensions: INDICATOR=XM, FREQ=A, ADJUSTMENT=N     XM:A:N
                       Attributes: UNIT_MEASURE=_Z, COMP_ORG=34

This behaviour is acceptable if the series XMAN_Z_21 contains observations for different time periods than the series XMAN_Z_34. If, however, both series contain observations for the same point in time, the output of this mapping will be two observations with the same Series Key for the same time period.
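
One way to detect this situation after mapping is to count how often each combination of Series Key and time period occurs. The sketch below is illustrative: the `find_collisions` name, the tuple layout, and the observation values are invented for the example.

```python
from collections import Counter


def find_collisions(observations):
    """Return the (series_key, period) pairs that occur more than once."""
    counts = Counter((key, period) for key, period, _value in observations)
    return sorted(pair for pair, count in counts.items() if count > 1)


# Mapped output with made-up values: both source series produce key XM:A:N.
mapped = [
    ("XM:A:N", "2020", 1.0),  # from XMAN_Z_21
    ("XM:A:N", "2021", 1.1),  # from XMAN_Z_21
    ("XM:A:N", "2021", 9.9),  # from XMAN_Z_34, same key and same period
]
```

Here only the 2021 observation collides; the 2020 observation is unaffected because just one source series reports that period.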

13.5 Representation maps

Representation Maps replace the SDMX 2.1 Codelist Maps and are used to describe explicit mappings between source and target Component values.

The source and target of a Representation Map can reference any of the following:

  1. Codelist
  2. Free Text (restricted by type, e.g. String, Integer, Boolean)
  3. Valuelist

A Representation Map mapping ISO 2-character to ISO 3-character Codelists would take the following form:

CL_ISO_ALPHA2  CL_ISO_ALPHA3
AF             AFG
AL             ALB
DZ             DZA
AS             ASM
AD             AND
etc…

A Representation Map mapping free text country names to an ISO 2-character Codelist could be similarly described:

Text              CL_ISO_ALPHA2
"Germany"         DE
"France"          FR
"United Kingdom"  GB
"Great Britain"   GB
"Ireland"         IE
"Eire"            IE
etc…

Valuelists, introduced in SDMX 3.0, are equivalent to Codelists but allow the maintenance of non-SDMX identifiers. Importantly, their IDs do not need to conform to IDType, but as a consequence are not Identifiable.

When used in Representation Maps, Valuelists allow non-SDMX identifiers containing characters like £, $, % to be mapped to Code IDs, or Codes to be mapped to non-SDMX identifiers.

In common with Codelists, each item in a Valuelist has a multilingual name giving it a human-readable label and an optional description. For example:

Value  Locale  Name
$      en      United States Dollar
%      en      Percentage
       fr      Pourcentage

Other characteristics of Representation Maps:

  • Support the mapping of multiple source Component values to multiple Target Component values as described in section 13.3 on n-to-n mappings; this covers also the case of mapping an Attribute with an array representation to map combinations of values to a single target value;
  • Allow source or target mappings for an Item to be optional allowing rules such as 'A maps to nothing' or 'nothing maps to A'; and
  • Support mapping rules where regular expressions or substrings are used to match source Component values. Refer to section 13.6 for more on this topic.

13.6 Regular expression and substring rules

It is common for classifications to contain meanings within the identifier, for example the code Id 'XULADS' may refer to a particular seasonality because it starts with the letters XU.

With SDMX 2.1 each code that starts with XU had to be individually mapped to the same seasonality, and additional mappings added when new Codes were added to the Codelists. This led to many hundreds or thousands of mappings which can be more efficiently summarised in a single conceptual rule:

If starts with 'XU' map to 'Y'

These rules are described using either regular expressions, or substrings for simpler use cases.

13.6.1 Regular expressions

Regular expression mapping rules are defined in the Representation Map.

Below is an example set of regular expression rules for a particular component.

Regex   Description                                          Output
A       Rule match if input = 'A'                            OUT_A
^[A-G]  Rule match if the input starts with letters A to G   OUT_B
A|B     Rule match if input is either 'A' or 'B'             OUT_C

Like all mapping rules, the output is either a Code, a Value or free text depending on the representation of the Component in the target Data Structure Definition.

If the regular expression contains capture groups, these can be used in the definition of the output value by specifying \n in the output value, where n is the number of the capture group, starting from 1. For example:

Regex                      Target output  Example Input  Example Output
([0-9]{4})[0-9]([0-9]{1})  \1-Q\2         200933         2009-Q3
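
The capture-group example can be reproduced with Python's `re` module, whose `Match.expand` method substitutes `\1`, `\2` and so on into a template. This is a sketch: the `apply_regex_rule` helper is a hypothetical name, and requiring the whole input to match (via `re.fullmatch`) is an assumption about how rules are evaluated.

```python
import re


def apply_regex_rule(pattern, template, value):
    """Return the expanded template if the whole value matches, else None."""
    match = re.fullmatch(pattern, value)
    return match.expand(template) if match else None


# Regex and target output taken from the table above.
quarter = apply_regex_rule(r"([0-9]{4})[0-9]([0-9]{1})", r"\1-Q\2", "200933")
```

With the input 200933, group 1 captures 2009 and group 2 captures the final 3, giving 2009-Q3.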

As regular expression rules can be used as a general catch-all if nothing else matches, the ordering of the rules is important. Rules should be tested starting with the highest priority, moving down the list until a match is found.

The following example shows this:

Priority  Regex  Description                Output
1         A      Rule match if input = 'A'  OUT_A
2         B      Rule match if input = 'B'  OUT_B
3         [A-Z]  Any character A-Z          OUT_C

The input 'A' matches both the first and the last rule, but the first takes precedence having the higher priority. The output is OUT_A.

The input 'G' matches on the last rule which is used as a catch-all or default in this example.
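
Priority ordering can be sketched as a loop that stops at the first matching rule. The `PRIORITISED_RULES` list and `first_match` function are hypothetical names, and whole-value matching with `re.fullmatch` is an assumption made for this illustration.

```python
import re

# (priority, regex, output) taken from the table above; 1 is the highest priority.
PRIORITISED_RULES = [(3, "[A-Z]", "OUT_C"), (1, "A", "OUT_A"), (2, "B", "OUT_B")]


def first_match(value, rules=PRIORITISED_RULES):
    """Test rules in priority order and return the first matching output."""
    for _priority, pattern, output in sorted(rules):
        if re.fullmatch(pattern, value):
            return output
    return None
```

Sorting by priority before testing means the storage order of the rules does not matter; the catch-all `[A-Z]` is only reached when neither specific rule matches.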

13.6.2 Substrings

Substrings provide an alternative to regular expressions where the required section of an input value can be described using the number of the starting character, and the length of the substring in characters. The first character is at position 1.

For instance:

Input String  Start  Length  Output
ABC_DEF_XYZ   5      3       DEF
XULADS        1      2       XU

Substrings can therefore be used for the conceptual rule "If starts with 'XU' map to 'Y'", as shown in the following example:

Start  Length  Source  Target
1      2       XU      Y
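
Because positions are counted from 1, a substring rule translates to a zero-based slice by subtracting one from the start. The sketch below uses hypothetical helper names (`substring`, `map_by_substring`) purely for illustration.

```python
def substring(value, start, length):
    """Extract `length` characters beginning at 1-based position `start`."""
    return value[start - 1:start - 1 + length]


def map_by_substring(value, start, length, source, target):
    """Return `target` when the extracted substring equals `source`, else None."""
    return target if substring(value, start, length) == source else None
```

For example, `map_by_substring("XULADS", 1, 2, "XU", "Y")` applies the rule from the table above.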

13.7 Mapping non-SDMX time formats to SDMX formats

Structure mapping allows non-SDMX compliant time values in source datasets to be mapped to an SDMX compliant time format.

Two types of time input are defined:

a. Pattern based dates: a string which can be described using a notation like dd/mm/yyyy, or which is represented as the number of periods since a point in time, for example 2010M001 (first month in 2010) or 2014D123 (123rd day in 2014); and
b. Numerical based datetime: a number specifying the elapsed periods since a fixed point in time, for example Unix time expressed as the number of milliseconds since 1970.

The output of a time-based mapping is derived from the output Frequency, which is either explicitly stated in the mapping or defined as the value output by a specific Dimension or Attribute in the output mapping. If the output frequency is unknown or if the SDMX format is not desired, then additional rules can be provided to specify the output date format for the given frequency Id. The default rules are:

Frequency  Format               Example
A          YYYY                 2010
D          YYYY-MM-DD           2010-01-01
I          YYYY-MM-DDThh:mm:ss  2010-01-01T20:22:00
M          YYYY-MM              2010-01
Q          YYYY-Qn              2010-Q1
S          YYYY-Sn              2010-S1
T          YYYY-Tn              2010-T1
W          YYYY-Wn              YYYY-W53
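
The default rules can be sketched as a function that formats a resolved datetime according to the frequency Id. The `format_period` name is hypothetical, weekly output is left out because week numbering needs additional rules, and treating T as a four-month trimester is an assumption stated in the code.

```python
from datetime import datetime


def format_period(moment, freq):
    """Format a datetime using the default pattern for a frequency Id."""
    year, month = moment.year, moment.month
    formats = {
        "A": f"{year}",
        "D": moment.strftime("%Y-%m-%d"),
        "I": moment.strftime("%Y-%m-%dT%H:%M:%S"),
        "M": f"{year}-{month:02d}",
        "Q": f"{year}-Q{(month - 1) // 3 + 1}",
        "S": f"{year}-S{(month - 1) // 6 + 1}",
        # Assumption: T denotes a trimester of four months.
        "T": f"{year}-T{(month - 1) // 4 + 1}",
    }
    return formats[freq]
```

A Frequency Format mapping, as described in the text, would replace entries in this lookup rather than change the resolved datetime itself.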

In the case where the input frequency is lower than the output frequency, the mapping defaults to end of period, but can be explicitly set to start, end or mid-period.

There are two important points to note:

  1. The output frequency determines the output date format, but the default output can be redefined using a Frequency Format mapping to force explicit rules on how the output time period is formatted.
  2. To support the use case of changing frequency the structure map can optionally provide a start of year attribute, which defines the year start date in MM-DD format. For example: YearStart=04-01.

13.7.1 Pattern based dates

Date and time formats are specified by date and time pattern strings based on Java's Simple Date Format. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the components of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing.

Because dates may differ per locale, an optional property defining the locale of the pattern is provided. This assists the processing of source dates according to the given locale[2]. An indicative list of examples is presented in the following table:

Language         Country              Locale
English (en)     Australia (AU)       en-AU
English (en)     Canada (CA)          en-CA
English (en)     United Kingdom (GB)  en-GB
English (en)     United States (US)   en-US
Estonian (et)    Estonia (EE)         et-EE
Finnish (fi)     Finland (FI)         fi-FI
French (fr)      Belgium (BE)         fr-BE
French (fr)      Canada (CA)          fr-CA
French (fr)      France (FR)          fr-FR
French (fr)      Luxembourg (LU)      fr-LU
French (fr)      Switzerland (CH)     fr-CH
German (de)      Austria (AT)         de-AT
German (de)      Germany (DE)         de-DE
German (de)      Luxembourg (LU)      de-LU
German (de)      Switzerland (CH)     de-CH
Greek (el)       Cyprus (CY)          el-CY(*)
Greek (el)       Greece (GR)          el-GR
Hebrew (iw)      Israel (IL)          iw-IL
Hindi (hi)       India (IN)           hi-IN
Hungarian (hu)   Hungary (HU)         hu-HU
Icelandic (is)   Iceland (IS)         is-IS
Indonesian (in)  Indonesia (ID)       in-ID(*)
Irish (ga)       Ireland (IE)         ga-IE(*)
Italian (it)     Italy (IT)           it-IT

Examples

22/06/1981 would be described as dd/MM/yyyy, with locale en-GB
2008-mars-12 would be described as yyyy-MMM-dd, with locale fr-FR
22 July 1981 would be described as dd MMMM yyyy, with locale en-US
22 Jul 1981 would be described as dd MMM yyyy
2010 D62 would be described as yyyyDnn (day 62 of the year 2010)

The following pattern letters are defined (all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved):

Letter  Date or Time Component                            Presentation  Examples
G       Era designator                                    Text          AD
yy      Year short (upper case is Year of Week[3])        Year          96
yyyy    Year full (upper case is Year of Week)            Year          1996
MM      Month number in year starting with 1              Month         07
MMM     Month name short                                  Month         Jul
MMMM    Month name full                                   Month         July
ww      Week in year                                      Number        27
W       Week in month                                     Number        2
DD      Day in year                                       Number        189
dd      Day in month                                      Number        10
F       Day of week in month                              Number        2
E       Day name in week                                  Text          Tuesday; Tue
U       Day number of week (1 = Monday, ..., 7 = Sunday)  Number        1
HH      Hour in day (0-23)                                Number        0
kk      Hour in day (1-24)                                Number        24
KK      Hour in am/pm (0-11)                              Number        0
hh      Hour in am/pm (1-12)                              Number        12
mm      Minute in hour                                    Number        30
ss      Second in minute                                  Number        55
S       Millisecond                                       Number        978
n       Number of periods, used after a SDMX Frequency    Number        12
        Identifier such as M, Q, D (month, quarter, day)
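
For purely numeric patterns, a small subset of these letters has a direct counterpart in Python's `strptime` directives. The sketch below translates only that subset; the `parse_with_pattern` helper and the directive table are assumptions for illustration, and month names, locales, and the `n` period letter are deliberately left out.

```python
import re
from datetime import datetime

# Subset of pattern letters that map one-to-one onto strptime directives.
_DIRECTIVES = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN = re.compile("|".join(_DIRECTIVES))


def parse_with_pattern(value, pattern):
    """Parse `value` using a pattern built from the supported letters only."""
    fmt = _TOKEN.sub(lambda match: _DIRECTIVES[match.group()], pattern)
    return datetime.strptime(value, fmt)
```

For example, `parse_with_pattern("22/06/1981", "dd/MM/yyyy")` yields 22 June 1981. Matching is case-sensitive, so `MM` (month) and `mm` (minute) stay distinct.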

The model is illustrated below:

Figure 24: Component map mapping the SOURCE_DATE Dimension to the TIME_PERIOD Dimension, with additional information on the component map to describe the time format.

Figure 25: An input date format whose output frequency is derived from the output value of the FREQ Dimension.

13.7.2 Numerical based datetime

Where the source datetime input is purely numerical, the mapping rules are defined by a Base, expressed as a valid SDMX Time Period, and a Period, which must take one of the following enumerated values:

  • day
  • second
  • millisecond
  • microsecond
  • nanosecond

Numerical datetime system                            Base  Period
Epoch Time (UNIX): milliseconds since 01 Jan 1970    1970  millisecond
Windows System Time: milliseconds since 01 Jan 1601  1601  millisecond

The example above illustrates numerical based datetime mapping rules for two commonly used time standards.
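
Resolving a numerical datetime amounts to adding the elapsed periods to the Base. The sketch below assumes the Base is given as a plain year and uses a hypothetical `resolve_numeric` function; the input value is an arbitrary illustration, not taken from the specification.

```python
from datetime import datetime, timedelta

_UNIT = {
    "day": timedelta(days=1),
    "second": timedelta(seconds=1),
    "millisecond": timedelta(milliseconds=1),
    "microsecond": timedelta(microseconds=1),
}


def resolve_numeric(value, base_year, period):
    """Return the datetime that lies `value` periods after 1 January of base_year."""
    base = datetime(base_year, 1, 1)
    if period == "nanosecond":
        # timedelta has microsecond resolution, so nanoseconds are truncated.
        return base + timedelta(microseconds=value // 1000)
    return base + value * _UNIT[period]


# 1589808220000 milliseconds after the start of 1970.
resolved = resolve_numeric(1589808220000, 1970, "millisecond")
```

The resolved datetime falls on 18 May 2020, which a monthly output frequency would render as 2020-05.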

The model is illustrated below:

Figure 26: Component map mapping the SOURCE_DATE Dimension to the TIME_PERIOD Dimension, with additional information on the component map to describe the numerical datetime system in use.

13.7.3 Mapping more complex time inputs

VTL should be used for more complex time inputs that cannot be interpreted using the pattern-based or numerical methods.

13.8 Using TIME_PERIOD in mapping rules

The source TIME_PERIOD Dimension can be used in conjunction with other input Dimensions to create discrete mapping rules where the output is conditional on the time period value.

The main use case is setting the value of Observation Attributes in the target dataset.

Rule  Source                                       Target
1     If INDICATOR=XULADS; and TIME_PERIOD=2007.   Set OBS_CONF=F
2     If INDICATOR=XULADS; and TIME_PERIOD=2008.   Set OBS_CONF=F
3     If INDICATOR=XULADS; and TIME_PERIOD=2009.   Set OBS_CONF=F
4     If INDICATOR=XULADS; and TIME_PERIOD=2010.   Set OBS_CONF=C

In the example above, OBS_CONF is an Observation Attribute.

13.9 Time span mapping rules using validity periods

Creating discrete mapping rules for each TIME_PERIOD is impractical where rules need to cover a specific span of time regardless of frequency, and for high-frequency data.

Instead, an optional validity period can be set for each mapping.

By specifying validity periods, the example from Section 13.8 can be re-written using two rules as follows:

Rule  Source                                                                  Target
1     If INDICATOR=XULADS. Validity Period start period=2007 end period=2009  Set OBS_CONF=F
2     If INDICATOR=XULADS. Validity Period start period=2010                  Set OBS_CONF=C

In Rule 1, start period resolves to the start of the 2007 period (2007-01-01T00:00:00), and the end period resolves to the very end of 2009 (2009-12-31T23:59:59). The rule will hold true regardless of the input data frequency. Any observations reporting data for the Indicator XULADS that fall into that time range will have an OBS_CONF value of F.

In Rule 2, no end period is specified, so the rule remains in effect from the start of the period (2010-01-01T00:00:00) until the end of time. Any observations reporting data for the Indicator XULADS that fall into that time range will have an OBS_CONF value of C.
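
The resolution of validity periods described above can be sketched as a range check on the observation's point in time. The rule tuples and the `obs_conf` function are hypothetical, and validity periods are restricted to whole years to keep the example short.

```python
from datetime import datetime

# (indicator, start year, end year or None for open-ended, OBS_CONF value)
VALIDITY_RULES = [("XULADS", 2007, 2009, "F"), ("XULADS", 2010, None, "C")]


def obs_conf(indicator, moment, rules=VALIDITY_RULES):
    """Return OBS_CONF for the first rule whose validity period contains moment."""
    for rule_indicator, start_year, end_year, value in rules:
        start = datetime(start_year, 1, 1)  # start of the start period
        # End of the end period, or no upper bound when none is given.
        end = datetime(end_year, 12, 31, 23, 59, 59) if end_year else datetime.max
        if indicator == rule_indicator and start <= moment <= end:
            return value
    return None
```

Because the comparison uses resolved datetimes, the same two rules cover annual, monthly, or daily observations alike.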

13.10 Mapping examples

13.10.1 Many to one mapping (N-1)

1747377208446-496.png

The bold Dimensions map from source to target verbatim. The mapping simply specifies:

FREQ => FREQ
REF_AREA => REF_AREA
COUNTERPART_AREA => COUNTERPART_AREA

No Representation Mapping is required. The source value simply copies across unmodified.

The remaining Dimensions all map to the Indicator Dimension. This is an example of many Dimensions mapping to one Dimension. In this case a Representation Mapping is required, and the mapping first describes the input 'partial key' and how this maps to the target indicator:

N:S1:S1:B:B5G => IND_ABC

The key sequence is based on the order specified in the mapping: with a source Dimension list of ADJUSTMENT, REF_SECTOR, etc., the first value N is taken from ADJUSTMENT because it is the first item in the list.

Note: The key order is NOT based on the Dimension order of the DSD, as the mapping needs to be resilient to the DSD changing.
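
A sketch of how the partial key might be assembled is shown below. Only ADJUSTMENT and REF_SECTOR are named in the text; the remaining three Dimension names in `SOURCE_ORDER` are placeholders invented for this example, as are the function and variable names.

```python
# Order of source Dimensions as specified in the mapping, not in the DSD.
# Only the first two names come from the text; the other three are placeholders.
SOURCE_ORDER = ["ADJUSTMENT", "REF_SECTOR", "COUNTERPART_SECTOR", "ACCOUNTING_ENTRY", "STO"]

PARTIAL_KEY_TO_INDICATOR = {"N:S1:S1:B:B5G": "IND_ABC"}


def map_indicator(series):
    """Build the partial key in mapping order and look up the target indicator."""
    partial_key = ":".join(series[name] for name in SOURCE_ORDER)
    return PARTIAL_KEY_TO_INDICATOR.get(partial_key)
```

Because the key is built from `SOURCE_ORDER`, reordering the Dimensions in the DSD, or in the input record, does not change the result.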

13.10.2 Mapping other data types to Code Id

Where the incoming data type is not a string and not a code identifier, for example where the source Dimension is of type Integer and the target is a Codelist, the mapping is supported by the RepresentationMap. The RepresentationMap source can reference a Codelist or a Valuelist, or be free text; the free text can include regular expressions.

The following representation mapping can be used to explicitly map each integer source value, here an age, to an output code.

Source Input Free Text  Desired Output Code Id
0                       A
1                       A
2                       A
3                       B
4                       B

If this mapping takes advantage of regular expressions it can be expressed in two rules:

Regular Expression  Desired Output
[0-2]               A
[3-4]               B

13.10.3 Observation Attributes for Time Period

This use case is where a specific observation for a specific time period has an attribute value.

Input INDICATOR  Input TIME_PERIOD  Output OBS_CONF
XULADS           2008               C
XULADS           2009               C
XULADS           2010               C

Or using a validity period on the Representation Mapping:

Input INDICATOR  Valid From/Valid To  Output OBS_CONF
XULADS           2008/2010            C

13.10.4 Time mapping

This use case is to create a time period from an input that does not respect SDMX Time Formats.

The Component Mapping from SYS_TIME to TIME_PERIOD specifies itself as a time mapping with the following details:

Source Value  Source Mapping  Target Frequency  Output
18/07/1981    dd/MM/yyyy      A                 1981

The target frequency can also be based on another target Dimension value, in this example the value of the FREQ Dimension in the target DSD:

Source Value  Source Mapping  Target Frequency Dimension  Output
18/07/1981    dd/MM/yyyy      FREQ                        1981-07-18 (when FREQ=D)

When the source is a numerical format:

Source Value  Start Period  Interval  Target FREQ  Output
1589808220    1970          second    M            2020-05

When the source frequency is lower than the target frequency, additional information can be provided to resolve to start of period, end of period, or mid-period, as shown in the following example:

Source Value  Source Mapping  Target Dimension Frequency  Output
1981          yyyy            D - End of Period           1981-12-31

When the start of year is April 1st the Structure Map has YearStart=04-01:

Source Value  Source Mapping  Target Dimension Frequency  Output
1981          yyyy            D - End of Period           1982-03-31
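
The effect of YearStart on the end-of-period resolution can be sketched as follows. The `annual_end_of_period` function is a hypothetical helper: it treats the year as beginning on the YearStart date and returns the day before the next year begins.

```python
from datetime import date, timedelta


def annual_end_of_period(year, year_start="01-01"):
    """Return the last day of an annual period that begins on year_start (MM-DD)."""
    month, day = (int(part) for part in year_start.split("-"))
    next_start = date(year + 1, month, day)
    return next_start - timedelta(days=1)
```

With the default YearStart the year 1981 resolves to 1981-12-31; with YearStart=04-01 it resolves to 1982-03-31, matching the two tables above.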

  1. ^ Unidimensional datasets are those with a single 'indicator' or 'series code' dimension.
  2. ^ Unidimensional datasets are those with a single 'indicator' or 'series code' dimension.
  3. ^ yyyy represents the calendar year while YYYY represents the year of the week, which is only relevant for 53 week years