Changes for page 13 Structure Mapping

Last modified by Artur on 2025/09/10 11:19

From version 4.9
edited by Helena
on 2025/06/16 14:50
Change comment: There is no comment for this version
To version 4.5
edited by Helena
on 2025/06/16 14:41
Change comment: There is no comment for this version

Summary

Details

Page properties
Content
... ... @@ -189,37 +189,36 @@
189 189  
190 190  These rules are described using either regular expressions, or substrings for simpler use cases.
191 191  
192 -=== 13.6.1 Regular expressions ===
192 +=== 13.5.1 Regular expressions ===
193 193  
194 -Regular expression mapping rules are defined in the [[Representation>>doc:sdmx:Glossary.Representation.WebHome]] Map.
194 +Regular expression mapping rules are defined in the Representation Map.
195 195  
196 -Below is an example set of regular expression rules for a particular [[component>>doc:sdmx:Glossary.Component.WebHome]].
196 +Below is an example set of regular expression rules for a particular component.
197 197  
198 -(% style="width:664.294px" %)
199 -|(% style="width:141px" %)**Regex**|(% style="width:362px" %)**Description**|(% style="width:158px" %)**Output**
200 -|(% style="width:141px" %)A|(% style="width:362px" %)Rule match if input = 'A'|(% style="width:158px" %)OUT_A
201 -|(% style="width:141px" %)^[A-G]|(% style="width:362px" %)Rule match if the input starts with letters A to G|(% style="width:158px" %)OUT_B
202 -|(% style="width:141px" %)A~|B|(% style="width:362px" %)Rule match if input is either 'A' or 'B'|(% style="width:158px" %)OUT_C
198 +|Regex|Description|Output
199 +|A|Rule match if input = 'A'|OUT_A
200 +|^[A-G]|Rule match if the input starts with letters A to G|OUT_B
201 +|A~|B|Rule match if input is either 'A' or 'B'|OUT_C
203 203  
204 -Like all mapping rules, the output is either a [[Code>>doc:sdmx:Glossary.Code.WebHome]], a Value or free text depending on the [[representation>>doc:sdmx:Glossary.Representation.WebHome]] of the [[Component>>doc:sdmx:Glossary.Component.WebHome]] in the target [[Data Structure Definition>>doc:sdmx:Glossary.Data structure definition.WebHome]].
203 +Like all mapping rules, the output is either a Code, a Value or free text depending on the representation of the Component in the target Data Structure Definition.
205 205  
206 206  If the regular expression contains capture groups, these can be used in the definition of the output value by specifying \**//n//** as an output value, where **//n//** is the number of the capture group, starting from 1. For example:
207 207  
208 -(% style="width:700.294px" %)
209 -|(% style="width:203px" %)Regex|(% style="width:148px" %)Target output|(% style="width:157px" %)Example Input|(% style="width:189px" %)Example Output
210 -|(% style="width:203px" %)(((
211 -([0-9]{4})[0-9]([0-9]{1})
212 -)))|(% style="width:148px" %)\1-Q\2|(% style="width:157px" %)200933|(% style="width:189px" %)2009-Q3
207 +|Regex|Target output|Example Input|Example Output
208 +|(((
209 +([0-9]{4})[0-
213 213  
211 +9]([0-9]{1})
212 +)))|\1-Q\2|200933|2009-Q3
213 +
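The capture-group substitution described above can be sketched in Python as follows. This is a minimal illustration rather than a normative algorithm: the function name `apply_regex_rule` is hypothetical, and the sketch assumes that a rule must match the whole input value (`re.fullmatch`); an implementation may use different matching semantics.

```python
import re
from typing import Optional


def apply_regex_rule(pattern: str, target: str, value: str) -> Optional[str]:
    """Return the mapped output, or None if the rule does not match.

    Assumption: the pattern must match the entire input value.
    """
    match = re.fullmatch(pattern, value)
    if match is None:
        return None
    # Back-references (\1, \2, ...) in the target are replaced by the
    # corresponding capture groups, numbered from 1.
    return match.expand(target)


# Rule and input taken from the table above.
print(apply_regex_rule(r"([0-9]{4})[0-9]([0-9]{1})", r"\1-Q\2", "200933"))  # prints 2009-Q3
```

A target without back-references, such as `OUT_A`, is returned unchanged by the same function.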
214 214  As regular expression rules can be used as a general catch-all if nothing else matches, the ordering of the rules is important. Rules should be tested starting with the highest priority, moving down the list until a match is found.
215 215  
216 216  The following example shows this:
217 217  
218 -(% style="width:704.294px" %)
219 -|(% style="width:130px" %)Priority|(% style="width:125px" %)Regex|(% style="width:241px" %)Description|(% style="width:205px" %)Output
220 -|(% style="width:130px" %)1|(% style="width:125px" %)A|(% style="width:241px" %)Rule match if input = 'A'|(% style="width:205px" %)OUT_A
221 -|(% style="width:130px" %)2|(% style="width:125px" %)B|(% style="width:241px" %)Rule match if input = 'B'|(% style="width:205px" %)OUT_B
222 -|(% style="width:130px" %)3|(% style="width:125px" %)[A-Z]|(% style="width:241px" %)Any character A-Z|(% style="width:205px" %)OUT_C
218 +|Priority|Regex|Description|Output
219 +|1|A|Rule match if input = 'A'|OUT_A
220 +|2|B|Rule match if input = 'B'|OUT_B
221 +|3|[A-Z]|Any character A-Z|OUT_C
223 223  
224 224  The input 'A' matches both the first and the last rule, but the first takes precedence because it has the higher priority. The output is therefore OUT_A.
225 225  
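The priority-ordered evaluation can be sketched as below. The names `RULES` and `map_by_priority` are hypothetical, and the sketch again assumes whole-input matching; it only illustrates that rules are tried from the highest priority (lowest number) downwards and that the first match wins.

```python
import re
from typing import List, Optional, Tuple

# (priority, regex, output); priority 1 is the highest.
# The list is deliberately unsorted to show that order of declaration
# does not matter, only the priority value.
RULES: List[Tuple[int, str, str]] = [
    (3, "[A-Z]", "OUT_C"),
    (1, "A", "OUT_A"),
    (2, "B", "OUT_B"),
]


def map_by_priority(rules: List[Tuple[int, str, str]], value: str) -> Optional[str]:
    """Return the output of the first matching rule, or None if none match."""
    for _, pattern, output in sorted(rules, key=lambda rule: rule[0]):
        # Assumption: a rule matches only if it covers the entire input.
        if re.fullmatch(pattern, value):
            return output
    return None


print(map_by_priority(RULES, "A"))  # prints OUT_A
print(map_by_priority(RULES, "C"))  # prints OUT_C
```

Placing the catch-all `[A-Z]` rule at the lowest priority ensures that it applies only when no more specific rule matches.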
... ... @@ -231,27 +231,24 @@
231 231  
232 232  For instance:
233 233  
234 -(% style="width:623.294px" %)
235 -|(% style="width:169px" %)Input String|(% style="width:147px" %)Start|(% style="width:133px" %)Length|(% style="width:171px" %)Output
236 -|(% style="width:169px" %)ABC_DEF_XYZ|(% style="width:147px" %)5|(% style="width:133px" %)3|(% style="width:171px" %)DEF
237 -|(% style="width:169px" %)XULADS|(% style="width:147px" %)1|(% style="width:133px" %)2|(% style="width:171px" %)XU
233 +|Input String|Start|Length|Output
234 +|ABC_DEF_XYZ|5|3|DEF
235 +|XULADS|1|2|XU
238 238  
239 -Sub-strings can therefore be used for the conceptual rule //If starts with 'XU' (% style="color:#e74c3c" %)map(%%) to Y// as shown in the following example:
237 +Sub-strings can therefore be used for the conceptual rule //If starts with 'XU' map to Y// as shown in the following example:
240 240  
241 -(% style="width:628.294px" %)
242 -|(% style="width:163px" %)Start|(% style="width:158px" %)Length|(% style="width:128px" %)Source|(% style="width:176px" %)Target
243 -|(% style="width:163px" %)1|(% style="width:158px" %)2|(% style="width:128px" %)XU|(% style="width:176px" %)Y
239 +|Start|Length|Source|Target
240 +|1|2|XU|Y
244 244  
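A minimal sketch of the sub-string rule, assuming the 1-based start position implied by the table above (start 5, length 3 of ABC_DEF_XYZ gives DEF). The function names `substring` and `map_substring` are hypothetical and used only for illustration.

```python
from typing import Optional


def substring(value: str, start: int, length: int) -> str:
    """Extract `length` characters beginning at the 1-based position `start`."""
    begin = start - 1  # convert to Python's 0-based indexing
    return value[begin:begin + length]


def map_substring(value: str, start: int, length: int,
                  source: str, target: str) -> Optional[str]:
    """Return `target` if the extracted sub-string equals `source`, else None."""
    if substring(value, start, length) == source:
        return target
    return None


print(substring("ABC_DEF_XYZ", 5, 3))             # prints DEF
print(map_substring("XULADS", 1, 2, "XU", "Y"))   # prints Y
```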
245 -== 13.7 Mapping non-SDMX time formats to SDMX formats ==
242 +== 13.6 Mapping non-SDMX time formats to SDMX formats ==
246 246  
247 -Structure mapping allows non-[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] compliant time values in source [[datasets>>doc:sdmx:Glossary.Data set.WebHome]] to be (% style="color:#e74c3c" %)mapped(%%) to an [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] compliant time format.
244 +Structure mapping allows non-SDMX compliant time values in source datasets to be mapped to an SDMX compliant time format.
248 248  
249 249  Two types of time input are defined:
250 250  
251 -a. **Pattern based dates** – a string which can be described using a notation like dd/mm/yyyy or is represented as the number of periods since a point in time, for example: 2010M001 (first month in 2010), or 2014D123 (123^^rd^^ day in 2014); and
252 -b. **Numerical based datetime** – a number specifying the elapsed periods since a fixed point in time, for example Unix Time is measured by the number of milliseconds since 1970.
248 +a. **Pattern based dates** – a string which can be described using a notation like dd/mm/yyyy or is represented as the number of periods since a point in time, for example: 2010M001 (first month in 2010), or 2014D123 (123^^rd^^ day in 2014); and b. **Numerical based datetime** – a number specifying the elapsed periods since a fixed point in time, for example Unix Time is measured by the number of milliseconds since 1970.
253 253  
254 -The output of a time-based mapping is derived from the output Frequency, which is either explicitly stated in the mapping or defined as the value output by a specific [[Dimension>>doc:sdmx:Glossary.Dimension.WebHome]] or [[Attribute>>doc:sdmx:Glossary.Attribute.WebHome]] in the output mapping. If the output frequency is unknown or if the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] format is not desired, then additional rules can be provided to specify the output date format for the given frequency Id. The default rules are:
250 +The output of a time-based mapping is derived from the output Frequency, which is either explicitly stated in the mapping or defined as the value output by a specific Dimension or Attribute in the output mapping. If the output frequency is unknown or if the SDMX format is not desired, then additional rules can be provided to specify the output date format for the given frequency Id. The default rules are:
255 255  
256 256  |Frequency|Format|Example
257 257  |A|YYYY|2010
... ... @@ -271,43 +271,40 @@
271 271  
272 272  There are two important points to note:
273 273  
274 -1. The output frequency determines the output date format, but the default output can be redefined using a Frequency Format mapping to force explicit rules on how the output [[time period>>doc:sdmx:Glossary.Time period.WebHome]] is formatted.
275 -1. To support the use case of changing frequency the structure (% style="color:#e74c3c" %)map(%%) can optionally provide a start of year [[attribute>>doc:sdmx:Glossary.Attribute.WebHome]], which defines the year start date in MM-DD format. For example: YearStart=04-01.
270 +1. The output frequency determines the output date format, but the default output can be redefined using a Frequency Format mapping to force explicit rules on how the output time period is formatted.
271 +1. To support the use case of changing frequency the structure map can optionally provide a start of year attribute, which defines the year start date in MM-DD format. For example: YearStart=04-01.
272 +11.
273 +111. Pattern based dates
276 276  
277 -=== 13.7.1 Pattern based dates ===
275 +Date and time formats are specified by date and time pattern strings based on Java's Simple Date Format. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the components of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing.
278 278  
279 -Date and [[time formats>>doc:sdmx:Glossary.Time format.WebHome]] are specified by date and time pattern strings based on Java's Simple Date Format. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the [[components>>doc:sdmx:Glossary.Component.WebHome]] of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing.
280 -
281 277  Because dates may differ per locale, an optional property defining the locale of the pattern is provided. This assists processing of source dates according to the given locale{{footnote}}A list of commonly used locales can be found in the Java supported locales: https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html{{/footnote}}. An indicative list of examples is presented in the following table:
282 282  
283 -(% style="width:604.294px" %)
284 -|(% style="width:172px" %)English (en)|(% style="width:216px" %)Australia (AU)|(% style="width:213px" %)en-AU
285 -|(% style="width:172px" %)English (en)|(% style="width:216px" %)Canada (CA)|(% style="width:213px" %)en-CA
286 -|(% style="width:172px" %)English (en)|(% style="width:216px" %)United Kingdom (GB)|(% style="width:213px" %)en-GB
287 -|(% style="width:172px" %)English (en)|(% style="width:216px" %)United States (US)|(% style="width:213px" %)en-US
288 -|(% style="width:172px" %)Estonian (et)|(% style="width:216px" %)Estonia (EE)|(% style="width:213px" %)et-EE
289 -|(% style="width:172px" %)Finnish (fi)|(% style="width:216px" %)Finland (FI)|(% style="width:213px" %)fi-FI
290 -|(% style="width:172px" %)French (fr)|(% style="width:216px" %)Belgium (BE)|(% style="width:213px" %)fr-BE
291 -|(% style="width:172px" %)French (fr)|(% style="width:216px" %)Canada (CA)|(% style="width:213px" %)fr-CA
292 -|(% style="width:172px" %)French (fr)|(% style="width:216px" %)France (FR)|(% style="width:213px" %)fr-FR
293 -|(% style="width:172px" %)French (fr)|(% style="width:216px" %)Luxembourg (LU)|(% style="width:213px" %)fr-LU
294 -|(% style="width:172px" %)French (fr)|(% style="width:216px" %)Switzerland (CH)|(% style="width:213px" %)fr-CH
295 -|(% style="width:172px" %)German (de)|(% style="width:216px" %)Austria (AT)|(% style="width:213px" %)de-AT
296 -|(% style="width:172px" %)German (de)|(% style="width:216px" %)Germany (DE)|(% style="width:213px" %)de-DE
297 -|(% style="width:172px" %)German (de)|(% style="width:216px" %)Luxembourg (LU)|(% style="width:213px" %)de-LU
298 -|(% style="width:172px" %)German (de)|(% style="width:216px" %)Switzerland (CH)|(% style="width:213px" %)de-CH
299 -|(% style="width:172px" %)Greek (el)|(% style="width:216px" %)Cyprus (CY)|(% style="width:213px" %)el-CY(*)
300 -|(% style="width:172px" %)Greek (el)|(% style="width:216px" %)Greece (GR)|(% style="width:213px" %)el-GR
301 -|(% style="width:172px" %)Hebrew (iw)|(% style="width:216px" %)Israel (IL)|(% style="width:213px" %)iw-IL
302 -|(% style="width:172px" %)Hindi (hi)|(% style="width:216px" %)India (IN)|(% style="width:213px" %)hi-IN
303 -|(% style="width:172px" %)Hungarian (hu)|(% style="width:216px" %)Hungary (HU)|(% style="width:213px" %)hu-HU
304 -|(% style="width:172px" %)Icelandic (is)|(% style="width:216px" %)Iceland (IS)|(% style="width:213px" %)is-IS
305 -|(% style="width:172px" %)Indonesian (in)|(% style="width:216px" %)Indonesia (ID)|(% style="width:213px" %)in-ID(*)
306 -|(% style="width:172px" %)Irish (ga)|(% style="width:216px" %)Ireland (IE)|(% style="width:213px" %)ga-IE(*)
307 -|(% style="width:172px" %)Italian (it)|(% style="width:216px" %)Italy (IT)|(% style="width:213px" %)it-IT
279 +|English (en)|Australia (AU)|en-AU
280 +|English (en)|Canada (CA)|en-CA
281 +|English (en)|United Kingdom (GB)|en-GB
282 +|English (en)|United States (US)|en-US
283 +|Estonian (et)|Estonia (EE)|et-EE
284 +|Finnish (fi)|Finland (FI)|fi-FI
285 +|French (fr)|Belgium (BE)|fr-BE
286 +|French (fr)|Canada (CA)|fr-CA
287 +|French (fr)|France (FR)|fr-FR
288 +|French (fr)|Luxembourg (LU)|fr-LU
289 +|French (fr)|Switzerland (CH)|fr-CH
290 +|German (de)|Austria (AT)|de-AT
291 +|German (de)|Germany (DE)|de-DE
292 +|German (de)|Luxembourg (LU)|de-LU
293 +|German (de)|Switzerland (CH)|de-CH
294 +|Greek (el)|Cyprus (CY)|el-CY[[(*)>>url:https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale]][[url:https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale]]
295 +|Greek (el)|Greece (GR)|el-GR
296 +|Hebrew (iw)|Israel (IL)|iw-IL
297 +|Hindi (hi)|India (IN)|hi-IN
298 +|Hungarian (hu)|Hungary (HU)|hu-HU
299 +|Icelandic (is)|Iceland (IS)|is-IS
300 +|Indonesian (in)|Indonesia (ID)|in-ID[[(*)>>url:https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale]][[url:https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale]]
301 +|Irish (ga)|Ireland (IE)|ga-IE[[(*)>>url:https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale]][[url:https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale]]
302 +|Italian (it)|Italy (IT)|it-IT
308 308  
309 -~* - [[https:~~/~~/www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale>>https://www.oracle.com/java/technologies/javase/jdk8-jre8-suported-locales.html#cldrlocale]]
310 -
311 311  Examples
312 312  
313 313  22/06/1981 would be described as dd/MM/YYYY, with locale en-GB
... ... @@ -520,8 +520,8 @@
520 520  
521 521  **Note**: The key order is NOT based on the Dimension order of the DSD, as the mapping needs to be resilient to the DSD changing.
522 522  
523 -1.
524 -11.
516 +1.
517 +11.
525 525  111. Mapping other data types to Code Id
526 526  
527 520  Where the incoming data type is neither a string nor a code identifier (for example, the source Dimension is of type Integer and the target is a Codelist), the mapping is supported by the RepresentationMap. The RepresentationMap source can reference a Codelist or a Valuelist, or be free text; the free text can include regular expressions.