Changes for page 12 Validation and Transformation Language (VTL)
Last modified by Artur on 2025/09/10 11:19
Page properties (1 modified, 0 added, 0 removed)
... ...
@@ -2,7 +2,8 @@
2 2 {{toc/}}
3 3 {{/box}}
4 4
5 -== 12.1 Introduction ==
5 +1.
6 +11. Introduction
6 6
7 7 The Validation and Transformation Language (VTL) supports the definition of Transformations, which are algorithms to calculate new data starting from already existing ones{{footnote}}The Validation and Transformation Language is a standard language designed and published under the SDMX initiative. VTL is described in the VTL User and Reference Guides available on the SDMX website https://sdmx.org.{{/footnote}}. The purpose of the VTL in the SDMX context is to enable the:
8 8
... ...
@@ -18,8 +18,9 @@
18 18
19 19 This section does not explain the VTL language or any of the content published in the VTL guides. Rather, this is a description of how the VTL can be used in the SDMX context and applied to SDMX artefacts.
20 20
21 -== 12.2 References to SDMX artefacts from VTL statements ==
22 -=== 12.2.1 Introduction ===
22 +1.
23 +11. References to SDMX artefacts from VTL statements
24 +111. Introduction
23 23
24 24 The VTL can manipulate SDMX artefacts (or objects) by referencing them through predefined conventional names (aliases).
25 25
... ...
@@ -31,7 +31,9 @@
31 31
32 32 The references through the URN and the abbreviated URN are described in the following paragraphs.
33 33
34 -=== 12.2.2 References through the URN ===
36 +1.
37 +11.
38 +111. References through the URN
35 35
36 36 This approach has the advantage that in the VTL code the URN of the referenced artefacts is directly intelligible to a human reader but has the drawback that the references are verbose.
37 37
... ...
@@ -90,7 +90,9 @@
90 90
91 91 'urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=AG:DF2(1.0.0)'
92 92
93 -=== 12.2.3 Abbreviation of the URN ===
97 +1.
98 +11.
99 +111. Abbreviation of the URN
94 94
95 95 The complete formulation of the URN described above is exhaustive but verbose, even for very simple statements. In order to reduce the verbosity through a simplified identifier and make the work of transformation definers easier, proper abbreviations of the URN are possible. Using this approach, the referenced artefacts remain intelligible to a human reader in the VTL code.
96 96
... ...
@@ -102,7 +102,11 @@
102 102 * The class-name can be omitted as it can be deduced from the VTL invocation. In particular, starting from the VTL class of the invoked artefact (e.g. dataset, component, identifier, measure, attribute, variable, valuedomain), which is known given the syntax of the invoking VTL operator{{footnote}}For the syntax of the VTL operators see the VTL Reference Manual{{/footnote}}, the SDMX class can be deduced from the mapping rules between VTL and SDMX (see the section "Mapping between VTL and SDMX" hereinafter){{footnote}}In case the invoked artefact is a VTL component, which can be invoked only within the invocation of a VTL data set (SDMX Dataflow), the specific SDMX class-name (e.g. Dimension, TimeDimension, Measure or DataAttribute) can be deduced from the data structure of the SDMX Dataflow, which the component belongs to.{{/footnote}}.
103 103 * If the agency-id is not specified, it is assumed by default equal to the agency-id of the TransformationScheme, UserDefinedOperatorScheme or RulesetScheme from which the artefact is invoked. For example, the agency-id can be omitted if it is the same as that of the invoking TransformationScheme and cannot be omitted if the artefact comes from another agency{{footnote}}If the Agency is composite (for example AgencyA.Dept1.Unit2), the agency is considered different even if only part of the composite name is different (for example AgencyA.Dept1.Unit3 is a different Agency than the previous one). Moreover the agency-id cannot be omitted in part (i.e., if a TransformationScheme owned by AgencyA.Dept1.Unit2 references an artefact coming from AgencyA.Dept1.Unit3, the specification of the agency-id becomes mandatory and must be complete, without omitting the possibly equal parts like AgencyA.Dept1){{/footnote}}. Take also into account that, according to the VTL consistency rules, the agency of the result of a Transformation must be the same as its TransformationScheme, therefore the agency-id can be omitted for all the results (left part of Transformation statements).
104 104 * As for the maintainedobject-id, this is essential in some cases while in other cases it can be omitted: o if the referenced artefact is a Dataflow, which is a maintainable class, the maintainedobject-id is the dataflow-id and obviously cannot be omitted;
105 -** if the referenced artefact is a Dimension, TimeDimension, Measure, DataAttribute, which are not maintainable and belong to the DataStructure maintainable class, the maintainedobject-id is the dataStructure-id and can be omitted, given that these components are always invoked within the invocation of a Dataflow, whose dataStructure-id can be deduced from the SDMX structural definitions;
111 +** if the referenced artefact is a Dimension, TimeDimension, Measure,
112 +
113 +DataAttribute, which are not maintainable and belong to the DataStructure maintainable class, the maintainedobject-id is the dataStructure-id and can be omitted, given that these components are always invoked within the invocation of a Dataflow, whose dataStructure-id can be deduced from the SDMX structural definitions;
114 +
115 +*
106 106 ** if the referenced artefact is a Concept, which is not maintainable and belongs to the ConceptScheme maintainable class, the maintainedobject-id is the conceptScheme-id and cannot be omitted;
107 107 ** if the referenced artefact is a Codelist, which is a maintainable class, the maintainedobject-id is the codelist-id and obviously cannot be omitted.
108 108 * When the maintainedobject-id is omitted, the maintainedobject-version is omitted too. When the maintainedobject-id is not omitted and the maintainedobject-version is omitted, the version 1.0 is assumed by default.
... ...
@@ -161,13 +161,17 @@
161 161
162 162 The artefact (Component, Concept, Codelist …) which the Values are referred to can be deduced from the context in which the reference is made, taking also into account the VTL syntax. In the Transformation above, for example, the values 0 and 2500 are compared to the values of the measures of DF1(1.0.0).
163 163
164 -=== 12.2.4 User-defined alias ===
174 +1.
175 +11.
176 +111. User-defined alias
165 165
166 166 The third possibility for referencing SDMX artefacts from VTL statements is to use user-defined aliases not related to the SDMX URN of the artefact.
167 167
168 168 This approach gives preference to the use of symbolic names for the SDMX artefacts. As a consequence, in the VTL code the referenced artefacts may become not directly intelligible to a human reader. In any case, the VTL aliases are associated with the SDMX URN through the VtlMappingScheme and VtlMapping classes. These classes provide for structured references to SDMX artefacts whatever kind of reference is used in VTL statements (URN, abbreviated URN or user-defined aliases).
169 169
170 -=== 12.2.5 References to SDMX artefacts from VTL Rulesets ===
182 +1.
183 +11.
184 +111. References to SDMX artefacts from VTL Rulesets
171 171
172 172 The VTL Rulesets allow defining sets of reusable Rules that can be applied by some VTL operators, like the ones for validation and hierarchical roll-up. A "Rule" consists of a relationship between Values belonging to some Value Domains or taken by some Variables, for example: (i) when the Country is USA then the Currency is USD; (ii) the Benelux is composed of Belgium, Luxembourg, Netherlands.
173 173
... ...
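The defaulting rules for abbreviated Dataflow references (section 12.2.3) can be sketched roughly as follows. This is only an illustration, not part of the standard: the helper name `expand_dataflow_reference` and the assumed abbreviated shape `[agency-id:]dataflow-id[(version)]` are hypothetical, and only the Dataflow case is covered.

```python
import re

# URN prefix taken from the Dataflow example quoted in the text.
URN_PREFIX = "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow="

# Assumed abbreviated shape (hypothetical): [agency-id:]dataflow-id[(version)]
_ABBREVIATED = re.compile(
    r"^(?:(?P<agency>[A-Za-z0-9_.\-]+):)?"
    r"(?P<id>[A-Za-z0-9_\-]+)"
    r"(?:\((?P<version>[^)]+)\))?$"
)


def expand_dataflow_reference(reference: str, scheme_agency: str) -> str:
    """Expand an abbreviated Dataflow reference into a full URN.

    scheme_agency is the agency-id of the invoking TransformationScheme,
    which is the default when the reference carries no agency-id.
    """
    match = _ABBREVIATED.match(reference)
    if match is None:
        raise ValueError(f"not an abbreviated Dataflow reference: {reference!r}")
    # A missing agency-id defaults to the agency of the invoking scheme.
    agency = match.group("agency") or scheme_agency
    # A missing version defaults to 1.0 when the maintainedobject-id is given.
    version = match.group("version") or "1.0"
    return f"{URN_PREFIX}{agency}:{match.group('id')}({version})"
```

For instance, `expand_dataflow_reference("DF2(1.0.0)", "AG")` rebuilds the full URN quoted earlier, while a composite agency such as `AgencyA.Dept1.Unit3` has to be written out in full in the reference.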
@@ -179,8 +179,9 @@
179 179
180 180 In the body of the Rulesets, the Codes and in general all the Values can be written without any other specification, because the artefact to which the Values refer (Codelist, Concept) can be deduced from the Ruleset signature.
181 181
182 -== 12.3 Mapping between SDMX and VTL artefacts ==
183 -=== 12.3.1. When the mapping occurs ===
196 +1.
197 +11. Mapping between SDMX and VTL artefacts
198 +111. When the mapping occurs
184 184
185 185 The mapping methods between the VTL and SDMX object classes allow transforming a SDMX definition into a VTL one and vice-versa for the artefacts to be manipulated. It should be remembered that VTL programs (i.e. Transformation Schemes) are represented in SDMX through the TransformationScheme maintainable class which is composed of Transformations (nameable artefacts). Each Transformation assigns the outcome of the evaluation of a VTL expression to a result: the input operands of the expression and the result can be SDMX artefacts. Every time a SDMX object is referenced in a VTL Transformation as an input operand, there is the need to generate a VTL definition of the object, so that the VTL operations can take place. This can be done starting from the SDMX definition and applying a SDMX-VTL mapping method in the direction from SDMX to VTL. The possible mapping methods from SDMX to VTL are described in the following paragraphs and are conceived to allow the automatic deduction of the VTL definition of the object from the knowledge of the SDMX definition.
186 186
... ...
@@ -188,7 +188,9 @@
188 188
189 189 The mapping methods from VTL to SDMX are described in the following paragraphs as well; however, they do not allow the complete SDMX definition to be automatically deduced from the VTL definition, above all because the former typically contains additional information with respect to the latter. For example, the definition of a SDMX DSD includes also some mandatory information not available in VTL (like the concept scheme to which the SDMX components refer, the ‘usage’ and ‘attributeRelationship’ for the DataAttributes and so on). Therefore the mapping methods from VTL to SDMX provide only a general guidance for generating SDMX definitions properly starting from the information available in VTL, independently of how the SDMX definition is actually generated (manually, automatically or a mix of both).
190 190
191 -=== 12.3.2 General mapping of VTL and SDMX data structures ===
206 +1.
207 +11.
208 +111. General mapping of VTL and SDMX data structures
192 192
193 193 This section makes reference to the VTL "Model for data and their structure"{{footnote}}See the VTL 2.0 User Manual{{/footnote}} and the corresponding SDMX "Data Structure Definition"{{footnote}}See the SDMX Standards Section 2 – Information Model{{/footnote}}.
194 194
... ...
@@ -204,9 +204,11 @@
204 204
205 205 The possible mapping options are described in more detail in the following sections.
206 206
207 -=== 12.3.2 Mapping from SDMX to VTL data structures ===
224 +1.
225 +11.
226 +111. Mapping from SDMX to VTL data structures
208 208
209 -==== 12.3.3.1 Basic Mapping ====
228 +**12.3.3.1 Basic Mapping**
210 210
211 211 The main mapping method from SDMX to VTL is called **Basic** mapping. This is considered as the default mapping method and is applied unless a different method is specified through the VtlMappingScheme and VtlDataflowMapping classes.
212 212
... ...
@@ -222,7 +222,7 @@
222 222
223 223 With the Basic mapping, one SDMX observation^^27^^ generates one VTL data point.
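The one-to-one character of the Basic mapping can be illustrated with a small sketch. The role correspondence used here (Dimension and TimeDimension to Identifier, Measure to Measure, DataAttribute to Attribute) is an assumption based on the mapping table of the standard, which is not reproduced in this excerpt; the function and component names are hypothetical.

```python
# Assumed role correspondence for the Basic mapping (SDMX class -> VTL role).
# The mapping table itself is not shown in this excerpt.
BASIC_ROLE_MAP = {
    "Dimension": "Identifier",
    "TimeDimension": "Identifier",
    "Measure": "Measure",
    "DataAttribute": "Attribute",
}


def basic_map_structure(sdmx_components: dict) -> dict:
    """Map each SDMX component name to a VTL role, keeping the name."""
    return {name: BASIC_ROLE_MAP[sdmx_class]
            for name, sdmx_class in sdmx_components.items()}


def basic_map_data(observations: list) -> list:
    """One SDMX observation generates exactly one VTL data point."""
    return [dict(observation) for observation in observations]


# Hypothetical DSD and data, for illustration only.
dsd = {
    "COUNTRY": "Dimension",
    "TIME_PERIOD": "TimeDimension",
    "OBS_VALUE": "Measure",
    "OBS_STATUS": "DataAttribute",
}
observations = [
    {"COUNTRY": "IT", "TIME_PERIOD": "2024", "OBS_VALUE": 10, "OBS_STATUS": "A"},
    {"COUNTRY": "FR", "TIME_PERIOD": "2024", "OBS_VALUE": 12, "OBS_STATUS": "A"},
]
vtl_structure = basic_map_structure(dsd)
vtl_data_points = basic_map_data(observations)
```

The data are copied unchanged; only the role of each component is reinterpreted on the VTL side.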
224 224
225 -==== 12.3.3.2 Pivot Mapping ====
244 +**12.3.3.2 Pivot Mapping**
226 226
227 227 An alternative mapping method from SDMX to VTL is the **Pivot** mapping, which makes sense and is different from the Basic method only for the SDMX data structures that contain a Dimension that plays the role of measure dimension (like in SDMX 2.1) and just one Measure. Through this method, these structures can be mapped to multi-measure VTL data structures. Besides that, a user may choose to use any Dimension acting as a list of Measures (e.g., a Dimension with indicators), either by considering the “Measure” role of a Dimension, or at will using any coded Dimension. Of course, in SDMX 3.0, this can only work when only one Measure is defined in the DSD.
228 228
... ...
@@ -253,6 +253,7 @@
253 253 |DataAttribute not depending on the MeasureDimension|Attribute
254 254 |DataAttribute depending on the MeasureDimension|(((
255 255 One Attribute for each Code of the
275 +
256 256 SDMX MeasureDimension
257 257 )))
258 258
... ...
@@ -265,10 +265,13 @@
265 265
266 266 Identifiers, (time) Identifier and Attributes.
267 267
268 -* The value of the Measure of the SDMX observation belonging to the set above and having MeasureDimension=Cj becomes the value of the VTL Measure Cj
288 +* The value of the Measure of the SDMX observation belonging to the set above and having MeasureDimension=Cj becomes the value of the VTL Measure
289 +
290 +Cj
291 +
269 269 * For the SDMX DataAttributes depending on the MeasureDimension, the value of the DataAttribute DA of the SDMX observation belonging to the set above and having MeasureDimension=Cj becomes the value of the VTL Attribute DA_Cj
270 270
271 -==== 12.3.3.3 From SDMX DataAttributes to VTL Measures ====
294 +**12.3.3.3 From SDMX DataAttributes to VTL Measures**
272 272
273 273 * In some cases, it may happen that the DataAttributes of the SDMX DataStructure need to be managed as Measures in VTL. Therefore, a variant of both the methods above consists in transforming all the SDMX DataAttributes into VTL Measures. When DataAttributes are converted to Measures, the two methods above are called Basic_A2M and Pivot_A2M (the suffix "A2M" stands for Attributes to Measures). Obviously, the resulting VTL data structure is, in general, multi-measure and does not contain
274 274
... ...
@@ -278,9 +278,11 @@
278 278
279 279 Proper VTL features allow changing the role of specific attributes even after the SDMX to VTL mapping: they can be useful when only some of the DataAttributes need to be managed as VTL Measures.
280 280
281 -=== 12.3.4 Mapping from VTL to SDMX data structures ===
304 +1.
305 +11.
306 +111. Mapping from VTL to SDMX data structures
282 282
283 -==== 12.3.4.1 Basic Mapping ====
308 +**12.3.4.1 Basic Mapping**
284 284
285 285 The main mapping method **from VTL to SDMX** is called **Basic** mapping as well.
286 286
... ...
@@ -304,7 +304,7 @@
304 304
305 305 As said, the resulting SDMX definitions must be compliant with the SDMX consistency rules. For example, the SDMX DSD must have the AttributeRelationship for the DataAttributes, which does not exist in VTL.
306 306
307 -==== 12.3.4.2 Unpivot Mapping ====
332 +**12.3.4.2 Unpivot Mapping**
308 308
309 309 An alternative mapping method from VTL to SDMX is the **Unpivot** mapping.
310 310
... ...
@@ -340,7 +340,7 @@
340 340
341 341 In any case, the resulting SDMX definitions must be compliant with the SDMX consistency rules. For example, the possible Codes of the SDMX MeasureDimension need to be listed in a SDMX Codelist, with proper id, agency and version; moreover, the SDMX DSD must have the AttributeRelationship for the DataAttributes, which does not exist in VTL.
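The relationship between the Pivot and Unpivot mappings can be sketched on plain dictionaries. The pivot direction follows the rule quoted above (the Measure value of the observation with MeasureDimension=Cj becomes the VTL Measure Cj). The unpivot direction assumes the inverse rule, namely that each VTL Measure name becomes a Code of the SDMX MeasureDimension, since the full rules are not reproduced in this excerpt. The names `INDICATOR` and `OBS_VALUE` are hypothetical and attributes are left out for brevity.

```python
def pivot(observations, identifiers,
          measure_dimension="INDICATOR", measure="OBS_VALUE"):
    """SDMX to VTL: observations sharing the same identifier values
    collapse into one data point with one Measure per code Cj."""
    grouped = {}
    for obs in observations:
        key = tuple(obs[name] for name in identifiers)
        point = grouped.setdefault(key, {name: obs[name] for name in identifiers})
        # The Measure of the observation with MeasureDimension=Cj
        # becomes the value of the VTL Measure named Cj.
        point[obs[measure_dimension]] = obs[measure]
    return list(grouped.values())


def unpivot(data_points, identifiers,
            measure_dimension="INDICATOR", measure="OBS_VALUE"):
    """VTL to SDMX (assumed inverse rule): each Measure of a data point
    yields one observation whose MeasureDimension code is the Measure name."""
    observations = []
    for point in data_points:
        key = {name: point[name] for name in identifiers}
        for name, value in point.items():
            if name not in identifiers:
                observations.append({**key, measure_dimension: name, measure: value})
    return observations


# Hypothetical multi-measure VTL data point.
vtl_points = [{"COUNTRY": "IT", "TIME_PERIOD": "2024", "POP": 59, "GDP": 21}]
sdmx_observations = unpivot(vtl_points, ["COUNTRY", "TIME_PERIOD"])
round_trip = pivot(sdmx_observations, ["COUNTRY", "TIME_PERIOD"])
```

In this sketch the codes `POP` and `GDP` would still have to be listed in a SDMX Codelist for the resulting definition to be compliant, as the text requires.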
342 342
343 -==== 12.3.4.3 From VTL Measures to SDMX Data Attributes ====
368 +**12.3.4.3 From VTL Measures to SDMX Data Attributes**
344 344
345 345 Above all for the multi-measure VTL structures (having more than one Measure Component), it may happen that the Measures of the VTL Data Structure need to be managed as DataAttributes in SDMX. Therefore, a third mapping method consists in transforming some VTL Measures into corresponding SDMX Measures and all the other VTL Measures into SDMX DataAttributes. This method is called M2A (“M2A” stands for “Measures to DataAttributes”).
346 346
... ...
@@ -357,7 +357,9 @@
357 357
358 358 Even in this case, the resulting SDMX definitions must be compliant with the SDMX consistency rules. For example, the SDMX DSD must have the attributeRelationship for the DataAttributes, which does not exist in VTL.
359 359
360 -=== 12.3.5 Declaration of the mapping methods between data structures ===
385 +1.
386 +11.
387 +111. Declaration of the mapping methods between data structures
361 361
362 362 In order to define and understand VTL Transformations properly, the applied mapping methods must be specified in the SDMX structural metadata. If the default mapping method (Basic) is applied, no specification is needed.
363 363
... ...
@@ -367,10 +367,14 @@
367 367
368 368 The VtlMappingScheme is a container for zero or more VtlDataflowMapping (it may contain also mappings towards artefacts other than dataflows).
369 369
370 -=== 12.3.6 Mapping dataflow subsets to distinct VTL Data Sets ===
397 +1.
398 +11.
399 +111. Mapping dataflow subsets to distinct VTL Data Sets
371 371
372 -Until now it has been assumed that one SDMX Dataflow is mapped to one VTL Data Set and vice-versa. This one-to-one mapping is not mandatory according to VTL because a VTL Data Set is meant to be a set of observations (data points) on a logical plane, having the same logical data structure and the same general meaning, independently of the possible physical representation or storage (see VTL 2.0 User Manual page 24), therefore a SDMX Dataflow can be seen either as a unique set of data observations (corresponding to one VTL Data Set) or as the union of many sets of data observations (each one corresponding to a distinct VTL Data Set).
401 +Until now it has been assumed that one SDMX Dataflow is mapped to one VTL Data Set and vice-versa. This one-to-one mapping is not mandatory according to VTL because a VTL Data Set is meant to be a set of observations (data points) on a logical plane, having the same logical data structure and the same general meaning, independently of the possible physical representation or storage (see VTL 2.0 User Manual page 24), therefore a SDMX Dataflow can be seen either as a unique set of data observations
373 373
403 +(corresponding to one VTL Data Set) or as the union of many sets of data observations (each one corresponding to a distinct VTL Data Set).
404 +
374 374 As a matter of fact, in some cases it can be useful to define VTL operations involving definite parts of a SDMX Dataflow rather than the whole.{{footnote}}A typical example of this kind is the validation, and more in general the manipulation, of individual time series belonging to the same Dataflow, identifiable through the DimensionComponents of the Dataflow except the TimeDimension. The coding of this kind of operation might be simplified by mapping distinct time series (i.e. different parts of a SDMX Dataflow) to distinct VTL Data Sets.{{/footnote}}
375 375
376 376 Therefore, in order to make the coding of VTL operations simpler when applied on parts of SDMX Dataflows, it is allowed to map distinct parts of a SDMX Dataflow to distinct VTL Data Sets according to the following rules and conventions. This kind of mapping is possible both from SDMX to VTL and from VTL to SDMX, as better explained below.{{footnote}}Please note that this kind of mapping is only an option at the disposal of the definer of VTL Transformations; in fact it remains always possible to manipulate the needed parts of SDMX Dataflows by means of VTL operators (e.g. “sub”, “filter”, “calc”, “union” …), maintaining a one-to-one mapping between SDMX Dataflows and VTL Data Sets.{{/footnote}}
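The time series example mentioned in the footnote can be pictured with a short sketch that partitions the observations of one Dataflow by the values of every dimension except the TimeDimension. This is only an illustration of the idea: the function name, the component names and the dot-joined subset label are hypothetical and are not the naming convention of the standard.

```python
from collections import defaultdict


def split_by_series(observations, dimensions, time_dimension="TIME_PERIOD"):
    """Partition the observations of one Dataflow into time series.

    Each subset holds the observations sharing the same values for all
    dimensions except the time dimension, and could be mapped to a
    distinct VTL Data Set.
    """
    series_dimensions = [d for d in dimensions if d != time_dimension]
    subsets = defaultdict(list)
    for obs in observations:
        # Hypothetical label: the dimension values joined by dots.
        label = ".".join(str(obs[d]) for d in series_dimensions)
        subsets[label].append(obs)
    return dict(subsets)


# Hypothetical Dataflow content, for illustration only.
dataflow_observations = [
    {"INDICATOR": "POP", "COUNTRY": "IT", "TIME_PERIOD": "2023", "OBS_VALUE": 59},
    {"INDICATOR": "POP", "COUNTRY": "IT", "TIME_PERIOD": "2024", "OBS_VALUE": 59},
    {"INDICATOR": "POP", "COUNTRY": "FR", "TIME_PERIOD": "2024", "OBS_VALUE": 68},
]
series = split_by_series(dataflow_observations,
                         ["INDICATOR", "COUNTRY", "TIME_PERIOD"])
```

The union of the subsets gives back the whole Dataflow, which matches the view of a Dataflow as the union of many sets of data observations.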