Changes for page Guidelines for SDMX Data Structure Definitions
Last modified by Artur K. on 2026/05/29 14:28
Summary
-
Page properties (2 modified, 0 added, 0 removed)
Details
Page properties
Author
@@ -1,1 +1,1 @@
-xwiki:XWiki. arturkryazhev1
+xwiki:XWiki.helena
Content
@@ -14,17 +14,17 @@

 Target audiences for these guidelines include domain experts and official statisticians involved in DSD development. Thus focusing on the business/content side of DSD development, the document tries to avoid technical jargon when explaining underlying concepts and ideas, but tries to still be useful for IT experts supporting SDMX implementations. Ideally, the document can bridge the gap between IT and statistical experts. The scope of the guidelines is restricted to conceptual aspects. Organizational and technical aspects are treated in separate documents.

-* Code lists are the crucial building blocks of data structure definitions. Especially in the case of SDMX recommended code lists (particularly for cross-domain concepts; see SDMX Content Oriented Guidelines under “Guidelines” at [[http:~~/~~/sdmx.org/>>https://http:sdmx.org]]), list development and maintenance as well as DSD development and maintenance are carried out by different organizations at different points in time. For example SDMX recommended code lists for frequency and observation status already exist and should be used by reference in DSDs. While “SDMX” is responsible for the maintenance of these code lists, the DSD developing organization will be responsible for the maintenance of the DSD, that is, for the structure at a higher level. (Of course, a global DSD may also have “SDMX” as maintenance agency.) In any case, there is a strong interrelationship between DSD and code list development and maintenance (see SDMX Guidelines for the creation and maintenance of code lists under “Guidelines” at[[http:~~/~~/sdmx.org/>>https://http:sdmx.org?page_id=13]]).
-* Maintenance and governance rules for DSDs including issues of updating, versioning, retiring as well as questions of responsibilities, especially relevant in the context of global DSDs jointly developed by multiple organizations and maintained by “SDMX” (or multiple organizations), will be covered by separate guidelines (see “Guidelines” at [[http:~~/~~/sdmx.org/>>https://http:sdmx.org]]).
-* Issues related to SDMX registries (in general, and the global SDMX registry in particular) such as storage, federation, and registration of, as well as search for, retrieval and download of, code lists and DSDs are not in the scope of this document. For more information on the registry see the “Standards” page at [[http:~~/~~/sdmx.org/>>https://http:sdmx.org]].
-* Guidelines for the development, maintenance, and governance of metadata structure definitions (MSDs) will be made available separately under “Guidelines” at [[http:~~/~~/sdmx.org/>>https://http:sdmx.org]].
-* Documentation on more IT-related issues is available at the SDMX IT tools and SDMX tutorials site at [[http:~~/~~/sdmx.org/?page_id=13>>https://http:sdmx.org?page_id=13]].The SDMX Tools Repository can be accessed at[[http:~~/~~/www.sdmxtools.org/.>>https://http:www.sdmxtools.org.WebHome]]Many of the SDMX tools listed and described there are available free of charge.
+* Code lists are the crucial building blocks of data structure definitions. Especially in the case of SDMX recommended code lists (particularly for cross-domain concepts; see SDMX Content Oriented Guidelines under “Guidelines” at http:~/~/sdmx.org/), list development and maintenance as well as DSD development and maintenance are carried out by different organizations at different points in time. For example SDMX recommended code lists for frequency and observation status already exist and should be used by reference in DSDs. While “SDMX” is responsible for the maintenance of these code lists, the DSD developing organization will be responsible for the maintenance of the DSD, that is, for the structure at a higher level. (Of course, a global DSD may also have “SDMX” as maintenance agency.) In any case, there is a strong interrelationship between DSD and code list development and maintenance (see SDMX Guidelines for the creation and maintenance of code lists under “Guidelines” at http:~/~/sdmx.org/).
+* Maintenance and governance rules for DSDs including issues of updating, versioning, retiring as well as questions of responsibilities, especially relevant in the context of global DSDs jointly developed by multiple organizations and maintained by “SDMX” (or multiple organizations), will be covered by separate guidelines (see “Guidelines” at http:~/~/sdmx.org/).
+* Issues related to SDMX registries (in general, and the global SDMX registry in particular) such as storage, federation, and registration of, as well as search for, retrieval and download of, code lists and DSDs are not in the scope of this document. For more information on the registry see the “Standards” page at http:~/~/sdmx.org/.
+* Guidelines for the development, maintenance, and governance of metadata structure definitions (MSDs) will be made available separately under “Guidelines” at http:~/~/sdmx.org/.
+* Documentation on more IT-related issues is available at the SDMX IT tools and SDMX tutorials site at http:~/~/sdmx.org/?page_id=13. The SDMX Tools Repository can be accessed at http:~/~/www.sdmxtools.org/. Many of the SDMX tools listed and described there are available free of charge.

 This document is structured as follows. Section 2 outlines general design principles of DSDs. Section 3 discusses different usage contexts of DSDs in more detail. Section 4 gives an overview of different data structuring approaches including benefits, drawbacks, and context-specific recommendations. General minimum structural and semantic requirements are discussed in section 5. Section 6 provides a step-by-step guide to designing DSDs including a checklist for DSD designers. The three annexes include a glossary in Annex 1, a definition and brief introduction of the core components of a DSD in Annex 2, and a list of references in Annex 3.

 = 2 GENERAL DESIGN PRINCIPLES =

-Besides the evident requirement of //standard compliance//, a couple of general design principles apply to SDMX DSD development independently of the domain and the particular usage context the DSD is embedded in. Examples include //flexibility in changing requirements//; //stability//; //usage of existing// code lists or even DSDs; and //parsimony//, //simplicity//, //unambiguousness//, and //density// of the dimensional model. Please note that the SDMX-ML Standards do not impose an order on concepts (i.e., dimensions and attributes). Strictly speaking, standard compliance of a DSD only entails technical compliance with the SDMX technical standard. However, //adherence to// SDMX content recommendations, principles, and best practices as provided in the //SDMX Content-Oriented Guidelines// (see [[http:~~/~~/sdmx.org/?page_id=11>>https://http:sdmx.org?page_id=11]]) is strongly recommended.
It should be kept in mind that one major aim of SDMX is to have transparency and agreement on the meaning of statistical concepts in order to allow their flawless communication.
+Besides the evident requirement of //standard compliance//, a couple of general design principles apply to SDMX DSD development independently of the domain and the particular usage context the DSD is embedded in. Examples include //flexibility in changing requirements//; //stability//; //usage of existing// code lists or even DSDs; and //parsimony//, //simplicity//, //unambiguousness//, and //density// of the dimensional model. Please note that the SDMX-ML Standards do not impose an order on concepts (i.e., dimensions and attributes). Strictly speaking, standard compliance of a DSD only entails technical compliance with the SDMX technical standard. However, //adherence to// SDMX content recommendations, principles, and best practices as provided in the //SDMX Content-Oriented Guidelines// (see http:~/~/sdmx.org/?page_id=11) is strongly recommended. It should be kept in mind that one major aim of SDMX is to have transparency and agreement on the meaning of statistical concepts in order to allow their flawless communication.

 == 2.1 Reuse of existing DSDs and code lists ==

@@ -61,7 +61,7 @@

 === 2.1.3 Suitability of available DSDs and code lists ===

-In case an existing DSD is close to but differs from what is needed, it may: {{{(i)}}}contain irrelevant concepts, (ii) lack some required concepts, (iii) use the concepts in different roles than required, (iv) deviate with respect to some of the code lists, or (v) contain pure dimensions when mixed dimensions would make more sense or vice versa. More complex situations that are combinations of several (or even all) of these five cases may occur as well. For example, an existing DSD could contain unnecessary concepts and lack other concepts at the same time.
+In case an existing DSD is close to but differs from what is needed, it may: (i) contain irrelevant concepts, (ii) lack some required concepts, (iii) use the concepts in different roles than required, (iv) deviate with respect to some of the code lists, or (v) contain pure dimensions when mixed dimensions would make more sense or vice versa. More complex situations that are combinations of several (or even all) of these five cases may occur as well. For example, an existing DSD could contain unnecessary concepts and lack other concepts at the same time.

 ==== 2.1.3.1 Irrelevant concepts ====

@@ -243,7 +243,7 @@

 The decision on content and number of concepts in a DSD usually leads to the question of how far the “//indicator//” dimension should be decomposed. There are some (cross-domain) concepts, such as geographical and temporal reference and unit of measure, that are relevant in most DSDs. Once those are defined (the usage of the SDMX COG is highly recommended!) the actual “//subject-matter//” or “//domain//” concepts remain. One option is to combine all those concepts into one “indicator” dimension which may make sense in certain scenarios, for example for smaller single-domain, single-purpose DSDs with few or no cross-classifications or for display in an end-user dissemination tool. The other extreme strategy is to decompose into as many components as possible by splitting any breakdown concepts from the core indicator concept.

-The range of options between the “//just one//” (mixed) and “//all component//” subject-matter dimensions approaches is subject to the comprehensiveness (i.e. size, coverage) of the data exchange that the DSD is being developed for. If using a “mixed dimensions” approach, rules for the composition of the mixed dimension(s) may be specified (e.g. concatenate concepts A, B, and C to get mixed dimension X), allowing their easy re-decomposition. In general composite dimensions should be avoided as previously recommended by the SDMX Technical Notes, but there are cases that suggest the usage of composite dimensions. Table 4 juxtaposes general pros and cons of the “//many pure concepts//” and “//fewer composite concepts//” approaches.
+The range of options between the “//just one//”// //(mixed) and “//all component//” subject-matter dimensions approaches is subject to the comprehensiveness (i.e. size, coverage) of the data exchange that the DSD is being developed for. If using a “mixed dimensions” approach, rules for the composition of the mixed dimension(s) may be specified (e.g. concatenate concepts A, B, and C to get mixed dimension X), allowing their easy re-decomposition. In general composite dimensions should be avoided as previously recommended by the SDMX Technical Notes, but there are cases that suggest the usage of composite dimensions. Table 4 juxtaposes general pros and cons of the “//many pure concepts//” and “//fewer composite concepts//” approaches.

 **Table 4. General comparison of data structuring approaches**

@@ -315,7 +315,7 @@

 The “one DSD” approach works best for single-domain and/or single-purpose scenarios. In more complex scenarios, more complex approaches are more suitable. Usage of the “one DSD” approach in a multi-domain or multi-purpose scenario actually means that one master DSD containing all concepts, code lists, and codes relevant in any (but most likely not all) domains and/or purposes is used by all domains and/or purposes without constraints. If a “many pure concepts” approach is used, the DSD will be sparse and require many “not applicable” values or structure maps.

-In those more complex scenarios, multi-DSD approaches have more potential.
The “master DSD + satellite DSDs” approach imposes more restrictions and aims at a higher degree of content harmonization than the more loosely coupled (or even independent) multi-DSD approach. While the former specifies the concepts and code lists to be used by all derived DSDs, the latter is more flexible. Therefore, the master + satellites approach is suggested for data exchange scenarios with a high degree of harmonization/standardization required such as at the international level or between national and international organizations. Please note that what is termed “master DSD + satellite DSDs” approach here may also be implemented as master DSD plus constrained data flows with or without using structure maps.
+In those more complex scenarios, multi-DSD approaches have more potential. The “master DSD + satellite DSDs” approach imposes more restrictions and aims at a higher degree of content harmonization than the more loosely coupled (or even independent) multi-DSD approach. While the former specifies the concepts and code lists to be used by all derived DSDs, the latter is more flexible. Therefore, the master + satellites approach is suggested for data exchange scenarios with a high degree of harmonization / standardization required such as at the international level or between national and international organizations. Please note that what is termed “master DSD + satellite DSDs” approach here may also be implemented as master DSD plus constrained data flows with or without using structure maps.

 Even in the multiple independent DSDs approach, sharing of concepts and code lists by reference is recommended. This may be problematic if additional codes are needed by certain DSDs, as neither the addition of codes to a code list used by reference nor the concatenation of multiple code lists included by reference is supported by the current SDMX Technical Standards. The only way of implementing “combined” code lists by reference is to reference each single code from each relevant partial code list.

@@ -332,7 +332,7 @@
 )))|(% colspan="2" %)use if harmonization is important in covered domains or purposes or if such a set of DSDs is already available at international level|easier to do than master + satellite approach each domain/purpose can maintain DSDs independently can be created on the fly from structured databases
 |**between national organizations**|(% colspan="4" %)the same applies as to the “within organization” scenario
 |**between int. organization and national organizations**|(% colspan="2" %)best for single domain, single purpose scenarios that are usually rather restricted with very clear specification of what needs to be exchanged|preferable over multi-DSD approach in case of multi-domain and/or multi-purpose scenarios with highly correlated data flows for maintenance reasons|(((
-for multi-domain and/or multi-purpose scenarios; only recommended if overlap of domains/purposes is minor (e.g. just w.r.t. cross-domain concepts) equivalent to multiple “one DSD” solutions, one for each domain/purpose
+for multi-domain and/or multi-purpose scenarios; only recommended if overlap of domains/purposes is minor (e.g. just w.r.t. cross-domain concepts) equivalent to multiple “one DSD” solutions, one for each domain / purpose
 )))
 |**between international organizations**|(% colspan="3" %)comparable to “national to international” scenario|
 |**dissemination to public**|(% colspan="2" %)for single-domain, single-purpose cases in more complex cases this may be the preferable approach for data discovery tools (one data structure to find and access all data)|(% colspan="2" %)(((
@@ -340,7 +340,7 @@

 * if it is relevant for the public to see the relationship between the data structures: use master + satellites approach
 * otherwise the multi-DSD option is preferable, although with the highest possible degree of re-use of code lists and concepts
-* in both cases: important to include only concepts, code lists, and codes actually available/used by the data
+* in both cases: important to include only concepts, code lists, and codes actually available / used by the data
 )))

 In general, finding the “perfect” data structure is less important for bilateral data exchange. Independent, custom-tailored DSDs may do the job quite well, as harmonization and standardization are typically not of high importance. If the data exchange is just a part of a more comprehensive scenario (e.g. multi-purpose, multi-domain, gateway, or data-sharing scenarios), a master DSD with satellite DSDs is preferable.
@@ -367,7 +367,7 @@

 Certain concepts can be broadly agreed upon as being relevant in any data exchange, although their roles may differ between scenarios. The SDMX Content-Oriented Guidelines define many of these cross-domain concepts and, thus, should be referred to for further details on their specification.

-In general, multi-purpose and multi-domain scenarios may require more concepts than single-purpose and/or – domain scenarios. This mainly applies to subject-matter (or domain-specific) concepts and concepts that inform about the data source, provider, or process.
+In general, multi-purpose and multi-domain scenarios may require more concepts than single-purpose and/or –domain scenarios. This mainly applies to subject-matter (or domain-specific) concepts and concepts that inform about the data source, provider, or process.

 Exchanges between organizations, especially on an international level, typically require more concepts to cover context information, as data are transferred out of their usual context, meaning that users in the new context do not have the same knowledge of the data and may need additional background information. For exchanges of data within an organization, some context information may be common (implicit) knowledge so that it does not need to be made explicit in the data structure.

@@ -614,7 +614,7 @@

 == 9.1 SDMX Documents ==

-The SDMX documents referred to in these guidelines as well as the complete technical specification of the SDMX Technical Standard 2.1 (and earlier versions) are available online at [[http:~~/~~/sdmx.org/>>https://http:sdmx.org]].The SDMX documents currently under development by the Statistical and Technical Working Groups will also be made available on the SDMX website.
+The SDMX documents referred to in these guidelines as well as the complete technical specification of the SDMX Technical Standard 2.1 (and earlier versions) are available online at http:~/~/sdmx.org/. The SDMX documents currently under development by the Statistical and Technical Working Groups will also be made available on the SDMX website.

 === 9.1.1 Existing documents ===

@@ -638,12 +638,10 @@

 == 9.2 Non-SDMX Documents ==

-6th Edition of the IMF's Balance of Payments Manual (BPM6). Available online at [[http:~~/~~/www.imf.org/external/pubs/ft/bop/2007/bopman6.htm>>https://http:www.imf.orgexternalpubsftbop2007bopman6.htm||rel="noopener noreferrer" target="_blank"]].
+6th Edition of the IMF's Balance of Payments Manual (BPM6). Available online at http:~/~/www.imf.org/external/pubs/ft/bop/2007/bopman6.htm.

-METIS: Generic Statistical Business Process Model (GSBPM). Available online at [[http:~~/~~/www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Model>>https://http:www1.unece.orgstatplatformdisplaymetisThe+Generic+Statistical+Business+Process+Model||rel="noopener noreferrer" target="_blank"]].
+METIS: Generic Statistical Business Process Model (GSBPM). Available online at http:~/~/www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Model. UN's System of National Accounts Manual 2008 (SNA2008). Available online at http:~/~/unstats.un.org/unsd/nationalaccount/sna2008.asp.

-UN's System of National Accounts Manual 2008 (SNA2008). Available online at [[http:~~/~~/unstats.un.org/unsd/nationalaccount/sna2008.asp>>https://http:unstats.un.orgunsdnationalaccountsna2008.asp||rel="noopener noreferrer" target="_blank"]].
-
 ----

 {{putFootnotes/}}
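
The composition rule quoted in the hunk at line 246 ("concatenate concepts A, B, and C to get mixed dimension X", allowing easy re-decomposition) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the separator, the component order, the function names, and the example codes are all hypothetical and are not taken from the guidelines or from any SDMX code list. The one requirement the sketch tries to make visible is that re-decomposition is only unambiguous if the separator never occurs inside a component code.

```python
from typing import Dict, List

# Hypothetical component concepts of mixed dimension X, in composition order.
COMPONENTS: List[str] = ["A", "B", "C"]

# Hypothetical separator; it must not occur inside any component code,
# otherwise the mixed code cannot be split back unambiguously.
SEPARATOR = "_"


def compose(codes: Dict[str, str]) -> str:
    """Concatenate the component codes into one mixed-dimension code."""
    parts = []
    for concept in COMPONENTS:
        code = codes[concept]
        if SEPARATOR in code:
            raise ValueError(f"code {code!r} contains the separator {SEPARATOR!r}")
        parts.append(code)
    return SEPARATOR.join(parts)


def decompose(mixed_code: str) -> Dict[str, str]:
    """Split a mixed-dimension code back into its component codes."""
    parts = mixed_code.split(SEPARATOR)
    if len(parts) != len(COMPONENTS):
        raise ValueError(f"expected {len(COMPONENTS)} parts, got {len(parts)}")
    return dict(zip(COMPONENTS, parts))


# Made-up example codes, used only to show the round trip.
example = {"A": "EMP", "B": "F", "C": "Y15T24"}
mixed = compose(example)
print(mixed)                        # EMP_F_Y15T24
print(decompose(mixed) == example)  # True
```

A fixed-width encoding would serve the same purpose as a separator; the separator variant is used here only because it keeps the sketch short.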