Wiki source code of 2 What is SDMX

Version 19.1 by Artur on 2025/09/11 13:35

Artur 1.1 1 {{box title="**Contents**"}}
2 {{toc/}}
3 {{/box}}
4
5 == 2.1 Introduction ==
6
Helena 2.1 7 This chapter provides some background on the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Initiative, the issues which [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] addresses, the history of the standards and guidelines which have come out of this initiative, and the areas in which [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is playing a role today. This chapter also provides some guidance about how prospective users should think about [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as it relates to their own part of the statistical process.
Artur 1.1 8
Elena 4.2 9 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] offers a wide variety of technical tools which, like any tools, produce positive results if used well. [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] also offers a set of guidelines on applying harmonized statistical [[concepts>>doc:Glossary.Concept.WebHome]] to [[data sets>>doc:sdmx:Glossary.Data set.WebHome]] and on how these can be represented. Other guidelines address the classification of statistical data and domains, and the harmonization of relevant terminology. Additionally, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] represents a framework for the process of harmonization within domains. All of these different aspects are considered here.
Artur 1.1 10
11 == 2.2 Background – Official Statistics ==
12
Elena 4.3 13 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] comes out of the world of official statistics. If you work for a national or international statistical agency, you already understand “official statistics”, but many people who work with statistical data may not understand this domain, so we will provide a brief, high-level description.
Artur 1.1 14
15 “Official statistics” are the data collected and disseminated by a set of governmental and international organizations to provide the factual basis for making policy and supporting research. Some countries have a “national statistical office” (NSO) while others may have several governmental organizations which are charged with collecting statistical data for governmental use. Most countries also have central banks or similar organizations which collect and disseminate financial and economic data.
16
17 Typically, several national government organizations have a statistical function (ministries of education, justice, labour, etc.).
18
19 These national organizations typically report their statistics to a set of supra-national organizations, representing either regions of the globe (examples include Eurostat and the European Central Bank) or domains (examples include the World Health Organization, the Food and Agriculture Organization, UNESCO, and the World Bank). Many of these organizations belong to the UN, or are treaty organizations.
20
Elena 4.3 21 All of these organizations exchange, report, and disseminate data in a chain which can be understood as starting at the lowest level within each country, and resulting in high-level [[data sets>>doc:sdmx:Glossary.Data set.WebHome]] which are “aggregated” as they move through various levels to reach the international level.
Artur 1.1 22
23 The system of official statistics is this network of data reporting, governed by legal requirements or other types of agreements. Several important meetings, conferences, and initiatives within this system help organizations adopt similar approaches and techniques and coordinate reporting; the Conference of European Statisticians and the United Nations Statistical Commission are two important examples. Ultimately the goal is to measure important phenomena occurring in the world, and to report the data to policy makers, students, journalists, and other users to help inform their activity. The data is “official” because it comes with the reputation of the world’s governments and international institutions behind it.
24
25 == 2.3 The History of SDMX and its Work Products ==
26
Helena 2.1 27 In 2001, the heads of seven international statistical institutions came together to form [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], with the goal of taking concrete steps towards addressing issues around statistical exchange. These organizations became the sponsors of the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Initiative: the Bank for International Settlements, the European Central Bank, Eurostat, the International Monetary Fund, the Organization for Economic Co-operation and Development, the UN Statistics Division, and the World Bank. They created an initiative with a governing sponsors committee, and a secretariat function to execute the work programme.
Artur 1.1 28
29 The issues can be briefly characterized as follows:
30
31 * Statistical collection, processing, and exchange are time-consuming and resource-intensive
32 * Various international and national organisations have individual approaches for their constituencies
33 * Uncertainties (in 2001) about how to proceed with new technologies (XML, web services etc.)
34
Helena 2.1 35 The [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Initiative stated that it would address these issues:
Artur 1.1 36
37 * //By focusing on business practices in the field of statistical information//
38 * //By identifying more efficient processes for exchange and sharing of data and metadata using modern technology//
39
40 The initial projects of the initiative, based largely on work already under way among several of the sponsor organizations, were:
41
42 * //A practical case study on emerging e-standards for data exchange//
43 * //Maintaining and advancing existing standards for time series data exchange//
44 * //Creation of a common vocabulary for statistical metadata//
Helena 2.1 45 * //Development of a framework for [[metadata repositories>>doc:sdmx:Glossary.Metadata repository.WebHome]]//
Artur 1.1 46
47 It was further stated that: “New standards should take advantage of the new web-based technologies and the expertise of those working on the business requirements and IT support for the collection, compilation, and dissemination of statistical information.”
48
Helena 2.1 49 Thus, the goals of the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] initiative were ones which were broadly agreed across the sponsoring organizations, and within the official statistics community generally. It is important to understand that there were some firm foundations on which [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] was building:
Artur 1.1 50
51 1. An existing standard for exchanging statistical data, known as GESMES/TS, was already in use among several of the sponsor organizations and their national-level counterparties. It was based not on modern Web technologies such as XML, but on the older UN/EDIFACT syntax.
Elena 4.2 52 1. The work on the “metadata common vocabulary” was based on many years of harmonization work within the community, notably Eurostat’s //Concepts and Definitions Database (CODED)// and the //OECD Glossary of Terms//.
Artur 1.1 53
Helena 2.1 54 The formation of the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Initiative can be understood as a recognition by the sponsor organizations that working together to address these issues, and coordinating business approaches using modern, standards-based technology, was the best way forward. In one sense, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] evolved from earlier work, but it also signalled the sponsors’ increased commitment to reaching its goals. It also represents a coming-together of efforts to harmonize statistical content and terminology and to deploy technology in support of statistical processes.
Artur 1.1 55
Helena 2.1 56 Over time, the work of the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Initiative has expanded, both in terms of content-oriented work products and technical ones. We will describe the evolution of these work products below.
Artur 1.1 57
Helena 2.1 58 The [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Initiative decided early on to position the content-oriented work and the work on technology and standards so that these strands of work were separate but complementary. The content-oriented work led to the development of the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Content-Oriented Guidelines, while the technical work resulted in the [[SDMX Technical Specifications>>doc:sdmx:Glossary.SDMX Technical Specification.WebHome]]. There were several reasons for taking this approach. It reflected the realization that technical specifications must be very precise and detailed in order to allow for automation of statistical exchanges: the programming of computers relies on having very specific rules about how applications communicate, otherwise the communication fails. The [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] technical standards in one sense function as exchange protocols for machine-to-machine communications (similar to HTTP, for example, but with a focus on specifically statistical exchanges).
Artur 1.1 59
Helena 2.1 60 Statistical content and terminology issues are very different: they are subject to interpretation and analysis by trained statisticians. Thus, the technology specifications formed a basis for supporting work on the content side, but in fact are a very different type of work product. It is easiest to see this in the fact that the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] [[Content-Oriented Guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]] are //guidelines//, intended to suggest approaches to people in their statistical work, while the [[SDMX Technical Specifications>>doc:sdmx:Glossary.SDMX Technical Specification.WebHome]] are //specifications//: rules for developing conforming computer applications.
Artur 1.1 61
Helena 2.1 62 Another reason for this separation is that the technical specifications and content guidelines were expected to be maintained at different rates: once stable, technical specifications tend to be updated less frequently. Also, the reasons for making updates and changes in each area have no dependency between them, so it made sense to separate them. This is reflected in the fact that the technical specifications are submitted to and published through the International Organization for Standardization (ISO), which publishes many IT-related standards in various domains, while the [[content-oriented guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]] are not submitted to ISO, but are maintained by the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Initiative itself. This allows for updates of the [[content-oriented guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]] on an on-going basis.
Artur 1.1 63
Helena 2.1 64 A third reason for the separation of the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Technical Standards and the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] [[Content-Oriented Guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]] is that, because they are a technological foundation for exchanging //any// statistics, the technical specifications are applicable outside the domain of official statistics, while the [[content-oriented guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]] are specifically designed to be useful within that context (although they may also prove useful outside that community).
Artur 1.1 65
Helena 2.1 66 This coordinated-but-separate positioning of the two threads of work has proven to be very useful, too, because often statisticians and economists do not have deep expertise in IT, and technologists do not have deep expertise in statistics. [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] helps to define the point where the two sets of expertise need to coordinate, to effectively use IT within statistical exchanges and processes.
Artur 1.1 67
68 Within the content-oriented work, there is a set of work products: the //Content-Oriented Guidelines// and five annexes:
69
Elena 4.3 70 1. Cross-Domain [[Concepts>>doc:Glossary.Concept.WebHome]]
71 1. Cross-Domain [[Codelists>>doc:Glossary.Code list.WebHome]]
Artur 1.1 72 1. Statistical Subject-Matter Domains
73 1. Metadata Common Vocabulary
Elena 5.1 74 1. [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]-ML for the [[Content-Oriented Guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]] ([[Concepts>>doc:Glossary.Concept.WebHome]], [[Code Lists>>doc:sdmx:Glossary.Code list.WebHome]], [[Category>>doc:sdmx:Glossary.Category.WebHome]] Scheme)
Artur 1.1 75
76 These are discussed in more detail in the first annex to this user guide.
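
As a rough illustration of what applying a cross-domain code list to a data set can mean in practice, the sketch below checks the frequency values of a few observations against a small excerpt of codes. The excerpt, the sample observations, and the helper name ##invalid_codes## are assumptions made for this example; they are not taken from the guidelines themselves.

```python
# Assumed excerpt of frequency codes, in the style of a cross-domain code list.
CL_FREQ = {"A": "Annual", "Q": "Quarterly", "M": "Monthly", "D": "Daily"}


def invalid_codes(observations, concept, code_list):
    """Return the distinct values of `concept` that are not in `code_list`."""
    return sorted(
        {obs[concept] for obs in observations if obs[concept] not in code_list}
    )


# Hypothetical observations; the third uses a local, non-harmonized code.
observations = [
    {"FREQ": "A", "TIME_PERIOD": "2010", "OBS_VALUE": 1.5},
    {"FREQ": "Q", "TIME_PERIOD": "2010-Q1", "OBS_VALUE": 0.4},
    {"FREQ": "YEARLY", "TIME_PERIOD": "2011", "OBS_VALUE": 1.7},
]

print(invalid_codes(observations, "FREQ", CL_FREQ))  # ['YEARLY']
```

The point of the sketch is only that a shared code list gives every party the same set of permitted values, so a local variant such as the one above can be detected mechanically.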
77
Elena 5.1 78 The [[SDMX Technical Specifications>>doc:sdmx:Glossary.SDMX Technical Specification.WebHome]] are now in version 2.1, but both version 1.0 and version 2.0 were implemented. Version 1.0 of the specifications had relatively limited coverage: a model for data formats and their structures, along with XML and UN/EDIFACT formats for exchanging these. The UN/EDIFACT format was backward-compatible with GESMES/TS; the XML formats were new. There was also some support provided for [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]-based Web services: an XML query document, and a set of guidelines about the use of other related Web-services standards (SOAP and WSDL).
Artur 1.1 79
Elena 5.1 80 The 2.0 version of the technical specifications had a greatly expanded scope. The model was extended to include “[[reference metadata>>doc:sdmx:Glossary.Reference metadata.WebHome]]” as a way of structuring and formatting metadata related to data quality frameworks, methodological metadata, and other types of “footnote” metadata. Thus, XML formats for [[reference metadata>>doc:sdmx:Glossary.Reference metadata.WebHome]] were added. Further, a set of standard XML interfaces for interactions with an [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] was added, for cataloguing the location of data and [[reference metadata>>doc:sdmx:Glossary.Reference metadata.WebHome]] across the Internet or within an organization, and for maintaining and retrieving structural metadata.
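
The two roles of a registry described above (cataloguing where data can be found, and storing structural metadata) can be pictured with a toy in-memory stand-in. This is only a conceptual sketch: the class ##ToyRegistry##, its method names, the identifiers, and the URL are all hypothetical, and the actual registry interfaces are XML messages defined in the technical specifications, not Python calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical in-memory stand-in for the two registry roles."""

    # dataflow ID -> URLs where data for that flow can be retrieved
    sources: dict = field(default_factory=dict)
    # structure ID -> structural metadata (here just a plain dict)
    structures: dict = field(default_factory=dict)

    def register_source(self, dataflow_id, url):
        """Catalogue the location of a data source for a dataflow."""
        self.sources.setdefault(dataflow_id, []).append(url)

    def find_sources(self, dataflow_id):
        """Return the known locations for a dataflow (empty if none)."""
        return list(self.sources.get(dataflow_id, []))

    def submit_structure(self, structure_id, definition):
        """Store or replace structural metadata."""
        self.structures[structure_id] = definition

    def get_structure(self, structure_id):
        """Retrieve structural metadata, or None if it is unknown."""
        return self.structures.get(structure_id)


registry = ToyRegistry()
registry.submit_structure("DSD_EXAMPLE", {"dimensions": ["FREQ", "REF_AREA"]})
registry.register_source("FLOW_EXAMPLE", "https://example.org/data/flow_example.xml")

print(registry.find_sources("FLOW_EXAMPLE"))
print(registry.get_structure("DSD_EXAMPLE"))
```

Note that the registry in this sketch holds only pointers to data, not the data itself, which mirrors the cataloguing role described in the text.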
Artur 1.1 81
Elena 5.1 82 In version 2.1, many features of 2.0 have been improved, and the Web-services recommendations have been expanded to include a RESTful interface, standard functions, and error messages. Now, it is possible to develop generically interoperable applications based on the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] standards. Further, the various XML data formats have been simplified based on implementation experience with version 2.0.
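
To give a feel for the RESTful interface, the following sketch assembles a data query URL following the version 2.1 path pattern ##/data/{flowRef}/{key}/{providerRef}## with ##startPeriod## and ##endPeriod## parameters. The endpoint, the dataflow identifier, and the series key are hypothetical placeholders, and ##build_data_query## is a helper written for this example rather than part of any SDMX library.

```python
from urllib.parse import quote, urlencode


def build_data_query(base_url, flow_ref, key="all", provider_ref="all", **params):
    """Assemble a data query URL: {base}/data/{flowRef}/{key}/{providerRef}?{params}."""
    # Keep "," and "+" unescaped: they separate reference parts and OR-ed codes.
    path = "/".join(
        quote(part, safe=",+") for part in (flow_ref, key, provider_ref)
    )
    url = f"{base_url.rstrip('/')}/data/{path}"
    # Drop parameters that were not supplied so the URL stays minimal.
    query = {name: value for name, value in params.items() if value is not None}
    if query:
        url += "?" + urlencode(query)
    return url


url = build_data_query(
    "https://example.org/sdmx/rest",  # hypothetical service endpoint
    flow_ref="EXR",                   # hypothetical dataflow identifier
    key="M.USD+GBP.EUR",              # hypothetical key: dot-separated dimension values
    startPeriod="2010-01",
    endPeriod="2010-12",
)
print(url)
```

Because every conforming service accepts the same URL pattern, a client written this way can in principle be pointed at a different provider by changing only the base URL and identifiers, which is the kind of generic interoperability the paragraph above refers to.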
Artur 1.1 83
Helena 2.1 84 For all types of work products, there have been internal reviews within the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] community, and also public review of the guidelines and standards.
Artur 1.1 85
86 == 2.4 The SDMX “Toolkit” Approach ==
87
Helena 2.1 88 There are many different elements in the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] suite of guidelines and specifications, and it may seem daunting to think of implementing them all. It is important to understand the philosophy behind this suite of tools. [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] has always taken seriously the idea that different organizations will implement at their own speeds, and with their own objectives. As far as possible, the Initiative has recognized that investments in legacy systems must be protected, and that existing content and processes should still be supported.
Artur 1.1 89
Helena 2.1 90 The result of this requirement has been the “toolkit” approach: [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] offers many different tools, but they need not all be adopted or used together. Indeed, many tools are now built on top of a more fine-grained set of [[components>>doc:sdmx:Glossary.Component.WebHome]] which themselves can be integrated into an organisation’s own systems. The technical specifications outline a number of different types of conformance with the specifications, based on which parts of the specifications are being used.
Artur 1.1 91
Helena 2.1 92 The following chapter describes the use-cases which [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] supports, but a basic list of business applications can be given:
Artur 1.1 93
Helena 2.1 94 * Collection cases
** [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as a “push” reporting format for data and metadata (reporter pushes data to collector)
95 ** [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as a “pull” reporting format for data and metadata (collector pulls data from reporter)
Artur 1.1 96 * Dissemination cases
Helena 2.1 97 ** [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] for file downloads
98 ** [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as a queryable data source
99 ** [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] to drive website presentation of data and metadata
Artur 1.1 100 * Data warehousing cases
Helena 2.1 101 ** [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] for extraction, transformation, and load of data
102 ** [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as a model for the structure of a data warehouse or metadata repository
Artur 1.1 103
104 Different parts of the standard are used for each of these cases (and others), and the specifications are written so that any given application needs to use only the parts of the standard relevant to it.
105
106 == 2.5 Uptake of SDMX within Domains ==
107
Helena 2.1 108 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] has become very widely used within the world of official statistics, so much so that it is difficult to form a comprehensive list of users. This section attempts to characterize the current users of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] – a group that will likely grow not only in terms of numbers, but also in terms of the breadth of applications. A few possibilities here are suggested at the end of this section.
Artur 1.1 109
Helena 2.1 110 If we are to look at the most common uses of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], there are two:
Artur 1.1 111
Helena 2.1 112 1. The use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as a reporting and collection format, which is especially prevalent within the central banking community (as a result of the earlier implementation of GESMES/TS, now [[SDMX-EDI>>doc:sdmx:Glossary.SDMX-EDI.WebHome]]), and also among the statistical agencies in Europe (also users of GESMES historically, but implementation is now increasingly driven by such projects as Eurostat’s Census Hub);
Artur 1.1 113 1. Dissemination of statistical data from websites.
114
115 The second application is one which we see in a broad range of institutions, including central banks (ECB and European System of Central Banks, BIS, U.S. Federal Reserve Board and New York Federal Reserve, among others), other sponsoring institutions (IMF, World Bank, OECD, etc.), and national statistical agencies (INEGI in Mexico, Statistics New Zealand, Australian Bureau of Statistics, statistics offices in the European Statistical System, etc.).
116
Helena 2.1 117 A less common but growing use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is as the basis for data warehouses and other forms of data management. Perhaps the best example of this is the European Central Bank, which has built all of its internal data warehouses around the [[SDMX Information Model>>doc:sdmx:Glossary.SDMX Information Model.WebHome]], and has realized many benefits from this. It is by no means the only organization looking at this type of implementation, however: many other organizations are using [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] not only to manage their statistical data, but also to create [[metadata repositories>>doc:sdmx:Glossary.Metadata repository.WebHome]] and to integrate their metadata and data.
Artur 1.1 118
Helena 2.1 119 If we look at which [[statistical domains>>doc:sdmx:Glossary.Statistical subject-matter domain.WebHome]] have been or are becoming major adopters of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], the list would be something like this (in no particular order):
Artur 1.1 120
121 * Census and Demography
122 * Education
123 * Financial and Monetary Indicators
124 * Economic Indicators
125 * National Accounts
126 * Labour
127 * Food and Agriculture including fisheries
128 * Epidemiology
129 * Transport
130 * Data Quality
131 * Development Indicators
132
Helena 2.1 133 It is easy to see that this is a broad and cross-cutting set of [[statistical domains>>doc:sdmx:Glossary.Statistical subject-matter domain.WebHome]] – in fact, there are probably very few domains in which [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is not being used in some fashion today, and the above list is intended as an indication of the breadth of the uptake.
Artur 1.1 134
Helena 2.1 135 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] was officially endorsed first within the European statistical system, and then by the UN Statistical Commission. These endorsements were powerful incentives for organizations to use [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], and the result has been widespread adoption. There are no major competing standards, which has saved the world of statistics from a phenomenon which has slowed the uptake of standards in some other communities.
Artur 1.1 136
Helena 2.1 137 Additionally, a strong culture of open-source and free tools development has emerged, helping to make the adoption of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] easier. This has come both from within the sponsors' community and without, and is supplemented by an increasing number of tools coming from commercial vendors as well.
Artur 1.1 138
Helena 2.1 139 To learn more about available [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] tools, the best place to start is the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Tools Database, a service provided by the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] sponsors and linked from www.sdmx.org.
Artur 1.1 140
Elena 5.1 141 Support for the standards is not limited to tools. The Open Data Foundation hosts the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] User Forum in collaboration with the sponsors, providing a place where the community can interact online, and Eurostat’s CIRCA website provides many types of resources, from training videos to student guides. Many organizations offer in-person [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] training for different levels of users. The best single point of entry is of course the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] website itself.
Artur 1.1 142
Helena 2.1 143 Looking forward, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is increasingly coming into use: at the most recent [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Global conference held in Washington DC in May 2011, Google showed some interest in [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as a source of data for its Data Explorer; there is now an interest in setting up a [[global registry>>doc:sdmx:Glossary.Global registry.WebHome]] so that all [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] data and metadata sources can be easily found. Further, we see the strong possibility that the world of corporate statistics may realize the utility of having a strong standards basis around the vast amounts of data collected today to support business intelligence applications.