Version 2.2 by Helena on 2025/05/27 13:31

1 {{box title="**Contents**"}}
2 {{toc/}}
3 {{/box}}
4
5 == 12.1 Introduction ==
6
7 The Generic Statistical Business Process Model (GSBPM) is a reference model of the statistical production life-cycle in national statistical agencies, developed by the METIS group in UN/ECE. The work was based on many earlier models, and represents a view of statistical production which is now being accepted as the standard view.
8
9 For this reason, we are using the GSBPM as the basis of an example, demonstrating how [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] fits into the work of a national-level statistical agency.
10
11 This example is not a technical one – rather, it is meant to describe the use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] from a business perspective: how, where, and why is [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] used? These questions will be answered by using the example of effective exchange rates.
12
13 It is important to note that in some scenarios, where collected data is already in the form of aggregates, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] might be used earlier in the business process. However, for NSOs the most common scenario is probably where micro-data are collected and aggregated at the national (% style="color:#e74c3c" %)level(%%).
14
15 There are many benefits to the use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], and while many of these are related to the use of technology, in the end the real benefits are simple: it becomes easier for users to locate and utilize data, and the data themselves are more comparable. Further, the data become easier to visualize and format, into whatever form is needed, either for the creation of dissemination outputs, or for re-formatting by data users or collectors.
16
17 == 12.2 High Level Schematic of the GSBPM ==
18
19 It is important to have at least a high-level understanding of the GSBPM, as shown in the diagram below.
20
21 [[image:SDMX_2-1_User_Guide_draft_0-1_html_c0e5b4561dbdfc6c.jpg||data-xwiki-image-style-alignment="center" height="495" width="684"]]
22
23 (% style="text-align: center;" %)
24 **{{id name="image_35"/}}Figure 35: High-level schematic of the GSBPM**
25
26 Across the top of the diagram, we see the high-(% style="color:#e74c3c" %)level(%%) process steps, from 1 to 9. The process begins with the evaluation of data collection needs, and proceeds through the design and creation of data-collection instruments, and then moves on to the actual collection of data. Once collected, data are processed, coded, and edited; [[imputation>>doc:sdmx:Glossary.Imputation.WebHome]] is performed; weights are calculated; and the data are aggregated.
27
28 Up to this point (i.e. 5.6), the GSBPM has been concerned with the collection and processing of micro-data (at least from the perspective of an NSO – from the perspective of a supra-national organization, the collected data may themselves often be aggregates).
29
30 For this example, we show how [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can be used from the point of aggregation forward, as we move through the GSBPM. For our purposes, then, we will focus on steps 5.7 and later, as shown below:
31
32 [[image:SDMX_2-1_User_Guide_draft_0-1_html_2adef8aeaf6c1bc0.jpg||data-xwiki-image-style-alignment="center" height="560" width="472"]]
33
34 (% style="text-align: center;" %)
35 **Figure 36: The part of the GSBPM supported by SDMX**
36
37 To understand how [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can be used throughout this process, we need to look not only at the internal business process of the Central Bank or NSO, but also at the collection framework of the organization to which the aggregate data are reported.
38
39 Because [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] focuses on the exchange of statistics, it will be necessary to consider the organization in our example which will be performing the collection. This will involve some constructs external to, but accessible by, the compiling organization – notably an [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]], with the various constructs that it contains ([[data flows>>doc:sdmx:Glossary.Dataflow.WebHome]], [[data providers>>doc:sdmx:Glossary.Data provider.WebHome]], etc.).
40
41 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is not only for reporting of aggregates, however – it also performs useful functions in the dissemination of data directly to users, and the archiving of data within the organization, so these functions of the organization will also be included in our example.
42
43 == 12.3 GSBPM and SDMX ==
44
45 === 12.3.1 Aggregation (Step 5.7), and Data Analysis (Step 6) ===
46
47 ==== 12.3.1.1 Calculation of Aggregates, and Understanding the SDMX Data Structure ====
48
49 Once data have been aggregated from the micro-data (Step 5.7 of the GSBPM) they will be stored in some format such as a relational database or data warehouse (Oracle, etc.) or in some processing format (SAS, SPSS), or in Excel spreadsheets or similar format. This will depend on the internal system and tools used within the organization, and varies between organizations.
50
51 In order to capture the aggregates in a standard [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] format, we must first look at the required data structures as dictated by the collecting agency. In our example, we will use the Effective Exchange Rates data structure developed by the ECB^^[[(% class="wikiinternallink" %)^^4^^>>path:#sdfootnote4sym||name="sdfootnote4anc"]](%%)^^. Below is an example of the type of data which is contained in a [[data set>>doc:sdmx:Glossary.Data set.WebHome]] structured according to this [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]].
52
53 [[image:SDMX_2-1_User_Guide_draft_0-1_html_e5d197f835ebb755.jpg||data-xwiki-image-style-alignment="center" height="529" width="575"]]
54
55 (% style="text-align: center;" %)
56 **{{id name="image_37"/}}Figure 37: Example ECB Data of Effective Exchange Rates**
57
58 It is not important from the NSO or Central Bank’s perspective to understand every aspect of the analysis which went into the creation of the data structure, as the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] [[Data Structure Definition>>doc:sdmx:Glossary.Data structure definition.WebHome]] to be used for reporting will be provided by the agency collecting the data. All data reporters are expected to use the same [[data structure definition>>doc:sdmx:Glossary.Data structure definition.WebHome]] ([[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]]).
59
60 It is important to understand the [[data structure definition>>doc:sdmx:Glossary.Data structure definition.WebHome]], because this is the resource which describes how the reported aggregates themselves must be structured. Below is a view of the Effective Exchange Rates [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]].
61
62 The [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] is an XML file, created according to the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] standard (it can also be expressed in [[SDMX-EDI>>doc:sdmx:Glossary.SDMX-EDI.WebHome]], but the XML (% style="color:#e74c3c" %)version(%%) is more common). The fact that it is XML is important to the IT staff who must process it, and is used by many [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] tools, but what is most important is that the statistical (% style="color:#e74c3c" %)concepts(%%) and codelists (or “classifications”) it uses are also used in the reported aggregate data file. Thus, we do not look at the XML for this example – it is enough to know that the XML (% style="color:#e74c3c" %)version(%%) of the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] exists, and may be needed by tools or developers at some point.
63
64 [[image:SDMX_2-1_User_Guide_draft_0-1_html_bd8d57aa9f34e7ce.jpg||data-xwiki-image-style-alignment="center" height="227" width="575"]]
65
66 (% style="text-align: center;" %)
67 **{{id name="image_38"/}}Figure 38: Example [[Data Structure Definition>>doc:sdmx:Glossary.Data structure definition.WebHome]] ([[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]]) for Effective Exchange Rates with [[Code>>doc:sdmx:Glossary.Code.WebHome]] List**
68
69 [[image:SDMX_2-1_User_Guide_draft_0-1_html_7e32b093ddbdc2d7.jpg||data-xwiki-image-style-alignment="center" height="398" width="576"]]
70
71 (% style="text-align: center;" %)
72 **{{id name="image_39"/}}Figure 39: Example Data Structure Definition (Attributes) for Effective Exchange Rates**
73
74 The view here shows a number of very important things:
75
76 The ID, Agency, and (% style="color:#e74c3c" %)version(%%) of the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] are displayed at the top of the screen. Below this, there is a listing of statistical (% style="color:#e74c3c" %)concepts(%%) which are used as [[dimensions>>doc:sdmx:Glossary.Dimension.WebHome]] (Frequency, [[Currency>>doc:sdmx:Glossary.Currency.WebHome]], Exchange Rate Type, etc.). Each of these (% style="color:#e74c3c" %)concepts(%%) will be taken from some authoritative source, and descriptions and definitions can be obtained from the organization which publishes and maintains the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]]. In many cases, these (% style="color:#e74c3c" %)concepts(%%) will be the standard (% style="color:#e74c3c" %)concepts(%%) defined in the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Cross-Domain Statistical (% style="color:#e74c3c" %)Concepts(%%) document (which can be obtained at [[www.sdmx.org>>https://www.sdmx.org]]). In other cases, they may be formally defined and documented by the maintaining agency which publishes the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]]. It is important to understand the definition of these (% style="color:#e74c3c" %)concepts(%%), as they may be slightly different from those used by the NSO or Central Bank, but in most cases they will likely be very similar to, or the same as, the (% style="color:#e74c3c" %)concepts(%%) already used at the national (% style="color:#e74c3c" %)level(%%) for this data.
77
78 The (% style="color:#e74c3c" %)concepts(%%) used as [[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]] each take a value which has a standard [[representation>>doc:sdmx:Glossary.Representation.WebHome]]. In most cases, this [[representation>>doc:sdmx:Glossary.Representation.WebHome]] will be a codelist – a standard classification which must be used to identify and describe the observations. In the right-hand part of the screen, we can see which codelist is used to represent each (% style="color:#e74c3c" %)concept(%%) used as a [[dimension>>doc:sdmx:Glossary.Dimension.WebHome]]. For example, the Frequency [[dimension>>doc:sdmx:Glossary.Dimension.WebHome]] uses a codelist called “CL_FREQ” (% style="color:#e74c3c" %)version(%%) 1.0.
79
80 Frequency is perhaps the simplest example, as the reporting agency will generally know what the frequency of the data is, and have a record of this in their systems (quarterly, monthly, etc.).
81
82 Notice that the Time [[dimension>>doc:sdmx:Glossary.Dimension.WebHome]] is not coded, but instead has a time value, indicating the time of the observation.
83
84 The values for the [[codes>>doc:sdmx:Glossary.Code.WebHome]] may or may not match the classification used at the national (% style="color:#e74c3c" %)level(%%), and if they do not match then it will be necessary to (% style="color:#e74c3c" %)map(%%) the [[codes>>doc:sdmx:Glossary.Code.WebHome]] used internally against the classification used by the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]].
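The mapping step described above can be pictured as a simple lookup table. The following is a minimal sketch only: the internal labels and the function name `map_code` are hypothetical, and the target codes merely follow the style of a frequency codelist such as CL_FREQ. The authoritative codes are always those in the codelist referenced by the DSD.

```python
# Hypothetical mapping from internal (national) frequency labels to codes in
# the style of a frequency codelist. Both sides are illustrative assumptions;
# the real target codes come from the codelist referenced by the DSD.
NATIONAL_TO_DSD_FREQ = {
    "annual": "A",
    "quarterly": "Q",
    "monthly": "M",
    "daily": "D",
}


def map_code(internal_code: str, mapping: dict) -> str:
    """Translate an internal code into the code required by the DSD.

    Raises ValueError for unmapped codes, so gaps in the mapping are found
    before the data are reported rather than after.
    """
    normalized = internal_code.strip().lower()
    if normalized not in mapping:
        raise ValueError(f"No DSD code mapped for internal code {internal_code!r}")
    return mapping[normalized]


print(map_code("Monthly", NATIONAL_TO_DSD_FREQ))
```

Failing loudly on an unmapped code is a deliberate choice in this sketch: a silent default would let a wrongly classified observation reach the data collector.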
85
86 For each statistical (% style="color:#e74c3c" %)concept(%%) used as a [[dimension>>doc:sdmx:Glossary.Dimension.WebHome]], it must be possible to provide a value as specified by the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]]. This might seem like a lot of work, but it is done for obvious and important reasons – if the collected data are to be comparable at the supranational (% style="color:#e74c3c" %)level(%%), then there must be a standard expression of the data, using the same statistical (% style="color:#e74c3c" %)concepts(%%) and classifications to identify and describe the observations. This is no different than when reporting aggregate data today – each data collector will want to have a specific expression of the data collected. What is different with [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], however, is that the data collectors are harmonizing the DSDs used in each domain, and there is an effort, through the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] [[Content-Oriented Guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]], to use identical statistical (% style="color:#e74c3c" %)concepts(%%) and [[representations>>doc:sdmx:Glossary.Representation.WebHome]] where this is possible.
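The requirement that every dimension receives a value can be sketched as the construction of a series key: one value per dimension, joined in the order the DSD defines. The dimension identifiers and the example values below are assumptions modelled on an exchange-rate style structure, and `build_series_key` is a hypothetical helper; the DSD published by the collecting agency is the authority on both.

```python
from collections.abc import Sequence

# Assumed dimension order for illustration only; the real identifiers and
# their order come from the DSD published by the collecting agency.
DIMENSION_ORDER = ("FREQ", "CURRENCY", "CURRENCY_DENOM", "EXR_TYPE", "EXR_SUFFIX")


def build_series_key(values: dict, order: Sequence = DIMENSION_ORDER) -> str:
    """Join one value per dimension, in DSD order, into a dot-separated key."""
    missing = [dim for dim in order if not values.get(dim)]
    if missing:
        raise ValueError("Missing value for dimension(s): " + ", ".join(missing))
    return ".".join(values[dim] for dim in order)


key = build_series_key(
    {
        "FREQ": "M",
        "CURRENCY": "USD",
        "CURRENCY_DENOM": "EUR",
        "EXR_TYPE": "SP00",
        "EXR_SUFFIX": "A",
    }
)
print(key)  # M.USD.EUR.SP00.A
```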
87
88 Harmonization of data is a difficult process, but it is one which will result, in time, in more useful data (because it is more comparable), and also hopefully in a more uniform collection of data at the national (% style="color:#e74c3c" %)level(%%), because all reporting countries must conform to a standard [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] as they calculate aggregates, which in turn impacts the data collection process as shown in the GSBPM.
89
90 If we look again at the high-(% style="color:#e74c3c" %)level(%%) picture of the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] (above), we will also see a section which shows statistical (% style="color:#e74c3c" %)concepts(%%) used as “[[Attributes>>doc:sdmx:Glossary.Attribute.WebHome]]”. These are descriptive (% style="color:#e74c3c" %)concepts(%%), sometimes represented with [[codes>>doc:sdmx:Glossary.Code.WebHome]], or sometimes with strings. They are different from Dimensional (% style="color:#e74c3c" %)concepts(%%), because they are not always required – in the table [[image:SDMX_2-1_User_Guide_draft_0-1_html_7b5b4145814c1ed2.jpg||height="25" width="21"]] indicates “Conditional” and [[image:SDMX_2-1_User_Guide_draft_0-1_html_711950811766087b.jpg||height="21" width="23"]] indicates “Mandatory”.
91
92 An [[Attribute>>doc:sdmx:Glossary.Attribute.WebHome]] also has an “[[Attribute Relationship>>doc:sdmx:Glossary.Attribute relationship.WebHome]]” to a construct such as a group of one or more [[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]], an Observation, etc., as discussed in Chapter 4.
93
94 In other ways, the [[Attributes>>doc:sdmx:Glossary.Attribute.WebHome]] are very similar to the [[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]] of the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] – the coding (if they are coded) must be standard, as dictated by the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], and for the same reasons.
95
96 ==== 12.3.1.2 Formatting Aggregates with SDMX ====
97
98 Once it has been determined that the aggregate data can be expressed as [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], according to the rules of the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], then we need to think about what is involved in actually creating the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] format for the data. This is important because if we can express the aggregates as [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]], there are a number of tools which will become useful in performing later activities in the GSBPM.
99
100 There are several techniques for creating [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] [[data sets>>doc:sdmx:Glossary.Data set.WebHome]], and several choices will need to be made. First, there are several “flavours” of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]: [[SDMX-EDI>>doc:sdmx:Glossary.SDMX-EDI.WebHome]] (also known as GESMES/TS) and four types of [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] (the XML (% style="color:#e74c3c" %)version(%%)). This is a technical consideration, and it is typically the case that the data collector will dictate exactly which format is wanted. The XML formats include “Generic”, and “Structure Specific”. Different organizations use different flavours of [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]], but it is important to note that there are free tools available which will allow for transformations between these different flavours.
101
102 These are technical considerations which should be left to the IT staff, so we will not go into them in depth here – the reasons for using one or another are most often purely IT-technical ones.
103
104 We do need to look at the practical options for creating the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] files, however. The options are discussed in Chapter 4 – Data and Metadata Creation and Reporting, and the technical mechanism for achieving different outputs from a database is discussed in Annex 4 – Data Reader and Data Writer Functions. There are several types of tools which will allow the formatting of aggregates into the correct form: tools based on Excel, tools which take the data from a relational database such as Oracle, and tools which work within processing tools such as SAS or PCAxis. Again, we will not look at the technical details of these tools, as this is an IT issue, but it is important to be aware that there are several different tools available. Eurostat provides a free tool for “data mapping” which is broadly useful, and PCAxis will have native [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] support built into it in future (% style="color:#e74c3c" %)versions(%%). It should be noted that when working with processing applications such as SAS and SPSS, it is often the case that dedicated scripts will need to be written within those environments, as different national formats within those applications will require specific formatting scripts to produce [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] outputs.
105
106 ==== 12.3.1.3 SDMX and Analysis of Aggregates (Step 6) ====
107
108 It may not seem obvious that [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is relevant to the process of analysis of aggregates, but it can sometimes be very useful. This will depend on which tools are used by an NSO to perform these various steps. Because most systems work well with XML generally – and because [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] is one flavour of XML – [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can provide some useful functions as the aggregates are analyzed and further processed.
109
110 In the preparation of draft outputs (Step 6.1), it may be helpful to use any of the various visualization tools based on [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] when looking at the data. Tools exist for doing graphical visualizations of the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] data, using modern technology packages such as the Flex code developed by the European Central Bank (http:~/~/code.google.com/p/flexcb/). Other packages also exist, provided by various commercial vendors. Other free tools exist for producing Excel spreadsheets and HTML displays of the data.
111
112 Especially if files are passed between several individuals while the draft outputs are prepared, it may be useful to exchange the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] file, so that different individuals can use different visualizations of the same data while performing this work.
113
114 The validation of outputs (Step 6.2) requires more than just data visualization, and it is here that [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] can provide some solid benefit. Some of the validation rules exist within the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], and these can be automatically checked using free [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] data and [[metadata set>>doc:sdmx:Glossary.Metadata set.WebHome]] tools; others exist within an [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]], where cross references, versioning, and requests for deletion are validated to ensure the integrity of the [[structural metadata>>doc:sdmx:Glossary.Structural metadata.WebHome]].
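To give a feel for the kind of rule that lives in the DSD, the sketch below checks a single record for two things: that coded values appear in the relevant codelist, and that mandatory components are present. It is a simplified, hypothetical model (the `Component` class, the component identifiers, and the code values are all assumptions) and does not reproduce how any particular SDMX validation tool works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """One DSD component (dimension or attribute) in simplified form."""

    concept_id: str
    codelist: Optional[frozenset] = None  # None means the value is uncoded
    mandatory: bool = True  # conditional attributes set this to False


def validate(record: dict, components: list) -> list:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for comp in components:
        value = record.get(comp.concept_id)
        if value is None or value == "":
            if comp.mandatory:
                problems.append(f"{comp.concept_id}: required value is missing")
            continue
        if comp.codelist is not None and value not in comp.codelist:
            problems.append(f"{comp.concept_id}: {value!r} is not in the codelist")
    return problems


# Illustrative components only; identifiers and codes are assumptions,
# not a copy of any real DSD.
EXAMPLE_COMPONENTS = [
    Component("FREQ", frozenset({"A", "Q", "M", "D"})),
    Component("CURRENCY", frozenset({"USD", "JPY", "GBP"})),
    Component("OBS_STATUS", frozenset({"A", "E", "P"})),
    Component("OBS_COM", None, mandatory=False),  # conditional, free text
]

print(validate({"FREQ": "M", "CURRENCY": "XXX", "OBS_STATUS": "A"}, EXAMPLE_COMPONENTS))
```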
115
116 What [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] cannot validate is that the numbers reported are correct in terms of other values in the [[data set>>doc:sdmx:Glossary.Data set.WebHome]] – that is, are they plausible values given the numbers reported in preceding periods, or in relation to other reported data. These are statistical issues that cannot be solved by [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]-based technology, but which will require dedicated checks created by a statistician who understands the statistical issues.
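As an illustration of what such a dedicated check might look like, the sketch below flags any period whose value moves by more than a chosen proportion relative to the preceding period. The function name, the 20% threshold, and the sample values are all made up for the example; a real plausibility rule would be designed by a statistician for the series in question.

```python
def flag_implausible(series: list, max_rel_change: float = 0.2) -> list:
    """Return the periods whose value changed by more than max_rel_change
    (as a proportion) compared with the immediately preceding period.

    `series` is a list of (period, value) pairs in chronological order.
    """
    flagged = []
    for (_, previous), (period, current) in zip(series, series[1:]):
        if previous != 0 and abs(current - previous) / abs(previous) > max_rel_change:
            flagged.append(period)
    return flagged


# Made-up values: the last one looks like a misplaced decimal point.
sample = [("2010-06", 1.22), ("2010-07", 1.28), ("2010-08", 12.9)]
print(flag_implausible(sample))
```

A check of this kind only raises a question; deciding whether a flagged value is an error or a genuine movement remains a statistical judgement.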
117
118 **Scrutinizing and explaining** the aggregates (Step 6.3) is something which typically involves visualization of the data (as for Step 6.1) but may also include the creation of specific tabular views for inclusion in reports. The same tools which provide the ability to visualize [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] data may also allow for the creation of tabular views for use in reports (Excel tables, etc.) but this will vary based on the systems within each NSO or Central Bank.
119
120 There is nothing in [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] which directly addresses **disclosure control** (Step 6.4) or the finalization of outputs (6.5), other than the use of visualization tools as described for earlier parts of Step 6. However, it should be noted that any corrections or edits to the data will need to be reflected in the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] data to be reported. Depending on how the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] is generated, this may involve going back to the tools and systems used to format the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] in the first place, and making sure that the correct data are available in those tools for re-formatting as [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]].
121
122 === 12.3.2 Reporting/Dissemination (Step 7) ===
123
124 Step 7 of the GSBPM covers the process of dissemination in its broadest sense – that is, all users of the data are the target of this process step, including organizations which collect the aggregate data from NSOs and Central Banks. Thus, the GSBPM addresses reporting and dissemination as a single set of activities.
125
126 There are several types of data dissemination, and when we consider dissemination and reporting using the Internet this [[category>>doc:sdmx:Glossary.Category.WebHome]] is very broad. As we look at each sub-process in this step, we will need to consider this broad range of possibilities.
127
128 In addition to the sub-processes described by the GSBPM, we also need to consider one aspect of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] that potentially concerns all forms of reporting and dissemination, the [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] Services (see [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]]/Repository).
129
130 The first sub-process in Step 7 is the **updating of output systems**. This involves taking the aggregates as prepared in Step 6, and loading them into whatever systems are used to drive dissemination. Typically, this will involve database systems (e.g. Oracle) and – if the same database is not used to drive Web dissemination – also loading data into whatever system drives the views of data on the Website.
131
132 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can be used as a format for the **exchange of data between systems**, whether these systems are internal to an organization, or external, and thus it makes a good format for loading databases used in all types of dissemination. Further, because it is an XML format, [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] can be used as inputs to systems for creating HTML, PDF, Excel, and other output formats. An [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] can make the reporting of such data more automated by using the data registration mechanism supported by a registry. The benefit of such a system is that, once new data have been “registered” (see below), the data collector can come and simply query the service for the new data. This helps to ease the burden of data reporting.
133
134 This application of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] tends to be very technical – because XML is well-supported by many types of systems, it is useful also in loading the databases used to drive dissemination. The details of this application are not something we will explore in any detail here.
135
136 The next sub-processes in the GSBPM are the **preparation of outputs** and the management of their release. This covers a wide variety of potential products based on the data: reports (typically printed and disseminated as PDF, combining tabular views of the aggregate data with explanatory text and analysis), HTML pages displayed on a Web-site, data downloads in various formats (Excel, CSV, etc.), and Web-based interfaces for querying the data, and for doing graphic visualizations, which may even be interactive.
137
138 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can be used as the single XML format for the creation of all other dissemination products, at least for providing the tabular views of the data. (Obviously, websites have more than just data on them.) Again, this can be a very IT-technical topic, but it is important to understand that there are many good technologies for “styling” XML to produce other outputs, including all of the ones typically found on statistical websites.
139
140 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is also directly useful in two ways: as a format for reporting to data collectors and as a direct download format. The use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] as a download format has become increasingly popular, and in some cases has proven to be the most popular form of disseminated data available on Web-sites. Many users prefer this format because it is easy to process (being XML) and it is accompanied by rich metadata, including the [[structural metadata>>doc:sdmx:Glossary.Structural metadata.WebHome]] necessary for applications to process or visualize the data. Further, the format is predictable, allowing for easy use of the data coming from outside the organization.
141
142 It is worth noting that for many organizations, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is being deployed using a Web service (such as those developed by ECB, IMF, and OECD). Such services allow for direct querying of the [[data sources>>doc:sdmx:Glossary.Data source.WebHome]], in [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] format, by any user allowed access to the service. Eurostat is currently developing a “Census [[Hub>>doc:sdmx:Glossary.Hub (dissemination architecture).WebHome]]” Web service for querying the census data to be collected in 2011. Here, the data for each country remain in the database of that country, and the role of the “[[hub>>doc:sdmx:Glossary.Hub (dissemination architecture).WebHome]]” is to broker a user query such that an “[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] Query” is sent to each relevant database, which responds in [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]. The resultant responses are then combined by the [[hub>>doc:sdmx:Glossary.Hub (dissemination architecture).WebHome]].
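The brokering role of a hub can be sketched in a few lines. In the example below each national database is represented by a plain Python function, so no network access or real SDMX query syntax is involved; `broker_query`, `make_stub_source`, the two-letter area labels, and the field names are all hypothetical stand-ins for illustration.

```python
from typing import Callable

# A "source" stands in for one national SDMX-capable database: it takes a
# query key and returns a list of observations as plain dictionaries.
Source = Callable[[str], list]


def broker_query(key: str, sources: dict) -> list:
    """Send the same query to every source and combine the responses,
    tagging each observation with the area it came from."""
    combined = []
    for area, source in sources.items():
        for observation in source(key):
            combined.append({"REF_AREA": area, **observation})
    return combined


def make_stub_source(value: float) -> Source:
    """Build a fake source that answers every query with one observation."""

    def source(key: str) -> list:
        return [{"KEY": key, "OBS_VALUE": value}]

    return source


result = broker_query("A.POP", {"AA": make_stub_source(1.0), "BB": make_stub_source(2.0)})
print(result)
```

The point of the pattern is that the data never leave the national databases until a user asks for them; the hub holds only the knowledge of where to send each query.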
143
144 The “advanced” use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] – where an [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]-capable database can create many dissemination products which transform the [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] into other formats, and even in an on-demand fashion for Web dissemination – can greatly simplify the process of preparing dissemination outputs. Instead of having to produce several parallel forms of the data, having a single [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] source means that, once the data are loaded, print and PDF reports and static Web-pages must still be prepared separately, but all other types of data dissemination are handled by systems which generate the needed outputs (Excel, CSV, graphical visualizations, [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]) in an on-demand fashion.
145
146 The figure below illustrates the basic principle behind this type of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] use.
147
148 [[image:SDMX_2-1_User_Guide_draft_0-1_html_6ed83c791a279c7f.jpg||data-xwiki-image-style-alignment="center" height="378" width="504"]]
149
150 (% style="text-align: center;" %)
151 **{{id name="image_40"/}}Figure 40: SDMX as the pivotal format in a dissemination system**
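The pivot principle shown in the figure above can be sketched with one set of observations rendered into two different outputs on demand. The sketch assumes the observations have already been read from an SDMX source into plain dictionaries; the function names, column names, and the sample value are illustrative assumptions, and real systems would typically apply XML styling or dedicated SDMX tools instead.

```python
import csv
import io
from html import escape


def to_csv(rows: list, columns: list) -> str:
    """Render the observations as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_html_table(rows: list, columns: list) -> str:
    """Render the same observations as a minimal HTML table."""
    head = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[col]))}</td>" for col in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# One made-up observation serves both outputs.
observations = [{"TIME_PERIOD": "2010-08", "OBS_VALUE": 1.29}]
columns = ["TIME_PERIOD", "OBS_VALUE"]
print(to_csv(observations, columns))
print(to_html_table(observations, columns))
```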
152
153 It should be noted that when [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] represents a [[dissemination format>>doc:sdmx:Glossary.Dissemination format.WebHome]] in its own right, the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] structure file containing the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] and all its [[components>>doc:sdmx:Glossary.Component.WebHome]] should be provided along with the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] [[data set>>doc:sdmx:Glossary.Data set.WebHome]] it structures, as users will want both types of files for use in their own systems. In most cases, the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] files will be available from the agency which maintains them, in which case a link to that source should be readily accessible to users (this may be through an [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] – see below).
154
155 Typically, all data products (including on-demand delivery via a Web service or query interface) are loaded into a “staging” environment, so that they can be subjected to [[quality assurance>>doc:sdmx:Glossary.Quality management - quality assurance.WebHome]] before being actually disseminated. [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] does not change this aspect of the dissemination and reporting process, but does place an emphasis on the proper testing of Web-delivery for data.
156
157 The next sub-process in the GSBPM is the **promotion of dissemination** products. [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is extremely useful in this regard, although not perhaps in a fashion which is obvious. This process in the GSBPM is typically seen as the “advertising” of the statistical products, and [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is not much use here except that the use of leading-edge standards may offer some opportunities for promotion (presentations at conferences, etc.).
158
159 Far more interesting in increasing the visibility and use of data is the existence of the [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] Services, which provide a platform for the automatic discovery of data products. Users have become used to the idea that resources can be “Googled”, and while the [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] services are not part of Google itself, they do provide a focused way of searching for all of the data produced within a domain, regardless of which site the data is published on.
160
161 In essence, the [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] Services provide an online catalogue, listing all of the data available within a community. That community can be open or closed, depending on who is allowed access to the catalogue. Thus, there are registries today which only provide access to data collectors, such as the [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] used by the Joint External Debt [[Hub>>doc:sdmx:Glossary.Hub (dissemination architecture).WebHome]] (it is only visible to the organizations which exchange data: the BIS, the IMF, the OECD, and the World Bank). [[SDMX Registries>>doc:sdmx:Glossary.SDMX Registry.WebHome]] can be public, however, which means that any Website or Internet-aware application could search for all of the data listed in that catalogue, and then go to the site where that data is found.
162
163 This is a very powerful thing: increasingly, this approach to locating data is being used, because it leverages the latest generation of Web-based technology. Exposing the existence of your data to these types of sites and applications, and making it queryable or otherwise accessible in [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] format, is a very efficient way to make it visible and available to re-publishers and users of all types.
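The catalogue idea can be illustrated with a small in-memory sketch. It is not an SDMX Registry client and does not use any registry interface. The class names, identifiers and URLs are hypothetical, chosen only to show that a registration links a data flow to the site where the data can be retrieved, and that an application searches the catalogue first and then goes to that site.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Registration:
    """One catalogue entry: which data flow is available, from whom, and where."""
    dataflow_id: str   # hypothetical data flow identifier
    provider: str      # hypothetical organisation publishing the data
    domain: str        # statistical domain used for searching
    source_url: str    # hypothetical location of the SDMX-ML data


class Catalogue:
    """Toy in-memory stand-in for a registry's catalogue (an assumption, not a real API)."""

    def __init__(self) -> None:
        self._entries: List[Registration] = []

    def register(self, entry: Registration) -> None:
        """Add a new registration, as a data provider would when publishing data."""
        self._entries.append(entry)

    def find_by_domain(self, domain: str) -> List[Registration]:
        """Return every registration in a domain, whichever site hosts the data."""
        wanted = domain.casefold()
        return [e for e in self._entries if e.domain.casefold() == wanted]


catalogue = Catalogue()
catalogue.register(Registration("EER", "NSO_A", "exchange rates", "https://nso-a.example/sdmx/eer"))
catalogue.register(Registration("EER", "NSO_B", "exchange rates", "https://nso-b.example/sdmx/eer"))
catalogue.register(Registration("CPI", "NSO_A", "prices", "https://nso-a.example/sdmx/cpi"))

# One search returns the sources of all matching data, across publishing sites.
hits = catalogue.find_by_domain("Exchange Rates")
urls = [e.source_url for e in hits]
```

A real registry would hold far richer metadata for each registration. The point of the sketch is only that discovery happens in one place while the data themselves stay on the publishing sites.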
164
165 === 12.3.3 Archiving (Step 8) ===
166
167 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is not specifically designed to support archiving, which is Step 8 of the GSBPM, but a few aspects of the standard are useful in this process. First, because [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] is XML, it provides a format which is not specific to any particular software package, and this makes it a good archival format. Second, because [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] has an XML expression of the structures in the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], it is always possible to determine how an [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] [[data set>>doc:sdmx:Glossary.Data set.WebHome]] is structured, so the data set remains easy to process. Third, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] has strict rules about versioning. For archival use, this is good, because changes in the [[data sets>>doc:sdmx:Glossary.Data set.WebHome]] and their structures over time can be recorded and stored.
168
169 Thus, while [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is not explicitly designed as an archival format, there are aspects to it which are very useful in this process.
170
171 === 12.3.4 The GSBPM and Other Relevant Aspects of SDMX ===
172
173 One feature of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] that should be mentioned is the ability to document standard statistical processes. This is done by describing a Process, which is made up of Process Steps, which may themselves have sub-steps. Each step or sub-step can have inputs and outputs, and can be named and described. A Process Step can link to another Process Step either as a [[hierarchy>>doc:sdmx:Glossary.Hierarchy.WebHome]] or by reference via a Transition. The Computation involved in a Process Step can be documented, including the actual software used in the Process Step. Note that there is no support yet in [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] for specifying actual computations in a way that can be invoked by software.
174
175 [[image:SDMX_2-1_User_Guide_draft_0-1_html_743cde9d6320fef2.jpg||data-xwiki-image-style-alignment="center" height="333" width="576"]]
176
177 (% style="text-align: center;" %)
178 **{{id name="image_41"/}}Figure 41: Schematic of Model for Process in SDMX**
179
180 The process can be expressed in [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]], so that documentation can be produced in many useful formats, using the same types of transforms described earlier for disseminating statistical [[data sets>>doc:sdmx:Glossary.Data set.WebHome]]. Thus, a PDF or HTML version of a process description could be generated from the XML.
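As a rough illustration of the model, the sketch below represents a process as nested steps with inputs and outputs and renders it as an HTML outline. It is a simplification under stated assumptions: plain Python objects stand in for the SDMX-ML expression of the process, a small rendering function stands in for an XML transform, and Transitions and Computations are left out. The class, field and function names are hypothetical and are not taken from the SDMX schemas.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class ProcessStep:
    """A named step with optional description, inputs, outputs and sub-steps."""
    name: str
    description: str = ""
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    sub_steps: List["ProcessStep"] = field(default_factory=list)


def render_step(step: ProcessStep) -> str:
    """Render one step, and its sub-steps recursively, as an HTML list item."""
    parts = [f"<li><b>{escape(step.name)}</b>"]
    if step.description:
        parts.append(f": {escape(step.description)}")
    if step.inputs:
        parts.append(f" (in: {escape(', '.join(step.inputs))})")
    if step.outputs:
        parts.append(f" (out: {escape(', '.join(step.outputs))})")
    if step.sub_steps:
        parts.append("<ul>" + "".join(render_step(s) for s in step.sub_steps) + "</ul>")
    parts.append("</li>")
    return "".join(parts)


def render_process(name: str, steps: List[ProcessStep]) -> str:
    """Render a whole process as a heading followed by a nested list of steps."""
    return f"<h1>{escape(name)}</h1><ul>" + "".join(render_step(s) for s in steps) + "</ul>"


# Hypothetical process description based on two GSBPM sub-processes.
steps = [
    ProcessStep("5 Process", sub_steps=[
        ProcessStep("5.7 Calculate Aggregates", inputs=["micro-data"], outputs=["aggregates"]),
        ProcessStep("5.8 Finalize Data Files", inputs=["aggregates"], outputs=["SDMX-ML data set"]),
    ]),
]
html_doc = render_process("Example process", steps)
```

In practice the process description would be held in SDMX-ML and transformed with XML tooling. The sketch only shows how a hierarchy of described steps maps onto readable documentation.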
181
182 A particular organization could use the GSBPM as the basis of such a process, describe each input and output, and then send the description to another organization, so that the exact processing of the data is clear.
183
184 There is currently no particular requirement from Eurostat or other data collectors for this functionality of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], but it is being implemented by some NSOs internationally, for internal process descriptions being exchanged between departments within the organization. In future, this feature may be used between organizations as well.
185
186 == 12.4 Summary ==
187
188 Our example involves the micro-data coming from external sources being recoded and aggregated, with consequent reporting and dissemination of the tabulated data. To provide a view of how [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can be used in this scenario, the relevant parts of the GSBPM are highlighted below, and a summary table provided.
189
190 [[image:SDMX_2-1_User_Guide_draft_0-1_html_a844509ed340f102.jpg||data-xwiki-image-style-alignment="center" height="564" width="493"]]
191
192 (% style="text-align: center;" %)
193 **{{id name="image_42"/}}Figure 42: Summary showing the processes supported by SDMX**
194
195 The table below summarizes each step in the GSBPM where [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] is used in our scenario.
196
197 (% style="width:1217.45px" %)
198 |**GSBPM Step**|**Use of SDMX**|(% style="width:505px" %)**Notes**
199 |5.7 Calculate Aggregates|No direct use – may influence earlier steps in collection process|(% style="width:505px" %)Derived variables and recodes must match the requirements of the standard [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]]
200 |5.8 Finalize Data Files|Use of [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] and data formats to format aggregates|(% style="width:505px" %)Used to pass data and structure to subsequent process steps
201 |6.1 Prepare Draft Outputs|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can help to visualize and process data, and is used as a source format for outputs|(% style="width:505px" %)Relies on technologies which easily transform XML into other output formats
202 |6.2 Validate Outputs|[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] provides validation of all rules in the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] (correct [[codes>>doc:sdmx:Glossary.Code.WebHome]], complete and valid descriptions and keys, etc.)|(% style="width:505px" %)Some validation can be performed by XML schema (e.g. use of valid [[codes>>doc:sdmx:Glossary.Code.WebHome]] and [[dimension>>doc:sdmx:Glossary.Dimension.WebHome]] IDs), some can be undertaken with other [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] constructs such as constraints, whilst some cannot be performed using [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] structures, e.g. comparison of numbers to determine plausibility
203 |6.3 Scrutinize and Explain|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations may help to easily view data and generate views for output products|(% style="width:505px" %)
204 |6.4 Apply Disclosure Control|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations help to verify disclosure processing|(% style="width:505px" %)Not a primary application of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], which does not dictate anything about disclosure
205 |6.5 Finalize Outputs|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations may provide views of data for final outputs; outputs may be generated on-demand for dissemination on Website, etc.|(% style="width:505px" %)[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] data must be updated if data are corrected
206 |7.1 Update Output Systems|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] provides a useful format for loading into output systems|(% style="width:505px" %)Most technology tools and databases provide good support for XML formats such as [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]
207 |7.2 Produce Dissemination Products|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations may provide views of data for final outputs; outputs may be generated on-demand for dissemination on Website, etc.|(% style="width:505px" %)
208 |7.3 Manage Release of Dissemination Products|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] serves as a format for reporting and dissemination to some users/data collectors; serves as basis for generation of other outputs, whether static or on-demand|(% style="width:505px" %)
209 |7.4 Promote Dissemination Products|(((
210 Use of [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] Services
211
212 provides a high level of visibility for data
213 )))|(% style="width:505px" %)Depends on the availability of a domain registry for this purpose – requires that new data be registered automatically or manually
214 |8.2 Manage Archive Repository|(((
215 [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] provides an easy format for generation of formats needed, based on the user demands on the archive;
216
217 strict version control allows for explicit management of dependencies between data and metadata
218 )))|(% style="width:505px" %)
219 |8.3 Preserve Data and Associated Metadata|Rich metadata and application/platform independence make [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] a good archival format|(% style="width:505px" %)
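
The note for step 6.2 separates checks that follow from the structure (valid codes and dimension IDs) from checks that structural metadata cannot express (plausibility of the numbers). The sketch below illustrates that split. It does not perform XML schema validation of SDMX-ML. A hypothetical dictionary stands in for the code lists referenced by a DSD, and the 50% threshold in the plausibility check is an arbitrary assumption.

```python
from typing import Dict, List, Set

# Hypothetical DSD fragment: each dimension ID maps to its allowed codes.
DSD_CODES: Dict[str, Set[str]] = {
    "FREQ": {"A", "Q", "M"},
    "REF_AREA": {"FR", "DE", "IT"},
}


def structural_errors(obs_key: Dict[str, str]) -> List[str]:
    """Checks that follow from the structure: complete key, known dimensions, valid codes."""
    errors = []
    for dim in DSD_CODES:
        if dim not in obs_key:
            errors.append(f"missing dimension {dim}")
    for dim, code in obs_key.items():
        if dim not in DSD_CODES:
            errors.append(f"unknown dimension {dim}")
        elif code not in DSD_CODES[dim]:
            errors.append(f"invalid code {code} for {dim}")
    return errors


def is_plausible(value: float, previous: float, max_change: float = 0.5) -> bool:
    """A check outside the structure: accept only changes up to max_change (assumed 50%)."""
    if previous == 0:
        return value == 0
    return abs(value - previous) / abs(previous) <= max_change


ok_key = {"FREQ": "M", "REF_AREA": "FR"}
bad_key = {"FREQ": "X", "REF_AREA": "FR"}
```

Here the first function depends only on the structure, so it could be generated from the DSD, while the second needs statistical knowledge that has to be supplied separately.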
220
221 The benefits of using [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] here are several:
222
223 * Standard data structures, statistical concepts and classifications, and formats make it easy to process and compare similar types of data from different national sources, both for data collectors and other users
224 * Richer [[dissemination format>>doc:sdmx:Glossary.Dissemination format.WebHome]], complete with metadata, supports not only good visualization of data, but also allows easy downloading and use of data in internal systems
225 * [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] provides an excellent format for having a single source of data which can be easily transformed into different output formats for use
226 * Data becomes easier to find and use, through [[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]] Services, promoting the use of the data
227 * Data is archived in a long-lived format, independent of applications/platforms, and is accompanied by rich metadata, managed according to strict versioning rules
228
229 In some other scenarios, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] might also be useful as a data collection format, but in the case where micro-data are aggregated, the use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] will be similar to that described here.