Changes for page 12 Annex 2 – SDMX Business Process Model
Last modified by Artur on 2025/09/10 11:19
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Author
... ... @@ -1,1 +1,1 @@ 1 -xwiki:XWiki.he lena1 +xwiki:XWiki.arturkryazhev - Content
... ... @@ -6,39 +6,41 @@ 6 6 7 7 The Generic Statistical Business Process Model (GSBPM) is a reference model of the statistical production life-cycle in national statistical agencies, developed by the METIS group in UN/ECE. The work was based on many earlier models, and represents a view of statistical production which is now being accepted as the standard view. 8 8 9 -For this reason, we are using the GSBPM as the basis of an example, demonstrating how [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]fits into the work of a national-level statistical agency.9 +For this reason, we are using the GSBPM as the basis of an example, demonstrating how SDMX fits into the work of a national-level statistical agency. 10 10 11 -This example is not a technical one – rather, it is meant to describe the use of [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]from a business perspective: how, where, and why is[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]used? These questions will be answered by using the example of effective exchange rates.11 +This example is not a technical one – rather, it is meant to describe the use of SDMX from a business perspective: how, where, and why is SDMX used? These questions will be answered by using the example of effective exchange rates. 12 12 13 -It is important to note that in some scenarios, where collected data is already in the form of aggregates, [[SDMX>>doc:sdmx:Glossary.Statisticaldata andmetadata exchange.WebHome]] might be used earlier in the business process. However, for NSOs the most common scenario is probably where micro-data are collected and aggregated at the national(% style="color:#e74c3c" %)level(%%).13 +It is important to note that in some scenarios, where collected data is already in the form of aggregates, SDMX might be used earlier in the business process. 
However, for NSOs the most common scenario is probably where micro-data are collected and aggregated at the national level. 14 14 15 -There are many benefits to the use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], and while many of these are related to the use of technology, in the end the real benefits are simple: it becomes easier for users to locate and utilize data, and the data themselves are more comparable. Further, the data become easier to visualize and format, into whatever form is needed, either for the creation of dissemination outputs, or for re-formatting by data users or collectors.15 +There are many benefits to the use of SDMX, and while many of these are related to the use of technology, in the end the real benefits are simple: it becomes easier for users to locate and utilize data, and the data themselves are more comparable. Further, the data become easier to visualize and format, into whatever form is needed, either for the creation of dissemination outputs, or for re-formatting by data users or collectors. 16 16 17 17 == 12.2 High Level Schematic of the GSBPM == 18 18 19 19 It is important to have at least a high-level understanding of the GSBPM, as shown in the diagram below. 20 20 21 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_c0e5b4561dbdfc6c.jpg||data-xwiki-image-style-alignment="center" height="495" width="684"]] 21 +(% style="text-align: center;" %) 22 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_c0e5b4561dbdfc6c.jpg||height="495" width="684"]] 22 22 23 23 (% style="text-align: center;" %) 24 24 **{{id name="image_35"/}}Figure 35: High-level schematic of the GSBPM** 25 25 26 -Across the top of the diagram, we see the high- (% style="color:#e74c3c" %)level(%%)process steps, from 1 to 9. The process begins with the evaluation of data collection needs, and proceeds through the design and creation of data-collection instruments, and then moves on to the actual collection of data. 
Once collected, data are processed, coded, edited,[[imputation>>doc:sdmx:Glossary.Imputation.WebHome]]is performed, weights are calculated, and the data are aggregated.27 +Across the top of the diagram, we see the high-level process steps, from 1 to 9. The process begins with the evaluation of data collection needs, and proceeds through the design and creation of data-collection instruments, and then moves on to the actual collection of data. Once collected, data are processed, coded, edited, imputation is performed, weights are calculated, and the data are aggregated. 27 27 28 28 Up to this point (i.e. 5.6), the GSBPM has been concerned with the collection and processing of micro-data (at least from the perspective of an NSO – from the perspective of a supra-national organization, the collected data may themselves often be aggregates.) 29 29 30 -For this example, we show how [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]canbe used from the point of aggregation forward, as we move through the GSBPM. For our purposes, then, we will focus on steps 5.7 and later, as shown below:31 +For this example, we show how SDMX can be used from the point of aggregation forward, as we move through the GSBPM. 
For our purposes, then, we will focus on steps 5.7 and later, as shown below: 31 31 32 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_2adef8aeaf6c1bc0.jpg||data-xwiki-image-style-alignment="center" height="560" width="472"]] 33 +(% style="text-align: center;" %) 34 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_2adef8aeaf6c1bc0.jpg||height="560" width="472"]] 33 33 34 34 (% style="text-align: center;" %) 35 35 **Figure 36: The part of the GSBPM supported by SDMX** 36 36 37 -To understand how [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]canbe used throughout this process, we need to look not only at the internal business process of the Central Bank or NSO, but also at the collection framework of the organization to which the aggregate data are reported.39 +To understand how SDMX can be used throughout this process, we need to look not only at the internal business process of the Central Bank or NSO, but also at the collection framework of the organization to which the aggregate data are reported. 38 38 39 -Because [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]focuses on the exchange of statistics, it will be necessary to consider the organization in our example which will be performing the collection. This will involve some constructs external to, but accessible by, the compiling organization – notably an[[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]], with the various constructs that it contains ([[data flows>>doc:sdmx:Glossary.Dataflow.WebHome]],[[data providers>>doc:sdmx:Glossary.Data provider.WebHome]], etc.).41 +Because SDMX focuses on the exchange of statistics, it will be necessary to consider the organization in our example which will be performing the collection. This will involve some constructs external to, but accessible by, the compiling organization – notably an SDMX Registry, with the various constructs that it contains (data flows, data providers, etc.). 
40 40 41 - [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is not only for reporting of aggregates, however – it also performs useful functions in the dissemination of data directly to users, and the archiving of data within the organization, so these functions of the organization will also be included in our example.43 +SDMX is not only for reporting of aggregates, however – it also performs useful functions in the dissemination of data directly to users, and the archiving of data within the organization, so these functions of the organization will also be included in our example. 42 42 43 43 == 12.3 GSBPM and SDMX == 44 44 ... ... @@ -48,25 +48,28 @@ 48 48 49 49 Once data have been aggregated from the micro-data (Step 5.7 of the GSBPM) they will be stored in some format such as a relational database or data warehouse (Oracle, etc.) or in some processing format (SAS, SPSS), or in Excel spreadsheets or similar format. This will depend on the internal system and tools used within the organization, and is different in different organizations. 50 50 51 -In order to capture the aggregates in a standard [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]format, we must first look at the required data structures as dictated by the collecting agency. In our example, we will use the Effective Exchange Rates data structure developed by the ECB^^[[(% class="wikiinternallink wikiinternallink wikiinternallink wikiinternallink wikiinternallink wikiinternallink wikiinternallinkwikiinternallink" %)^^4^^>>path:#sdfootnote4sym||name="sdfootnote4anc"]](%%)^^. Below is an example of the type of data which is contained in a[[data set>>doc:sdmx:Glossary.Dataset.WebHome]] structured according to this[[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]].53 +In order to capture the aggregates in a standard SDMX format, we must first look at the required data structures as dictated by the collecting agency. 
In our example, we will use the Effective Exchange Rates data structure developed by the ECB^^[[(% class="wikiinternallink wikiinternallink wikiinternallink wikiinternallink wikiinternallink wikiinternallink wikiinternallink" %)^^4^^>>path:#sdfootnote4sym||name="sdfootnote4anc"]](%%)^^. Below is an example of the type of data which is contained in a data set structured according to this DSD. 52 52 53 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_e5d197f835ebb755.jpg||data-xwiki-image-style-alignment="center" height="529" width="575"]] 55 +(% style="text-align: center;" %) 56 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_e5d197f835ebb755.jpg||height="529" width="575"]] 54 54 55 55 (% style="text-align: center;" %) 56 56 **{{id name="image_37"/}}Figure 37: Example ECB Data of Effective Exchange Rates** 57 57 58 -It is not important from the NSO or Central Bank’s perspective to understand every aspect of the analysis which went into the creation of the data structure, as the [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] [[Data Structure Definition>>doc:sdmx:Glossary.Datastructure definition.WebHome]]tobe used for reporting will be provided by the agency collecting the data. All data reporters are expected to use the same[[data structure definition>>doc:sdmx:Glossary.Datastructure definition.WebHome]]([[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]]).61 +It is not important from the NSO or Central Bank’s perspective to understand every aspect of the analysis which went into the creation of the data structure, as the SDMX Data Structure Definition to be used for reporting will be provided by the agency collecting the data. All data reporters are expected to use the same data structure definition (DSD). 59 59 60 -It is important to understand the [[data structure definition>>doc:sdmx:Glossary.Data structure definition.WebHome]], because this is the resource which describes how the reported aggregates themselves must be structured. 
Below is a view of the Effective Exchange Rates[[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]].63 +It is important to understand the data structure definition, because this is the resource which describes how the reported aggregates themselves must be structured. Below is a view of the Effective Exchange Rates DSD. 61 61 62 -The [[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]] is an XML file, created according to the[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]standard (it can also be expressed in[[SDMX-EDI>>doc:sdmx:Glossary.SDMX-EDI.WebHome]], but the XML(% style="color:#e74c3c" %)version(%%)is more common). The fact that it is XML is important to the IT staff who must process it, and is used by many[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] tools, but what is most important is that the statistical(% style="color:#e74c3c" %)concepts(%%)and codelists (or “classifications”) it uses are also used in the reported aggregate data file. Thus, we do not look at the XML for this example – it is enough to know that the XML(% style="color:#e74c3c" %)version(%%)of the[[DSD>>doc:sdmx:Glossary.Datastructuredefinition.WebHome]] exists, and may be needed by tools or developers at some point.65 +The DSD is an XML file, created according to the SDMX-ML standard (it can also be expressed in SDMX-EDI, but the XML version is more common). The fact that it is XML is important to the IT staff who must process it, and is used by many SDMX tools, but what is most important is that the statistical concepts and codelists (or “classifications”) it uses are also used in the reported aggregate data file. Thus, we do not look at the XML for this example – it is enough to know that the XML version of the DSD exists, and may be needed by tools or developers at some point. 
63 63 64 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_bd8d57aa9f34e7ce.jpg||data-xwiki-image-style-alignment="center" height="227" width="575"]] 67 +(% style="text-align: center;" %) 68 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_bd8d57aa9f34e7ce.jpg||height="227" width="575"]] 65 65 66 66 (% style="text-align: center;" %) 67 -**{{id name="image_38"/}}Figure 38: Example [[Data Structure Definition>>doc:sdmx:Glossary.Datastructure definition.WebHome]]([[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]]) for Effective Exchange Rates with[[Code>>doc:sdmx:Glossary.Code.WebHome]]List**71 +**{{id name="image_38"/}}Figure 38: Example Data Structure Definition (Dimensions) for Effective Exchange Rates with Code List** 68 68 69 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_7e32b093ddbdc2d7.jpg||data-xwiki-image-style-alignment="center" height="398" width="576"]] 73 +(% style="text-align: center;" %) 74 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_7e32b093ddbdc2d7.jpg||height="398" width="576"]] 70 70 71 71 (% style="text-align: center;" %) 72 72 **{{id name="image_39"/}}Figure 39: Example Data Structure Definition (Attributes) for Effective Exchange Rates** ... ... @@ -73,157 +73,160 @@ 73 73 74 74 The view here shows a number of very important things: 75 75 76 -The ID, Agency, and (% style="color:#e74c3c" %)version(%%)of the[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]]are displayed at the top of the screen. Below this, there is a listing of statistical(% style="color:#e74c3c" %)concepts(%%)which are used as[[dimensions>>doc:sdmx:Glossary.Dimension.WebHome]].(Frequency,[[Currency>>doc:sdmx:Glossary.Currency.WebHome]], Exchange Rate Type, etc.) 
Each of these(% style="color:#e74c3c" %)concepts(%%)will be taken from some authoritative source, and descriptions and definitions can be obtained from the organization which publishes and maintains the[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]].In many cases, these(% style="color:#e74c3c" %)concepts(%%)will be the standard(% style="color:#e74c3c" %)concepts(%%)defined in the[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]Cross-Domain Statistical(% style="color:#e74c3c" %)Concepts(%%)document (which can be obtained at [[__www.__>>url:http://www.sdmx.org/]][[sdmx>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]].org[[)>>url:http://www.sdmx.org/]]. In other cases, they may be formally defined and documented by the maintaining agency which publishes the[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]].It is important to understand the definition of these(% style="color:#e74c3c" %)concepts(%%), as they may be slightly different from those used by the NSO or Central Bank, but in most cases they will likely be very similar or the same as the(% style="color:#e74c3c" %)concepts(%%)already used at the national(% style="color:#e74c3c" %)level(%%)for this data.81 +The ID, Agency, and version of the DSD are displayed at the top of the screen. Below this, there is a listing of statistical concepts which are used as dimensions. (Frequency, Currency, Exchange Rate Type, etc.) Each of these concepts will be taken from some authoritative source, and descriptions and definitions can be obtained from the organization which publishes and maintains the DSD. In many cases, these concepts will be the standard concepts defined in the SDMX Cross-Domain Statistical Concepts document (which can be obtained at [[__www.sdmx.org__>>url:http://www.sdmx.org/]][[)>>url:http://www.sdmx.org/]]. In other cases, they may be formally defined and documented by the maintaining agency which publishes the DSD. 
It is important to understand the definition of these concepts, as they may be slightly different from those used by the NSO or Central Bank, but in most cases they will likely be very similar or the same as the concepts already used at the national level for this data. 77 77 78 -The (% style="color:#e74c3c" %)concepts(%%)used as[[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]]each take a value which has a standard[[representation>>doc:sdmx:Glossary.Representation.WebHome]].In most cases, this[[representation>>doc:sdmx:Glossary.Representation.WebHome]]will be a codelist – a standard classification which must be used to identify and describe the observations. In the righthand part of the screen, we can see which codelist is used to represent each(% style="color:#e74c3c" %)concept(%%)used as a[[dimension>>doc:sdmx:Glossary.Dimension.WebHome]].For example, the Frequency[[dimension>>doc:sdmx:Glossary.Dimension.WebHome]]uses a codelist called “CL_FREQ”(% style="color:#e74c3c" %)version(%%)1.0:83 +The concepts used as Dimensions each take a value which has a standard representation. In most cases, this representation will be a codelist – a standard classification which must be used to identify and describe the observations. In the righthand part of the screen, we can see which codelist is used to represent each concept used as a dimension. For example, the Frequency dimension uses a codelist called “CL_FREQ” version 1.0: 79 79 80 80 Frequency is perhaps the simplest example, as the reporting agency will generally know what the frequency of the data is, and have a record of this in their systems (quarterly, monthly, etc.) 81 81 82 -Notice that the Time [[dimension>>doc:sdmx:Glossary.Dimension.WebHome]]is not coded, but instead has a time value, indicating the time of the observation.87 +Notice that the Time dimension is not coded, but instead has a time value, indicating the time of the observation. 
83 83 84 -The values for the [[codes>>doc:sdmx:Glossary.Code.WebHome]]may or may not match the classification used at the national(% style="color:#e74c3c" %)level(%%), and if they do not match then it will be necessary to(% style="color:#e74c3c" %)map(%%)the[[codes>>doc:sdmx:Glossary.Code.WebHome]]used internally against the classification used by the[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]].89 +The values for the codes may or may not match the classification used at the national level, and if they do not match then it will be necessary to map the codes used internally against the classification used by the SDMX DSD. 85 85 86 -For each statistical (% style="color:#e74c3c" %)concept(%%)used as a[[dimension>>doc:sdmx:Glossary.Dimension.WebHome]], it must be possible to provide a value as specified by the[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] [[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]].This might seem like a lot of work, but it is done for obvious and important reasons – if the collected data are to be comparable at the supranational(% style="color:#e74c3c" %)level(%%), then there must be a standard expression of the data, using the same statistical(% style="color:#e74c3c" %)concepts(%%)and classifications to identify and describe the observations. This is no different than when reporting aggregate data today – each data collector will want to have a specific expression of the data collected. 
What is different, with[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], however, is that the data collectors are harmonizing the DSDs used in each domain, and there is an effort, through the[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] [[Content-Oriented Guidelines>>doc:sdmx:Glossary.Content-Oriented Guidelines.WebHome]], to use identical statistical(% style="color:#e74c3c" %)concepts(%%)and[[representations>>doc:sdmx:Glossary.Representation.WebHome]]where this is possible.91 +For each statistical concept used as a dimension, it must be possible to provide a value as specified by the SDMX DSD. This might seem like a lot of work, but it is done for obvious and important reasons – if the collected data are to be comparable at the supranational level, then there must be a standard expression of the data, using the same statistical concepts and classifications to identify and describe the observations. This is no different than when reporting aggregate data today – each data collector will want to have a specific expression of the data collected. What is different, with SDMX, however, is that the data collectors are harmonizing the DSDs used in each domain, and there is an effort, through the SDMX Content-Oriented Guidelines, to use identical statistical concepts and representations where this is possible. 
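The mapping of internal codes against the classification used by the DSD, as described above, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the national labels, the dictionary `NATIONAL_TO_CL_FREQ` and the helper `map_codes` are hypothetical names, and the code values shown are a small assumed subset of a frequency codelist such as CL_FREQ, not the authoritative list published by the maintaining agency.

```python
# Hypothetical mapping from internal (national) frequency labels to the codes
# of a frequency codelist such as CL_FREQ. Illustrative subset only; the real
# codelist is the one referenced by the DSD of the collecting agency.
NATIONAL_TO_CL_FREQ = {
    "annual": "A",
    "quarterly": "Q",
    "monthly": "M",
    "daily": "D",
}


def map_codes(internal_values, mapping):
    """Translate internal codes to DSD codes.

    Returns two lists: the mapped codes, and the internal values for which
    no mapping exists (these need attention before the data can be reported).
    """
    mapped, unmapped = [], []
    for value in internal_values:
        key = value.strip().lower()  # tolerate case and whitespace differences
        if key in mapping:
            mapped.append(mapping[key])
        else:
            unmapped.append(value)
    return mapped, unmapped


mapped, unmapped = map_codes(
    ["Monthly", "quarterly", "fortnightly"], NATIONAL_TO_CL_FREQ
)
print("mapped:", mapped)
print("unmapped:", unmapped)
```

Values that end up in `unmapped` are the cases where the national classification has no counterpart in the DSD codelist, so they have to be resolved before a value can be provided for that dimension.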
87 87 88 -Harmonization of data is a difficult process, but it is one which will result, in time, in more useful data (because it is more comparable), and also hopefully in a more uniform collection of data at the national (% style="color:#e74c3c" %)level(%%), because all reporting countries must conform to a standard[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]]as they calculate aggregates, which in turn impacts the data collection process as shown in the GSBPM.93 +Harmonization of data is a difficult process, but it is one which will result, in time, in more useful data (because it is more comparable), and also hopefully in a more uniform collection of data at the national level, because all reporting countries must conform to a standard DSD as they calculate aggregates, which in turn impacts the data collection process as shown in the GSBPM. 89 89 90 -If we look again at the high- (% style="color:#e74c3c" %)level(%%)picture of the[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]](above), we will also see a section which shows statistical(% style="color:#e74c3c" %)concepts(%%)used as “[[Attributes>>doc:sdmx:Glossary.Attribute.WebHome]]”. These are descriptive(% style="color:#e74c3c" %)concepts(%%), sometimes represented with[[codes>>doc:sdmx:Glossary.Code.WebHome]], or sometimes with strings. They are different from Dimensional(% style="color:#e74c3c" %)concepts(%%), because they are not always required – in the table [[image:SDMX_2-1_User_Guide_draft_0-1_html_7b5b4145814c1ed2.jpg||height="25" width="21"]] indicates “Conditional” and [[image:SDMX_2-1_User_Guide_draft_0-1_html_711950811766087b.jpg||height="21" width="23"]] indicates “Mandatory”.95 +If we look again at the high-level picture of the DSD (above), we will also see a section which shows statistical concepts used as “Attributes”. These are descriptive concepts, sometimes represented with codes, or sometimes with strings. 
They are different from Dimensional concepts, because they are not always required – in the table [[image:SDMX_2-1_User_Guide_draft_0-1_html_7b5b4145814c1ed2.jpg||height="25" width="21"]] indicates “Conditional” and [[image:SDMX_2-1_User_Guide_draft_0-1_html_711950811766087b.jpg||height="21" width="23"]] indicates “Mandatory”. 91 91 92 -An [[Attribute>>doc:sdmx:Glossary.Attribute.WebHome]]also has a relationship “[[Attribute Relationship>>doc:sdmx:Glossary.Attribute relationship.WebHome]]” to a construct such as a group of[[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]], which can be one or more[[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]], Observation etc. as discussed in Chapter 4.97 +An Attribute also has a relationship “Attribute Relationship” to a construct such as a group of Dimensions, which can be one or more Dimensions, Observation etc. as discussed in Chapter 4. 93 93 94 -In other ways, the [[Attributes>>doc:sdmx:Glossary.Attribute.WebHome]]are very similar to the[[Dimensions>>doc:sdmx:Glossary.Dimension.WebHome]]of the[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]]– the coding (if they are coded) must be standard, as dictated by the[[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], and for the same reasons.99 +In other ways, the Attributes are very similar to the Dimensions of the DSD – the coding (if they are coded) must be standard, as dictated by the DSD, and for the same reasons. 95 95 96 96 ==== 12.3.1.2 Formatting Aggregates with SDMX ==== 97 97 98 -Once it has been determined that the aggregate data can be expressed as [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], according to the rules of the[[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], then we need to think about what is involved in actually creating the[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]format for the data. 
This is important because if we can express the aggregates as[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]], there are a number of tools which will become useful in performing later activities in the GSBPM.103 +Once it has been determined that the aggregate data can be expressed as SDMX, according to the rules of the DSD, then we need to think about what is involved in actually creating the SDMX-ML format for the data. This is important because if we can express the aggregates as SDMX-ML, there are a number of tools which will become useful in performing later activities in the GSBPM. 99 99 100 -There are several techniques for creating [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]][[data sets>>doc:sdmx:Glossary.Data set.WebHome]], and several choices will need to be made. First, there are several “flavours” of[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]: [[SDMX-EDI>>doc:sdmx:Glossary.SDMX-EDI.WebHome]](also known as GESMES/TS) and four types of[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]](the XML(% style="color:#e74c3c" %)version(%%)). This is a technical consideration, and it is typically the case that the data collector will dictate exactly which format is wanted. The XML formats include “Generic”, and “Structure Specific”. Different organizations use different flavours of[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]], but it is important to note that there are free tools available which will allow for transformations between these different flavours.105 +There are several techniques for creating SDMX-ML data sets, and several choices will need to be made. First, there are several “flavours” of SDMX: SDMX-EDI (also known as GESMES/TS) and four types of SDMX-ML (the XML version). This is a technical consideration, and it is typically the case that the data collector will dictate exactly which format is wanted. The XML formats include “Generic”, and “Structure Specific”. 
Different organizations use different flavours of SDMX-ML, but it is important to note that there are free tools available which will allow for transformations between these different flavours. 101 101 102 102 These are technical considerations which should be left to the IT staff, so we will not go into them in depth here – the reasons for using one or another are most often purely IT-technical ones. 103 103 104 -We do need to look at the practical options for creating the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]files, however. The options are discussed in Chapter 4- Data and Metadata Creation and Reporting and the technical mechanism for achieving different outputs from a database is discussed in Annex 4 – Data Reader and Data Writer Functions. There are several types of tools which will allow the formatting of aggregates into the correct form: tools based on Excel, tools which take the data from a relational database such as Oracle, and tools which work within processing tools such as SAS or PCAxis. Again, we will not look at the technical details of these tools, as this is an IT issue, but it is important to be aware that there are several different tools available. Eurostat provides a free tool for “data mapping” which is broadly useful, and PCAxis will have native[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]support built into it in future(% style="color:#e74c3c" %)versions(%%). It should be noted that when working with processing applications such as SAS and SPSS, it is often the case that dedicated scripts will need to be written within those environments, as different national formats within those applications will require specific formatting scripts to produce[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]outputs.109 +We do need to look at the practical options for creating the SDMX-ML files, however. 
The options are discussed in Chapter 4- Data and Metadata Creation and Reporting and the technical mechanism for achieving different outputs from a database is discussed in Annex 4 – Data Reader and Data Writer Functions. There are several types of tools which will allow the formatting of aggregates into the correct form: tools based on Excel, tools which take the data from a relational database such as Oracle, and tools which work within processing tools such as SAS or PCAxis. Again, we will not look at the technical details of these tools, as this is an IT issue, but it is important to be aware that there are several different tools available. Eurostat provides a free tool for “data mapping” which is broadly useful, and PCAxis will have native SDMX support built into it in future versions. It should be noted that when working with processing applications such as SAS and SPSS, it is often the case that dedicated scripts will need to be written within those environments, as different national formats within those applications will require specific formatting scripts to produce SDMX-ML outputs. 105 105 106 106 ==== 12.3.1.3 SDMX and Analysis of Aggregates (Step 6) ==== 107 107 108 -It may not seem obvious that [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is relevant to the process of analysis of aggregates, but it can sometimes be very useful. This will depend on which tools are used by an NSO to perform these various steps. Because most systems work well with XML generally – and because[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]is one flavour of XML –[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]canprovide some useful functions as the aggregates are analyzed and further processed.113 +It may not seem obvious that SDMX is relevant to the process of analysis of aggregates, but it can sometimes be very useful. This will depend on which tools are used by an NSO to perform these various steps. 
Because most systems work well with XML generally – and because SDMX-ML is one flavour of XML – SDMX can provide some useful functions as the aggregates are analyzed and further processed. 109 109 110 -In the preparation of draft outputs (Step 6.1), it may be helpful to use any of the various visualization tools based on [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]when looking at the data. Tools exist for doing graphical visualizations of the[[SDMX>>doc:sdmx:Glossary.Statisticaldataand metadata exchange.WebHome]] data, using modern technology packages such as the Flex[[code>>doc:sdmx:Glossary.Code.WebHome]]developed by the European Central Bank (http:~/~/[[code>>doc:sdmx:Glossary.Code.WebHome]].google.com/p/flexcb/ ). Other packages also exist, provided by various commercial vendors. Other free tools exist for producing Excel spreadsheets and HTML displays of the data.115 +In the preparation of draft outputs (Step 6.1), it may be helpful to use any of the various visualization tools based on SDMX when looking at the data. Tools exist for doing graphical visualizations of the SDMX data, using modern technology packages such as the Flex code developed by the European Central Bank (http:~/~/code.google.com/p/flexcb/ ). Other packages also exist, provided by various commercial vendors. Other free tools exist for producing Excel spreadsheets and HTML displays of the data. 111 111 112 -Especially if files are passed between several individuals while the draft outputs are prepared, it may be useful to exchange the [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]file, so that different individuals can use different visualizations of the same data while performing this work.117 +Especially if files are passed between several individuals while the draft outputs are prepared, it may be useful to exchange the SDMX-ML file, so that different individuals can use different visualizations of the same data while performing this work. 
113 113 114 -The validation of outputs (Step 6.2) requires more than just data visualization, and it is here that [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]can provide some solid benefit. Some of the validation rules exist within the[[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], and these can be automatically checked using free[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadataexchange.WebHome]] data and [[metadataset>>doc:sdmx:Glossary.Metadataset.WebHome]] tools, others exist within an[[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]where cross references, versioning, and request for deletions are validated to ensure the integrity of the[[structural metadata>>doc:sdmx:Glossary.Structural metadata.WebHome]].119 +The validation of outputs (Step 6.2) requires more than just data visualization, and it is here that SDMX-ML can provide some solid benefit. Some of the validation rules exist within the DSD, and these can be automatically checked using free SDMX data and metadata set tools; others exist within an SDMX Registry, where cross-references, versioning, and requests for deletion are validated to ensure the integrity of the structural metadata. 115 115 116 -What [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] cannot validate is that the numbers reported are correct in terms of other values in the[[data set>>doc:sdmx:Glossary.Dataset.WebHome]]– that is, are they plausible values given the numbers reported in preceding periods, or in relation to other reported data. 
These are statistical issues that cannot be solved by[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]-based technology, but which will require dedicated checks created by a statistician who understand the statistical issues.121 +What SDMX cannot validate is that the numbers reported are correct in terms of other values in the data set – that is, are they plausible values given the numbers reported in preceding periods, or in relation to other reported data. These are statistical issues that cannot be solved by SDMX-based technology, but which will require dedicated checks created by a statistician who understands the statistical issues. 117 117 118 -**Scrutinizing and explaining** the aggregates (Step 6.3) is something which typically involves visualization of the data (as for Step 6.1) but may also include the creation of specific tabular views for inclusion in reports. The same tools which provide the ability to visualize [[SDMX>>doc:sdmx:Glossary.Statisticaldataandmetadata exchange.WebHome]] data may also allow for the creation of tabular views for use in reports (Excel tables, etc.) but this will vary based on the systems within each NSO or Central Bank.123 +**Scrutinizing and explaining** the aggregates (Step 6.3) is something which typically involves visualization of the data (as for Step 6.1) but may also include the creation of specific tabular views for inclusion in reports. The same tools which provide the ability to visualize SDMX data may also allow for the creation of tabular views for use in reports (Excel tables, etc.) but this will vary based on the systems within each NSO or Central Bank. 119 119 120 -There is nothing in [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]which directly addresses **disclosure control** (Step 6.4) or the finalization of outputs (6.5), other than the use of visualization tools as described for earlier parts of Step 6. 
However, it should be noted that any corrections or edits to the data will need to be reflected in the[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]data to be reported. Depending on how the[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]is generated, this may involve going back to the tools and systems used to format the[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]in the first place, and making sure that the correct data are available in those tools for re-formatting as[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]].125 +There is nothing in SDMX which directly addresses **disclosure control** (Step 6.4) or the finalization of outputs (6.5), other than the use of visualization tools as described for earlier parts of Step 6. However, it should be noted that any corrections or edits to the data will need to be reflected in the SDMX-ML data to be reported. Depending on how the SDMX-ML is generated, this may involve going back to the tools and systems used to format the SDMX-ML in the first place, and making sure that the correct data are available in those tools for re-formatting as SDMX-ML. 121 121 122 122 === 12.3.2 Reporting/Dissemination (Step 7) === 123 123 124 124 Step 7 of the GSBPM covers the process of dissemination in its broadest sense – that is, all users of the data are the target of this process step, including organizations which collect the aggregate data from NSOs and Central Banks. Thus, the GSBPM addresses reporting and dissemination as a single set of activities. 125 125 126 -There are several types of data dissemination, and when we consider dissemination and reporting using the Internet this [[category>>doc:sdmx:Glossary.Category.WebHome]]is very broad. As we look at each sub-process in this step, we will need to consider this broad range of possibilities.131 +There are several types of data dissemination, and when we consider dissemination and reporting using the Internet this category is very broad. 
As we look at each sub-process in this step, we will need to consider this broad range of possibilities. 127 127 128 -In addition to the sub-processes described by the GSBPM, we also need to consider one aspect of [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]thatpotentially concerns all forms of reporting and dissemination, the[[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]Services (see[[SDMX Registry>>doc:sdmx:Glossary.SDMX Registry.WebHome]]/Repository).133 +In addition to the sub-processes described by the GSBPM, we also need to consider one aspect of SDMX that potentially concerns all forms of reporting and dissemination, the SDMX Registry Services (see SDMX Registry/Repository). 129 129 130 130 The first sub-process in Step 7 is the **updating of output systems**. This involves taking the aggregates as prepared in Step 6, and loading them into whatever systems are used to drive dissemination. Typically, this will involve database systems (e.g. Oracle) and - if the same database is not used to drive Web dissemination – also loading data into whatever system drives the views of data on the Website. 131 131 132 - [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]canbe used as a format for the **exchange of data between systems**, whether these systems are internal to an organization, or external, and thus it makes a good format for loading databases used in all types of dissemination. Further, because it is an XML format,[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]can be used as inputs to systems for creating HTML, PDF, Excel, and other output formats. An[[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]can make the reporting of such data more automated by using the data registration mechanism supported by a registry. The benefit of such a system is that – once new data have been “registered” (see below), the data collector can come and simply query the service for the new data. 
This helps to ease the burden of data reporting.137 +SDMX can be used as a format for the **exchange of data between systems**, whether these systems are internal to an organization, or external, and thus it makes a good format for loading databases used in all types of dissemination. Further, because it is an XML format, SDMX-ML can be used as input to systems for creating HTML, PDF, Excel, and other output formats. An SDMX Registry can make the reporting of such data more automated by using the data registration mechanism supported by a registry. The benefit of such a system is that – once new data have been “registered” (see below), the data collector can come and simply query the service for the new data. This helps to ease the burden of data reporting. 133 133 134 -This application of [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] tends to be very technical – because XML is well-supported by many types of systems, it is useful also in loading the databases used to drive dissemination. The details of this application are not something we will explore in any detail here.139 +This application of SDMX tends to be very technical – because XML is well-supported by many types of systems, it is useful also in loading the databases used to drive dissemination. The details of this application are not something we will explore in any detail here. 135 135 136 136 The next sub-process in the GSBPM is the **preparation of outputs**, and the management of their release. This covers a wide variety of potential products based on the data: reports (typically printed and disseminated as PDF, combining tabular views of the aggregate data with explanatory text and analysis), HTML pages displayed on a Web-site, data downloads in various formats (Excel, CSV, etc.), and Web-based interfaces for querying the data, and for doing graphic visualizations, which may even be interactive. 
137 137 138 - [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]canbe used as the single XML format for the creation of all other dissemination products, at least for providing the tabular views of the data. (Obviously, websites have more than just data on them.) Again, this can be a very IT-technical topic, but it is important to understand that there are many good technologies for “styling” XML to produce other outputs, including all of the ones typically found on statistical websites.143 +SDMX can be used as the single XML format for the creation of all other dissemination products, at least for providing the tabular views of the data. (Obviously, websites have more than just data on them.) Again, this can be a very IT-technical topic, but it is important to understand that there are many good technologies for “styling” XML to produce other outputs, including all of the ones typically found on statistical websites. 139 139 140 - [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is also directly useful in two ways: as a format for reporting to data collectors and as a direct download format. The use of[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] as a download format has become increasingly popular, and in some cases has proven to be the most popular form of disseminated data available on Web-sites. Many users prefer this format because it is easy to process (being XML) and it is accompanied by rich metadata, including the[[structural metadata>>doc:sdmx:Glossary.Structuralmetadata.WebHome]]necessary for applications to process or visualize the data. Further, the format is predictable, allowing for easy use of the data coming from outside the organization.145 +SDMX is also directly useful in two ways: as a format for reporting to data collectors and as a direct download format. 
The use of SDMX as a download format has become increasingly popular, and in some cases has proven to be the most popular form of disseminated data available on Web-sites. Many users prefer this format because it is easy to process (being XML) and it is accompanied by rich metadata, including the structural metadata necessary for applications to process or visualize the data. Further, the format is predictable, allowing for easy use of the data coming from outside the organization. 141 141 142 -It is worth noting that for many organizations, [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is being deployed using a Web service (such as those developed by ECB, IMF, and OECD). Such services allow for direct querying of the[[data sources>>doc:sdmx:Glossary.Data source.WebHome]], in[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]format, by any user allowed access to the service. Eurostat is currently developing a “Census[[Hub>>doc:sdmx:Glossary.Hub (dissemination architecture).WebHome]]” Web service for querying the census data to be collected in 2011. Here, the data for each country remains in the database of the country and role of the “[[hub>>doc:sdmx:Glossary.Hub (dissemination architecture).WebHome]]” is to broker a user query such that an “[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]Query” is sent to each relevant database which responds in[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]].The resultant responses are then combined by the[[hub>>doc:sdmx:Glossary.Hub (dissemination architecture).WebHome]].147 +It is worth noting that for many organizations, SDMX is being deployed using a Web service (such as those developed by ECB, IMF, and OECD). Such services allow for direct querying of the data sources, in SDMX-ML format, by any user allowed access to the service. Eurostat is currently developing a “Census Hub” Web service for querying the census data to be collected in 2011. 
Here, the data for each country remains in the database of the country and the role of the “hub” is to broker a user query such that an “SDMX Query” is sent to each relevant database which responds in SDMX. The resultant responses are then combined by the hub. 143 143 144 -The “advanced” use of [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]– where an[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]]-capable database can create many dissemination products which transform the[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]into other formats, and even in an on-demand fashion for Web dissemination – can greatly simplify the process of preparing dissemination outputs. Instead of having to produce several parallel forms of the data, having a single[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]source means that, once loaded, print and PDF reports must be prepared, and static Web-pages created, but all other types of data dissemination are basically handled by systems which generate needed outputs (Excel, CSV, graphical visualizations,[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]) in an on-demand fashion.149 +The “advanced” use of SDMX – where an SDMX-capable database can create many dissemination products which transform the SDMX into other formats, and even in an on-demand fashion for Web dissemination – can greatly simplify the process of preparing dissemination outputs. Instead of having to produce several parallel forms of the data, having a single SDMX source means that, once the data are loaded, only print and PDF reports and static Web-pages still have to be prepared separately; all other types of data dissemination are handled by systems which generate the needed outputs (Excel, CSV, graphical visualizations, SDMX-ML) on demand. 
145 145 146 -The figure below illustrates the basic principle behind this type of [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]use.151 +The figure below illustrates the basic principle behind this type of SDMX use. 147 147 148 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_6ed83c791a279c7f.jpg||data-xwiki-image-style-alignment="center" height="378" width="504"]] 153 +(% style="text-align: center;" %) 154 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_6ed83c791a279c7f.jpg||height="378" width="504"]] 149 149 150 150 (% style="text-align: center;" %) 151 151 **{{id name="image_40"/}}Figure 40: SDMX as the pivotal format in a dissemination system** 152 152 153 -It should be noted that when [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]represents a[[dissemination format>>doc:sdmx:Glossary.Disseminationformat.WebHome]]in its own right, the[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]structure file containing the[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]]and all its[[components>>doc:sdmx:Glossary.Component.WebHome]]should be provided along with the[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]][[data set>>doc:sdmx:Glossary.Dataset.WebHome]]it structures, as users will want both types of files for use in their own systems. In most cases, the[[DSD>>doc:sdmx:Glossary.Datastructure definition.WebHome]] files will be available from their agency, but in this case a link to that source should be readily accessible to users (this may be through an[[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]– see below).159 +It should be noted that when SDMX-ML represents a dissemination format in its own right, the SDMX-ML structure file containing the DSD and all its components should be provided along with the SDMX-ML data set it structures, as users will want both types of files for use in their own systems. 
In most cases, the DSD files will be available from their agency, but in this case a link to that source should be readily accessible to users (this may be through an SDMX Registry – see below). 154 154 155 -Typically, all data products (including on-demand delivery via a Web service or query interface) are loaded into a “staging” environment, so that they can be subjected to [[quality assurance>>doc:sdmx:Glossary.Qualitymanagement - quality assurance.WebHome]] before being actually disseminated.[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] does not change this aspect of the dissemination and reporting process, but does place an emphasis on the proper testing of Web-delivery for data.161 +Typically, all data products (including on-demand delivery via a Web service or query interface) are loaded into a “staging” environment, so that they can be subjected to quality assurance before being actually disseminated. SDMX does not change this aspect of the dissemination and reporting process, but does place an emphasis on the proper testing of Web-delivery for data. 156 156 157 -The next sub-process in the GSBPM is the **promotion of dissemination** products. [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is extremely useful in this regard, although not perhaps in a fashion which is obvious. This process in the GSBPM is typically seen as the “advertising” of the statistical products, and[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is not much use here except that the use of leading-edge standards may offer some opportunities for promotion (presentations at conferences, etc.).163 +The next sub-process in the GSBPM is the **promotion of dissemination** products. SDMX is extremely useful in this regard, although not perhaps in a fashion which is obvious. 
This process in the GSBPM is typically seen as the “advertising” of the statistical products, and SDMX is not much use here except that the use of leading-edge standards may offer some opportunities for promotion (presentations at conferences, etc.). 158 158 159 -Far more interesting in increasing the visibility and use of data is the existence of the [[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]Services, which provide a platform for the automatic discovery of data products. Users have become used to the idea that resources can be “Googled”, and while the[[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]] services are not part of Google itself, they do provide a focused way of searching for all of the data produced within a domain, regardless of which site the data is published on.165 +Far more interesting in increasing the visibility and use of data is the existence of the SDMX Registry Services, which provide a platform for the automatic discovery of data products. Users have become used to the idea that resources can be “Googled”, and while the SDMX Registry services are not part of Google itself, they do provide a focused way of searching for all of the data produced within a domain, regardless of which site the data is published on. 160 160 161 -In essence, the [[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]Services provide an online catalogue, listing all of the data available within a community. That community can be open or closed, depending on who is allowed access to the catalogue. 
Thus, there are registries today which only provide access to data collectors, such as the[[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]used by the Joint External Debt[[Hub>>doc:sdmx:Glossary.Hub(disseminationarchitecture).WebHome]] (it is only visible to the organizations which exchange data: the BIS, the IMF, OECD, and the World Bank).[[SDMX Registries>>doc:sdmx:Glossary.SDMXRegistry.WebHome]]can be public, however, which means that any Website or Internet-aware application could search for all of the data listed in that catalog, and then go to the site where that data is found.167 +In essence, the SDMX Registry Services provide an online catalogue, listing all of the data available within a community. That community can be open or closed, depending on who is allowed access to the catalogue. Thus, there are registries today which only provide access to data collectors, such as the SDMX Registry used by the Joint External Debt Hub (it is only visible to the organizations which exchange data: the BIS, the IMF, OECD, and the World Bank). SDMX Registries can be public, however, which means that any Website or Internet-aware application could search for all of the data listed in that catalog, and then go to the site where that data is found. 162 162 163 -This is a very powerful thing: increasingly, this approach to locating data is being used, because it leverages the latest generation of Web-based technology. Exposing the existence of your data to these types of sites and applications, and making it queryable or otherwise accessible in [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]format, is a very efficient way to make it visible and available to re-publishers and users of all types.169 +This is a very powerful thing: increasingly, this approach to locating data is being used, because it leverages the latest generation of Web-based technology. 
Exposing the existence of your data to these types of sites and applications, and making it queryable or otherwise accessible in SDMX-ML format, is a very efficient way to make it visible and available to re-publishers and users of all types. 164 164 165 165 === 12.3.3 Archiving (Step 8) === 166 166 167 - [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is not specifically designed to support archiving, which is Step 8 of the GSBPM, but it is worth noting a few significant aspects of the standard which can be useful in this process. First, because[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]is XML, it provides a format which is not specific to any particular software package. Because of this, it can be used as a good archival format. Second, because[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] has an XML expression of the structures in the[[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]], it is possible to always understand how an[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]][[data set>>doc:sdmx:Glossary.Dataset.WebHome]]is structured, such that it can always be easily processed. Third,[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]] has strict rules about versioning. For archival use, this is good, because changes in the[[data sets>>doc:sdmx:Glossary.Dataset.WebHome]]and their structures over time can be recorded and stored.173 +SDMX is not specifically designed to support archiving, which is Step 8 of the GSBPM, but it is worth noting a few significant aspects of the standard which can be useful in this process. First, because SDMX-ML is XML, it provides a format which is not specific to any particular software package. Because of this, it can be used as a good archival format. Second, because SDMX has an XML expression of the structures in the DSD, it is possible to always understand how an SDMX-ML data set is structured, such that it can always be easily processed. 
Third, SDMX has strict rules about versioning. For archival use, this is good, because changes in the data sets and their structures over time can be recorded and stored. 168 168 169 -Thus, while [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is not explicitly designed as an archival format, there are aspects to it which are very useful in this process.175 +Thus, while SDMX is not explicitly designed as an archival format, there are aspects to it which are very useful in this process. 170 170 171 171 === 12.3.4 The GSBPM and Other Relevant Aspects of SDMX === 172 172 173 -One feature of [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]thatshould be mentioned is the ability to document standard statistical processes. This is done by describing a Process, which is made up of Process Steps, which may themselves have sub-steps. Each step or sub-step can have inputs and outputs, and can be named and described. A Process Step can link to another Process Step either as a[[hierarchy>>doc:sdmx:Glossary.Hierarchy.WebHome]]or by reference via a Transition. The Computation involved in a Process Step can be documented, including the actual software used in the Process Step. Note that there is no support yet in[[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]for specifying actual computations in a way that can be invoked by software.179 +One feature of SDMX that should be mentioned is the ability to document standard statistical processes. This is done by describing a Process, which is made up of Process Steps, which may themselves have sub-steps. Each step or sub-step can have inputs and outputs, and can be named and described. A Process Step can link to another Process Step either as a hierarchy or by reference via a Transition. The Computation involved in a Process Step can be documented, including the actual software used in the Process Step. 
Note that there is no support yet in SDMX for specifying actual computations in a way that can be invoked by software. 174 174 175 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_743cde9d6320fef2.jpg||data-xwiki-image-style-alignment="center" height="333" width="576"]] 181 +(% style="text-align: center;" %) 182 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_743cde9d6320fef2.jpg||height="333" width="576"]] 176 176 177 177 (% style="text-align: center;" %) 178 178 **{{id name="image_41"/}}Figure 41: Schematic of Model for Process in SDMX** 179 179 180 -The process can be expressed in [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]], so that documentation can be produced in many useful formats, using the same types of transforms described earlier for disseminating statistical[[data sets>>doc:sdmx:Glossary.Dataset.WebHome]].Thus, a PDF or HTML(% style="color:#e74c3c" %)version(%%)of a process description could be generated from the XML.187 +The process can be expressed in SDMX-ML, so that documentation can be produced in many useful formats, using the same types of transforms described earlier for disseminating statistical data sets. Thus, a PDF or HTML version of a process description could be generated from the XML. 181 181 182 182 It is easy to see that a particular organization could use the GSBPM as the basis of such a process, describe each input and output, and then send this to another organization, so that the exact processing of the data was clear. 183 183 184 -There is currently no particular requirement from Eurostat or other data collectors for this functionality of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], but it is being implemented by some NSOs internationally, for internal process descriptions being exchanged between departments within the organization. 
In future, this feature may be used between organizations as well.191 +There is currently no particular requirement from Eurostat or other data collectors for this functionality of SDMX, but it is being implemented by some NSOs internationally, for internal process descriptions being exchanged between departments within the organization. In future, this feature may be used between organizations as well. 185 185 186 186 == 12.4 Summary == 187 187 188 -Our example involves the micro-data coming from external sources being recoded and aggregated, with consequent reporting and dissemination of the tabulated data. To provide a view of how [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]canbe used in this scenario, the relevant parts of the GSBPM are highlighted below, and a summary table provided.195 +Our example involves the micro-data coming from external sources being recoded and aggregated, with consequent reporting and dissemination of the tabulated data. To provide a view of how SDMX can be used in this scenario, the relevant parts of the GSBPM are highlighted below, and a summary table provided. 189 189 190 -[[image:SDMX_2-1_User_Guide_draft_0-1_html_a844509ed340f102.jpg||data-xwiki-image-style-alignment="center" height="564" width="493"]] 197 +(% style="text-align: center;" %) 198 +[[image:SDMX_2-1_User_Guide_draft_0-1_html_a844509ed340f102.jpg||height="564" width="493"]] 191 191 192 192 (% style="text-align: center;" %) 193 193 **{{id name="image_42"/}}Figure 42: Summary showing the processes supported by SDMX** 194 194 195 -The table below summarizes each step in the GSBPM where [[SDMX>>doc:sdmx:Glossary.Statisticaldata and metadata exchange.WebHome]]is used in our scenario.203 +The table below summarizes each step in the GSBPM where SDMX is used in our scenario. 
196 196
197 197 (% style="width:1217.45px" %)
198 198 |**GSBPM Step**|**Use of SDMX**|(% style="width:505px" %)**Notes**
199 -|5.7 Calculate Aggregates|No direct use – may influence earlier steps in collection process|(% style="width:505px" %)Derived variables and recodes must match the requirements of the standard [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]]
200 -|5.8 Finalize Data Files|Use of [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] and data formats to format aggregates|(% style="width:505px" %)Used to pass data and structure to subsequent process steps
201 -|6.1 Prepare Draft Outputs|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] can help to visualize and process data, and is used as a source format for outputs|(% style="width:505px" %)Relies on technologies which easily transform XML into other output formats
202 -|6.2 Validate Outputs|[[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] provides validation of all rules in the [[DSD>>doc:sdmx:Glossary.Data structure definition.WebHome]] (correct [[codes>>doc:sdmx:Glossary.Code.WebHome]], complete and valid descriptions and keys, etc.)|(% style="width:505px" %)Some validation can be validated by XML schema (e.g. use of valid [[codes>>doc:sdmx:Glossary.Code.WebHome]] and [[dimension>>doc:sdmx:Glossary.Dimension.WebHome]] Ids), some validation can be undertaken with other [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] constructs such as (% style="color:#e74c3c" %)constraints(%%), whilst some cannot be performed using [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] structures e.g. comparison of numbers to determine plausibility
203 -|6.3 Scrutinize and Explain|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations may help to easily view data and generate views for output products|(% style="width:505px" %)
204 -|6.4 Apply Disclosure Control|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations help to verify disclosure processing|(% style="width:505px" %)Not a primary application of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]], which does not dictate anything about disclosure
205 -|6.5 Finalize Outputs|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations may provide views of data for final outputs; outputs may be generated on-demand for dissemination on Website, etc.|(% style="width:505px" %)[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] data must be updated if data are corrected
206 -|7.1 Update Output Systems|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] provides useful format for loading into output systems|(% style="width:505px" %)Most technology tools and databases provide good support for XML formats such as [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]]
207 -|7.2 Produce Dissemination Products|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] visualizations may provide views of data for final outputs; outputs may be generated on-demand for dissemination on Website, etc.|(% style="width:505px" %)
208 -|7.3 Manage Release of Dissemination Products|[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] serves as a format for reporting and dissemination to some users/data collectors; serves as basis for generation of other outputs, whether static or on-demand|(% style="width:505px" %)
207 +|5.7 Calculate Aggregates|No direct use – may influence earlier steps in collection process|(% style="width:505px" %)Derived variables and recodes must match the requirements of the standard DSD
208 +|5.8 Finalize Data Files|Use of SDMX-ML DSD and data formats to format aggregates|(% style="width:505px" %)Used to pass data and structure to subsequent process steps
209 +|6.1 Prepare Draft Outputs|SDMX can help to visualize and process data, and is used as a source format for outputs|(% style="width:505px" %)Relies on technologies which easily transform XML into other output formats
210 +|6.2 Validate Outputs|SDMX-ML provides validation of all rules in the DSD (correct codes, complete and valid descriptions and keys, etc.)|(% style="width:505px" %)Some validation can be validated by XML schema (e.g. use of valid codes and dimension Ids), some validation can be undertaken with other SDMX constructs such as constraints, whilst some cannot be performed using SDMX structures e.g. comparison of numbers to determine plausibility
211 +|6.3 Scrutinize and Explain|SDMX visualizations may help to easily view data and generate views for output products|(% style="width:505px" %)
212 +|6.4 Apply Disclosure Control|SDMX visualizations help to verify disclosure processing|(% style="width:505px" %)Not a primary application of SDMX, which does not dictate anything about disclosure
213 +|6.5 Finalize Outputs|SDMX visualizations may provide views of data for final outputs; outputs may be generated on-demand for dissemination on Website, etc.|(% style="width:505px" %)SDMX data must be updated if data are corrected
214 +|7.1 Update Output Systems|SDMX provides useful format for loading into output systems|(% style="width:505px" %)Most technology tools and databases provide good support for XML formats such as SDMX-ML
215 +|7.2 Produce Dissemination Products|SDMX visualizations may provide views of data for final outputs; outputs may be generated on-demand for dissemination on Website, etc.|(% style="width:505px" %)
216 +|7.3 Manage Release of Dissemination Products|SDMX serves as a format for reporting and dissemination to some users/data collectors; serves as basis for generation of other outputs, whether static or on-demand|(% style="width:505px" %)
209 209 |7.4 Promote Dissemination products|(((
210 -Use of [[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]] Services
218 +Use of SDMX Registry Services
211 211
212 -provides a high (% style="color:#e74c3c" %)level(%%) of visibility for data
220 +provides a high level of visibility for data
213 213 )))|(% style="width:505px" %)Depends on the availability of a domain registry for this purpose – requires that new data be registered automatically or manually
214 214 |8.2 Manage Archive Repository|(((
215 -[[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] provides an easy format for generation of formats needed, based on the user demands on the archive;
223 +SDMX provides an easy format for generation of formats needed, based on the user demands on the archive;
216 216
217 -strict (% style="color:#e74c3c" %)version(%%) control allows for explicit management of dependencies between data and metadata
225 +strict version control allows for explicit management of dependencies between data and metadata
218 218 )))|(% style="width:505px" %)
219 -|8.3 Preserve Data and Associated Metadata|Rich metadata and application/platform independence make [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] a good archival format|(% style="width:505px" %)
227 +|8.3 Preserve Data and Associated Metadata|Rich metadata and application/platform independence make SDMX a good archival format|(% style="width:505px" %)
220 220
221 -The benefits of using [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] here are several:
229 +The benefits of using SDMX here are several:
222 222
223 -* Standard data structures, statistical (% style="color:#e74c3c" %)concepts(%%) and classifications, and formats make it easy to process and compare similar types of data from different national sources, both for data collectors and other users
224 -* Richer [[dissemination format>>doc:sdmx:Glossary.Dissemination format.WebHome]], complete with metadata, supports not only good visualization of data, but also allows easy downloading and use of data in internal systems
225 -* [[SDMX-ML>>doc:sdmx:Glossary.SDMX-ML.WebHome]] provides an excellent format for having a single source of data which can be easily transformed into different output formats for use
226 -* Data becomes easier to find and use, through [[SDMX Registry>>doc:sdmx:Glossary.SDMXRegistry.WebHome]] Services, promoting the use of the data
231 +* Standard data structures, statistical concepts and classifications, and formats make it easy to process and compare similar types of data from different national sources, both for data collectors and other users
232 +* Richer dissemination format, complete with metadata, supports not only good visualization of data, but also allows easy downloading and use of data in internal systems
233 +* SDMX-ML provides an excellent format for having a single source of data which can be easily transformed into different output formats for use
234 +* Data becomes easier to find and use, through SDMX Registry Services, promoting the use of the data
227 227 * Data is archived in a long-lived format, independent of applications/platforms, and is accompanied by rich metadata, managed according to strict versioning rules
228 228
229 -In some other scenarios, [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] might also be useful as a data collection format, but in the case where micro-data are aggregated, the use of [[SDMX>>doc:sdmx:Glossary.Statistical data and metadata exchange.WebHome]] will be similar to that described here.
237 +In some other scenarios, SDMX might also be useful as a data collection format, but in the case where micro-data are aggregated, the use of SDMX will be similar to that described here.