11 Annex 1 – Content Oriented Guidelines (COG)
11.1 Scope of the COG
The SDMX Content-Oriented Guidelines are a set of work products focused on the use of SDMX within statistical domains, to support harmonization of data collection and dissemination across those domains. These documents are frequently updated, and can be found on the SDMX website at: http://sdmx.org/?page_id=11.
SDMX assumes that organizations responsible for coordinating activities within their domains will be the producers of domain-specific SDMX artefacts such as DSDs and MSDs. The intent of the COG is to produce artefacts that generally apply across a wide number of statistical domains – there may be exceptions within specific domains. Thus, organizations are encouraged to use the terminology, concepts, codelists, etc. described in these documents where applicable.
There is a single covering document, Content-Oriented Guidelines, and five annexes:
Annex 1 – Cross-Domain Concepts
Annex 2 – Cross-Domain Code Lists
Annex 3 – Statistical Subject-Matter Domains
Annex 4 – Metadata Common Vocabulary
Annex 5 – SDMX-ML for Content-Oriented Guidelines
The last of these is simply a compressed .ZIP file containing the SDMX-ML rendering of each of the other annexes, so this does not need further description.
Each of these documents will be described in turn, to assist the reader in knowing how to use them in their own implementations.
11.2 Content-Oriented Guidelines
This document is the covering document for the entire set of guidelines. It provides an overview, including some background on the COG and their scope and purpose, and it introduces the three main threads of the work: (1) cross-domain concepts and codelists; (2) the classifications of statistical subject-matter domains; and (3) the relevant standard terminology/vocabulary. A fourth major section describes the governance model for the COG, and related topics.
This document should be read before the specific annexes.
11.3 COG Annex 1 – Cross-Domain Concepts
11.3.1 Scope
This document provides a set of concepts which apply broadly across statistical domains, used both for describing statistical data sets (as dimensions, attributes, or measures in DSDs) and for describing statistical metadata not related directly to data sets (as metadata attributes in MSDs). The current draft lists 66 top-level concepts, and this list is expected to grow. Some of these are broken out into subordinate uses of the concepts for different applications. As one would expect, these cover many obvious areas such as geography and time, but also include many of the standard concepts used in reporting data quality. If the definition of a cross-domain concept does not exactly fit a specific use, then a similar concept should be defined, but its agency should reflect its origins, and the “sdmx” agency should not be used.
The Cross-Domain Concepts are the product of many years of discussion and interagency agreement on using SDMX and the earlier GESMES/TS for describing data, and in many respects they are already well implemented in existing SDMX systems.
11.3.2 Usage of Cross-Domain Concepts
The concepts defined include, for example, frequency, reference area and possibly currency. They also include statistical concepts which are generally used for the measurement or declaration of data quality. Examples are “accuracy”, “comparability” or “timeliness”. These quality-related concepts are normally implemented by statistical organisations within their respective Metadata Structure Definitions. This enables the organisation applying them to measure and analyse data quality in a structured and harmonised manner across its whole statistical production.
The Guidelines provide for each SDMX Cross-domain concept an ID and a detailed description, which may be complemented by additional comments. The table below presents the two Cross-Domain concepts “Accuracy” and “Currency”.
Concept ID: ACCURACY
Description: Closeness of computations or estimates to the exact or true values that the statistics were intended to measure.
Context: The accuracy of statistical information is the degree to which the information correctly describes the phenomena. It is usually characterized in terms of error in statistical estimates and is often decomposed into bias (systematic error) and variance (random error) components. Accuracy can contain either measures of accuracy (numerical results of the methods for assessing the accuracy of data) or qualitative assessment indicators. It may also be described in terms of the major sources of error that potentially cause inaccuracy (e.g., coverage, sampling, non response, response error). Accuracy is associated with the "reliability" of the data, which is defined as the closeness of the initial estimated value to the subsequent estimated value. This concept can be broken down into: Accuracy - overall (summary assessment); Accuracy - non-sampling error; Accuracy - sampling error.
Presentation: Free text

Concept ID: CURRENCY
Description: Monetary denomination of the object being measured.
Presentation: CL_CURRENCY

The figure below provides a simplified view of how the cross-domain concepts are used for defining data/metadata structure definitions in the SDMX framework. In the data/metadata structures they are usually combined with domain specific concepts.
Figure 34: Use of Concepts Defining Data/Metadata Structure Definitions in the SDMX Framework
The illustration shows that cross-domain concepts can have two different roles:
- As structural metadata: as dimensions in a data structure definition, to identify each statistical observation. For example, a dimension named “Reference Area” would explain which country or geopolitical aggregate a specific statistical observation refers to.
- As reference metadata: as attributes in a data structure definition or in a metadata structure definition. Attributes provide information about the data, thus qualifying the data further (for example, “unit of measure”), or can be used to report metadata, for example with concepts such as timeliness, reference period, classification system and data compilation. The values of these concepts may be coded, but are more often free text.
In the case where a concept can be represented as coded, there must be a link to the code list containing valid values that may be reported. If the concept is used as an attribute, the attachment level must be indicated. This means an indication of the data object or structure, e.g. “time series” or “observation”, to which the concept is linked.
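The two rules in the preceding paragraph can be sketched as a small consistency check. This is a minimal illustration only, not part of SDMX or of any SDMX library: the `ConceptUsage` class and `check_usage` function are hypothetical, the identifiers `REF_AREA` and `CL_AREA` are assumed for the example, and the set of attachment levels is limited to the two named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only the two attachment levels named in the text; a real DSD may allow others.
ATTACHMENT_LEVELS = {"time series", "observation"}


@dataclass
class ConceptUsage:
    """How one concept is used inside a (hypothetical) structure definition."""
    concept_id: str
    role: str                               # "dimension" or "attribute"
    code_list: Optional[str] = None         # id of the linked code list, if coded
    attachment_level: Optional[str] = None  # only meaningful for attributes
    coded: bool = False


def check_usage(usage: ConceptUsage) -> List[str]:
    """Return rule violations; an empty list means the usage is consistent."""
    problems = []
    # Rule 1: a coded concept must link to the code list of valid values.
    if usage.coded and not usage.code_list:
        problems.append(f"{usage.concept_id}: coded concept has no code list link")
    # Rule 2: an attribute must state the level it attaches to.
    if usage.role == "attribute" and usage.attachment_level not in ATTACHMENT_LEVELS:
        problems.append(f"{usage.concept_id}: attribute has no valid attachment level")
    return problems


# REF_AREA / CL_AREA are assumed identifiers; CURRENCY / CL_CURRENCY come from the table above.
ref_area = ConceptUsage("REF_AREA", "dimension", code_list="CL_AREA", coded=True)
currency = ConceptUsage("CURRENCY", "attribute", code_list="CL_CURRENCY",
                        attachment_level="time series", coded=True)
accuracy = ConceptUsage("ACCURACY", "attribute")  # free text, attachment level missing

for usage in (ref_area, currency, accuracy):
    print(usage.concept_id, check_usage(usage) or "ok")
```

In this sketch the free-text ACCURACY attribute is flagged only because no attachment level was given; being uncoded is not itself a problem.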
11.4 COG Annex 2 – Cross-Domain Code Lists
The companion to the Cross-Domain Concepts is their representation using code lists. This annex identifies some recommended code lists for use with several of the cross-domain concepts, or even independent of the standard concepts. At the current time, there are 9 recommended code lists, with some issues and supplementary values identified for them.
It should be noted that often, agreement on code lists is harder to achieve than agreement on concepts, as code lists vary more from organization to organization and domain to domain than do concepts.
Unless a recommended code list has all needed values, it should not be used. A code list differing in any way from the ones provided should be given its own name and agency – the “sdmx” agency should never be used for an altered code list. It is useful to note that having a few unused codes in an SDMX structure is typically not an issue – SDMX provides constraints to handle this situation, so it is only when a superset is needed that a different code list should be used.
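The decision described above reduces to a set comparison, sketched below. The function name `choose_code_list`, its return labels, and the code values used in the example are all hypothetical placeholders chosen for illustration; they are not taken from any published code list.

```python
from typing import Set, Tuple


def choose_code_list(needed: Set[str], recommended: Set[str]) -> Tuple[str, Set[str]]:
    """Decide how to represent a concept, given the codes an implementer needs.

    Returns a (decision, codes) pair:
      - "own_code_list": some needed codes are missing from the recommended list,
        so a separate list under the implementer's own agency is required;
        `codes` holds the missing values.
      - "recommended_with_constraint": the recommended list is a superset;
        `codes` holds the unused values that a constraint could exclude.
      - "recommended_as_is": the two sets match exactly.
    """
    missing = needed - recommended
    if missing:
        return ("own_code_list", missing)
    unused = recommended - needed
    if unused:
        return ("recommended_with_constraint", unused)
    return ("recommended_as_is", set())


# Placeholder code values, for illustration only.
recommended = {"A", "Q", "M"}

print(choose_code_list({"A", "M"}, recommended))       # subset: constrain the unused code
print(choose_code_list({"A", "W"}, recommended))       # superset needed: own list, own agency
print(choose_code_list({"A", "Q", "M"}, recommended))  # exact match
```

Checking for missing codes first reflects the rule in the text: unused codes are tolerable, while a missing code rules the recommended list out.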
11.5 COG Annex 3 – Statistical Subject-Matter Domains
This set of guidelines provides a break-down of all the domains of official statistics, at a high level. It is based on work done at the UN/ECE, which created a database tracking organizations and their statistical activities. Thus, it reflects the actual state of statistical data collection at a high level within the world of official statistics.
The Statistical Subject-Matter Domains classification breaks down statistics into high-level categories – no greater than three levels deep, and often only two. It is intended to be used as the basis for organizing collections of statistical data within repositories and registries, and as such is provided in the form of an SDMX category scheme when expressed as SDMX-ML. Specific domains or organizations will often wish to take this classification to a more detailed level for actual use. In such a case, the provided classification will need to be extended.
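As a rough sketch of what a shallow category hierarchy and its extension look like in practice, the following models categories as a simple tree. The `Category` class, the helper functions, and every identifier and name in the example are hypothetical; they are not copied from the annex and do not reflect the SDMX-ML representation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Category:
    """One node of a (hypothetical) category scheme."""
    id: str
    name: str
    children: List["Category"] = field(default_factory=list)


def depth(cat: Category) -> int:
    """Number of levels from this category down to its deepest descendant."""
    return 1 + max((depth(child) for child in cat.children), default=0)


def iter_paths(cat: Category, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths for this category and all descendants, parents first."""
    path = f"{prefix}.{cat.id}" if prefix else cat.id
    yield path
    for child in cat.children:
        yield from iter_paths(child, path)


# A two-level fragment, as the provided scheme often has (placeholder ids and names).
sub = Category("D1_1", "Example sub-domain")
top = Category("D1", "Example domain", [sub])

# An organization extending the classification with a more detailed level of its own.
sub.children.append(Category("D1_1_X", "Organization-specific detail"))

print(depth(top))
for path in iter_paths(top):
    print(path)
```

Keeping the extension as additional child nodes leaves the provided levels untouched, so data organized by the original categories remains comparable.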
The subject matter domain scheme is shown below.
Domain Scheme [1]
11.6 COG Annex 4 – Metadata Common Vocabulary (MCV)
This document provides definitions and other information regarding many of the terms and concepts which are relevant when using SDMX. Very often, it takes terms from other sources (such as the OECD Glossary of Terms or Eurostat’s CODED database). It is not intended to be a comprehensive definition of all terms related to official statistics – these other sources are broader in scope. It is intended to reflect the standard meanings found within an SDMX context.
In SDMX-ML, it is represented as a reference metadata report. This simply provides a useful XML format for those wishing to load this into a database or otherwise process it.
[1] Reference source: United Nations Economic Commission for Europe. This classification of statistical subject-matter domains is based on that used in the Database of International Statistical Activities (DISA - http://unece.unog.ch/disa/ ). The DISA classification includes two additional domains covering statistical methodology and strategic managerial issues, which do not relate directly to data or metadata, so are not considered relevant for SDMX purposes.