Wiki source code of Guidelines for the Creation and Management of SDMX Code Lists
Last modified by Artur K. on 2026/05/29 14:28
| author | version | line-number | content |
|---|---|---|---|
| 1 | {{box title="**Contents**"}} | ||
| 2 | {{toc/}} | ||
| 3 | {{/box}} | ||
| 4 | |||
| 5 | = DOCUMENT HISTORY = | ||
| 6 | |||
| 7 | |**Version**|**Date**|**Comment** | ||
| 8 | |1.0|02/12/2013|Initial version for cross-domain code lists. | ||
| 9 | |2.0|15/01/2015|Adapted to be a guideline for all code lists, not only cross-domain | ||
| 10 | |3.0|19/01/2018|Clarified text on definitions. Revised allowed characters such as leading zeroes in codes (now allowed). Removed superfluous text. Improved examples. | ||
| 11 | |4.0|12/02/2025|((( | ||
| 12 | Added sections on: | ||
| 13 | New features from version 3.0 of SDMX | ||
| 14 | Other practical issues and recommendations | ||
| 15 | General improvements to the text | ||
| 16 | ))) | ||
| 17 | |||
| 18 | = INTRODUCTION = | ||
| 19 | |||
| 20 | These guidelines are intended to support the creation of SDMX code lists for use throughout the statistical business process, in particular when SDMX is implemented in statistical domains. Their use is strongly recommended when SDMX-compliant data structure definitions (DSDs) are built and implemented in statistical domains. | ||
| 21 | |||
| 22 | In the [[SDMX Checklist for Design Projects>>url:https://statswiki.unece.org/display/SDMXPM/Checklist+for+SDMX+Data+Providers]] and the [[modelling guidelines>>url:https://sdmx.org/?page_id=4345#Modelling]], the creation of code lists is done in the sub-process “Fully define code lists”. | ||
| 23 | |||
| 24 | Originally this document was named "//Guidelines for the Creation and Management of SDMX Cross-Domain Code Lists//". Later, experience showed that these guidelines were also used for the development of other types of SDMX code lists (shared code lists, domain-specific code lists). It was therefore decided to review the document in order to convert it into a guideline applicable to all types of SDMX code lists. | ||
| 25 | |||
| 26 | Code lists are created to group related codes{{footnote}}In the traditional classification sphere, the words "category" or "position" are generally preferred to the word "code"; however, the use of the term "category" could be confusing as it is already used in the SDMX information model within another context.{{/footnote}} in a meaningful, systematic and standard format. They provide the codes into which objects corresponding to a specific concept can be classified. Each code should be well described. | ||
| 27 | |||
| 28 | Code lists are primarily used to: | ||
| 29 | |||
| 30 | * collect, disseminate, exchange and organise information; | ||
| 31 | * aggregate and disaggregate datasets in a meaningful way for complex analysis; | ||
| 32 | * present statistical information in a standard way; | ||
| 33 | * support policy and decision-making; | ||
| 34 | * standardise the measurement process. | ||
| 35 | |||
| 36 | SDMX cross-domain code lists can be found in two places: | ||
| 37 | |||
| 38 | * [[SDMX official website>>url:https://sdmx.org/?page_id=3215]] (files available in MS-Word format) | ||
| 39 | * [[SDMX Global Registry>>url:https://registry.sdmx.org/items/codelist.html]] (files available in SDMX-ML format) | ||
| 40 | |||
| 41 | Other SDMX code lists can be found in regional and other registries. | ||
| 42 | |||
| 43 | = BASIC PRINCIPLES FOR THE CREATION OF SDMX CODE LISTS = | ||
| 44 | |||
| 45 | 1. SDMX code lists should refer to clear and well-defined statistical concepts, enabling data users to understand the statistical concepts and finally the data sets. Already existing standards (e.g. international classifications) should be considered. | ||
| 46 | 1. Consistency of the SDMX code lists across statistical domains and over time should be ensured. | ||
| 47 | == Key terms == | ||
| 48 | |||
| 49 | Below are the key terms used in this document to describe SDMX code lists. | ||
| 50 | |||
| 51 | * A **code list** is a predefined list from which coded statistical concepts take their values. | ||
| 52 | ** A **code list identifier** is a unique identifier given to the code list. The code list identifier consists of three mandatory elements: an **id**, a **version number** and a **reference to a maintenance agency**. | ||
| 53 | ** An **id** is a language-independent set of letters, numbers and/or symbols. To give SDMX code lists a clear visual identity, the code list id should be prefixed with CL_. | ||
| 54 | ** A code list **name** describes the content of the artefact to which the name is attached in a synthetic and clear way. In principle, the default language for names is English (however exceptions are possible, e.g. when geographic entities are expressed in the national languages). Multilingual representations are possible. | ||
| 55 | ** A code list **description** describes the content of the artefact to which it is attached in more detail than the artefact name. Multilingual representations are possible. In the specific case of code lists, the description is generally used to precisely define the coverage of a code, identifying what is included and what is excluded (e.g. wooden shoes are not considered as shoes but as handicraft). | ||
| 56 | |||
| 57 | A code list may also contain annotations, a uri, a description, attributes indicating the period of validity (e.g. "//valid to//" and "//valid from//") and an attribute indicating whether the code list is final. | ||
| 58 | |||
| 59 | * A **code** (basic element of the code list) is also represented by a mandatory **id**, a mandatory **name** and an optional **description** (see the descriptions of the code list elements). The code may also contain annotations and a uri{{footnote}}Uri: Uniform Resource Identifier{{/footnote}}. | ||
| 60 | |||
| 61 | While code names and descriptions are meant for interpretation by humans, **ids** are primarily designed to be read by machines. Nevertheless, it is often helpful for data users if ids are meaningful in the default language used for the name. When choosing the best approach, implementers should also consider the possible impact on the length of the code identifiers. | ||
| 62 | |||
| 63 | Example | ||
| 64 | |||
| 65 | [[image:1768480235130-791.png]] | ||
| 66 | |||
| 67 | = BASIC CHARACTERISTICS OF SDMX CODE LISTS = | ||
| 68 | |||
| 69 | Conceptually SDMX code lists can have the following characteristics: | ||
| 70 | |||
| 71 | * A code list (e.g. geographical entity) can be referenced by several statistical concepts (e.g. declaring country, country of birth, partner country, etc.). | ||
| 72 | * The codes used should cover exhaustively the part of reality that is intended to be described by the code list. | ||
| 73 | * The codes have to be clearly defined. Codes with different coverage must have different code identifiers and names (e.g. Europe including Greenland must have a different identifier and name than Europe excluding it). Different codes should not have the same meaning or coverage. | ||
| 74 | * The coverage of the codes may, however, overlap partially within one code list. It may not be identical, except when code lists are based on established standard classifications where such repetitions are common (see //Unpredictable and extendable breakdowns (CUSTOM_BREAKDOWN)// in the [[OTHER PRACTICAL ISSUES AND RECOMMENDATIONS>>path:#_Toc179573467]] section). This means that the content of the categories is not necessarily mutually exclusive, unlike the established rule in statistical classifications. | ||
| 75 | * Codes may be at different levels of granularity. | ||
| 76 | * Multiple hierarchies (hierarchical code lists{{footnote}}The "Hierarchical code list" construct used in SDMX should not be mixed up with the concept used in traditional statistical classifications. In the latter case, the codes are organised based on one strictly defined hierarchy only. In the former case, several hierarchies can be defined. An example of this is a list of codes that support the theme of geographical location. Such a hierarchical code list could be viewed according to many different hierarchies. A political hierarchy would comprise an administrative regional breakdown within a country, a geographical breakdown would comprise a placing of the countries in continents, and an economic breakdown might place the countries in one or more economic communities (e.g. many of the countries in "Europe" could be both a part of the European Union and the OECD communities).{{/footnote}}) can be defined on top of a flat code list. An SDMX code list should be flexible enough to allow different possible hierarchies and should be extendible by additional codes that disaggregate or aggregate codes already in the list, as well as by codes that extend the coverage of the code list. Each possible hierarchy may use all codes from the flat code list or just a subset. A flat code list provides the reservoir of codes for the hierarchies. A guideline on SDMX Hierarchies, forthcoming at the time of writing, will describe the recommended approach for creating and maintaining code list hierarchies. | ||
| 77 | * New codes or levels must be accommodated when needed (e.g. if a new country is recognised as a sovereign state, the geographical entity code list should have a new version including it). | ||
| 78 | * Re-use of code identifiers between versions of the same code list should be avoided. For example, in the ISO 3166-1 alpha-2 country codes, CS (Czechoslovakia) was withdrawn in 1993 and reused as CS (Serbia and Montenegro) in 2003. This change is extremely difficult to handle, as many data systems contain historical data for Czechoslovakia. | ||
| 79 | |||
| 80 | = BASIC CRITERIA FOR THE DEVELOPMENT OF SDMX CODE LISTS = | ||
| 81 | |||
| 82 | The following basic criteria should be respected when defining SDMX code lists: | ||
| 83 | |||
| 84 | * It is highly recommended that code lists be consistent, to the largest extent possible, with internationally agreed standards, whenever they exist, e.g. code lists of the International Organization for Standardization (ISO), the United Nations and other international organisations. There is no point in creating a new code list where one already exists. The following order of priority is suggested when considering the use of existing code lists: | ||
| 85 | ** international standard classifications or code lists; | ||
| 86 | ** international classifications or code lists supplemented by other international and/or regional institutions; | ||
| 87 | ** standardised classifications or code lists used by individual international institutions. | ||
| 88 | * If official classifications are used for defining the SDMX code lists, totals, aggregates and other additional codes should be added following the recommendations of the issuing organisation for adding new elements (e.g. the addition of the world regions EU27, EFTA, etc. to the ISO 3166 alpha 2 country list). | ||
| 89 | * When designing code lists, the needs of all phases of the statistical business process and of all statistical domains using the respective SDMX code lists should be considered. However, due to legacy systems, it may not always be possible to accommodate all needs from the start (e.g. different habits in data exchange and in dissemination). In such cases, some transition period will often be necessary before reaching convergence. | ||
| 90 | |||
| 91 | Technically, SDMX code lists should respect the principles described in the following paragraphs. | ||
| 92 | |||
| 93 | * SDMX code ids should use only the uppercase letters A to Z, the digits 0 to 9 and the underscore ("_"). No other characters should be used. | ||
| 94 | * Even though technically allowed in the standard, it is highly recommended not to use lower case characters to avoid possible confusion and technical issues with upper case characters. Exceptions should only be made for code lists based on classifications managed by external bodies (such as ISO). In case an external classification is using lower case characters, they may also be used in the respective SDMX code lists to make their usage more intuitive for regular users of the classification. However, it is still not allowed to create two code list items that only differ by the case of the characters (e.g., "EN" and "en"). If for whatever reason an organisation opts for lower case characters, all coded information should then be in lower case characters. | ||
| 95 | * Accented characters are not allowed in code ids by the standard. | ||
| 96 | |||
| 97 | * Underscore ("_") is generally used for the combination of codes (whether consecutive or not). | ||
| 98 | * A list of generic codes to be used for all SDMX code lists has been developed. This generic list includes concepts which can be expected to appear in many, if not all, code lists, e.g. "total", "non-response", "not allocated", "unknown", "other", etc. | ||
| 99 | |||
| 100 | Code names should be between 1 and 254 characters. The characters used should belong to the UTF-8 character set. | ||
| 101 | |||
| 102 | * Code names are in general defined in English; other language versions may be added. Certain exceptions can be considered in the case of codes which are normally not translated (e.g. regions or territories for which the national designation could be used, agencies for which a national and English label could be used, etc.). | ||
| 103 | |||
| 104 | Descriptions can be used if more detail on the content of a code is needed. Multilingual representations are possible. | ||
| 105 | |||
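The technical rules above (character set for ids, no ids differing only by case, name length) lend themselves to an automated check. The following is a minimal sketch in Python, not part of the guideline; the function names `check_code` and `find_case_clashes` are hypothetical, and the check deliberately enforces the stricter recommendation (uppercase only), so code lists based on external classifications that use lower case would need a relaxed pattern.

```python
import re

# Characters recommended by the guideline for code ids: A-Z, 0-9 and "_".
# Assumption: lower case is rejected, following the recommendation rather
# than the wider set technically allowed by the standard.
CODE_ID_PATTERN = re.compile(r"[A-Z0-9_]+")


def check_code(code_id: str, name: str) -> list:
    """Return a list of problems found for one code (empty list means OK)."""
    problems = []
    if not CODE_ID_PATTERN.fullmatch(code_id):
        problems.append(f"id {code_id!r} uses characters outside A-Z, 0-9 and '_'")
    if not 1 <= len(name) <= 254:
        problems.append(f"name of {code_id!r} must be 1 to 254 characters long")
    return problems


def find_case_clashes(code_ids: list) -> list:
    """Return pairs of ids that differ only by letter case (e.g. 'EN' and 'en')."""
    seen = {}  # upper-cased id -> first spelling encountered
    clashes = []
    for code_id in code_ids:
        key = code_id.upper()
        if key in seen and seen[key] != code_id:
            clashes.append((seen[key], code_id))
        else:
            seen[key] = code_id
    return clashes
```

For instance, `check_code("DE_FIN", "German Ministry of Finance")` reports no problems, while an id such as `de-fin` is flagged.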
| 106 | == Meaningful code identifiers vs meaningless code identifiers == | ||
| 107 | |||
| 108 | Code identifiers can be either meaningful, i.e. the identifier is a shortened representation or expression of the code name (e.g. "DE_FIN" for "German Ministry of Finance"), or meaningless, i.e. the code identifier is chosen at random and conveys no information whatsoever as to the content of the code name (e.g. "6E" standing for "European Space Agency"). | ||
| 109 | |||
| 110 | In SDMX, both coding systems are acceptable, although it is recommended to use meaningful codes if possible. It is up to implementers to decide which coding system they want to use, bearing in mind that there can be a trade-off between the meaningfulness of identifiers and their length, which affects the maintainability of identifiers and series keys as well as the size and cost of data files and databases. The choice of the coding system may be influenced by parameters such as the number of identifiers to code, the complexity of the concepts to code, etc. | ||
| 111 | |||
| 112 | In cases where self-explanatory (meaningful) identifiers are chosen, implementers should take care not to use excessively long codes. | ||
| 113 | |||
| 114 | Meaningful codes can be self-explanatory in one language only; by default, this language is English. | ||
| 115 | |||
| 116 | === Expressing numeric ranges in code identifiers === | ||
| 117 | |||
| 118 | SDMX implementers may wish to express numeric ranges in code identifiers. This is allowed but in no way imposed. | ||
| 119 | |||
| 120 | As an example, let’s take a code list dealing with age classes. The basic units used for representing age are "Y" (Year), "M" (Month), "W" (Week) and "D" (Day). To express concepts like "from 15 years to 20 years excluding 16 years", "over 30 years", "less than 50 years", "two and three months", "four days or over", "three years or less", the following set of standard "operators" is proposed: | ||
| 121 | |||
| 122 | * **"T"** for expressing ranges (from 3 to 9), | ||
| 123 | * **"_"** for the combination of two codes, whether consecutive or not (5 and 6; A and F), | ||
| 124 | * **"X"** for expressing "except" or "excluding", | ||
| 125 | * **"GT"** for "greater than", **"LT"** for "less than", **"GE"** for "equal to or greater than" and **"LE"** for "equal to or less than". | ||
| 126 | |||
| 127 | The result for the examples mentioned in the previous paragraph is as follows: | ||
| 128 | |||
| 129 | * from 15 years to 20 years excluding 16 years: Y15T20X16 | ||
| 130 | * over 30 years: Y_GT30 (in this case the "_" is used to make a clear distinction between the unit and the "operator" "GT"). | ||
| 131 | * less than 50 years: Y_LT50 | ||
| 132 | * two and three months: M2_3 | ||
| 133 | * four days or over: D_GE4 | ||
| 134 | * three years or less: Y_LE3 | ||
| 135 | |||
| 136 | Such a coding system has limits in what it can do. It will most probably not be appropriate for coding ranges with decimals (which would result in overly long code identifiers) or overly complex concepts. Again, it will be up to implementers to decide whether this coding system can meet their specific needs. | ||
| 137 | |||
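The operators above can be applied mechanically. The sketch below, in Python, builds the example identifiers from their parts; the helper names `range_id`, `comparison_id` and `combination_id` are hypothetical and only cover the patterns shown in the examples (a single exclusion, integer values).

```python
from typing import Optional

COMPARISON_OPERATORS = {"GT", "LT", "GE", "LE"}


def range_id(unit: str, start: int, end: int, excluding: Optional[int] = None) -> str:
    """Build a range id: unit, start, 'T', end, optionally 'X' plus the excluded value."""
    code_id = f"{unit}{start}T{end}"
    if excluding is not None:
        code_id += f"X{excluding}"
    return code_id


def comparison_id(unit: str, operator: str, value: int) -> str:
    """Build an open-ended id; '_' separates the unit from the operator."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    return f"{unit}_{operator}{value}"


def combination_id(unit: str, *values: int) -> str:
    """Combine several values of the same unit with '_'."""
    return unit + "_".join(str(value) for value in values)


print(range_id("Y", 15, 20, excluding=16))  # Y15T20X16
print(comparison_id("Y", "GT", 30))         # Y_GT30
print(combination_id("M", 2, 3))            # M2_3
```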
| 138 | === Generic codes === | ||
| 139 | |||
| 140 | These codes are recommended to be used when compiling code lists that require the items in the table below, e.g. when a "Not applicable" item is required. The codes provide code values for concepts referenced in a large number of code lists. | ||
| 141 | |||
| 142 | The codes proposed here are present in a very large number of SDMX code lists because they cover very general and extensively used concepts. Thus, the main purpose of this list of codes is to propose standardised code identifiers which can be extensively reused. | ||
| 143 | |||
| 144 | The leading underscore is used to avoid clashes with other codes and to set these concepts apart. | ||
| 145 | |||
| 146 | These codes should be considered as "reserved codes" or "protected codes", meaning that it is strongly recommended not to use them for other purposes than the ones described below. | ||
| 147 | |||
| 148 | The codes can also be used as suffixes or be augmented with sequential numbers, especially if a category needs to be used multiple times. For example, multiple sub-totals (e.g. CATTLE_S1, CATTLE_S2, PLANTS_S1) or multiple residual categories (e.g. CATTLE_O, PLANTS_O) may be required. | ||
| 149 | |||
| 150 | In cases where the use of the suffixes below could lead to confusion or misunderstanding, implementers may of course implement other solutions. Take the example of a code list where letters are used as codes (e.g. the United Nations ISIC classification): here, code element M_O could be interpreted either as "sections M and O" or as "section M not elsewhere classified". In this case implementers could create codes M_OTH or M_OTHER to differentiate the suffixes. | ||
| 151 | |||
| 152 | |**Recommended Code Value**|**Recommended Code Description**|**Annotation** | ||
| 153 | |**_L**|Local extension|((( | ||
| 154 | To be used to uniquely identify local (e.g. national, regional, sub-regional) extensions of SDMX code lists and by doing so: | ||
| 155 | |||
| 156 | * avoid conflict of local extensions with existing codes, and | ||
| 157 | * avoid conflict between revisions of the underlying code lists and the local extensions in terms of coding. | ||
| 158 | ))) | ||
| 159 | |**_N**|Non response|Failure to obtain a measurement on one or more study variables for one or more elements in a survey. | ||
| 160 | |**_O**|Other|Used to cover residual information not contained in other categories of the code list (in some contexts, e.g. classifications, referred to as n.e.s., not elsewhere specified, n.e.c., not elsewhere classified, etc.) | ||
| 161 | |**_S**|Subtotal|Used for expressing intermediate totals | ||
| 162 | |**_T**|Total|Used for expressing totals | ||
| 163 | |**_U**|No data/unknown|Failure to obtain a measurement (e.g. non response, no data available, information not known by the respondent unit, etc.) | ||
| 164 | |**_X**|Not allocated/unspecified|Used where the value for a particular variable falls outside the expected range. Examples are the failure to allocate a classification to a particular unit due to insufficient information, and cases where a further breakdown over related items in the code list is not available. | ||
| 165 | |**_Z**|Not applicable|((( | ||
| 166 | Used in cases where the coding of a concept is technically required (dimension or mandatory attribute), but does not have a statistical meaning for a specific series or observation. | ||
| 167 | |||
| 168 | Examples of relevant usages of _Z are: | ||
| 169 | |||
| 170 | * In a survey that has questions on “Marital Status” and “Sex of Partner”, for a response of Marital Status: Single, the relevant response for Sex of Partner is “Not applicable” (_Z) | ||
| 171 | * In a labour statistics reporting framework that has a dimension “Outside Labour Force Reason”. If the observation entity is inside of the labour force, the response for Outside Labour Force Reason is “Not applicable” (_Z) | ||
| 172 | * In macro-economic statistics, a DSD may contain a dimension for breakdowns of economic activity. Some series cannot be broken down by economic activity (e.g. statistical discrepancy between GDP approaches, certain taxes and subsidies, producer price index). They could be coded as “Not applicable” (_Z) for the activity dimension in the data message. | ||
| 173 | |||
| 174 | The code _Z should be used sparingly{{footnote}}At DSD design time the pros and cons of using _Z should be considered carefully. The occurrences of _Z can be reduced by: | ||
| 175 | • using the semantically approximate _T instead of _Z; | ||
| 176 | • having a larger number of compact DSDs to avoid the need to introduce _Z; | ||
| 177 | • concepts with _Z could be defined as attributes rather than dimensions; | ||
| 178 | • a dimension containing _Z could be merged with another (conceptually related) dimension as a sub-hierarchy.{{/footnote}}, since it may complicate validation and transformation formulas or pivot visualisations based on SDMX messages. The code _T (Total) should be used instead in cases where the applicability of the concept is not entirely clear, and where the concept may be relevant only under certain circumstances, or where the applicability of a concept for a certain series or dataflow depends on the context (e.g. a tax breakdown that exists in one country but not in another) in order to simplify the coding across data flows and countries. | ||
| 179 | ))) | ||
| 180 | |||
| 181 | = MAINTENANCE OF SDMX CODE LISTS = | ||
| 182 | |||
| 183 | == Maintenance agency == | ||
| 184 | |||
| 185 | Within SDMX, each code list requires a maintenance agency. For SDMX cross-domain code lists, the maintenance agency (marked as "SDMX") is the SDMX Statistical Working Group (SWG). For other code lists used in international data exchange this is, in general, one or several international organisation(s) linked to SDMX. Clear rules for the maintenance of the SDMX code lists need to be established. The guideline “A reference framework for SDMX structural metadata governance”{{footnote}}https://sdmx.org/wp-content/uploads/SDMX-Structural-Metadata-Governance.docx{{/footnote}} describes best practice for establishing structural metadata governance. | ||
| 186 | |||
| 187 | Changes to a given code list will lead to a new version of the code list. SDMX code lists are stored in the SDMX Global Registry{{footnote}}https://registry.sdmx.org/items/codelist.html{{/footnote}} for making them accessible to SDMX implementers. Other code lists are stored in regional or other //ad hoc// registries. | ||
| 188 | |||
| 189 | == Versioning of code lists == | ||
| 190 | |||
| 191 | A general document providing guidelines on the versioning of SDMX artefacts can be found on the SDMX official website{{footnote}}https://sdmx.org/{{/footnote}}. | ||
| 192 | |||
| 193 | In general, versioning is based on how severely a change affects clients. If a code is removed from a code list or a code’s usage context is changed (e.g. Cat -> Dog), backward compatibility is broken and the change is major. If a code is added, backward compatibility is not affected and the change is minor. If a code label is changed but the context remains the same (e.g. a typo fix), the change is a patch. | ||
| 194 | |||
| 195 | A recommended way to mark codes as removed without triggering a major version change is to prefix the code label with “[Deprecated] …”, for example “Cat” -> “[Deprecated] Cat”. At the next major version of the code list, these deprecated codes should be entirely removed to avoid an increasing maintenance burden. | ||
| 196 | |||
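The versioning rule can be illustrated with a small comparison of two versions of a code list. This is a sketch in Python under stated assumptions: a code list is represented as a plain mapping of code id to label, version numbers have three numeric parts (major.minor.patch), and the names `classify_change` and `bump` are hypothetical. A label change that alters the meaning of a code (Cat -> Dog) is a major change, but it cannot be detected automatically; the sketch treats every label change as a patch and leaves that judgement to the maintainer.

```python
def classify_change(old: dict, new: dict) -> str:
    """Compare two versions of a code list (code id -> label).

    Returns 'major', 'minor', 'patch' or 'none'.
    """
    if set(old) - set(new):
        return "major"  # a code was removed: backward compatibility is broken
    if set(new) - set(old):
        return "minor"  # codes were only added
    if any(old[code_id] != new[code_id] for code_id in old):
        return "patch"  # labels changed; assumed here to keep the same meaning
    return "none"


def bump(version: str, change: str) -> str:
    """Return the next version number for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version


old = {"CAT": "Cat", "DOG": "Dog"}
new = {"CAT": "Cat", "DOG": "Dog", "COW": "Cow"}
print(bump("1.3.0", classify_change(old, new)))  # 1.4.0
```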
| 197 | == No retroactivity in case of implementation of new code lists == | ||
| 198 | |||
| 199 | The adoption of a new code list has no retroactive effect on existing code lists. Therefore, the implementation of a new code list will not require that SDMX implementers revise existing code lists. | ||
| 200 | |||
| 201 | = NEW FEATURES FROM VERSION 3.0 OF SDMX = | ||
| 202 | |||
| 203 | == Code list extensions == | ||
| 204 | |||
| 205 | The aim of international statistical classifications is to provide a common framework for collecting and organising information about a particular statistical activity domain, concept or variable. Their use, either directly or through national adaptations, facilitates the exchange and comparability of data between countries. These reference data standards have generally been developed through extensive international consultation and have achieved broad acceptance and official agreement for use. | ||
| 206 | |||
| 207 | International organisations and member countries should be able to report using international categories, at least at the higher levels of a statistical classification, and should also be able to extend the classifications with additional categories representing the national context. | ||
| 208 | |||
| 209 | Where possible, a country should adopt, extend or adapt to these international standards. | ||
| 210 | |||
| 211 | * a) **Adopt** - If an international organization or country chooses to adopt an international code list then it accepts the international standard as published with no changes to the structure. This also means that the country will accept future changes to the standard as they are published. | ||
| 212 | * b) **Extend** – If an international organization or country chooses to extend an international code list, it references the existing international code list and adds the additional codes needed to represent the international, regional, national or sub-national context. In most cases the extension would include minor version changes by using the reference //n//.0+.0 | ||
| 213 | |||
| 214 | E.g. Extending the official ISO Country code list with Sark & Kosovo | ||
| 215 | |||
| 216 | __ISO Country Code List (CL_ISO3166_2)__ | ||
| 217 | |||
| 218 | AF Afghanistan | ||
| 219 | |||
| 220 | AL Albania | ||
| 221 | |||
| 222 | AQ Antarctica | ||
| 223 | |||
| 224 | … | ||
| 225 | |||
| 226 | YE Yemen | ||
| 227 | |||
| 228 | YU Serbia and Montenegro [former] | ||
| 229 | |||
| 230 | ZM Zambia | ||
| 231 | |||
| 232 | __CA1:CL_AREA(1.0)__ | ||
| 233 | |||
| 234 | ~* inherit CL_ISO3166_2 | ||
| 235 | |||
| 236 | CQ Sark | ||
| 237 | |||
| 238 | XK Kosovo | ||
| 239 | |||
| 240 | * c) **Adapt** - If a country/organization adapts an international code list, it develops a derived or related standard. Under this scenario, the country defines its own codes for the code list and creates a crosswalk that maps the country’s codes to the international standard code list. | ||
| 241 | |||
| 242 | For the Extend and Adapt scenarios, the agency needs to take ownership of the extended/adapted artefact by assigning an appropriate maintenance agency to it, replacing the original one. The original id (e.g. CL_ACTIVITY) should be used; however, if specific domain variants are required, it is recommended to suffix the id with the domain, e.g. CL_ACTIVITY_SEEA. Note that caution should be used when creating domain variants, as having separate code lists may make interoperability and harmonisation more difficult. | ||
| 243 | |||
| 244 | == Discriminated Union of Code Lists == | ||
| 245 | |||
| 246 | Combining code list extension with wildcarded constraints solves the discriminated union of code lists problem, where a classification or breakdown has multiple “variants” which are all valid but mutually exclusive. A common example is the official reference area code list, which combines ISO 3166 categories with UN M49 categories. See [[chapter 6.1 of the SDMX 3.0 technical notes>>url:https://sdmx.org/wp-content/uploads/SDMX_3-0-0_SECTION_6_FINAL-1_0.pdf]] for a full description of how SDMX 3.0 can handle conflicts arising from unions. | ||
| 247 | |||
| 248 | E.g. Combining the ISO 3166-2 country code list with the UN-M49 country code list to create the official CL_AREA country code list. | ||
| 249 | |||
| 250 | **CL_ISO3166_2** | ||
| 251 | |||
| 252 | AF Afghanistan | ||
| 253 | |||
| 254 | AL Albania | ||
| 255 | |||
| 256 | AQ Antarctica | ||
| 257 | |||
| 258 | … | ||
| 259 | |||
| 260 | YE Yemen | ||
| 261 | |||
| 262 | YU Serbia and Montenegro [former] | ||
| 263 | |||
| 264 | ZM Zambia | ||
| 265 | |||
| 266 | **CL_UN_M49** | ||
| 267 | |||
| 268 | 000 Total | ||
| 269 | |||
| 270 | 001 World | ||
| 271 | |||
| 272 | 002 Africa | ||
| 273 | |||
| 274 | 003 North America | ||
| 275 | |||
| 276 | 004 Afghanistan | ||
| 277 | |||
| 278 | … | ||
| 279 | |||
| 280 | 894 Zambia | ||
| 281 | |||
| 282 | 896 Areas not elsewhere specified | ||
| 283 | |||
| 284 | 898 Areas not specified | ||
| 285 | |||
| 286 | 899 Areas not elsewhere specified and unknown | ||
| 287 | |||
| 288 | **CL_AREA** | ||
| 289 | |||
| 290 | ~* Inherit CL_ISO3166_2 | ||
| 291 | |||
| 292 | ~* Inherit CL_UN_M49 | ||
| 293 | |||
| 294 | == Additional Code List features == | ||
| 295 | |||
| 296 | * a) **Prefix** – A prefix to be used for a code list in an extension to avoid code/category conflicts. This feature is useful in developing a discriminated union of code lists. An example is ESTAT:CL_PRODUCT(1.3.0), which combines product categories from two different classifications: CPA (Statistical classification of products by activity) and CPC (Central Product Classification). | ||
| 297 | |||
| 298 | **ESTAT:CL_PRODUCT(1.3.0)** | ||
| 299 | |||
| 300 | ~* inherit ESTAT:CL_CPA(1.0+.0) (code prefix CPA_) | ||
| 301 | |||
| 302 | ~* inherit UNSD:CL_CPC(1.0+.0) (code prefix CPC_) | ||
| 303 | |||
| 304 | * b) **Sequence** – The order used to resolve code conflicts when extending a code list. The last code list in the sequence overrides the previous code lists. An example is the combination of different versions of the ISIC classification. | ||
| 305 | |||
| 306 | **SDMX:CL_ACTIVITY_ISIC(1.0.0)** | ||
| 307 | |||
| 308 | ~* inherit SDMX:CL_ACTIVITY_ISIC_2(1.0+.0) (sequence = 1) | ||
| 309 | |||
| 310 | ~* inherit SDMX:CL_ACTIVITY_ISIC_3_1(1.0+.0) (sequence = 2) | ||
| 311 | |||
| 312 | ~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0) (sequence = 3) | ||
| 313 | |||
| 314 | * c) **InclusiveCodeSelection / ExclusiveCodeSelection** - The subset of codes to be included/excluded when extending a code list. | ||
| 315 | ** a. **selectionValue** - A collection of values based on codes and their children. An example is the creation of a code list containing only the Manufacturing categories from ISIC. | ||
| 316 | |||
| 317 | **SDMX:CL_MANUF(1.0.0)** | ||
| 318 | |||
| 319 | ~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0) | ||
| 320 | |||
| 321 | InclusiveCodeSelection | ||
| 322 | |||
| 323 | selectionValue: C, C10, C104, …, C3290 | ||
| 324 | |||
| 325 | * | ||
| 326 | ** b. **cascadeValues** - A property indicating whether the child codes of the selected code are included in the selection. The 'excluderoot' value includes the children but excludes the selected code itself. An example is a code list containing only the Manufacturing categories from ISIC. | ||
| 327 | |||
| 328 | **SDMX:CL_MANUF(1.0.0)** | ||
| 329 | |||
| 330 | ~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0) | ||
| 331 | |||
| 332 | InclusiveCodeSelection | ||
| 333 | |||
| 334 | Value: C, cascadeValues: True | ||
| 335 | |||
| 336 | * | ||
| 337 | ** c. **Value** - The value of the code to include in the selection. It may include the ‘%’ character as a wildcard. An example is a code list containing only the Manufacturing categories from ISIC. | ||
| 338 | |||
| 339 | **SDMX:CL_MANUF(1.0.0)** | ||
| 340 | |||
| 341 | ~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0) | ||
| 342 | |||
| 343 | InclusiveCodeSelection | ||
| 344 | |||
| 345 | Value: C% | ||
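The cascade and wildcard behaviour described above can be sketched as follows. This is a minimal illustration under assumptions: the hierarchy is a small hypothetical fragment, the function names are invented for the example, and the cascade setting is modelled as the strings "true", "false" and "excluderoot".

```python
import re

# Hypothetical fragment of a hierarchy: code id -> parent id (None = top level).
PARENTS = {
    "A": None, "A01": "A",
    "C": None, "C10": "C", "C104": "C10", "C1040": "C104",
    "H": None,
}


def matches(code_id, value):
    """True if code_id equals value, treating '%' in value as a wildcard."""
    pattern = ".*".join(re.escape(part) for part in value.split("%"))
    return re.fullmatch(pattern, code_id) is not None


def descendants(root, parents):
    """All codes below root in the hierarchy (children, grandchildren, ...)."""
    found, frontier = set(), [root]
    while frontier:
        current = frontier.pop()
        for code_id, parent in parents.items():
            if parent == current and code_id not in found:
                found.add(code_id)
                frontier.append(code_id)
    return found


def inclusive_selection(parents, value, cascade="false"):
    """Codes selected by one value; cascade is 'true', 'false' or 'excluderoot'."""
    roots = {code_id for code_id in parents if matches(code_id, value)}
    selected = set() if cascade == "excluderoot" else set(roots)
    if cascade in ("true", "excluderoot"):
        for root in roots:
            selected |= descendants(root, parents)
    return selected


print(sorted(inclusive_selection(PARENTS, "C", cascade="true")))
print(sorted(inclusive_selection(PARENTS, "C%")))
```

In this fragment both calls select the same four codes, because every code under C also starts with the letter C. The cascade relies on the hierarchy, whereas the wildcard relies only on how the code ids are spelled.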
| 346 | |||
| 347 | = OTHER PRACTICAL ISSUES AND RECOMMENDATIONS = | ||
| 348 | |||
| 349 | == Breakdowns with multiple variants == | ||
| 350 | |||
| 351 | Some SDMX concepts (e.g. Economic activity) may have global, regional and/or national classifications and variants (different versions of the same classification). This means that a DSD may need to reference multiple different representations of the concept of activity, namely a global one, a regional one and a national one. | ||
| 352 | |||
| 353 | There are two options for dealing with this issue: | ||
| 354 | |||
| 355 | * The preferred approach is to create distinct DSDs, each with one "Economic activity" concept but with a different code list attached for the required variant. Validation and reporting are simple, and the approach is especially useful when reporters send data for only one variant. | ||
| 356 | * Alternatively, create a single code list containing the required variants of the classification, using prefixes to distinguish between the variants. For example, in ISIC revision 4 the item Transportation and storage is coded ISIC4_H, and in NACE revision 2 the same item is NACE2_H.{{footnote}}Ideally, a mapping between the prefixed variant codes and the original classification codes should be created and made available in an SDMX public registry. This would avoid different institutions redoing the same mapping work.{{/footnote}} This approach may be valid when reporters have to send data messages with more than one variant (however, if they can send more than one data message, the preferred approach above should be used). | ||
| 357 | |||
| 358 | == Multiple breakdowns in a single concept (COMPOSITE_BREAKDOWN) == | ||
| 359 | |||
| 360 | A way to represent multiple breakdowns in a DSD is to use a COMPOSITE_BREAKDOWN concept. A use case is to reduce the number of dimensions in DSDs by using a single concept to represent breakdowns that do not have to be used together in each series. The mechanism uses a COMPOSITE_BREAKDOWN concept, and a CL_COMP_BREAKDOWN code list. | ||
| 361 | |||
| 362 | The code list enumerates several breakdowns. Codes in the CL_COMP_BREAKDOWN code list are prefixed with an abbreviation of the corresponding breakdown (as in the section **Discriminated Union of Code Lists**), and code names provide the name of the breakdown followed by the name of the code, e.g. | ||
| 363 | |||
| 364 | |**Code**|**Name** | ||
| 365 | |**MOT_IWW**|Mode of Transport: Inland waterway transport | ||
| 366 | |**MOT_SEA**|Mode of Transport: Maritime | ||
| 367 | |**IHR_01**|IHR Capacity: National legislation, policy and financing | ||
| 368 | |||
| 369 | Note: the COMPOSITE_BREAKDOWN mechanism has the limitation that only one composite breakdown can be used for a series. All series should therefore be checked to confirm that none requires more than one breakdown. If a series requires more than one of the breakdowns described in COMPOSITE_BREAKDOWN, another breakdown dimension needs to be added to the DSD. | ||
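The construction of the composite code list and the check described in the note can be sketched as follows. This is a minimal illustration: the helper names and the series identifiers (SERIES_A, SERIES_B, SERIES_C) are hypothetical, and only the three codes from the table above are used.

```python
# Breakdown prefix -> (breakdown name, {code id: code name}); codes from the table above.
BREAKDOWNS = {
    "MOT": ("Mode of Transport", {"IWW": "Inland waterway transport", "SEA": "Maritime"}),
    "IHR": ("IHR Capacity", {"01": "National legislation, policy and financing"}),
}


def build_composite(breakdowns):
    """Build the composite code list: prefixed ids and 'breakdown: code' names."""
    composite = {}
    for prefix, (breakdown_name, codes) in breakdowns.items():
        for code_id, code_name in codes.items():
            composite[f"{prefix}_{code_id}"] = f"{breakdown_name}: {code_name}"
    return composite


def series_needing_extra_dimension(series_requirements):
    """Return the series that need more than one breakdown at the same time."""
    return sorted(
        series for series, needed in series_requirements.items() if len(needed) > 1
    )


CL_COMP_BREAKDOWN = build_composite(BREAKDOWNS)

# Hypothetical series and the breakdowns each one requires.
SERIES = {"SERIES_A": {"MOT"}, "SERIES_B": {"IHR"}, "SERIES_C": {"MOT", "IHR"}}
print(series_needing_extra_dimension(SERIES))  # ['SERIES_C']
```

Any series reported by the check cannot be expressed with a single COMPOSITE_BREAKDOWN dimension and signals that a further breakdown dimension is needed.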
| 370 | |||
| 371 | == Unpredictable and extendable breakdowns (CUSTOM_BREAKDOWN) == | ||
| 372 | |||
| 373 | The custom breakdown (CUST_BREAKDOWN) dimension supports volatile breakdowns, i.e. those whose code lists change often or lack international or even national classifications. It can also be used where a reporting framework is designed to be extendable by the reporter, e.g. new tax codes. | ||
| 374 | |||
| 375 | The CUST_BREAKDOWN dimension can be used in combination with the custom breakdown label attribute. Its generic code list CL_CUST_BREAKDOWN (C01, C02, C03, …, C99) is used to transmit the codes, while the attribute CUST_BREAKDOWN_LB is used to transmit descriptions of the codes directly in the dataset. CUST_BREAKDOWN_LB should start with a description of the breakdown followed by an entry description. For example, for a disaggregation by the source of light, the values of the CUST_BREAKDOWN dimension and CUST_BREAKDOWN_LB attribute in the dataset would look as follows: | ||
| 376 | |||
| 377 | |**CUST_BREAKDOWN code**|**CUST_BREAKDOWN_LB** | ||
| 378 | |**C01**|Source of light: Solar | ||
| 379 | |**C02**|Source of light: Publicly-provided electricity | ||
| 380 | |**C03**|Source of light: Privately-generated electricity | ||
| 381 | |**C04**|Source of light: Other C05 Source of light: None | ||
| 382 | |**C05**|Source of light: Kerosene lamp | ||
| 383 | |**C06**|Source of light: Candle | ||
| 384 | |**C07**|Source of light: Battery | ||
| 385 | |||
| 386 | Implementation of a breakdown through the CUST_BREAKDOWN dimension has no impact on the DSD, i.e. it does not result in any change to the DSD. The CL_CUST_BREAKDOWN code list is generic and stays intact, while the CUST_BREAKDOWN_LB attribute is set at the time the data is converted to SDMX. | ||
| 387 | |||
| 388 | Note that an uncoded dimension could be used instead of CL_CUST_BREAKDOWN; the trade-off is that there could be unlimited custom items, but the item id cannot be validated or controlled. | ||
| 389 | |||
| 390 | Please use this mechanism with care, as time series using a custom breakdown may not be consistent. The custom codes (C01, C02, C03, …, C99) might mean something different in one year than in the next year or between different reference areas in the same year. The usage should be clearly described in the respective data exchange agreement. As much as possible, the coding should be applied in such a way that time series integrity is maintained. | ||
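A collector could run a simple consistency check along these lines. This is a minimal sketch under assumptions: the function name is hypothetical, observations are modelled as plain dictionaries, and all observations are treated as belonging to one time series. The actual scope of the check (per reference area, per data flow) would follow the data exchange agreement.

```python
# The generic code list C01 ... C99.
CL_CUST_BREAKDOWN = {f"C{n:02d}" for n in range(1, 100)}


def check_custom_breakdown(observations):
    """Return a list of problems found in CUST_BREAKDOWN usage.

    Checks that each code is in the generic code list, that the label has
    the 'breakdown: entry' form, and that a code keeps the same label.
    """
    problems = []
    first_label = {}
    for obs in observations:
        code = obs["CUST_BREAKDOWN"]
        label = obs["CUST_BREAKDOWN_LB"]
        if code not in CL_CUST_BREAKDOWN:
            problems.append(f"{code}: not in CL_CUST_BREAKDOWN")
        if ": " not in label:
            problems.append(f"{code}: label lacks 'breakdown: entry' form")
        if first_label.setdefault(code, label) != label:
            problems.append(f"{code}: label differs from earlier use")
    return problems


# Hypothetical observations: C01 changes meaning between 2020 and 2021.
OBSERVATIONS = [
    {"TIME_PERIOD": "2020", "CUST_BREAKDOWN": "C01", "CUST_BREAKDOWN_LB": "Source of light: Solar"},
    {"TIME_PERIOD": "2021", "CUST_BREAKDOWN": "C01", "CUST_BREAKDOWN_LB": "Source of light: Candle"},
    {"TIME_PERIOD": "2021", "CUST_BREAKDOWN": "C100", "CUST_BREAKDOWN_LB": "Source of light: Battery"},
]
for problem in check_custom_breakdown(OBSERVATIONS):
    print(problem)
```

The check flags the reuse of C01 with a different meaning and the out-of-range code C100. It cannot tell whether a label is semantically correct, which is part of the validation burden noted below.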
| 391 | |||
| 392 | While the use of CUSTOM_BREAKDOWN may seem to reduce the maintenance burden and make it possible to use the DSD to support national dissemination, it has the following issues: | ||
| 393 | |||
| 394 | * It increases the complexity of the DSD: the mechanism is harder for users to understand than DSDs with fixed concepts and code lists; | ||
| 395 | * If a CL_CUST_BREAKDOWN code list is used (instead of an uncoded dimension), as many codes as are reasonably needed should be created, which may result in a long list; | ||
| 396 | * It is difficult to fully validate the series, so the use of CUSTOM_BREAKDOWN places an extra burden on the collector and may reduce data quality. | ||
| 397 | |||
| 398 | ---- | ||
| 399 | |||
| 400 | {{putFootnotes/}} |