Interoperability Basis Platform

= DOCUMENT HISTORY =

|**Version**|**Date**|**Comment**
|1.0|02/12/2013|Initial version for cross-domain code lists.
|2.0|15/01/2015|Adapted to be a guideline for all code lists, not only cross-domain ones.
|3.0|19/01/2018|Clarified text on definitions. Revised allowed characters such as leading zeroes in codes (now allowed). Removed superfluous text. Improved examples.
|4.0|12/2/2025|(((
Added sections on:

* New features from version 3.0 of SDMX
* Other practical issues and recommendations

General improvements to the text
)))

= INTRODUCTION =

These guidelines support the creation of SDMX code lists for use throughout the statistical business process, in particular when SDMX is implemented in statistical domains. Their use is strongly recommended when SDMX-compliant data structure definitions (DSDs) are built and implemented in statistical domains.


In the [[SDMX Checklist for Design Projects>>url:https://statswiki.unece.org/display/SDMXPM/Checklist+for+SDMX+Data+Providers]] and the [[modelling guidelines>>url:https://sdmx.org/?page_id=4345#Modelling]], the creation of code lists is done in the sub-process “Fully define code lists”.


Originally this document was named "//Guidelines for the Creation and Management of SDMX Cross-Domain Code Lists//". Later, experience showed that these guidelines were also used for the development of other types of SDMX code lists (shared code lists, domain-specific code lists). It was therefore decided to review the document in order to convert it into a guideline applicable to all types of SDMX code lists.


Code lists are created to group related codes{{footnote}}In the traditional classification sphere, the words "category" or "position" are generally preferred to the word "code"; however, the use of the term "category" could be confusing as it is already used in the SDMX information model within another context.{{/footnote}} in a meaningful, systematic and standard format. They provide lists of codes that objects corresponding to a specific concept can be classified into. Each code should be well described.


Code lists are primarily used to:

* collect, disseminate, exchange and organise information;
* aggregate and disaggregate datasets in a meaningful way for complex analysis;
* present statistical information in a standard way;
* support policy and decision-making;
* standardise the measurement process.

SDMX cross-domain code lists can be found in two places:

* [[SDMX official website>>url:https://sdmx.org/?page_id=3215]] (files available in MS-Word format)
* [[SDMX Global Registry>>url:https://registry.sdmx.org/items/codelist.html]] (files available in SDMX-ML format)

Other SDMX code lists can be found in regional and other registries.


= BASIC PRINCIPLES FOR THE CREATION OF SDMX CODE LISTS =


1. SDMX code lists should refer to clear and well-defined statistical concepts, enabling data users to understand the statistical concepts and, ultimately, the data sets. Existing standards (e.g. international classifications) should be considered.
1. Consistency of the SDMX code lists across statistical domains and over time should be ensured.


1. Key terms


Below are the key terms used in this document to describe SDMX code lists.


* A **code list** is a predefined list from which some statistical coded concepts take their values.


** A **code list identifier** is a unique identifier given to the code list. The code list identifier consists of three mandatory elements: an **id**, a **version number** and a **reference to a maintenance agency**.


** An **id** is a language-independent set of letters, numbers and/or symbols. To give SDMX code lists a clear visual identity, the code list id should be prefixed with CL_.


** A code list **name** describes the content of the artefact to which the name is attached in a synthetic and clear way. In principle, the default language for names is English (however exceptions are possible, e.g. when geographic entities are expressed in the national languages). Multilingual representations are possible.

** A code list **description** describes the content of the artefact to which it is attached in more detail than the artefact name. Multilingual representations are possible. In the specific case of code lists, the description is generally used to precisely define the coverage of a code, identifying what is included and what is excluded (e.g. wooden shoes are not considered as shoes but as handicraft).

A code list may also contain annotations, a uri, a description, attributes indicating the period of validity (e.g. "//valid to//" and "//valid from//") and an attribute indicating whether the code list is final.

* A **code** (basic element of the code list) is also represented by a mandatory **id**, a mandatory **name** and an optional **description** (see the descriptions of the code list elements). The code may also contain annotations and a uri{{footnote}}Uri: Uniform Resource Identifier{{/footnote}}.

While code names and descriptions are meant for interpretation by humans, **ids** are primarily designed to be read by machines. Nevertheless, it is often helpful for data users if ids are meaningful in the default language used for the name. When choosing an approach, implementers should also consider the possible impact on the length of code identifiers.

Example

[[image:1768480235130-791.png]]


= BASIC CHARACTERISTICS OF SDMX CODE LISTS =


Conceptually SDMX code lists can have the following characteristics:


* A code list (e.g. geographical entity) can be referenced by several statistical concepts (i.e. declaring country, country of birth, partner country, etc.).


* The codes used should cover exhaustively the part of reality that is intended to be described by the code list.


* The codes have to be clearly defined. Codes with different coverage must have different code identifiers and names (e.g. Europe including Greenland must have a different identifier and name than Europe excluding it). Different codes should not have the same meaning or coverage.

* The coverage of the codes may, however, overlap partially within one code list. It may not be identical, except when code lists are based on established standard classifications where such repetitions are common (see //Unpredictable and extendable breakdowns (CUSTOM_BREAKDOWN)// in the [[OTHER PRACTICAL ISSUES AND RECOMMENDATIONS>>path:#_Toc179573467]] section). This means that the content of the categories is not necessarily mutually exclusive, whereas mutual exclusivity is the established rule in statistical classifications.

* Codes may be at different levels of granularity.


* Multiple hierarchies (hierarchical code lists{{footnote}}The "Hierarchical code list" construct used in SDMX should not be mixed up with the concept used in traditional statistical classifications. In the latter case, the codes are organised based on one strictly defined hierarchy only. In the former case, several hierarchies can be defined. An example of this is a list of codes that support the theme of geographical location. Such a hierarchical code list could be viewed according to many different hierarchies. A political hierarchy would comprise an administrative regional breakdown within a country, a geographical breakdown would comprise a placing of the countries in continents, and an economic breakdown might place the countries in one or more economic communities (e.g. many of the countries in "Europe" could be both a part of the European Union and the OECD communities).{{/footnote}}) can be defined on top of a flat code list. An SDMX code list should be flexible in terms of allowing different possible hierarchies and be extendible by additional codes that may disaggregate or aggregate codes that are already in the list as well as by codes that extend the coverage of the code list. Each possible hierarchy may use all codes from the flat code list or just a subset. A flat code list provides the reservoir of codes for the hierarchies. A forthcoming (at the time of writing) guideline on SDMX Hierarchies will be available to describe the recommended approach for creating and maintaining code list hierarchies.

* New codes or levels need to be accommodated when required (e.g. if a new country is recognised as a sovereign state, a new version of the geographical entity code list should include it).

* Re-use of code identifiers between versions of the same code list should be avoided. For example, in the ISO two-letter country codes, CS (Czechoslovakia) was suppressed in 1993 and reused as CS (Serbia and Montenegro) in 2003. This change was, and remains, extremely difficult to handle, as many data systems contain historical data for Czechoslovakia.

= BASIC CRITERIA FOR THE DEVELOPMENT OF SDMX CODE LISTS =


The following basic criteria should be respected when defining SDMX code lists:


* It is highly recommended that code lists be consistent, to the largest extent possible, with internationally agreed standards, whenever they exist, e.g. International Organization for Standardization (ISO), United Nations and other international organisations code lists. It is of no use creating a new code list where one already exists. The following order of priority is suggested when considering the use of existing code lists:

** international standard classifications or code lists;
** international classifications or code lists supplemented by other international and/or regional institutions;
** standardised classifications or code lists used by individual international institutions.

* If official classifications are used for defining the SDMX code lists, totals, aggregates and other additional codes should be added following the recommendations of the issuing organisation for adding new elements (e.g. the addition of the world regions EU27, EFTA, etc. to the ISO 3166 alpha-2 country list).

* When designing code lists, the needs of all phases of the statistical business process and of all statistical domains using the respective SDMX code lists should be considered. However, due to legacy systems, it may not always be possible to accommodate all needs from the start (e.g. different habits in data exchange and in dissemination). In such cases, some transition period will often be necessary before reaching convergence.


Technically, SDMX code lists should respect the principles described in the following paragraphs.


* SDMX code ids take values from uppercase A to Z, 0 to 9 and "_" only. No other characters should be used.

* Even though technically allowed in the standard, it is highly recommended not to use lower case characters to avoid possible confusion and technical issues with upper case characters. Exceptions should only be made for code lists based on classifications managed by external bodies (such as ISO). In case an external classification is using lower case characters, they may also be used in the respective SDMX code lists to make their usage more intuitive for regular users of the classification. However, it is still not allowed to create two code list items that only differ by the case of the characters (e.g., "EN" and "en"). If for whatever reason an organisation opts for lower case characters, all coded information should then be in lower case characters.

* The standard does not allow accented characters in code ids.

* Underscore ("_") is generally used for the combination of codes (whether consecutive or not).

* A list of generic codes to be used for all SDMX code lists has been developed. This generic list includes concepts which can be expected to appear in many, if not all, code lists, e.g. "total", "non-response", "not allocated", "unknown", "other", etc.

Code names should be between 1 and 254 characters. The characters used should belong to the UTF-8 character set.


* Code names are in general defined in English; other language versions may be added. Certain exceptions can be considered in the case of codes which are normally not translated (e.g. regions or territories for which the national designation could be used, agencies for which a national and English label could be used, etc.).


Descriptions could be used if more details on the contents or on the code descriptions are needed. Multilingual representations are possible.
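The technical principles above can be checked mechanically before a code list is published. The following is a minimal sketch in Python, not part of the SDMX standard or of any SDMX tool: the function name `check_code_list` and the plain `{id: name}` dictionary representation are assumptions made for illustration. It applies the recommended id character set, the rule that two ids must not differ only by case, and the 1 to 254 character limit on names.

```python
import re

# Recommended character set from the guideline: uppercase A-Z, 0-9 and "_".
CODE_ID_PATTERN = re.compile(r"[A-Z0-9_]+")
MAX_NAME_LENGTH = 254


def check_code_list(codes):
    """Return a list of problems found in a {code_id: name} mapping.

    Hypothetical helper for illustration only.
    """
    problems = []
    seen_folded = {}  # upper-cased id -> first id seen with that spelling
    for code_id, name in codes.items():
        if not CODE_ID_PATTERN.fullmatch(code_id):
            problems.append(f"{code_id}: id uses characters outside A-Z, 0-9 and '_'")
        folded = code_id.upper()
        if folded in seen_folded:
            problems.append(f"{code_id}: differs from {seen_folded[folded]} only by case")
        else:
            seen_folded[folded] = code_id
        if not 1 <= len(name) <= MAX_NAME_LENGTH:
            problems.append(f"{code_id}: name must be 1 to {MAX_NAME_LENGTH} characters")
    return problems


if __name__ == "__main__":
    for problem in check_code_list({"EN": "English", "en": "English", "_T": "Total"}):
        print(problem)
```

A code list based on an external classification that legitimately uses lower case characters would need the pattern relaxed; the case-collision check still applies in that situation.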


== Meaningful code identifiers vs meaningless code identifiers ==


Code identifiers can be either meaningful, i.e. the identifier is a shortened representation or expression of the code name (e.g. "DE_FIN" for "German Ministry of Finance"), or meaningless, i.e. the code identifier is chosen at random and conveys no information whatsoever as to the content of the code name (e.g. "6E" standing for "European Space Agency").


In SDMX, both coding systems are acceptable, although it is recommended to use meaningful codes if possible. It is up to implementers to decide which coding system they want to use, considering that there may be a trade-off between the meaningfulness of identifiers and their length, which affects the maintainability of identifiers and key series codes as well as the size and cost of data files and databases. The choice of the coding system may be influenced by parameters such as the number of identifiers to code, the complexity of the concepts to code, etc.

In cases where self-explanatory (meaningful) identifiers are chosen, implementers should pay attention not to use excessively long codes.


Meaningful codes can be self-explanatory in one language only; by default, this language is English.


=== Expressing numeric ranges in code identifiers ===


SDMX implementers may wish to express numeric ranges in code identifiers. This is allowed but in no way imposed.


As an example, let’s take a code list dealing with age classes. The basic units used for representing age are "Y" (Year), "M" (Month), "W" (Week) and "D" (Day). To express concepts like "from 15 years to 20 years excluding 16 years", "over 30 years", "less than 50 years", "two and three months", "four days or over", "three years or less", the following set of standard "operators" is proposed:


* **"T"** for expressing ranges (from 3 to 9),
* **"_"** for the combination of two codes, whether consecutive or not (5 and 6; A and F),
* **"X"** for expressing "except" or "excluding",
* **"GT"** for "greater than", **"LT"** for "less than", **"GE"** for "equal to or greater than" and **"LE"** for "equal to or less than".

The result for the examples mentioned in the previous paragraph is as follows:

* from 15 years to 20 years excluding 16 years: Y15T20X16
* over 30 years: Y_GT30 (in this case the "_" is used to make a clear distinction between the unit and the "operator" "GT").
* less than 50 years: Y_LT50
* two and three months: M2_3
* four days or over: D_GE4
* three years or less: Y_LE3

Such a coding system has limits. It will most probably not be appropriate for coding ranges with decimals (which would result in overly long code identifiers) or overly complex concepts. Again, it is up to implementers to decide whether this coding system meets their specific needs.
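The operators above lend themselves to small helper functions that assemble identifiers consistently. The sketch below is illustrative only: the function names `range_code`, `comparison_code` and `combined_code` are hypothetical, and only the patterns shown in the examples above (one optional exclusion per range, integer values) are covered.

```python
COMPARISON_OPERATORS = {"GT", "LT", "GE", "LE"}


def range_code(unit, start, end, excluding=None):
    """Build a range id such as Y15T20, optionally with one exclusion (Y15T20X16)."""
    code = f"{unit}{start}T{end}"
    if excluding is not None:
        code += f"X{excluding}"
    return code


def comparison_code(unit, operator, value):
    """Build an id such as Y_GT30; "_" separates the unit from the operator."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    return f"{unit}_{operator}{value}"


def combined_code(unit, *values):
    """Build an id combining several values of the same unit, such as M2_3."""
    return unit + "_".join(str(value) for value in values)


if __name__ == "__main__":
    print(range_code("Y", 15, 20, excluding=16))
    print(comparison_code("Y", "GT", 30))
    print(combined_code("M", 2, 3))
```

Generating identifiers from one set of helpers, rather than typing them by hand, keeps the operator usage uniform across a code list.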


=== Generic codes ===


These codes are recommended to be used when compiling code lists that require the items in the table below, e.g. when a "Not applicable" item is required. The codes provide code values for concepts referenced in a large number of code lists.


The codes proposed here are present in a very large number of SDMX code lists because they cover very general and extensively used concepts. Thus, the main purpose of this list of codes is to propose standardised code identifiers which can be extensively reused.


The leading underscore is used to avoid clashes with other codes and to singularise these concepts.


These codes should be considered as "reserved codes" or "protected codes", meaning that it is strongly recommended not to use them for other purposes than the ones described below.


The codes can also be used as suffixes or be augmented with sequential numbers, especially if a category needs to be used multiple times. For example, multiple sub-totals (e.g. CATTLE_S1, CATTLE_S2, PLANTS_S1) or multiple residual categories (e.g. CATTLE_O, PLANTS_O) may be required.


In cases where the use of the suffixes below could lead to confusion or misunderstanding, implementers may implement other solutions. Take the example of a code list where letters are used as codes (e.g. the United Nations ISIC classification): here, the code M_O could be interpreted either as "sections M and O" or as "section M not elsewhere classified". Implementers could then create codes M_OTH or M_OTHER to differentiate the suffixes.

|**Recommended Code Value**|**Recommended Code Description**|**Annotation**
|**_L**|Local extension|(((
To be used to uniquely identify local (e.g. national, regional, sub-regional) extensions of SDMX code lists and by doing so:

* avoid conflict of local extensions with existing codes, and
* avoid conflict between revisions of the underlying code lists and the local extensions in terms of coding.
)))
|**_N**|Non response|Failure to obtain a measurement on one or more study variables for one or more elements in a survey.
|**_O**|Other|Used to cover residual information not contained in other categories of the code list (in some contexts, e.g. classifications, referred to as n.e.s., not elsewhere specified, n.e.c., not elsewhere classified, etc.).
|**_S**|Subtotal|Used for expressing intermediate totals.
|**_T**|Total|Used for expressing totals.
|**_U**|No data/unknown|Failure to obtain a measurement (e.g. non response, no data available, information not known by the respondent unit, etc.).
|**_X**|Not allocated/unspecified|Used where the value for a particular variable falls outside the expected range. An example could be the failure to allocate a classification to a particular unit due to insufficient information, and/or where a further breakdown over the related items in the code list is not available.
|**_Z**|Not applicable|(((
Used in cases where the coding of a concept is technically required (dimension or mandatory attribute), but does not have a statistical meaning for a specific series or observation.

Examples of relevant usages of _Z are:

* In a survey that has questions on “Marital Status” and “Sex of Partner”, for a response of Marital Status: Single, the relevant response for Sex of Partner is “Not applicable” (_Z).
* In a labour statistics reporting framework that has a dimension “Outside Labour Force Reason”, if the observation entity is inside the labour force, the response for Outside Labour Force Reason is “Not applicable” (_Z).
* In macro-economic statistics, a DSD may contain a dimension for breakdowns of economic activity. Some series cannot be broken down by economic activity (e.g. statistical discrepancy between GDP approaches, certain taxes and subsidies, producer price index). They could be coded as “Not applicable” (_Z) for the activity dimension in the data message.

The code _Z should be used sparingly{{footnote}}At DSD design time the pros and cons of using _Z should be considered carefully. The occurrences of _Z can be reduced by:

• using the semantically approximate _T instead of _Z;

• having a larger number of compact DSDs to avoid the need to introduce _Z;

• defining concepts with _Z as attributes rather than dimensions;

• merging a dimension containing _Z with another (conceptually related) dimension as a sub-hierarchy.{{/footnote}}, since it may complicate validation and transformation formulas or pivot visualisations based on SDMX messages. To simplify the coding across data flows and countries, the code _T (Total) should be used instead in cases where the applicability of the concept is not entirely clear, where the concept may be relevant only under certain circumstances, or where the applicability of a concept for a certain series or dataflow depends on the context (e.g. a tax breakdown that exists in one country but not in another).
)))
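The reserved codes in the table can be kept in a single lookup so that tools recognise them both on their own and when used as suffixes. The following Python sketch is an illustration under stated assumptions: the helper name `generic_meaning` is hypothetical, and the suffix detection is a simple heuristic that strips a trailing sequence number (as in CATTLE_S1) before comparing the ending of the id.

```python
# Reserved generic codes and their recommended descriptions, as listed in the table.
GENERIC_CODES = {
    "_L": "Local extension",
    "_N": "Non response",
    "_O": "Other",
    "_S": "Subtotal",
    "_T": "Total",
    "_U": "No data/unknown",
    "_X": "Not allocated/unspecified",
    "_Z": "Not applicable",
}


def generic_meaning(code_id):
    """Return the generic concept a code id refers to, or None.

    Hypothetical helper: recognises the bare reserved codes and ids that
    end with one of them, optionally followed by a sequence number.
    """
    if code_id in GENERIC_CODES:
        return GENERIC_CODES[code_id]
    stem = code_id.rstrip("0123456789")  # CATTLE_S1 -> CATTLE_S
    for generic, meaning in GENERIC_CODES.items():
        if stem.endswith(generic) and len(stem) > len(generic):
            return meaning
    return None


if __name__ == "__main__":
    for code_id in ("_Z", "CATTLE_S1", "PLANTS_O", "DE"):
        print(code_id, "->", generic_meaning(code_id))
```

The heuristic reports M_O as "Other", which is exactly the ambiguity described above for classifications that use single letters as codes; in such code lists an explicit suffix such as M_OTH avoids the false match.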


= MAINTENANCE OF SDMX CODE LISTS =


== Maintenance agency ==


Within SDMX, each code list requires a maintenance agency. For SDMX cross-domain code lists, the maintenance agency (marked as "SDMX") is the SDMX Statistical Working Group (SWG). For other code lists used in international data exchange this is - in general - one or several international organisation(s) linked to SDMX. Clear rules for the maintenance of the SDMX code lists need to be established. There is a guideline{{footnote}}https://sdmx.org/wp-content/uploads/SDMX-Structural-Metadata-Governance.docx{{/footnote}} “A reference framework for SDMX structural metadata governance” that provides a best practice in establishing structural metadata governance.


Changes to a given code list will lead to a new version of the code list. SDMX code lists are stored in the SDMX Global Registry{{footnote}}https://registry.sdmx.org/items/codelist.html{{/footnote}} for making them accessible to SDMX implementers. Other code lists are stored in regional or other //ad hoc// registries.


== Versioning of code lists ==


A general document providing guidelines on the versioning of SDMX artefacts can be found on the SDMX official website{{footnote}}https://sdmx.org/{{/footnote}}.


In general, versioning is based on the severity of the impact that a change has on clients. If a code is removed from a code list, or the meaning of a code is changed (e.g. "Cat" -> "Dog"), backward compatibility is broken and the change is major. If a code is added, backward compatibility is not affected and the change is minor. If a code label is changed but its meaning remains the same (e.g. a typo fix), the change is a patch.


A recommended way to mark codes as removed without triggering a major version change is to prefix the code label with “[Deprecated] …”. E.g., “Cat” -> “[Deprecated] Cat”. At the next major version of the codelist, these deprecated codes should be entirely removed to avoid an increasing maintenance burden.
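The versioning rules above can be approximated by comparing two states of a code list. The sketch below is a simplified illustration, not an official algorithm: the function names `classify_change` and `bump` are hypothetical, code lists are represented as plain `{id: label}` dictionaries, and every label change is treated as a patch because a program cannot tell a typo fix from a change of meaning. A maintainer would still need to escalate a semantic change (Cat -> Dog) to a major version by hand.

```python
def classify_change(old, new):
    """Compare two {code_id: label} mappings.

    Returns "major" if a code was removed, "minor" if codes were only added,
    "patch" if only labels changed, and "none" if nothing changed.
    """
    if set(old) - set(new):
        return "major"
    if set(new) - set(old):
        return "minor"
    if any(old[code_id] != new[code_id] for code_id in old):
        return "patch"
    return "none"


def bump(version, change):
    """Apply a change level to a three-part version string such as 1.3.0."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version


if __name__ == "__main__":
    before = {"CAT": "Cat", "DOG": "Dog"}
    after = {"CAT": "[Deprecated] Cat", "DOG": "Dog"}
    change = classify_change(before, after)
    print(change, bump("1.3.0", change))
```

In this sketch, marking a code as deprecated through its label counts as a patch, which is consistent with the aim of avoiding a major version change until the code is actually removed.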


== No retroactivity in case of implementation of new code lists ==


The adoption of a new code list has no retroactive effect on existing code lists. Therefore, the implementation of a new code list will not require that SDMX implementers revise existing code lists.


= New features from version 3.0 of SDMX =


== Code list extensions ==


The aim of international statistical classifications is to provide a common framework for collecting and organising information about a particular statistical activity domain, concept or variable. Their use, either directly or through national adaptations, facilitates the exchange and comparability of data between countries. These reference data standards have generally been developed through extensive international consultation and have achieved broad acceptance and official agreement for use.


International organizations and member countries should be able to report using international categories, at least at the higher levels of a statistical classification, and should also be able to extend the classifications with additional categories representing the national context.

Where possible, a country should adopt, extend or adapt these international standards.

* a) **Adopt** - If an international organization or country chooses to adopt an international code list then it accepts the international standard as published with no changes to the structure. This also means that the country will accept future changes to the standard as they are published.


* b) **Extend** – If an international organization or country chooses to extend an international code list, it references the existing international code list and adds the additional codes needed to represent the international, regional, national or sub-national context. In most cases the extension would pick up minor version changes of the referenced code list by using the version reference //n//.0+.0.

E.g. Extending the official ISO Country code list with Sark & Kosovo


__ISO Country Code List (CL_ISO3166_2)__

AF Afghanistan

AL Albania

AQ Antarctica

…

YE Yemen

YU Serbia and Montenegro [former]

ZM Zambia

__CA1:CL_AREA(1.0)__

~* inherit CL_ISO3166_2

CQ Sark

XK Kosovo

* c) **Adapt** - If a country or organization adapts an international code list, it develops a derived or related standard. Under this scenario, the country or organization defines its own codes for the code list and creates a crosswalk that maps its codes to the international standard code list.

For the Extend and Adapt scenarios, the agency needs to take ownership of the extended/adapted artefact by assigning an appropriate maintenance agency to it, replacing the original one. The original Id (e.g. CL_ACTIVITY) should be used, however if specific domain variants are required it is recommended to suffix the Id with the domain. E.g. CL_ACTIVITY_SEEA. Note that caution should be used when creating domain variants, as having separate codelists may make interoperability and harmonisation more difficult.


== Discriminated Union of Code Lists ==


Combining code list extension with wildcarded constraints solves the discriminated union of code lists problem, where a classification or breakdown has multiple “variants” which are all valid but mutually exclusive. A common example is the official reference area code list, which combines ISO 3166 categories with UN M49 categories. See [[chapter 6.1 of the SDMX 3.0 technical notes>>url:https://sdmx.org/wp-content/uploads/SDMX_3-0-0_SECTION_6_FINAL-1_0.pdf]] for a full description of how SDMX 3.0 can handle conflicts arising from unions.

E.g. Combining the ISO 3166-2 country code list with the UN-M49 country code list to create the official CL_AREA country code list.

**CL_ISO3166_2**

AF Afghanistan

AL Albania

AQ Antarctica

…

YE Yemen

YU Serbia and Montenegro [former]

ZM Zambia

**CL_UN_M49**

000 Total

001 World

002 Africa

003 North America

004 Afghanistan

…

894 Zambia

896 Areas not elsewhere specified


898 Areas not specified


899 Areas not elsewhere specified and unknown

**CL_AREA**

~* Inherit CL_ISO3166_2

~* Inherit CL_UN_M49

== Additional Code List features ==


* a) **Prefix** – A prefix applied to the codes of a code list in an extension to avoid code/category conflicts. This feature is useful in developing a discriminated union of code lists. An example is ESTAT:CL_PRODUCT(1.3.0), which combines product categories from two different classifications: CPA (Statistical classification of products by activity) and CPC (Central Product Classification).

**ESTAT:CL_PRODUCT(1.3.0)**

~* inherit ESTAT:CL_CPA(1.0+.0) (code prefix CPA_)

~* inherit UNSD:CL_CPC(1.0+.0) (code prefix CPC_)

* b) **Sequence** – The order used for resolving code conflicts when extending a code list. The last code list in the sequence overrides the previous code lists. An example is the combination of different versions of the ISIC classification.

**SDMX:CL_ACTIVITY_ISIC(1.0.0)**

~* inherit SDMX:CL_ACTIVITY_ISIC_2(1.0+.0) (sequence = 1)

~* inherit SDMX:CL_ACTIVITY_ISIC_3_1(1.0+.0) (sequence = 2)

~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0) (sequence = 3)

* c) **InclusiveCodeSelection / ExclusiveCodeSelection** - The subset of codes to be included/excluded when extending a code list.


** a. **selectionValue** - A collection of values based on codes and their children. An example to demonstrate the usefulness of this feature is when creating a code list with only the Manufacturing categories from ISIC.


**SDMX:CL_MANUF(1.0.0)**

~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0)

InclusiveCodeSelection

selectionValue: C, C10, C104, …, C3290

** b. **cascadeValues** - A property to indicate if the child codes of the selected code shall be included in the selection. It is also possible to include children and exclude the code by using the 'excluderoot' value. An example to demonstrate the usefulness of this feature is when creating a code list with only the Manufacturing categories from ISIC.


**SDMX:CL_MANUF(1.0.0)**

~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0)

InclusiveCodeSelection

Value: C, cascadeValues: True

** c. **Value** - The value of the code to include in the selection. It may include the ‘%’ character as a wildcard. An example to demonstrate the usefulness of this feature is when creating a code list with only the Manufacturing categories from ISIC.


**SDMX:CL_MANUF(1.0.0)**

~* inherit SDMX:CL_ACTIVITY_ISIC_4(1.0+.0)

InclusiveCodeSelection

Value: C%
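How prefix, sequence and inclusive selection interact can be illustrated with a small resolver. This is a simplified sketch in Python and not the behaviour of any particular SDMX registry: the function names `matches` and `resolve_extensions`, the dictionary keys (`codes`, `prefix`, `sequence`, `include`) and the in-memory representation are all assumptions made for illustration. The `cascadeValues` property is left out because it requires parent-child information that this flat representation does not carry.

```python
import re


def matches(code_id, value):
    """Match a code id against a selection value where '%' is a wildcard."""
    pattern = ".*".join(re.escape(part) for part in value.split("%"))
    return re.fullmatch(pattern, code_id) is not None


def resolve_extensions(extensions):
    """Merge inherited code lists into one {code_id: name} mapping.

    Each extension is a dict with:
      codes    - {code_id: name} of the inherited code list
      prefix   - optional prefix added to every inherited code id
      sequence - optional order; higher sequences override lower ones
      include  - optional list of selection values (inclusive selection)
    """
    result = {}
    for ext in sorted(extensions, key=lambda e: e.get("sequence", 0)):
        include = ext.get("include")
        for code_id, name in ext["codes"].items():
            if include is not None and not any(matches(code_id, v) for v in include):
                continue  # not part of the inclusive selection
            # A later sequence overwrites an earlier entry with the same id.
            result[ext.get("prefix", "") + code_id] = name
    return result


if __name__ == "__main__":
    isic_3_1 = {"codes": {"C": "Mining and quarrying"}, "sequence": 2}
    isic_4 = {
        "codes": {"C": "Manufacturing", "H": "Transportation and storage"},
        "sequence": 3,
        "include": ["C%"],
    }
    print(resolve_extensions([isic_4, isic_3_1]))
```

In the usage example, code C exists in both inherited lists; because the ISIC revision 4 list has the higher sequence, its label wins, and the selection value C% keeps H out of the result.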

= OTHER PRACTICAL ISSUES AND RECOMMENDATIONS =


== Breakdowns with multiple variants ==


Some SDMX concepts (e.g. Economic activity) may have global, regional and/or national classifications and variants (different versions of the same classification). This means that a DSD may need to reference multiple different representations of the concept of activity, namely a global one, a regional one and a national one.


There are options for dealing with this issue:


* The preferred approach is to create distinct DSDs, each of which has one "Economic activity" concept but with a different code list attached for the required variant. Validation and reporting are simple, and this approach is especially useful when reporters only send data for one variant.
* Alternatively, create a single code list containing the required variants of the classification, using prefixes to distinguish between the variants. For example, in ISIC revision 4, Transportation and storage is ISIC4_H; in NACE revision 2, the same item is NACE2_H.{{footnote}}Ideally, there should be a mapping created between the prefixed variant codes and the original classification codes, and made available in a SDMX public registry. This would avoid redoing the same mapping work by different institutions.{{/footnote}} This approach may be valid when reporters have to send data messages with more than one variant (however, if they can send more than one data message, the preferred approach above should be used).

== Multiple breakdowns in a single concept (COMPOSITE_BREAKDOWN) ==


A way to represent multiple breakdowns in a DSD is to use a COMPOSITE_BREAKDOWN concept. A use case is to reduce the number of dimensions in DSDs by using a single concept to represent breakdowns that do not have to be used together in each series. The mechanism uses a COMPOSITE_BREAKDOWN concept, and a CL_COMP_BREAKDOWN code list.


The code list enumerates several breakdowns. Codes in the CL_COMP_BREAKDOWN code list are prefixed with an abbreviation of the corresponding breakdown (as in the section **Discriminated Union of Code Lists**), and code names consist of the name of the breakdown followed by the name of the code, e.g.


|**Code**|**Name**
|**MOT_IWW**|Mode of Transport: Inland waterway transport
|**MOT_SEA**|Mode of Transport: Maritime
|**_IHR_01**|IHR Capacity: National legislation, policy and financing


Note: the COMPOSITE_BREAKDOWN mechanism has the limitation that only one composite breakdown can be used for a series. All series should therefore be checked to confirm that none of them requires more than one breakdown. If a series requires more than one of the breakdowns described in COMPOSITE_BREAKDOWN, another breakdown dimension needs to be added to the DSD.
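The naming rule and the limitation above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the input layout and the breakdown label "IHR" used in the check are hypothetical, and only the "Mode of Transport" codes are taken from the table above.

```python
def build_composite_code_list(breakdowns):
    """Build a composite code list from separate breakdowns.

    `breakdowns` maps a prefix to a tuple
    (breakdown name, {code: code name}) (hypothetical layout).
    Each composite code is "<prefix>_<code>" and each composite
    name is "<breakdown name>: <code name>".
    """
    composite = {}
    for prefix, (breakdown_name, codes) in breakdowns.items():
        for code, name in codes.items():
            composite[f"{prefix}_{code}"] = f"{breakdown_name}: {name}"
    return composite


def series_fits_composite(required_breakdowns):
    """Return True if a series needs at most one of the breakdowns.

    A single composite dimension holds one code per series, so a
    series that needs two breakdowns at once cannot be represented.
    """
    return len(set(required_breakdowns)) <= 1


breakdowns = {
    "MOT": (
        "Mode of Transport",
        {"IWW": "Inland waterway transport", "SEA": "Maritime"},
    ),
}
composite = build_composite_code_list(breakdowns)
print(composite["MOT_IWW"])

# A series broken down by mode of transport only fits the mechanism;
# one that also needs a second breakdown (label assumed here) does not.
print(series_fits_composite(["MOT"]))
print(series_fits_composite(["MOT", "IHR"]))
```

Running a check of this kind over all planned series before finalising the DSD shows early whether an additional breakdown dimension is needed.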


== Unpredictable and extendable breakdowns (CUSTOM_BREAKDOWN) ==


The custom breakdown (CUST_BREAKDOWN) dimension supports volatile breakdowns, i.e. those whose code lists change often or that lack international or even national classifications. It can also be used where a reporting framework is designed to be extendable by the reporter, e.g. for new tax codes.


The CUST_BREAKDOWN dimension can be used in combination with the custom breakdown label attribute. The generic code list CL_CUST_BREAKDOWN (C01, C02, C03, …, C99) is used to transmit the codes, while the attribute CUST_BREAKDOWN_LB is used to transmit the descriptions of the codes directly in the dataset. CUST_BREAKDOWN_LB should start with a description of the breakdown, followed by a description of the entry. For example, for a disaggregation by source of light, the values of the CUST_BREAKDOWN dimension and the CUST_BREAKDOWN_LB attribute in the dataset would look as follows:


|**CUST_BREAKDOWN code**|**CUST_BREAKDOWN_LB**
|**C01**|Source of light: Solar
|**C02**|Source of light: Publicly-provided electricity
|**C03**|Source of light: Privately-generated electricity
|**C04**|Source of light: Other C05 Source of light: None
|**C05**|Source of light: Kerosene lamp
|**C06**|Source of light: Candle
|**C07**|Source of light: Battery


Implementing a breakdown through the CUST_BREAKDOWN dimension does not result in any change to the DSD. The CL_CUST_BREAKDOWN code list is generic and stays intact, while the CUST_BREAKDOWN_LB attribute is set at the time the data is converted to SDMX.


Note that an uncoded dimension could be used instead of CL_CUST_BREAKDOWN. The trade-off is that the number of custom items is then unlimited, but the item IDs cannot be validated or controlled.
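The trade-off can be illustrated with a short Python sketch. It assumes that the generic code list runs from C01 to C99 as described above; the function name ##is_valid_custom_code## is hypothetical.

```python
# Generic code list C01 ... C99, as described in the text above.
CL_CUST_BREAKDOWN = [f"C{i:02d}" for i in range(1, 100)]
_VALID_CODES = set(CL_CUST_BREAKDOWN)


def is_valid_custom_code(value):
    """Check a reported value against the coded representation.

    With an uncoded dimension there is no list to check against,
    so no equivalent validation of the item ID is possible.
    """
    return value in _VALID_CODES


print(len(CL_CUST_BREAKDOWN))
print(is_valid_custom_code("C07"))
print(is_valid_custom_code("C100"))
```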


Please use this mechanism with care, as time series using a custom breakdown may not be consistent. The custom codes (C01, C02, C03, …, C99) might mean something different in one year than in the next year or between different reference areas in the same year. The usage should be clearly described in the respective data exchange agreement. As much as possible, the coding should be applied in such a way that time series integrity is maintained.
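A simple consistency check of this kind can be sketched in Python. This is a minimal illustration, not a prescribed validation rule: the function name, the observation layout and the sample values are assumptions, and REF_AREA and TIME_PERIOD are used here only as typical concept identifiers for the reference area and the time period.

```python
from collections import defaultdict


def find_inconsistent_custom_codes(observations):
    """Find custom codes that carry more than one label.

    `observations` is an iterable of dicts with at least the keys
    "CUST_BREAKDOWN" and "CUST_BREAKDOWN_LB" (hypothetical layout).
    Returns a dict mapping each offending code to the sorted list
    of distinct labels found for it.
    """
    labels_by_code = defaultdict(set)
    for obs in observations:
        labels_by_code[obs["CUST_BREAKDOWN"]].add(obs["CUST_BREAKDOWN_LB"])
    return {
        code: sorted(labels)
        for code, labels in labels_by_code.items()
        if len(labels) > 1
    }


# Invented sample data: C02 changes meaning between 2020 and 2021.
observations = [
    {"REF_AREA": "XX", "TIME_PERIOD": "2020",
     "CUST_BREAKDOWN": "C01", "CUST_BREAKDOWN_LB": "Source of light: Solar"},
    {"REF_AREA": "XX", "TIME_PERIOD": "2021",
     "CUST_BREAKDOWN": "C01", "CUST_BREAKDOWN_LB": "Source of light: Solar"},
    {"REF_AREA": "XX", "TIME_PERIOD": "2020",
     "CUST_BREAKDOWN": "C02",
     "CUST_BREAKDOWN_LB": "Source of light: Publicly-provided electricity"},
    {"REF_AREA": "XX", "TIME_PERIOD": "2021",
     "CUST_BREAKDOWN": "C02", "CUST_BREAKDOWN_LB": "Source of light: Candle"},
]
inconsistent = find_inconsistent_custom_codes(observations)
print(inconsistent)
```

A code flagged by such a check is not necessarily an error, but it indicates a break in the time series that should be covered by the data exchange agreement.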


While the use of CUSTOM_BREAKDOWN may seem to reduce the maintenance burden and make it possible to use the DSD to support national dissemination, it has the following drawbacks:


* It increases the complexity of the DSD: the mechanism is harder for users to understand than DSDs with fixed concepts and code lists;
* If a CL_CUST_BREAKDOWN code list is used (instead of an uncoded dimension), as many codes as are reasonably needed should be created, which may result in a long list;
* It is difficult to fully validate the series, so the use of CUSTOM_BREAKDOWN places an extra burden on the collector and may reduce data quality.
