Possible Ways of Implementing the Observation Status Concept

Last modified by Artur K. on 2026/05/29 14:28

Contents

Document History
1) Introduction
2) One flag only per value
3) Multiple flagging
4) Conclusion
5) Synthetic overview of solutions proposed and suggested recommendations

Document History

Version	Date	Comment
1.0	1/10/2014	Initial version
2.0	1/5/2019	Guideline title change (more accurate); add new OBS_STATUS codes to hierarchy; changed 3.1 approach to include OBS_STATUS code (this compatibility break is the major version change); text clarification; include a document history

1) Introduction

First of all, it is important to note that the "Observation status" code list (CL_OBS_STATUS¹) is heterogeneous: it mixes concepts that are not always mutually exclusive (e.g. a missing value can generate a break in the time series, and an estimated value can be of low reliability). Ideally, to cope with the issue of allocating more than one flag to one statistical value, this code list would be broken down into several sub-code lists corresponding to the various concepts covered. This was not done because it was felt that it would unnecessarily increase the number of (very short) code lists for little benefit in terms of technical and conceptual orthodoxy.

However, given the central importance of this code list, it is essential to present implementers with all the possible ways of implementing it, so that they can decide, based on their specific implementation needs, which option best suits their requirements. These options are presented in the sections below, together with their pros and cons.

If implementers are satisfied with one flag per observation value, they are invited to apply the recommended hierarchy proposed under "2) One flag only per value". In the case of multiple flagging, although the three options described below are all applicable in theory, they should certainly not be considered equivalent. Option 3.1 "Duplication approach" is the recommended general solution, as it preserves backwards compatibility with SDMX-EDI. Where implementers do not think the recommended general solution can be applied, or is appropriate in their particular context, or where SDMX-EDI is not supported in the exchange, the alternative solution 3.2 "Decomposition approach" is also recommended. The third option, 3.3 "Extended single code list approach", is documented here for the sake of completeness but is strongly discouraged.

The SDMX standard allows for the use of zero or more observation level attributes, using any identifiers. However, SDMX-EDI imposes the mandatory use of the observation level attribute called OBS_STATUS. In the past, SDMX-EDI has limited itself, for practical reasons, to the use of the observation level attributes OBS_STATUS, CONF_STATUS, PRE_BREAK_VALUE and COMMENT_OBS, but SDMX-EDI can handle any number of observation level attributes, as long as OBS_STATUS is included.

2) One flag only per value

If implementers want to use only one flag per value, they should use the hierarchy below to determine which code to use. This approach (choosing only one event, namely the most important one) offers a good compromise between simplicity for the user, completeness of the information provided and ease of presentation in the user interface. Its main drawback is the loss of information when several flags apply to a given value but only one can be reported.

Example: From a given period onwards, value x is compiled using a methodology that diverges from the previous one (e.g. following an alignment with international standards), which generates a break in the time series. However, the value for this period is suppressed, e.g. for confidentiality reasons. Two flags would therefore apply: B (time series break) and Q (missing value; suppressed). If only one flag is to be indicated, the hierarchy below determines which one to use; in this case it is B, since B has precedence over Q in the hierarchy.

Observation status hierarchy	Relevant in conjunction with numeric values	Relevant in conjunction with missing values
B / time series break (highest importance)	Yes	Yes
O / missing value		Yes
M / missing value; data cannot exist		Yes
L / missing value; data exist but were not collected		Yes
H / missing value; holiday or weekend		Yes
Q / missing value; suppressed		Yes
J / derogation	Yes	Yes
S / strike and other special events	Yes	Yes
D / definition differs	Yes
K / data included in another category		Yes
W / includes data from another category	Yes
I / imputed value	Yes
F / forecast value	Yes
E / estimated value	Yes
P / provisional value	Yes
N / not significant	Yes
U / low reliability	Yes
V / unvalidated value	Yes
G / experimental value	Yes
A / normal value	Yes
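Choosing the single flag amounts to taking, among the applicable flags, the one that appears highest in the table. A minimal Python sketch, assuming flags are handled as plain one-letter code strings; the function name `select_single_flag` is hypothetical and not part of any SDMX library:

```python
# Observation status hierarchy from section 2, highest importance first.
OBS_STATUS_HIERARCHY = [
    "B", "O", "M", "L", "H", "Q", "J", "S", "D", "K",
    "W", "I", "F", "E", "P", "N", "U", "V", "G", "A",
]

# Position of each code in the hierarchy: a lower rank means higher importance.
RANK = {code: rank for rank, code in enumerate(OBS_STATUS_HIERARCHY)}


def select_single_flag(flags):
    """Return the one flag to report when several flags apply to a value."""
    flags = set(flags)
    if not flags:
        raise ValueError("at least one flag is required")
    unknown = flags - set(RANK)
    if unknown:
        raise ValueError(f"unknown observation status code(s): {sorted(unknown)}")
    # The flag with the lowest rank (highest importance) wins.
    return min(flags, key=RANK.__getitem__)


# Example from the text: B (time series break) takes precedence over Q (suppressed).
print(select_single_flag({"B", "Q"}))  # B
```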

3) Multiple flagging

However, there may be cases where implementers want to attach multiple flags to one statistical value. Three solutions to this situation have been analysed, based on:

a duplication approach;
a decomposition approach;
an extended single code list approach.

Technically, all three approaches are possible. However, considering the severe limitations that the third approach would entail, only the first two approaches are recommended for use, and precedence is given to one of them in order to improve harmonisation across implementations and to preserve backwards compatibility.

3.1) Duplication approach (recommended solution)

In this case, the OBS_STATUS concept is duplicated as many times as needed. The duplicated concepts should be named "OBS_STATUS", "OBS_STATUS_1", "OBS_STATUS_2", "OBS_STATUS_3", etc. All of them have to be included in the DSD and linked to the CL_OBS_STATUS code list, and each of them takes only one code per observation. To keep backwards compatibility with systems that process only a single observation status flag, and to keep DSDs with multiple flags compatible with DSDs using a single flag, it is strongly recommended to order the flags in a data message according to the observation status hierarchy table above. For example, if the flags are G (experimental), V (unvalidated) and D (definition differs), the order according to the hierarchy is: OBS_STATUS = D, OBS_STATUS_1 = V, OBS_STATUS_2 = G. A system that parses only a single flag can then still rely on the OBS_STATUS concept identifier to obtain the flag with the highest priority.
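The ordering rule can be sketched as follows, assuming the flags of one observation arrive as a set of one-letter codes and the attribute values are represented as a plain Python dictionary (the function name `to_duplicated_attributes` is hypothetical; a real implementation would write these values into an SDMX data message):

```python
# Observation status hierarchy from section 2, highest importance first.
OBS_STATUS_HIERARCHY = "B O M L H Q J S D K W I F E P N U V G A".split()
RANK = {code: rank for rank, code in enumerate(OBS_STATUS_HIERARCHY)}


def to_duplicated_attributes(flags):
    """Spread the flags of one observation over OBS_STATUS, OBS_STATUS_1, ...

    Flags are ordered by the hierarchy so that OBS_STATUS always carries the
    flag with the highest priority. Unknown codes raise a KeyError.
    """
    ordered = sorted(set(flags), key=lambda code: RANK[code])
    attributes = {}
    for position, code in enumerate(ordered):
        attr_id = "OBS_STATUS" if position == 0 else f"OBS_STATUS_{position}"
        attributes[attr_id] = code
    return attributes


# Example from the text: G (experimental), V (unvalidated), D (definition differs).
attributes = to_duplicated_attributes({"G", "V", "D"})
print(attributes)  # {'OBS_STATUS': 'D', 'OBS_STATUS_1': 'V', 'OBS_STATUS_2': 'G'}

# A system that only understands one flag reads OBS_STATUS and ignores the rest.
print(attributes["OBS_STATUS"])  # D
```

Because the first attribute is always named OBS_STATUS, a message with a single flag and a message with several flags look the same to a consumer that only reads OBS_STATUS.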

The main advantages of this solution are its simplicity and the fact that it does not require listing the possible combinations.

Drawbacks are the multiplication of the same concept and the absence of implicit checks, which makes it possible to enter aberrant combinations of codes (e.g. missing value and imputed value). It is therefore recommended to run validation checks before processing the data, to ensure that the combinations contained in a data message make semantic sense.
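One mechanical check can be derived from the "Relevant in conjunction with..." columns of the hierarchy table in section 2: a code relevant only to numeric values should not be combined with a code relevant only to missing values. The sketch below assumes that rule, plus checks for repeated and unknown codes; it is not exhaustive (for instance it does not reject normal value combined with low reliability), and any further rules would have to be defined per implementation. The function name `check_flag_combination` is hypothetical.

```python
# Derived from the "numeric values" / "missing values" columns of the
# observation status hierarchy table in section 2.
NUMERIC_ONLY = {"D", "W", "I", "F", "E", "P", "N", "U", "V", "G", "A"}
MISSING_ONLY = {"O", "M", "L", "H", "Q", "K"}
EITHER = {"B", "J", "S"}
KNOWN = NUMERIC_ONLY | MISSING_ONLY | EITHER


def check_flag_combination(flags):
    """Return the problems found in the flags attached to one observation.

    `flags` holds the values of OBS_STATUS, OBS_STATUS_1, ... for that
    observation; an empty result means no problem was detected.
    """
    problems = []
    unique = set(flags)
    if len(unique) != len(flags):
        problems.append("the same code appears more than once")
    unknown = sorted(unique - KNOWN)
    if unknown:
        problems.append(f"unknown code(s): {unknown}")
    numeric = sorted(unique & NUMERIC_ONLY)
    missing = sorted(unique & MISSING_ONLY)
    if numeric and missing:
        problems.append(
            f"numeric-value code(s) {numeric} combined with missing-value code(s) {missing}"
        )
    return problems


print(check_flag_combination(["D", "V", "G"]))  # []
print(check_flag_combination(["O", "I"]))  # reports missing value (O) with imputed value (I)
```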

This approach is the recommended general solution for implementations where multiple flagging is required.

3.2) Decomposition approach (recommended solution if exchange does not support SDMX-EDI)

Here, the CL_OBS_STATUS code list is broken down into its basic components, distinguished on the basis of the different concepts involved and whether they are mutually exclusive. The "building blocks" of the CL_OBS_STATUS code list as it currently stands could be represented as separate concepts as follows:

Concept OBS_STATUS (Observation status) → code list CL_OBS_MAIN (A, E, G, H, I, J, K, W, M, O, L, Q, S)²: these codes can be grouped in one single code list because they are mutually exclusive: a normal value cannot also be estimated, imputed or missing; an estimated value cannot also be normal, imputed or missing; an imputed value cannot also be normal, estimated or missing; etc.

For the other status codes, a single Boolean code list³ can be created and reused by one concept per flag, to enable or disable that flag:

Concept OBS_BREAK → code list CL_BOOLEAN, with code Y corresponding to flag B (Time series break);
Concept OBS_DEF_DIFFERS → code list CL_BOOLEAN, with code Y corresponding to flag D (Definition differs);
Concept OBS_FORECAST → code list CL_BOOLEAN, with code Y corresponding to flag F (Forecast value);
Concept OBS_PROV → code list CL_BOOLEAN, with code Y corresponding to flag P (Provisional value);
Concept OBS_NOTSIGNIFICANT → code list CL_BOOLEAN, with code Y corresponding to flag N (Not significant);
Concept OBS_UNVALIDATED → code list CL_BOOLEAN, with code Y corresponding to flag V (Unvalidated value);
Concept OBS_LOWRELIABILITY → code list CL_BOOLEAN, with code Y corresponding to flag U (Low reliability).

If additional flags are needed, more concepts can be defined accordingly. All these concepts have to be inserted in the DSD and linked to CL_BOOLEAN.
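The translation from CL_OBS_STATUS flags to these decomposed concepts can be sketched as below. It assumes attribute values are collected in a plain dictionary, that every Boolean concept is reported explicitly (N when the flag is not set; an implementation could instead omit it), and that OBS_STATUS is simply left out when no CL_OBS_MAIN code applies. The function name `decompose` is hypothetical.

```python
# CL_OBS_MAIN: mutually exclusive codes carried by the OBS_STATUS concept.
MAIN_CODES = {"A", "E", "G", "H", "I", "J", "K", "W", "M", "O", "L", "Q", "S"}

# Each remaining code becomes its own concept coded with CL_BOOLEAN (Y/N).
BOOLEAN_CONCEPTS = {
    "B": "OBS_BREAK",
    "D": "OBS_DEF_DIFFERS",
    "F": "OBS_FORECAST",
    "P": "OBS_PROV",
    "N": "OBS_NOTSIGNIFICANT",
    "V": "OBS_UNVALIDATED",
    "U": "OBS_LOWRELIABILITY",
}


def decompose(flags):
    """Translate CL_OBS_STATUS flags into the decomposed concepts of section 3.2."""
    flags = set(flags)
    unknown = flags - MAIN_CODES - set(BOOLEAN_CONCEPTS)
    if unknown:
        raise ValueError(f"unknown observation status code(s): {sorted(unknown)}")
    main = sorted(flags & MAIN_CODES)
    if len(main) > 1:
        raise ValueError(f"CL_OBS_MAIN codes are mutually exclusive: {main}")
    attributes = {}
    if main:
        attributes["OBS_STATUS"] = main[0]
    # Assumption: every Boolean concept is reported explicitly, N when not set.
    for code, concept in BOOLEAN_CONCEPTS.items():
        attributes[concept] = "Y" if code in flags else "N"
    return attributes


# Example: G (experimental), V (unvalidated), D (definition differs).
print(decompose({"G", "V", "D"}))
```

The mutual exclusivity of CL_OBS_MAIN is enforced here only by the explicit check; in an SDMX message it follows from OBS_STATUS taking a single code.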

The main advantage of this proposal is its full compliance with the technical standards and with the content-oriented guidelines, which insist on separating concepts that differ in content. Drawbacks are the multiplication of (very) small code lists and the absence of implicit checks, which makes it possible to enter aberrant combinations of codes (e.g. normal value and low reliability). Furthermore, any new code will require reconsidering the content of the various sub-code lists.

Although it is not the preferred solution, this approach can be implemented where the general solution cannot be applied, or is not appropriate, in a particular context.

Comments on the choice of the recommended solution

Both the "Decomposition" and "Duplication" options provide acceptable workarounds to the problem of multiple flagging, and they appear to be quite similar in practice. The trade-off in this context was between orthodoxy and ease of implementation.
Conceptually, the "Decomposition" approach is the stronger of the two, as it not only separates concepts but also arranges codes into more homogeneous code lists. It also requires implementers to define pure concepts and name them accordingly.
This document recommends the "Duplication" approach mainly on the practical grounds of ease of implementation, because its use of "OBS_STATUS" is compatible with the cross-domain concept scheme, whereas the Decomposition approach is not. The recommendation could be reconsidered in the future, should the technical standard better accommodate the decomposition approach.

3.3) Extended single code list approach (strongly discouraged)

The extended version of CL_OBS_STATUS (see below) provides the full list of logically possible combinations of codes in a specific SDMX implementation.

An advantage of this solution would be that only meaningful combinations of flags are included in the list. Users would not be able to choose combinations which would not make sense (such as "missing" and "estimated").

However, there are several drawbacks related to the technical implementation of this solution:

relative complexity for users in finding the right combination of flags
maintenance burden whenever the code list is revised
complexity of presentation on the user interface side
very complex SDMX query messages needed to query data according to flags
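To give an idea of the size such a list can reach, the sketch below counts the non-empty combinations under one assumption: combinations follow the grouping of section 3.2, i.e. at most one CL_OBS_MAIN code plus any subset of the seven flags modelled there as Boolean concepts. This is a rough upper bound before aberrant combinations are removed, not the size of any actual code list.

```python
from itertools import combinations

# Grouping taken from section 3.2: at most one code from CL_OBS_MAIN per value,
# plus any subset of the seven flags that section 3.2 turns into Boolean concepts.
MAIN_CODES = "AEGHIJKWMOLQS"  # 13 codes
BOOLEAN_FLAGS = "BDFPNVU"  # 7 codes


def count_combinations():
    """Count non-empty flag combinations an extended code list would have to list."""
    count = 0
    for main in [""] + list(MAIN_CODES):  # "" means no CL_OBS_MAIN code
        for size in range(len(BOOLEAN_FLAGS) + 1):
            for subset in combinations(BOOLEAN_FLAGS, size):
                if main or subset:  # skip the empty combination
                    count += 1
    return count


print(count_combinations())  # 1791, i.e. (13 + 1) * 2**7 - 1
```

Even after pruning meaningless combinations, a list of this order of magnitude illustrates the maintenance and querying burden listed above.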

This approach is therefore not recommended. For completeness, the table below shows a possible implementation of it:

Code	Description
A	Normal
B	Time series break
BD	Time series break, Definition differs
BDE	Time series break, Definition differs, Estimated value
etc.
D	Definition differs
DE	Definition differs, Estimated value
DEP	Definition differs, Estimated value, Provisional value
etc.
E	Estimated value
EP	Estimated value, Provisional value
etc.

If further combinations are needed, they can be created on an ad hoc basis by selecting the necessary codes from the basic code list and sorting them alphabetically. Conversely, implementers may wish to reduce the list of possible options if not all of the options above are necessary for their specific needs.
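The ad hoc construction rule (select codes, sort alphabetically) can be sketched as follows. The descriptions are the code descriptions used in this document, capitalised as in the table above; the function name `combined_code` is hypothetical. Note that the sketch does not judge whether a combination is meaningful, which is precisely what a curated extended list is supposed to do.

```python
# Descriptions of the basic CL_OBS_STATUS codes, as used in this document.
DESCRIPTIONS = {
    "A": "Normal", "B": "Time series break", "D": "Definition differs",
    "E": "Estimated value", "F": "Forecast value", "G": "Experimental value",
    "H": "Missing value; holiday or weekend", "I": "Imputed value",
    "J": "Derogation", "K": "Data included in another category",
    "L": "Missing value; data exist but were not collected",
    "M": "Missing value; data cannot exist", "N": "Not significant",
    "O": "Missing value", "P": "Provisional value",
    "Q": "Missing value; suppressed", "S": "Strike and other special events",
    "U": "Low reliability", "V": "Unvalidated value",
    "W": "Includes data from another category",
}


def combined_code(flags):
    """Build an extended code and its description by sorting the codes alphabetically."""
    ordered = sorted(set(flags))
    unknown = [code for code in ordered if code not in DESCRIPTIONS]
    if unknown:
        raise ValueError(f"unknown observation status code(s): {unknown}")
    return "".join(ordered), ", ".join(DESCRIPTIONS[code] for code in ordered)


print(combined_code({"E", "P", "D"}))
# ('DEP', 'Definition differs, Estimated value, Provisional value')
```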

Considering the severe limitations implied by the third approach, only the first two approaches are recommended for use.

4) Conclusion

From the analysis of the approaches presented above, it is clear that the extended single code list approach cannot be recommended for use.

Although the two remaining approaches, i.e. the duplication approach and the decomposition approach, may both qualify for recommendation, it is preferable to give precedence to one of them in order to improve harmonisation across implementations. Considering its relative simplicity in terms of maintenance, the recommended option is the one based on the "duplication approach".

5) Synthetic overview of solutions proposed and suggested recommendations

[Overview table not reproduced. Its colour key distinguished: recommended solution (3.1); recommended solution if exchange does not support SDMX-EDI (3.2); strongly discouraged (3.3).]

^ https://sdmx.org/?page_id=3215
^ Other groupings are conceivable; for example, all codes related to missing values could be moved to a separate concept and code list, OBS_MISSING and CL_OBS_MISSING.
^ CL_BOOLEAN with 2 codes: Y (Yes), N (No)