Last modified by Artur K. on 2026/05/29 14:28

{{box title="**Contents**"}}
{{toc/}}
{{/box}}

= R1. Data completeness - rate =

|**Name:**|**R1. Data completeness - rate**
|Definition:|The ratio of the number of data cells provided (entities to be specified by the Eurostat domain manager) to the number of data cells required by Eurostat or considered relevant. The ratio is computed for a chosen dataset and a given period.
|Applicability:|(((
The rate of available data is applicable:

* to all statistical processes (including use of administrative sources);
* to users and producers, with different focus and calculation formulae.

The indicator is computed only by Eurostat, but it is also recommended for inclusion in national quality reports.
)))
|(((
Calculation formulae:
)))|(((
**For a specific key variable:
For producers:**

[[image:1768514198039-473.png]]

//D^^rqd^^// in the denominator is the set of required data cells (i.e. excluding derogations/confidentiality) and //A,,D,,^^rqd^^// in the numerator is the corresponding subset of __available/provided__ data cells. The notation #//D// means the number of elements in the set //D// (its cardinality).

**For users:**

[[image:1768514265308-305.png]]

//D^^rel^^// in the denominator is the set of relevant data cells (full coverage, i.e. excluding only those entities for which the data would not be relevant, e.g. fishing fleet in Hungary) and //A,,D,,^^rel^^// in the numerator is the corresponding subset of available/provided data cells. The notation #//D// means the number of elements in the set //D// (its cardinality).

The main difference between the two formulas lies in the choice of the set of data cells in the denominator.

For **producers** (first formula), this set comprises the required data cells excluding derogations/confidentiality, since producers are interested in assessing the level of compliance with the requirements.

For **users** (second formula), the rate relates the provided data cells to those that are theoretically relevant. Cells missing because of derogations, confidentiality or any other reason therefore stay in the denominator; only cells for which data would not be relevant (e.g. fishing fleet in Hungary) are left out.
)))
|Target value:|The target value for this indicator is 1, meaning that 100% of the required or relevant data cells are available.
|Aggregation levels and principles:|The calculation is done at subject-matter domain level, for a meaningful selection made by the domain manager. Aggregations are recommended at EU level for the user-oriented indicator.(((
The number of data cells provided and the number of data cells required/relevant are aggregated separately, and the ratio is then computed from the two totals.
)))
|(% style="width:204px" %)Interpretation:|(% style="width:1526px" %)(((
The indicator shows to what extent statistics are available compared with what should be available.

**For producers:**

It can be used to evaluate the degree of compliance of a given Member State for a given dataset and period, to be specified by the domain manager.

**For users:**

At EU level, it can be used to:

* identify whether important variables are missing for some individual Member State, or alternatively
* give users an overall measure (aggregated across countries and/or key variables) of the availability of statistics.
)))
|(% style="width:204px" %)Specific guidance:|(% style="width:1526px" %)(((
The indicator should be accompanied by information on which variables are missing and the reasons for incompleteness, as well as, where relevant, the impact of the missing data on the EU aggregate and plans for improving completeness in the future.

Calculation would require intervention by the Eurostat domain manager at the initial stage (to define the key variables and the period to be monitored). Later on, the indicators should be calculated automatically.

Both formulas are to be computed per key variable; nevertheless, an aggregate over all variables can also be calculated.

**For producers:**

This indicator forms part of Eurostat compliance monitoring; thus, for producers, it should be computed per Member State.

**For users:**

If certain relevant variables are not reported, the statistics are incomplete. This can be due to data not being collected, or to data being of low quality or confidential. For users, an aggregate across countries for all the key variables could suffice.
)))
|(% style="width:204px" %)References:|(% style="width:1526px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* ISO/IEC FDIS 11179-1 "Information technology – Metadata registries – Part 1: Framework", March 2004 (according to the SDMX Metadata Common Vocabulary draft, February 2008).
)))

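The two R1 variants and the aggregation principle can be sketched in Python as follows. This is a minimal illustration, assuming that each data cell can be identified by a hashable key; the `(country, variable, period)` tuples and the function names are hypothetical and not part of any Eurostat tool. Passing the required cells gives the producer version, passing the relevant cells gives the user version, and the aggregate sums numerators and denominators separately before dividing.

```python
from typing import Hashable, Iterable, Set, Tuple

# Hypothetical cell key, e.g. (country, variable, period); any hashable identifier works.
Cell = Tuple[Hashable, ...]


def completeness_rate(provided: Iterable[Cell], denominator_cells: Set[Cell]) -> float:
    """R1 = #A_D / #D for one key variable and period.

    Pass the required cells (D^rqd) for the producer version,
    or the relevant cells (D^rel) for the user version.
    """
    if not denominator_cells:
        raise ValueError("the denominator set D must not be empty")
    available = set(provided) & denominator_cells  # A_D: provided cells that belong to D
    return len(available) / len(denominator_cells)


def aggregate_completeness(groups: Iterable[Tuple[Iterable[Cell], Set[Cell]]]) -> float:
    """Aggregate R1: sum numerators and denominators separately, then divide once."""
    numerator = denominator = 0
    for provided, denominator_cells in groups:
        numerator += len(set(provided) & denominator_cells)
        denominator += len(denominator_cells)
    if denominator == 0:
        raise ValueError("no denominator cells to aggregate")
    return numerator / denominator


# Illustrative data only.
required_at = {("AT", v, 2024) for v in ("V1", "V2", "V3", "V4")}
provided_at = {("AT", v, 2024) for v in ("V1", "V2", "V3")}
# For users, a cell covered by a derogation (V5) still counts as relevant.
relevant_at = required_at | {("AT", "V5", 2024)}

required_be = {("BE", "V1", 2024)}
provided_be = {("BE", "V1", 2024)}

print(completeness_rate(provided_at, required_at))  # producer view: 3/4
print(completeness_rate(provided_at, relevant_at))  # user view: 3/5
print(aggregate_completeness([(provided_at, required_at), (provided_be, required_be)]))  # 4/5
```

Because the aggregate pools the cell counts, the result (0.8) differs from the simple mean of the two country rates (0.875): a country with few required cells carries correspondingly little weight.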
= A1. Sampling error - indicators =

|(% style="width:204px" %)**Name:**|(% style="width:1526px" %)**A1. Sampling error - indicators**
|(% style="width:204px" %)(((
Definition:
)))|(% style="width:1526px" %)(((
The sampling error can be expressed:

* (a) in relative terms, in which case the relative standard error or, synonymously, the coefficient of variation (CV) is used. (The standard error of the estimator [[image:1768514600810-952.png]] is the square root of its variance [[image:1768514579421-571.png]].) The estimated relative standard error (the estimated CV) is the estimated standard error of the estimator divided by the estimated value of the parameter; see the calculation formulae below.
* (b) in terms of confidence intervals, i.e. an interval that includes, with a given level of confidence, the true value of a parameter [[image:1768514627669-968.png]]. The width of the interval is related to the standard error.

The estimator should take into account the sampling design and should also integrate the effect on precision of adjustments for non-response, corrections for misclassification, use of auxiliary information through calibration methods, etc.
)))
|(% style="width:204px" %)Applicability:|(% style="width:1526px" %)(((
Sampling error indicators are applicable:

* to statistical processes based on probability samples or other sampling procedures allowing computation of such information;
* to users and producers, with different levels of detail.
)))
|(% style="width:204px" %)(((
Calculation formulae:
)))|(% style="width:1526px" %)(((
**Coefficient of variation:**

[[image:1768514655382-302.png]]

Remark: The subscript "e" stands for estimate.

**Confidence interval, symmetric:**

[[image:1768514683778-619.png]]

The length of the interval, which is 2∙d, depends on the confidence level (e.g. 95%), the assumptions concerning the distribution of the estimator of the parameter, and the sampling error. In many cases d has the form below, where t depends on the distribution and the confidence level.

[[image:1768514716641-941.png]]

For totals, means and ratios, formulas for aggregating coefficients of variation at EU level can be found in the third reference below.

The calculation formulae depend on the sampling design, the estimator, and the method chosen for estimating the variance [[image:1768514734614-397.png]].
)))
|(% style="width:204px" %)Target value:|(% style="width:1526px" %)(((
The smaller the CV, the standard error and the width of the confidence interval, the more precise the estimator. Survey regulations may include specifications for precision thresholds at different population levels.
)))
|(% style="width:204px" %)Aggregation levels and principles:|(% style="width:1526px" %)The calculation is done for all statistics based on probability sample surveys or equivalent. Aggregations are possible at Member State and EU levels, depending on the estimators and the degree of harmonisation.(((
The principle for computing the coefficient of variation of an aggregate depends on the method used to aggregate the estimator of that variable.
)))
|(% style="width:204px" %)(((
Interpretation:
)))|(% style="width:1526px" %)(((
The CV is a relative (dimensionless) measure of the precision of a statistical estimator, often expressed as a percentage. Because it eliminates measurement units from precision measures, one of its roles is to make it possible to compare the precision of estimates of different indicators.

However, this property adds no value in the case of proportions, which are dimensionless by definition.
)))
|(% style="width:204px" %)(((
Specific guidance:
)))|(% style="width:1526px" %)(((
Several precision measures can be used to estimate the random variation of an estimator due to sampling, such as coefficients of variation, standard errors and confidence intervals.

The coefficient of variation is suitable for quantitative variables with large positive values. It is not robust for percentages or changes, and it cannot be used for estimates with negative values; in these cases absolute measures of precision (standard errors or confidence intervals) may be used instead.

The confidence interval is usually the precision measure preferred by data users, as it is the clearest way of understanding and interpreting sampling variability.

Provision of confidence intervals is voluntary.

The CV has the advantage of being dimensionless. The standard error or a confidence interval is sometimes preferable, as discussed above.
)))
|(% style="width:204px" %)References:|(% style="width:1526px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* Variance estimation methods in the European Union, Monographs of Official Statistics, 2002 edition.
)))

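The following Python sketch shows how the estimated CV and a symmetric confidence interval follow from an estimate and its estimated variance. It assumes the variance has already been estimated by a method appropriate to the sampling design (that step is not covered here), and it assumes an approximately normal estimator, so that t is taken from the standard normal distribution; another distributional assumption would require a different t. The function names are illustrative only.

```python
import math
from statistics import NormalDist


def coefficient_of_variation(estimate: float, variance: float) -> float:
    """Estimated CV: estimated standard error divided by the estimate."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if estimate <= 0:
        # The CV is not meaningful for zero or negative estimates.
        raise ValueError("the CV requires a positive estimate")
    return math.sqrt(variance) / estimate


def symmetric_confidence_interval(estimate: float, variance: float, level: float = 0.95):
    """Return (estimate - d, estimate + d) with d = t * standard error.

    Assumption: t is the standard normal quantile for the chosen confidence level.
    """
    if not 0 < level < 1:
        raise ValueError("confidence level must be in (0, 1)")
    if variance < 0:
        raise ValueError("variance must be non-negative")
    t = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    d = t * math.sqrt(variance)
    return estimate - d, estimate + d


# Illustrative values: estimated total 200 with estimated variance 100 (standard error 10).
print(coefficient_of_variation(200.0, 100.0))  # 0.05, i.e. 5%
low, high = symmetric_confidence_interval(200.0, 100.0)
print(round(low, 2), round(high, 2))  # 180.4 219.6
```

The interval length here is 2∙d, roughly 39.2, and it grows with both the standard error and the chosen confidence level.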
= A2. Over-coverage - rate =

|(% style="width:200px" %)(((
**Name:**
)))|(% style="width:1530px" %)(((
**A2. Over-coverage - rate**
)))
|(% style="width:200px" %)(((
Definition:
)))|(% style="width:1530px" %)(((
The rate of over-coverage is the proportion of units accessible via the frame that do not belong to the target population (i.e. are out of scope).

The //target population// is the population for which inferences are made. The //frame// (or frames) is a device that permits access to population units. The //frame population// is the set of population units that can be accessed through the frame. The concept of a frame is traditionally used for sample surveys, but it applies equally to several other statistical processes, e.g. censuses, processes using administrative sources, and processes involving multiple data sources. Coverage deficiencies may be due to delays in reporting (typical for business statistics) and to errors in unit identification, classification, coding, etc. This is also the case when administrative data are used.

The rate may be calculated either un-weighted or weighted; the weighted version refers to the overall level (frame/population rather than sample). Units of unknown eligibility pose an inherent difficulty; see below.
)))
|(% style="width:200px" %)Applicability:|(% style="width:1530px" %)(((
The rate of over-coverage is applicable:

* to all statistical processes (including use of administrative sources);
* to producers.

If the survey has more than one unit type, a rate may be calculated for each type.

If there is more than one frame, or if over-coverage rates vary strongly between sub-populations, separate rates should be calculated.
)))
|(% style="width:200px" %)Calculation formulae:|(% style="width:1530px" %)(((
The over-coverage rate has three main versions, all covered by one and the same formula, the weighted over-coverage rate //OCr,,w,,//:

[[image:1768514791781-241.png]]

O set of out-of-scope units (over-coverage, resolved and not belonging to the target population)
E set of in-scope units (resolved units belonging to the target population; eligible units)
Q set of units of unknown eligibility
//w,,j,,// weight of unit //j//, described below
α the estimated proportion of cases of unknown eligibility that are actually eligible. It should be set equal to 1 unless there is strong evidence at country level for assuming otherwise.

The three main cases are:

Un-weighted rate: //w,,j,,// = 1

Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1/π,,j,,, meaning that the design weight is the inverse of the selection probability.

Size-weighted rate: //w,,j,,// = //d,,j,,x,,j,,//, where //x,,j,,// is the value of a variable X.

The variable X, which is chosen subjectively, shows the size or importance of the units. Its value should be known for all units. X is auxiliary information, often available in the frame. Examples are turnover for businesses and population for municipalities.

For the over-coverage rate, the un-weighted and the design-weighted alternatives are the ones most often used; see Interpretation below.

The design-weighted rate is mainly used for sample surveys, but it may also apply, e.g., to price index processes or processes with multiple data sources. The weight //d,,j,,// is a “raising” factor when unit //j// represents more than itself; otherwise //d,,j,,// is equal to one. Hence, when dealing with administrative sources, the un-weighted and the size-weighted versions of the rate are normally the relevant ones.
)))
|(% style="width:202px" %)Target value:|(% style="width:1528px" %)The target value of this indicator is as close to 0 as possible.
|(% style="width:202px" %)(((
Aggregation levels and principles:
)))|(% style="width:1528px" %)(((
* MS: the indicator is to be calculated for frame populations where meaningful, e.g. aggregated over industries. The separate frame populations are then treated as one frame population.
* EU: the indicator can be aggregated across countries only where statistical production processes are fully harmonised. For the statistical processes involved, the separate frame populations are treated as one frame population. Where production processes differ across countries, the lowest and highest over-coverage rates can be shown to indicate the range.
)))
|(% style="width:202px" %)Interpretation:|(% style="width:1528px" %)(((
//Over-coverage//: there are units accessible via the frame that do not belong to the target population (e.g. deceased persons still listed in a Population Register, or enterprises no longer operating but still listed in the Business Register).

The usefulness of the indicator depends on the statistical process and on how over-coverage is identified. If administrative data are also used to define the target population, this indicator normally adds little value, except possibly for duplicates, if any are found. It may provide an overall idea of the quality of the register/frame and of the rate of change of the population.

The un-weighted over-coverage rate gives the number of units found not to belong to the target population in proportion to the total number of observed units. The number refers to the sample, the census or the register population studied.

The design-weighted over-coverage rate is an estimate for the frame population in comparison with the target population, based on the information at hand, usually a sample.

The size-weighted over-coverage rate expresses the rate in terms of a chosen size variable, e.g. turnover in business statistics. (This case is less interesting for over-coverage than for non-response.)
)))
|(% style="width:202px" %)(((
Specific guidance:
)))|(% style="width:1528px" %)-
|(% style="width:202px" %)References:|(% style="width:1528px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
)))

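Because the //OCr,,w,,// formula is only available as an image, the Python sketch below rests on an assumed form, chosen to match the textual description (the estimated ineligible share (1 - α) of the unknown-eligibility units Q is counted as over-coverage, and all observed units form the denominator): (Σ,,O,, w + (1 - α) Σ,,Q,, w) / (Σ,,O,, w + Σ,,E,, w + Σ,,Q,, w). Check it against the published formula before use. The function names and the status codes "O", "E" and "Q" are hypothetical; the weight helper implements the three cases listed above.

```python
from typing import Iterable, Tuple


def unit_weight(scheme: str, pi: float = 1.0, x: float = 1.0) -> float:
    """w_j for the three cases: un-weighted, design-weighted (d_j = 1/pi_j), size-weighted (d_j * x_j)."""
    if scheme == "unweighted":
        return 1.0
    if not 0 < pi <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    d = 1.0 / pi  # design weight: inverse of the selection probability
    if scheme == "design":
        return d
    if scheme == "size":
        return d * x
    raise ValueError(f"unknown weighting scheme {scheme!r}")


def over_coverage_rate(units: Iterable[Tuple[str, float]], alpha: float = 1.0) -> float:
    """Weighted over-coverage rate under an ASSUMED form of OCr_w:

        (sum_O w + (1 - alpha) * sum_Q w) / (sum_O w + sum_E w + sum_Q w)

    units: (status, weight) pairs, status "O" (out of scope), "E" (eligible)
    or "Q" (unknown eligibility).
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be in [0, 1]")
    totals = {"O": 0.0, "E": 0.0, "Q": 0.0}
    for status, w in units:
        if status not in totals:
            raise ValueError(f"unknown status {status!r}")
        totals[status] += w
    denominator = sum(totals.values())
    if denominator == 0:
        raise ValueError("no units to evaluate")
    return (totals["O"] + (1 - alpha) * totals["Q"]) / denominator


# Un-weighted: 1 out-of-scope, 6 eligible, 1 unknown unit.
frame = [("O", 1.0)] + [("E", 1.0)] * 6 + [("Q", 1.0)]
print(over_coverage_rate(frame))             # alpha = 1: 1/8
print(over_coverage_rate(frame, alpha=0.5))  # (1 + 0.5) / 8

# Design-weighted: weights derived from hypothetical selection probabilities.
sample = [(s, unit_weight("design", pi=p)) for s, p in [("O", 0.5), ("E", 0.25), ("E", 0.25), ("Q", 0.5)]]
print(over_coverage_rate(sample))            # 2 / (2 + 8 + 2)
```

With the recommended α = 1, the unknown-eligibility units are treated as eligible and add only to the denominator under this assumed form.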
= A3. Common units - proportion =

|(% style="width:202px" %)(((
**Name:**
)))|(% style="width:1528px" %)**A3. Common units - proportion**
|(% style="width:202px" %)(((
Definition:
)))|(% style="width:1528px" %)The proportion of units covered by both the survey and the administrative sources in relation to the total number of units in the survey.
|(% style="width:202px" %)Applicability:|(% style="width:1528px" %)(((
The proportion is applicable:

* to mixed statistical processes where some variables, or data for some units, come from survey data and others from administrative source(s);
* to producers.
)))
|(% style="width:202px" %)Calculation formulae:|(% style="width:1528px" %)(((
[[image:1768515091912-523.png]]
)))
|(% style="width:202px" %)Target value:|(% style="width:1528px" %)-
|(% style="width:202px" %)Aggregation levels and principles:|(% style="width:1528px" %)-
|(% style="width:202px" %)(((
Interpretation:
)))|(% style="width:1528px" %)(((
The indicator is used when administrative data are combined with survey data in such a way that unit-level data are obtained from both the survey and one or more administrative sources (some variables come from the survey and other variables from the administrative data), or when data for part of the units come from survey data and data for another part of the units from one or more administrative sources.

The indicator provides an idea of the completeness/coverage of the sources, i.e. to what extent units exist in both the administrative data and the survey data. This indicator does not apply if administrative data are used only to produce estimates without being combined with survey data.
)))
|(% style="width:202px" %)Specific guidance:|(% style="width:1528px" %)(((
Common units are those units that are included both in the data from an administrative source and in the survey data.

For the purpose of this indicator, “unique units in survey data” in the denominator means that a unit existing in more than one source should be counted only once.

If the survey is conducted for only part of the units in the administrative source (e.g. only for larger enterprises), this indicator should be calculated only for the relevant subset.

Linking errors should be detected and resolved before this indicator is calculated.

If there are few common units due to the design of the statistical output (e.g. a combination of survey and administrative data), this should be explained.
)))
|(% style="width:202px" %)(((
References:
)))|(% style="width:1528px" %)ESSnet Use of Administrative and Accounts Data in Business Statistics, WP6 Quality Indicators when using Administrative Data in Statistical Operations, November 2010.

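A minimal Python sketch of A3, assuming the formula image corresponds to the description in the specific guidance: the number of common units divided by the number of unique units in the survey data. It assumes units have already been linked to shared identifiers (linking errors resolved), and, as a further assumption, treats a unit as common if it appears in any of the administrative sources supplied. The function name and identifiers are hypothetical.

```python
from typing import Hashable, Iterable


def common_units_proportion(survey_ids: Iterable[Hashable], *admin_sources: Iterable[Hashable]) -> float:
    """Common units divided by unique units in the survey data (assumed form of A3)."""
    survey = set(survey_ids)  # each survey unit is counted once, even if listed twice
    if not survey:
        raise ValueError("survey data contain no units")
    if not admin_sources:
        raise ValueError("at least one administrative source is required")
    admin_units = set().union(*admin_sources)  # a unit found in any source counts once
    common = survey & admin_units
    return len(common) / len(survey)


# Hypothetical linked unit identifiers.
survey_units = ["E1", "E2", "E3", "E4", "E2"]  # E2 appears twice but is counted once
print(common_units_proportion(survey_units, ["E2", "E3", "E5"]))  # 2 common / 4 unique
print(common_units_proportion(survey_units, ["E1"], ["E2", "E3"]))  # 3 common / 4 unique
```

Units that appear only in the administrative data (E5 above) do not enter the denominator, which is restricted to the survey units.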
= A4. Unit non-response - rate =

|(% style="width:199px" %)(((
**Name:**
)))|(% style="width:1531px" %)(((
**A4. Unit non-response - rate**
)))
|(% style="width:199px" %)Definition:|(% style="width:1531px" %)The ratio of the number of units with no information or no usable information (non-response, etc.) to the total number of in-scope (eligible) units. The ratio can be weighted or un-weighted.
|(% style="width:199px" %)Applicability:|(% style="width:1531px" %)(((
The unit non-response rate is applicable:

* to all statistical processes, including direct data collection and administrative data. The terminology varies between statistical processes, but the basic principle is the same. In some cases, especially for administrative data sources, it may be difficult to distinguish between unit non-response and under-coverage: in the former case the units are known to exist but their data are missing (e.g. due to very late reporting, or quality so low that the information is useless); in the latter case the units were not known when the frame was constructed;
* to users and producers, with different levels of detail.
)))
|(% style="width:199px" %)Calculation formulae:|(% style="width:1531px" %)(((
The non-response rate has three main versions, all covered by one and the same formula, the weighted unit non-response rate //NRr,,w,,//:

[[image:1768515164238-720.png]]

R the set of responding eligible units
NR the set of non-responding eligible units
Q the set of selected units with unknown eligibility (unresolved selected units)
//w,,j,,// weight of unit //j//, described below
α the estimated proportion of cases of unknown eligibility that are actually eligible. It should be set equal to 1 unless there is strong evidence at country level for assuming otherwise.

The three main cases are:

Un-weighted rate: //w,,j,,// = 1

Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1/π,,j,,, meaning that the design weight is the inverse of the selection probability.

Size-weighted rate: //w,,j,,// = //d,,j,,x,,j,,//, where //x,,j,,// is the value of a variable X.

The variable X, which is chosen subjectively, shows the size or importance of the units. Its value should be known for all units. X is auxiliary information, often available in the frame. Examples are turnover for businesses and population for municipalities.

For the unit non-response rate, all three alternatives are frequently used; see Interpretation below.

The design-weighted rate is mainly used for sample surveys, but it may also apply, e.g., to price index processes or processes with multiple data sources. The weight //d,,j,,// is a “raising” factor when unit //j// represents more than itself; otherwise //d,,j,,// is equal to one. Hence, when dealing with administrative sources, the un-weighted and the size-weighted versions of the rate are normally the relevant ones.
)))
|(% style="width:198px" %)Target value:|(% style="width:1532px" %)(((
The target value for this indicator is as close to 0 as possible.
)))
|(% style="width:198px" %)Aggregation levels and principles:|(% style="width:1532px" %)(((
* MS: the indicator is to be calculated at statistical process level;
* EU: rather than aggregating this indicator over countries or calculating a mean, Eurostat can show the lowest and highest unit non-response rates for a given variable at statistical process level.
)))
|(% style="width:198px" %)(((
Interpretation:
)))|(% style="width:1532px" %)(((
Unit non-response occurs when no data about an eligible unit are recorded (or the data are so scarce or of such low quality that they are deleted).

The un-weighted unit non-response rate shows the result of the data collection in the sample (the units included), rather than serving as an indirect measure of the potential bias associated with non-response. If α = 1, all units with unknown eligibility are assumed to be eligible, so the rate is a conservative estimate of A4 compared with other choices of α.

The design-weighted unit non-response rate shows how well the data collection worked with respect to the population of interest.

The size-weighted unit non-response rate can serve as an indirect indicator of the potential bias caused by non-response prior to any calibration adjustments.

Note, overall, that the bias may be low even if the non-response rate is high, depending on the pattern of non-response and on the possibilities to adjust successfully for it.
)))
|(% style="width:198px" %)Specific guidance:|(% style="width:1532px" %)(((
Non-response is a source of errors in survey statistics mainly for two reasons:

* it reduces the number of responses and therefore the precision of the estimates (this may be particularly relevant when samples are used);
* it might introduce bias. The size of the bias depends on the non-response rate, but also on the differences between respondents and non-respondents with respect to the variable of interest, and on the strength of the auxiliary information.
)))
|(% style="width:198px" %)(((
References:
)))|(% style="width:1532px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* U.S. Census Bureau Statistical Quality Standards, Reissued 2010.
* Trépanier, Julien, and Kovar. “Reporting Response Rates when Survey and Administrative Data are Combined.” //Proceedings of the Federal Committee on Statistical Methodology Research Conference 2005.//
)))

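The Python sketch below illustrates the weighted unit non-response rate. Since //NRr,,w,,// is shown only as an image, it assumes the common form 1 - Σ,,R,, w / (Σ,,R,, w + Σ,,NR,, w + α Σ,,Q,, w), which is consistent with the interpretation above that α = 1 gives a conservative rate. The status codes "R", "NR" and "Q" and the function name are hypothetical; units resolved as out of scope are expected to be removed beforehand.

```python
from typing import Iterable, Tuple


def unit_nonresponse_rate(units: Iterable[Tuple[str, float]], alpha: float = 1.0) -> float:
    """Weighted unit non-response rate under an ASSUMED form of NRr_w:

        1 - sum_R w / (sum_R w + sum_NR w + alpha * sum_Q w)

    units: (status, weight) pairs for selected units, status "R" (responding
    eligible), "NR" (non-responding eligible) or "Q" (unknown eligibility).
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be in [0, 1]")
    totals = {"R": 0.0, "NR": 0.0, "Q": 0.0}
    for status, w in units:
        if status not in totals:
            raise ValueError(f"unknown status {status!r}; drop out-of-scope units first")
        totals[status] += w
    denominator = totals["R"] + totals["NR"] + alpha * totals["Q"]
    if denominator == 0:
        raise ValueError("no eligible units")
    return 1.0 - totals["R"] / denominator


# Un-weighted case (w_j = 1): 6 respondents, 2 non-respondents, 2 of unknown eligibility.
sample = [("R", 1.0)] * 6 + [("NR", 1.0)] * 2 + [("Q", 1.0)] * 2
print(unit_nonresponse_rate(sample))             # alpha = 1: 1 - 6/10
print(unit_nonresponse_rate(sample, alpha=0.5))  # 1 - 6/9, lower than with alpha = 1

# Design-weighted case (w_j = d_j = 1/pi_j): three respondents drawn with pi = 0.5,
# one non-respondent drawn with pi = 0.1 (hypothetical probabilities).
design_sample = [("R", 1 / 0.5)] * 3 + [("NR", 1 / 0.1)]
print(unit_nonresponse_rate(design_sample))      # 1 - 6/16
```

In the design-weighted example, a single non-respondent with a large weight raises the rate from 0.25 (un-weighted) to 0.625, which is why the weighted and un-weighted versions answer different questions.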
= A5. Item non-response - rate =

|(((
**Name:**
)))|(((
**A5. Item non-response - rate**
)))
|(((
Definition:
)))|The item non-response rate for a given variable is defined as the (weighted) ratio between the in-scope units that have not responded to the particular item and the in-scope units that are required to respond to it.
|Applicability:|(((
The item non-response rate is applicable:

* to all statistical processes, including direct data collection and administrative data (the terminology varies between statistical processes, but the basic principle is the same);
* to users and producers, for selected key variables or for variables with very high item non-response rates, with different levels of detail.

If the survey has more than one unit type or data source, a rate may be calculated for each type or data source.

If there is more than one frame, or if rates vary strongly between sub-populations, rates should (also) be calculated for separate sub-populations (or strata, groups).
)))
|(((
Calculation formulae:
)))|(((
The item non-response rate has three main versions, all covered by one and the same formula, the weighted item non-response rate //NR,,Y,,r,,w,,//, which is calculated as follows:

[[image:1768515257783-690.png]]

//R,,Y,,// the set of eligible units responding to item //Y// (as required)
//NR,,Y,,// the set of eligible units not responding to item //Y// although this item is required. The denominator corresponds to the set of units for which item //Y// is required. (Other units do not get this item because their answers to earlier items made them skip it; they were “filtered away”.)
//w,,j,,// weight of unit //j//, described below

The three main cases are:

Un-weighted rate: //w,,j,,// = 1

Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1/π,,j,,, meaning that the design weight is the inverse of the selection probability.

Size-weighted rate: //w,,j,,// = //d,,j,,x,,j,,//, where //x,,j,,// is the value of a variable X.

The variable X, which is chosen subjectively, shows the size or importance of the units. Its value should be known for all units. X is auxiliary information, often available in the frame. Examples are turnover for businesses and population for municipalities.

In the computation of final estimates, the design weight may be modified to correct for non-response, under-coverage, etc. This modified weight should be used if the rates are to apply to final estimates.

The design-weighted rate is mainly used for sample surveys, but it may also apply, e.g., to price index processes or processes with multiple data sources.

The weight //d,,j,,// is a “raising” factor when unit //j// represents more than itself; otherwise //d,,j,,// is equal to one. Hence, when dealing with administrative sources, the un-weighted and the size-weighted versions of the rate are normally the relevant ones.
)))
|(((
Target value:
)))|(((
The target value for this indicator is as close to 0 as possible.
)))
|(((
Aggregation levels and principles:
)))|(((
* MS: the indicator is to be calculated at statistical process level for key variables and variables with high rates.
* EU: rather than aggregating this indicator over countries or calculating a mean, Eurostat can show the lowest and highest item non-response rates for a given variable at statistical process level.
)))
|(((
Interpretation:
)))|(((
A high item non-response rate indicates difficulties in providing information, e.g. a sensitive question or unclear wording in social statistics, or information not available in the accounting system in business statistics.

The indicator is a proxy indicator of the possible bias caused by item non-response. Even when the item non-response rate is high, the bias may still be low, depending on the causes, the response pattern, and the auxiliary information available to adjust/impute.
)))
|(((
Specific guidance:
)))|The un-weighted item non-response rate should be calculated before data editing and imputation in order to measure the impact of item non-response for the key variables.
|References:|(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* U.S. Census Bureau Statistical Quality Standards, Reissued 2010.
* Trépanier, Julien, and Kovar. “Reporting Response Rates when Survey and Administrative Data are Combined.” //Proceedings of the Federal Committee on Statistical Methodology Research Conference 2005.//
)))

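A minimal Python sketch of the item non-response rate for one item //Y//, assuming the formula image has the form Σ,,NR,,Y,,,, w / (Σ,,R,,Y,,,, w + Σ,,NR,,Y,,,, w), i.e. non-responding units over all units for which the item is required. Units that were "filtered away" by earlier answers are skipped entirely. The record layout `(item_required, item_answered, weight)` and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def item_nonresponse_rate(records: Iterable[Tuple[bool, bool, float]]) -> float:
    """Weighted item non-response rate for one item Y (assumed form of NR_Y r_w).

    records: (item_required, item_answered, weight) for each eligible unit.
    """
    responded = missing = 0.0
    for required, answered, w in records:
        if not required:
            continue  # unit was filtered away: not part of the denominator
        if answered:
            responded += w  # R_Y
        else:
            missing += w    # NR_Y
    denominator = responded + missing
    if denominator == 0:
        raise ValueError("item Y is not required for any unit")
    return missing / denominator


# Un-weighted: 6 answered, 2 missing, 2 filtered away.
records = [(True, True, 1.0)] * 6 + [(True, False, 1.0)] * 2 + [(False, False, 1.0)] * 2
print(item_nonresponse_rate(records))  # 2 / 8

# Weighted: the two missing units carry larger weights than the respondents.
weighted = [(True, True, 10.0)] * 6 + [(True, False, 20.0)] * 2 + [(False, False, 5.0)]
print(item_nonresponse_rate(weighted))  # 40 / 100
```

Following the specific guidance, the un-weighted version would be computed on the raw data before editing and imputation.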
Helena K. 1.12 423 = A6. Data revision - average size =
Helena K. 1.1 424
Helena K. 1.12 425 |(% style="width:155px" %)(((
Helena K. 1.2 426 **Name: **
Helena K. 1.12 427 )))|(% colspan="4" style="width:1575px" %)(((
Helena K. 1.11 428 **~ A6. Data revision - average size**
Helena K. 1.1 429 )))
Helena K. 1.12 430 |(% style="width:155px" %)(((
Helena K. 1.1 431 Definition:
Helena K. 1.12 432 )))|(% colspan="4" style="width:1575px" %)(((
Helena K. 1.1 433 The average over a time period of the revisions of a key indicator. The “revision” is defined as the difference between a later and an earlier estimate of the key item.
434
435 The number of releases (//K//) of a key item (number of times it is published) is fixed and specified in the revision policy. Usually, revisions involve a time series: when publishing an estimate of the key indicator referring to time //t//, it is a common practice to release the revised version of the indicator referring to a set of previous periods.
436
437 In the following table this situation is illustrated for a revision analysis where the policy has K revisions and //n// reference periods are included in the analysis.
438
Helena K. 2.1 439 Reference periods
Helena K. 1.1 440
Helena K. 2.1 441 [[image:1768515339448-502.png]]
Helena K. 1.1 442 )))
|(% style="width:155px" %)Applicability:|(% colspan="4" style="width:1575px" %)(((
The average size of revisions is applicable:

* to statistical processes where initial and subsequent (revised) estimates are published according to a revision policy (e.g. quarterly national accounts, short-term statistics);
* to users and producers, with different levels of detail.
)))
|(% style="width:155px" %)(((
Calculation formulae:
)))|(% colspan="4" style="width:1575px" %)(((
With reference to the two-dimensional situation described in the definition, there are several strategies for computing indicators: with or without sign, in absolute or relative terms, for specific pairs of releases over time or over a sequence of revisions, etc. The main suggestion here is to consider an average for a given revision step over a set of //n// reference periods.

**MAR (Mean Absolute Revision):**

[[image:1768515365840-725.png]]

where:

//X,,Lt,,// … “later” estimate, i.e. the //L//^^th^^ release of the item for reference period //t//;
//X,,Pt,,// … “earlier” estimate, i.e. the //P//^^th^^ release of the item for reference period //t//;
//n// … number of estimates (reference periods) in the time series taken into account. //n// ≥ 20 is recommended for quarterly estimates and //n// ≥ 30 for monthly estimates. The indicator is not recommended for annual estimates.

MAR provides an idea of the average size of a given revision step.

This indicator can alternatively be expressed in relative terms:

**RMAR (Relative Mean Absolute Revision):**

[[image:1768515433151-996.png]]

In addition, at Eurostat level and where the direction of revisions is of interest, the mean revision from release //P// to release //L// over the //n// reference periods can be computed:

**MR (Mean Revision):**

[[image:1768515469825-885.png]]

Different combinations of //P// and //L// can be considered. For instance, the OECD suggests comparing the following releases:

[[image:1768515547480-980.png]]
)))
|Target value:|-| |
|Aggregation levels and principles: |(% colspan="3" %)(((
* MS: the indicator is to be calculated at statistical process level.
* EU: the indicator is calculated on the revisions of the EU aggregate/indicator.
)))
|(((
Interpretation:
)))|(% colspan="3" %)(((
**MAR** gives an idea of the average size of a given revision step for a key item over time.

The **RMAR** indicator normalises the MAR measure using the final estimates. It facilitates international comparisons and comparisons over time. When growth rates are estimated, this measure corrects MAR for the size of growth and thus takes into account that revisions might be expected to be larger in periods of high growth than in periods of slow growth.

Both MAR and RMAR provide information on the stability of the estimates. They do not provide information on the direction of revisions, since absolute values of revisions are used. That information is provided by **MR**: a positive sign indicates upward revisions (the earlier estimate underestimated the later one), and a negative sign indicates downward revisions (overestimation). MR is sometimes referred to as ‘average bias’, but a non-zero MR is not sufficient to establish that revisions are systematically biased in a given direction. To ascertain the presence of bias, it has to be assessed whether MR is statistically different from zero (given no changes in definitions, methodologies, etc.).
)))
|Specific guidance:|(% colspan="3" %)Either MAR or RMAR should be presented under this indicator. In addition, MR could also be calculated at EU level.
|(((
References:
)))|(% colspan="3" %)§ OECD: [[http:~~/~~/stats.oecd.org/mei/default.asp?rev=1>>url:http://stats.oecd.org/mei/default.asp?rev=1]]

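As a rough illustration, the three revision measures of A6 can be sketched in Python. Because the formulas are only available as images, this sketch assumes the usual definitions MAR = (1/n)·Σ|X,,Lt,, − X,,Pt,,|, RMAR = Σ|X,,Lt,, − X,,Pt,,| / Σ|X,,Lt,,| and MR = (1/n)·Σ(X,,Lt,, − X,,Pt,,), with the sums running over the //n// reference periods; these should be checked against the images before reuse. The function name ##revision_indicators## is hypothetical.

```python
from statistics import mean


def revision_indicators(later, earlier):
    """Return (MAR, RMAR, MR) for one revision step from release P to release L.

    later[t]   -- X_Lt, the later (L-th) release for reference period t
    earlier[t] -- X_Pt, the earlier (P-th) release for the same period t
    Assumed definitions (check against the formula images):
      MAR  = mean of |X_Lt - X_Pt|
      RMAR = sum of |X_Lt - X_Pt| divided by sum of |X_Lt|
      MR   = mean of (X_Lt - X_Pt); positive means upward revisions
    """
    if not later or len(later) != len(earlier):
        raise ValueError("need two non-empty series covering the same n periods")
    revisions = [x_l - x_p for x_l, x_p in zip(later, earlier)]
    abs_revisions = [abs(r) for r in revisions]
    mar = mean(abs_revisions)
    denominator = sum(abs(x_l) for x_l in later)
    if denominator == 0:
        raise ValueError("RMAR is undefined when all later estimates are zero")
    rmar = sum(abs_revisions) / denominator
    mr = mean(revisions)
    return mar, rmar, mr


# Toy example with n = 4 quarterly growth rates (in practice n >= 20 is recommended).
first_release = [1.0, 1.0, 1.0, 0.4]   # X_Pt
later_release = [1.2, 0.8, 1.0, 0.5]   # X_Lt
mar, rmar, mr = revision_indicators(later_release, first_release)
print(f"MAR={mar:.3f} RMAR={rmar:.3f} MR={mr:.3f}")
```

This prints MAR=0.125 RMAR=0.143 MR=0.025. The revisions of the first two periods (+0.2 and −0.2) cancel out in MR but not in MAR, which is why MR alone says nothing about the size of revisions.
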
= A7. Imputation - rate =

|(((
**Name:**
)))|(((
**A7. Imputation - rate**
)))
|(((
Definition:
)))|(((
Imputation is the process used to assign replacement values for missing, invalid or inconsistent data that have failed edits. This includes automatic and manual imputation; it excludes follow-up with respondents and the corresponding corrections (if applicable). Thus, imputation as defined above occurs after data collection, regardless of the source or mix of sources from which the data have been obtained, including administrative data. After imputation, the data file should normally contain only plausible and internally consistent data records.

This indicator is influenced both by item non-response and by the editing process. It measures both the relative number of imputed values and the relative influence of the imputation procedures on the final estimates.

The un-weighted imputation rate for a variable is the ratio of the number of imputed values to the total number of values requested for the variable.

The weighted rate shows the relative contribution of imputed values to a statistic, typically a total for a quantitative variable. For a qualitative variable, the relative contribution is based on the number of units with an imputed value for the qualitative item.
)))
|Applicability:|(((
The imputation rate is applicable:

* to all statistical processes with micro data (e.g. direct data collection and administrative data);
* to producers.
)))
|(((
Calculation formulae:
)))|(((
**1. Un-weighted rate**, at statistical process and variable level:

[[image:1768512459010-647.jpeg]]

//n,,AV,,// and //n,,OV,,// are the numbers of assigned (imputed) values and observed values, respectively.

**2. Weighted rate:** the contribution of imputed values is calculated in an analogous way, but with weights and variable values.

Here, //AV// and //OV// are the sets of units with assigned and observed values, respectively, and //w,,j,,// is the weight of unit //j// (normally the weight used for estimation, which takes into account the sample design as well as adjustments for unit non-response and final calibration).

For a qualitative variable, //y,,j,,// = 1 if the //j//th unit shows a given characteristic and 0 otherwise.

When counting imputations, the following changes have to be considered:

i. imputation of a (non-blank) value for a missing item;
ii. imputation of a (non-blank) value to correct an observed invalid (non-blank) value;
iii. imputation of a blank value to correct an undue invalid (non-blank) response.

The two main cases of the weighted imputation rate are:

Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1/π,,j,,; that is, the design weight is the inverse of the selection probability π,,j,, of unit //j//.

Size-weighted rate: //w,,j,,// = //d,,j,,x,,j,,//, where //x,,j,,// is the value of a (size) variable //X// for unit //j//.
)))
|Target value:|A value equal or close to zero is desirable, since imputation indicates missing or invalid values.
|Aggregation levels and principles:|(((
* MS: The calculation is done for key variables at statistical process level.
* EU: Aggregations can be made at EU level on the basis of harmonised statistical production processes across Member States, considering these as a single statistical process. Alternatively, Eurostat can report the lowest and highest imputation rates for a given variable at statistical process level.
)))
|(((
Interpretation:
)))|(((
The un-weighted rate shows, for a particular variable, the proportion of units for which a value has been imputed because the original value was missing, implausible or inconsistent, relative to the number of units with a value for this variable. Units where a blank value was imputed to correct an undue invalid (non-blank) response (type iii) have to be included in both the numerator and the denominator.

The weighted rate shows, for a particular variable, the relative contribution of imputed values to the estimate of this item/variable. This weighted indicator is meaningful when the objective of a survey is to estimate the total or the average of a variable; it is not meaningful when the objective is to estimate complex indices.
)))
|Specific guidance:|-
|References:|(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* Statistics Canada Quality Guidelines, Fifth Edition – October 2009.
)))

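A minimal Python sketch of both A7 rates follows. It assumes that the un-weighted formula in the image is //n,,AV,,// / (//n,,AV,,// + //n,,OV,,//) and that the weighted rate, as described in the text, is the weighted sum of //w,,j,,y,,j,,// over the units in //AV// divided by the same sum over all units in //AV// and //OV//. The function names and the tuple layout of the input are hypothetical.

```python
def unweighted_imputation_rate(n_assigned, n_observed):
    """n_AV / (n_AV + n_OV): share of imputed (assigned) values for a variable."""
    total = n_assigned + n_observed
    if total == 0:
        raise ValueError("the variable has no values")
    return n_assigned / total


def design_weight(selection_probability):
    """Design-weighted case: w_j = d_j = 1 / pi_j."""
    return 1.0 / selection_probability


def size_weight(design_w, size_value):
    """Size-weighted case: w_j = d_j * x_j, with x_j the value of a size variable X."""
    return design_w * size_value


def weighted_imputation_rate(units):
    """Relative contribution of imputed values to a weighted total.

    units -- iterable of (imputed, w_j, y_j) tuples, where imputed is True for
             units in AV (assigned values) and False for units in OV (observed).
             For a qualitative variable, y_j is 1 if the unit shows the
             characteristic of interest and 0 otherwise.
    """
    imputed_part = 0.0
    total = 0.0
    for imputed, w_j, y_j in units:
        contribution = w_j * y_j
        total += contribution
        if imputed:
            imputed_part += contribution
    if total == 0:
        raise ValueError("the weighted total is zero")
    return imputed_part / total


# Three sampled units; the second one had its value imputed.
units = [
    (False, design_weight(0.5), 10.0),   # d_j = 2, contributes 20
    (True, design_weight(0.25), 4.0),    # d_j = 4, contributes 16 (imputed)
    (False, design_weight(0.5), 6.0),    # d_j = 2, contributes 12
]
print(unweighted_imputation_rate(1, 2))  # 1 of 3 values imputed
print(weighted_imputation_rate(units))   # 16 out of a weighted total of 48
```

In this toy case both rates happen to equal 1/3; with size weights built by ##size_weight##, the influence of the imputed unit would instead depend on its //x,,j,,//.
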
= TP1. Time lag - first results =

|(((
**Name:**
)))|(((
**TP1. Time lag - first results**
)))
|(((
Definition:
)))|(((
//General definition~://

The timeliness of statistical outputs is the length of time between the end of the event or phenomenon they describe and their availability.

//Specific definition~://

The number of days (or weeks or months) from the last day of the reference period to the day of publication of first results.
)))
|Applicability:|(((
This indicator is applicable:

* to all statistical processes with **preliminary data releases**;
* to producers.

TP1 is **not** applicable to statistical processes with only one, directly final, set of results/statistics; in that case only TP2 is used.
)))
|(((
Calculation formulae:
)))|(((
//T//,,1,, = //d,,frst,,// − //d,,refp,,//
//d,,frst,,// … release date of first results;
//d,,refp,,// … last day (date) of the reference period of the statistics.

//Measurement unit//: calendar days (if the number of days is large, it may be converted into weeks or months).

Instead of a period, the reference can also be a point in time; //d,,refp,,// is then that date.
)))
|Target value:|Target values are usually fixed by legislation or by gentlemen's agreement. In any case, smaller values denote better timeliness.
|Aggregation levels and principles: |(((
The calculation is done at subject-matter domain level, for a meaningful selection of statistics. It can refer to the current production round or be an average over a time period. Aggregations are possible at EU and domain (e.g. social statistics, business statistics) level.
)))
|(((
Interpretation:
)))|(((
This indicator quantifies the gap between the release date of first results and the end of the reference period.

Comparisons can be made between statistical processes with the same periodicity.
)))
|(((
Specific guidance:
)))|(((
The reasons for possible long production times should be explained and efforts to improve the situation should be described.

For annual statistics, or where timeliness is measured in years rather than days, a sentence stating the timeliness is sufficient.
)))
|References:|§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
§ ESS Standard for Quality Reports – 2009 Edition (Eurostat).

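The calculation of //T//,,1,, is a plain difference of calendar dates. A minimal Python sketch using only the standard ##datetime## module; the function name and the example dates are hypothetical.

```python
from datetime import date


def time_lag_days(release_date, reference_end):
    """T1 = d_frst - d_refp, in calendar days.

    release_date  -- d_frst, release date of the first results
    reference_end -- d_refp, last day of the reference period (or the reference
                     date itself when the reference is a point in time)
    """
    return (release_date - reference_end).days


# First quarter of 2024: the reference period ends on 31 March,
# first results are (hypothetically) released on 30 April.
t1 = time_lag_days(date(2024, 4, 30), date(2024, 3, 31))
print(t1, "days, about", round(t1 / 7, 1), "weeks")
```

Subtracting ##date## objects yields a ##timedelta##, so month lengths and leap years are handled automatically.
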
= TP2. Time lag - final results =

|(% style="width:206px" %)(((
**Name:**
)))|(% style="width:1524px" %)(((
**TP2. Time lag - final results**
)))
|(% style="width:206px" %)(((
Definition:
)))|(% style="width:1524px" %)(((
//General definition~://

The timeliness of statistical outputs is the length of time between the end of the event or phenomenon they describe and their availability.

//Specific definition~://

The number of days (or weeks or months) from the last day of the reference period to the day of publication of complete and final results.
)))
|(% style="width:206px" %)Applicability:|(% style="width:1524px" %)(((
This indicator is applicable:

* to all statistical processes;
* to users and producers, with different levels of detail.
)))
|(% style="width:206px" %)(((
Calculation formulae:
)))|(% style="width:1524px" %)(((
//T//,,2,, = //d,,finl,,// − //d,,refp,,//

//d,,finl,,// … release date of final results;
//d,,refp,,// … last day (date) of the reference period of the statistics.

//Measurement unit//: calendar days (if the number of days is large, it may be converted into weeks or months).

Instead of a period, the reference can also be a point in time; //d,,refp,,// is then that date.
)))
|(% style="width:206px" %)Target value:|(% style="width:1524px" %)Target values are usually fixed by legislation or by gentlemen's agreement. In any case, smaller values denote better timeliness.
|(% style="width:206px" %)Aggregation levels and principles: |(% style="width:1524px" %)The calculation is done at subject-matter domain level, for a meaningful selection of statistics. It can refer to the current production round or be an average over a time period. Aggregations are possible at EU and domain (e.g. social statistics, business statistics) level.
|(% style="width:206px" %)(((
Interpretation:
)))|(% style="width:1524px" %)(((
This indicator quantifies the gap between the release date of the final results and the end of the reference period.

Comparisons can be made between statistical processes with the same periodicity.
)))
|(% style="width:206px" %)Specific guidance:|(% style="width:1524px" %)(((
The reasons for possible long production times should be explained and efforts to improve the situation should be described.

What is considered as "final results" should be further defined by each subject-matter domain, taking the revision policy into account.

For annual statistics, or where timeliness is measured in years rather than days, a sentence stating the timeliness is sufficient.
)))
|(% style="width:206px" %)References:|(% style="width:1524px" %)§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
§ ESS Standard for Quality Reports – 2009 Edition (Eurostat).

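Since the aggregation principles allow the TP2 indicator to be an average over a time period rather than a single production round, the following Python sketch averages //T//,,2,, over several rounds. The input layout (pairs of ##date## objects), the function name and the release dates are assumptions for illustration.

```python
from datetime import date
from statistics import mean


def average_final_time_lag(rounds):
    """Average of T2 = d_finl - d_refp (calendar days) over production rounds.

    rounds -- iterable of (d_finl, d_refp) pairs: release date of the final
              results and last day of the corresponding reference period
    """
    lags = [(d_finl - d_refp).days for d_finl, d_refp in rounds]
    if not lags:
        raise ValueError("at least one production round is needed")
    return mean(lags)


# Two monthly rounds (hypothetical release dates).
rounds = [
    (date(2024, 4, 15), date(2024, 1, 31)),  # January 2024: 75 days
    (date(2024, 5, 15), date(2024, 2, 29)),  # February 2024 (leap year): 76 days
]
print(average_final_time_lag(rounds))
```

Passing a single pair gives the value for the current production round only.
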
= TP3. Punctuality - delivery and publication =

|(% style="width:202px" %)(((
**Name:**
)))|(% style="width:1528px" %)(((
**TP3. Punctuality - delivery and publication**
)))
|(% style="width:202px" %)(((
Definition:
)))|(% style="width:1528px" %)Punctuality is the time lag between the delivery/release date of data and the target date for delivery/release, as announced in an official release calendar, laid down by Regulations or previously agreed among partners.
|(% style="width:202px" %)Applicability:|(% style="width:1528px" %)(((
The punctuality of publication is applicable:

* to all statistical processes with fixed/pre-announced release dates;
* to users and producers, with different aspects and calculation formulae.

Computed only by Eurostat but recommended also for inclusion in national quality reports.
)))
|(% style="width:202px" %)(((
Calculation formulae:
)))|(% style="width:1528px" %)(((
**For producers:**

**Punctuality of data delivery (P3):**

//P//,,3,, = //d,,act,,// − //d,,sch,,//

//d,,act,,// … actual date of the effective provision of the statistics;
//d,,sch,,// … scheduled date of the effective provision of the statistics.

//Measurement unit//: calendar days

**For users:**

**Rate of punctuality of data publication (P3,,R,,):** relevant for a group of statistics/results

P3,,R,, is the share of datasets in a group of datasets that met the release calendar date.

[[image:1768515946337-210.png]]

//m,,pc,,// … number of statistics/results that were published on the date announced in the calendar or earlier (punctual);
//m,,up,,// … number of statistics/results that did not meet the date announced in the calendar (unpunctual).
)))
|(% style="width:202px" %)Target value:|(% style="width:1528px" %)(((
The target value for P3 is 0, meaning that there is no delay in the delivery/transmission of data.

The target value for P3,,R,, is 1, meaning that 100% of the items were published on or before the pre-announced calendar date.
)))
|(% style="width:202px" %)Aggregation levels and principles: |(% style="width:1528px" %)(((
There are two aspects:

* national data deliveries to Eurostat (producer-oriented);
* publication/release by Eurostat (user-oriented).

The calculation is done at statistical process level. Aggregations are to be made at EU level over countries and over domains.
)))
|(% style="width:202px" %)(((
Interpretation:
)))|(% style="width:1528px" %)(((
The indicator **Punctuality of data delivery** (P3) quantifies the difference (time lag) between the actual and the target date.

This should be interpreted according to the periodicity of the statistical process.

The indicator **Rate of punctuality of release** (P3,,R,,) evaluates the punctuality of release of a given group of datasets.
)))
|(% style="width:202px" %)(((
Specific guidance:
)))|(% style="width:1528px" %)(((
**For producers:**

For compliance monitoring purposes, Eurostat domain managers should monitor this indicator for individual countries. This information can be pre-filled by Eurostat, since the date on which data are received from each MS is known. Formula P3 should be applied in this case.

This indicator can be presented in table format for the different MS.

The reasons for late or non-punctual delivery should be stated along with their effect on the statistical product; for example, because of late data deliveries, the quality assurance procedures for the whole product/series might not be completed.

**For users:**

It is sufficient to compile this indicator as an aggregate at Eurostat level. Formula P3,,R,, should be applied in this case.

Some explanations should be given to users concerning non-punctual publication.
)))
|(% style="width:202px" %)References:|(% style="width:1528px" %)§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
§ ESS Standard for Quality Reports – 2009 Edition (Eurostat).

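Both TP3 punctuality measures can be sketched in Python as follows, assuming that the formula in the image is P3,,R,, = //m,,pc,,// / (//m,,pc,,// + //m,,up,,//), which is consistent with the target value of 1. Function names and dates are hypothetical.

```python
from datetime import date


def delivery_punctuality(actual, scheduled):
    """P3 = d_act - d_sch in calendar days.

    0 means delivery on the scheduled date, a positive value is the delay in
    days, and a negative value means delivery ahead of schedule.
    """
    return (actual - scheduled).days


def publication_punctuality_rate(release_dates, calendar_dates):
    """P3_R = m_pc / (m_pc + m_up) for a group of datasets.

    A dataset counts as punctual (m_pc) if it was released on or before the
    date announced in the release calendar, otherwise as unpunctual (m_up).
    """
    if len(release_dates) != len(calendar_dates) or not release_dates:
        raise ValueError("need one calendar date per released dataset")
    m_pc = sum(1 for released, announced in zip(release_dates, calendar_dates)
               if released <= announced)
    m_up = len(release_dates) - m_pc
    return m_pc / (m_pc + m_up)


# A national delivery arriving four days after the agreed date.
print(delivery_punctuality(date(2024, 3, 5), date(2024, 3, 1)))

# Three datasets: on time, two days late, one day early.
released = [date(2024, 3, 1), date(2024, 3, 3), date(2024, 2, 29)]
announced = [date(2024, 3, 1)] * 3
print(publication_punctuality_rate(released, announced))
```

Early releases count as punctual, in line with the definition of //m,,pc,,//.
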
= CC1. Asymmetry for mirror flows statistics - coefficient =

|(% style="width:205px" %)(((
**Name:**
)))|(% style="width:1525px" %)(((
**CC1. Asymmetry for mirror flows statistics - coefficient**
)))
|(% style="width:205px" %)(((
Definition:
)))|(% style="width:1525px" %)(((
//General definition~://

Discrepancies between data related to flows, e.g. for pairs of countries.

//Specific definition for bilateral mirror statistics (several versions are provided)~://

The difference, or the absolute difference, between the inbound and outbound flows of a pair of countries, divided by the average of these two values.

//Comment//

Outbound and inbound flows can be any kind of flow specific to the subject-matter domain (e.g. amounts of products traded, number of people visiting a country for tourism purposes).
)))
|(% style="width:205px" %)Applicability:|(% style="width:1525px" %)(((
The asymmetry coefficient for mirror flow statistics is applicable:

* to domains in which mirror statistics (flows concerning trade, migration, tourism, FATS, balance of payments, etc.) are available;
* to producers.

Computed by Eurostat (pre-filled in the quality report).
)))
|(% style="width:205px" %)(((
Calculation formulae:
)))|(% style="width:1525px" %)(((
**Bilateral mirror statistics:**

For each pair of countries, let:

A – country A
B – country B

[[image:1768516072576-881.png]]

A joint measure can be obtained from the two differences in relation to an average flow (there are several possibilities; one is given below):

[[image:1768516134346-935.png]]

OF,,AB,, – outbound flow going from country A to country B
mIF,,AB,, – mirror inbound flow
IF,,BA,, – mirror inbound flow to country B from country A
mOF,,AB,, – mirror outbound flow

**Multilateral mirror statistics:**

OF,,AiOj,, – outbound flow going from country A,,i,, to any other country O,,j,,
mIF,,AiOj,, – mirror inbound flow
A,,i,, – country A,,i,,
O,,j,, – another country O,,j,,
K – the number of countries that country A,,i,, may have contacts with
C – group of countries (EU + EFTA)

[[image:1768516229122-771.png]]
)))

|(% style="width:203px" %)Target value:|(% style="width:1528px" %)The value of this indicator should be as close to zero as possible, since, at least in theory, inbound and outbound flows between pairs of countries should match.
|(% style="width:203px" %)Aggregation levels and principles:|(% style="width:1528px" %)(((
* MS: The calculation is done for key variables/sub-series to be selected by the Eurostat domain manager.
* EU: Aggregations are possible at EU level (see the multilateral mirror statistics formula). Alternatively, e.g. where not all information is available, the lowest and highest values of the bilateral mirror statistics can be reported to indicate the range.
)))
|(% style="width:203px" %)(((
Interpretation:
)))|(% style="width:1528px" %)(((
In domains where mirror statistics are available, geographical comparability can be assessed by measuring the discrepancies between inbound and outbound flows for pairs of countries.

Mirror data can help to check the consistency of the data, of the reporting process and of the definitions used. They can also help to estimate missing data. For users, the asymmetry indicators give some indication of overall data credibility.

There is perfect symmetry (outbound flows are equal to mirror inbound flows) when the coefficient is equal to zero. The further the coefficient is from zero, the larger the asymmetry between outbound flows and mirror inbound flows.
)))
|(% style="width:203px" %)(((
Specific guidance:
)))|(% style="width:1528px" %)(((
Indicators CC1,,A,B,, and CC1,,B,A,, can be negative or positive. Indicator CC1,,AB,, is always non-negative.

Outbound flows from Member State A to Member State B, as reported by A, should be almost equal to inbound flows into B coming from A, as reported by B. Because some domains use different valuation principles for inbound and outbound flows, the two can differ slightly. Comparisons based on mirror statistics therefore have to be made cautiously and should take these discrepancies into account.

The asymmetry coefficient CC1,,AB,, is useful because it can be monitored over time.

Indicators CC1,,A,B,, and CC1,,B,A,, can be used to assess whether a country is, overall, declaring higher or lower flows than the mirror flows declared by its partner countries. They should be presented in a table (e.g. as in foreign trade statistics).
)))
|(% style="width:203px" %)References:|(% style="width:1528px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* International trade in services statistics - Monitoring progress on implementation of the Manual and assessing data quality – OECD-Eurostat Trade in Services Experts Meeting, 2005.
)))

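The CC1 definition's wording (difference or absolute difference of the two flows, divided by their average) can be sketched for one direction of a bilateral pair in Python. This illustrates that wording only; the exact bilateral, joint and multilateral formulas are given in the images and are not reproduced here. The function name is hypothetical.

```python
def asymmetry_coefficient(outbound, mirror_inbound, absolute=False):
    """(OF_AB - mIF_AB) / average(OF_AB, mIF_AB).

    outbound       -- OF_AB, flow from country A to country B as reported by A
    mirror_inbound -- mIF_AB, the same flow as reported by B (mirror flow)
    absolute       -- if True, use the absolute difference (result >= 0)
    """
    average = (outbound + mirror_inbound) / 2
    if average == 0:
        raise ValueError("coefficient undefined when both flows are zero")
    difference = outbound - mirror_inbound
    return (abs(difference) if absolute else difference) / average


# A reports exports of 110 to B; B reports imports of 90 from A.
print(asymmetry_coefficient(110, 90))        # signed: A declares the larger flow
print(asymmetry_coefficient(90, 110, True))  # absolute version
```

A value of 0 means perfect symmetry; in the signed version, the sign shows which partner reports the larger flow.
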
= CC2. Length of comparable time series =

|(% style="width:203px" %)(((
**Name:**
)))|(% style="width:1528px" %)(((
**CC2. Length of comparable time series**
)))
|(% style="width:203px" %)(((
Definition:
)))|(% style="width:1528px" %)(((
Number of reference periods in a time series since the last break.

//Comment//

Breaks in statistical time series may occur when there is a change in the definition of the parameter to be estimated (e.g. variable or population) or in the methodology used for the estimation. Sometimes a break can be avoided, e.g. by linking.
)))
|(% style="width:203px" %)Applicability:|(% style="width:1528px" %)(((
The length of comparable time series is applicable:

* to all statistical processes producing time series;
* to users and producers, with different levels of detail.

Computed only by Eurostat but recommended also for inclusion in national quality reports.
)))
|(% style="width:203px" %)(((
Calculation formula:
)))|(% style="width:1528px" %)(((
The reference periods are numbered consecutively.

//CC//,,2,, = //J,,last,,// − //J,,first,,// + 1
//J,,last,,// … number of the last reference period with disseminated statistics;
//J,,first,,// … number of the first reference period with comparable statistics.
)))
|(% style="width:203px" %)Target value:|(% style="width:1528px" %)A long time series may seem desirable, but changes (and hence breaks) may be justified, e.g. because reality calls for new concepts, or to achieve coherence with other statistics.
|(% style="width:203px" %)Aggregation levels and principles:|(% style="width:1528px" %)(((
The calculation is done at statistical process level. Aggregations are possible at MS, EU and domain (e.g. social statistics, business statistics) level.

At EU or domain level, the indicator should be calculated by Eurostat on the time series of the EU aggregate.
)))
|(% style="width:203px" %)(((
Interpretation:
)))|(% style="width:1528px" %)If there has been no break, the indicator equals the number of time points in the time series.
|(% style="width:203px" %)Specific guidance:|(% style="width:1528px" %)(((
The length of the series with comparable statistics is expressed as the number of time periods (points) in the series. It is counted from the first time period with statistics after the break onwards. The result does not depend on the length of the reference period.

The indicator is only applicable to statistical data disseminated for a sequence of regular time periods (points).

If more than one series exists for a statistical process, the domain manager should select the appropriate ones for the calculation.
)))
|(% style="width:203px" %)(((
References:
)))|(% style="width:1528px" %)§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
§ ESS Standard for Quality Reports – 2009 Edition (Eurostat).

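A short Python sketch of //CC//,,2,, follows. It assumes, for illustration only, that breaks are recorded as the number of the first reference period after each break, and that without any recorded break the series starts at period 1. The function name and input format are hypothetical.

```python
def comparable_series_length(j_last, break_starts=(), j_start=1):
    """CC2 = J_last - J_first + 1.

    j_last       -- number of the last reference period with disseminated statistics
    break_starts -- numbers of the first reference period after each break
    j_start      -- number of the first period of the whole series
    """
    # J_first is the first period with comparable statistics, i.e. the start
    # of the series or the first period after the most recent break.
    j_first = max([j_start, *break_starts])
    if j_first > j_last:
        raise ValueError("no disseminated period after the last break")
    return j_last - j_first + 1


# Quarterly series numbered 1..40 with breaks starting at periods 13 and 21.
print(comparable_series_length(40))            # no break: 40
print(comparable_series_length(40, [13, 21]))  # 40 - 21 + 1 = 20
```

Only the most recent break matters, which is why the sketch takes the maximum of the recorded break positions.
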
= AC1. Data tables – consultations =

|(% style="width:205px" %)(((
**Name:**
)))|(% style="width:1526px" %)**AC1. Data tables – consultations{{footnote}}The indicator must be collected in collaboration with Unit D4 - Dissemination.{{/footnote}}**
|(% style="width:205px" %)(((
Definition:
)))|(% style="width:1526px" %)(((
Number of consultations of data tables within a statistical domain for a given time period.

"Number of consultations" means the number of data table views, where multiple views in a single session count only once.

Some information is available through the monthly Monitoring report on Eurostat Electronic Dissemination and its Excel files with detailed figures.
)))
|(% style="width:205px" %)Applicability:|(% style="width:1526px" %)(((
The number of consultations of data tables is applicable:

* to all statistical processes using online data tables to disseminate statistics;
* to producers (Eurostat domain managers).

Computed only by Eurostat but recommended also for inclusion in national quality reports.
)))
|(% style="width:205px" %)(((
Calculation formulae:
)))|(% style="width:1526px" %)(((
AC1 = #//CONS//

where #//CONS// denotes the number of elements in the set //CONS// (also called the cardinality of the set). Here, //CONS// is the set of consultations of data tables for a specific subject-matter domain. The figures for this indicator should be collected monthly.

Remark: internal page views are excluded.
)))
|(% style="width:205px" %)Target value:|(% style="width:1526px" %)There is no immediate interpretation of low and high values of this indicator, and there is no particular target.
|(% style="width:205px" %)Aggregation levels and principles: |(% style="width:1526px" %)(((
The calculation is done at statistical process level. Aggregation is possible at the following levels:

* domain-specific data tables;
* annual aggregation.

The principle is to calculate the number of consultations of data tables by subject matter.
)))
|(% style="width:205px" %)Interpretation:|(% style="width:1526px" %)(((
This indicator should be carefully analysed and combined with other, complementary information.

The indicator contributes to the assessment of users' demand for data (level of interest) and hence to the assessment of the relevance of subject-matter domains.

A ratio can be computed to give insight into the share of consultations of the data tables in question in the total number of consultations for all domains.
)))
|(% style="width:205px" %)Specific guidance:|(% style="width:1526px" %)(((
An informative and straightforward way to represent the output of this indicator is to plot the figures over time in a graph, with months on the horizontal (x) axis and the number of data table consultations on the vertical (y) axis. This makes it possible to monitor users' interest in each data table at domain level.

It would also be useful to display, with appropriate tuning, a graph combining the number of consultations of data tables and of ESMS files (AC2).
)))
|(% style="width:205px" %)(((
References:
)))|(% style="width:1526px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
)))
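
The AC1 counting rule can be sketched in Python as follows. This is a minimal, illustrative sketch only. The page-view record (`PageView`, with hypothetical fields `session_id`, `table_code`, `domain`, `month`, `internal`) is an assumption and does not reflect the format of the Monitoring report. A consultation is assumed to be one distinct (session, data table) pair within a month, so repeated views of a table in the same session count once. Internal page views are dropped, and monthly figures are summed into annual totals.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """Hypothetical page-view record; the field names are assumptions."""
    session_id: str
    table_code: str
    domain: str
    month: str      # reference month, "YYYY-MM"
    internal: bool  # True for internal page views, which are excluded


def ac1_monthly(views, domain):
    """Return AC1 = #CONS per month for one subject-matter domain.

    Assumption: a consultation is a distinct (month, session, data table)
    triple, so repeated views of a table within one session count once.
    """
    cons = {
        (v.month, v.session_id, v.table_code)
        for v in views
        if v.domain == domain and not v.internal
    }
    return dict(Counter(month for month, _, _ in cons))


def annual_total(monthly):
    """Sum monthly AC1 figures into annual totals keyed by "YYYY"."""
    totals = Counter()
    for month, count in monthly.items():
        totals[month[:4]] += count
    return dict(totals)


# Made-up sample records for illustration only.
SAMPLE_VIEWS = [
    PageView("s1", "t1", "agri", "2024-01", False),
    PageView("s1", "t1", "agri", "2024-01", False),   # same session and table: counted once
    PageView("s1", "t2", "agri", "2024-01", False),
    PageView("s2", "t1", "agri", "2024-02", False),
    PageView("s3", "t1", "agri", "2024-02", True),    # internal view: excluded
    PageView("s4", "t9", "trade", "2024-02", False),  # other domain
]
```

With the sample records, `ac1_monthly(SAMPLE_VIEWS, "agri")` gives two consultations for 2024-01 and one for 2024-02. `annual_total` sums these to three for 2024.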

= AC2. Metadata - consultations =

|(% style="width:201px" %)(((
**Name:**
)))|(% style="width:1529px" %)(((
**AC2. Metadata - consultations{{footnote}}The indicator must be collected in collaboration with Unit D4 - Dissemination.{{/footnote}}**
)))
|(% style="width:201px" %)(((
Definition:
)))|(% style="width:1529px" %)(((
Number of metadata consultations (ESMS) within a statistical domain for a given time period.

The "number of consultations" means the number of times a metadata file is viewed.

Some information is available through the monthly Monitoring report on Eurostat Electronic Dissemination and its Excel files with detailed figures.
)))
|(% style="width:201px" %)Applicability:|(% style="width:1529px" %)(((
This indicator is applicable:

* to all statistical processes;
* to producers (Eurostat domain managers).

Computed only by Eurostat.
)))
|(% style="width:201px" %)(((
Calculation formulae:
)))|(% style="width:1529px" %)(((
AC2 = #//ESMS//

where #//ESMS// denotes the number of elements in the set //ESMS// (also called the cardinality of the set). Here the set //ESMS// represents the consultations (views) of the ESMS files of a specific subject-matter domain for a given time period.

Remark: internal page views are excluded.
)))
|(% style="width:201px" %)Target value:|(% style="width:1529px" %)There is no immediate interpretation of low and high values of this indicator, and there is no particular target.
|(% style="width:201px" %)Aggregation levels and principles:|(% style="width:1529px" %)(((
The calculation is done at statistical process level. Aggregation is possible at the following levels:

* Domain-specific ESMS files.
* Annual aggregation.

The principle is to calculate the number of consultations of ESMS files by subject-matter domain.
)))
|(% style="width:201px" %)(((
Interpretation:
)))|(% style="width:1529px" %)(((
The indicator contributes to the assessment of users' demand for metadata (level of interest) and hence to the assessment of the relevance of subject-matter domains.

A ratio can be computed to give insight into the share of consultations of the ESMS files in question in the total number of consultations for all domains.
)))
|(% style="width:201px" %)(((
Specific guidance:
)))|(% style="width:1529px" %)(((
An informative and straightforward way to represent the output of this indicator is to plot the figures over time in a graph, with months on the horizontal (x) axis and the number of ESMS file consultations on the vertical (y) axis. This makes it possible to monitor users' interest in each ESMS file at domain level.

It would also be useful to display over time, with appropriate tuning, a graph combining the number of consultations of data tables (indicator AC1) and of the corresponding metadata (ESMS) files.
)))
|(% style="width:201px" %)References:|(% style="width:1529px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
)))
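
Unlike AC1, AC2 counts every external view of an ESMS file. The ratio mentioned under Interpretation compares one domain's consultations with those of all domains. Below is a minimal sketch, assuming hypothetical view records of the form (domain, ESMS file, month, internal flag). The names `ac2_counts` and `domain_share` are illustrative, not an existing tool.

```python
from collections import Counter

# Made-up view records: (domain, ESMS file, month "YYYY-MM", internal flag).
SAMPLE_ESMS_VIEWS = [
    ("agri", "agri_esms", "2024-01", False),
    ("agri", "agri_esms", "2024-01", False),
    ("agri", "agri_esms", "2024-01", True),    # internal view: excluded
    ("trade", "trade_esms", "2024-01", False),
    ("agri", "agri_esms", "2024-02", False),
]


def ac2_counts(views):
    """Return AC2 = #ESMS per (domain, month); every external view counts."""
    return Counter(
        (domain, month)
        for domain, _esms_file, month, internal in views
        if not internal
    )


def domain_share(counts, domain, period=""):
    """Share of one domain's ESMS consultations in those of all domains.

    `period` is a month prefix: "2024" selects a year, "2024-01" a month,
    and "" (the default) all months. Returns None if nothing was consulted.
    """
    selected = {key: n for key, n in counts.items() if key[1].startswith(period)}
    total = sum(selected.values())
    if total == 0:
        return None
    own = sum(n for (d, _month), n in selected.items() if d == domain)
    return own / total
```

For the sample data, the "agri" domain accounts for 3 of the 4 external consultations in 2024, so its share is 0.75.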

= AC3. Metadata completeness - rate =

|(% style="width:201px" %)(((
**Name:**
)))|(% style="width:1529px" %)**AC3. Metadata completeness - rate**
|(% style="width:201px" %)(((
Definition:
)))|(% style="width:1529px" %)The ratio of the number of metadata elements provided to the total number of applicable metadata elements.
|(% style="width:201px" %)Applicability:|(% style="width:1529px" %)(((
The rate of completeness of metadata is applicable:

* to all statistical processes;
* to producers (Eurostat domain managers).

Computed only by Eurostat but recommended also for inclusion in national quality reports.
)))
|(% style="width:201px" %)(((
Calculation formulae:
)))|(% style="width:1529px" %)(((
[[image:1768516468361-156.png]]

//L// in the denominator is the set of applicable metadata elements under consideration and //M,,L,,// in the numerator is the subset of //L// consisting of the available metadata elements. The notation #//L// means the number of elements in the set //L// (the cardinality). The letter C on the left-hand side of the formula stands for both EU and EFTA countries.

The set //L// is determined for a group of metadata elements (as explained below) and for a given geographical entity (a Member State or EU+EFTA), statistical domain, etc.

There are three groups of metadata, described below together with a categorisation using the current Euro-SDMX concepts (only the main concepts are included in the following breakdown):

1. Metadata about statistical outputs: concepts 3, 4, 5, 8.1, 9, 10;
1. Metadata about statistical processes: concepts 11, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6;
1. Metadata about quality: concepts 12-19.

Computations are made separately for each of the three groups and for each combination (group of metadata, EU level, etc.).
)))
|(% style="width:201px" %)Target value:|(% style="width:1529px" %)The target value is 1, meaning that 100% of the metadata required/applicable to the statistical process, or aggregate, in question is available.
|(% style="width:201px" %)Aggregation levels and principles:|(% style="width:1529px" %)(((
The calculation is done at the level of ESMS files.

Aggregations are possible at MS, EU and domain (e.g. social statistics, business statistics) level.

The principle is to calculate the indicators as an unweighted rate at the level of MS and EU for a statistical domain (social statistics, business statistics, etc.).
)))
|(% style="width:201px" %)(((
Interpretation:
)))|(% style="width:1529px" %)(((
Each indicator shows to what extent metadata of a specific type is available compared with what should be available.

This indicator should be interpreted with care, since the rate reflects only the amount of metadata existing for a statistical process, not the quality of that information.
)))
|(% style="width:201px" %)Specific guidance:|(% style="width:1529px" %)(((
All the information is to be retrieved from ESMS files.

If the ESMS file is empty for the categories specified above, no calculation is needed; a descriptive text should be provided instead.

Eurostat has direct access to these files through its website, whereas Member States are expected to gain access to ESMS files in the near future through the national RME tool.

It should be made clear what "availability" of a metadata element actually means, since this determines which elements are counted in //M,,L,,//.
)))
|(% style="width:201px" %)(((
References:
)))|(% style="width:1529px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* Euro SDMX Metadata Structure, version March 2009.
)))
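
The AC3 rate for one ESMS file and one metadata group can be sketched with Python sets, mirroring #//M,,L,,// / #//L//. This sketch makes the following assumptions:

* Concept identifiers are plain strings such as "8.1".
* The groups contain only the main concepts listed above.
* The helper names `ac3` and `ac3_pooled` are hypothetical.
* The unweighted aggregate is read as giving every applicable element equal weight (sum of #//M,,L,,// over sum of #//L//). This is one possible reading rather than a confirmed rule.

```python
# Metadata groups as listed above (main concepts only). Representing concept
# identifiers as strings such as "8.1" is an assumption made for this sketch.
GROUPS = {
    "outputs": {"3", "4", "5", "8.1", "9", "10"},
    "processes": {"11", "20.1", "20.2", "20.3", "20.4", "20.5", "20.6"},
    "quality": {str(n) for n in range(12, 20)},
}


def ac3(applicable, available, group):
    """Rate #M_L / #L for one ESMS file and one metadata group.

    applicable: concepts applicable to the statistical process
    available:  concepts actually filled in the ESMS file
    Returns None when no element of the group is applicable (rate undefined).
    """
    L = applicable & GROUPS[group]
    if not L:
        return None
    M_L = L & available  # available elements form a subset of L
    return len(M_L) / len(L)


def ac3_pooled(files, group):
    """Unweighted aggregate over several ESMS files, e.g. all MS of a domain.

    Assumption: "unweighted" is read as giving every applicable element equal
    weight, i.e. the sum of #M_L divided by the sum of #L.
    """
    numerator = denominator = 0
    for applicable, available in files:
        L = applicable & GROUPS[group]
        numerator += len(L & available)
        denominator += len(L)
    return numerator / denominator if denominator else None


# Made-up ESMS files: (applicable concepts, available concepts).
FILE_A = (GROUPS["outputs"] | {"12", "13", "14", "15"}, {"3", "4", "5", "9", "12", "13"})
FILE_B = (set(GROUPS["outputs"]), set(GROUPS["outputs"]))
```

For `FILE_A`, the results are:

* outputs: four of the six applicable concepts are available (rate 4/6);
* quality: two of four concepts are available (0.5);
* processes: no concept is applicable, so the rate is undefined.

Pooling `FILE_A` and `FILE_B` for the output group gives 10/12.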

----

{{putFootnotes/}}
© Semantic R&D Group, 2026