Interoperability Basis Platform

|**Name:**|**R1. Data completeness - rate**

|Definition:|The ratio of the number of data cells (entities to be specified by the Eurostat domain manager) provided to the number of data cells required by Eurostat or relevant. The ratio is computed for a chosen dataset and a given period.

|Applicability:|(((

The rate of available data is applicable:

* to all statistical processes (including use of administrative sources);

* to users and producers, with different focus and calculation formulae.

Computed only by Eurostat but recommended also for inclusion in national quality reports.

)))

|(((

Calculation formulae:

)))|(((

**For a specific key variable:**

**For producers:**

{{formula}}R1_{PDR} = \frac{\# A_{D}^{rqd}}{\# D^{rqd}}{{/formula}}

//D^^rqd^^// in the denominator is the set of data cells required (i.e. excluding derogations/confidentiality) and //A,,D,,^^rqd^^// in the numerator is the corresponding subset of available/provided data cells. The notation #//D// means the number of elements in the set //D// (the cardinality).

**For users:**

{{formula}}R1_{U} = \frac{\# A_{D}^{rel}}{\# D^{rel}}{{/formula}}

//D^^rel^^// in the denominator is the set of relevant data cells (full coverage, i.e. excluding only those entities for which the data would not be relevant, e.g. the fishing fleet in Hungary) and //A,,D,,^^rel^^// in the numerator is the corresponding subset of available/provided data cells.

The main difference between the two formulas lies in the selection of the denominators' datasets.

Regarding the first formula, for **producers**, this set comprises the required data cells excluding derogations/confidentiality, since producers are interested in assessing the level of compliance with the requirements.

On the other hand, for **users**, the formula gives the ratio of provided data cells to those that are theoretically relevant. Cells missing due to derogations/confidentiality or for any other reason therefore remain in the denominator; only cells for which data would not be relevant (e.g. the fishing fleet in Hungary) are left out.
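As an illustration of the two formulas, the following Python sketch computes both rates from sets of data cells. The cell identifiers, the derogation set and the function name `completeness_rate` are hypothetical and chosen only for this example; they are not part of the indicator definition.

```python
def completeness_rate(available: set, reference: set) -> float:
    """Share of the reference cells that are available: #(A ∩ D) / #D."""
    if not reference:
        raise ValueError("the reference set of data cells must not be empty")
    return len(available & reference) / len(reference)


# Hypothetical data cells, identified by (country, variable, year).
relevant = {
    ("HU", "gdp", 2020),
    ("HU", "employment", 2020),
    ("HU", "exports", 2020),
    ("HU", "imports", 2020),
}
derogated = {("HU", "imports", 2020)}   # assumed to be covered by a derogation
required = relevant - derogated         # denominator set for producers
provided = {("HU", "gdp", 2020), ("HU", "employment", 2020)}

r1_producers = completeness_rate(provided, required)   # 2 of 3 required cells
r1_users = completeness_rate(provided, relevant)       # 2 of 4 relevant cells
```

In this example the producer-oriented rate is higher than the user-oriented rate because the derogated cell is excluded from the producers' denominator but not from the users'.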

)))

|Target value:|The target value for this indicator is 1 meaning that 100% of the required or relevant data cells are available.

|Aggregation levels and principles:|(((

The calculation is done, for a meaningful choice by the domain manager, at subject matter domain level. Aggregations are recommended at EU level for the user-oriented indicator.

The number of data cells provided and the number of data cells required/relevant are aggregated separately, from which a ratio is then computed.
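The aggregation principle can be sketched as follows: the counts are summed first and the ratio is taken afterwards, which is generally not the same as averaging the individual rates. The country codes and counts below are hypothetical.

```python
def aggregate_completeness(counts: dict) -> float:
    """Aggregate rate from per-country (provided, required_or_relevant) cell counts."""
    provided = sum(p for p, _ in counts.values())
    reference = sum(r for _, r in counts.values())
    if reference == 0:
        raise ValueError("no required or relevant data cells")
    return provided / reference


# Hypothetical counts: country -> (cells provided, cells required/relevant).
counts = {"AA": (90, 100), "BB": (10, 50)}

aggregate_rate = aggregate_completeness(counts)                        # 100 / 150
mean_of_rates = sum(p / r for p, r in counts.values()) / len(counts)   # (0.9 + 0.2) / 2
```

Here the aggregate rate is about 0.67, whereas the unweighted mean of the two country rates is 0.55; only the former follows the principle described above.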

)))

|(% style="width:204px" %)Interpretation: |(% style="width:1526px" %)(((

The indicator shows to what extent statistics are available compared to what should be available.

**For producers:**

It can be used to evaluate the degree of compliance by a given Member State for a given dataset and period to be specified by the domain manager.

**For users:**

At EU level, it can be used to

* identify whether important variables are missing for some individual Member State or alternatively

* give users an overall measurement (aggregate across countries and/or key variables) of the availability of statistics.

)))

|(% style="width:204px" %)Specific guidance:|(% style="width:1526px" %)(((

The indicator should be accompanied by information about which variables are missing and the reasons for incompleteness as well as, where relevant, the impact of the missing data on the EU aggregate and plans for improving completeness in the future.

Calculation would need intervention by the Eurostat domain manager at the initial stage (to define the key variables and the period to be monitored). Later on, the indicators should be calculated automatically.

Both formulas are to be computed per key variable; nevertheless, an aggregate for all variables can be calculated.

**For producers:**

This indicator forms part of Eurostat compliance monitoring, thus for producers it should be computed per Member State.

**For users:**

If certain relevant variables are not reported, the statistics are incomplete. This can be due to data not being collected or data being of low quality or confidential. For users an aggregate across countries for all the key variables could suffice.

)))

|(% style="width:204px" %)References:|(% style="width:1526px" %)(((

* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).

* ESS Standard for Quality Reports – 2009 Edition (Eurostat).

* ISO/IEC FDIS 11179-1 "Information technology – Metadata registries – Part 1: Framework", March 2004 (according to the SDMX Metadata Common Vocabulary draft, February 2008).

)))

|(% style="width:204px" %)**Name:**|(% style="width:1526px" %)**A1. Sampling error - indicators**

|(% style="width:204px" %)(((

Definition:

)))|(% style="width:1526px" %)(((

The sampling error can be expressed:

1. in relative terms, in which case the relative standard error or, synonymously, the coefficient of variation (CV) is used. (The standard error of the estimator θ̂ is the square root of its variance //V//(θ̂).) The estimated relative standard error (the estimated CV) is the estimated standard error of the estimator divided by the estimated value of the parameter; see the calculation formulae below.

1. in terms of confidence intervals, i.e. an interval that includes with a given level of confidence the true value of a parameter θ. The width of the interval is related to the standard error.

The estimator should take into account the sampling design and should further integrate the effect on precision of adjustments for non-response, corrections for misclassifications, use of auxiliary information through calibration methods etc.

)))

|(% style="width:204px" %)Applicability:|(% style="width:1526px" %)(((

Sampling error indicators are applicable:

* to statistical processes based on probability samples or other sampling procedures allowing computation of such information;
* to users and producers, with different levels of detail given.

)))

|(% style="width:204px" %)(((

Calculation formulae:

)))|(% style="width:1526px" %)(((

**Coefficient of variation:**

{{formula}}CV_e(\hat{\theta}) = \frac{\sqrt{\hat{V}(\hat{\theta})}}{\hat{\theta}}{{/formula}}

Remark: The subscript "e" stands for estimate.

**Confidence interval, symmetric:**

{{formula}}[\hat{\theta} - d;\ \hat{\theta} + d] \quad \text{or} \quad \hat{\theta} \pm d{{/formula}}

The length of the interval, which is 2∙//d//, depends on the confidence level (e.g. 95%), the assumptions concerning the distribution of the estimator of the parameter, and the sampling error. In many cases //d// has the form below, where //t// depends on the distribution and the confidence level.

{{formula}}d = t \times \sqrt{\hat{V}(\hat{\theta})}{{/formula}}

In case of totals, means and ratios, formulas for aggregation of coefficients of variation at EU level can be found in the third reference below.

The calculation formulae depend on the sampling design, the estimator, and the method chosen for estimating the variance //V//(θ̂).
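A minimal Python sketch of the two precision measures is given below. It assumes that an estimate and an estimated variance are already available from a suitable variance estimation method, and it takes //t// from the standard normal distribution, which is only one possible assumption about the distribution of the estimator. The function names and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def coefficient_of_variation(estimate: float, variance: float) -> float:
    """Estimated CV: estimated standard error divided by the estimate."""
    if estimate <= 0:
        raise ValueError("the CV is only meaningful for positive estimates")
    if variance < 0:
        raise ValueError("a variance cannot be negative")
    return math.sqrt(variance) / estimate


def symmetric_interval(estimate: float, variance: float, level: float = 0.95):
    """Interval estimate ± d with d = t * sqrt(variance).

    t is taken from the standard normal distribution (an assumption).
    """
    t = NormalDist().inv_cdf(0.5 + level / 2.0)
    d = t * math.sqrt(variance)
    return estimate - d, estimate + d


theta_hat, v_hat = 200.0, 25.0                     # hypothetical estimate and variance
cv = coefficient_of_variation(theta_hat, v_hat)    # 5 / 200
low, high = symmetric_interval(theta_hat, v_hat)   # roughly 200 ± 9.8
```

For small samples or skewed estimators a different choice of //t// (or a non-symmetric interval) may be more appropriate than the normal approximation used here.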

)))

|(% style="width:204px" %)Target value:|(% style="width:1526px" %)(((

The smaller the CV, the standard error, and the width of the confidence interval, the more accurate is the estimator. Survey regulations may include specifications for precision thresholds at different population levels.

)))

|(% style="width:204px" %)Aggregation levels and principles:|(% style="width:1526px" %)(((

The calculation is done for all statistics based on probability sample surveys or equivalent. Aggregations are possible at Member State and EU levels, depending on estimators and degree of harmonisation.

The principle for computing the coefficient of variation of an aggregate depends on the method for aggregation of the estimator belonging to that variable.

)))

|(% style="width:204px" %)(((

Interpretation:

)))|(% style="width:1526px" %)(((

The CV is a relative (dimensionless) measure of the precision of a statistical estimator, often expressed as a percentage. More specifically, it has the property of eliminating measurement units from precision measures and one of its roles is to make possible comparisons between precision of estimates of different indicators.

However, this property has no value added in case of proportions (which are by definition dimensionless indicators).

)))

|(% style="width:204px" %)(((

Specific guidance:

)))|(% style="width:1526px" %)(((

There are several precision measures which can be used to estimate the random variation of an estimator due to sampling, such as coefficients of variation, standard errors and confidence intervals.

The coefficient of variation is suitable for quantitative variables with large positive values. It is not robust for percentages or changes and cannot be used for estimates of negative values; in those cases it may be replaced by absolute measures of precision (standard errors or confidence intervals).

The confidence interval is usually the precision measure preferred by data users. It is the clearest way of understanding and interpreting the sampling variability.

Provision of confidence intervals is voluntary.

The CV has the advantage of being dimensionless. The standard error or a confidence interval is sometimes preferable, as discussed.

)))

|(% style="width:204px" %)Reference:|(% style="width:1526px" %)(((

* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).

* ESS Standard for Quality Reports – 2009 Edition (Eurostat).

* Variance estimation methods in the European Union, Monographs of official Statistics, 2002 edition.

)))

= A2. Over-coverage - rate =

|(% style="width:200px" %)(((

**Name:**

)))|(% style="width:1530px" %)(((

**A2. Over-coverage - rate**

)))

|(% style="width:200px" %)(((

Definition:

)))|(% style="width:1530px" %)(((

The rate of over-coverage is the proportion of units accessible via the frame that do not belong to the target population (are out-of-scope).

The //target population// is the population for which inferences are made. The //frame// (or frames) is a device that permits access to population units. The //frame population// is the set of population units which can be accessed through the frame. The concept of a frame is traditionally used for sample surveys, but applies equally to several other statistical processes, e.g. censuses, processes using administrative sources, and processes involving multiple data sources. Coverage deficiencies may be due to delays in reporting (typical for business statistics) and to errors in unit identification, classification, coding etc. This is the case also when administrative data are used.

The rate may be calculated either as un-weighted or as weighted to refer to the overall level (frame/population rather than sample). Units of unknown eligibility provide an inherent difficulty; see below.

)))

|(% style="width:200px" %)Applicability :|(% style="width:1530px" %)(((

The rate of over-coverage is applicable:

* to all statistical processes (including use of administrative sources);

* to producers.

If the survey has more than one unit type, a rate may be calculated for each type.

If there is more than one frame or if over-coverage rates vary strongly between sub-populations, separate rates should be calculated.

)))

|(% style="width:200px" %)Calculation formulae:|(% style="width:1530px" %)(((

The over-coverage rate has three main versions, all written in one and the same formula as the weighted over-coverage rate //OCr,,w,,//:

{{formula}}OCr_w = \frac{\sum_{O} w_j + (1-\alpha)\sum_{Q} w_j}{\sum_{O} w_j + \sum_{E} w_j + \sum_{Q} w_j}{{/formula}}

where:

* //O// is the set of out-of-scope units (over-coverage, resolved and not belonging to the target population);
* //E// is the set of in-scope units (resolved units belonging to the target population; eligible units);
* //Q// is the set of units of unknown eligibility;
* //w,,j,,// is the weight of unit //j//, described below;
* α is the estimated proportion of cases of unknown eligibility that are actually eligible. It should be set equal to 1 unless there is strong evidence at country level for assuming otherwise.

The three main cases are:

* Un-weighted rate: //w,,j,,// = 1
* Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1/π,,//j//,, (the design weight is the inverse of the selection probability).
* Size-weighted rate: //w,,j,,// = //d,,j,,x,,j,,//, where //x,,j,,// is the value of a variable //X//.

The variable X, which is chosen subjectively, shows the size or importance of the units. The value should be known for all units. X is auxiliary information, often available in the frame. Examples are turnover for businesses and population for municipalities.

For the over-coverage rate the un-weighted and the design-weighted alternatives are the ones mostly used, see Interpretation below.

The design-weighted rate is mainly used for sample surveys, but it may also apply, e.g., to price index processes or processes with multiple data sources. The weight //d,,j,,// is a “raising” factor when unit //j// represents more than itself. Otherwise //d,,j,,// is equal to one. Hence, when dealing with administrative sources the un-weighted and the size-weighted versions of the rate are normally the interesting ones.
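The weighted over-coverage rate can be sketched in Python as follows. The function takes the weights of the units in the sets O, E and Q as three lists; the function name, the argument names and the example counts are hypothetical. The example uses the un-weighted case and shows how the choice of α changes the result.

```python
def over_coverage_rate(w_out, w_in, w_unknown, alpha=1.0):
    """Weighted over-coverage rate.

    w_out, w_in, w_unknown: weights w_j of the units in O, E and Q.
    alpha: assumed share of unknown-eligibility units that are eligible.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    s_o, s_e, s_q = sum(w_out), sum(w_in), sum(w_unknown)
    total = s_o + s_e + s_q
    if total == 0:
        raise ValueError("no units supplied")
    return (s_o + (1.0 - alpha) * s_q) / total


# Un-weighted case: every unit has weight 1 (hypothetical counts).
out_of_scope = [1.0] * 5
in_scope = [1.0] * 90
unknown = [1.0] * 5

rate_default = over_coverage_rate(out_of_scope, in_scope, unknown)            # alpha = 1
rate_alpha = over_coverage_rate(out_of_scope, in_scope, unknown, alpha=0.6)   # 40% of Q treated as out-of-scope
```

For the design-weighted or size-weighted versions, the same function can be called with lists of //d,,j,,// or //d,,j,,x,,j,,// instead of ones.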

)))

|(% style="width:202px" %)Target value:|(% style="width:1528px" %)The target value of this indicator is as close to 0 as possible.

|(% style="width:202px" %)(((

Aggregation levels and principles:

)))|(% style="width:1528px" %)(((

* MS: the indicator is to be calculated for frame populations where meaningful, e.g. over industries. Then separate frame populations are treated as one frame population.

* EU: the indicator can be aggregated across countries only where statistical production processes are fully harmonised. For the statistical processes involved, the separate frame populations are treated as one frame population. Where production processes differ across countries, lower and higher over-coverage rates can be shown to indicate the range.

)))
|(% style="width:202px" %)Interpretation:|(% style="width:1528px" %)(((

//Over-coverage//: there are units accessible via the frame, which do not belong to the target population (e.g., deceased persons still listed in a Population Register or no longer operating enterprises still in the Business Register).

The interest of the indicator depends on the statistical process and the ways of identification of over-coverage. If administrative data are used also to define the target population, this indicator normally has little value added, except possibly duplicates, if they are found. It may provide an overall idea of the quality of the register/frame and the rate of change of the population.

The un-weighted over-coverage rate gives the number of units that have been found not belonging to the target in proportion to the total number of observed units. The number refers to the sample, the census or the register population studied.

The design-weighted over-coverage rate is an estimate for the frame population in comparison with the target population, based on the information at hand, usually a sample.

The size-weighted over-coverage rate expresses the rate in terms of a chosen size variable, e.g. turnover in business statistics. (This case is less interesting for over-coverage than for non-response.)

)))

|(% style="width:202px" %)(((

Specific guidance:

)))|(% style="width:1528px" %)-

|(% style="width:202px" %)References:|(% style="width:1528px" %)(((
* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
)))

|(% style="width:202px" %)(((

**Name:**

)))|(% style="width:1528px" %)**A3. Common units - proportion**

|(% style="width:202px" %)(((

Definition:

)))|(% style="width:1528px" %)The proportion of units covered by both the survey and the administrative sources in relation to the total number of units in the survey.

|(% style="width:202px" %)Applicability:|(% style="width:1528px" %)(((

The proportion is applicable

* to mixed statistical processes where some variables or data for some units come from survey data and others from administrative source(s);

* to producers.

)))

|(% style="width:202px" %)Calculation formulae:|(% style="width:1528px" %)(((

{{formula}}Ad = \frac{\text{No. of common units across survey data and admin. sources}}{\text{No. of unique units in survey data}}{{/formula}}
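A short Python sketch of this proportion is shown below. It assumes that units in both sources carry a common identifier and that linking errors have already been resolved; the identifiers and the function name are hypothetical.

```python
def common_units_proportion(survey_ids, admin_ids) -> float:
    """Share of unique survey units that are also found in the administrative source(s)."""
    survey_units = set(survey_ids)        # each survey unit is counted once
    if not survey_units:
        raise ValueError("the survey data contain no units")
    common_units = survey_units & set(admin_ids)
    return len(common_units) / len(survey_units)


# Hypothetical unit identifiers; "B" appears twice in the survey file.
survey_ids = ["A", "B", "B", "C", "D"]
admin_ids = ["B", "C", "E"]

proportion = common_units_proportion(survey_ids, admin_ids)   # 2 common units out of 4 unique survey units
```

Using sets ensures that a unit appearing more than once is counted only once in both the numerator and the denominator.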

)))

|(% style="width:202px" %)Target value:|(% style="width:1528px" %)-

|(% style="width:202px" %)Aggregation levels and principles:|(% style="width:1528px" %)-

|(% style="width:202px" %)(((

Interpretation:

)))|(% style="width:1528px" %)(((

The indicator is used when administrative data is combined with survey data in such a way that data on unit level are obtained from both the survey and one or more administrative sources (some variables come from the survey and other variables from the administrative data) or when data for part of the units come from survey data and for another part of the units from one or more administrative sources.

The indicator provides an idea of completeness/coverage of the sources – to what extent units exist in both administrative data and survey data. This indicator does not apply if administrative data is used only to produce estimates without being combined with survey data.

)))

|(% style="width:202px" %)Specific guidance:|(% style="width:1528px" %)(((

Common units are those units that are included both in the data stemming from an administrative source and in the survey data.

For the purpose of this indicator, the “unique units in survey data” in the denominator means that if a unit exists in more than one source it should only be counted once.

If the survey is conducted only for some of the units in the administrative source (e.g. only for larger enterprises), this indicator should be calculated only for the relevant subset.

Linking errors should be detected and resolved before this indicator is calculated.

If there are few common units due to the design of the statistical output (e.g. a combination of survey and administrative data), this should be explained.

)))

|(% style="width:202px" %)(((

References:

)))|(% style="width:1528px" %)ESSNet use of administrative and accounts data in business statistics, WP6 Quality Indicators when using Administrative Data in Statistical Operations, November 2010.

|(((

**Name:**

)))|(((

**A4. Unit non-response - rate**

)))

|Definition:|The ratio of the number of units with no information or not usable information (non-response, etc.) to the total number of in-scope (eligible) units. The ratio can be weighted or un-weighted.

|Applicability:|(((

The unit non-response rate is applicable:

* to all statistical processes (including direct data collection and administrative data). The terminology varies between statistical processes, but the basic principle is the same. It may in some cases be difficult to distinguish between unit non-response and under-coverage, especially for administrative data sources: in the former case units are known to exist but data are missing (e.g. due to very late reporting, or quality so low that the information is useless); in the latter case the units are not known when the frame is constructed;

* to users and producers, with different level of details given.

)))

|Calculation formulae:|(((

The non-response rate has three main versions written in one and the same formula as the weighted unit non-response rate //NRr,,w,,//

{{formula}}NRr_w = 1 - \frac{\sum_{R} w_j}{\sum_{R} w_j + \sum_{NR} w_j + \alpha \sum_{Q} w_j}{{/formula}}

where:

* //R// is the set of responding eligible units;
* //NR// is the set of non-responding eligible units;
* //Q// is the set of selected units with unknown eligibility (un-resolved selected units);
* //w,,j,,// is the weight of unit //j//, described below;
* α is the estimated proportion of cases of unknown eligibility that are actually eligible. It should be set equal to 1 unless there is strong evidence at country level for assuming otherwise.

The three main cases are:

* Un-weighted rate: //w,,j,,// = 1
* Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1/π,,//j//,, (the design weight is the inverse of the selection probability).
* Size-weighted rate: //w,,j,,// = //d,,j,,x,,j,,//, where //x,,j,,// is the value of a variable //X//.

The variable X, which is chosen subjectively, shows the size or importance of the units. The value should be known for all units. X is auxiliary information, often available in the frame. Examples are turnover for businesses and population for municipalities.

For the unit non-response rate all three alternatives are frequently used, see Interpretation below.
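The three versions of the unit non-response rate can be compared with a small Python sketch. The `Unit` record, the scheme names and the sample values are hypothetical; each unit carries its response status, its selection probability and the value of the size variable //X//.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    status: str   # "R" (respondent), "NR" (non-respondent) or "Q" (unknown eligibility)
    pi: float     # selection probability of the unit
    x: float      # size variable X, e.g. turnover


def weight(unit: Unit, scheme: str) -> float:
    """Weight w_j under the un-weighted, design-weighted or size-weighted scheme."""
    if scheme == "unweighted":
        return 1.0
    d = 1.0 / unit.pi                     # design weight: inverse selection probability
    if scheme == "design":
        return d
    if scheme == "size":
        return d * unit.x
    raise ValueError(f"unknown weighting scheme: {scheme}")


def unit_non_response_rate(units, scheme="unweighted", alpha=1.0) -> float:
    """1 - sum_R w / (sum_R w + sum_NR w + alpha * sum_Q w)."""
    sums = {"R": 0.0, "NR": 0.0, "Q": 0.0}
    for unit in units:
        sums[unit.status] += weight(unit, scheme)
    denominator = sums["R"] + sums["NR"] + alpha * sums["Q"]
    if denominator == 0:
        raise ValueError("no eligible units")
    return 1.0 - sums["R"] / denominator


# Hypothetical sample: two large respondents, one small non-respondent,
# one small unit of unknown eligibility.
sample = [
    Unit("R", 0.5, 100.0),
    Unit("R", 0.5, 300.0),
    Unit("NR", 0.25, 50.0),
    Unit("Q", 0.25, 50.0),
]
rates = {s: unit_non_response_rate(sample, s) for s in ("unweighted", "design", "size")}
```

In this example the size-weighted rate is the lowest of the three because the non-responding and unresolved units are small in terms of //X//.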

The design-weighted rate is mainly used for sample surveys, but it may also apply, e.g., to price index processes or processes with multiple data sources. The weight //d,,j,,// is a “raising” factor when unit //j// represents more than itself. Otherwise //d,,j,,// is equal to one. Hence, when dealing with administrative sources the un-weighted and the size-weighted versions of the rate are normally the interesting ones.

)))

|Target value:|(((

The target value for this indicator is as close to 0 as possible.

)))

|Aggregation levels and principles: |(((

* MS: the indicator is to be calculated at statistical process level

* EU: rather than aggregating this indicator over countries or calculating a mean, lower and higher unit non-response rates can be shown by Eurostat for a given variable at statistical process level.

)))

|(((

Interpretation:

)))|(((

Unit non-response occurs when no data about an eligible unit are recorded (or data are so few or so low in quality that they are deleted).

The un-weighted unit non-response rate shows the result of the data collection in the sample (the units included), rather than an indirect measure of the potential bias associated with non-response. If α=1, it assumes that all the units with unknown eligibility are eligible, so it provides a conservative estimate of A4 with regard to other choices of α .

The design-weighted unit non-response rate shows how well the data collection worked considering the population of interest.

The size-weighted unit non-response rate would represent an indirect indicator of potential bias caused by non-response prior to any calibration adjustments.

Note overall that the bias may be low even if the non-response rate is high, depending on the pattern of the non-responses and the possibilities to adjust successfully for non-response.

)))

|Specific guidance:|(((

Non-response is a source of errors in survey statistics mainly for two reasons:

* it reduces the number of responses and therefore the precision of the estimates (this may be particularly relevant when samples are used);
* it might introduce bias. The size of the bias depends on the non-response rate but also on the differences between the respondents and the non-respondents with respect to the variable of interest, and furthermore on the strength of the auxiliary information.

)))

|(((

References:

)))|(((

* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).

* ESS Standard for Quality Reports – 2009 Edition (Eurostat).

* U.S. Census Bureau Statistical Quality Standards, Reissued 2010.

* Trépanier, Julien, and Kovar. “Reporting Response Rates when Survey and Administrative Data are Combined.” //Proceedings of the Federal Committee on Statistical Methodology Research Conference 2005.//

)))

|(((

**Name:**

)))|(((

**A5. Item non-response - rate**

)))

|(((

Definition:

)))|The item non-response rate for a given variable is defined as the (weighted) ratio between in-scope units that have not responded and in-scope units that are required to respond to the particular item.

|Applicability :|(((

The item non-response rate is applicable:

* to all statistical processes (including direct data collection and administrative data); the terminology varies between statistical processes, but the basic principle is the same;

* to users and producers, for selected key variables or for variables with very high item non-response rates, and with different level of details given.

If the survey has more than one unit type or data sources, a rate may be calculated for each type or data source.

If there is more than one frame, or if rates vary strongly between subpopulations, rates should (also) be calculated for separate sub-populations (or strata, groups).

)))

|(((

Calculation formulae:

)))|(((

The item non-response rate has three main versions written in one and the same formula as the weighted item non-response rate //NR,,Y,,r,,w,,//, which is calculated as follows:

{{formula}}NR_Y r_w = 1 - \frac{\sum_{R_Y} w_j}{\sum_{R_Y} w_j + \sum_{NR_Y} w_j}{{/formula}}

//R,,Y,,// the set of eligible units responding to item //Y// (as required)

//NR,,Y,,// the set of eligible units not responding to item //Y// although this item is required. – The denominator corresponds to the set of units for which item //Y// is required. (Other units do not get this item because their answers to earlier items gave them a skip past this item; they were “filtered away”.)

//w,,j,,// weight of unit //j//, described below

The three main cases are:

* Un-weighted rate: //w,,j,,// = 1
* Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1/π,,//j//,, (the design weight is the inverse of the selection probability).
* Size-weighted rate: //w,,j,,// = //d,,j,,x,,j,,//, where //x,,j,,// is the value of a variable //X//.

The variable X, which is chosen subjectively, shows the size or importance of the units. The value should be known for all units. X is auxiliary information, often available in the frame. Examples are turnover for businesses and population for municipalities.

The design weight may be modified in the computation of final estimates to correct for non-response, under-coverage etc. This modified weight should be used if the rates are to apply to final estimates.
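The role of the filter in the denominator can be illustrated with a short Python sketch. Units that were routed past item //Y// are excluded, missing answers from units required to respond count as non-response, and the weights default to 1 (the un-weighted rate). The marker `NOT_REQUIRED`, the function name and the example data are hypothetical.

```python
NOT_REQUIRED = object()   # hypothetical marker: the unit was filtered past item Y


def item_non_response_rate(answers, weights=None) -> float:
    """1 - sum_{R_Y} w / (sum_{R_Y} w + sum_{NR_Y} w) for one item Y.

    answers: one entry per eligible unit; a value, None (required but
    missing) or NOT_REQUIRED (item not applicable to the unit).
    """
    if weights is None:
        weights = [1.0] * len(answers)        # un-weighted rate
    if len(weights) != len(answers):
        raise ValueError("answers and weights must have the same length")
    responded = missing = 0.0
    for answer, w in zip(answers, weights):
        if answer is NOT_REQUIRED:
            continue                          # not part of the denominator
        if answer is None:
            missing += w
        else:
            responded += w
    if responded + missing == 0:
        raise ValueError("item Y is not required for any unit")
    return 1.0 - responded / (responded + missing)


# Hypothetical answers to item Y for five eligible units.
answers = [1200, None, NOT_REQUIRED, 800, None]
rate_unweighted = item_non_response_rate(answers)                              # 2 of 4 required answers missing
rate_weighted = item_non_response_rate(answers, [10.0, 10.0, 10.0, 40.0, 20.0])  # 30 of 80 in weighted terms
```

The third unit does not affect either rate because item //Y// was not required for it.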

The design-weighted rate is mainly used for sample surveys, but it may also apply, e.g., to price index processes or processes with multiple data sources.

The weight //d,,j,,// is a “raising” factor when unit //j// represents more than itself. Otherwise //d,,j,,// is equal to one. Hence, when dealing with administrative sources the un-weighted and the size-weighted versions of the rate are normally the interesting ones.

)))

|(((

Target value:

)))|(((

The target value for this indicator is as close to 0 as possible.

)))

|(((

Aggregation levels and principles:

)))|(((

* MS: the indicator is to be calculated at statistical process level for key variables and variables with low response rates.

* EU: rather than aggregating this indicator over countries or calculating a mean, lower and higher item non-response rates can be shown by Eurostat for a given variable at statistical process level.

)))

|(((

Interpretation:

)))|(((

A high item non-response rate indicates difficulties in providing information, e.g. a sensitive question or unclear wording for social statistics or information not available in the accounting system for business statistics.

The indicator is a proxy indicator of the possible bias caused by item non-response. Even when the item response rate is low, the bias may still be low, depending on the causes, the response pattern, and the auxiliary information available to adjust/impute.

)))

|(((

Specific guidance

)))|The un-weighted item non-response rate should be calculated before data editing and imputation in order to measure the impact of item non-response for the key variables.

|References|(((

* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).

* ESS Standard for Quality Reports – 2009 Edition (Eurostat).

* U.S. Census Bureau Statistical Quality Standards, Reissued 2010.

* Trépanier, Julien, and Kovar. “Reporting Response Rates when Survey and Administrative Data are Combined.” //Proceedings of the Federal Committee on Statistical Methodology Research Conference 2005.//

)))

|(((

**Name:**

)))|(% colspan="4" %)(((

**A6. Data revision - average size**

)))

|(((

Definition:

)))|(% colspan="4" %)(((

The average over a time period of the revisions of a key indicator. The “revision” is defined as the difference between a later and an earlier estimate of the key item.

The number of releases (//K//) of a key item (number of times it is published) is fixed and specified in the revision policy. Usually, revisions involve a time series: when publishing an estimate of the key indicator referring to time //t//, it is a common practice to release the revised version of the indicator referring to a set of previous periods.

In the following table this situation is illustrated for a revision analysis where the policy has //K// releases and //n// reference periods are included in the analysis. The columns are the reference periods 1, …, //t//, …, //n// and the rows are the releases.

)))
| |**Release**|**1**|**…**|**//t//**|**…**|**//n//**
| |1^^st^^ release|//X//,,11,,|…|//X//,,1//t//,,|…|//X//,,1//n//,,
| |…|…|…|…|…|…
| |//k//^^th^^ release|//X//,,//k//1,,|…|//X//,,//kt//,,|…|//X//,,//kn//,,
| |…|…|…|…|…|…
| |//K//^^th^^ and final release|//X//,,//K//1,,|…|//X//,,//Kt//,,|…|//X//,,//Kn//,,
| |(% colspan="4" %)(((

Different indicators can be derived by different ways of averaging the revisions for a time series (revisions can be averaged in absolute value or not, the indicator can be absolute or relative).

)))

|Applicability:|(% colspan="4" %)(((

The average size of revisions is applicable:

* to statistical processes where initial and subsequent (revised) estimates are published according to a revision policy (quarterly national accounts, short term statistics);

* to users and producers, with different level of details given.

)))

|(((

Calculation formulae:

)))|(% colspan="4" %)(((

With reference to the two-dimensional situation described in the definition, there are several strategies to compute indicators: with or without sign, absolute or relative values, for specific pairs of revisions over time or over a sequence of revisions etc. The main suggestion here is to consider an average for a given revision step over a set of //n// reference periods.

**MAR (Mean Absolute Revision):**

{{formula}}MAR = \frac{1}{n}\sum_{t=1}^{n} \left| X_{Lt} - X_{Pt} \right|{{/formula}}

where:

* //X,,Lt,,// is the “later” estimate, the //L//^^th^^ release of the item at reference time //t//;
* //X,,Pt,,// is the “earlier” estimate, the //P//^^th^^ release of the item at reference time //t//;

)))

| |(% colspan="3" %)(((

//n// is the number of estimates (reference periods) in the time series taken into account. //n// ≥ 20 is recommended for quarterly estimates and //n// ≥ 30 for monthly estimates. The indicator is not recommended for annual estimates.

MAR provides an idea of the average size of a given revision step.

This indicator can alternatively be expressed in relative terms:

**RMAR (Relative Mean Absolute Revision):**

//RMAR// = ∑,,//t//=1,,^^//n//^^ ( |//X,,Lt,,// − //X,,Pt,,//| / |//X,,Lt,,//| ) · ( |//X,,Lt,,//| / ∑,,//t//=1,,^^//n//^^ |//X,,Lt,,//| ) = ∑,,//t//=1,,^^//n//^^ |//X,,Lt,,// − //X,,Pt,,//| / ∑,,//t//=1,,^^//n//^^ |//X,,Lt,,//|

In addition, at the level of Eurostat and where the sign of the revisions is of interest, the mean revision from release //P// to release //L// over the //n// reference periods can be used:

**MR (Mean Revision):**

//MR// = (1 / //n//) ∑,,//t//=1,,^^//n//^^ (//X,,Lt,,// − //X,,Pt,,//)
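As an illustration, the three indicators can be computed from two releases of the same series. The following is a minimal Python sketch and not part of the guidelines: the function names (`mar`, `rmar`, `mr`) and the sample figures are hypothetical, and RMAR is assumed to be normalised by the sum of the absolute later estimates.

```python
from typing import List, Sequence


def revisions(later: Sequence[float], earlier: Sequence[float]) -> List[float]:
    """Signed revisions X_Lt - X_Pt, one per reference period t."""
    if len(later) != len(earlier) or len(later) == 0:
        raise ValueError("both releases must cover the same n >= 1 periods")
    return [x_l - x_p for x_l, x_p in zip(later, earlier)]


def mar(later: Sequence[float], earlier: Sequence[float]) -> float:
    """Mean Absolute Revision: average size of the revisions."""
    rev = revisions(later, earlier)
    return sum(abs(r) for r in rev) / len(rev)


def rmar(later: Sequence[float], earlier: Sequence[float]) -> float:
    """Relative MAR: total absolute revision over total absolute later estimate."""
    rev = revisions(later, earlier)
    return sum(abs(r) for r in rev) / sum(abs(x) for x in later)


def mr(later: Sequence[float], earlier: Sequence[float]) -> float:
    """Mean Revision: average revision, keeping the sign."""
    rev = revisions(later, earlier)
    return sum(rev) / len(rev)


# Hypothetical figures for release P (earlier) and release L (later).
# Only n = 4 periods are used here for readability; a real analysis
# would follow the recommended minimum number of periods.
release_p = [1.0, 2.0, 4.0, 3.0]
release_l = [1.5, 1.5, 4.0, 4.0]

print(mar(release_l, release_p))
print(rmar(release_l, release_p))
print(mr(release_l, release_p))
```

Keeping the signed revisions in one helper makes it explicit that MAR and MR differ only in whether the absolute value is taken before averaging.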

Different combinations of //P// and //L// can be considered. For instance, the OECD suggests comparing the following releases, for **monthly data** (first pair of columns) and **quarterly data** (second pair of columns):

)))

| |**Release //L// (monthly data)**|**Release //P// (monthly data)**|**Release //L// (quarterly data)**|**Release //P// (quarterly data)**
| |After 2 months|First|After 5 months|First
| |After 3 months|First|After 1 year|After 5 months
| |After 3 months|After 2 months|After 1 year|First
| |After 1 year|First|After 2 years|First
| |After 2 years|First|Latest available|First
| |Latest available|First|After 2 years|After 1 year
| |After 2 years|After 1 year| |

|Target value:|-| |

|Aggregation levels and principles: |(% colspan="3" %)(((

* MS: the indicator is to be calculated at statistical process level.
* EU: the indicator is calculated on the revisions made to the EU aggregate/indicator.

)))

|(((

Interpretation:

)))|(% colspan="3" %)(((

**MAR** provides an idea of the average size of a given revision step for a key item over time.

The **RMAR** indicator normalises the MAR measure using the final estimates. It facilitates international comparisons and comparisons over time periods. When estimating growth rates, this measure corrects the MAR for the size of growth and so takes account of the fact that revisions might be expected to be larger in periods of high growth than in periods of slow growth.

Both MAR and RMAR provide information on the stability of the estimates. They do not provide information on the direction of revisions, since the absolute values of the revisions are considered. Such information is provided by **MR**. A positive sign means an upward revision (the earlier estimate was an underestimate), and a negative sign indicates that the earlier estimate was an overestimate. MR is sometimes referred to as 'average bias', but a nonzero MR is not sufficient to establish whether revisions are systematically biased in a given direction. To ascertain the presence of bias, it has to be assessed whether MR is statistically different from zero (given no changes in definitions, methodologies, etc.).
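One simple way to assess whether MR differs from zero is a one-sample t statistic on the signed revisions. The sketch below is an assumption and not a method prescribed here: the function name `mr_t_statistic` is hypothetical, and the plain t statistic treats the revisions as independent. Revisions of a time series are often serially correlated, in which case a robust standard error may be preferable.

```python
import math
import statistics
from typing import Sequence


def mr_t_statistic(later: Sequence[float], earlier: Sequence[float]) -> float:
    """t statistic of the mean revision: MR / (s / sqrt(n)).

    s is the sample standard deviation (n - 1 in the denominator) of the
    signed revisions X_Lt - X_Pt.
    """
    rev = [x_l - x_p for x_l, x_p in zip(later, earlier)]
    n = len(rev)
    if n < 2:
        raise ValueError("at least two reference periods are needed")
    s = statistics.stdev(rev)
    if s == 0:
        raise ValueError("all revisions are identical; the statistic is undefined")
    return statistics.fmean(rev) / (s / math.sqrt(n))


# Hypothetical figures, deliberately short for readability.
earlier_release = [1.0, 2.0, 4.0, 3.0]
later_release = [1.5, 1.5, 4.0, 4.0]
t_value = mr_t_statistic(later_release, earlier_release)
print(round(t_value, 4))
```

The resulting value would be compared with a Student t distribution with //n// − 1 degrees of freedom; with so few periods the example only shows the mechanics.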

)))

|Specific guidance:|(% colspan="3" %)Either MAR or RMAR should be presented under this indicator. In addition MR could also be calculated at EU-level.

|(((

References:

)))|(% colspan="3" %)§ OECD: [[http:~~/~~/stats.oecd.org/mei/default.asp?rev=1>>url:http://stats.oecd.org/mei/default.asp?rev=1]]

|(((

**Name:**

)))|(((

**A7. Imputation - rate**

)))

|(((

Definition:

)))|(((

Imputation is the process used to assign replacement values for missing, invalid or inconsistent data that have failed edits. This includes automatic and manual imputations; it excludes follow-up with respondents and the corresponding corrections (if applicable). Thus, imputation as defined above occurs after data collection, no matter from which source or mix of sources the data have been obtained, including administrative data. After imputation, the data file should normally only contain plausible and internally consistent data records.

This indicator is influenced both by the item non-response and the editing process. It measures both the relative amount of imputed values and the relative influence on the final estimates from the imputation procedures.

The un-weighted imputation rate for a variable is the ratio of the number of imputed values to the total number of values requested for the variable.

The weighted rate shows the relative contribution to a statistic from imputed values; typically a total for a quantitative variable. For a qualitative variable, the relative contribution is based on the number of units with an imputed value for the qualitative item.

)))

|Applicability :|(((

The imputation rate is applicable:

* to all statistical processes (with micro data; hence, e.g., direct data collection and administrative data);
* to producers.

)))

|(((

Calculation formulae:

)))|(((

(1) Un-weighted, at the level of the statistical process and variable:

[[image:1768512459010-647.jpeg]]

//n,,AV,,// and //n,,OV,,// are the numbers of assigned values and observed values, respectively.

(2) The contribution of imputed values is calculated in an analogous way, but weighted and using the variable values:

∑,,//j//∈//AV//,, //w,,j,,// //y,,j,,// / ( ∑,,//j//∈//AV//,, //w,,j,,// //y,,j,,// + ∑,,//j//∈//OV//,, //w,,j,,// //y,,j,,// )

Here, //AV// and //OV// are the sets of units with assigned and observed values, respectively, and //y,,j,,// is the value of the variable for unit //j//. In addition, //w,,j,,// is the weight of unit //j// (normally the weight used for estimation, which takes into account the sample design as well as the adjustment for unit non-response and the final calibration).

In case of a qualitative variable, //y,,j,,// = 1 if the //j//th unit shows a given characteristic and 0 otherwise.

When imputation is counted, the following changes have to be considered:

)))

| |(((

i. imputation of a (non-blank) value for a missing item;

ii. imputation of a (non-blank) value to correct an observed invalid (non-blank) value;

iii. imputation of a blank value to correct an undue invalid (non-blank) response.

The two main cases for the imputation rate are:

* Design-weighted rate: //w,,j,,// = //d,,j,,//, where basically //d,,j,,// = 1 / π,,//j//,,, meaning that the design weight is the inverse of the selection probability.
* Size-weighted rate: //w,,j,,// = //d,,j,,// //x,,j,,//, where //x,,j,,// is the value of a variable //X//.
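The two rates can be illustrated with a short Python sketch. It is an assumption-laden example rather than a reference implementation: the `Unit` record and function names are hypothetical, the un-weighted rate is assumed to be //n,,AV,,// / (//n,,AV,,// + //n,,OV,,//), and the weighted rate is assumed to be the weighted total of imputed values over the weighted total of all values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    value: float    # y_j (1 or 0 for a qualitative variable)
    weight: float   # w_j, e.g. a design weight d_j or d_j * x_j
    imputed: bool   # True: assigned value (set AV); False: observed (set OV)


def unweighted_imputation_rate(units: List[Unit]) -> float:
    """Share of values that were assigned: n_AV / (n_AV + n_OV)."""
    if not units:
        raise ValueError("no units supplied")
    n_av = sum(1 for u in units if u.imputed)
    return n_av / len(units)


def weighted_imputation_rate(units: List[Unit]) -> float:
    """Contribution of imputed values to the weighted total of the variable."""
    total = sum(u.weight * u.value for u in units)
    if total == 0:
        raise ValueError("weighted total is zero; the rate is undefined")
    imputed_total = sum(u.weight * u.value for u in units if u.imputed)
    return imputed_total / total


# Hypothetical micro data: two observed and two imputed values.
sample = [
    Unit(value=10.0, weight=2.0, imputed=False),
    Unit(value=20.0, weight=1.0, imputed=True),
    Unit(value=30.0, weight=1.0, imputed=False),
    Unit(value=40.0, weight=0.5, imputed=True),
]

print(unweighted_imputation_rate(sample))
print(weighted_imputation_rate(sample))
```

Passing a design weight or a size weight through the same `weight` field keeps one function for both weighted cases.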

)))

|Target value:|A value equal or close to zero is desirable; imputation indicates missing and invalid values.

|Aggregation levels and principles:|(((

* MS: The calculation is done for key variables at statistical process level.
* EU: Aggregations can be made at EU level on the basis of statistical production processes harmonised across Member States, considering these as a single statistical process. Alternatively, Eurostat can report the lowest and highest imputation rates for a given variable at statistical process level.

)))

|(((

Interpretation:

)))|(((

The un-weighted rate shows, for a particular variable, the proportion of units for which a value has been imputed due to the original value being a missing, implausible, or inconsistent value in comparison with the number of units with a value for this variable. Units with imputation of a blank value to correct an undue invalid (non-blank) response (type iii) have to be included in both numerator and denominator.

The weighted rate shows, for a particular variable, the relative contribution of imputed values to the estimate of this item/variable. Obviously this weighted indicator is meaningful when the objective of a survey is that of estimating the total amount or the average of a variable. When the objective of the estimation is that of estimating complex indices, the weighted indicator is not meaningful.

)))

|Specific guidance:|-

|References:|(((

* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* Statistics Canada Quality Guidelines, Fifth Edition – October 2009.

)))

|(((

**Name:**

)))|(((

**TP1. Time lag - first results**

)))

|(((

Definition:

)))|(((

//General definition~://

The timeliness of statistical outputs is the length of time between the end of the event or phenomenon they describe and their availability.

//Specific definition~://

The number of days (or weeks or months) from the last day of the reference period to the day of publication of first results.

)))

|Applicability :|(((

This indicator is applicable:

* to all statistical processes with **preliminary data releases**;
* to producers.

T1 is **not** applicable to statistical processes with only one, directly final, set of results/statistics; in that case only T2 is used.

)))

|(((

Calculation formulae:

)))|(((

//T//1 = //d,,frst,,// − //d,,refp,,//

//d,,frst,,// … release date of first results;

//d,,refp,,// … last day (date) of the reference period of the statistics.

//Measurement units//: calendar days (if the number of days is large, it may be converted into weeks or months).

Instead of a period, the reference can also be a time point.
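The calculation is a plain date difference. A minimal Python sketch follows, in which the function name `time_lag_days` and the dates are hypothetical; the same function would apply unchanged to the time lag of final results by passing the release date of the final results.

```python
from datetime import date


def time_lag_days(release_date: date, reference_end: date) -> int:
    """Calendar days from the last day of the reference period to the release."""
    return (release_date - reference_end).days


# Hypothetical example: first results for the first quarter of 2024
# (reference period ending 31 March) released on 15 May 2024.
t1 = time_lag_days(date(2024, 5, 15), date(2024, 3, 31))
print(t1)  # 45
```

A negative result would indicate a release dated before the end of the reference period, which normally points to an input error.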

)))

|Target value:|The target values usually are fixed by legislation or gentlemen's agreement. Nevertheless, smaller values denote higher timeliness.

|Aggregation levels and principles: |(((

The calculation is done, for a meaningful choice, at subject matter domain level. It could refer to the current production round or be an average over a time period. Aggregations are possible at EU and domain (e.g. social statistics, business statistics) level.

)))

|(((

Interpretation:

)))|(((

This indicator quantifies the gap between the release date of first results and the date of reference for the data.

Comparisons could be made among statistical processes with the same periodicity.

)))

|(((

Specific guidance

)))|(((

The reasons for possible long production times should be explained and efforts to improve the situation should be described.

For annual statistics or where timeliness is measured in years rather than in days a sentence stating timeliness would be sufficient.

)))

|References:|§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat). § ESS Standard for Quality Reports – 2009 Edition (Eurostat).

|(((

**Name:**

)))|(((

**TP2. Time lag - final results**

)))

|(((

Definition:

)))|(((

//General definition~://

The timeliness of statistical outputs is the length of time between the end of the event or phenomenon they describe and their availability.

//Specific definition~://

The number of days (or weeks or months) from the last day of the reference period to the day of publication of complete and final results.

)))

|Applicability :|(((

This indicator is applicable:

* to all statistical processes;
* to users and producers, with different levels of detail given.

)))

|(((

Calculation formulae:

)))|(((

//T//2 = //d,,finl,,// − //d,,refp,,//

//d,,finl,,// … release date of final results;

//d,,refp,,// … last day (date) of the reference period of the statistics.

//Measurement units//: calendar days (if the number of days is large, it may be converted into weeks or months).

Instead of a period, the reference can also be a time point.

)))

|Target value:|The target values usually are fixed by legislation or gentlemen's agreement. Nevertheless, smaller values denote higher timeliness.

|Aggregation levels and principles: |The calculation is done, for a meaningful choice, at subject matter domain level. It could refer to the current production round or be an average over a time period. Aggregations are possible at EU and domain (e.g. social statistics, business statistics) level.

|(((

Interpretation:

)))|(((

This indicator quantifies the gap between the release date of the final results and the end of the reference period.

Comparisons could be made among statistical processes with the same periodicity.

)))

|Specific guidance|(((

The reasons for possible long production times should be explained and efforts to improve the situation should be described.

What is to be considered "final results" should be further defined by each subject-matter domain, taking the revision policy into account.

For annual statistics, or where timeliness is measured in years rather than in days, a sentence stating the timeliness would be sufficient.

)))

|References:|§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat). § ESS Standard for Quality Reports – 2009 Edition (Eurostat).

|(((

**Name:**

)))|(((

**TP3. Punctuality - delivery and publication**

)))

|(((

Definition:

)))|Punctuality is the time lag between the delivery/release date of data and the target date for delivery/release as agreed for delivery or announced in an official release calendar, laid down by Regulations or previously agreed among partners.

|Applicability :|(((

The punctuality of publication is applicable:

* to all statistical processes with fixed/pre-announced release dates;
* to users and producers, with different aspects and calculation formulae.

Computed only by Eurostat but recommended also for inclusion in national quality reports.

)))

|(((

Calculation formulae:

)))|(((

**For producers:**

**Punctuality of data delivery P3**

//P//3 = //d,,act,,// − //d,,sch,,//

//d,,act,,// … actual date of the effective provision of the statistics;

//d,,sch,,// … scheduled date of the effective provision of the statistics.

//Measurement units//: calendar days.

**For users:**

**Rate of punctuality of data publication P3,,R,,** (relevant for a group of statistics/results)

P3,,R,, is the rate of datasets that have met the release calendar date in a group of datasets.

//P//3,,//R//,, = //m,,pc,,// / (//m,,pc,,// + //m,,up,,//)

//m,,pc,,// … number of statistics/results that have been published on the date announced in the calendar or have been released earlier (punctual);

//m,,up,,// … number of statistics/results that have not met the date announced in the calendar (unpunctual).
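Both measures can be sketched in a few lines of Python. The example is illustrative only: the function names and the release calendar are hypothetical, and the rate is assumed to be the number of punctual releases divided by the number of all releases in the group.

```python
from datetime import date
from typing import List, Tuple

Release = Tuple[date, date]  # (actual date, scheduled date)


def delivery_punctuality(actual: date, scheduled: date) -> int:
    """P3 in calendar days; positive values are delays, negative are early."""
    return (actual - scheduled).days


def punctuality_rate(releases: List[Release]) -> float:
    """P3_R: share of releases published on or before the scheduled date."""
    if not releases:
        raise ValueError("no releases supplied")
    m_pc = sum(1 for actual, scheduled in releases if actual <= scheduled)
    m_up = len(releases) - m_pc
    return m_pc / (m_pc + m_up)


# Hypothetical release calendar for a group of three datasets.
calendar = [
    (date(2024, 3, 1), date(2024, 3, 1)),    # on the announced date
    (date(2024, 4, 28), date(2024, 4, 30)),  # two days early
    (date(2024, 6, 3), date(2024, 5, 31)),   # three days late
]

print([delivery_punctuality(a, s) for a, s in calendar])
print(punctuality_rate(calendar))
```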

)))

|Target value:|(((

The target value for P3 is 0, meaning that there is no delay in the delivery/transmission of data.

For P3,,R,, the target value is 1, meaning that 100% of the items were published on the pre-fixed calendar date.

)))

|Aggregation levels and principles: |(((

There are two aspects:

* national data deliveries to Eurostat (producer-oriented);
* publication/release by Eurostat (user-oriented).

The calculation is done at statistical process level. Aggregations are to be made at EU level over countries and over domains.

)))

|(((

Interpretation:

)))|(((

The indicator **Punctuality of data delivery** quantifies the difference (time lag) between actual and target date.

This should be interpreted according to the periodicity of the statistical process.

)))

| |(((

The indicator **Rate of punctuality** of release (P3,,R,,) evaluates the punctuality of release of a particular group of datasets.

)))

|(((

Specific guidance

)))|(((

**For producers:**

For compliance monitoring purposes Eurostat domain managers should monitor this indicator for individual countries. This information can be pre-filled by Eurostat as it is known when data are received from the MS. Formula P3 should be applied in this case.

This indicator can be presented in table format for the different MS.

The reasons for late or non-punctual delivery should be stated along with their effect on the statistical product, meaning that because of late data deliveries the quality assurance procedures for the whole product/series might not be completed.

**For users:**

It is enough to compile this indicator as an aggregate at Eurostat (ESTAT) level. Formula P3,,R,, should be applied in this case.

Some explanations should be given to users concerning non-punctual publication.

)))

|References:|§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat). § ESS Standard for Quality Reports – 2009 Edition (Eurostat).

|(((

**Name:**

)))|(((

**CC1. Asymmetry for mirror flows statistics - coefficient**

)))

|(((

Definition:

)))|(((

//General definition~://

Discrepancies between data related to flows, e.g. for pairs of countries.

//Specific definition (a few versions are provided) Bilateral mirror statistics~://

The difference or the absolute difference of inbound and outbound flows between a pair of countries divided by the average of these two values.

//Comment//

Outbound and inbound flows should be considered to be any kind of flows specific to each subject matter domain (amounts of products traded, number of people visiting a country for tourism purposes, etc.)

)))

|Applicability :|(((

The asymmetry coefficient for mirror flows statistics is applicable:

* to domains in which mirror statistics are available (flows concerning trade, migration, tourism statistics, FATS, balance of payments, etc.);
* to producers.

Computed by Eurostat (pre-filled in the quality report).

)))

|(((

Calculation formulae:

)))|(((

**Bilateral mirror statistics:**

For each pair of countries, suppose:

* A – country A;
* B – country B.

//CC//2,,//AB//,, = (//OF,,AB,,// − //mIF,,AB,,//) / ((//OF,,AB,,// + //mIF,,AB,,//) / 2)

//CC//2,,//BA//,, = (//OF,,BA,,// − //mIF,,BA,,//) / ((//OF,,BA,,// + //mIF,,BA,,//) / 2)

A joint measure can be obtained from the two differences in relation to an average flow (there are several possibilities; one is given below):

//CC//2//AB// = ( |//OF,,AB,,// − //mIF,,AB,,//| + |//OF,,BA,,// − //mIF,,BA,,//| ) / ( (//OF,,AB,,// + //mIF,,AB,,//) / 2 + (//OF,,BA,,// + //mIF,,BA,,//) / 2 )

//OF,,AB,,// – outbound flow going from country A to country B; //mIF,,AB,,// – mirror inbound flow.

//IF,,BA,,// – mirror inbound flow to country B from country A; //mOF,,AB,,// – mirror outbound flow.

**Multilateral mirror statistics:**

//OF,,AiOj,,// – outbound flow going from country A,,i,, to any other country O,,j,,; //mIF,,AiOj,,// – mirror inbound flow.

A,,i,, – country A,,i,,

O,,j,, – another country O,,j,,

//K// – the number of countries that country A,,i,, may have contacts with

//C// – the group of countries EU + EFTA

)))

| |(((

//CC// = ∑,,//i//=1,,^^//C//^^ ∑,,//j//=1,,^^//K//^^ |//OF,,AiOj,,// − //mIF,,AiOj,,//| / ∑,,//i//=1,,^^//C//^^ ∑,,//j//=1,,^^//K//^^ ( (//OF,,AiOj,,// + //mIF,,AiOj,,//) / 2 )
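The coefficients can be illustrated with a short Python sketch. It rests on assumptions: the function names and flow figures are hypothetical, each directional coefficient is taken as the difference between the outbound flow and the mirror inbound flow divided by their average, and the joint and multilateral versions are taken to sum absolute differences over the pairs considered.

```python
from typing import Iterable, Tuple


def directional_asymmetry(outbound: float, mirror_inbound: float) -> float:
    """Signed coefficient: (OF - mIF) / ((OF + mIF) / 2)."""
    return (outbound - mirror_inbound) / ((outbound + mirror_inbound) / 2)


def joint_asymmetry(of_ab: float, mif_ab: float,
                    of_ba: float, mif_ba: float) -> float:
    """Non-negative joint coefficient for the pair of countries A and B."""
    numerator = abs(of_ab - mif_ab) + abs(of_ba - mif_ba)
    denominator = (of_ab + mif_ab) / 2 + (of_ba + mif_ba) / 2
    return numerator / denominator


def multilateral_asymmetry(flows: Iterable[Tuple[float, float]]) -> float:
    """Sum of |OF - mIF| over all pairs, relative to the sum of average flows."""
    pairs = list(flows)
    numerator = sum(abs(of - mif) for of, mif in pairs)
    denominator = sum((of + mif) / 2 for of, mif in pairs)
    return numerator / denominator


# Hypothetical flows: A reports 110 outbound to B, B reports 90 inbound from A;
# B reports 45 outbound to A, A reports 55 inbound from B.
cc_ab = directional_asymmetry(110.0, 90.0)
cc_ba = directional_asymmetry(45.0, 55.0)
cc_joint = joint_asymmetry(110.0, 90.0, 45.0, 55.0)
cc_multi = multilateral_asymmetry([(110.0, 90.0), (45.0, 55.0)])
print(cc_ab, cc_ba, cc_joint, cc_multi)
```

The signed values show which partner declares the higher flow, while the joint and multilateral values only measure the size of the discrepancy.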

)))

|Target value:|The value of this indicator should be as close to zero as possible, since – at least in theory – the value of inbound and outbound flows between pairs of countries should match.

|Aggregation levels and principles:|(((

* MS: The calculation is done for key variables/sub-series to be selected by the Eurostat domain manager.

* EU: Aggregations are possible at EU-level (see multilateral mirror statistics formulae). Alternatively, where e.g. not all information is available, lower and higher values of bilateral mirror statistics can be reported to indicate the range.

)))

|(((

Interpretation:

)))|(((

In domains where mirror statistics are available it is possible to assess geographical comparability measuring the discrepancies between inbound and outbound flows for pairs of countries.

Mirror data can help to check the consistency of the data, of the reporting process and of the definitions used. Finally, they can help to estimate missing data. For users, the asymmetry indicators provide some indication of overall data credibility.

There is perfect symmetry (outbound flows are equal to mirror inbound flows) when the coefficient is equal to zero. The more the coefficient diverges from zero, the larger the asymmetry between outbound flows and mirror inbound flows.

)))

|(((

Specific guidance:

)))|(((

The directional indicators CC2,,AB,, and CC2,,BA,, can be negative or positive. The joint indicator CC2AB is always non-negative.

Outbound flows from Member State A to Member State B, as reported by A, should be almost equal to inbound flows into B coming from A, as reported by B. Because some domains use a different valuation principle, inbound flows can be slightly different from outbound flows. Therefore comparisons dealing with mirror statistics have to be made cautiously and should take into account the existence of these discrepancies.

The asymmetry coefficient CC2AB is useful because it can be monitored over time.

The directional indicators CC2,,AB,, and CC2,,BA,, can be used to estimate whether a country is globally declaring a higher or lower level of flows compared with the mirror flows declared by its partner countries. They should be presented in a table (for example, for foreign trade statistics).

)))

|References:|(((

* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).
* ESS Standard for Quality Reports – 2009 Edition (Eurostat).
* International trade in services statistics - Monitoring progress on implementation of the Manual and assessing data quality – OECD Eurostat Trade in services experts meeting 2005.

)))

|(((

**Name:**

)))|(((

**CC2. Length of comparable time series**

)))

|(((

Definition:

)))|(((

Number of reference periods in time series from last break.

//Comment//

Breaks in statistical time series may occur when there is a change in the definition of the parameter to be estimated (e.g. variable or population) or the methodology used for the estimation. Sometimes a break can be prevented, e.g. by linking.

)))

|Applicability:|(((

The length of comparable time series is applicable:

* to all statistical processes producing time series;
* to users and producers, with different levels of detail given.

Computed only by Eurostat but recommended also for inclusion in national quality reports.

)))

|(((

Calculation formula:

)))|(((

The reference periods are numbered.

//CC//1 = //J,,last,,// − //J,,first,,// + 1

//J,,last,,// … number of the last reference period with disseminated statistics.

//J,,first,,// … number of the first reference period with comparable statistics.
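For illustration, the count can be obtained by numbering the reference periods consecutively. The Python sketch below assumes quarterly data; the helper `quarter_number` and the dates are hypothetical, and any consecutive numbering of periods would serve equally well.

```python
def quarter_number(year: int, quarter: int) -> int:
    """Consecutive number of a quarter (hypothetical numbering scheme)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    return year * 4 + (quarter - 1)


def comparable_length(j_last: int, j_first: int) -> int:
    """Number of reference periods from the first comparable one to the last."""
    if j_first > j_last:
        raise ValueError("the first comparable period lies after the last period")
    return j_last - j_first + 1


# Hypothetical series: a break occurred before 2010 Q1, so 2010 Q1 is the
# first comparable period; the last disseminated period is 2023 Q4.
length = comparable_length(quarter_number(2023, 4), quarter_number(2010, 1))
print(length)  # 56
```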

)))

|Target value:|A long time series may seem desirable, but changes may be justified, e.g. because reality calls for new concepts or in order to achieve coherence with other statistics.

|Aggregation levels and principles:|(((

The calculation is done at statistical process level. Aggregations are possible at MS, EU, and Domain (e.g. social statistics, business statistics) level.

The indicator for the EU or domain level should be calculated by Eurostat considering the time series of the EU aggregate.

)))

|(((

Interpretation:

)))|If there has not been any break, the indicator is equal to the number of the time points in the time series.

|Specific guidance:|(((

The length of the series with comparable statistics is expressed as the number of time periods (points) in the series. It is counted from the first time period with statistics after the break onwards. The result does not depend on the length of the reference period.

The indicator is only applicable to statistical data disseminated as a sequence of regular time periods (points).

If more than one series exists for a statistical process, the domain manager should select the appropriate ones for the calculation.

)))

|(((

References:

)))|§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat). § ESS Standard for Quality Reports – 2009 Edition (Eurostat).

|(((

**Name:**

)))|**AC1. Data tables – consultations [[(% class="wikiinternallink" %)^^**~[1~]**^^>>path:#_ftn1]](%%) **

|(((

Definition:

)))|(((

Number of consultations of data tables within a statistical domain for a given time period.

By "number of consultations" is meant the number of data table views, where multiple views in a single session count only once.

Some information is available through the monthly Monitoring report on Eurostat Electronic Dissemination and its Excel files with detailed figures.

)))

|Applicability:|(((

The number of consultations of data tables is applicable:

* to all statistical processes using on-line data tables for dissemination of statistics;
* to producers (Eurostat domain managers).

Computed only by Eurostat but recommended also for inclusion in national quality reports.

)))

|(((

Calculation formulae:

)))|(((

AC2 = #//CONS//

where #//CONS// denotes the number of elements in the set //CONS// (the cardinality of the set). In this case //CONS// represents the consultations of the data tables for a specific subject-matter domain. The figures for this indicator should be collected monthly.

Remark: internal page views are excluded.
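The counting rule can be sketched in Python as follows. The log record `View` and its fields are hypothetical, and the sketch assumes that "multiple views in a single session count only once" applies per data table and per session.

```python
from typing import Iterable, NamedTuple


class View(NamedTuple):
    session_id: str   # identifier of the browsing session
    table_id: str     # identifier of the data table viewed
    internal: bool    # True for internal page views, which are excluded


def count_consultations(views: Iterable[View]) -> int:
    """Cardinality of the set of (session, table) pairs among external views."""
    consultations = {(v.session_id, v.table_id) for v in views if not v.internal}
    return len(consultations)


# Hypothetical one-month log for one subject-matter domain.
log = [
    View("s1", "table_a", False),
    View("s1", "table_a", False),  # repeated view in the same session
    View("s1", "table_b", False),
    View("s2", "table_a", False),
    View("s3", "table_a", True),   # internal view, not counted
]

print(count_consultations(log))  # 3
```

Using a set makes the de-duplication rule explicit: the cardinality of the set is the indicator value.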

)))

|Target value:|There is no immediate interpretation of low and high values of this indicator, and there is no particular target.

|Aggregation levels and principles: |(((

The calculation is done at statistical process level. Aggregation is possible at the following levels:

* domain-specific data tables;
* annual aggregation.

The principle is to calculate the number of consultations of data tables by subject-matter domain.

)))

|Interpretation:|(((

This indicator should be analysed carefully and combined with other information that complements the analysis.

The indicator contributes to the assessment of users' demand for data (level of interest) and thereby to the assessment of the relevance of subject-matter domains.

A ratio can be computed to give insight into the proportion of consultations of the data tables in question in comparison with the total number of consultations for all domains.

)))

|Specific guidance: |(((

An informative and straightforward way to present this indicator is to plot the figures over time in a graph, with months on the horizontal (x) axis and the number of datasets consulted on the vertical (y) axis. This makes it possible to monitor the interest of users in each dataset at the domain-specific level.

It would also be interesting to display, with appropriate scaling, a graph of both the number of consultations of data tables and of ESMS files (AC1).

)))

|(((

References:

)))|§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat). § ESS Standard for Quality Reports – 2009 Edition (Eurostat).

|(((

**Name:**

)))|(((

**AC2. Metadata - consultations [[(% class="wikiinternallink" %)^^**~[2~]**^^>>path:#_ftn2]](%%) **

)))

|(((

Definition:

)))|(((

Number of metadata consultations (ESMS) within a statistical domain for a given time period.

By "number of consultations" is meant the number of times a metadata file is viewed.

Some information is available through the monthly Monitoring report on Eurostat Electronic Dissemination and its Excel files with detailed figures.

)))

|Applicability|(((

This indicator is applicable:

* to all statistical processes;
* to producers (Eurostat domain managers).

Computed only by Eurostat.

)))

|(((

Calculation formulae:

)))|(((

AC1 = #//ESMS//

where #//ESMS// denotes the number of elements in the set //ESMS// (the cardinality of the set). In this case the set //ESMS// represents the ESMS files consulted for a specific subject-matter domain in a given time period.

Remark: internal page views are excluded.

)))

|Target value:|There is no immediate interpretation of low and high values of this indicator, and there is no particular target.

|Aggregation levels and principles: |(((

The calculation is done at statistical process level. Aggregation is possible at the following levels:

* domain-specific ESMS files;
* annual aggregation.

The principle is to calculate the number of consultations of ESMS files by subject-matter domain.

)))

|(((

Interpretation:

)))|(((

The indicator contributes to the assessment of users' demand of metadata (level of interest), for the assessment of the relevance of subject-matter domains.

A ratio can be computed to give insight to the proportion of consultation of the ESMS files in question in comparison to the total number of consultations for all the domains.

)))

|(((

Specific guidance

)))|(((

An informative and straightforward way to represent the output of this indicator is by plotting the figures over time in a graph. In particular, it would be a graph where the horizontal (x) axis would represent months and the vertical (y) axis would represent the number of ESMS files consulted. It would be possible to monitor the interest of users for each ESMS file at the domain specific level.

It would also be interesting to display, over time and with appropriate scaling, a graph of both the number of consultations of data tables (indicator AC2) and of the corresponding metadata (ESMS) files.

)))

|References:|§ ESS Handbook for Quality Reports – 2009 Edition (Eurostat). § ESS Standard for Quality Reports – 2009 Edition (Eurostat).

|(((

**Name:**

)))|**AC3. Metadata completeness - rate**

|(((

Definition:

)))|The ratio of the number of metadata elements provided to the total number of metadata elements applicable.

|Applicability:|(((

The rate of completeness of metadata is applicable:

* to all statistical processes;
* to producers (Eurostat domain managers).

Computed only by Eurostat but recommended also for inclusion in national quality reports.

)))

|(((

Calculation formulae:

)))|(((

∑#//M,,L,,//

//AC//3//,,C ,,//=

∑#//L//

//L// in the denominator is the set of applicable metadata elements under consideration and //M ,,L,,// in the numerator is the subset of //L //of available metadata elements. The notation #//L //means the number of elements in the set //L// (the cardinality). Letter C in the left-hand side of the formula stands for both EU and EFTA countries.

1585

1586

1587

The set //L //is obtained by calculation for a group of metadata elements as explained below over a geographical entity (MS or the EU+EFTA), a statistical domain, etc.


There are three groups of metadata, described below together with a categorisation using the current EURO-SDMX concepts (only the main concepts are included in the following breakdown).


1. Metadata about statistical outputs: concepts 3, 4, 5, 8.1, 9, 10;
1. Metadata about statistical processes: concepts 11, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6;
1. Metadata about quality: concepts 12-19.

Computations are made separately for each of the three groups and for each combination (group of metadata, EU level, etc.).
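The per-group computation can be sketched as follows. The representation of an ESMS file is an assumption made for illustration only: a dictionary mapping a concept number to its text, where `None` marks a concept as not applicable and an empty or missing text counts as applicable but not provided. The names `GROUPS` and `ac3_rate` are likewise hypothetical.

```python
# Main EURO-SDMX concepts per metadata group, as listed above.
GROUPS = {
    "outputs": ["3", "4", "5", "8.1", "9", "10"],
    "processes": ["11", "20.1", "20.2", "20.3", "20.4", "20.5", "20.6"],
    "quality": ["12", "13", "14", "15", "16", "17", "18", "19"],
}


def ac3_rate(esms_files, concepts):
    """Return sum(#M_L) / sum(#L) over the files, or None if nothing is applicable."""
    provided = applicable = 0
    for esms in esms_files:
        for concept in concepts:
            text = esms.get(concept, "")  # a missing concept counts as not provided
            if text is None:
                continue  # assumed marker for "not applicable": excluded from L
            applicable += 1
            if text.strip():
                provided += 1
    return provided / applicable if applicable else None


# Hypothetical ESMS file: concept 9 is not applicable, concepts 5 and 13 are empty.
esms = {
    "3": "Data description", "4": "Unit of measure", "5": "",
    "8.1": "Annual", "9": None, "10": "Online database",
    "12": "Relevance", "13": "",
}

rates = {group: ac3_rate([esms], concepts) for group, concepts in GROUPS.items()}
print(rates)
```

Returning `None` when no element is applicable keeps the "no calculation needed" case distinct from a rate of 0.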


)))


|Target value:|The target value is 1, meaning that 100% of the metadata required for, or applicable to, the statistical process or aggregate in question is available.


|Aggregation levels and principles: |(((


The calculation is done at the level of ESMS files.


Aggregations are possible at MS, EU, and Domain (e.g. social statistics, business statistics) level.


The principle is to calculate the indicators as an unweighted rate at the level of MS and EU for a statistical domain (social statistics, business statistics, etc.).
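One possible reading of this principle is sketched below. It assumes that "unweighted" means no weights (such as country size) are attached to the ESMS files, and that counts are pooled as the sums in the formula suggest; a simple mean of file-level rates would be another reading and can give a different result. The function name, country codes and counts are hypothetical.

```python
def aggregate_rate(counts):
    """Pool (provided, applicable) pairs into one rate; None if nothing is applicable."""
    provided = sum(p for p, _ in counts)
    applicable = sum(a for _, a in counts)
    return provided / applicable if applicable else None


# Hypothetical (provided, applicable) counts per ESMS file in one statistical domain.
by_country = {
    "AT": [(18, 20), (10, 20)],
    "BE": [(15, 20)],
}

# MS level: pool the files of each country.
ms_rates = {country: aggregate_rate(files) for country, files in by_country.items()}

# EU level: pool the files of all countries, without any country weights.
eu_rate = aggregate_rate([pair for files in by_country.values() for pair in files])

print(ms_rates, eu_rate)
```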

)))

|(((

Interpretation:

)))|(((

Each indicator shows to what extent metadata of a specific type is available compared to what should be available.


This indicator should be analysed carefully, since the rate reflects only the amount of metadata that exists for a given statistical process, not the quality of that information.

)))


|Specific guidance:|(((


All the information is to be retrieved from ESMS files.


If the ESMS file is empty for the categories specified above, no calculation is needed, but a descriptive text should be provided instead.


Eurostat's ESMS files can be accessed directly through Eurostat's website, whereas the ESMS files of MS will become accessible, in the near future, through the National RME tool.


It should be taken into account what availability of metadata actually means.

)))

|(((

References:

)))|(((

* ESS Handbook for Quality Reports – 2009 Edition (Eurostat).


* ESS Standard for Quality Reports – 2009 Edition (Eurostat).


* Euro SDMX Metadata Structure, version March 2009.

)))


----

[1] The indicator must be collected in collaboration with Unit D4 - Dissemination.


[2] The indicator must be collected in collaboration with Unit D4 - Dissemination.


Wiki source code of Guidelines for the Implementation of the ESS Quality and Performance Indicators (QPI)