8 SDMX Registry/Repository

Last modified by Artur on 2025/09/10 11:19

8.1 Scope of this Chapter

In this guide it has been assumed that the structural metadata (e.g. DSD, MSD, Code List, Concept Scheme) is available to processing applications such as a data extract function of a database or a data visualization function of a website. For many SDMX processing applications these structural metadata must be made available “on demand”.

This Chapter explains the role and functions of an SDMX Registry and how the Registry is used both as a repository for storing, retrieving, and maintaining SDMX structural metadata, and as a mechanism for discovering data and reference metadata, including the “pull” scenario of an automated data reporting system.

8.2 The Need for an SDMX Registry

It is not necessary to operate or have access to an SDMX Registry in order to use SDMX. Many use cases do require access to structural metadata, but these can be retrieved from any web service that can respond to a query for structural metadata (either the REST query or the “Structure Where” query); this service does not need to be an SDMX Registry.
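As an illustration of the REST query mentioned above, the following minimal sketch builds a structure query URL. It assumes the path layout `resource/agencyID/resourceID/version` and the `detail` and `references` parameters of the SDMX 2.1 REST interface; the base URL and the function name `structure_query_url` are hypothetical and not part of this guide.

```python
from urllib.parse import urlencode


def structure_query_url(base_url, resource, agency_id="all", resource_id="all",
                        version="latest", detail="full", references="none"):
    """Build a REST URL that asks a web service for structural metadata.

    Assumption: the service follows the SDMX 2.1 REST path layout
    <base>/<resource>/<agencyID>/<resourceID>/<version>.
    """
    path = "/".join([base_url.rstrip("/"), resource, agency_id, resource_id, version])
    query = urlencode({"detail": detail, "references": references})
    return f"{path}?{query}"


# Hypothetical endpoint: it could be a Registry or any other structure web service.
url = structure_query_url(
    "https://registry.example.org/ws/rest", "codelist", "SDMX", "CL_FREQ", "1.0"
)
print(url)
```

The same URL pattern works against a Registry and a non-Registry service, which is why the query mechanism alone does not distinguish the two.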

Whilst an SDMX Registry and a non-Registry web service use exactly the same query mechanism, a Registry differs from a non-Registry web service in three important areas:

The Registry offers a maintenance service for structural metadata – this is the “Repository” function of the Registry.
The Registry offers a data and metadata Registration service.
The Registry offers a subscription and notification service.

The functions of these services are covered in this Chapter.

Software tools that support SDMX Registry services are becoming more functional and have reached a point where an SDMX Registry is straightforward to install and operate. A Registry can support a community of users, or it can be used solely within a single organization as a central repository in which SDMX structural metadata are maintained and from which they are retrieved.

8.3 Objective of a Registry

The objective of the SDMX registry/repository is, in broad terms, to allow organisations to publish statistical data and reference metadata in known formats such that interested third parties can discover these data and interpret them accurately and correctly. The mechanism for doing this is twofold:

To maintain and publish structural metadata that describes the structure and valid content of data and reference metadata sources such as databases, metadata repositories, data sets, and metadata sets. This structural metadata enables software applications to understand and to interpret the data and reference metadata in these sources.
To enable applications, organisations, and individuals to share and to discover data and reference metadata. This facilitates data and reference metadata dissemination by implementing the data sharing vision of SDMX.

8.4 SDMX Registry/Repository Architecture

8.4.1 Architectural Schematic

The SDMX registry/repository has a layered architecture founded on a structural metadata repository, which supports a provisioning metadata repository, which in turn supports the registry services. These are all supported by the SDMX-ML schemas. Applications can be built on top of these services to support the reporting, storage, retrieval, and dissemination aspects of the statistical lifecycle, as well as the maintenance of the structural metadata required to drive these applications.

Figure 19: Schematic of the Registry Content and Services

8.4.2 Structural Metadata Repository

The basic layer is that of a structural metadata service which supports the lifecycle of SDMX structural metadata artefacts such as Maintenance Agencies, Data Structure Definitions, Metadata Structure Definitions, Provision Agreements, Processes etc. This layer is supported by the Structure Maintenance and Query Service.

Note that the SDMX-ML Submit Structure Request message supports all of the SDMX structural artefacts. The only Registry artefacts that are not supported by the SDMX-ML Submit Structure Request are:

Registration of data and metadata sources
Subscription and Notification

Separate registry-based messages are defined to support these artefacts.

8.4.3 Provisioning Metadata Repository

The function of this repository is to support the registration of various types of data-store which model SDMX-conformant databases or files, and to link to these data and reference metadata sources. These links are specified for a data provider in respect of a specific data or metadata flow. In the SDMX model this pairing of provider and flow is called the Provision Agreement.

This layer is supported by the Data and Metadata Registration Service.

8.5 Services of an SDMX Registry/Repository

8.5.1 Structure Maintenance and Query Service

A Registry offers a web service for the maintenance of structural metadata. Whilst this service must comply with the SDMX-ML Registry Interface, it is common for such registries to also provide a graphical user interface for the submission and maintenance of SDMX structures.

Figure 20: Example Graphical User Interface to an SDMX Registry

The Registry must also offer a query service for structural metadata.

8.5.2 Registration Service

The Registration service allows a data or metadata publisher (in SDMX terminology this is a Data Provider) to announce that data or metadata exist. The Registration is connected to a Provision Agreement, which is itself connected to a Data Provider and a Data or Metadata Flow. The exact type of data or metadata registered is therefore known, and so is its structure (as the Data or Metadata Flow references the DSD or MSD).

The Registration must include a URL of either a file of SDMX-ML containing a data or metadata set, or a URL of a web service which can be queried to obtain the data.

Figure 21: Schematic of Registered Data and Metadata Sources in the SDMX-IM

The Registration can be used both for data consumers to find data (this is described below in “Data and Metadata Discovery”) and for automated data reporting systems to harvest the data (the “pull” scenario).
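The chain from Registration to structure described in this section can be sketched with a small in-memory model. This is not the SDMX-ML Registration schema: the class names, field names, and identifiers below are hypothetical and only mirror the relationships stated in the text (a Registration points to a Provision Agreement, which links a Data Provider to a flow, which references a DSD or MSD).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvisionAgreement:
    agreement_id: str
    data_provider: str
    flow: str        # Data or Metadata Flow
    structure: str   # DSD or MSD referenced by the flow


@dataclass(frozen=True)
class Registration:
    agreement_id: str
    file_url: Optional[str] = None     # URL of an SDMX-ML file
    service_url: Optional[str] = None  # URL of a queryable web service

    def __post_init__(self):
        # The text requires one of the two kinds of URL.
        if not (self.file_url or self.service_url):
            raise ValueError("a Registration needs a file URL or a web service URL")


def describe(registration, agreements):
    """Return (provider, flow, structure) for a registered source."""
    pa = agreements[registration.agreement_id]
    return pa.data_provider, pa.flow, pa.structure


# Hypothetical identifiers, for illustration only.
agreements = {"PA_BOP_NSO": ProvisionAgreement("PA_BOP_NSO", "NSO_A", "BOP_FLOW", "BOP_DSD")}
reg = Registration("PA_BOP_NSO", file_url="https://nso-a.example.org/bop.xml")
print(describe(reg, agreements))
```

Because the structure is reachable from the Registration, a consumer or a harvesting application can know how to interpret the source before retrieving it.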

8.5.3 Subscription and Notification Service

The subscription service allows a user to monitor Registry events. An event is a change to something in the Registry. A subscription can be specified for a specific object, such as a new registration of data or a change to a particular code list, or for many objects, such as any change to a specific type of structure or all registrations. A subscription can ask for notification by e-mail or notification to a URL. The URL will be that of a web service that can act upon the notification, for example by automatically updating the structural metadata that support a web dissemination service, or by triggering the “pull” mechanism for data reporting.

When a monitored event is triggered, the notification service notifies interested parties of the Registry content that has changed. The notification can be addressed to a user (by e-mail) or to an application (at a URL).
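The matching of events to subscriptions can be sketched as follows. This is a simplified in-memory illustration, not the SDMX-ML subscription message: the classes, field names, and addresses are hypothetical, and a value of `None` stands for "any" so that one subscription can cover either a single object or a whole type of object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str         # "registration" or "structure"
    object_type: str  # e.g. "Codelist", "Dataflow"
    object_id: str


@dataclass(frozen=True)
class Subscription:
    notify_to: str                     # "mailto:..." for a user, a URL for an application
    kind: str
    object_type: Optional[str] = None  # None means any type
    object_id: Optional[str] = None    # None means any object

    def matches(self, event):
        return (self.kind == event.kind
                and self.object_type in (None, event.object_type)
                and self.object_id in (None, event.object_id))


def notifications_for(event, subscriptions):
    """List (address, recipient kind) pairs that must be notified of the event."""
    return [
        (s.notify_to, "user" if s.notify_to.startswith("mailto:") else "application")
        for s in subscriptions
        if s.matches(event)
    ]


subs = [
    Subscription("mailto:analyst@example.org", "structure", "Codelist", "CL_FREQ"),
    Subscription("https://collector.example.org/notify", "registration"),
]
print(notifications_for(Event("registration", "Dataflow", "BOP_FLOW"), subs))
```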

8.6 Data and Metadata Discovery

8.6.1 Choreography

Discovering published data and reference metadata involves interacting with the Registry in three steps:

optionally (but usually) browsing a subject matter domain category scheme to find the Dataflow Definitions (and hence Data Structure Definitions) and Metadataflows (and hence Metadata Structure Definitions) which structure the type of data and/or reference metadata being sought;
building a query, in terms of the selected Data Structure Definition or Metadata Structure Definition, which specifies what data are required, and submitting it to a service that can query an SDMX Registry, which returns a list of (URLs of) data and reference metadata files and databases which satisfy the query;
processing the query result set and retrieving data and/or reference metadata from the supplied URLs.
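The three steps above can be sketched against a toy in-memory registry. All content here is hypothetical (category names, dataflow identifiers, providers, URLs), and the dictionaries stand in for what a real Registry would return from its query services.

```python
# Hypothetical registry content, for illustration only.
CATEGORY_SCHEME = {"External sector": ["BOP_FLOW", "TRADE_FLOW"]}
DATAFLOW_STRUCTURE = {"BOP_FLOW": "BOP_DSD", "TRADE_FLOW": "TRADE_DSD"}
REGISTRATIONS = [
    {"dataflow": "BOP_FLOW", "provider": "NSO_A", "url": "https://nso-a.example.org/bop.xml"},
    {"dataflow": "BOP_FLOW", "provider": "NSO_B", "url": "https://nso-b.example.org/ws/rest"},
    {"dataflow": "TRADE_FLOW", "provider": "NSO_A", "url": "https://nso-a.example.org/trade.xml"},
]


def browse(category):
    """Step 1: find the dataflows classified under a category."""
    return list(CATEGORY_SCHEME.get(category, []))


def query_registrations(dataflow, provider=None):
    """Step 2: return the URLs of registered sources that satisfy the query."""
    return [
        r["url"]
        for r in REGISTRATIONS
        if r["dataflow"] == dataflow and provider in (None, r["provider"])
    ]


flows = browse("External sector")
structure = DATAFLOW_STRUCTURE[flows[0]]  # the DSD in whose terms the query is phrased
urls = query_registrations(flows[0])
# Step 3 (not executed here): retrieve the data from each URL in `urls`.
print(structure, urls)
```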

Figure 22: Schematic of Data and Metadata Discovery and Query in the SDMX-IM

8.6.2 Example

A worked example of this scenario can be found in Annex 4 – Data Reader and Data Writer Functions.

8.7 Patterns for Data and Metadata Exchange

SDMX identifies three basic process patterns and two modes (push and pull) regarding the exchange of statistical data and metadata.

Figure 23: The Three Basic Patterns of Data and Reference Metadata Exchange

In the push mode the data/metadata are sent by the reporter to the collector.

In the pull mode the data consumer retrieves the data from the reporter's web server. The data may be made available for download in an SDMX-conformant file, or they may be retrieved from a database in response to an SDMX-conformant query, via a web service running on the reporter's server. In both cases, the data are made available to any organisation requiring them, in formats which ensure that data are consistently described by appropriate metadata, whose meaning is common to all parties in the exchange.

Data sharing using the pull mode is well adapted to the database-driven and data hub architectures. Both architectures benefit data producers because they lessen the burden of publishing the data to multiple counterparties.

In both architectures it is necessary to implement a notification mechanism. This mechanism supplies provisioning metadata that alert collecting organisations that data providers have made data and metadata sets available, give details of the online mechanism for getting the data (for example, a queryable online database or a simple URL), and state the constraints on the allowable content of the data sets that will be provided.

At the heart of a data-sharing architecture there is often an SDMX Registry. This is a central location where structural and provisioning metadata can be found. All users and applications that need to access data can query the registry to find out which data sets and metadata sets are available from data providers, and how to access them.

8.7.1 The Database-driven Architecture

The database-driven architecture is implemented by those collecting organisations that periodically need to fetch the data and load them into their database. In general, a batch process is used to automate the flow, which may transfer a whole data set or a partial one, including incremental updates.

From the data management point of view, the pull approach within a database-driven architecture includes the following steps:

1) when new data are available, the data provider should either:

a) create an SDMX-ML file containing the new data set, or

b) provide a web service (WS) that builds SDMX-ML messages upon request.

In both cases a provision agreement must be in place, and the new data source is registered in the Registry against it.

2) the data collector (the Pull Requestor) is notified of the new registration. Note that it is common for a Registry to provide an RSS mechanism, though this is not a formal part of the SDMX standard. The data collector then:

a) retrieves the SDMX-ML file from the specified URL, if the data have been made available as a file,
or

b) uses the Query Message or REST Query included in the feed to query the data provider web service, if the data are prepared by the data provider web service.
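The collector's choice in step 2 can be sketched as a function that turns a notification into the URL to fetch. The notification layout (a dictionary with `source`, `flow`, `key`, and `params`) is hypothetical. The web service branch assumes the provider exposes the SDMX 2.1 REST data query in the form `data/flowRef/key`; a provider that only accepts the Query Message would need a different branch.

```python
from urllib.parse import urlencode


def retrieval_url(notification):
    """Return the URL the data collector should fetch for a new registration."""
    source = notification["source"]
    if source["type"] == "file":
        # Case a): the provider published an SDMX-ML file.
        return source["url"]
    if source["type"] == "service":
        # Case b): the provider's web service builds the message on request.
        path = "/".join([
            source["url"].rstrip("/"),
            "data",
            notification["flow"],
            notification.get("key", "all"),
        ])
        params = notification.get("params", {})
        return path + ("?" + urlencode(params) if params else "")
    raise ValueError(f"unknown source type: {source['type']!r}")


# Hypothetical notifications, for illustration only.
file_note = {"flow": "BOP_FLOW",
             "source": {"type": "file", "url": "https://nso-a.example.org/bop.xml"}}
service_note = {"flow": "BOP_FLOW",
                "source": {"type": "service", "url": "https://nso-b.example.org/ws/rest/"},
                "params": {"startPeriod": "2024"}}
print(retrieval_url(file_note))
print(retrieval_url(service_note))
```

A batch job could call such a function for each notification it receives and then load the retrieved message into the collector's database.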

Figure 24: Database-driven Architecture

8.7.2 The Data Hub Architecture

The data hub architecture consists of an accessible system providing involved actors with the following services:

data providers can:

- notify the hub of new sets of data and corresponding structural metadata (measures, dimension, code lists, etc.);
- make data available directly from their systems through a querying system.

data users can:

- browse the hub to define a dataset of interest via the above structural metadata;
- retrieve the dataset from the data providers.

From the data management point of view, the hub is also based on agreed hypercubes or datasets, but here the hypercubes or datasets are not sent to the central system. Instead the following process operates:

a user identifies a dataset through the web interface of the central hub using the structural metadata, and requests it;
the central hub translates the user request into one or more queries and sends them to the related data providers’ systems;
data providers’ systems process the queries and send the results to the central hub in a standard format;
the central hub puts together the results returned by all the data providers’ systems concerned and presents them in a human readable format.
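The hub process above can be sketched as a fan-out and merge. In this simplified illustration the providers' systems are represented by plain Python functions rather than remote web services, and the provider names, dataset identifier, and observation fields are hypothetical.

```python
def hub_request(dataset, providers):
    """Send the request to every provider offering the dataset and merge the results.

    `providers` maps a provider name to a pair (datasets offered, query function).
    """
    merged = []
    for name in sorted(providers):
        offered, query = providers[name]
        if dataset in offered:
            for observation in query(dataset):
                merged.append({"provider": name, **observation})
    return merged


def present(rows):
    """Render the merged results in a simple human readable form."""
    return "\n".join(f"{r['provider']}\t{r['period']}\t{r['value']}" for r in rows)


# Stand-ins for the providers' query systems.
providers = {
    "NSO_A": ({"BOP"}, lambda ds: [{"period": "2024", "value": 10.5}]),
    "NSO_B": ({"BOP", "TRADE"}, lambda ds: [{"period": "2024", "value": 7.2}]),
    "NSO_C": ({"TRADE"}, lambda ds: [{"period": "2024", "value": 3.1}]),
}
rows = hub_request("BOP", providers)
print(present(rows))
```

Note that the hub stores none of the data: every request is answered from the providers' systems at the time it is made.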

Figure 25: Data Hub Architecture

8.7.3 Data Producer Architectures

In order to implement an SDMX IT architecture for data-sharing using the pull mode, several steps must be accomplished by a data producer and several questions must be considered:

which statistical domains are involved and where are the data currently stored?
which structural metadata are involved, and where are they currently stored?
what is the business process behind the data flow involved in the exercise?
will the SDMX data producer architecture be part of a data warehouse architecture, of a data hub architecture or of both?

Generally data and structural metadata that will be involved in the new SDMX information system are stored either in databases or in files. The two cases lead to different architectural approaches:

a. data and structural metadata are already stored in a database and it is necessary to build suitable software interfaces in order to make the system “SDMX-compliant”.
b. a separate special-purpose database is set up to store data and structural metadata. This database will be designed with the main aim of being part of an SDMX-compliant system. In this case the database can be modelled using the SDMX Information Model. An SDMX Registry is, of course, such a database.

Both cases make it possible to:

extract SDMX-ML files from the database that will be made available to be pulled by data collectors;
allow the database to be queried directly through a web service.

Whichever type of data producer architecture is involved, a mapping process between structural metadata may be necessary, as explained in 5.3.

8.8 Registry Interfaces

8.8.1 Registry Interfaces

The Registry Interfaces are:

Notify Registry Event
Submit Subscription Request
Submit Subscription Response
Submit Registration Request
Submit Registration Response
Query Registration Request
Query Registration Response
Query Subscription Request
Query Subscription Response
Submit Structure Request
Submit Structure Response

Applications communicate with the Registry using either the SDMX-ML Registry Interface message (which contains an XML element that defines the actual interface) or individual messages – one for each interface.

In more technical terms the Registry interfaces are invoked in one of two ways:

The interface is the name of the root node of the SDMX-ML document.
The interface is invoked as a child element of the RegistryInterface message where the RegistryInterface is the root node of the SDMX-ML document.
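The two ways of invoking an interface can be told apart by inspecting the root node of the incoming document. The sketch below does this with the Python standard library. The element names `Header` and `SubmitRegistrationRequest` and the namespace `urn:example:registry` are assumptions made for the example and should be checked against the SDMX-ML schemas before use.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def invoked_interface(xml_text):
    """Return the name of the Registry interface that a document invokes."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "RegistryInterface":
        # Way 1: the interface is the root node itself.
        return local_name(root.tag)
    # Way 2: the interface is a child of the RegistryInterface root node.
    for child in root:
        if local_name(child.tag) != "Header":
            return local_name(child.tag)
    raise ValueError("RegistryInterface message carries no interface element")


# Hypothetical documents, for illustration only.
standalone = '<SubmitRegistrationRequest xmlns="urn:example:registry"/>'
wrapped = (
    '<RegistryInterface xmlns="urn:example:registry">'
    "<Header/><SubmitRegistrationRequest/>"
    "</RegistryInterface>"
)
print(invoked_interface(standalone), invoked_interface(wrapped))
```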

In addition to these interfaces the Registry must support a mechanism for querying for structural metadata.

All these interactions with the Registry – with the exception of Notify Registry Event – are designed in pairs. The first document (the one which invokes the SDMX Registry Interface) is a “Request” document. The message returned by the interface is a “Response” document.

It should be noted that all interactions are assumed to be synchronous, with the exception of Notify Registry Event. This document is sent by the SDMX Registry to all subscribers whenever an event occurs to which any users have subscribed. Thus, it does not conform to the request-response pattern, because it is inherently asynchronous.
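The pairing rule can be expressed as a small lookup over the interface names listed in this section. The function name `expected_response` is hypothetical; it simply derives the Response name from a Request name and returns `None` for the one-way Notify Registry Event.

```python
REQUEST_INTERFACES = [
    "Submit Subscription Request",
    "Submit Registration Request",
    "Query Registration Request",
    "Query Subscription Request",
    "Submit Structure Request",
]
ONE_WAY_INTERFACES = ["Notify Registry Event"]


def expected_response(interface):
    """Return the Response document expected for a Request, or None if one-way."""
    if interface in ONE_WAY_INTERFACES:
        return None  # asynchronous notification: the Registry sends it unprompted
    if interface in REQUEST_INTERFACES:
        return interface[: -len("Request")] + "Response"
    raise ValueError(f"not a Registry request interface: {interface!r}")


for name in REQUEST_INTERFACES + ONE_WAY_INTERFACES:
    print(name, "->", expected_response(name))
```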