06-12-2024

Insight into the origin of data

Data is increasingly regarded as an important asset that organisations must pay explicit attention to. That attention extends to the quality of the data and to the metadata that accompanies it. In particular, it is important to be able to understand the origin of data: origin is an important indicator of data quality and therefore largely determines how much users trust the data. The origin of data can be recorded in metadata at various levels. At a minimum, the metadata of a dataset should clearly show which organisation created the data. For individual data items, relevant context can include which specific person created the item, at what time, in response to which event and as part of which activity. The figure below provides an overview of the types of metadata that are relevant.

Information products are derived from source data by means of transformations. Users want to know which sources and derivation rules were used. The sources must therefore be described in the metadata of the information product, and the derivation rules must be retrievable by users. Users also like to see the underlying source data itself, preferably for the individual data items in the report or dashboard they are looking at. Ideally, the entire chain is transparent, from collection, through processing and transformations, to what is visible on a screen. Whether this is really necessary and feasible has to be determined per context. If the entire chain is not transparent for an information product, it should at least be possible to request the derivation rules for every link in the chain.
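To make the idea of a traceable chain a little more concrete, here is a minimal sketch in Python. It assumes that every link in the chain records its inputs and the derivation rule that was applied; the class, the function and the example names (persons_register, residents_per_municipality and so on) are hypothetical and do not come from any particular architecture or tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One link in the chain: a source dataset or a derived information product."""
    name: str
    derivation_rule: Optional[str] = None  # None means: this is source data
    inputs: List["Node"] = field(default_factory=list)


def trace(node: Node, depth: int = 0) -> List[str]:
    """Walk from an information product back to its sources, depth-first."""
    label = f" (rule: {node.derivation_rule})" if node.derivation_rule else " (source)"
    lines = ["  " * depth + node.name + label]
    for parent in node.inputs:
        lines.extend(trace(parent, depth + 1))
    return lines


# Hypothetical chain: two registers feed a joined table, which feeds a dashboard figure.
persons = Node("persons_register")
addresses = Node("addresses_register")
joined = Node("persons_with_address", "join on person_id", [persons, addresses])
figure = Node("residents_per_municipality", "count grouped by municipality", [joined])

lineage = trace(figure)
print("\n".join(lineage))
```

A user looking at the dashboard figure could be shown this list to see which sources and rules lie behind it. If a link does not expose its inputs, the rule recorded for that link is still the minimum that should be available on request.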

Fortunately, several standards are available for recording metadata about origin. The PROV standard is aimed specifically at recording origin (provenance) and is also used within the DCAT standard, for example. It makes it possible to record in detail who created which data and when, and which source data and derivation rules it was based on. Because PROV is a Linked Data vocabulary, this information can also be recorded directly alongside the data itself. Another standard for recording information about origin is Dublin Core (also known as ISO 15836), which has its roots in web content. It is likewise available as a Linked Data vocabulary and offers a standard classification of the types of parties involved in the creation of data. The MDTO standard, which is aimed at making information objects sustainably accessible, also provides extensive support for recording data about origin.
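As an illustration of what a PROV description can look like, the sketch below builds a small JSON-LD document using only the Python standard library. The `prov:` terms are taken from the PROV-O vocabulary; the function name, the `https://example.org/...` identifiers and the timestamps are assumptions made up for this example, and a real implementation would probably use a Linked Data library rather than plain dictionaries.

```python
import json

PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def provenance_record(product_id, source_ids, rule_id, agent_id, started, ended):
    """Return a JSON-LD document describing how product_id was derived (hypothetical helper)."""
    activity_id = product_id + "/derivation"
    return {
        "@context": {"prov": PROV, "xsd": XSD},
        "@graph": [
            {   # what: the information product and the sources it was derived from
                "@id": product_id,
                "@type": "prov:Entity",
                "prov:wasDerivedFrom": [{"@id": s} for s in source_ids],
                "prov:wasGeneratedBy": {"@id": activity_id},
                "prov:wasAttributedTo": {"@id": agent_id},
            },
            {   # when and how: the transformation, its inputs and the rule that was followed
                "@id": activity_id,
                "@type": "prov:Activity",
                "prov:used": [{"@id": s} for s in source_ids],
                "prov:startedAtTime": {"@value": started, "@type": "xsd:dateTime"},
                "prov:endedAtTime": {"@value": ended, "@type": "xsd:dateTime"},
                "prov:qualifiedAssociation": {
                    "@type": "prov:Association",
                    "prov:agent": {"@id": agent_id},
                    "prov:hadPlan": {"@id": rule_id},
                },
            },
            {"@id": rule_id, "@type": "prov:Plan"},           # the derivation rule
            {"@id": agent_id, "@type": "prov:Organization"},  # who
        ],
    }


# All identifiers below are invented for the example.
record = provenance_record(
    product_id="https://example.org/product/residents-per-municipality",
    source_ids=["https://example.org/source/persons", "https://example.org/source/addresses"],
    rule_id="https://example.org/rule/count-by-municipality",
    agent_id="https://example.org/org/statistics-office",
    started="2024-12-01T08:00:00Z",
    ended="2024-12-01T08:05:00Z",
)
print(json.dumps(record, indent=2))
```

Because the description is itself Linked Data, it can be published next to the information product or embedded in the same document as the data it describes.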

Under the renewed eIDAS regulation, a European Digital Identity Wallet will become available. Users can store verifiable statements from all kinds of parties in it and then present them to service providers. A verifiable statement is in effect proof that data originates from a certain party, so it also says something about the origin of the data. It will become increasingly important for organisations to be able to issue verifiable statements. More generally, providing more formal proof of the origin of data is valuable: it gives customers confidence in the data and its origin. This opens up the possibility of attaching more information about origin to other exchanged data as well, which increases the reliability of that data.

The above is based on the GDI domain architecture for data exchange, which I drew up together with architects from a number of government organisations. This architecture has now been adopted and is publicly available. We are currently also working on a domain architecture for access, which you can likewise follow online.


© ArchiXL  |  Chamber of Commerce  05084421