5.2 Working definitions
Before going any further, it is worth setting out some key terms and giving them working definitions. The term “working definitions” is used because various international forums within the official statistics community are still developing definitions related to data ecosystems and data governance, so the definitions below may need to be updated over time to reflect the latest thinking. For now, these working definitions aim to highlight areas of consensus, at least within official statistics, whilst also helping the reader to understand the terminology used in this chapter. Where a working definition is rather long, a short definition is also proposed; while not as comprehensive, it aims to convey the key points of the full definition in one sentence.
Data ecosystem
Working definition: The entire network of actors (data collectors, producers, providers, analysts, users and others) that directly or indirectly generate and produce, collect, process, disseminate, analyse and/or otherwise consume data and associated services, as well as the necessary legal, policy, administrative, technological and technical infrastructures, that combine to support interactions and partnerships, facilitate the use of data and thereby generate value from data for society as a whole, within a specified country or region.
Notes:
The working definition is based on the definition of a data ecosystem given in the Glossary of this Handbook, prior to the development of this chapter: “The entire network of data collectors, data producers, data analysts and other data users that directly or indirectly collect, process, disseminate, analyse and/or otherwise consume data and associated services within a specified country or region”.
That definition is closely aligned with others proposed in recent years. For example, the United Nations National Quality Assurance Frameworks Manual for Official Statistics (🔗) describes a system in which several actors interact to exchange, produce and utilize data, while the Africa Data Consensus (🔗) defined a data ecosystem as including the actors mentioned in the working definition above, as well as “laws and policy frameworks, and innovative technologies and tools”.
A definition in a paper from the German National Statistical Office (DESTATIS) (🔗) goes further and talks about national data ecosystems supporting “the social contract for data by allowing trustworthy data sharing and the use (and reuse) of a country’s data resources at the national, sub-national, or sectoral levels”. It goes on to discuss the value of a national data ecosystem, in terms of maximising the value of data to society through “data-driven digital products and services that are equally accessible to all citizens, businesses, and administrations”, concluding that “this approach delivers digital dividends and addresses the digital divide, ensuring business and government services continuity and empowering civil society”.
Perhaps the most detailed and comprehensive definition of a data ecosystem comes from the 2024 United Nations Economic Commission for Europe publication “Data Stewardship and the Role of National Statistical Offices in the New Data Ecosystem” (🔗). It combines elements of definitions from Statistics Canada, Eurostat and the OECD:
A data ecosystem encompasses data and statistical data, data subjects, along with a broad range of stakeholders, partnerships and data users that are involved in related data access and sharing arrangements, according to their different roles, responsibilities and rights, technologies, and business models. This includes the capacities, processes, policies and infrastructure used to manage data throughout its lifecycle and maximize its use as a strategic asset. The data governance and data stewardship activities take place in a data ecosystem, and relate to managing the interactions of four main categories of actors in the data ecosystem: data generators, data services, data business users, and end customers.
In common with the Africa Data Consensus, this definition goes beyond the actors, and includes policies and technical infrastructure. It also mentions maximising the use (and, hence, value) of data as a strategic asset, along similar lines to the DESTATIS definition. These two elements have therefore been added to the original definition from the Glossary of this Handbook, to create an enhanced definition representing a broad view across the international statistical community.
Data governance
Working definition: A system of decision rights and accountabilities for the management of the availability, usability, integrity and security of the data and information to enable coherent implementation and co-ordination of data stewardship activities as well as increase the capacity (technical or otherwise) to better control the data value chain, and the resulting regulations, policies and frameworks that provide enforcement. This includes the systems within an enterprise, organization or government that define who has authority and control over data assets and how those data assets may be used, as well as the people, processes, and technologies required to manage and protect data assets.
Short definition: A system of decision rights and accountabilities for the management of the availability, usability, integrity and security of data.
Notes:
The full working definition is taken from the 2024 United Nations Economic Commission for Europe publication “Data Stewardship and the Role of National Statistical Offices in the New Data Ecosystem” (🔗), which, in turn, combines elements from the Data Governance Institute, Statistics Canada, the OECD, and others. It is in line with other definitions used in the international statistical community, such as one proposed by the United Nations Economic and Social Commission for Asia and the Pacific (🔗):
Data governance is defined as the exercise of authority and control over the management and transformation of data with the objective of enhancing the value of data assets and mitigating data-related risks.
Data stewardship
Working definition: Data stewardship represents the ethical and responsible creation, collection, management, use, and reuse of data. It is expressed through long-term, inter-generational curation of data assets such that they benefit the full community of data users and are used for public good. Data stewardship works to support the growing maturity of data policy and is applicable at all scales, from the national or data system level, to the organization or enterprise level, to the individual or dataset. Made visible through a range of internal and external functions associated with stewardship roles – including data access, security, and data quality and standards – it influences proactive and responsible data practice to help deliver data strategies, maintain trust, and promote accountability. Reflecting an appropriate level of maturity, data stewardship is enabled through good data governance and data management, which provide oversight of data assets throughout their lifecycle to ensure their proper care.
Short definition: Data stewardship represents the ethical and responsible curation of data, including creation, collection, management, use, and reuse.
Notes:
The full working definition is taken verbatim from the 2024 United Nations Economic Commission for Europe publication “Data Stewardship and the Role of National Statistical Offices in the New Data Ecosystem” (🔗), which, in turn, combines elements from Statistics Canada, Statistics New Zealand, the OECD, and others. It encompasses and complements other proposed definitions, including those from Reister (🔗) and the Australian Bureau of Statistics (🔗).
Data custodian
Working definition: A data custodian ensures the safekeeping of data assets by focusing on the information technology aspects of data management, including data security, custody/storage, accessibility, scalability, configuration management, availability, auditing, backing-up and restoring, standardization, restoration processes, technical standards, and policy/procedure enterprise implementation.
Notes:
This working definition is also taken from the 2024 United Nations Economic Commission for Europe publication “Data Stewardship and the Role of National Statistical Offices in the New Data Ecosystem” (🔗), which, in turn, combines elements from Statistics Canada and the OECD. Some definitions go further, giving data custodians the responsibility for deciding whether and how their data assets can be integrated with other data assets, taking into account principles of confidentiality and ethics. The “data assets” safeguarded by a data custodian are generally data created / collected by the custodian as well as data created / collected by others that have been integrated or otherwise transformed by the custodian.
Data value chain
Working definition: Data-related processes through which value is created with data, including data creation, collection, validation, verification, storage, curation, enrichment, processing and analysis, access, sharing, and deletion.
Notes:
This definition is taken from the OECD Recommendation of the Council on Enhancing Access to and Sharing of Data, 2021 (🔗).
Other relevant terms are defined in the Glossary to this Handbook. The most relevant of these are reproduced below for ease of reference:
Administrative data
Data collected by a government department or other public agency primarily for administrative (not research or statistical) purposes.
Administrative source
Government department or other public agency that collects administrative data.
Big data
Data generated by business or government transactions, social media, phone logs, communication devices, web scraping, sensors, etc., characterised by high volume, velocity and variety.
Citizen-generated data (sometimes referred to as “citizen data”)
Data produced by non-state actors with the active consent and participation of citizens, including indigenous peoples, primarily to tackle issues that affect them directly.
Geospatial data
Data that combine location information (usually coordinates on the earth), attribute information (the characteristics of the object, event, or phenomenon concerned), and often, but not always, temporal information (the time or life span at which the location and attributes exist).
Statistical data
Data collected, processed or disseminated by a statistical organization for statistical purposes.
Official statistics
Statistics produced in accordance with the Fundamental Principles of Official Statistics by a national statistical office or by another producer of official statistics that has been mandated by the national government or certified by the national statistical office to compile statistics for its specific domain.