12.8 Methodological services
This section contains information about the methodological services that are part of the common statistical infrastructure and that support questionnaire design, sample design and estimation, editing, coding, imputation, outlier determination, seasonal adjustment, time series analysis, and confidentiality and statistical disclosure control activities associated with a survey or other statistical process.
Methodological services involve the use of specialised tools and systems, and of specialised staff (typically methodologists) who are expert in the design, development or acquisition, and use of such tools and systems. In this context, a tool implies a computer application supporting statistical activity, and a system is an integrated set of tools supporting a range of statistical activities.
International standards provide the foundation for developing methodological services and international statistical organizations provide support in their application.
12.8.1 Questionnaire design
Provision of questionnaire design support
Responsibility for the design and development of a survey questionnaire lies with the subject matter area staff for the survey. Questionnaire design specialists, typically located in a methodology unit, should provide support for questionnaire design and development. Support may include:
identification and acquisition of one or more questionnaire design and development tools;
training of NSO staff in the questionnaire design principles, practices and use of the tools;
assistance to subject matter staff in the design and testing of questionnaires;
review of questionnaires from the perspective of understandability and question flow, and suggestions for improvements.
The principles on which the services are based are described in Chapter 9.2.5 – Data collection and capture modes.
Questionnaire design guidelines, tools and systems
Many NSOs have developed questionnaire design guidelines to assist their staff. For example, Questionnaire Design is a section within the Basic Survey Design Manual developed by the Australian Bureau of Statistics.
In addition to guidelines, there are many questionnaire design tools available from NSOs, international statistical organizations, and commercially. Most of these tools are part of larger systems that, in addition to questionnaire design, include data collection and capture, editing, imputation and tabulation.
Two of the best-known systems from NSOs are:
Blaise, developed by Statistics Netherlands, which supports questionnaire design and all types of computer-assisted data collection; and
Census and Survey Processing System (CSPro), developed by the US Census Bureau, which is a public domain software package used by hundreds of organizations for entering, editing, tabulating, and disseminating census and survey data.
Two of the best-known systems from international statistical organizations are Survey Solutions, free software developed in the Data Group of the World Bank, and EUSurvey, an online survey management system for creating questionnaires.
Systems from the commercial world include:
Survey123 for ArcGIS, which is a form-centric data gathering application that integrates the use of mapping technology and survey operations;
Google Forms, which is free and supports an unlimited number of surveys, each with an unlimited number of respondents;
SurveyMonkey, which is similar to Google Forms in that it supports any kind of online survey. The free version supports only a very small number of respondents.
All these systems are further discussed in Chapter 15.8 – Questionnaire design tools.
12.8.2 Sample design and estimation
Provision of sample design and estimation support
Because of the close relationship between the estimation scheme and the sample design, the two are often designed together, even though estimation actually takes place much later in the survey process than sample selection.
It is broadly acknowledged that sample design and estimation should be fully delegated to specialists in these subprocesses as they require more mathematical knowledge than other survey subprocesses. Typically, but not invariably, the specialists are located in a methodology unit. Sometimes they are embedded within the subject matter areas responsible for the surveys.
Responsibility for the design of, development/acquisition of, and support in use of sample design and estimation tools virtually always rests with specialists located in a methodology unit. For any given survey, the subject matter staff are responsible for specifying requirements and constraints in terms of sample size, planned output tables, acceptable sampling errors, data collection budget and costs, etc., and for checking that the resulting sampling and estimation methods satisfy these requirements and constraints. The support services typically include:
identification of appropriate sampling and estimation procedures;
identification and (if need be) acquisition of appropriate sampling and estimation tools;
conduct of sampling, terminating with verification of the final sample with the subject matter area;
support to the subject matter area in the conduct of estimation and the interpretation of sampling errors.
The principles on which the services are based are described in Chapter 9.2.4 – Survey design and Chapter 9.2.6 – Survey processing.
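As an illustration of the estimation step these services support, the design-based logic can be sketched in a few lines. The function name and figures below are invented for the example; real estimation systems also handle stratification, calibration and variance estimation.

```python
# Hypothetical sketch of Horvitz-Thompson estimation of a population
# total; all figures are illustrative, not taken from any real survey.

def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each sampled value
    by the inverse of its inclusion probability."""
    return sum(y / p for y, p in zip(values, inclusion_probs))

# A stratum of N = 100 units from which n = 10 were sampled by simple
# random sampling: every sampled unit has inclusion probability n / N.
sample_values = [12.0, 9.5, 14.2, 11.1, 10.4, 13.3, 9.9, 12.8, 11.7, 10.1]
probs = [10 / 100] * len(sample_values)

estimate = horvitz_thompson_total(sample_values, probs)
print(round(estimate, 1))  # prints 1150.0, i.e. the sample total scaled by N/n
```

Under equal inclusion probabilities this reduces to the familiar expansion estimator; the same formula also covers unequal-probability designs.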
Sample design and estimation guidelines
Many NSOs have developed sample design and estimation guidelines for their staff, both subject matter and methodology experts. An example is Sample Design, a section within the Australian Bureau of Statistics (ABS) Basic Survey Design documentation. It deals with:
defining the population, frame and units;
calculating the sample size;
determining the sampling methodology; and
determining the estimation method to be used.
The document includes a review of non-probability sampling methods, including quota sampling, convenience and haphazard sampling, and judgement (purposive) sampling, and the circumstances in which they might be used. It describes simple random sampling with and without replacement, systematic sampling, stratified sampling, sample allocation, cluster sampling and multi-stage sampling, post-stratification and the circumstances within which each of these might be appropriate.
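Several of the probability sampling methods reviewed in such guidelines can be sketched in a few lines. The frame, sample sizes, strata and allocation below are invented for illustration; production sampling tools add frame maintenance, coordination between surveys, and probability-proportional-to-size designs.

```python
# Illustrative sketch of three probability sampling methods; the frame
# and allocation are made up for the example.
import random

frame = list(range(1, 101))  # frame of 100 unit identifiers
rng = random.Random(42)      # fixed seed for reproducibility

# Simple random sampling without replacement.
srs = rng.sample(frame, 10)

# Systematic sampling: every k-th unit from a random start.
k = len(frame) // 10
start = rng.randrange(k)
systematic = frame[start::k]

# Stratified sampling: independent SRS within each stratum,
# with a (roughly proportional) allocation of the sample.
strata = {"small": frame[:70], "large": frame[70:]}
allocation = {"small": 7, "large": 3}
stratified = [u for name, units in strata.items()
              for u in rng.sample(units, allocation[name])]

print(len(srs), len(systematic), len(stratified))  # prints: 10 10 10
```

Each method yields a sample of the same size here, but with different inclusion patterns and therefore different estimation weights.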
In addition to guidelines developed by NSOs, there are many textbooks and research articles on sample design and estimation. A classic textbook is Sampling Techniques, Cochran, Third Edition, 1977, Wiley; and a classic article is Sampling and Estimation for Establishment Surveys, 1994, M. A. Hidiroglou.
The journals published by the International Statistical Institute (ISI), specifically including the Journal of Official Statistics, and those of the various national associations, such as the American Statistical Association (ASA) and the Royal Statistical Society (RSS), are rich sources of articles.
Sample design and estimation tools and systems
In the past, an NSO would develop tools for stratification, sample size determination, sample selection, and estimation itself, often separately for each survey. This is no longer common practice, as tools for every aspect of sample design and estimation are readily available commercially and from international organizations and other NSOs. Typically, these tools are combined with one another and with tools for data preparation, analysis and tabulation in a single system.
Three of the best-known commercial systems are listed below. Purchase of any of these systems must be accompanied by training. All have a broad range of features, some of which may be complex and not all of which may be relevant to an NSO.
SAS is a software suite that can discover, alter, manage and retrieve data from various sources and perform statistical analysis on them. It provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS language.
SPSS Statistics is a statistical software platform from IBM with essentially the same features as SAS.
Stata is a statistical software platform with essentially the same features as SAS and SPSS.
There are many freely available systems covering a wide range of functions, of which R (the R Project for Statistical Computing) is the best known. Although the system is free, NSO staff require training in its application, which may have to be purchased.
Over one hundred systems are available through GitHub.
The systems referenced above are described in Chapter 15.7 – Specialist statistical processing/analytical software.
12.8.3 Editing, coding, imputation and outlier determination
Provision of support for editing, coding, imputation and outlier determination
The ultimate responsibility for executing these activities for any particular statistical production process typically lies with the subject matter area manager. However, in most NSOs, primary editing and (sometimes) coding are actually carried out by the staff who undertake data collection, typically field staff working from regional offices. Secondary editing, imputation and outlier determination may be carried out by specialists in these activities located in a methodology unit or in the relevant subject matter area.
Responsibility for the design, development/acquisition, and support in the use of generic editing, coding, imputation and outlier detection tools and systems typically rests with specialists located in methodology and/or ICT units.
In summary, support for editing, coding, imputation and outlier determination may include:
identification and/or development of appropriate procedures and tools;
training of staff in the editing, coding, imputation and outlier detection procedures and tools;
assistance to staff in the conduct of these activities.
The principles on which the services are based are described in Chapter 9.2.6 – Survey processing.
Editing, coding, imputation and outlier determination guidelines, tools and systems
Many NSOs have developed guidelines for these activities to assist their staff. For example, Data Processing is a section within the Basic Survey Design Manual developed by the Australian Bureau of Statistics. Generic Statistical Data Editing Models have been developed by a multinational task team under the High-Level Group for the Modernisation of Official Statistics.
In many NSOs, the editing, coding, imputation and outlier determination tools are built separately for each production process (survey or administrative data collection) and, in the case of annual or less frequent surveys/collections, for each cycle. This is not recommended practice. To the extent possible, the best approach is to use generic tools that can be customised to a particular survey or collection. Tools may be developed in house or, preferably, acquired from another NSO or an international statistical organization. In some cases, tools performing more than one of the functions may be combined in a system.
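A generic, customisable edit-and-impute step of the kind recommended above might look like the following sketch. The edit rule, field names and records are invented for illustration; production tools implement far richer rule languages and imputation methods (donor, regression, ratio, etc.).

```python
# Minimal sketch of a generic edit-and-impute step: edit rules are
# data, not code, so the same functions can serve any survey.
# The rule, field name and records are invented for illustration.

def apply_edits(record, edit_rules):
    """Return the names of fields that fail any edit rule."""
    return [field for field, rule in edit_rules.items()
            if not rule(record.get(field))]

def impute_mean(records, field, failures):
    """Replace failing values of `field` with the mean of the clean ones."""
    clean = [r[field] for r in records if field not in failures[id(r)]]
    mean = sum(clean) / len(clean)
    for r in records:
        if field in failures[id(r)]:
            r[field] = mean

# One range-check edit rule, expressed as a predicate on the value.
edit_rules = {"turnover": lambda v: v is not None and 0 <= v <= 1_000_000}

records = [{"turnover": 120.0}, {"turnover": -5.0}, {"turnover": 80.0}]
failures = {id(r): apply_edits(r, edit_rules) for r in records}
impute_mean(records, "turnover", failures)
print(records[1]["turnover"])  # prints 100.0, the mean of the clean values
```

Keeping the rules as data is what makes the tool generic: a new survey supplies its own rule dictionary rather than new code.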
Systems currently available include CSPro, Survey Solutions, and Blaise, as noted above and described in Chapter 15.7 – Specialist statistical processing/analytical software.
12.8.4 Seasonal adjustment and time series analysis
Provision of support for seasonal adjustment
As discussed in Chapter 10.3.1 β Methods of analysis, seasonal adjustment is a method widely used in official statistics for removing the seasonal component of a sub-annual (usually monthly or quarterly) time series. It includes pre-treatment, which involves detection and correction of outliers and calendar adjustment, i.e., removing trading day variations and moving holiday effects. In some cases, the original series may be differenced, i.e., a new series derived that comprises the differences between adjacent points in the original time series. The various choices made in setting up a seasonal adjustment (including pre-treatment) for a particular series are collectively referred to as model selection.
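The differencing operation described above, and a crude seasonal component, can be illustrated with an invented quarterly series. This is a deliberately naive sketch of the underlying idea; production adjustment relies on dedicated systems with full ARIMA modelling, pre-treatment and diagnostics.

```python
# Toy illustration of differencing and a naive seasonal component,
# using an invented quarterly series; real seasonal adjustment uses
# dedicated systems, not anything this simple.

series = [100, 120, 95, 110,   # year 1, quarters 1-4
          104, 124, 99, 114]   # year 2, quarters 1-4

# First differences: the change between adjacent observations.
diffs = [b - a for a, b in zip(series, series[1:])]

# Crude additive seasonal factors: each quarter's mean minus the
# overall mean (they sum to zero by construction).
overall = sum(series) / len(series)
seasonal = [sum(series[q::4]) / 2 - overall for q in range(4)]

# A naive "seasonally adjusted" series removes those factors.
adjusted = [y - seasonal[i % 4] for i, y in enumerate(series)]

print(diffs)  # prints: [20, -25, 15, -6, 20, -25, 15]
```

Note how the quarterly pattern repeats in the differenced series; it is exactly this repetition that the seasonal component captures and removes.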
Responsibility for seasonal adjustment rests with the subject matter area responsible for the series. The model selection and seasonal adjustment algorithms are complex and depend upon knowledge and use of a seasonal adjustment system. The subject matter staff are therefore typically helped by seasonal adjustment specialists, who are usually located within a methodology unit or an analysis unit, depending upon the organizational structure of the NSO. Their role is:
to decide upon the seasonal adjustment system(s) to be used across the NSO as a whole;
to acquire and install the seasonal adjustment system(s), to test it and to adjust it for each time series;
to determine the appropriate approach and system (if more than one is available);
to establish the initial seasonal adjustment model and parameter settings for the series;
to check that the model and settings work appropriately on the series before handing over the system to the subject matter area; and
periodically, to review the outputs, check the continuing applicability of the model and settings, and to make adjustments if need be.
Support may also be provided from outside the NSO. For example, Eurostat provides a free seasonal adjustment remote helpdesk.
Seasonal adjustment guidelines, tools and systems
The ESS Guidelines on Seasonal Adjustment (Eurostat, 2015) provide a comprehensive description of all aspects of seasonal adjustment, including pre-treatment and model selection.
There are multiple seasonal adjustment systems available, of which the most commonly used are listed below and described in Chapter 15.7 – Specialist statistical processing/analytical software.
X-12-ARIMA, developed by the US Census Bureau;
TRAMO-SEATS, developed by the Department of Statistics of the Bank of Spain;
X-13ARIMA-SEATS, which combines X-12-ARIMA and TRAMO-SEATS, developed and supported by the US Census Bureau; and
JDemetra+, which also combines X-12-ARIMA and TRAMO-SEATS, developed by the Department of Statistics of the National Bank of Belgium for the ESS Seasonal Adjustment Group.
It is highly recommended that an NSO use the same seasonal adjustment system for all the series that are seasonally adjusted. This allows staff to become familiar with the system. In any case, a single system may allow more than one approach to seasonal adjustment. To the extent possible, the same approach should be used for all series. However, there may be a case for using different approaches in different domains.
12.8.5 Confidentiality and disclosure control
Provision of support for confidentiality and disclosure control
As discussed in Chapter 3.2.6 – Principle 6 – Confidentiality, Principle 6 of the Fundamental Principles of Official Statistics states:
“Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes.”
Confidentiality is ensured through measures such as:
protecting questionnaires during data collection and when in transit – as discussed in Chapter 15.2.17 – Data security;
requiring all employees to swear an oath not to disclose confidential information;
restricting access to buildings and servers with confidential information – as discussed in Chapter 16.3 – Building security;
implementing confidentiality checking and disclosure control procedures – as discussed in Chapter 10.3.1 – Methods of analysis.
The first three items above are security issues. This section focuses on the fourth item, support for confidentiality checking and disclosure control.
Ensuring that there is no disclosure of confidential data in the output tables is the responsibility of the subject matter staff responsible for the statistical production process. However, these staff may not have the specialized skills required for confidentiality checking and preservation and may well draw on specialist support in selecting and using the appropriate tools. The specialists are usually located in a methodology unit or an analysis unit, depending upon the NSO's organizational structure. The support services typically include:
specification of appropriate confidentiality checking and disclosure control procedures and identification and (if need be) acquisition of a corresponding confidentiality checking and disclosure control tool, for use throughout the NSO;
and, for each set of output tables from a statistical production process:
support to subject matter area in their conduct of confidentiality checking and disclosure control; and
periodic verification of confidentiality preservation effectiveness.
Confidentiality checking and disclosure control tools
Identifying and preventing disclosure is not a process that can readily, or should, be done manually. First, the tables are typically too complicated and/or voluminous. Second, manual processing is inefficient when the job can be much more readily done by automated processing. Thus, an NSO should either acquire a confidentiality checking and prevention tool as an element of its common statistical infrastructure or develop its own tool. Acquisition is recommended wherever possible to save development costs and to be more certain that the tool does the job properly. However, as confidentiality checking and preservation tools are not readily available commercially (there being very little demand for such tools outside the realm of official statistics), acquisition is likely to be from another NSO. Two well-known examples are as follows:
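As a minimal illustration of what such a tool automates, the sketch below applies a simple minimum-frequency rule for primary suppression. The threshold and table are invented for the example; real tools also apply dominance rules and select an optimal set of complementary cells for secondary suppression, which this sketch omits.

```python
# Hypothetical sketch of a minimum-frequency rule for primary cell
# suppression; the threshold and table values are invented.

MIN_CONTRIBUTORS = 3  # assumed confidentiality threshold

def primary_suppress(table):
    """Suppress (replace with None) the value of any cell with fewer
    than MIN_CONTRIBUTORS contributing units."""
    return {cell: (value if count >= MIN_CONTRIBUTORS else None)
            for cell, (value, count) in table.items()}

# Each cell maps to (published value, number of contributing units).
table = {("NSW", "mining"):  (540.0, 12),
         ("NSW", "fishing"): (75.0, 2),   # too few contributors
         ("VIC", "mining"):  (310.0, 8)}

protected = primary_suppress(table)
print(protected[("NSW", "fishing")])  # prints: None
```

On its own, primary suppression is insufficient: if the row and column totals are published, a suppressed cell can be recovered by subtraction, which is why the secondary (complementary) suppression performed by tools like those below is essential.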
ARGUS, Statistics Netherlands
As described in the ARGUS Users' Manual Version 3.3, the purpose of τ-ARGUS is to protect tables against the risk of disclosure. This is achieved by modifying the tables so that they contain less detailed information. Several modifications of a table are possible: a table can be redesigned, meaning that rows and columns can be combined, and/or sensitive cells can be suppressed together with any additional cells required to protect against residual disclosure. The cells needed for this additional protection can be selected in some optimal way (secondary cell suppression). A twin application, µ-ARGUS, protects microdata files. Both applications have been rewritten as open source.
G-Confid, Statistics Canada
As described in GβConfid: Turning the tables on disclosure risk, 2013 (π), cell suppression is the technique used to protect tabular economic data in the disclosure control application G-Confid, developed at Statistics Canada. It is a generalized system that can deal with potentially voluminous multi-dimensional tables and incorporate new approaches. Its main objective is to provide the appropriate protection level for confidential cells while minimizing the loss of information resulting from the process.
G-Confid features a suite of three SAS components for use with tabular economic data at various aggregation levels. PROC SENSITIVITY identifies cells requiring primary suppression. The macro SUPPRESS protects the cells identified by PROC SENSITIVITY by selecting an optimal set of cells for complementary suppression using a linear programming algorithm. The macro AUDIT validates any suppression pattern that was not produced by the macro SUPPRESS, or that the user altered after running it. An additional macro, AGGREGATE, provides further information about sensitive unions of cells, and another auxiliary macro, REPORTCELLS, provides a visual snapshot of the suppression pattern to facilitate the creation of output tables of the economic data under study.