Chapter 7 Dissemination and related topics: security, documentation, metadata and archiving

7.1 Confidentiality and security

7.1.1 Confidentiality principles
415. Censuses collect information on each person and household in the country or territory. As described in Chapter 3, census data may be collected or compiled through direct enumeration, through a compilation of information from registers and administrative sources, or through a combination of these. A census does not focus on the specific details about individuals, but rather on providing statistics about the community as a whole and groups within the community. The public, therefore, has a right to expect, and needs to be assured, that when personal information is provided in confidence, this confidence will be respected. Names, addresses and personal identification numbers (PINs) should be separated from other data as soon as possible in the census process, and not released, so that the output data contains no personal identifiers. The confidentiality requirement encompasses the whole census operation, ranging from the security of any personal information processed (such as completed census questionnaires or individual information from population registers or administrative data sources) to the protection of the information contained in the outputs and made available publicly.
416. Assurances should be given to the public that all the personal information (such as names, detailed address information, government identification numbers, or internal linkage keys) will be treated in strict confidence by the census authorities and by any person who is employed by, or provides a service to, the census authority for the purposes of carrying out the census. Many countries have domestic legislation that protects such information, in the form either of specific census legislation or of more general legislation relating to statistical confidentiality or personal data protection and freedom of information. International human rights law also places an obligation on state actors to protect privacy.
417. The following additional principles should govern the treatment of the information obtained from any part of the census processes:
(a) Only persons under the management of the census authorities, or agents acting on their behalf, should have access to any census-relevant personal information obtained;
(b) Privacy of individuals must be given primary importance. Census-relevant personal information must therefore be processed in such a way that it will not reveal any personal information to the public. Additionally, when information is collected directly from respondents, individual household members should be given the opportunity, if they wish, to give personal information on a separate questionnaire in a way that will not reveal it to others in their household or to an enumerator. In censuses compiled using data from administrative sources, personal information should be stored separately from linkage keys, kept confidential and not revealed in the dissemination of results;
(c) All members of the census organization and outside agents providing services to the census authority in connection with the census should be given strict instructions, and be required to sign legal undertakings, about confidentiality. They should be liable to prosecution for any breaches of the law;
(d) The physical security of any documents or digital data stored for census production and containing personal information held by the census authorities, by field staff or by authorized agents, should be strictly enforced and, if deemed necessary, reviewed independently;
(e) Any mode of accessing or processing personal census data should have strict safeguards to prevent unauthorized access;
(f) In releasing statistics from the census, sufficient steps should be taken to prevent the inadvertent disclosure of information about identifiable individuals and households. Special precautions may apply to statistical outputs for small areas;
(g) All data collected during the census that are no longer required for processing or analysis and contain personal information should be securely destroyed as soon as there is no longer a legal or operational need for their retention (see section 7.4), in order to prevent any possibility of subsequent unauthorized access or misuse. The destruction process should be documented and, if necessary, independently supervised.

7.1.2 Statistical disclosure control
418. Statistical disclosure control (SDC) is the process used to protect statistical data in such a way that they can be released without revealing confidential information that can be linked to specific individuals or entities. The key objective of any SDC methodology should be to ensure a sufficient level of protection with minimum information loss and therefore maximum data utility retained. The SDC literature provides for a variety of standard risk and utility indicators to assess risk versus utility systematically and inform the choice of the most appropriate SDC setup for the country’s data release programme.
419. Measures to prevent the disclosure of tabular data may include some, or all, of the following procedures:
(a) Restricting the number of output categories into which a variable may be classified, such as aggregated age groups rather than single years of age, particularly for the older ages (“global recoding”);
(b) Where the number of people or households in an area falls below a minimum threshold, suppressing statistical output (“local suppression”) – except, perhaps, for basic headcounts – or merging it with that for a sufficiently large neighbouring area;
(c) Adding “noise” to the microdata records before producing tables (“pre-tabular noise”). For example, swapping some unit record characteristics of the most risky records by finding a match in the microdata based on a set of predetermined matching variables, and swapping all or some of the other variables between the matched records (“targeted record swapping”);
(d) Adding “noise” to the tables produced (“post-tabular noise”). For example, rounding cell values up or down to the nearest multiple of the predefined rounding base (conventional rounding) or adding noise of limited magnitude in a controlled and consistent way across tables (controlled noise injection, for instance with the “cell key method”).
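As a minimal illustration of three of the measures listed above (global recoding, local suppression for small areas and conventional rounding), the following Python sketch may be helpful. The threshold of 10, the rounding base of 5, the age bands and all function names are assumptions chosen for this example only; they are not recommended values and do not describe any particular NSO's scheme.

```python
from collections import Counter, defaultdict

# Illustrative parameters only; real thresholds and bases are set by each NSO.
MIN_AREA_POPULATION = 10
ROUNDING_BASE = 5


def recode_age(age):
    """Global recoding: five-year age bands, with all ages 85 and over merged."""
    if age >= 85:
        return "85+"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


def round_to_base(count, base=ROUNDING_BASE):
    """Conventional rounding to the nearest multiple of the base (ties round up)."""
    return ((2 * count + base) // (2 * base)) * base


def protect_tables(records, threshold=MIN_AREA_POPULATION, base=ROUNDING_BASE):
    """Build one age-band table per area from (area, age) pairs.

    Areas whose population falls below the threshold are suppressed
    entirely (returned as None); all other cell counts are rounded.
    """
    by_area = defaultdict(Counter)
    for area, age in records:
        by_area[area][recode_age(age)] += 1

    protected = {}
    for area, table in by_area.items():
        if sum(table.values()) < threshold:
            protected[area] = None  # local suppression of the whole area table
        else:
            protected[area] = {
                band: round_to_base(n, base) for band, n in table.items()
            }
    return protected
```

In practice a suppressed area would more often be merged with a sufficiently large neighbouring area, or its basic headcount retained, as noted in (b); the sketch omits those options for brevity.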
420. In the release of census microdata (such as microdata under contract or public use files) it is important to ensure the removal of all information from databases relating to name, address and any unique characteristics that might permit the identification of individuals. Microdata sets for scientific use allow for complex analysis beyond what is possible using the published tables, visualizations and indicators. Constructing a microdata sample, thereby providing access to only a fraction of the total population, adds a layer of protection while preserving information about population-level characteristics. In addition, applying global recodes and local suppressions to the microdata can diminish the risk of disclosure. Perturbing the microdata or making use of synthetic methods in a targeted way may also help to protect confidential information. Perturbation and synthetic approaches should be used within bounds and tested rigorously, as they have the potential to reduce the accuracy and utility of the microdata.
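The construction of a household-level microdata sample with direct identifiers removed can be sketched as follows. This is an illustrative sketch only: the field names ("name", "address", "pin", "household_id"), the sampling fraction and the use of simple random sampling are assumptions made for the example, and a real release would additionally apply the recoding, suppression or perturbation steps described above.

```python
import random

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "address", "pin"}


def household_sample(persons, fraction, seed=0):
    """Draw a simple random sample of households and keep all their members.

    persons: list of dicts, each with a "household_id" field (assumed name).
    Direct identifiers are dropped from every released record.
    """
    rng = random.Random(seed)
    household_ids = sorted({p["household_id"] for p in persons})
    n_selected = max(1, round(len(household_ids) * fraction))
    selected = set(rng.sample(household_ids, n_selected))

    released = []
    for person in persons:
        if person["household_id"] in selected:
            # In a real file the household identifier would be replaced by
            # a non-informative sequence number before release.
            released.append(
                {k: v for k, v in person.items() if k not in DIRECT_IDENTIFIERS}
            )
    return released
```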
421. A systematic assessment of risk versus utility may indicate that a combination of several protection measures is most suitable for a given data release programme. Moreover, it is important to apply any SDC scheme consistently across the entire statistical output, as inconsistencies typically entail additional disclosure risks. Some of the noise-based methods mentioned in paragraph 419 were developed specifically to overcome known shortcomings of more traditional methods, such as excessive information loss and loss of consistency of local suppression in large table sets.
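The consistency property of controlled noise injection can be illustrated with a much simplified sketch of the idea behind the cell key method: each record carries a fixed random key, the keys of the records in a cell are combined into a cell key, and the noise is looked up from that cell key, so that the same group of records receives the same noise in every table in which it appears. The key range, the lookup rule and the function names below are assumptions for illustration; operational implementations use carefully designed perturbation tables that also depend on the cell value.

```python
import random

KEY_RANGE = 256  # assumed size of the record-key space


def assign_record_keys(n_records, seed=2030):
    """Give every microdata record a fixed pseudo-random integer key."""
    rng = random.Random(seed)
    return [rng.randrange(KEY_RANGE) for _ in range(n_records)]


def cell_key(record_keys):
    """Combine the keys of the records contributing to a cell."""
    return sum(record_keys) % KEY_RANGE


def noise_for(key):
    """Hypothetical lookup: map a cell key to noise of limited magnitude."""
    if key < KEY_RANGE // 4:
        return -1
    if key < 3 * KEY_RANGE // 4:
        return 0
    return 1


def perturbed_count(record_keys):
    """Return the cell count plus the noise determined by its cell key."""
    keys = list(record_keys)
    if not keys:
        return 0  # empty cells stay empty in this sketch
    return max(len(keys) + noise_for(cell_key(keys)), 0)
```

Because the noise depends only on which records fall in the cell, the same cell requested through two different tables or queries yields the same published value.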
422. Interactive publication tools are becoming increasingly popular (see paragraph 426). Where these are used, it is important that the SDC scheme can be automated and integrated into these tools. Very flexible tools that serve highly customizable user requests by direct queries to the complete underlying microdata entail specific additional disclosure risks, for instance from scripted massive and systematic querying attacks. Dedicated SDC schemes are typically needed in such cases.
423. High geographical detail is one of the unique features of census outputs in many countries, reflected in the increasing trend towards releasing more data pertaining to the smallest geographical units, nowadays often complemented by grid products (see section 8.9). However, releasing data on very small and non-overlapping geographical units entails specific additional disclosure risks (such as “geographic differencing”, for example between administrative and grid boundaries) and quality constraints (such as mapping accurately the spatial distribution of population or other indicators), which should be accounted for in the SDC scheme. For example, post-tabular noise methods have been found effective by several countries during their preparation of the past census round.
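Geographic differencing can be illustrated with a small sketch. Suppose exact counts are published both for an administrative area and for a set of grid cells that covers almost all of it: each count is harmless on its own, but their difference isolates the few persons living in the uncovered remainder. The function names, the threshold of 10 and the representation of areas as sets of person identifiers are assumptions made for this example.

```python
def differencing_residual(outer_units, inner_units):
    """Units in the outer area that are not covered by the inner area."""
    return set(outer_units) - set(inner_units)


def is_differencing_risk(outer_units, inner_units, threshold=10):
    """Flag a pair of published areas whose difference isolates a small group."""
    residual = differencing_residual(outer_units, inner_units)
    return 0 < len(residual) < threshold


# Hypothetical example: 500 residents in a municipality, of whom 497 live
# in the grid cells that lie wholly inside its boundary.
municipality = set(range(1, 501))
grid_cells_inside = set(range(1, 498))
at_risk = is_differencing_risk(municipality, grid_cells_inside)
```

A check of this kind could be run over pairs of output geographies when designing the SDC scheme, to identify where additional protection is needed.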
424. Irrespective of the specific SDC scheme adopted, it is essential to explain to the users why and how these methods are applied and describe the broad properties of the scheme so that they are aware of the implications for output data. This holds in particular for SDC methods where released data look like untreated data, or for methods that may lead to a limited loss of internal additivity of tables. Educational efforts in this regard should start with the NSO staff responsible for efficient data protection; this knowledge should then be shared with data users and taught to people who handle statistics, as well as to data managers in various institutions and economic entities.

7.2 Dissemination
425. A census is not complete until the information collected is made available to users in a form, and to a timetable, suited to their ever-changing needs. Thus, in disseminating the results of a census, much emphasis should be put on responsiveness to users’ needs (see section 6.3) and on high standards of quality in the production of statistics (see Chapter 9). In accordance with the Fundamental Principles of Official Statistics, census results should be disseminated simultaneously to all users, and the greatest care should be exercised to avoid the inadvertent disclosure of information about identifiable individuals. Various statistical measures can be applied to protect confidentiality (see paragraphs 418–424).
426. There are several conventional ways of making the results of a census available to users:
(a) As reports (either in hard copy or, more commonly, as digital media) containing standard and pre-agreed tabulations, usually at the national, regional or local district area level;
(b) As ad-hoc reports comprising the standard census data but disaggregated by other sub-groups not previously published (a contribution towards the cost of production may be paid by the requester);
(c) As data products available online through NSOs’ websites or other electronic media. These data products may range from aggregated to micro databases, available for online processing or free download, optionally equipped with dynamic or interactive data visualization tools including mapping features to enhance the value of the statistics;
(d) As commissioned or customized outputs produced from a database, automated table builder or statistical data service, comprising customized cross-tabulations of variables not otherwise available from standard reports or abstracts. These should conform to the same statistical disclosure controls as those applied to standard outputs; and
(e) As microdata (often referred to as public use files), usually available only as a sampled fraction of the total population (often sampled at the household level to include individuals from the household) with SDC methods applied (see paragraphs 418–424). To balance the increased risk of disclosure when providing microdata, a combination of additional controls may be applied: potential data users are often screened; data are made available in a restricted format only; and data are often supplied or accessed under secure and strictly-controlled conditions where thorough steps have been taken to protect the confidentiality of the data.
427. Where customizable dissemination tools are not available or not sufficient to provide specific tabulations required by only a few users, such as certain government offices or specialized research organizations, these can be supplied on demand. Once produced, however, there should be no restriction on making them available publicly.
428. In many countries, the cost-effectiveness of printed census products is diminishing. Physical copies cannot reach as many users as an online publication and the opportunities for enhanced communication features are reduced in comparison to online products. Traditional publications, especially in printed form, are nearing complete obsolescence in many countries. While print publications can provide coherent and consistent commentary on individual topics and therefore may suit particular users or markets, users nowadays generally expect interactive, dynamic, digital forms of dissemination.
429. Data publications in electronic formats online should provide users with easy means of data retrieval in standard formats or through application programming interfaces (APIs), ideally complemented by user-friendly interactive functionalities to customize outputs. Immediate usability of the data tables online is the most important feature to ensure the searchability and relevance of the census information. International standards for metadata, such as the Statistical Data and Metadata eXchange standard (SDMX), should be used when formatting output databases. Dissemination strategies should also be harmonized with any national government policies on open data.
430. Online tools for ordering, specifying, customizing and receiving census data products and public use samples (microdata) should be developed wherever possible, ensuring that appropriate measures are in place to protect the statistical confidentiality of the data and the security of transmission. In the design of census outputs, consideration should be given to all current and emerging forms of technology used by intended audiences, such as smart phones and other portable devices (for more on the role of technology in dissemination, see section 5.7). It is also important to consider the flexibility of the content that users will need. Trends change quickly and a comprehensive set of tables or products that can ensure rapid access to data is critical for the relevance of the census information. Online table builders allow users to customize their requests and query the database directly for immediate results. This reduces both the need for a very large number of pre-determined data tables and the requirement for users to conduct lengthy searches to see whether the information they seek has been published and to locate it.
431. Social media have become an increasingly popular and effective means of disseminating small amounts of census output, particularly to non-specialist users or for disseminating timely results relevant to events or important dates. The various social media platforms currently available permit text, images and infographics to be used to highlight the usability of census outputs. The use of such media often demonstrates an NSO’s commitment to engage and establish a dialogue with users in order to respond more readily to their questions and concerns (see section 6.5.2).
432. Online access or dissemination of micro- and macro-databases on digital media can contribute greatly to an enlargement of the user base and thus to a greater demand for census data. It is important to weigh these advantages against two potential limitations:
(a) Some cross-tabulations may have particular quality issues because of non-response, sampling or processing errors, or as a result of processing or imputation procedures. Census authorities should establish procedures for warning potential users about such problems to help safeguard the credibility of the entire census. Some NSOs suppress the release of certain cross-tabulations for reasons related to substantive quality, although such a policy may alienate users. Other NSOs release such cross-tabulations in accordance with a clear policy that takes into account both substantive and technical considerations;
(b) Some detailed cross-tabulations, and all files with individual records, pose the risk of disclosing information about identifiable individuals, in violation of the rules on census confidentiality. This issue is discussed more fully in paragraphs 415–424.
433. Both the substantive quality and confidentiality issues need to be addressed, and appropriate safeguards established. With such safeguards in place, neither issue should prevent the dissemination of a wide range of census products.
434. A range of products should be made available to meet the constantly-evolving requirements of users. There is likely to be a need for most or all of the following (specific needs should be determined through appropriate consultation rather than assumed):
(a) National, regional and local area summaries;
(b) Reports on key findings on particular topics, supplemented by detailed results and analyses, either in a standard form for areas down to the more local geographic levels, or as more detailed disaggregated statistics on particular topics and populations;
(c) Population profiles or key summary statistics for small areas and small population groups;
(d) Spatial and graphical analyses, including a census atlas, possibly complemented by interactive graphical analysis or mapping tools allowing for user-customized analysis;
(e) Value-added products such as area and household classifications;
(f) Supplementary metadata covering definitions, classifications, and coverage and quality assessments. The specific metadata will differ for enumeration-based, combined and register-based censuses, but should be fully comprehensive no matter which census design is used.
435. The dissemination of census results should adhere to a predetermined, structured plan, including a pre-announced timetable. National figures should be released promptly according to the schedule, followed by the subsequent release of disaggregated data after a comprehensive processing phase, all in accordance with the established plan. Timeliness is a significant concern in the release phase, and efforts to shorten the release period should be made while ensuring the integrity of the data (for more on timeliness and punctuality, see Chapters 1 and 9).
436. The initial release of population counts is generally awaited with great anticipation among a wide range of users including central government, policymakers, the media and the general public. Thus, some countries release provisional results very soon after enumeration is completed. Provisional results, which provide a general picture of population trends, are subject to change once the full data processing and verification operations have been completed. Data users should be made aware of the implications of using provisional population counts, which may differ substantially from the finally produced and validated figures. The nature of provisional results should be very clearly communicated to the media to avoid creating a mistaken impression that the final figures constitute a correction of earlier “errors”.
437. The schedule and description of upcoming releases of final results and products should be made public early in the census process. This can help to maintain public interest in the census as well as to manage expectations of users. Release schedules should be planned carefully and conservatively to enable NSOs to meet them to the greatest extent possible, and any changes that do occur should be communicated to the public as soon as possible. The releases can be staggered, from simple, short descriptive summaries covering a country’s major geographical divisions initially, to more comprehensive cross-tabulations and descriptive thematic and analytical reports later on.
438. Census data should, as far as possible, be provided to the public free at the point of access or delivery, in accordance with the first Fundamental Principle of Official Statistics (relevance, impartiality and equal access). In cases where charges are necessary (such as for customized or commissioned outputs), such charges should be set to make access to the results affordable to all types of users. Flexible interactive publication tools that process customized data queries automatically (see paragraph 426) may help to address custom user needs efficiently and thus increase the value of free-of-charge data releases. Online dissemination options have reduced reliance on extensive paper outputs (see paragraph 428). However, there may still be specific needs for NSOs to provide a paid print-on-demand service to supply census material to users who are unable to access or receive digital copies. Such users should not be disadvantaged by the lack of paper-based output.
439. Products should be developed which allow statistical and geographical information to be delivered together with GIS and the use of APIs to meet the broadest possible range of needs, and with as much flexibility and inter-connectivity as possible, within the constraints imposed by the requirement to ensure confidentiality (see paragraph 423). Some desirable properties of products include:
(a) Users should be able to find information quickly and simply;
(b) NSOs will increase the value of their census data greatly by embedding graphic and mapping capabilities within products. Ideally, users should be given the possibility to generate graphs and maps easily for themselves, and then to print, save and download them, making them available for further use. Several countries have found that cooperation with commercial agencies is valuable for production of such products (see section 4.4);
(c) Open data formats should be prioritized, allowing the integration of census information with other databases, thus widening the range of ways in which the data can be used.
440. Thematic mapping and data visualization have become important elements of dissemination of outputs. They appeal to NSOs because of their ability to engage users and to increase the reach of census data. Data visualization is a broad field, with content and structures ranging from simple infographics to sophisticated tools for multi-dimensional data analysis. Data visualization may present difficulties for some census agencies due to high costs, infrastructure requirements and the need for compliance with individual countries’ technical requirements (e.g. accessibility, common look-and-feel). The skills required for effective visualization may be in short supply, and there may be problems with dedicating sufficient resources to developing such skills, especially given widespread budgetary constraints. However, users are now expecting web content to be visual, engaging and personalized, so NSOs should give high priority to developing a data visualization capability.
441. Since creating data visualizations can entail high production costs, NSOs should attempt to achieve a balance between producing these and producing the more traditional tabular data products, which not only have lower production costs but are preferred by some user groups. While there is a strong trend towards expanding visualization programmes in countries of the UNECE region to make census outputs more accessible for the general public, it is important for NSOs to continue to serve key users who prefer to use very rich raw tabular data for their own analyses. In the latter case, efficient and user-friendly online data access interfaces are important (e.g. APIs; see paragraph 429) and automated online self-service tools (e.g. table builders; see paragraph 430) may support more efficient and targeted dissemination. A thorough understanding of user segments and their specific needs is therefore required. Dedicated research should be undertaken to better understand end users’ data needs (for more on user segmentation see Chapter 6). There is no one-size-fits-all solution and many NSOs account for this by maintaining dedicated units or teams responsible for data visualization technology (whether census-specific or NSO-wide).

7.3 Documentation and metadata
442. An important component of any country’s programme for disseminating the results of its census is a comprehensive portfolio of supporting documentation and metadata to help explain, clarify and enhance the value of the statistical outputs. Documentation and metadata should enable users to draw valid comparisons with data from previous censuses or other data sources.
443. A metadata system provides supplementary information on characteristics of the census data. An NSO’s metadata system should be based on international standards, while meeting any specific national requirements. Since a census and its results are often closely connected with other statistical activities, it is recommended for the census metadata system in each country to use the same elements as the metadata system of the NSO as a whole. It is often necessary, however, for the census metadata to contain some elements that are used only for that census. The metadata system should also facilitate the widest possible international comparability of data.
444. The metadata (as well as the metadata system) used for the 2030-round censuses should ensure comparability with data from previous censuses, while at the same time including any new elements arising from developments since the previous census. The metadata systems of individual NSOs should also reflect the extent to which they use data from direct enumeration and administrative data sources.
445. A metadata system should encompass, as a minimum:
(a) Definitions of terms and concepts used;
(b) Data dictionary or glossary of terms;
(c) Explanatory notes to the tables;
(d) Classifications and nomenclatures;
(e) The census questions (if the information was collected using direct enumeration);
(f) The purposes for which the information was collected, particularly in the case of administrative data;
(g) The data sources used, particularly where data are derived from administrative sources;
(h) Data collection methods – Description of the methods used for data collection (e.g., online questionnaires, field interviews);
(i) Data quality and error estimation – Information on potential sources of error, data validity and reliability.
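The minimum elements listed above could, for instance, be represented as a simple record together with a completeness check. The sketch below is purely illustrative: the class, its field names and the check are assumptions made for this example and do not correspond to SDMX or to any other metadata standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CensusMetadata:
    """Hypothetical container for the minimum metadata elements (a) to (i)."""
    definitions: dict = field(default_factory=dict)           # (a)
    glossary: dict = field(default_factory=dict)              # (b)
    explanatory_notes: list = field(default_factory=list)     # (c)
    classifications: list = field(default_factory=list)       # (d)
    census_questions: list = field(default_factory=list)      # (e)
    collection_purposes: list = field(default_factory=list)   # (f)
    data_sources: list = field(default_factory=list)          # (g)
    collection_methods: list = field(default_factory=list)    # (h)
    quality_information: list = field(default_factory=list)   # (i)
    direct_enumeration: bool = True  # whether element (e) applies


def missing_elements(meta):
    """List the minimum elements that are still empty."""
    missing = []
    for f in fields(meta):
        if f.name == "direct_enumeration":
            continue
        # Census questions are only expected where direct enumeration was used.
        if f.name == "census_questions" and not meta.direct_enumeration:
            continue
        if not getattr(meta, f.name):
            missing.append(f.name)
    return missing
```

A check of this kind could be run before each release to confirm that no required element has been left empty.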
446. For indicators for which international standard classifications have been created, those international classifications should be used. For indicators that cannot be classified by such international classifications, it may be necessary to create new nomenclatures. Supporting documentation may need to cover a wide range of specific issues such as: basic methodology, coverage, response rate, data sources, pilots and tests, derived variables, Internet responses, imputation, and post-enumeration surveys, as well as reports covering more general descriptions of the census operation as a whole and the quality of the data. As a minimum, countries should produce quality and coverage measurements, such as response rates (nationally and locally) and levels of data imputation (for the data source as a whole and for individual topics). More details about recommended documentation of coverage and quality are given in Chapter 9.
447. Methodological reports are particularly important in cases where the methodology has changed since the previous census (such as moving from a census based entirely on direct enumeration to a wholly or partially register-based approach). Such changes are likely to affect the definitions and concepts used, and hence the comparability between censuses. Such limitations of comparability should be made very clear to users through the methodological reports.
448. Census authorities should consult with stakeholders to understand their metadata requirements, including their varying levels of statistical literacy and ability to interpret the metadata. Consultations should include users from diverse backgrounds, including government agencies, researchers, policymakers and the public. Feedback could be gathered in a range of ways, including structured surveys, focus groups, or other consultations (see section 6.3). This feedback should be used to refine metadata descriptions, making them more intuitive and comprehensive.

7.4 Archiving and access to closed census records
449. The census is a special statistical data source, which in many cases offers comparable information covering regular intervals over a period of as much as 100–150 years. Census data is valuable not only for present-day decision makers and users, but also for future generations. The NSO or national archiving agency has the large responsibility to handle, archive and store this unique data source, maintaining this special historical picture of society for the future.
450. Many countries retain the census information relating to individual persons and households only for as long as it is required for data processing and the production of the statistical results, or until the entire census operation is completed. However, the scientific, historical and genealogical value of the individual records should not be underestimated when considering the overall costs and benefits of the census. NSOs may decide to allow access – whether public or restricted – to the full set of census records after a specified period of closure. If countries do intend to retain the records for research in this way, they should ensure that there is a robust legal and physical framework in place to protect the security and confidentiality of the records until they become open to the public.
451. National governments should recognize that the ability of NSOs to collect information from the general public in any future censuses or surveys may be seriously compromised if assurances given about the confidentiality of the information collected are not honoured. Public confidence in the security and confidentiality of the personal information processed for the census should therefore be treated as paramount.
452. Closure of census records should extend to cover a period that is sufficient to protect the confidentiality of the information, particularly any sensitive information, about living people, or at least to minimize the risk of breaching such confidentiality. The period of closure in many countries is prescribed specifically by statute and varies from country to country. Other countries may rely on more general provisions within data protection and freedom of information legislation to keep confidential records closed until the risk of disclosure of personal information about living individuals has expired. A period of 100 years is therefore recommended, although with increasing life expectancy, countries may consider extending this threshold according to national circumstances.
453. In addition to requests for public access, NSOs may also receive requests from other government agencies for access to census records for the purpose of validating or corroborating existing information when historical records are sparse or non-existent. Access to these records should be considered when compelled by law or if it clearly serves the public good.
454. In addition to archiving the census records (for those countries that do so) it is important for all countries to ensure the preservation of, and easy access to, all the metadata and procedural or operational material, including all project management documentation, created during the entire census process. Not only does this provide a valuable audit trail when evaluating the success and effectiveness of the census, but it will also enable future census planners to learn from the successes achieved, and the challenges faced, by their predecessors. In doing so, countries should ensure that, as technology develops rapidly, the media and systems on which this valuable information is archived are reviewed regularly to ensure that it can be retrieved readily whenever it may be required in future years, perhaps as much as 20–50 years hence.