
5.4 Management
219. Census operations involve a range of administrative processes that are common to other large-scale projects. For example, planning a complex operation such as the census would benefit from project planning software, and from systems and applications for recruiting and paying large numbers of temporary census enumerators. The NSO should consider how technology might improve the efficiency and effectiveness of these operations. This can help contain the cost of the census and improve its overall quality by allowing resources to be focused on the primary tasks of enumeration, processing and dissemination.
220. Technology solutions are now available that combine multiple field management functions. For example, Customer Relationship Management (CRM) platforms can provide field application management, a front-end website, enhanced communication through chatbot functionality, and a citizen helpdesk supported by knowledge management.
221. Modern technologies provide opportunities to improve the management of field operations and thus the quality of the census itself. Multi-modal collection operations require that timely information be provided to census enumerators so that they do not visit households that have already submitted a census form. This is both an efficiency issue and a public relations issue.
222. While the key issue is the flow of timely information to the census enumerator, the same systems should also provide for a near real-time, two-way flow of information between census managers and enumeration staff. Such monitoring of enumerator work allows more timely intervention where data collection is falling behind schedule or there are problems with the quality of the data collected.
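As an illustration of this kind of monitoring, the sketch below flags enumeration areas whose share of completed forms lags the elapsed share of the collection period. It assumes, purely for illustration, that progress is expected to be linear over time and uses a hypothetical tolerance of 10 percentage points; AreaProgress and areas_behind_schedule are hypothetical names, not part of any census system.

```python
from dataclasses import dataclass


@dataclass
class AreaProgress:
    area_id: str
    dwellings: int      # dwellings assigned to the enumeration area
    completed: int      # forms received so far, from any channel
    days_elapsed: int
    days_total: int


def areas_behind_schedule(areas, tolerance=0.10):
    """Return IDs of areas whose completion rate trails the elapsed share of
    the collection period by more than `tolerance` (hypothetical threshold).
    Assumes linear expected progress, which is a simplification."""
    flagged = []
    for a in areas:
        expected = a.days_elapsed / a.days_total
        actual = a.completed / a.dwellings if a.dwellings else 1.0
        if expected - actual > tolerance:
            flagged.append(a.area_id)
    return flagged
```

In practice the expected curve would more likely be derived from test censuses or previous rounds than assumed to be linear.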
223. The NSO may need to rely on external organizations for key parts of the solution. Regardless of whether these systems are internal or external, they must adhere to internationally (e.g. the ISO/IEC 27000 family), regionally (e.g. NIS2 in the European Union) or nationally agreed cybersecurity standards. This is especially important because software for the various external and internal operations (collection, processing, geographical information systems, imputation, dissemination) may come from a variety of sources (e.g. one-stop-shop, proprietary or custom-developed solutions).
224. A key factor in reducing risk is the relationship with technology partners (contractors). Strong governance, with the in-house census management team remaining at the core, is critical to ensuring that census systems are designed, implemented and delivered successfully.
225. A census management infrastructure should contain the following elements:
(a) a register of dwellings with addresses and geospatial coordinates, in which all addresses are distributed by enumeration areas;
(b) a register of enumerators and their contact details, with the possibility of linking each enumerator to an enumeration area (from the register of dwellings) and the addresses it covers – if this data collection method is used;
(c) a register of devices for data collection (for example, tablet computers or smartphones) and their unique serial numbers with the possibility of linking them to the enumerators (from the register of enumerators) – if this data collection method is used;
(d) a central census data storage facilitating the collection, processing and accumulation of all data received through all data collection methods (online responses, enumerators' data, data from administrative sources), linked to the respondent's residential address (from the register of dwellings);
(e) communication software to enable timely exchange between enumerators, supervisors and the census management team.
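The links between the registers in (a) to (c) can be sketched as a small data model: a device points to an enumerator, the enumerator to an enumeration area, and the area to its dwellings. This is a minimal in-memory illustration with hypothetical class and field names; a real implementation would sit in a database with access controls.

```python
from dataclasses import dataclass, field


@dataclass
class Dwelling:
    dwelling_id: str
    address: str
    lat: float
    lon: float
    area_id: str  # enumeration area to which the address is distributed


@dataclass
class CensusRegisters:
    dwellings: dict = field(default_factory=dict)          # dwelling_id -> Dwelling
    enumerator_area: dict = field(default_factory=dict)    # enumerator_id -> area_id
    device_enumerator: dict = field(default_factory=dict)  # device serial -> enumerator_id

    def add_dwelling(self, d):
        self.dwellings[d.dwelling_id] = d

    def assign_enumerator(self, enumerator_id, area_id):
        self.enumerator_area[enumerator_id] = area_id

    def issue_device(self, serial, enumerator_id):
        if enumerator_id not in self.enumerator_area:
            raise ValueError("enumerator must be assigned to an area first")
        self.device_enumerator[serial] = enumerator_id

    def replace_device(self, old_serial, new_serial):
        """Re-link the enumerator to a replacement device (e.g. after a loss)."""
        self.device_enumerator[new_serial] = self.device_enumerator.pop(old_serial)

    def workload(self, serial):
        """Addresses to load on a device: device -> enumerator -> area -> dwellings."""
        area = self.enumerator_area[self.device_enumerator[serial]]
        return sorted(d.address for d in self.dwellings.values() if d.area_id == area)
```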
226. All elements of this infrastructure should be interconnected and managed centrally using software and technology tools developed specifically for census purposes. It is recommended that devices be installed, configured and performance-checked in the census management centre before they are delivered to field personnel.
227. An integrated field communication system can use, and build on, existing IT infrastructure. This implies, for example, direct access to the national geocoding infrastructure that provides data on individual addresses. The geospatial information (addresses, buildings, cadastral parcels) should rely on permanent identifiers (see Chapter 8).
228. The central census data storage management tools provide the following functions:
(a) Downloading of forms completed online (if this data collection method is used);
(b) Verification of the forms’ suitability for processing (availability of answers necessary for the respondent's identification);
(c) Linking completed forms to the addresses of dwellings;
(d) Creation of a confirmation of successful participation in the census, such as a QR code, to be sent to the respondent (depending on how the completed form was received: to the online respondent's personal account, by email or by other means) and transmitted to the enumerator – if this data collection method is used;
(e) Downloading completed electronic questionnaires from the enumerators’ devices, from administrative sources and other data collection channels used in the census, linking them to the addresses of dwellings;
(f) Consolidation of versions of completed census forms obtained from all data collection channels used in the census relating to the same dwelling address – selection of the reference version, complementing it with data from other versions, removing duplicates;
(g) Calculation and visualization of statistics for monitoring the progress of the enumeration, to identify issues requiring an intervention or adjustment.
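Step (f), the consolidation of versions, can be sketched as follows: group versions by dwelling, pick a reference version, fill its missing items from the other versions, and keep one record per dwelling. The channel priority used here (online, then enumerator, then administrative, with the most recent submission breaking ties) is a hypothetical rule chosen for illustration; each NSO would define its own.

```python
# Hypothetical priority rule: lower number wins when choosing the reference version.
CHANNEL_PRIORITY = {"online": 0, "enumerator": 1, "admin": 2}


def consolidate(forms):
    """forms: list of dicts with 'dwelling_id', 'channel', 'submitted'
    (numeric timestamp) and 'answers' (dict of item -> value).
    Returns one consolidated record per dwelling, so duplicates are removed."""
    by_dwelling = {}
    for f in forms:
        by_dwelling.setdefault(f["dwelling_id"], []).append(f)

    result = {}
    for dwelling_id, versions in by_dwelling.items():
        # Reference version: highest-priority channel, then latest submission.
        versions.sort(key=lambda f: (CHANNEL_PRIORITY[f["channel"]], -f["submitted"]))
        reference = dict(versions[0]["answers"])
        # Complement missing items from the remaining versions, in priority order.
        for other in versions[1:]:
            for item, value in other["answers"].items():
                if reference.get(item) is None:
                    reference[item] = value
        result[dwelling_id] = {
            "answers": reference,
            "sources": [v["channel"] for v in versions],
        }
    return result
```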
229. Training and technical support for enumeration staff is an important issue. It should not be assumed that the people likely to be recruited as enumerators are technically competent. Technology can facilitate the provision of training and technical support to enumerators, in particular where this needs to be done remotely.
230. NSOs could consider remote-access technology to support flexible working arrangements of staff who process the census data while ensuring data security and confidentiality.

5.5 Direct enumeration

5.5.1 Internet
231. For direct enumeration, it is recommended that Internet response be offered as the first or preferred option.
232. Electronic questionnaires can reduce overall collection cost and improve data quality. As with other electronic data collection modes, processing time and cost are reduced compared with paper-based modes because the data are already formatted and can be uploaded or captured directly into databases. Data quality may also be higher because the instrument can contain built-in edits and prompts.
233. Using the Internet as the medium means that data are collected through self-enumeration rather than an interview. Self-administration over the Internet does not require a visit by an enumerator and thus eliminates any influence that enumerators may have on the responses. It also offers respondents more privacy, which may encourage them to review and correct their own answers.
234. The Internet option can be incorporated into any of the traditional methods of delivering and collecting census forms, such as drop-off/pick-up and mail-out/mail-back.
235. For all the reasons above, Internet response is clearly the preferable option for data collection in direct enumeration.
236. The key factor to be considered is managing the collection control operations – that is, ensuring that every household and individual is counted once and once only. This requires the ability to link each household and any individual within the household to its geographic location. Furthermore, if the design of enumeration additionally includes the collection of forms by enumerators, they must receive suitable and timely feedback to update their own collection control information so that they do not visit households that have already responded.
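A minimal sketch of the collection-control feedback described above: each enumerator's follow-up list is their assigned dwellings minus those for which a form has already arrived through any channel, and a simple check flags persons recorded more than once. The matching key for persons (for example name plus date of birth) is a hypothetical choice; real duplicate detection usually needs fuzzier matching.

```python
from collections import Counter


def follow_up_list(assigned, responded):
    """assigned: enumerator_id -> list of dwelling IDs in the workload.
    responded: set of dwelling IDs with a form received through any channel.
    Returns each enumerator's remaining visits, keeping the original order."""
    return {
        enumerator: [d for d in dwellings if d not in responded]
        for enumerator, dwellings in assigned.items()
    }


def duplicate_persons(records):
    """records: iterable of (dwelling_id, person_key) pairs, where person_key is
    a hypothetical matching key. Returns the pairs that occur more than once."""
    counts = Counter(records)
    return sorted(k for k, n in counts.items() if n > 1)
```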
237. The potential level of take-up of an Internet option should be considered by assessing the proportion of the population who can access the Internet from home, the proportion who use broadband services, or the general use of the Internet for other purposes such as banking, filing tax returns or shopping.
238. Systems and processes that allow census forms to be returned over the Internet will also need to be developed. These have been shown to save costs by reducing enumerator workloads, data capture, printing and postage.
239. Data security is a very important issue and should be a key consideration in designing the infrastructure. Physically separate infrastructures should be set up to collect and to process the census information. Completed individual census forms, after their collection and capture, should be moved into a secure data processing infrastructure that is separate from the collection infrastructure.
240. A standard census questionnaire that is downloadable from the Internet requires much less infrastructure than a form that is completed online. However, downloadable forms generally require a greater level of computer literacy than online forms. They will not necessarily work on different computer configurations and there will be an expectation that the NSO will be able to deal with each individual problem. Recent experience has shown that respondents generally prefer completing the form online. For these reasons, forms for online completion are recommended.
241. Adopting the Internet response option requires providing respondents with credentials (usernames, passwords) for accessing the online form. Methods of delivering the credentials include:
(a) Mailing the paper forms or letters
(b) Delivery by enumerator directly to the respondent’s address
(c) Sending by email
(d) Sending by Short Message Service (SMS)
(e) Using the credentials of online public service portals or other online services that require a personal identification number
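For methods (a) to (d), the NSO has to generate one access code per dwelling. The sketch below uses Python's secrets module to produce random codes from a hypothetical alphabet that omits easily confused characters (0/O, 1/I/L), which helps when codes are read from a letter or SMS. The code length and grouping are assumptions; in a real system the mapping would be kept on the NSO side and the codes stored only in hashed form.

```python
import secrets

# Hypothetical alphabet without easily confused characters (0/O, 1/I/L).
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def issue_access_code(length=12, group=4):
    """Generate a random access code such as 'K7MQ-2XPA-9RTD'."""
    raw = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return "-".join(raw[i:i + group] for i in range(0, length, group))


def issue_codes(dwelling_ids):
    """Assign a unique access code to each dwelling ID."""
    codes, used = {}, set()
    for d in dwelling_ids:
        code = issue_access_code()
        while code in used:  # collisions are very unlikely but still checked
            code = issue_access_code()
        used.add(code)
        codes[d] = code
    return codes
```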
242. An online form offers the possibility of interactive editing to improve response quality, which is not possible on a paper form. Respondents expect the form to offer guidance: at the very minimum, that they will be sequenced through the form and asked only the questions relevant to their situation. To ensure high quality of data collected via the Internet, it is important to provide mechanisms to control response errors on the form. Such control should be conducted in real time, and the respondent should be able to modify any incorrect data immediately. If contradictions are found in the respondent's answers, the online form should identify them and give the respondent the opportunity to correct or delete one or more answers, or to confirm that the reported situation exists in real life (even if the form's developers did not anticipate it). However, a balance needs to be struck so that the burden of error checks does not discourage respondents from completing the form; hard checks should be reserved for priority questions such as age and sex. Another benefit of online design is that it can make it easier for individuals within a household to complete their own parts of the form.
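The distinction between hard checks (which must be corrected) and soft checks (which the respondent may correct or confirm) can be sketched as below. The age range of 0 to 120, the sex code list and the "married under 15" soft check are hypothetical examples; each census would define its own rules and classifications.

```python
from dataclasses import dataclass, field

SEX_CODES = {"F", "M"}  # hypothetical code list; use the national classification


@dataclass
class CheckResult:
    hard_errors: list = field(default_factory=list)    # block submission until corrected
    soft_warnings: list = field(default_factory=list)  # may be corrected or confirmed


def check_person(person):
    result = CheckResult()
    age = person.get("age")
    # Hard checks, reserved for priority questions such as age and sex.
    if age is None or not 0 <= age <= 120:
        result.hard_errors.append("age")
    if person.get("sex") not in SEX_CODES:
        result.hard_errors.append("sex")
    # Soft check: an unusual but possible situation that the respondent may confirm.
    if age is not None and age < 15 and person.get("marital_status") == "married":
        result.soft_warnings.append("marital_status")
    return result


def can_submit(result, confirmed=frozenset()):
    """Submission requires no hard errors and every soft warning confirmed."""
    return not result.hard_errors and all(w in confirmed for w in result.soft_warnings)
```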
243. Providing the Internet option may contribute to improving the quality of the census by making it easier for some hard-to-enumerate groups to respond. Most countries report difficulties in enumerating particular population groups, for example, young adults and people living in secured accommodation where access is restricted. Some people with disabilities may also find it easier to complete an Internet form than a paper questionnaire. These groups are also more likely to be using the Internet for other purposes, and therefore, if available, this option should be promoted to these groups as a means of encouraging participation in the census.
244. Providing sufficient infrastructure is one of the major challenges of offering an Internet option. Census enumeration takes place over a relatively short period and involves the whole population of a country, so the NSO is unlikely to already have the infrastructure needed to cope with peak census demand. Outsourcing the Internet solution may therefore be justified. Collection procedures may also need to be modified to constrain demand, for example by staggering the delivery of census questionnaires or invitation letters, or by requiring people outside predetermined target populations or areas to contact the NSO before they can use the Internet form.
245. Census agencies should, therefore, assess how they wish to promote the use of the Internet. Such promotion should be determined by the capacity of the service to handle the expected load and should be coordinated with other data collection procedures. The public relations strategy should encompass assurances about the security and confidentiality of the information supplied via the Internet. Assuming that the Internet option is targeted to the whole population, the public relations strategy should also encompass managing public expectations about the ability to access the site during periods of peak demand. Simple messages of so-called “graceful referrals” advising people to use the Internet option at off-peak times should be prepared and used, if necessary, on the census Internet site itself, through any census telephone inquiry service and in any media promotion.
246. Take-up of the Internet response option can be expected to rise above the levels observed in the 2020 census round. During data collection, census agencies should continuously monitor public response levels and take action to increase online response if necessary.

5.5.2 Portable devices
247. The increasing sophistication and falling unit costs of laptop computers, tablet computers and smartphones, and of mobile communication, mean that these devices may be a cost-effective solution for census data collection. Possible applications include replacing enumerators' paper maps, address registers and lists, and serving as a means of data collection in the field. They can be applied across the full range of census collection methodologies, from drop-off/pick-up through to the collection of completed census questionnaires.
248. Portable devices have the advantage of providing real-time, two-way management information. Census managers can be informed of the progress of collection operations as enumerators deliver census forms and collect completed returns. Likewise, census managers can send the enumerator, via the portable device, updates on forms received and on households that need to be followed up. Geospatial information for the collection (e.g. missing addresses, new developments) can also be exchanged to allow efficient use of resources. Census managers can identify, in real time, areas where enumeration is falling behind schedule or not meeting quality standards, and instigate appropriate interventions (see section 5.4).
249. Centralized management of the portable devices for the census data collection includes the automation of the following functions:
(a) Installation on the devices of: special software for census data collection by the enumerator (the electronic questionnaire); the list of dwelling addresses and maps of the enumeration area; metadata and classifications used in the electronic questionnaire; tools for monitoring the operation of the device and the work of the enumerator; training materials for enumerators;
(b) Linking between the device, the enumerator and the enumeration area as elements of the corresponding registers (dwellings, enumerators and devices), update of the registers and links between their elements in the event of the enumerators or devices being replaced;
(c) Online management and monitoring of each device’s operation after its initialization by the field staff;
(d) Clearing the device of all information and software used for census purposes after the collected data have been successfully transferred to the central data storage at the end of the census, and preparing the devices for long-term storage or for other uses (for example, transfer to another agency).
250. Online management and monitoring of each device's operation includes:
(a) Obtaining information about the time when the device is turned on and off;
(b) Obtaining geo-coordinates for completing each form and recording data about the data collection process, such as interview duration and the number of completed questions, errors and corrections in the forms;
(c) Transmission to the devices of the addresses and the identification data of online respondents to enable the enumerator to verify the completeness of the enumeration at each address and make corrections if necessary;
(d) Remote installation of software updates on the devices in case of emergency. It is recommended to use this only when critical software errors are detected during the census, since it is important to ensure the consistency of data collected using different software versions;
(e) Locking the device in the event that a field worker reports the loss or theft of the device to prevent illegal use of the device or information leakage from it;
(f) Providing a means of remote consultation of the enumerator with the central census office.
251. Use of portable devices should allow greater opportunities for increased efficiency in data collection. However, several technical issues need to be considered in using such devices:
(a) Screen size may affect the ability of the enumerator to record and verify responses accurately. For the same reason, responding with mobile devices over the Internet risks fragmentation of data due to the small size of the screens.
(b) Compact, lightweight devices with sufficient storage capacity are the most convenient for enumerators' field work. Screen brightness and contrast should be adjustable so that the device can be used both in bright and in low light.
(c) To ensure the safety of data, completed information should be held in the devices for as short a time as possible. This time depends on how often the data synchronization processes are done, and is also determined by the time required for enumerators to finalize and verify completed questionnaires before synchronizing with the census data storage.
(d) Devices should be able to deal with being offline for periods of time. The length of battery life should be considered in relation to the daily workloads of field staff. It may be worth providing an additional power bank for the device.
(e) If system or software updates have to be made during data collection, the risk of losing previously collected data, or of inconsistency with data collected after the update, must be avoided.
(f) GPS accuracy (e.g. in densely populated urban areas) and mobile signal reception (e.g. in mountainous or forested areas) may not be satisfactory in some parts of the country. Mobile web connectivity should be assessed, particularly if the portable device uses web-based collection.
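Points (c) and (d) together suggest a device-side "outbox": completed forms are held locally only until the central storage acknowledges receipt, and are kept and retried when the device is offline. The sketch below is a hypothetical illustration; the send callback stands in for whatever encrypted transmission the real system uses and is assumed to raise ConnectionError when there is no signal.

```python
class DeviceOutbox:
    """Hypothetical device-side store: completed forms stay here only until
    the central census data storage acknowledges receipt."""

    def __init__(self):
        self._pending = {}  # form_id -> payload (encrypted at rest in practice)

    def save(self, form_id, payload):
        self._pending[form_id] = payload

    def sync(self, send):
        """send(form_id, payload) returns True when the server acknowledges.
        Unacknowledged forms remain queued for the next attempt."""
        for form_id in list(self._pending):
            try:
                acknowledged = send(form_id, self._pending[form_id])
            except ConnectionError:
                break  # offline: keep everything and retry later
            if acknowledged:
                del self._pending[form_id]  # minimise time data is held on the device

    def pending(self):
        return sorted(self._pending)
```

Deleting a form only after an explicit acknowledgement trades a little extra on-device retention for protection against losing data on an unreliable connection.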
252. Solutions based on portable devices should be extensively tested before the census phase, both on their own and in interaction with other elements of census technology that do not use portable devices.
253. There is also a range of security issues associated with the use of portable devices:
(a) Devices are at greater risk of theft or loss than bundles of paper forms. However, regular uploading of data from the devices should minimize the need to re-enumerate areas if devices are lost.
(b) Measures are needed to protect the confidentiality of any data on the device, in the event of loss of the device, and in transmission of the data. Data stored on the devices should be encrypted and only accessible through dedicated protection measures (e.g. passwords, fingerprints);
(c) Transmission of the data also needs to be secured through encryption and end-to-end secure channels;
(d) Security software should be loaded onto the device and must be compatible with the other applications on it. However, security software and passwords make the device more complicated to use, and these security measures will add to support costs.
254. Training tools for the portable devices should be loaded onto the devices so that enumerators can use them conveniently both during training and in the field. They should cover all elements of the enumerator's work, be interactive, be easy to navigate and contain illustrative examples of how the enumerator should respond in all situations that may arise when using the device.
255. Census agencies should plan ahead for the use of the large number of devices after the census. Storing devices for the next census is impractical, since within 5 to 10 years they can become technologically outdated, and unusable if left without use and recharging. Census agencies may transfer some of the devices to other users (e.g. government organizations) while retaining others.

5.5.3 Telephone
256. In the past, automated telephone interviewing has been suggested as a potentially cost-effective solution for countries that have a short-form census questionnaire requiring only the capture of basic demographic information. However, no country applied it in the 2020 census round. Automated telephone interviewing is not recommended.
257. The Computer Assisted Telephone Interviewing (CATI) method can be used to collect data via the census questionnaire and/or to verify and complete missing data collected on a long-form questionnaire. The user-friendliness of such systems decreases greatly as the number and complexity of the questions, or the number of people in the household, increases.

5.5.4 Design of the electronic questionnaire
258. The design of the electronic questionnaire is a very important part of the technological solution when responses are provided online or collected using portable devices. The design of the electronic questionnaire should take into account the following requirements:
(a) Contain a complete set and a clear sequence of the questions, which are divided into open and closed questions;
(b) Contain skip patterns by automatically displaying only the relevant questions and skipping those that are irrelevant or not applicable to particular respondents;
(c) Consider all branching paths of the questionnaire, including for rare situations, so that each question is addressed to at least some subset of the population;
(d) Provide response options with the choice of only one option or several answer options for closed questions, and if the "other" option is selected, allow the capture of the respondent's own answer;
(e) Fit each question and its answer options on the device screen without scrolling or moving to the next screen, where possible, because a hidden part of the question or answer options may be missed when answering;
(f) Make available the help option as a text hint or a jump to the appropriate element of the metadata or training materials;
(g) Provide easy navigation between the questions for one respondent, between members of the same household and between different sections of the questionnaire (e.g. on housing conditions, the household and the person);
(h) Use built-in controls for the validity of the entered data, taking into account previously entered information about the respondent and other members of this household;
(i) Display a progress bar for questionnaire completion as well as the general quantitative characteristics of the completed questionnaire, such as number of persons in the household and the percentage of relevant questions answered.
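Requirements (b) and (i) can be illustrated together: each question carries a relevance rule evaluated on the answers given so far, the form asks the next relevant unanswered question, and the progress bar shows the share of relevant questions answered. The question identifiers and the age-15 threshold are hypothetical examples, not a recommended questionnaire design.

```python
# Each entry: (question id, relevance rule on answers so far, or None = always asked).
# Question IDs and the age-15 threshold are hypothetical.
QUESTIONS = [
    ("age", None),
    ("marital_status", lambda a: a.get("age", 0) >= 15),
    ("employed", lambda a: a.get("age", 0) >= 15),
    ("occupation", lambda a: a.get("employed") == "yes"),
]


def relevant_questions(answers):
    """Skip pattern: keep only questions whose rule holds for these answers."""
    return [q for q, rule in QUESTIONS if rule is None or rule(answers)]


def next_question(answers):
    for q in relevant_questions(answers):
        if q not in answers:
            return q
    return None  # all relevant questions answered


def progress(answers):
    """Percentage of currently relevant questions that have been answered."""
    relevant = relevant_questions(answers)
    done = sum(1 for q in relevant if q in answers)
    return round(100 * done / len(relevant))
```

Because an answer can open a new branch, the denominator changes and the percentage can move backwards (here from 100 to 33 once an age of 15 or over is entered); a production form might smooth this, for example by basing the bar on the longest remaining path.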
259. Where an electronic questionnaire is used both for online self-completion by respondents and on enumerators' devices, the two designs may differ, because respondents generally have no knowledge of census methodology, whereas enumerators are trained in advance and familiar with census terminology and the metadata built into the questionnaire.
260. The online form should additionally contain a summary of the basic instructions for respondents on completing it. These include:
(a) A description of the general structure of the questionnaire and the sequence of its completion;
(b) The estimated time needed to complete the questions per respondent or per household;
(c) A description of ways to call up help information and respond to error messages;
(d) A description of the possibility to correct, delete or add some information to the previously completed questionnaire, if necessary;
(e) The signs of successful and unsuccessful completion of the census process and further actions of the respondent (e.g. obtaining confirmation of participation in the census);
(f) A means of giving feedback to the NSO, such as the telephone number or email address of the census hotline or of the NSO, to evaluate the quality of online services or to ask questions that were not answered while filling out the form;
(g) The ability to get translation of the form into the most popular languages in the country if necessary;
(h) The provision of answers to frequently asked questions with terminology accessible to respondents and a link to the page of the NSO website where the legal, methodological and organizational principles of the census are described.

5.5.5 Technology to support the enumeration of people with disabilities and without Internet access
261. When introducing new technologies, it is necessary to keep in mind that the census must cover the entire population, regardless of the technical equipment respondents use or their proficiency in using computers.
262. Technology can support the enumeration of people with disabilities and the digitally disconnected in two main ways: (a) by reaching respondents who do not have the necessary Internet connection; and (b) by allowing as many people as possible to enter their responses electronically.
263. The digitally disconnected. To reach as many people as possible, it will be necessary to identify locations that lack proper Internet connectivity. One option that NSOs should consider in such cases is to offer a paper questionnaire. However, other options should also be explored, such as deploying enumerators with portable devices that connect to the Internet via satellite, or giving respondents a telephone number to call so that they can complete their form by telephone.
264. Accessible Internet response. Another aspect to consider is how people with disabilities can complete their census form over the Internet. Internet response should follow the accessibility standards defined by the Web Content Accessibility Guidelines (WCAG) 2.2. Beyond technical compliance with those standards, accessibility should be considered from the stage of content development onwards. Before developing any new features that could affect accessibility, a centre of expertise in accessibility and a user experience group should be consulted to ensure that new functionalities are designed to be properly accessible. Examples of applied features include:
(a) Hidden text for auto-generated character mask fields, to inform vision impaired users of the necessary inputs required to be typed by the user and those that will be provided automatically;
(b) Required colour contrast to ensure users can view text content;
(c) Techniques for associating labels with interactive controls to allow assistive technology to recognize the label and present it to the user, therefore allowing the user to identify the purpose of the control.
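Feature (b), colour contrast, is defined quantitatively in WCAG: the contrast ratio is (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colours, and success criterion 1.4.3 (level AA) requires at least 4.5:1 for normal text and 3:1 for large text. The sketch below implements that formula for 8-bit sRGB colours. It uses the sRGB linearization threshold 0.04045; some versions of the WCAG text give 0.03928, which makes no difference for 8-bit channel values.

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB colour, per the WCAG 2.x definition."""
    def linear(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(ch) for ch in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """(L1 + 0.05) / (L2 + 0.05), where L1 is the lighter of the two colours."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


def meets_aa(fg, bg, large_text=False):
    """WCAG success criterion 1.4.3 (AA): 4.5:1, or 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, grey text #767676 (118, 118, 118) on white reaches about 4.54:1 and passes for normal text, while #777777 falls just short at about 4.48:1.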

5.5.6 Data capture from paper questionnaires
265. Based on practices in the 2020 census round, it can be assumed that most countries using direct enumeration in their next census will offer the Internet response option, and that the use of paper forms and optical recognition technology will be limited. Data collection with paper questionnaires may nonetheless be necessary because of respondents' preferences or lack of Internet access. Compared with data collection over the Internet or with electronic devices, this requires additional processing steps such as scanning, data capture and possibly keying by operators. The need for the system to interpret different handwriting adds to the complexity of the process.
266. For processing the paper questionnaires, it is recommended to use automated processes such as Intelligent Character Recognition (ICR).
267. Optical Mark Recognition (OMR) can be a cost-effective option where the census form contains only tick-box responses. Additional means of data capture or computer-assisted coding operation are required to handle write-in responses. However, OMR has largely been superseded by ICR technologies.
268. The most cost-effective option is likely to be a combination of digital imaging, ICR, repair and automated coding. An example of this process is briefly described below.
(a) The census forms are processed through scanners to produce an image. Recognition software is used to identify tick box responses and translate handwritten responses into textual values. Confidence levels are set to determine which responses are of acceptable quality and which responses require further repair or validation;
(b) Automated repair is designed to reduce the need for operator intervention and typically involves the use of dictionary look-up tables and contextual editing. The dictionaries are tailored according to the census question being processed. Thus, for example, the dictionary for country of birth question would only contain names of countries. Preparatory work on the construction of natural language dictionaries of terms will greatly increase the efficiency of coding;
(c) Operator repair can be undertaken on images not recognized. This is only cost-effective for those questions where there is a high probability that the repaired data can then be automatically coded;
(d) Automatic coding uses computerized algorithms to match captured responses against indexes. Responses that cannot be matched are passed to a computer-assisted coding process. For responses that cannot be coded automatically, it is recommended to use a machine-learning algorithm, which could replace human coders with equal or better data quality at much lower cost. Data from a previous census can be used to train the machine-learning algorithm. Data from the current census testing cycle could also be used, especially if new variables need to be coded, provided the results can be verified to be of equal or higher quality than those achieved by human coders.
(e) Further considerations on the use of digital imaging, ICR, repair, automated coding, Optical Mark Recognition (OMR) and Optical Character Recognition (OCR) are presented in the CES census recommendations for the 2020 round. Generative artificial intelligence may open up new possibilities, including replacing manual keying; however, building models that maintain high data quality requires investment.
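Steps (a) to (c) can be sketched as a routing function for one recognized field: a value recognized with high confidence that matches the question-specific dictionary is accepted; otherwise the nearest dictionary term is looked up; if nothing is close enough, the field goes to operator repair. The dictionary, both thresholds and the use of Python's difflib similarity measure are assumptions made for illustration; production systems use the recognition engine's own confidence scores and tuned matching.

```python
import difflib

# Hypothetical question-specific dictionary for "country of birth".
COUNTRY_DICTIONARY = ["France", "Finland", "Germany", "Georgia", "Greece"]


def route_field(text, confidence, dictionary, accept=0.90, repair_cutoff=0.80):
    """Return (value, route) for one recognized write-in field.
    `accept` and `repair_cutoff` are hypothetical thresholds."""
    lookup = {term.lower(): term for term in dictionary}
    key = text.strip().lower()
    # (a) High recognition confidence and an exact dictionary hit: accept.
    if confidence >= accept and key in lookup:
        return lookup[key], "accepted"
    # (b) Automated repair: closest dictionary term above the similarity cutoff.
    match = difflib.get_close_matches(key, list(lookup), n=1, cutoff=repair_cutoff)
    if match:
        return lookup[match[0]], "auto-repaired"
    # (c) Nothing close enough: send the image to an operator.
    return None, "operator repair"
```

For example, a misrecognized "Frnace" is repaired to "France", since no other term in this dictionary is similar enough.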

5.6 Administrative data

5.6.1 Scope
269. The technologies applied to the use of administrative data differ from those for data collection in the field. The development and growing availability of information and communication technology (ICT) allow administrative registers to be used more widely in population and housing censuses. Given the development of state-of-the-art technologies and agencies' commitment to implementing innovative solutions in the 2030 census round, it will be necessary to create or modernize the software and hardware infrastructure for collecting, storing and linking data from administrative sources and for storing metadata on processes and products.
270. The quality of the source data has a large impact on the quality of output products. Therefore, the methodology for improving the quality of data from administrative sources, for example by adjusting them to satisfy statistical requirements, is of vital importance. State-of-the-art ICTs may prove very useful here and can substantially improve the efficiency and effectiveness of these operations. For assessing the quality of administrative sources for use in censuses, reference is made to the UNECE guidelines on this topic published in 2021.
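Assessing an administrative source usually starts from simple, measurable indicators. The sketch below computes three common ones (completeness, validity against a domain rule, and duplicated identifiers) over a register extract held as a list of dictionaries. The field names, sample records and rules are hypothetical, and the UNECE guidelines cited above cover a far broader set of quality dimensions.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


def _present(value) -> bool:
    """A value counts as present if it is neither None nor an empty string."""
    return value not in (None, "")


def completeness(records: List[Record], field: str) -> float:
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    return sum(_present(r.get(field)) for r in records) / len(records)


def validity(records: List[Record], field: str, rule: Callable[[str], bool]) -> float:
    """Share of non-missing values of `field` that satisfy `rule`."""
    values = [r[field] for r in records if _present(r.get(field))]
    if not values:
        return 0.0
    return sum(bool(rule(v)) for v in values) / len(values)


def duplicate_rate(records: List[Record], id_field: str) -> float:
    """Share of records whose identifier occurs more than once."""
    if not records:
        return 0.0
    counts = Counter(r.get(id_field) for r in records)
    duplicated = sum(n for key, n in counts.items() if _present(key) and n > 1)
    return duplicated / len(records)


# Hypothetical extract from a population register.
register = [
    {"pid": "001", "sex": "M", "birth_year": "1980"},
    {"pid": "002", "sex": "F", "birth_year": ""},
    {"pid": "002", "sex": "X", "birth_year": "1975"},
    {"pid": "004", "sex": None, "birth_year": "2090"},
]
```

For this extract, `sex` is 75% complete, two of its three recorded values pass a rule such as `lambda v: v in {"M", "F"}`, and half of the records share a duplicated `pid`.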
271. As part of the preparatory work for the census, particularly in the design phase, the necessary technical requirements related to the use of data from administrative registers, which may affect the need to modernize infrastructure, should be determined in the following areas:
(a) Data collection;
(b) Data storage;
(c) Data linking;
(d) Storage of metadata on processes and products.
272. Applying several techniques for collecting data from administrative registers and other sources for use in population and housing censuses will require more comprehensive organization and management processes, as well as more complex systems. Modern technologies provide opportunities for improvement here as well. The process of collecting data from administrative registers should include the preparation of a data-collection strategy that makes use of various data-collection modes.

5.6.2 Security and confidentiality
273. It is crucial to consider the growing emphasis in society on data security, privacy and data protection. This is evident in the legal and statistical frameworks of many countries; moreover, EU member states must comply with the General Data Protection Regulation (GDPR). Consequently, there are stricter requirements on how agencies conducting a register-based population and housing census receive and process data. To meet these requirements, a strong emphasis on the need-to-know principle and on the processing of anonymized data is recommended.
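One common way to reduce exposure of direct identifiers is to replace them with a keyed hash before data reach analysts, so that records from different registers can still be linked on the same pseudonym. The sketch below uses HMAC-SHA256 from the Python standard library; the field names and the dropped attributes are hypothetical. Note that, under the GDPR, keyed hashing produces pseudonymized rather than anonymized data as long as someone holds the key, so the key should be managed by a separate unit and never stored with the data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a personal identifier with a keyed SHA-256 digest (hex string)."""
    # Normalize so that " ab123 " and "AB123" map to the same pseudonym.
    normalized = identifier.strip().upper().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def strip_direct_identifiers(record: dict, key: bytes,
                             id_field: str = "national_id",
                             drop: tuple = ("name", "street_address")) -> dict:
    """Drop direct identifiers and replace the ID with its pseudonym.

    `id_field` and `drop` are hypothetical field names for illustration.
    """
    out = {k: v for k, v in record.items() if k != id_field and k not in drop}
    out["pseudo_id"] = pseudonymize(record[id_field], key)
    return out
```

Because the same key yields the same pseudonym in every source, linkage remains possible on `pseudo_id` while names and addresses never leave the intake environment.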
274. A secure IT infrastructure is a necessary condition for collecting data from administrative registers, and the protection of data is a crucial issue throughout this process. Regardless of the technology applied, the data-collection strategy should ensure information security. This requirement should be addressed at the early stages of designing the process of obtaining and gathering data from administrative registers and of designing the corresponding software and hardware infrastructure. The technical aspects of encrypting data in transmission should be considered in detail, together with the use of secure transmission channels.
275. Appropriate technical and organizational measures, including backups, should be implemented to protect the data against accidental or unlawful destruction, accidental loss, alteration, and unauthorized disclosure or access.
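Protection against accidental alteration of transmitted extracts and backups can be supported by recording a cryptographic digest for each file and re-checking it after transfer or restore. A minimal sketch with Python's `hashlib`, assuming a simple manifest that maps file paths to expected SHA-256 hex digests (the manifest format is an assumption, not a prescribed standard):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large extracts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict) -> dict:
    """Map each path to True if its current digest matches the recorded one."""
    return {path: sha256_of(path) == expected for path, expected in manifest.items()}
```

A digest detects accidental changes; to detect deliberate tampering, the manifest itself would also need to be protected, for example with a digital signature or an HMAC.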
276. Security must be implemented in multiple layers. Register owners, data-collecting NSOs and other partners must establish secure transmission channels, with corresponding certificates used to control the access granted to data collectors.
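Certificate-based access control of the kind described above is typically implemented with mutually authenticated TLS: the NSO verifies the register owner's server certificate, and the register owner accepts only clients presenting a certificate issued by a trusted authority. The sketch below configures both sides with Python's `ssl` module; the file paths are placeholders, and the TLS 1.2 minimum is an illustrative policy choice rather than a requirement from this document.

```python
import ssl
from typing import Optional


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """NSO side: verify the register owner's server certificate and, if a
    client certificate is supplied, present it for mutual authentication."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def server_context(cert_file: str, key_file: str, client_ca_file: str) -> ssl.SSLContext:
    """Register owner side: only accept clients whose certificates chain
    to the CA in `client_ca_file`."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=client_ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A connection would then be opened with, for example, `client_context(...).wrap_socket(sock, server_hostname=host)` on a socket returned by `socket.create_connection((host, port))`, where `host` and `port` are the register owner's (hypothetical) endpoint.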
277. By organizing data into different “states”, such as source data, prepared data, statistical data and output data, authorization can be managed effectively for the various roles and teams within the NSO. Data access should be granted on a need-to-know basis, with anonymized identifiers and disclosure controls applied at all levels of aggregation.
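The “states” model can be expressed as a deny-by-default mapping from roles to the data states each role may access, with disclosure control applied before output. The sketch below is illustrative only: the role names are hypothetical, and the threshold rule shows primary cell suppression alone, whereas a production system would also need secondary suppression or another method, such as the cell key method, to stop suppressed values being recovered from totals.

```python
from enum import Enum
from typing import Dict, Hashable, Optional


class DataState(Enum):
    SOURCE = "source"
    PREPARED = "prepared"
    STATISTICAL = "statistical"
    OUTPUT = "output"


# Hypothetical role-to-state permissions; each team sees only what it needs.
ACCESS = {
    "register_intake": {DataState.SOURCE, DataState.PREPARED},
    "methodology": {DataState.PREPARED, DataState.STATISTICAL},
    "dissemination": {DataState.OUTPUT},
}


def can_access(role: str, state: DataState) -> bool:
    """Deny by default: unknown roles get no access."""
    return state in ACCESS.get(role, set())


def suppress_small_cells(table: Dict[Hashable, int],
                         threshold: int = 3) -> Dict[Hashable, Optional[int]]:
    """Primary suppression: blank out (None) counts below `threshold`."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}
```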

5.6.3 Data linking, transfer and storage
278. Modern technologies are useful in the process of linking records and data. After identifying the administrative sources to be used in the census, it is necessary to map them and to create application programming interfaces (APIs) for the automatic flow of data to the central system.
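Record linkage across sources often proceeds in passes, starting with exact matches on a shared identifier and falling back to combinations of personal attributes when the identifier is missing. The sketch below shows a deterministic two-pass linkage in plain Python; the field names (`pid`, `surname`, `birth_date`) and records are hypothetical, and many NSOs would use probabilistic methods (for example Fellegi–Sunter) for the fallback pass rather than this exact-key rule.

```python
from typing import Dict, List, Optional, Tuple


def match_key(record: dict) -> Tuple[str, str]:
    """Fallback key: normalized surname plus date of birth."""
    return ((record.get("surname") or "").strip().casefold(), record.get("birth_date") or "")


def link(population: List[dict], source: List[dict], id_field: str = "pid"):
    """Link each source record to a population record.

    Pass 1: exact match on the identifier.
    Pass 2: exact match on (surname, birth date), accepted only if unique.
    Returns (links as (source_index, pid) pairs, indices of unlinked records).
    """
    by_id = {r[id_field]: r for r in population if r.get(id_field)}
    by_key: Dict[Tuple[str, str], List[dict]] = {}
    for r in population:
        by_key.setdefault(match_key(r), []).append(r)

    links, unlinked = [], []
    for i, rec in enumerate(source):
        target: Optional[dict] = by_id.get(rec.get(id_field)) if rec.get(id_field) else None
        if target is None:
            candidates = by_key.get(match_key(rec), [])
            target = candidates[0] if len(candidates) == 1 else None
        if target is None:
            unlinked.append(i)
        else:
            links.append((i, target[id_field]))
    return links, unlinked


population = [
    {"pid": "P1", "surname": "Novak", "birth_date": "1980-05-01"},
    {"pid": "P2", "surname": "Horvat", "birth_date": "1975-01-20"},
    {"pid": "P3", "surname": "Kovac", "birth_date": "1990-03-03"},
    {"pid": "P4", "surname": "Kovac", "birth_date": "1990-03-03"},
]
source = [
    {"pid": "P1", "surname": "Novak", "birth_date": "1980-05-01"},    # identifier match
    {"pid": None, "surname": " HORVAT ", "birth_date": "1975-01-20"},  # unique key match
    {"pid": None, "surname": "Kovac", "birth_date": "1990-03-03"},     # ambiguous
]
```

Accepting a fallback match only when it is unique trades some linkage rate for fewer false links; the ambiguous record is left for clerical review or a probabilistic pass.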
279. Administrative sources differ from one another, both across subject areas within a country and between countries. Where possible, standardized APIs should be used. Regardless of the interface used or the differences encountered, good metadata and a sound understanding of the data are essential for including them in the census.
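Because sources differ in field names and codings, it helps to keep the mapping from each source to the census variables as metadata rather than hard-coding it. A minimal sketch, assuming a hypothetical `SourceMetadata` structure that records field renames and recodings per source:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SourceMetadata:
    """Hypothetical per-source metadata: how source fields map to census variables."""
    name: str
    field_map: Dict[str, str]  # source field -> census variable
    converters: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)  # by census variable


def harmonize(record: Dict[str, Any], meta: SourceMetadata) -> Dict[str, Any]:
    """Rename and recode one source record into the common census schema.
    Fields not described in the metadata are dropped."""
    out: Dict[str, Any] = {"_source": meta.name}
    for src_field, variable in meta.field_map.items():
        if src_field not in record:
            continue
        value = record[src_field]
        convert = meta.converters.get(variable)
        out[variable] = convert(value) if convert is not None and value is not None else value
    return out


population_register = SourceMetadata(
    name="population_register",
    field_map={"PERS_ID": "pid", "DOB": "birth_date", "SEX_CD": "sex"},
    converters={"sex": lambda code: {"1": "M", "2": "F"}.get(code)},
)
```

Adding a new source then means writing one more metadata entry rather than new processing code, and the same metadata documents how each census variable was derived.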
280. Many countries are currently transitioning to cloud-based storage and computing (see section 5.9). Compared with traditional on-premises solutions, this shift offers new possibilities, including a more flexible, variable-controlled universe of data, or data lakes, in place of data that are stored, structured and updated thematically in the traditional way. Such data lakes place high demands on metadata and on accurate periodization (recording the period to which each value refers), which the greater processing power of cloud-based computing makes more feasible. Centralizing the administrative data in a data lake would support the business process and can provide a strong and reliable infrastructure for storing data, creating metadata and synchronizing the different data sources.
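Accurate periodization means each value carries the period for which it is valid, so that the data lake can be queried as at the census reference date. The sketch below selects, for each person, the record valid on a given date using half-open `[valid_from, valid_to)` intervals; the field names, the address history and the choice of interval convention are assumptions for illustration.

```python
from datetime import date
from typing import Dict, List


def valid_at(history: List[dict], ref: date) -> List[dict]:
    """Records whose half-open interval [valid_from, valid_to) contains `ref`.
    valid_to=None means the record is still current."""
    return [r for r in history
            if r["valid_from"] <= ref and (r["valid_to"] is None or ref < r["valid_to"])]


def snapshot(history: List[dict], ref: date, key: str = "pid") -> Dict[str, dict]:
    """One record per unit as at `ref`; if intervals overlap, the latest start wins."""
    result: Dict[str, dict] = {}
    for r in valid_at(history, ref):
        current = result.get(r[key])
        if current is None or r["valid_from"] > current["valid_from"]:
            result[r[key]] = r
    return result


# Hypothetical address history from a register.
history = [
    {"pid": "P1", "address": "A", "valid_from": date(2015, 1, 1), "valid_to": date(2024, 6, 1)},
    {"pid": "P1", "address": "B", "valid_from": date(2024, 6, 1), "valid_to": None},
    {"pid": "P2", "address": "C", "valid_from": date(2010, 1, 1), "valid_to": date(2019, 12, 31)},
]
```

With half-open intervals, a move recorded on 1 June 2024 assigns that day to the new address only, so no person is counted at two addresses on the same reference date.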

5.6.4 Improving the administrative data
281. Various techniques are useful for converting administrative data into statistical data. With automatic data-cleaning procedures in place, errors in source data from administrative registers can be detected and corrected, and the data can be edited efficiently.
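Automatic data cleaning is commonly organized as a set of edit rules, each with a check and, where a safe correction exists, an automatic fix; records failing rules without a fix are flagged for review or imputation. The sketch below illustrates this pattern; the rules themselves (a plausible birth-year range, recoding of legacy sex codes, and a minimum age of 15 for married persons) and the reference year are hypothetical examples, not recommended edit rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

REFERENCE_YEAR = 2031  # hypothetical census reference year


@dataclass
class EditRule:
    name: str
    check: Callable[[dict], bool]                 # True if the record passes
    fix: Optional[Callable[[dict], dict]] = None  # automatic correction, if one exists


def apply_rules(record: dict, rules: List[EditRule]) -> Tuple[dict, List[str]]:
    """Apply rules in order; fix what can be fixed, flag the rest for review."""
    rec = dict(record)
    flags = []
    for rule in rules:
        if rule.check(rec):
            continue
        if rule.fix is not None:
            rec = rule.fix(rec)
            flags.append(f"{rule.name}:fixed")
        else:
            flags.append(f"{rule.name}:failed")
    return rec, flags


RULES = [
    EditRule("birth_year_plausible",
             check=lambda r: r["birth_year"] is None or 1900 <= r["birth_year"] <= REFERENCE_YEAR,
             fix=lambda r: {**r, "birth_year": None}),  # blank out; left for imputation
    EditRule("sex_code_valid",
             check=lambda r: r["sex"] in {"M", "F"},
             fix=lambda r: {**r, "sex": {"1": "M", "2": "F"}.get(r["sex"])}),  # recode legacy codes
    EditRule("married_min_age",
             check=lambda r: not (r["marital_status"] == "married"
                                  and r["birth_year"] is not None
                                  and REFERENCE_YEAR - r["birth_year"] < 15)),  # no safe fix: review
]
```

Keeping the flags alongside each corrected record preserves an audit trail, so the effect of each rule on the final census data can be measured.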
282. Machine-learning models could be used, for example, to determine an individual's “census address” or the type of private household.
283. Artificial intelligence (AI) could be used, for example, to classify the economic branch of employment. AI-based classification can be more efficient, produce high-quality results and classify a high percentage of cases. The classification models used for this purpose need constant updating and quality analysis to evaluate the results. Specific security and confidentiality requirements may not always allow such data and models to be processed in the cloud. There are also known limitations to using classification models developed in English for other languages.
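The quality analysis mentioned above can be framed as a trade-off between coverage (the share of cases the model codes automatically) and accuracy (agreement with a clerically coded validation sample). The sketch below computes both for a confidence threshold and picks the lowest candidate threshold that meets an accuracy target; the predictions, codes and target are invented, and the approach applies to any classifier that outputs a confidence score.

```python
from typing import List, Optional, Sequence, Tuple

Prediction = Tuple[str, float]  # (predicted code, model confidence)


def evaluate(predictions: List[Prediction], truth: List[str],
             threshold: float) -> Tuple[float, float]:
    """Coverage = share of cases auto-coded at this threshold;
    accuracy = share of auto-coded cases agreeing with the clerical code."""
    accepted = [(code, t) for (code, conf), t in zip(predictions, truth) if conf >= threshold]
    coverage = len(accepted) / len(truth) if truth else 0.0
    accuracy = sum(code == t for code, t in accepted) / len(accepted) if accepted else 0.0
    return coverage, accuracy


def choose_threshold(predictions: List[Prediction], truth: List[str],
                     target_accuracy: float,
                     candidates: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Lowest candidate threshold (hence highest coverage) meeting the accuracy target."""
    for threshold in sorted(candidates):
        coverage, accuracy = evaluate(predictions, truth, threshold)
        if accuracy >= target_accuracy:
            return threshold, coverage, accuracy
    return None


# Invented validation sample: model outputs and clerically assigned codes.
preds = [("A", 0.95), ("B", 0.9), ("A", 0.6), ("C", 0.4), ("B", 0.85)]
truth = ["A", "B", "B", "C", "A"]
```

Re-running this evaluation on each new validation sample gives a simple way to monitor whether an updated model still meets the target.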