China’s Collection of Data on Foreigners Is a National Security Risk

Nov 29, 2021

By Christopher Balding

China operates a techno-surveillance state collecting vast amounts of data domestically to ensure social stability. Using tactics that range from facial recognition programs to the monitoring of messages between friends, China maintains a large network of data collection points and management systems that allows government officials to monitor anyone in China at almost any time. But significantly less has been written about how China collects data that enable it to monitor foreign individuals and institutions.

China’s intelligence and security data collection targeting foreign institutions and individuals takes place across a broad array of threat vectors and by a wide range of actors. The government amasses information from public data available on platforms such as Google, corporate websites, Twitter and Facebook.

This practice is supplemented by Chinese electronics and technology companies that collect and share user data on foreign consumers. In an authoritarian state with a large electronics manufacturing sector, this fusion of government surveillance with private-sector cooperation creates a significant threat to privacy abroad, from intrusive data collection to unauthorized remote use of electronics such as headset microphones or cameras.

Further, China is investing heavily in data management and analytics, using technology to gain an edge in foreign surveillance and data collection. To respond to this growing threat, the U.S. and other liberal democratic nations should work to strengthen privacy and security measures.

How China Collects Data

In a civil-military fusion state such as China, where nonstate entities assist the government in collecting data, information on foreigners is gathered through a variety of channels. Because of this multi-pronged approach, it is difficult for nations such as the U.S. to take corrective or defensive actions. Many intelligence-gathering behaviors may be legal, or they may target individuals or institutions not typically the subject of surveillance. These new targets do not have the necessary training or defensive capabilities to counter China’s tactics. Beijing integrates a broad array of data channels and methods in its intelligence and data-collection operations, making policy responses very difficult.

As a manufacturing center for a large share of global technology, China occupies a key position. Many countries have long accused China not only of intercepting communications of foreign states—something almost every state has attempted for centuries—but of injecting electronic products with code capable of monitoring use. While many firms in today’s cloud-connected world hold a variety of user data, the civil-military authoritarian state changes the nature of what data are collected and how they are used: to serve the state and monitor foreign individuals or institutions.

Thus, a Chinese-made electronic device with monitoring code gives China the capability to spy on individuals or institutions that use the devices. For instance, data on noise-canceling headphones with built-in microphones are stored in China, and the Chinese government legally has access to that data and can remotely turn on the microphone.

To many non-Chinese, monitoring such a vast array of products sounds like science fiction. Data-gathering is frequently associated with telecommunications firms such as Huawei and ZTE, which are already on the U.S. Entity List, but many Chinese firms gather data on foreigners in ways that most people would consider outlandish. Though data collection on product usage by consumers is not itself novel, the breadth and use of private data provided for Beijing’s ends are unique.

For example, the FBI recently raided the U.S. offices of a China-based payment services provider after being alerted to suspicious data packets being sent back to China. The BGI Group, a Chinese genetics company that processes large amounts of DNA tests, has been labeled a security risk because it collects biological data on Americans and other foreigners. The Federal Communications Commission has called for a review of Chinese drone and camera maker DJI because of the security risks of data ending up in China.

This is just a small sampling of Chinese companies that the U.S. considers security risks; these stories provide some understanding of the scope of Chinese efforts to gather data on foreign individuals and institutions. But China’s work extends far beyond basic data collection. Its technology is designed to provide automated insight into potential human intelligence targets and how they may serve Chinese objectives.

How China Uses Data

China’s open-source and big data strategy draws from large amounts of publicly available data on the internet. These data are largely self-provided or publicly reported, as on professional or social media platforms or in news media reports. China has developed specialized tracking and data ingestion tools specifically for intelligence-gathering purposes. The tools gather data from disparate sources that effectively generate professional, and in some cases psychological, profiles of key individuals. These data are instantly available to intelligence analysts, and the data allow automated sorting and classification to simplify background research.

One key challenge of mass data collection and surveillance is how to highlight specific or key pieces of information when they are needed. China accomplishes this in several ways. First, Chinese big data collection involves scoring or ranking basic input data. In one database within China, cameras record and count individuals, classifying them by age and gender; similar databases exist in other countries too. Other datasets include widely used political science metrics on groups’ level of anger, propensity for violence or extremism. Data often include not only the information itself, but the value and importance of that information and of its source. One database collected by a contractor for Chinese intelligence even labeled individuals as “important” or “not important.”

Second, big data management means being able to find targeted, specific information. For instance, natural language processing tools within data management systems can summarize media stories that may involve Chinese interests. Other tools enable reverse-engineering from a targeted profile to locate individuals or information. If analysts need to find an individual with access to a specific technology or inside a certain company, they will work backwards, locating that individual based on his or her access to targeted technology, information or individuals.

Third, Chinese analysts link data points to each other to better contextualize information. Because massive amounts of data are gathered, linking potentially disparate pieces of information may provide context or understanding for previously misunderstood data. For instance, data collected from social media may identify an individual as a military service member. From there, an analyst could unpack location data and facial recognition data from pictures.

Automated systems could generate information about the individual’s colleagues, professional role, unit and other key information. These data could help analysts name significant numbers of individuals in military units and even pinpoint where the unit might be located and whom to target should the need arise. Not only does China collect this information, but the collection process is largely automated.

Fourth, China engages the civil-military fusion state in its data analysis, which opens up a vast array of possibilities for specific targeting. For instance, harvesting information about an individual would open the door to locating any devices that person uses. Locating device-specific identifiers and matching them against user databases from Chinese firms would provide easy entry to more granular or direct surveillance data. Many widely used, Chinese-coded apps, from TikTok to more mundane functions such as calculators, actually impose wide-ranging permissions on user devices.

Chinese and U.S. Approaches to Intelligence Activities

To contextualize China’s surveillance activities and capabilities, it is helpful to compare them with the intelligence activities of other countries, particularly the U.S. For several reasons, there is minimal evidence that other governments, including that of the U.S., engage in similar types of activities to the same degree.

First, there is a fundamental difference between the legal and organizational roles of the state in China versus the U.S. In a civil-military fusion system, the lines between the state and private enterprise are blurred to such a degree as to become irrelevant. Chinese citizens are obligated as a matter of law to assist the government at all times and to turn over any requested records on domestic and international activity and user data.

In many other countries such as the U.S., firms have legal and operational independence, and they may even fight attempts by the state to gather information about their users or refuse to cooperate in other ways. For instance, Google has refused to work with the U.S. military on artificial intelligence projects related to national security issues. Conversely, People’s Liberation Army and Ministry of State Security workers are known to work inside major Chinese tech firms in addition to sitting on their boards.

Differences among legal systems provide further evidence of how information-gathering varies among countries and blurs the lines between sources of collection. In China, the National Security Law requires firms to provide assistance domestically or internationally whenever requested by the government. The U.S. government, by contrast, requires legal authority for surveillance or information-gathering that involves U.S. citizens. For example, when tech firms receive legally authorized warrants, they input legal documentation into their warrant management system, which records and allows specified access while excluding the government from viewing data not covered by the warrant. Chinese tech firms do not impose similar controls, which means the government’s access is unbounded.
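To make the idea of warrant-scoped access concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any real firm's warrant management system: the `Warrant` and `Record` types, the field names and the sample data are all hypothetical. It shows the two properties described above: only data covered by the warrant is released, and every request is recorded.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Warrant:
    """Hypothetical record of a legally authorized request."""
    warrant_id: str
    account_id: str         # the single account the warrant names
    categories: frozenset   # data types covered, e.g. {"email_headers"}
    start: date             # first day covered
    end: date               # last day covered


@dataclass(frozen=True)
class Record:
    """Hypothetical piece of stored user data."""
    account_id: str
    category: str
    created: date
    payload: str


def covered(warrant: Warrant, record: Record) -> bool:
    """A record is covered only if account, data type and date all match."""
    return (
        record.account_id == warrant.account_id
        and record.category in warrant.categories
        and warrant.start <= record.created <= warrant.end
    )


def disclose(warrant: Warrant, records: list, audit_log: list) -> list:
    """Release only covered records and log what was released and withheld."""
    released = [r for r in records if covered(warrant, r)]
    withheld = len(records) - len(released)
    audit_log.append((warrant.warrant_id, len(released), withheld))
    return released


# Illustrative data (entirely made up).
records = [
    Record("alice", "email_headers", date(2021, 3, 1), "in scope"),
    Record("alice", "location", date(2021, 3, 2), "wrong data type"),
    Record("alice", "email_headers", date(2020, 1, 1), "outside date range"),
    Record("bob", "email_headers", date(2021, 3, 1), "different account"),
]
warrant = Warrant(
    "W-001", "alice", frozenset({"email_headers"}),
    date(2021, 1, 1), date(2021, 6, 30),
)
audit_log = []
released = disclose(warrant, records, audit_log)
print([r.payload for r in released])
print(audit_log)
```

In this sketch, three of the four sample records are withheld because they fall outside the warrant's account, data type or date range. A system with no such filter, as the article describes for Chinese firms, would simply return every record.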

Furthermore, China has an enormous infrastructure able to surveil and accumulate data on foreigners via its omnipresent domestic surveillance system. China is the most surveilled country in the world by almost any metric. It has approximately one camera for every two people—in a country of nearly 1.4 billion people. One study estimated that more than 2 million individuals worked in some capacity for the Chinese censorship machine. A significant amount of this work has now become automated, with Chinese media data now being categorized by sensitivity, emotions, blacklist status and other metrics. Still, many online-facing media firms employ a major share of their labor force to perform state-mandated censorship activities.

Many of the same tools, techniques and activities that are used to surveil Chinese domestically can be applied to gathering information on individuals around the world. Data from a Chinese social media platform about Chinese citizens are amassed using the same technical tools as data collected from Facebook and Twitter. With so many Chinese government and corporate resources allocated to surveillance and data collection, adding a few more foreigners into the extensive surveillance machinery takes minimal effort.

Chinese and U.S. Approaches to Data in General

Even beyond strict intelligence or surveillance work, China generally takes a different approach to data than the U.S. Whereas U.S. data collection is much more segmented—between the government and private sector, among different government agencies and even within companies, as when data files with personally identifiable information are encrypted—Chinese data collection is not as segmented and is therefore less secure.

The reduced security stems from a simple paradox of data security and Chinese governance. Chinese data by definition are less secure because the information must be available to the government at all times. Data cannot be readily available and readable on demand while also being encrypted, segmented and private. Consequently, the lower the encryption and the broader the data pool, the lower the security. On the other hand, encryption and other tools of increased digital security defeat the objectives of the surveillance state. China cannot meet its paradoxical requirements for both security and total government access.

This paradox presents an opportunity for the U.S. and other nations. If the Chinese state can watch its citizens, then so can others. Non-Chinese actors have a path to address China’s behavior by targeting its weak information security. So far, non-Chinese states have been very restrained in information gray-zone activity in China; however, given the weakness of Chinese information security and the country’s belligerence, it is not difficult to see scenarios in which this could change.

The purpose of data collection in China is also very different from that in the U.S. Google and Facebook may collect massive amounts of data, but their purpose is to sell you something, whereas data are collected for a very different end goal in China. In a civil-military fusion state, where companies act as de facto government agencies, the objectives of the state are the objectives of the companies. This is part of what leads companies to a philosophy of “collect data now and worry what to do with it later.” Google may collect geographic data on you hoping to understand what restaurants to recommend, whereas China uses this information to detect regular group meetings, such as house churches, that could be perceived as threats to the state.

With fewer legal or resource-based constraints on what the state can do with data, Chinese companies collect information on foreigners and the government worries about how best to use it. The effort to link all these data together has driven much of the investment into artificial intelligence, big data and machine learning, industries that derive a material portion of their revenues from state contracts. While there is no known single centralized database in China, there is some evidence that databases are linked together for government access.

Policy Implications

The information operation being run by the Chinese government is beyond the scope or scale of any surveillance operation the world has ever seen. China’s role as both a manufacturing center and domestic surveillance state with extensive infrastructure to leverage for international purposes makes its data collection capabilities immense and multifaceted. Rather than taking gentle, tentative steps, the U.S. and other nations need to take bold action to address the ever-growing surveillance capabilities of China and other authoritarian states.

First, liberal democratic governments should strengthen privacy legislation and regulation. Technology has many good uses and cannot be turned back in a misguided attempt to address the rise of China. However, technology that can be used for democratic purposes can also be used for oppressive ends. Enhanced privacy legislation should include protections not just from foreign states and parties, but from domestic governments and other parties seeking access to data.

Second, greater attention needs to be paid to data transfer and internet openness. The reality is that the internet is fragmenting for multiple reasons. Increasing user safety and privacy will require strengthened encryption, reporting, auditing and localization. A lot of attention is paid to the cloak-and-dagger information-gathering we see in the movies, but such covert operations are often not necessary. Because the internet is open and websites and app developers face no privacy requirements, data on users and firms can simply be moved to adversarial states. These simple shortcomings need to be addressed.

Third, given China’s centrality to large amounts of manufacturing, specifically electronics manufacturing, the U.S. should create an active policy to move its technological manufacturing to democratically allied states. The U.S. should not be welcoming Chinese electronic products with audio-visual and other monitoring capabilities that send data back to China. This requires relocating production of both high-tech products such as advanced semiconductors and low-tech products such as cellphones.

Given the scope of data collection on devices and China’s known activity in this area, we cannot take our information safety for granted. The U.S. cannot continue to stand still as China builds a surveillance state that digitally reaches into Americans’ homes.