A tic-tac-toe board with human faces as digital blocks, symbolizing how AI works on pre-existing, biased online data for information processing and decision-making

This Person Does Not Exist: Reality and Simulation in Synthetic Face Datasets

This Person Does Not Exist: Reality and Simulation in Synthetic Face Datasets investigates the growing use of “synthetic data” to train artificial intelligence (AI) models and asks how these datasets complicate ideas about reality, authenticity, and ground truth. Synthetic data is entirely new data generated by mimicking the statistical patterns of a smaller, real-world dataset. According to its proponents, synthetic data is better, cheaper, and faster than collecting new observational data. Skeptics, however, point out that synthetic data can amplify biases already present in the source data, like a hall of mirrors. While public attention has focused on deepfakes and the outputs of generative AI models, the role of synthetic data as an input to AI development has been relatively overlooked.

Through expert interviews and discourse analysis of primary sources, I seek to understand how technology companies and AI developers describe the relationship between synthetic data and its referent. In particular, the project will focus on generated images of realistic-but-imaginary people and the accompanying claim that these synthetic images can resolve problems of diversity and inequality in technology. With synthetic data, AI developers can input variables like gender, race, and age and generate custom-made faces. However, this raises questions about which assumptions regarding appearance and social identity are being embedded in AI-generated faces.

Image credit: Amritha R Warrier & AI4Media / https://betterimagesofai.org / https://creativecommons.org/licenses/by/4.0/

People

Nina Toft Djanegara

Nina Dewi Toft Djanegara is the Technology, Policy, and Law Fellow at the Institute for Technology, Law, and Policy (ITLP), a collaboration between the UCLA School of Law and the Samueli School of Engineering. An anthropologist by training, she examines how human identity and categories of social difference are understood by computers. In addition to her work on synthetic data, she is writing a book about the history of facial recognition and its use in U.S. border enforcement. Toft Djanegara holds a BA from UC Berkeley, an MS from Yale University, and an MA and PhD from Stanford University.