Register Now: S+T at UW Pop-Up Working Groups on Science, Society & Justice

Society + Technology at UW is offering a new three-part Pop-Up Working Group on Science, Society & Justice for the UW community, hosted by Dr. Tim Brown (Department of Bioethics & Humanities).

Tuesday, February 11, 2025 | Airing
9:30 – 10:25 AM (PT)
📍 Online | Chatham House Rule
Register for the first session on Feb. 11

Tuesday, February 25, 2025 | Collaborating
9:30 – 10:25 AM (PT)
📍 Online | Chatham House Rule
Register for the second session on Feb. 25

Tuesday, March 11, 2025 | Creating
9:30 – 10:25 AM (PT)
📍 Online | Chatham House Rule
Register for the third session on March 11

*This event is now completed. Check out the Twelve Minute Stories written by participants in the final session, Creating.*

About the Working Group Theme: Science, Society & Justice

This working group begins with a guiding question: what do research, teaching, and intellectual life around science, society, and justice mean in 2025 for UW faculty, staff, and students?

To answer this question, we are fostering a brave space for shared support, empathy, and uplifting dialogue. Hosted by Tim Brown, PhD, and moderated by Monika Sengul-Jones, PhD, the working group aims to give UW affiliates a space to collectively and iteratively discuss current affairs in relation to our charge at the university and to determine key takeaways.

“[W]e are witnessing tectonic shifts in the global political landscape that will deeply impact science and society research. UW faculty, staff, and students will need to adapt and respond in ways that uphold our values and uplift our communities,” said Brown. “To promote academic freedom and integrity at UW and beyond.”

Are the sessions the same?

No, the three Pop-Up Working Group sessions are not duplicates. Instead, the series is designed as an interlinked, emergent, and aggregated conversation.

I can’t attend, should I still register?

Yes. Registrants will receive information about the conversations and the opportunity to connect with others.

Will the sessions be recorded?

No, the Pop-Up Working Group sessions will not be recorded. If you’re unable to attend one or more sessions but would like to connect with Dr. Tim Brown about these themes more generally, you may reach out directly at timbr@uw.edu.

About Tim Brown, PhD

Tim Brown is an Assistant Professor in the Department of Bioethics & Humanities and a founding member of the Neuroethics Thrust within the Center for Neurotechnology at UW. His research explores the intersections of biomedical ethics, philosophy of technology, and social justice, particularly in the context of neurotechnologies and their societal impact.

🔗 Learn more about Dr. Tim Brown: https://depts.washington.edu/bhdept/timothy-brown-phd

About S+T Pop-Up Working Groups

S+T Pop-Up Working Groups are thematic problem-solving sessions proposed and hosted by members of the Society + Technology Affiliate network. Each session is a 55-minute online conversation addressing a timely question or challenge.

The first Pop-Up Working Group in 2024 emerged from a reading group discussion on AI at UW. Have a question or problem you’d like to explore with the S+T network? Propose a Pop-Up Working Group session by emailing mmjones@uw.edu.

🔗 Learn more: https://depts.washington.edu/societytech/wordpress/community-programs/

Salon Series Launches This January with Two Online Conversations on Genetics and Bioethics

S+T proudly launches an online Salon Series this January 2025 as one of the community programs designed to strengthen the emerging cross-campus, cross-disciplinary network.

Each Salon is a one-hour-and-fifteen-minute conversation between three to five Affiliates from the S+T network, with a moderator. The purpose is to recognize and honor live, arranged encounters as a meeting of the minds, to give greater visibility to the S+T network, and to cultivate intellectual conditions for deeper collaborations.

[1] S+T Salon | Online | Genetic Technologies, Technologies of Genetics

Mon, Jan. 13, 2025, 12:30 pm – 1:45 pm

Perspectives on technologies of genetics, including biostatistics, risk analysis, and more, from anthropological, cultural, and philosophical schools of thought.

Presenters: Christian Anderson (IAS, UW Bothell), Shannon Cram (IAS, UW Bothell), Malia Fullerton (Dept. of Bioethics & Humanities, School of Medicine), Lisa Hoffman (Urban Studies, UW Tacoma), Sarah Nelson (Genetic Analysis Center, Dept. of Biostatistics, UW Seattle)

[2] S+T Salon | Online | Bioethics and Human Flourishing

Tue, Jan. 14, 2025, 2:30 pm – 3:45 pm 

Ethical, social, cultural, geographical, and critical perspectives on research and applications of genetics, neurotechnologies, precision medicine, and more.

Presenters: From the Department of Bioethics and Humanities at the School of Medicine: Tim Brown, Amy Hinterberger, Sue Trinidad; and from the UW Tacoma School of Interdisciplinary Arts and Sciences: Ilā Ravichandran

Speaker Biographies

[1] Salon | Jan. 13, 2025 | Genetic Technologies, Technologies of Genetics

Christian Anderson (IAS, UW Bothell)

Christian Anderson is an Associate Professor at UW Bothell and an interdisciplinary scholar working at the intersections of human geography, urban studies, cultural studies, science and technology studies, and critical social thought. In his previous work, his primary mode has been ethnographic. He is increasingly interested in practicing place-based collaborative methods—including oral histories, mapping and geo-visual techniques, and other qualitative approaches—within various contexts and structures of community-embedded collective study and knowledge production.

Across all of his work, Anderson’s abiding aim is to understand how ordinary people’s everyday lives, routine practices, relations, and taken-for-granted or “common sense” conceptions of the world interconnect with broader formations of culture, power, social reproduction, and political economy. Additionally, he seeks to experiment with conscientious place-based processes and protocols through which alternative conceptions, practices, relations, formations, and futures might emerge.

Shannon Cram (IAS, UW Bothell)

Shannon Cram is an Associate Professor at UW Bothell and an interdisciplinary scholar working at the intersections of geography, anthropology, science and technology studies, and the environmental humanities. Her research explores what it means to reckon with an unevenly contaminated environment and how managing exposure shapes the very definitions of health and hazard in the United States. Her work investigates the embodied politics of waste and wasting, with particular attention to the co-production of science and social life. She analyzes how frames such as risk, reason, pollution, and protection recognize (and fail to recognize) environmental impacts. In examining how such forms of recognition have come to be, she also asks how they might be reimagined. This concern with how power circulates in and through situated histories of toxicity is central to her scholarship.

Malia Fullerton (Dept. of Bioethics & Humanities, School of Medicine)

Stephanie Malia Fullerton, DPhil, is Professor in the Department of Bioethics and Humanities at the University of Washington School of Medicine. She is also an Adjunct Professor in the UW Departments of Epidemiology, Genome Sciences, and Medicine (Medical Genetics), as well as an affiliate investigator with the Public Health Sciences division of the Fred Hutchinson Cancer Research Center. She received a PhD in Human Population Genetics from the University of Oxford and later re-trained in Ethical, Legal, and Social Implications (ELSI) research with a fellowship from the NIH National Human Genome Research Institute. She currently serves as Director of Research for the department. Fullerton’s work focuses on the ethical and social implications of genomic research and its equitable and safe translation for clinical and public health benefit. She co-leads (with Sarah Nelson) ELSI research focused on data governance in the context of emerging cloud-based biomedical data storage and analysis platforms (R21 HG011501). She serves as a co-I with the Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium coordinating center (U01 HG011697), and previously served as the ELSI PI for the Clinical Sequencing Evidence-Generating Research (CSER) coordinating center (U24 HG007307). She has prior experience with qualitative research, particularly in association with the UW Electronic Medical Records and Genomics (eMERGE) Network, now in its fourth phase (U01 HG008657). She has also contributed expertise to the Ethics of Inclusion project (R01 HG010330) as well as to a project focused on community engagement surrounding APOL1 genetic testing in African Americans (R01 HG007879). 

Lisa Hoffman (Urban Studies, UW Tacoma)

Lisa M. Hoffman (she/her/hers) is Professor in the School of Urban Studies at UW Tacoma.  Trained in cultural anthropology, her scholarship has focused on questions of power, governing, and social change, with a particular interest in subjectivity and its intersections with spatiality.  Geographically, the majority of her work has been located in urban China, with an extension of these organizing questions into other realms in the United States, such as science and technology studies, ethnic identity, and homelessness. Her analytical approach has been strongly influenced by the work of Michel Foucault – especially in terms of how she thinks about power, sources of authority, and subject formation processes. Her current research project – “Being ‘high risk’ for cancer: personal genetics, the present self and managing future disease” – engages science and technology studies and considers how genetics and precision health are shaping subjectivity and contemporary practices of living. It is concerned with what is at stake when cancer risk assessment scores and other more personalized prevention practices (e.g., genetic testing) become increasingly commonplace, expanding the number of people identified as at-risk. The project includes ethnographic fieldwork with individuals identified as high risk for cancer, clinicians doing early detection work, and experts (e.g., in nutrigenomics, genetic counseling) who help people manage their present lives for a potential future illness. It also includes research on institutional alliances that lead to the production of knowledge about cancer prevention as well as computational practices producing health risk scores.

Sarah Nelson (Genetic Analysis Center, Dept. of Biostatistics, UW Seattle)

Sarah Nelson (she/her/hers) is an interdisciplinary researcher interested in the ethical and social implications of genomics in research, clinical care, and everyday life. She is a Senior Research Scientist and Project Manager at the Genetic Analysis Center within the University of Washington (UW) Department of Biostatistics. She holds a PhD and MPH in Public Health Genetics from UW, and the graduate certificate in Science, Technology, and Society Studies. Her graduate studies in Public Health Genetics have provided her with uniquely interdisciplinary training in the ethical, legal, and social implications (ELSI) of genetic research and its translation into clinical and consumer settings.

[2] Salon | Jan. 14, 2025 | Bioethics and Human Flourishing

Tim Brown (Department of Bioethics and Humanities, School of Medicine)


Tim Brown joined the Department of Bioethics and Humanities in the School of Medicine in July 2021 as an Assistant Professor. He earned his Ph.D. in Philosophy from the University of Washington, after earning a B.A. in Philosophy from the University of California, Santa Cruz. He is also a founding member of and long-term contributor to the Neuroethics Thrust within the Center for Neurotechnology at UW. He also leads diversity, equity, and inclusion efforts with the International Neuroethics Society. He works at the intersection of biomedical ethics, philosophy of technology, (black/latinx/queer) feminist thought, and aesthetics. His research explores the potential impact of neurotechnologies—systems that record and stimulate the nervous system—on end users’ sense of agency and embodiment. His work also interrogates neurotechnologies for their potential to exacerbate or create social inequities, in order to establish best practices for engineers. Finally, Dr. Brown’s approach to research is interdisciplinary, embedded, and relies on mixed methods; his work on interdisciplinarity is aimed at encouraging deeper collaborations between humanists and engineers in the future.



Amy Hinterberger (Department of Bioethics and Humanities, School of Medicine)

Amy Hinterberger is Associate Professor and Chair of the Department of Bioethics and Humanities in the School of Medicine at the University of Washington. Prior to joining University of Washington in 2024, she was Associate Professor in the Department of Global Health and Social Medicine at King’s College London, UK. She has also held positions at the University of Warwick (2013 – 2017), Harvard University (2014), University of Oxford (2011 – 2013) and University of London (2010 – 2011).  A sociologist by training (PhD, LSE, 2010), her research addresses the ethical and political dynamics of biomedicine and biotechnology. Amy currently leads a team of researchers exploring the ethics and politics of stem cell models for human disease and development through a Wellcome Trust Investigator Award in the Social Sciences and Humanities called ‘Biomedical Research and the Politics of the Human’ ($753,351: 2020 – 2025): politicsofthehuman.org. Using qualitative empirical and ethnographic research methods, the project is designed as a social and ethical exploration into the changing relationship between humans, animals and biomedicine. Her research interests span multiple areas of innovation and technology, focusing particularly on cell-based technologies and genomics. Exploring the relationship between inequality and the social implications arising from emerging technologies is a key aspect of her scholarship. Additionally, she is interested in the intersections between sociology and bioethics, particularly in exploring the institutional governance and regulation of both humans and animals in biomedical research.  

Sue Trinidad (Department of Bioethics and Humanities, School of Medicine)


Sue Trinidad is an Assistant Professor in the Department of Bioethics and Humanities, having worked as a Research Scientist in the department from 2005-2022. She is the co-Principal Investigator of an NIAID-funded grant, Alaska Native People Advancing Vaccine Uptake, which will use peer-to-peer outreach, education, and motivational interviewing to increase COVID-19 vaccination among Alaska Native and American Indian people in Alaska. She has conducted empirical bioethics work concerning the ethical, legal, and social implications (ELSI) of genomic research and precision medicine with several large national consortia including the eMERGE Network, the Northwest-Alaska Pharmacogenomics Research Network, the CSER Consortium, and two Centers of Excellence in ELSI Research, the Center for Genomics and Healthcare Equality (CGHE) at UW and the Center for the Ethics of Indigenous Genomic Research (CEIGR) at the University of Oklahoma. Her research interests include the dynamics and ethics of equitable collaboration in health research; patient-centered communication and medical decision-making; the ethical and social implications of genomic research, wide data-sharing, and broad consent; moral and dispositional development; and qualitative methods development. As a white settler engaged in research with Alaska Native and American Indian communities, Trinidad works to develop participatory, strengths-based approaches to health research that respect Tribal sovereignty and the right of Indigenous peoples to self-determination. She served on the UW Institutional Review Board from 2009-2014. Trinidad holds a PhD in educational psychology (Learning Sciences & Human Development) from UW, an MA from the Interdisciplinary Program in Health and Humanities at Michigan State University, and a BA in English from the College of William and Mary. 
She came to UW from an executive-level position in product development for companies specializing in telephone triage and disease management counseling.

Ilā Ravichandran (School of Interdisciplinary Arts and Sciences at UW Tacoma)


Ilā Ravichandran is an assistant professor of legal studies at the University of Washington, Tacoma. She is an interdisciplinary sociologist who works at the intersections of feminist studies, critical carceral studies, legal studies, Black studies, and science & technology studies. Her research focuses on the intersections of science and law and engages with the global policing apparatus. To this end, her current research is a multi-method inquiry that analyzes the expanded use of genetics and genomics as a tool of racialized policing, with particular attention to the assemblages that converge to promote such an apparatus. This research has been funded by the National Science Foundation and the Social Science Research Network. She is the co-author of Imperial Policing: Weaponized Data in Carceral Chicago (University of Minnesota Press). She is also a visual artist and urban farmer, orienting her life’s work towards liberatory and imaginative futures.

Co-sponsored by the UW Tech Policy Lab, Science, Technology, and Society Studies (STSS), and the UW Department of Bioethics and Humanities in the School of Medicine.

Speaker Line-Up and Event Details for Inaugural Convening Announced

The speaker line-up and event details for the Society + Technology at UW Inaugural Convening this Friday, Jan. 10 from 9 am to 12:30 pm at the Center for Urban Horticulture have been announced. Registration is still open—access the details here.

The morning event marks the launch of this new initiative. The program will begin with remarks from President Ana Mari Cauce and Provost Tricia Serio, as the initiative is an outgrowth of the 2021-22 President and Provost Task Force on Technology and Society. The program will also feature lightning talks from affiliated researchers, a panel discussion, opportunities to meet colleagues, and a light fare reception.

The event is co-sponsored by the UW Tech Policy Lab with support from the Office of the Provost.

[Conversations] What Does Consent Mean in the Age of Large Language Models (LLMs)?

Transcript

Monika Sengul-Jones

As a computational linguist, why are you reluctant to have the audio recording of our conversation available or streamed on the Internet?

Angelina McMillan-Major

I’m concerned about my data being out there on the [open] internet, available to crawlers. Large language models (LLMs), as well as other generative or machine learning models, are trained using data scraped from the internet. Oftentimes, it’s collected using automated systems that crawl domains such as Wikipedia[’s corpus] going from link to link.

My data, my voice data, is called PII, personally identifiable information. It’s [among] the high-risk types of data because it’s uniquely identifying. 

I’m concerned about having my PII out in the wild, where automated systems can gather my PII and throw it into a model and use it as they will.

It’s also that personal data is pervasively undervalued. From the industry perspective, ‘data goes in’ and the product is the model, the output. So I’m concerned about our individual data rights and what can be learned about us, as people, through [our] personal data.

Monika Sengul-Jones

It’s funny that the word “data” can be used to describe something so personally unique—the sound of your voice.

Angelina McMillan-Major

Yeah, your voice is conceptualized as a pattern, [as data] it becomes frequencies. What’s important, or desirable, isn’t just the content of what’s spoken—it’s your voice frequencies and what sort of words you use.

Monika Sengul-Jones

Is it accurate to say, from a privacy perspective, you’re concerned about your sensory—vocal, in this case—fingerprints? That we need protection for something that is unintentionally created and possessed and therefore is given away without realizing or consenting?

Angelina McMillan-Major

Yes.

Monika Sengul-Jones

Let’s talk more about your work as a computational linguist. You’ve presented research on the history of computation and language, and how the same word—artificial intelligence—is used to describe different technologies. For instance, we have the ELIZA chatbot (an early natural language processing computer program developed from 1964 to 1967 at MIT) in the mid-century, which was cutting-edge AI. Today, ELIZA is pretty basic. Tell us more about why this history is important to know.

Angelina McMillan-Major

It’s a good question. Well, chatbots like ELIZA used shallow processing. It was N-gram language modeling.

Monika Sengul-Jones

Can you explain an N-gram?

Angelina McMillan-Major

They work by making a statistical prediction of what text will come next—sort of like an ‘auto-complete’ that isn’t very good.

“N” refers to the number of grams, of consecutive words, or tokens. So “the cat” is a bi-gram. “The cat meows” is a tri-gram. The more words you add, the higher the n, and the less frequent the sequence. The phrase “the cat meows in the tree,” that’s not going to happen often [in some given text data].

You look at the probability of what word might come next—that was the state-of-the-art AI. But at a certain point, there’s a limit to how natural an N-gram will sound. 

Then neural networks became popular; they sounded more natural, and the probability space was more fluid.
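The counting-and-prediction idea McMillan-Major describes can be sketched in a few lines of Python. The toy corpus and word choices are illustrative, not drawn from the conversation:

```python
from collections import Counter, defaultdict

# Toy corpus; a real n-gram model would be trained on far more text.
corpus = "the cat meows the cat sleeps the dog barks".split()

# Count bigrams: pairs of consecutive tokens.
bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def predict_next(word):
    """Return the most probable next word and P(next | word)."""
    followers = bigram_counts[word]
    total = sum(followers.values())
    best, count = followers.most_common(1)[0]
    # P(w2 | w1) = count(w1, w2) / count(w1, *)
    return best, count / total

print(predict_next("the"))  # ('cat', 0.666...): "cat" follows "the" 2 of 3 times
```

The same counting scheme extends to tri-grams and beyond, but counts for longer sequences thin out quickly, which is the limit on naturalness she describes.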

Monika Sengul-Jones

How are neural networks different from N-grams?

Angelina McMillan-Major

A neural network is fundamentally based on an algorithm called the perceptron. This is a specific mathematical formula based on linear algebra that models language as a network [of nodes]. So [neural networks] go from the probability-statistics space to linear algebra. It shifts what sort of things you can do to smooth low probabilities [in language prediction] as well as create randomization to allow for more fluid, unique patterns that aren’t necessarily directly in the training data.
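The perceptron she mentions, a weighted sum followed by a threshold, can be sketched in Python. This is a generic textbook illustration using the classic error-driven learning rule; the AND task, learning rate, and epoch count are invented for the example:

```python
# Classic perceptron: a weighted sum plus bias, thresholded at zero,
# with weights nudged toward the target on each misclassification.
def train_perceptron(samples, labels, epochs=10, lr=0.1):
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn logical AND, a linearly separable pattern.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in X])
```

A single perceptron can only separate inputs with a straight line (or hyperplane); stacking layers of such units is what gives neural networks the more fluid behavior described above.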

Angelina McMillan-Major, PhD, is a computational linguist in the UW’s Language Learning Center where she focuses on methodologies for language documentation and reclamation, specifically endangered languages. Photo credit: Russell Hugo, 2024

Monika Sengul-Jones

I have to mention, just the word ‘perceptron’ sounds cool. Were these developed around the same time as the N-gram? Or did one follow the other?

Angelina McMillan-Major

An early perceptron version of a neural network was also developed back in the 1940s.

Monika Sengul-Jones

Before ELIZA.

Angelina McMillan-Major

Yes, however, in the ’40s, computational linguistics had multiple theories, but it wasn’t until we had the personal computer and then the internet with enough data and hardware that we could actually implement these theories. So there were versions of neural networks in the ’90s, but they didn’t take off until the 2010s.

Monika Sengul-Jones

That was our “big data” moment. So, in this brief history of artificial intelligence as it pertains to language, where do large language models (LLMs) come in?

Angelina McMillan-Major

At the end of the neural network period (in 2017). Most people are familiar with LLMs that use a particular type of architecture, the transformer model. This is what ChatGPT is based on. Compared to other neural networks, LLMs using a transformer [architecture] are extremely data-intensive, using billions of tokens.

Monika Sengul-Jones

Let’s go back to my first question: what is at stake for people in the fact that we’ve called all these different technologies “artificial intelligence”?

Angelina McMillan-Major

We’re seeing models used for decision-making, like determining credit scores, and we know these outputs are biased, but it’s not transparent within the model itself. We don’t have the opportunity to see—“Oh, my credit score was decided because this model output a .6 or something”—and what that means internally.

Monika Sengul-Jones

I know this black boxing causes real harm to people. We deserve transparency on how decisions are made. But also, if people use these models for decision-making, if people are relieved of decision fatigue, are you worried people are going to get stupider?

Angelina McMillan-Major

I hope not.

Monika Sengul-Jones

That’s a relief!

Angelina McMillan-Major

I’m less concerned about the loss of critical thinking skills and more about people willingly giving up rights to our personally identifiable information (PII) in exchange for ease. 

Monika Sengul-Jones

In exchange for ease, yeah. And then your PII could be used against you, I suppose.

Angelina McMillan-Major

I worry about the normalization of this exchange in society. I want society to be aware that the exchange is the centralization of power into a small number of big companies.

Monika Sengul-Jones

Big in reach, small in number.

Angelina McMillan-Major

It doesn’t necessarily have to be that way.

Monika Sengul-Jones

Let’s talk about how else it could be. In your research, you’ve been developing best practices for research with communities, such as those who speak endangered languages. In North America, Indigenous communities, for instance. For anyone concerned about privacy, about the integrity of their personally identifiable data, who wants to document their language and to protect their data, what’s your approach?

Angelina McMillan-Major advocates for a consent-based model of technology, drawing from the bodily consent literature. She recommends checking out the Consentful Tech Project to learn more. Image: Screengrab from Consentful Tech Project, 2024.

Angelina McMillan-Major

Collection, maintenance, and controlling access—these are huge priorities.

Most people are familiar with participation in data-gathering as something you can opt in or out of. When the opt-out model is used [as the default], it’s not consent, since people may not be aware that removing themselves is an option.

When you’re working with a community, the process is [and should be] different. There are archives that will hold this data. And usually, there are intimate processes. You go to a specific family, for example, whose ancestor has recorded something. You get permission from that family, you specifically ask to use their recording in research. You explain the forms you’ll be using it in, what will be shared, what the outcomes will be, and how you’ll be giving back and reciprocating with the community.

Monika Sengul-Jones

So you’re thinking about computational linguistics, in this process, as co-created partnerships of reciprocity.

Angelina McMillan-Major

Yes. Additionally, the person asking for consent carries the burden of providing as much information as possible. They need to ensure there’s some sort of understanding on the other end. This is distinct from the way that most of us just go through the terms agreement and click accept.

Monika Sengul-Jones

I just do what I need to do to move on. Those modal interruptions are the worst.

Angelina McMillan-Major

Yeah. So that’s not informed consent. That’s as-quickly-as-possible consent.

Monika Sengul-Jones

You have an acronym you use to understand consent in your work. Freely given, reversible, informed, enthusiastic, and specific; FRIES consent. That’s really nice.

Angelina McMillan-Major

Yeah, that’s drawing from the bodily consent literature.

Monika Sengul-Jones

Right, and it brings us back to the beginning of our conversation, thinking about our personally identifiable information (PII) as intimate data, as an important part of us and deserving of protection. Our PII body.

Angelina McMillan-Major

Yeah. However, one of the concepts that we don’t have a technical analogy for yet is “reversible.” Once you give your agreement, you can’t take back your data. That’s not necessarily the case in Europe, with the General Data Protection Regulation (GDPR). But that’s a problem with our current LLMs. It’s hard to take out data because it’s built into the model.

Monika Sengul-Jones

Right. I like to think about how reversal might work with, for example, the Authors Guild class action lawsuit against OpenAI. Let’s say the authors win. How could the books be removed from OpenAI’s GPT models to, for instance, prevent works from being generated that closely resemble the copyrighted works that should be withdrawn? The litigation is an important question for copyright law because the books are not copied or saved on the servers or directly used to generate responses to queries; rather, there are cases of overfitting. We’ll see how the courts rule, but in the event the authors win, how will whatever those books helped create be removed?

Angelina McMillan-Major

Well [the books as] data, sort of, are the weights. The actual numbers that are calculated from them form the body of the model. How do you tie a specific data instance to the weights that are spread across a giant billion-parameter model? That’s hard to do.

Monika Sengul-Jones

When I hear things like this, it reminds me of people saying, ‘You can’t put the genie back in the bottle.’ But is it impossible? It seems more of a political and labor question.

Angelina McMillan-Major

I think people are trying. I’ll say that. I’m not convinced.

Monika Sengul-Jones

You’re not convinced?

Angelina McMillan-Major

I mean, I just don’t know how you would do it, from a theoretical perspective.

Monika Sengul-Jones

But if people didn’t give consent to have their data used, and yet it was, and it became the foundation of the model, then won’t we need to figure out how to remove parts?

Angelina McMillan-Major

Well, there’s the remove-the-whole-thing option. It’s the remove parts that people are trying their best to work on.

Monika Sengul-Jones

Before we end, I want to ask you about another intervention you’ve made in your work with the Tech Policy Lab. Which is this concept of “data statements”—metadata that are attached to data points. Tell us about data statements, what do you want people concerned about data and privacy to know?

Cover of A Guide for Creating and Documenting Language Datasets with Data Statements, Schema Version 3 (2024), by Angelina McMillan-Major and Emily M. Bender.
Data Statements Guide by Angelina McMillan-Major & Emily Bender (2024); report design by Elias Greendorfer.

Angelina McMillan-Major

[Data Statements] was started by Batya Friedman and Emily Bender, who were asking, ‘How can we help people make more informed decisions about selecting data for the models they are going to use, and for the systems those models are embedded in?’ Data Statements help to make sure they’re appropriate for the use case. The behavior of the model is so tied to the data that it’s trained on that you don’t want to use, for example, a model only trained on English data for some other language, something as simple as that. Data Statements are guides.
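To make the idea concrete, here is a hypothetical sketch in Python of a data statement rendered as structured metadata. The field names follow the section headings of the Data Statements schema (curation rationale, language variety, speaker and annotator demographics, speech situation, text characteristics), but the dictionary layout, the example values, and the toy suitability check are illustrative assumptions, not the project's published format, which is prose-oriented and written for human readers:

```python
# Hypothetical sketch: a data statement as structured metadata.
# Field names echo the Data Statements schema's sections; the dict
# layout and values are invented for illustration.
data_statement = {
    "dataset_name": "example-corpus",  # placeholder name
    "curation_rationale": "Why these texts were selected.",
    "language_variety": "en-US, informal web text",
    "speaker_demographic": "Unknown; scraped forum posts.",
    "annotator_demographic": "Three graduate student annotators.",
    "speech_situation": "Asynchronous written discussion, 2015-2020.",
    "text_characteristics": "Short posts, technical vocabulary.",
}

def suitable_for(statement, target_language):
    """Toy check: does the dataset's language variety cover the target?"""
    return target_language.split("-")[0] in statement["language_variety"]

print(suitable_for(data_statement, "en-US"))  # True
print(suitable_for(data_statement, "fr-FR"))  # False
```

This is the kind of mismatch the interview flags: a model trained only on English data is a poor fit for another language, and documentation like this makes the mismatch visible before the model is chosen.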

Monika Sengul-Jones

I started to think of our conversation about crawlers on the internet just going, eat, eat, eat, like a little Pac-Man. Then they run into something like a data statement and it’s like, “nope!” can’t pass, it’s not right for what I need! I don’t know [laughter] I just…I liked that visual for my understanding of data statements. Is that an accurate description?

Angelina McMillan-Major

I hope so someday! [Laughter] The existing versions of data statements are designed for human decision making, but maybe further research will result in machine-readable versions.

Transcription by Mollie Chehab
Editing by Monika Sengul-Jones
Graphic of Data Statements Guide by Elias Greendorfer
Image Credit: Portrait of Angelina McMillan-Major (2024) by Russell Hugo of the Language Learning Center

Related Links

Consentful Tech Project

Tech Policy Lab’s Data Statements Project

McMillan-Major, A., Bender, E. M., & Friedman, B. (2024). Data Statements: From Technical Concept to Community Practice. ACM Journal on Responsible Computing, 1(1), 1–17. https://doi.org/10.1145/3594737

McMillan-Major, A., et al. (2024). Documenting Geographically and Contextually Diverse Language Data Sources. Northern European Journal of Language Technology, 10(1). https://doi.org/10.3384/nejlt.2000-1533.2024.5217