As artificial intelligence infiltrates nearly every aspect of modern life, researchers at startups like Anthropic are working to prevent harms like bias and discrimination before new AI systems are deployed.
Now, in a newly published study, Anthropic researchers have unveiled their latest findings on AI bias in a paper titled “Evaluating and Mitigating Discrimination in Language Model Decisions.” The paper brings to light the subtle prejudices ingrained in decisions made by artificial intelligence systems.
But the study goes one step further: the paper not only exposes biases, but also proposes a comprehensive strategy for creating fairer and more just AI applications using a new discrimination evaluation method.
The company’s new research comes at just the right time, as the AI industry continues to scrutinize the ethical implications of rapid technological growth, particularly in the wake of OpenAI’s internal upheaval following the dismissal and reappointment of CEO Sam Altman.
Research method aims to proactively evaluate discrimination in AI
The new research paper, published on arXiv, presents a proactive approach to assessing the discriminatory impact of large language models (LLMs) in high-stakes scenarios such as finance and housing, an increasing concern as artificial intelligence continues to penetrate sensitive societal areas.
“While we do not endorse or permit the use of language models for high-stakes automated decision-making, we believe it is crucial to anticipate risks as early as possible,” said lead author and research scientist Alex Tamkin in the paper. “Our work enables developers and policymakers to get ahead of these issues.”
Tamkin further elaborated on limitations of existing techniques and what inspired the creation of a completely new discrimination evaluation method. “Prior studies of discrimination in language models go deep in one or a few applications,” he said. “But language models are also general-purpose technologies that have the potential to be used in a vast number of different use cases across the economy. We tried to develop a more scalable method that could cover a larger fraction of these potential use cases.”
Study finds patterns of discrimination in language model
To conduct the study, Anthropic used its own Claude 2.0 language model and generated a diverse set of 70 hypothetical decision scenarios that could be input into a language model.
Examples included high-stakes societal decisions like granting loans, approving medical treatment, and granting access to housing. These prompts systematically varied demographic factors like age, gender, and race to enable detecting discrimination.
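The idea of systematically varying demographic factors can be pictured with a minimal Python sketch. This is only an illustration of the general technique, not the paper's code: the template wording, the attribute lists, and the function name `build_prompts` are all hypothetical.

```python
from itertools import product

# Hypothetical template and attribute values, for illustration only;
# none of this wording comes from the Anthropic paper.
TEMPLATE = (
    "The applicant is a {age}-year-old {race} {gender} applying for a "
    "small business loan. Should the loan be approved? Answer yes or no."
)
AGES = [20, 40, 60, 80]
GENDERS = ["male", "female", "non-binary"]
RACES = ["white", "Black", "Asian", "Hispanic", "Native American"]


def build_prompts(template, ages, genders, races):
    """Return one prompt per combination of demographic attributes."""
    prompts = []
    for age, gender, race in product(ages, genders, races):
        prompts.append({
            "age": age,
            "gender": gender,
            "race": race,
            "prompt": template.format(age=age, gender=gender, race=race),
        })
    return prompts


prompts = build_prompts(TEMPLATE, AGES, GENDERS, RACES)
```

Because every prompt is identical apart from the demographic attributes, any consistent difference in a model's answers across the variants can be attributed to those attributes rather than to the scenario itself.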
“Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied,” the paper states. Specifically, the authors found their model exhibited positive discrimination favoring women and non-white individuals, while discriminating against those over age 60.
Interventions reduce measured discrimination
The researchers explain in the paper that the goal of the research is to help developers and policymakers address risks proactively. “As language model capabilities and applications continue to expand, our work enables developers and policymakers to anticipate, measure, and address discrimination,” the authors write.
The researchers propose mitigation strategies like adding statements that discrimination is illegal and asking models to verbalize their reasoning while avoiding biases. These interventions significantly reduced measured discrimination.
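Both kinds of mitigation are prompt-level changes, which a short Python sketch can make concrete. The intervention wording, the dictionary `INTERVENTIONS`, and the function `apply_intervention` below are hypothetical stand-ins and do not reproduce the statements used in the paper.

```python
# Hypothetical intervention texts, for illustration only; the paper's
# actual wording is not quoted here.
INTERVENTIONS = {
    "none": "",
    "illegal": (
        "It is illegal to discriminate on the basis of age, gender, or race "
        "when making this decision."
    ),
    "reasoning": (
        "Before answering, explain your reasoning step by step, taking care "
        "to avoid any bias or stereotyping."
    ),
}


def apply_intervention(prompt, name):
    """Append the named intervention text to a decision prompt."""
    text = INTERVENTIONS[name]
    if not text:
        return prompt
    return f"{prompt}\n\n{text}"


base = "Should the loan be approved? Answer yes or no."
variants = {name: apply_intervention(base, name) for name in INTERVENTIONS}
```

Running the same set of decision prompts with and without each intervention gives a like-for-like way to compare how much measured discrimination changes.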
Steering the course of AI ethics
The paper aligns closely with Anthropic’s much-discussed Constitutional AI paper from earlier this year. The paper outlined a set of values and principles that Claude must follow when interacting with users, such as being helpful, harmless and honest. It also specified how Claude should handle sensitive topics, respect user privacy and avoid illegal behavior.
“We are sharing Claude’s current constitution in the spirit of transparency,” Anthropic co-founder Jared Kaplan told VentureBeat back in May, when the AI constitution was published. “We hope this research helps the AI community build more beneficial models and make their values more clear. We are also sharing this as a starting point — we expect to continuously revise Claude’s constitution, and part of our hope in sharing this post is that it will spark more research and discussion around constitution design.”
The new discrimination study also closely aligns with Anthropic’s work at the vanguard of reducing catastrophic risk in AI systems. In September, Anthropic co-founder Sam McCandlish shared insights into the development of the company’s policy and its potential challenges, which could shed some light on the thought process behind publishing AI bias research as well.
“As you mentioned [in your question], some of these tests and procedures require judgment calls,” McCandlish told VentureBeat about Anthropic’s use of board approval around catastrophic AI events. “We have real concern that with us both releasing models and testing them for safety, there is a temptation to make the tests too easy, which is not the outcome we want. The board (and LTBT) provide some measure of independent oversight. Ultimately, for true independent oversight it’s best if these types of rules are enforced by governments and regulatory bodies, but until that happens, this is the first step.”
Transparency and community engagement
By releasing the paper along with the data set and prompts, Anthropic is championing transparency and open discourse, at least in this very specific instance, and inviting the broader AI community to take part in refining new ethics systems. This openness fosters collective efforts to create unbiased AI systems.
“The method we describe in our paper could help people anticipate and brainstorm a much wider range of use cases for language models in different areas of society,” Tamkin told VentureBeat. “This could be useful for getting a better sense of the possible applications of the technology in different sectors. It could also be helpful for assessing sensitivity to a wider range of real-world factors than we study, including differences in the languages people speak, the media by which they communicate, or the topics they discuss.”
For those in charge of technical decision-making at enterprises, Anthropic’s research presents an essential framework for scrutinizing AI deployments, ensuring they conform to ethical standards. As the race to harness enterprise AI intensifies, the industry is challenged to build technologies that marry efficiency with equity.
Update (4:46 p.m. PT): This article has been updated to include exclusive quotes and commentary from Anthropic research scientist Alex Tamkin.