Why Anthropic and OpenAI are obsessed with securing LLM model weights




As chief information security officer at Anthropic, and one of only three senior leaders reporting to CEO Dario Amodei, Jason Clinton has a lot on his plate. 

Clinton oversees a small team tackling everything from data security to physical security at the startup, which is known for its large language models Claude and Claude 2 and has raised over $7 billion from investors including Google and Amazon, yet still has only roughly 300 employees. 

Nothing, however, takes up more of Clinton’s time and effort than one essential task: keeping Claude’s model weights, which are stored in a massive, terabyte-sized file, out of the wrong hands. 

In machine learning, and particularly in deep neural networks, model weights are the numerical values assigned to the connections between nodes. They are considered crucial because they are the mechanism by which the network ‘learns’ and makes predictions: the final values of the weights after training determine the performance of the model. 
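To make that concrete, here is a minimal sketch in PyTorch; the framework, the toy layer sizes, the file name and the 500-billion-parameter figure are illustrative assumptions, not details of how Anthropic builds or stores Claude. The point is simply that a model’s weights are arrays of numbers that serialize into a single file, and that the file’s size is roughly the parameter count multiplied by the bytes per parameter.

```python
import torch
import torch.nn as nn

# A toy two-layer network; the layer sizes are arbitrary, for illustration only.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# The "weights" are just tensors of floating-point numbers.
state_dict = model.state_dict()
num_params = sum(p.numel() for p in state_dict.values())
print(f"parameters: {num_params:,}")  # ~8.4 million for this toy model

# Serializing the state dict produces a single weights file on disk.
torch.save(state_dict, "toy_weights.pt")

# Back-of-the-envelope file size: parameters x bytes per parameter.
# A hypothetical 500-billion-parameter model stored in 16-bit precision
# would be on the order of 500e9 * 2 bytes, i.e. roughly a terabyte.
print(f"approx. size at fp16: {num_params * 2 / 1e6:.1f} MB")
```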


A new research report from nonprofit policy think tank Rand Corporation says that while weights are not the only component of an LLM that needs to be protected, they are particularly critical because they “uniquely represent the result of many different costly and challenging prerequisites for training advanced models—including significant compute, collected and processed training data, algorithmic optimizations, and more.” Acquiring the weights, the paper posited, could allow a malicious actor to make use of the full model at a tiny fraction of the cost of training it.
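To illustrate the report’s point, here is a minimal sketch using the standard Hugging Face transformers library; the local directory path and prompt are hypothetical, and this assumes the acquired weights sit alongside their config and tokenizer files in the usual format. Whoever holds a complete weights file and knows the matching architecture can load it and run inference immediately, with none of the compute, data or expertise that went into training.

```python
# A minimal sketch of why possessing the weights is enough to use a model.
# The directory path is hypothetical; this assumes the weights were saved in
# the standard Hugging Face format with their config and tokenizer files.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/acquired_weights")
model = AutoModelForCausalLM.from_pretrained("/path/to/acquired_weights")

# No training step is involved: the weights already encode everything the
# model learned, so inference works immediately on commodity hardware.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```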

“I probably spend almost half of my time as a CISO thinking about protecting that one file,” Clinton told VentureBeat in a recent interview. “It’s the thing that gets the most attention and prioritization in the organization, and it’s where we’re putting the most amount of security resources.” 

Concerns about model weights getting into the hands of bad actors

Clinton, who joined Anthropic nine months ago after 11 years at Google, said he knows some assume the company’s concern over securing model weights is because they are considered highly valuable intellectual property. But he emphasized that Anthropic, whose founders left OpenAI to form the company in 2021, is much more concerned about non-proliferation of the powerful technology, which, in the hands of the wrong actor or an irresponsible actor, “could be bad.”  

The threat of opportunistic criminals, terrorist groups or highly resourced nation-state operations accessing the weights of the most sophisticated and powerful LLMs is alarming, Clinton explained, because “if an attacker got access to the entire file, that’s the entire neural network.”  

Clinton is far from alone in his deep concern over who can gain access to foundation model weights. In fact, the recent White House Executive Order on the “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence” includes a requirement that foundation model companies provide the federal government with documentation about “the ownership and possession of the model weights of any dual-use foundation models, and the physical and cybersecurity measures taken to protect those model weights.” 

One of those foundation model companies, OpenAI, said in an October 2023 blog post in advance of the UK Safety Summit that it is “continuing to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights.” It added that “we do not distribute weights for such models outside of OpenAI and our technology partner Microsoft, and we provide third-party access to our most capable models via API so the model weights, source code, and other sensitive information remain controlled.” 

New research identified approximately 40 attack vectors

Sella Nevo, senior information scientist at Rand and director of the Meselson Center, which is dedicated to reducing risks from biological threats and emerging technologies, and AI researcher Dan Lahav are two of the co-authors of Rand’s new report, “Securing Artificial Intelligence Model Weights.”

The biggest concern isn’t what the models are capable of right now, but what’s coming, Nevo emphasized in an interview with VentureBeat. “It just seems eminently plausible that within two years, these models will have significant national security importance,” he said — such as the possibility that malicious actors could misuse these models for biological weapon development. 

One of the report’s goals was to understand the attack methods actors could deploy to try to steal the model weights, ranging from unauthorized physical access to systems and the compromise of existing credentials to supply chain attacks. 

“Some of these are information security classics, while some could be unique to the context of trying to steal the AI weights in particular,” said Lahav. Ultimately, the report found 40 “meaningfully distinct” attack vectors that, it emphasized, are not theoretical. According to the report, “there is empirical evidence showing that these attack vectors are actively executed (and, in some cases, even widely deployed).”

Risks of open foundation models

However, not all experts agree about the extent of the risk of leaked AI model weights and the degree to which they need to be restricted, especially when it comes to open source AI.

For example, in a new Stanford HAI policy brief, “Considerations for Governing Open Foundation Models,” authors including Stanford HAI’s Rishi Bommasani and Percy Liang, as well as Princeton University’s Sayash Kapoor and Arvind Narayanan, said that “open foundation models, meaning models with widely available weights, provide significant benefits by combatting market concentration, catalyzing innovation, and improving transparency.” The brief continued that “the critical question is the marginal risk of open foundation models relative to (a) closed models or (b) pre-existing technologies, but current evidence of this marginal risk remains quite limited.” 

Kevin Bankston, senior advisor on AI Governance at the Center for Democracy & Technology, posted on X that the Stanford HAI brief “is fact-based not fear-mongering, a rarity in current AI discourse. Thanks to the researchers behind it; DC friends, please share with any policymakers who discuss AI weights like munitions rather than a medium.” 

The Stanford HAI brief pointed to Meta’s Llama 2, released in July “with widely available model weights enabling downstream modification and scrutiny,” as an example. While Meta has committed to securing its unreleased ‘frontier’ model weights and limiting access to those “whose job function requires” it, the weights for the original Llama model famously leaked in March 2023, and the company later released model weights and starting code for pretrained and fine-tuned Llama language models (Llama Chat, Code Llama) ranging from 7B to 70B parameters. 

“Open-source software and code traditionally have been very stable and secure because it can rely on a large community whose goal is to make it that way,” explained Heather Frase, a senior fellow for AI assessment at Georgetown University’s CSET. But, she added, before powerful generative AI models were developed, common open-source technology also had a limited chance of doing harm. 

“Additionally, the people most likely to be harmed by open-source technology (like a computer operating system) were most likely the people who downloaded and installed the software,” she said. “With open source model weights, the people most likely to be harmed by them are not the users but people intentionally targeted for harm, like victims of deepfake identity theft scams.” 

“Security usually comes from being open” 

Still, Nicolas Patry, an ML engineer at Hugging Face, emphasized that the same risks inherent in running any program apply to model weights, and that regular security protocols apply. That doesn’t mean the models should be closed, he told VentureBeat. In fact, when it comes to open source models, the idea is to put them into as many hands as possible, as was evident this week when Mistral quickly released its new open source LLM with just a torrent link. 

“The security usually comes from being open,” he said. In general, he explained, “‘security by obscurity’ is widely considered as bad because you rely on you being obscure enough that people don’t know what you’re doing.” Being transparent is more secure, he said, because “it means anyone can look at it.”  

William Falcon, CEO of Lightning AI, the company behind the open source framework PyTorch Lightning, told VentureBeat that if companies are concerned with model weights leaking, it’s “too late.” 

“It’s already out there,” he explained. “The open source community is catching up very quickly. You can’t control it, people know how to train models. You know, there are obviously a lot of platforms that show you how to do that super easily. You don’t need sophisticated tooling that much anymore. And the model weights are out free — they cannot be stopped.” 

In addition, he emphasized that open research is what leads to the kind of tools necessary for today’s AI cybersecurity.  “The more open you make [models], the more you democratize that ability for researchers who are actually developing better tools to fight against [cybersecurity threats],” he said. 

Anthropic’s Clinton, who said that the company is using Claude to develop tools to defend against LLM cybersecurity threats, agreed that today’s open source models “do not pose the biggest risks that we’re concerned about.” If open source models don’t pose the biggest risks, it makes sense for governments to regulate ‘frontier’ models first, he said.

Anthropic seeks to support research while keeping models secure

But while Rand’s Nevo emphasized that he is not worried about current models, and that there are a lot of “thoughtful, capable, talented people in the labs and outside of them doing important work,” he added that he “would not feel overly complacent.” A “reasonable, even conservative extrapolation of where things are headed in this industry means that we are not on track to protecting these weights sufficiently against the attackers that will be interested in getting their hands on [these models] in a few years,” he cautioned. 

For Clinton, the work of securing Anthropic’s LLMs is constant, and the industry-wide shortage of qualified security engineers, he said, is part of the problem. 

“There are no AI security experts, because it just doesn’t exist,” he said. “So what we’re looking for are the best security engineers who are willing to learn and learn fast and adapt to a completely new environment. This is a completely new area — and literally every month there’s a new innovation, a new cluster coming online, and new chips being delivered…that means what was true a month ago has completely changed.”

One of the things Clinton said he worries about is that attackers will be able to find vulnerabilities far easier than ever before. 

“If I try and predict the future, a year, maybe two years from now, we’re going to go from a world where everyone plans to do a Patch Tuesday to a world where everybody’s doing patches every day,” he said. “And that’s a very different change in mindset for the entire world to think about from an IT perspective.” 

All of these things, he added, need to be considered and reacted to in a way that still enables Anthropic’s research team to move fast while keeping the model weights from leaking. 

“A lot of folks have energy and excitement, they want to get that new research out and they want to make big progress and breakthroughs,” he said. “It’s important to make them feel like we’re helping them be successful while also keeping the model weights [secure].”
