Some gen AI vendors say they’ll defend customers from IP lawsuits. Others, not so much.

A person using generative AI — models that generate text, images, music and more given a prompt — could infringe on someone else’s copyright through no fault of their own. But who’s on the hook for the legal fees and damages if — or rather, when — that happens?

It depends.

In the fast-changing landscape of generative AI, companies monetizing the tech — from startups to big tech companies like Google, Amazon and Microsoft — are approaching IP risks from very different angles.

Some vendors have pledged to defend, financially and otherwise, customers using their generative AI tools who end up on the wrong side of copyright litigation. Others have published policies to shield themselves from liability, leaving customers to foot the legal bills.

While the terms of service agreements for most generative AI tools are public, they’re written in legalese. Seeking some clarity, I reached out to vendors about their policies on protecting customers who might violate copyright with their AI-generated text, images, videos and music.

The responses — and non-responses — were enlightening.

Regurgitating data

Generative AI models “learn” from examples to craft essays and code, create artwork and compose music — and even write lyrics to accompany that music. They’re trained on millions to billions of e-books, art pieces, emails, songs, audio clips, voice recordings and more, most of which came from public websites.

Some of these examples are in the public domain — at least in the case of vendors that trawl the web for training data. Others aren’t, or come under a restrictive license that requires citation or specific forms of compensation.

The legality of vendors training on data without permission is another matter that’s being hashed out in the courts. But what could land generative AI users in trouble is regurgitation: when a generative model spits out a mirror copy of a training example.

On the top are images generated by Stable Diffusion, an image-generating AI, from random captions in the model’s training set. On the bottom are images prompted to match the originals. Image Credits: Somepalli et al.

Microsoft, GitHub and OpenAI are currently being sued in a class action lawsuit that accuses them of violating copyright law by allowing Copilot, a code-generating AI, to regurgitate licensed code snippets without providing credit. Elsewhere, thousands of writers have signed an open letter decrying generative AI technologies that “mimic and regurgitate” their “language, stories, style and ideas.”

The cases keep coming.

Authors in California and New York have sued OpenAI for alleged IP theft of their works. Image-generating tool vendors, including Stability AI and Midjourney, are the subject of lawsuits brought by artists and stock image sites like Getty Images. And Universal Music Group is seeking to ban AI-generated music mimicking the style of musicians it represents from streaming platforms, sending takedown notices to have the songs removed.

Perhaps it’s no surprise, then, that in a recent survey of Fortune 500 companies by Acrolinx, nearly a third said that intellectual property was their biggest concern about the use of generative AI.

The threat of running afoul of copyright with a generative AI tool hasn’t stopped investors from pouring billions into the startups creating those tools. One wonders, however, whether the situation will remain tenable for much longer.

A question of indemnity

In the midst of the uncertainty, you might think that generative AI vendors would stand behind their customers in the strongest terms, if for no other reason than to allay their fears of IP-related legal challenges.

But you’d be wrong.

From the language in some terms of service agreements — specifically the indemnity clauses, or the clauses that specify in which cases customers can expect to be reimbursed for damages from third-party claims — it’s clear that not every vendor’s willing to chance a court decision forcing them to rethink their approach to generative model training, or in the worst case their business model.

Anthropic, for instance, which recently inked a deal with Amazon to raise as much as $4 billion and is reportedly seeking another $2 billion investment from Google and others, reserves the right to “hold harmless” itself and partners from damages arising from the use of its generative AI — including those related to IP.

Point blank, I asked Anthropic, which offers strictly text-generating models, whether it would legally or financially support a customer implicated in a copyright lawsuit over its models’ outputs. The company declined to say.

AI21 Labs, another well-funded generative AI startup building a suite of text editing tools, also declined to give an answer. So I looked at its policy.

AI21 Labs says that it might “assume exclusive defense and control” of a lawsuit against a customer if the customer chooses not to defend or settle it themselves. But it won’t pay for the privilege; it’ll be at the customer’s own expense.

OpenAI, arguably the most successful generative AI vendor today with over $10 billion in venture capital and revenue approaching $1 billion, pointed me to its terms of use, which limit the company’s liability to “the amount [a customer] paid for [an OpenAI] service that gave rise to [a] claim during the 12 months before the liability arose or $100.” That’s the best-case scenario for customers; OpenAI’s policy makes it clear that the company, in many if not most cases, won’t be a party to or defend against copyright lawsuits targeting its users.

Vendors building image- and video-generating AI, where the potential copyright violations tend to be a bit more obvious, aren’t much more supportive contractually than their text-first rivals.

Stability AI, which develops music-generating models in addition to image- and text-generating ones, referred me to the terms for its API. The company leaves it to customers to defend themselves against copyright claims and — unlike some other generative AI vendors — has no payout carve-out in the event that it’s found liable.

Midjourney and Runway.ai didn’t respond to my emails — but I found their terms. Midjourney’s policy releases the company from liability for third-party IP damages. Runway.ai’s does, as well.

Fine print

Now, some vendors — perhaps becoming more attuned to the concerns of enterprise customers considering adopting generative AI, or looking to position themselves as a “safer” alternative — aren’t shying away from committing to protecting customers in the event that they’re sued for copyright infringement. To a point.

Amazon, which recently launched Bedrock, a platform for running and fine-tuning generative AI models, says that it’ll indemnify (i.e. defend) customers against claims alleging the model infringes on a third party’s IP rights. But Amazon’s indemnification policy only applies to the company’s in-house family of text-analyzing models, Titan, as well as Amazon’s code-generating service, CodeWhisperer.

The CodeWhisperer indemnity is broader and applies to all IP claims, including trademarks. However, it requires at least a CodeWhisperer Professional subscription with copyright-defending filtering features enabled. Free users of CodeWhisperer aren’t afforded the same protections. And customers must agree to let AWS control their defense and settle “as AWS deems appropriate.”

IBM also provides IP indemnity for its generative AI models, Slate and Granite, available through its Watsonx generative AI service.

“Consistent with IBM’s approach to its indemnification obligation, IBM doesn’t cap its indemnification liability for IBM-developed models,” an IBM spokesperson told TechCrunch via email. “This applies to current [and] future IBM-developed Watsonx models.”

Google didn’t respond to my emails. But from the company’s terms, it’d appear that Google offers some defense for customers against third-party allegations of IP infringement arising from its text- and image-generating models. However, Google says that it might suspend a customer’s use of the allegedly infringing model if it can’t find “commercially reasonable” remedies.

Google-backed Cohere, too, has a provision in its terms suggesting that it’ll “defend, indemnify and hold harmless” customers facing third-party claims alleging that Cohere’s models infringe on IP. Given Cohere’s heavy enterprise focus, that’s not surprising.

Microsoft recently made a splashy announcement that it’ll pay legal damages on behalf of customers using its AI products if they’re sued for copyright infringement — so long as those customers use “guardrails and content filters” built into its products.

Which products does it pertain to? That’s where it gets tricky.

Microsoft says its indemnity policy covers paid versions of its portfolio of AI-powered “Copilot” services (including the Microsoft 365 Copilot for Word, Excel and PowerPoint) and Bing Chat Enterprise, the enterprise version of its chatbot on Bing. It also extends to GitHub Copilot, Microsoft’s code-generating service co-developed with OpenAI.

But in its Azure policy, Microsoft clarifies that customers using “previews” of generative AI features powered by its Azure OpenAI Service are responsible for responding to third-party claims of copyright infringement.

Kate Downing, an IP lawyer based in Santa Cruz, takes issue specifically with the Copilot indemnity provision, arguing that, given the vagueness of the provision and its exclusions, the upfront costs of enforcing it might be too high for a business to swallow.

By contrast, Adobe claims to offer “full indemnity” protection for users of Firefly, its generative AI art platform, asserting its models are trained on stock images for which Adobe already holds the rights. Users must be enterprise customers, however, and are subject to the same liability cap that Adobe applies to other tech-based IP claims.

Adobe sometime-rival Shutterstock also provides indemnity to all enterprise clients, a policy the company introduced late this summer. So does Getty Images. (Getty Images and Shutterstock, like Adobe, train their models on licensed images.)

The road ahead

It seems likely that, as generative AI vendors, particularly startups, face investor pressure to acquire enterprise customers, indemnification protections will become commonplace. Those customers want the assurance that they won’t be sued over copyright claims, after all.

But if the current state of things is any indication, the policies won’t look similar. And some will have exceptions that’ll make them more attractive in theory than in practice; in other words, more marketing ploy than legitimate protection.

As a recent article from U.K. law firm Ferrer & Co. puts it, indemnities don’t offer a “get out of jail free card” — nor are they a panacea.

“Our key message is, don’t see the offering of provider indemnities as a complete answer to the risk of third-party infringement claims,” the firm writes on its blog. “Instead, weigh the offering of such indemnities in the balance when determining whether to use that provider’s generative AI tool for a project.”

Gen AI customers would do well to remember that.
