Hacking internal AI chatbots with ASCII art is a security team's worst nightmare

Join us in Atlanta on April 10th and explore the landscape of security workforce. We will explore the vision, benefits, and use cases of AI for security teams. Request an invite here.

Insider threats are among the most devastating types of cyberattacks, targeting a company’s most strategically important systems and assets. As enterprises rush out new internal and customer-facing AI chatbots, they’re also creating new attack vectors and risks.

How porous AI chatbots are is reflected in the recently published research, ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. Researchers were able to jailbreak five state-of-the-art (SOTA)  large language models (LLMs), including Open AI’s ChatGPT-3.5, GPT-4, Gemini, Claude, and Meta’s Llama2 using ASCII art.  

ArtPrompt is an attack strategy researchers created that capitalizes on the poor performance of LLMs in recognizing ASCII art to bypass guardrails and safety measures. The researchers note that ArtPrompt only requires black-box access to targeted LLMs and fewer iterations to jailbreak an LLM. 

Why ASCII Art Can Jailbreak an LLM 

While LLMs excel at semantic interpretation, their ability to interpret complex spatial and visual recognition differences is limited. Gaps in these two areas are why jailbreak attacks launched with ASCII art succeed. Researchers wanted to further validate why ASCII art could jailbreak five different LLMs. 

VB Event

The AI Impact Tour – Atlanta

Continuing our tour, we’re headed to Atlanta for the AI Impact Tour stop on April 10th. This exclusive, invite-only event, in partnership with Microsoft, will feature discussions on how generative AI is transforming the security workforce. Space is limited, so request an invite today.

Request an invite

They created a comprehensive benchmark, Vision-in-Text Challenge (VITC), to measure each LLMs’ ASCII art recognition capabilities. The VITC was designed with two unique data sets. The first is VITC-S, which focuses on single characters represented in ASCII art, covering a diverse set of 36 classes with 8,424 samples. Samples encompass a wide range of ASCII representations using various fonts meant to challenge the LLMs’ recognition skills. VITC-L is focused on increasing complexity by featuring sequences of characters, expanding to 800 classes in 10 distinctive fonts. The increase in difficulty between VITC-S to VITC-L results quantifies why LLMs struggle. 

ArtPrompt is a two-step attack strategy that relies on ASCII text to mask the safety words an LLM would otherwise filter out and reject a request on. The first step is using ArtPrompt to make a safety word, which in the example below is “bomb.” The second step is replacing the masked word in step 1 with ASCII art. Researchers found that ASCII text is very effective at cloaking safety words across five different SOTA LLMs. 

Source: Jiang, F., Xu, Z., Niu, L., Xiang, Z., Ramasubramanian, B., Li, B., & Poovendran, R. (2024). ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. arXiv preprint arXiv:2402.11753.

What’s driving internal AI chatbot growth 

Organizations continue to fast-track internal and customer-facing AI chatbots, looking for the productivity, cost, and revenue gains they show the potential to deliver. 

The top 10% of enterprises have one or more generative AI applications deployed at scale across their entire company. Fourty-four percent of these top-performing organizations are realizing significant value from scaled predictive AI cases. Seventy percent of top performers explicitly tailor their gen AI projects to create measurable value. Boston Consulting Group (BCG) found that approximately 50% of enterprises are currently developing a few focused Minimum Viable Products (MVPs) to test the value they can gain from gen AI, with the remainder not taking any action yet.  

BCG also found that two-thirds of gen AI’s top-performing enterprises aren’t digital natives like Amazon or Google but instead leaders in biopharma, energy and insurance. A U.S.-based energy company launched a gen AI-driven conversational platform to assist frontline technicians, increasing productivity by 7%. A biopharma company is reimagining its R&D function with gen AI and reducing drug discovery timelines by 25%. 

The high costs of an unsecured internal chatbot 

Internal chatbots are a fast-growing attack surface with containment and security techniques trying to catch up. The CISO of a globally recognized financial services and insurance company told VentureBeat that internal chatbots need to be designed to recover from negligence and user errors just as much as they need to be hardened against attacks.  

Ponemon’s 2023 Cost of Insider Risks Report underscores how critical it is to get guardrails in place for core systems, from cloud configurations and long-standing on-premise enterprise systems to the latest internally-facing AI chatbots. The average cost to remediate an attack is $7.2 million, and the average cost per incident ranges between $679,621 and $701,500.

The leading cause of insider incidents is negligence. On average, enterprises see 55% of their internal security incidents being the result of employee negligence. These are expensive mistakes to correct, with the annual cost to remediate them estimated to be $7.2 million. Malicious insiders are responsible for 25% of incidents, and credential theft 20%. Ponemon estimates the average cost per these incidents to be more costly at $701,500 and $679,621, respectively.

Defending Against Attacks Will Take An Iterative Approach 

Attacks on LLMs using ASCII art are going to be challenging to contain and will require an iterative cycle of improvement to reduce the risks of false positives and false negatives. Attackers will most assuredly adapt if their ASCII attack techniques are detected, further pushing the boundaries of what an LLM can interpret.    

Researchers point to the need for more multimodal defense strategies that include expression-based filtering support by machine learning models designed to recognize ASCII art. Strengthening those approaches with continuous monitoring could help. Researchers also tested perplexity-based detection, paraphrase and retokenization, noting that ArtPrompt was able to bypass them.

The cybersecurity industry’s response to ChatGPT threats is evolving, and ASCII art attacks bring a new element of complexity to the challenges they’re going to face. Vendors, including Cisco, Ericom Security by Cradlepoint’s Generative AI isolation, Menlo Security, Nightfall AIWiz and Zscaler have solutions that can keep confidential data out of ChatGPT sessions. VentureBeat contacted each to determine if their solutions could also trap ASCII text before it was submitted. 

Zscaler recommended the following five steps to integrate and secure gen AI tools and apps across an enterprise. Define a minimum set of gen AI and machine learning (ML) applications to better control risks and reduce AI/ML app and chatbot sprawl. Second, selectively bet and approve any internal chatbots and apps that are added at scale across infrastructure. Third, Zscaler recommends creating a private ChatGPT server instance in the corporate/data center environment, Fourth, move all LLMs behind a single sign-on (SSO) with strong multifactor authentication (MFA). Finally, enforce data loss prevention (DLP) to prevent data leakages.

Peter Silva, senior product marketing manager, Ericom, Cybersecurity Unit of Cradlepoint, told VentureBeat that “Utilizing isolation for generative AI websites enables employees to leverage a time-efficient tool while guaranteeing that no confidential corporate information is disclosed to the language model.” 

Silva explained that the Ericom Security solution would begin by setting up a DLP schema using a custom regular expression designed to identify potential ASCII art patterns. For example, a regular expression like [^\w\s]{2,} can detect sequences of non-word, non-space characters. Silva says this would need to be continually refined to balance effectiveness and minimize false alarms. Next, regular expressions that are likely to catch ASCII art without generating too many false positives would need to be defined. Attaching the DLP Schema to a specifically defined category policy for genAI would ensure it is triggered in specific scenarios, providing a targeted defense mechanism. 

Given the complexity of ASCII art and the potential for false positives and negatives, it’s clear that spatial and visual recognition-based attacks are threat vector chatbots, and their supporting LLMs need to be hardened against. As the researchers cite in their recommendations, multimodal defense strategies are key to containing this threat.

Source link