Prompt Injection Capture-the-flag – Red Team x AI

Joe Cooney • April 2, 2024

Red-team challenges have been a fun activity for PZ team members in the past, so we recently conducted a small challenge at our fortnightly brown-bag session, focusing on the burgeoning topic of prompt injection.

Injection vulnerabilities all follow the same basic pattern – un-trusted input is inadvertently treated as executable code, causing the security of the system to be compromised. SQL injection (SQLi) and cross-site scripting (XSS) are probably two of the best-known variants, but other technologies are also susceptible. Does anyone remember XPath injection?

As generative models get incorporated into more products, user input can be used to subvert the model. This can lead to the model revealing its system prompt or other trade secrets, reveal information about the model itself which may be commercially valuable, subvert or waste computation resources, perform unintended actions if the model is hooked up to APIs, or cause reputational damage to the company if the model can be coerced into doing amusing or inappropriate things.

As an example, entrepreneur and technologist Chris Bakke was recently able to trick a Chevy dealership’s ChatGPT-powered bot into agreeing to sell him a Chevy Tahoe for $1 . Although the U.S. supreme court has yet to rule on the legal validity of a “no takesies backsies” contract (as an employee of X Chris is probably legally obligated to drive a Tesla anyway) it is not hard to imagine a future scenario with steeper financial consequences.

For this challenge PZers were taking on Gandalf https://gandalf.lakera.ai/ – a CTF created by AI security start-up Lakera https://www.lakera.ai/ (Gandalf is doubtless a way for them to capture valuable training data for their security product). Gandalf progresses in difficulty from young and naive level 1 Gandalf, who is practically begging to give you the password, to level 8 – Gandalf the White 2.0, who is substantially more difficult to trick.

We time-boxed the challenge to only 20 minutes, and a couple of people were able to beat Gandalf the White 2.0 in this time. Several PZers also found the challenge so absorbing they were still going an hour or more later. Some people found prompts that worked well for several levels, allowing them to rapidly progress to the higher levels of the challenge, only to hit a wall when their chosen technique stopped working. Others were beguiled into solving riddles that Gandalf seemed to be posing to them in the hope that it would give them clues to the secret word for each level.

Overall, it was a fun and approachable challenge for anyone looking to become more familiar with the issue of prompt injection.

< Older Post

Newer Post >

Share This Post

Get In Touch

Patient Zero Named Finalist in Three Categories at ARN Innovation Awards 2025

October 27, 2025

Patient Zero named finalist in three categories at ARN Innovation Awards 2025: Technical Excellence, Marketing Excellence, and Enterprise Innovation. Learn more.

The AI-Hype Twist I Didn't See Coming: Gartner IT Symposium 2025

By Hanieh Madad • September 29, 2025

At Gartner Gold Coast 2025, everything was about AI — automation, efficiency, disruption, predictions. AI, AI, AI. But in between all that noise, I found myself thinking about how I work, and how I’ve found working at Patient Zero. Some of the things they talked about, like trust, autonomy and team culture, are the things I’ve experienced here.

The Joy of Faking It - Reducing Security Risks in Legacy System Enhancement with Synthetic Data

By Joe Cooney • September 25, 2025

The Joy of Faking It - Reducing Security Risks in Legacy System Enhancement with Synthetic Data

A retro-futuristic illustration depicting two men in lab coats operating a large vintage computer.

Migrating Systems to GPT-5: Tricks and Pitfalls

By Alex Petrakis • August 25, 2025

GPT-5 - the highly anticipated latest version of OpenAI’s hit the streets a few weeks ago. Despite of some breathless commentary from influencers who had been given early access, the eventual release was a bit underwhelming (in a way that only something that would have seemed like science-fiction a few short years ago but now seems passe, can be). Aside from the quality of the model itself, which some people have claimed was more about lowering OpenAI’s costs than delivering a better result, there are some issues that the change to GPT-5 has introduced when integrating it into a product which we thought we should share.

Prompt Injection Capture-the-flag – Red Team x AI

Share This Post

Get In Touch

Recent Posts

Patient Zero Named Finalist in Three Categories at ARN Innovation Awards 2025

The AI-Hype Twist I Didn't See Coming: Gartner IT Symposium 2025

The Joy of Faking It - Reducing Security Risks in Legacy System Enhancement with Synthetic Data

Migrating Systems to GPT-5: Tricks and Pitfalls

Quick Links

Our Privacy Policy

Contact Us

Latest News

Patient Zero Named Finalist in Three Categories at ARN Innovation Awards 2025