Prompt Injection Capture-the-flag – Red Team x AI

Joe Cooney • Apr 02, 2024

Red-team challenges have been a fun activity for PZ team members in the past, so we recently conducted a small challenge at our fortnightly brown-bag session, focusing on the burgeoning topic of prompt injection. 


Injection vulnerabilities all follow the same basic pattern – un-trusted input is inadvertently treated as executable code, causing the security of the system to be compromised.  SQL injection (SQLi) and cross-site scripting (XSS) are probably two of the best-known variants, but other technologies are also susceptible. Does anyone remember XPath injection? 


As generative models get incorporated into more products, user input can be used to subvert the model. This can lead to the model revealing its system prompt or other trade secrets, reveal information about the model itself which may be commercially valuable, subvert or waste computation resources, perform unintended actions if the model is hooked up to APIs, or cause reputational damage to the company if the model can be coerced into doing amusing or inappropriate things. 


As an example, entrepreneur and technologist Chris Bakke was recently able to trick a Chevy dealership’s ChatGPT-powered bot into agreeing to sell him a Chevy Tahoe for $1. Although the U.S. supreme court has yet to rule on the legal validity of a “no takesies backsies” contract (as an employee of X Chris is probably legally obligated to drive a Tesla anyway) it is not hard to imagine a future scenario with steeper financial consequences. 


For this challenge PZers were taking on Gandalf https://gandalf.lakera.ai/  – a CTF created by AI security start-up Lakera https://www.lakera.ai/ (Gandalf is doubtless a way for them to capture valuable training data for their security product). Gandalf progresses in difficulty from young and naive level 1 Gandalf, who is practically begging to give you the password, to level 8 – Gandalf the White 2.0, who is substantially more difficult to trick. 

We time-boxed the challenge to only 20 minutes, and a couple of people were able to beat Gandalf the White 2.0 in this time. Several PZers also found the challenge so absorbing they were still going an hour or more later. Some people found prompts that worked well for several levels, allowing them to rapidly progress to the higher levels of the challenge, only to hit a wall when their chosen technique stopped working. Others were beguiled into solving riddles that Gandalf seemed to be posing to them in the hope that it would give them clues to the secret word for each level. 


Overall, it was a fun and approachable challenge for anyone looking to become more familiar with the issue of prompt injection. 

Share This Post

Get In Touch

Recent Posts

25 Oct, 2024
We’re pleased to share that Hanieh Madad, Senior Software Developer and Team Leader at Patient Zero, has been awarded the Women in Digital Technical Leader of the Year. This award recognises Hanieh’s dedication to her craft and her thoughtful approach to leadership within the tech industry. The judges highlighted Hanieh’s exceptional handling of a complex project, noting her skill in managing stakeholders, mentoring junior engineers, and her commitment to community contributions. In her acceptance speech, Hanieh shared, “I wouldn’t be standing here without my amazing team that I have had the privilege of working with. This award is as much theirs as it is mine.” At Patient Zero, Hanieh leads with a balance of technical expertise and thoughtful mentorship. Known for guiding complex projects to success, she consistently supports her team’s growth and development, making this recognition truly fitting. Congratulations, Hanieh, on this achievement and for the positive impact you continue to make.
01 Sep, 2024
Congratulations to three of our team members for being selected as finalists in the ARN Women in ICT Awards 2024. Recognised for their achievements and contributions within Patient Zero, our finalists are: Bay McGovern - Shining Star Demelza Green - Innovation Weasley Au - Graduate “This is a stunning display of emerging and established female talent in Australia,” said ARN Editor Julia Talevski. “This year’s finalists have set an extremely high bar and are a source of inspiration for women leading the way in technology — we are proud and privileged to be celebrating each and every one of them.” WIICTA 2024 will honour the channel across eight categories, spanning Innovation, Technical, Entrepreneur, Graduate, Rising Star, Shining Star, Achievement, and DE&I Individual Champion awards. In response to a wealth of standout submissions, specific categories have been divided to best acknowledge and highlight the depth of female talent in the Australian market. The winners will be announced on September 19th at the prestigious event set to take place at Doltone House in Jones Bay Wharf Sydney. For more information on the ARN Women in ICT Awards 2024, visit the official ARN announcement here .
By Joe Cooney 02 Apr, 2024
Red-team challenges have been a fun activity for PZ team members in the past, so we recently conducted a small challenge at our fortnightly brown-bag session, focusing on the burgeoning topic of prompt injection. Injection vulnerabilities all follow the same basic pattern – un-trusted input is inadvertently treated as executable code, causing the security of the system to be compromised. SQL injection (SQLi) and cross-site scripting (XSS) are probably two of the best-known variants, but other technologies are also susceptible. Does anyone remember XPath injection? As generative models get incorporated into more products, user input can be used to subvert the model. This can lead to the model revealing its system prompt or other trade secrets, reveal information about the model itself which may be commercially valuable, subvert or waste computation resources, perform unintended actions if the model is hooked up to APIs, or cause reputational damage to the company if the model can be coerced into doing amusing or inappropriate things. As an example, entrepreneur and technologist Chris Bakke was recently able to trick a Chevy dealership’s ChatGPT-powered bot into agreeing to sell him a Chevy Tahoe for $1 . Although the U.S. supreme court has yet to rule on the legal validity of a “no takesies backsies” contract (as an employee of X Chris is probably legally obligated to drive a Tesla anyway) it is not hard to imagine a future scenario with steeper financial consequences.
By Demelza Green 27 Feb, 2024
With the advent of ChatGPT, Bard/Gemini and Co-pilot, Generative AI, and Large Language Models (LLMs) have been thrust into the spotlight. AI is set to disrupt all industries, especially those that are predominately based on administrative support, legal, business, and financial operations, much like insurance and financial organisations.
More Posts
Share by: