Eight Lessons Learned By Expert Attacks On GenAI
Red teams attack GenAI to make it safer for everyone
Red teams attack GenAI with an “intent to kill” to identify safety and security risks that are likely to occur in the real world. Their job is to break GenAI systems to make them safer for everyone.
Microsoft’s red team shares eight lessons learned from attacking over 100 GenAI systems that show how attackers can trick, manipulate, and otherwise subvert GenAI systems.
Interestingly, the red team doesn’t assume adversarial intent with every attack. The intent of attack actors could be adversarial (e.g., a scammer) or benign (e.g., a typical chatbot user).
These lessons are critical for GenAI implementations worldwide and the burgeoning new field of AI risk assessments.
As we enter our new GenAI era, understanding its risks is more important than ever, and as the authors state in lesson eight, it will be a never-ending task.
Imagine the advantage of being ahead of the curve. Subscribe now!
👉THE EIGHT LESSONS
1️⃣ Understand what the system can do and where it is applied.
The first step in an AI red teaming operation is to determine which vulnerabilities to target. Starting from potential downstream impacts, rather than attack strategies, makes it more likely that an operation will produce useful findings tied to real world risks.
2️⃣ You don’t have to compute gradients to break an AI system.
As the security adage goes, “real hackers don’t break in, they log in.” The AI security version of this saying might be, “real attackers don’t compute gradients, they prompt engineer.”
3️⃣ AI red teaming is not safety benchmarking.
Although simple methods are often used to break AI systems in practice, the risk landscape is by no means uncomplicated. On the contrary, it is constantly shifting in response to novel attacks and failure modes.
4️⃣ Automation can help cover more of the risk landscape.
The complexity of the AI risk landscape has led to the development of a variety of tools that can identify vulnerabilities more rapidly, run sophisticated attacks automatically, and perform testing on a much larger scale
5️⃣ The human element of AI red teaming is crucial.
Automated tools are useful but should not be used with the intention of taking the human out of the loop. In the previous sections, we discussed several aspects of red teaming that require human judgment and creativity such as prioritizing risks, designing system-level attacks, and defining new categories of harm.
6️⃣ Responsible AI (RAI) harms are pervasive but difficult to measure.
As models are integrated into an increasing number of applications, we have observed these harms more frequently. RAI harms are pervasive, but unlike most security vulnerabilities, they are subjective and difficult to measure. Attacks may be adversarial or benign.
7️⃣ LLMs amplify existing security risks and introduce new ones.
The integration of generative AI models into a variety of applications has introduced novel attack vectors and shifted the security risk landscape. However, many discussions around GenAI security overlook existing vulnerabilities.
8️⃣ The work of securing AI systems will never be complete.
The idea that it is possible to guarantee or “solve” AI safety through technical advances alone is unrealistic. It overlooks the roles that economics can play, break-fix cycles, and regulation.
If you know someone who would like this newsletter, please share it with them and help grow our community of Asia, CBDC, and AI aficionados!
Readers like you make my work possible! Please buy me a coffee or consider a paid subscription to support my work. If neither is possible, please share my writing with a colleague!
Sponsor Cashless and reach a targeted audience of over 55,000 fintech and CBDC aficionados who would love to know more about what you do!