Bug bounty programs aren't new. Google, Apple, Microsoft all have them. But the Safety Bug Bounty that OpenAI announced on March 25 is different. It doesn't target traditional security vulnerabilities like XSS or SQL injection — it targets AI-specific safety issues.
Hijacking an agent through prompt injection. Tricking AI into exfiltrating user data. Making an agent perform unauthorized actions. Find these problems and report them, and you could earn up to $100,000.
TL;DR
- OpenAI Safety Bug Bounty: dedicated program for AI abuse and safety risks (launched 3/25)
- Separate from the Security Bug Bounty — focused on AI-specific safety issues
- Max reward: $100,000 (critical), up to $7,500 for high severity
- Key scope: prompt injection, agent hijacking, data exfiltration, unauthorized actions
- Reproduction rate of 50%+ required — intermittent issues don't qualify
- Simple jailbreaks (rude language, easily searchable info) are out of scope
- Platform: submissions via Bugcrowd, triaged within days
- Separate Bio Bug Bounty (biological risk) also running
Why a Separate "Safety" Bounty?
Photo by Zulfugar Karimov on Unsplash | AI safety requires a fundamentally different approach from traditional security
Traditional security bug bounties cover server hacking, API key exposure, authentication bypass — classic security vulnerabilities. But the age of AI agents introduces entirely new threat categories.
ChatGPT Agent, Browser, and upcoming agent products act on behalf of users. They send emails, browse websites, download files. What if hidden text on a malicious webpage could hijack an agent's behavior? That doesn't fit neatly into traditional vulnerability categories.
OpenAI determined that their existing program couldn't adequately cover these AI-specific safety issues. Just as NVIDIA's Agent Toolkit uses OpenShell to sandbox agents, OpenAI is using bounties to discover attack vectors before they're exploited.
What Can You Report?
In-Scope
| Category | Description | Example |
|---|---|---|
| Prompt injection + data exfiltration | Attacker text hijacks a victim's agent | Malicious webpage instructs ChatGPT Agent to send user emails to external server |
| Unauthorized agent actions | Agent performs disallowed actions at scale | ChatGPT Agent deletes files or changes account settings without consent |
| Proprietary information exposure | Leakage of OpenAI internal information | System prompts, model weights, internal API structures |
| Account/platform integrity | Authentication and authorization weaknesses | Accessing another user's conversation history |
Out-of-Scope
- Simple jailbreaks: Getting the model to produce rude language or easily searchable information → no reward
- Content policy bypasses: Without clear safety or abuse impact
- Theoretical risks: Abstract scenarios that can't be reproduced
Key threshold: reproduction rate of 50%+. If you try 10 times, it must work at least 5 times.
Reward Structure
Photo by Rostislav Uzunov on Unsplash | AI safety research is now being financially rewarded
| Severity | Reward Range | Requirements |
|---|---|---|
| Critical | Up to $100,000 | Large-scale impact, immediate risk |
| High | Up to $7,500 | Reproducible, clear mitigation included |
| Medium/Low | Case by case | Limited impact |
$100,000 is significant. For reference, Google's Chrome bug bounty maxes out around $30,000, and Apple goes up to $200,000. In the AI safety space, $100,000 signals how seriously OpenAI takes this problem.
Additionally, OpenAI runs limited-time promotions — submit qualifying reports in specific categories for extra bounty bonuses.
How to Participate (Step by Step)
1. Register on Bugcrowd
Create an account at bugcrowd.com/engagements/openai-safety and join the OpenAI Safety program.
2. Review the Scope
Read the exact scope and rules on the program page. Submitting out-of-scope issues wastes everyone's time.
3. Test (Your Own Account Only)
You must only test with your own accounts. Affecting other users' data or systems is prohibited.
4. Write Your Report
What to include:
- Reproduction steps: Specific, followable procedure anyone can replicate
- Impact analysis: What real damage this vulnerability could cause
- Mitigation suggestions: How to fix it (affects reward amount)
- Reproduction rate: Success rate across attempts (50%+ required)
5. Wait for Triage
OpenAI says most submissions are triaged and validated within a few days.
What This Means for Developers
This program matters beyond just "earning bounties."
1. AI Security Is Now a Distinct Specialty
Following traditional security (web, network, mobile), AI security is establishing itself as an independent field. Prompt injection, agent hijacking, multimodal attack vectors — demand for experts in these areas will surge.
Both Microsoft Copilot Cowork and NVIDIA Agent Toolkit positioned agent security layers (Agent 365, OpenShell) as core features — the entire industry is taking this seriously.
2. A Security Checklist for Your Own AI Products
OpenAI's bounty scope effectively defines the attack vectors you should be checking in your own AI products:
- Prompt injection defense against external content (web pages, emails, documents)
- Action scope limits for agents (which APIs can they call?)
- Data exfiltration path blocking (can the agent send sensitive info externally?)
- Unauthorized action prevention (consent verification before destructive operations)
3. A Viable Side Hustle
Let's be honest — for developers who deeply understand AI, the Safety Bug Bounty is an attractive side hustle. Web security bounties are hyper-competitive, but AI safety is still early enough that opportunities are more abundant.
Honest Assessment
What's Good
- Financial incentive for AI safety research: Faster real-world attack vector discovery beyond academia
- Transparency: OpenAI publicly inviting external scrutiny of their products
- Industry standard setting: Other AI companies likely to follow with similar programs
What's Concerning
- Jailbreak exclusion may be too broad: "No clear safety impact" jailbreaks could still be genuinely dangerous
- 50% reproduction threshold is high: Intermittent but severe vulnerabilities might be filtered out
- Reward ambiguity: $100,000 is the "maximum" — most payouts will likely be much lower
- OpenAI-only program: Anthropic, Google, and others don't have equivalent programs yet
Agent-Era Security = The New Frontier
In our AI agent adoption reality analysis, security concerns were cited as one key reason 91.4% of enterprises can't get AI to production. OpenAI's Safety Bug Bounty is an attempt to solve this problem with community power.
In an era where AI agents send emails, process payments, and execute code — where Stripe Machine Payments Protocol enables AI to handle money directly — security isn't optional.
Are you interested in AI security research? Have you participated in bug bounties before?
References
- Introducing the OpenAI Safety Bug Bounty program — OpenAI, March 25, 2026
- OpenAI Launches Bug Bounty Program for Abuse and Safety Risks — SecurityWeek, March 2026
- Make OpenAI's models misbehave and earn a reward — Help Net Security, March 27, 2026
- Bug Bounty: Safety Bug Bounty - Bugcrowd — Bugcrowd, program page
Related Posts:
- NVIDIA Agent Toolkit Hands-On — OpenShell's approach to agent sandboxing
- AI Agent Adoption Reality: Only 8.6% in Production — Security concerns as an adoption barrier
- AI Agents Making Payments: Stripe MPP — Why agent security is non-negotiable