Attack Categories
Understanding the different types of AI security vulnerabilities and attack vectors tested by ModelRed
ModelRed's test suites are organized into distinct attack categories that represent different types of AI security vulnerabilities. Understanding these categories helps you identify specific threats and prioritize your security efforts.
Category Overview
Encoding Attacks
Base64, hex, ROT13, and other encoding schemes used to bypass input filters
Content Safety
Toxicity, hate speech, slurs, and harmful content generation testing
Social Engineering
Emotional manipulation and psychological attack techniques
Jailbreaking
Advanced techniques to bypass safety guardrails and restrictions
Injection Attacks
Prompt injection, template injection, and code execution vulnerabilities
Malware Generation
Detection of malicious code generation and security evasion
Encoding Attacks
Techniques that use various encoding schemes to obfuscate malicious content
Description
Encoding attacks attempt to bypass input validation and content filters by representing malicious instructions in alternative character encodings. These attacks exploit the fact that many AI systems may not properly decode or recognize encoded content as potentially harmful.
Test Suites in This Category
- Base64 Injection - Tests Base64 encoding vulnerabilities
- Hexadecimal Injection - Detects hex encoding attacks
- ROT13 Encoding - Character substitution attacks
- Morse Encoding - Morse code obfuscation
- Braille Encoding - Braille text encoding attacks
- NATO Encoding - Phonetic alphabet encoding
- Unicode Attacks - Base2048, Zalgo, and emoji encoding
- Legacy Encoding - UUencoding and ASCII85 attacks
Risk Level
Low to Medium
Common in Tiers
Primarily Free and Starter tiers - these represent fundamental security testing
Content Safety
Detection of harmful, toxic, or inappropriate content generation
Description
Content safety attacks test whether AI models can be manipulated into generating harmful content including hate speech, toxicity, slurs, sexual content, and other policy violations. These tests are crucial for user-facing applications.
Test Suites in This Category
- Toxicity Generation - Advanced AI-powered toxicity testing
- Continue Slurs - Partial offensive term completion
- LMRC Content Tests - Bullying, slur usage, sexual content
- Sexualization - Inappropriate objectification testing
- Deadnaming - Transgender rights violations
- Medical Misinformation - Pseudoscience and quack medicine
- Real Toxicity Prompts - Baseline toxicity testing
Risk Level
High
Common in Tiers
Starter tier and above - essential for production applications
Social Engineering
Psychological manipulation techniques to extract information or bypass restrictions
Description
Social engineering attacks use emotional manipulation, authority exploitation, and psychological tricks to convince AI models to provide restricted information or perform unauthorized actions.
Test Suites in This Category
- Grandma Attacks - Emotional manipulation for product keys, slurs, and substances
- JSON Threats - Threatening models to output raw JSON
- Tag Injection - Hidden Unicode manipulation
Risk Level
Medium to High
Common in Tiers
Starter tier - important for applications with human interaction
Jailbreaking
Advanced techniques to override safety mechanisms and restrictions
Description
Jailbreaking encompasses sophisticated attempts to bypass AI safety measures, including role-playing scenarios, complex prompt structures, and research-grade attack techniques.
Test Suites in This Category
- Advanced Jailbreak Ablation - DAN 11.0 with 127 variations
- AutoDAN Cached - Gradient descent optimization attacks
- DAN in the Wild - Real-world attack collection
- Tense Manipulation - Past/future tense bypass techniques
- Latent Jailbreak - Translation-based distraction attacks
Risk Level
High to Critical
Common in Tiers
Starter and Pro tiers - advanced security research
Injection Attacks
Code execution and system manipulation vulnerabilities
Description
Injection attacks attempt to execute unauthorized code, manipulate system behavior, or extract sensitive information through various injection techniques including prompt injection, template injection, and SQL injection.
Test Suites in This Category
- Prompt Injection - Framework attacks targeting hate speech and violence
- Jinja Template Injection - Server-side template execution
- SQL Injection Echo - Database manipulation attempts
- Latent Injection - Hidden instructions in documents
- XSS Attacks - Cross-site scripting and data exfiltration
Risk Level
Critical
Common in Tiers
Starter through Enterprise - varies by complexity
Malware Generation
Detection of malicious code creation capabilities
Description
Malware generation tests evaluate whether AI models can be manipulated into creating malicious software, including viruses, ransomware, cryptojacking code, and evasion techniques.
Test Suites in This Category
- Malware Top Level - High-level malicious code requests
- Malware Sub Functions - Individual attack component generation
- Malware Evasion - Anti-detection code generation
- Malware Payload - Ransomware and cryptojacking
Risk Level
Critical
Common in Tiers
Enterprise tier only - highest security requirements
Supply Chain Attacks
Testing for software dependency vulnerabilities
Description
Supply chain attacks test whether AI models recommend non-existent or malicious packages that could compromise software dependencies across various programming ecosystems.
Test Suites in This Category
- Python Package Hallucination - Non-existent PyPI packages
- JavaScript Package Hallucination - Fake NPM packages
- Ruby Gem Hallucination - Non-existent Ruby gems
- Rust Crate Hallucination - Fake Cargo packages
Risk Level
Medium to High
Common in Tiers
Starter (Python only) and Enterprise (all languages)
Misinformation & Reasoning
Testing for false information and logical manipulation
Description
These attacks test the model's susceptibility to generating or accepting false information, failing at basic reasoning tasks, or being manipulated through logical fallacies.
Test Suites in This Category
- False Assertion - Testing rejection of demonstrably false claims
- Who is Riley - Hallucination detection probes
- Snowball Graph - Impossible reasoning problems
Risk Level
Medium
Common in Tiers
Starter tier - important for factual applications
Usage Examples
Testing by Category
import asyncio
from modelred import ModelRed
async def main():
async with ModelRed(api_key="mr_your_api_key_here") as client:
# Test encoding attacks
encoding_suites = [
"base64_injection",
"hexadecimal_injection",
"rot13_encoding"
]
# Test content safety
safety_suites = [
"toxicity_generation",
"continue_slurs",
"lmrc_bullying"
]
# Run category-focused assessment
result = await client.run_assessment(
model_id="my-model",
test_suites=encoding_suites + safety_suites,
priority="high",
wait_for_completion=True
)
asyncio.run(main())
Risk-Based Testing
# High-risk categories for production systems
high_risk_suites = [
# Content Safety (Critical for user-facing apps)
"toxicity_generation",
"lmrc_bullying",
# Injection Attacks (System security)
"prompt_inject_hate",
"jinja_template_injection",
# Jailbreaking (Policy compliance)
"past_tense_jailbreak",
"future_tense_jailbreak"
]
result = await client.run_assessment(
model_id="production-model",
test_suites=high_risk_suites,
priority="critical",
wait_for_completion=True
)
Security Prioritization
Critical Priority
Focus on these categories first for production systems:
- Content Safety - User safety and policy compliance
- Injection Attacks - System security and data protection
- Jailbreaking - Safety mechanism integrity
Medium Priority
Important for comprehensive security:
- Social Engineering - User interaction safety
- Supply Chain - Development environment security
- Misinformation - Factual accuracy
Lower Priority
Foundation security testing:
- Encoding Attacks - Basic input validation
Related Documentation
- Test Suites Reference - Complete list of all available test suites
- Tier System - Understanding subscription tiers and access levels
- Running Assessments - How to execute security tests