
Assessment Results

Understanding and interpreting security assessment results

📋 Understanding Assessment Results

Learn how to interpret security assessment results, understand risk levels, and implement actionable recommendations to improve your AI model's security posture.

Getting Assessment Results

Retrieve Assessment Results

Once an assessment is complete, retrieve detailed results including scores, risk levels, and recommendations.

PYTHON
from modelred import ModelRed

async with ModelRed() as client:
    # Get comprehensive assessment results
    results = await client.get_assessment_results(assessment_id)

    # Key properties
    print(f"📊 Assessment Results Summary:")
    print(f"   Overall Score: {results.overall_score}/10")
    print(f"   Risk Level: {results.risk_level.value}")
    print(f"   Tests Passed: {results.passed_tests}/{results.total_tests}")
    print(f"   Assessment ID: {results.assessment_id}")
    print(f"   Model ID: {results.model_id}")
    print(f"   Started: {results.started_at}")
    print(f"   Completed: {results.completed_at}")
    print(f"   Report URL: {results.report_url}")

    # Detailed category analysis
    if results.categories:
        print(f"\n📈 Category Scores:")
        for category, score in results.categories.items():
            print(f"   {category}: {score}/10")

    # Security recommendations
    if results.recommendations:
        print(f"\n💡 Recommendations ({len(results.recommendations)}):")
        for i, rec in enumerate(results.recommendations, 1):
            print(f"   {i}. {rec}")

Result Structure

| Property | Type | Description |
| --- | --- | --- |
| `overall_score` | `float` | Overall security score (0-10) |
| `risk_level` | `RiskLevel` | `LOW`, `MEDIUM`, `HIGH`, or `CRITICAL` |
| `total_tests` | `int` | Total number of tests executed |
| `passed_tests` | `int` | Number of tests passed |
| `failed_tests` | `int` | Number of tests failed |
| `categories` | `Dict[str, float]` | Scores by security category |
| `recommendations` | `List[str]` | Actionable security recommendations |
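For reference, these properties can be modeled as a plain dataclass. This is an illustrative sketch, not the SDK's actual class definition; the `pass_rate` helper is an assumption added for convenience.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class AssessmentResults:
    """Sketch of the result shape described above (not the real SDK class)."""
    overall_score: float          # 0-10
    risk_level: RiskLevel
    total_tests: int
    passed_tests: int
    failed_tests: int
    categories: Dict[str, float] = field(default_factory=dict)
    recommendations: List[str] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        """Percentage of tests passed; 0.0 when no tests ran."""
        return self.passed_tests / self.total_tests * 100 if self.total_tests else 0.0
```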

Security Score Interpretation

🎯 Security Score Scale (0-10)

Understand what your security score means and how to interpret the results.

🟢 8.0 - 10.0 Excellent

Strong security posture with minimal vulnerabilities

Actions: Maintain current security measures, regular monitoring

🔵 6.0 - 7.9 Good

Generally secure with some areas for improvement

Actions: Address moderate findings, implement recommendations

🟠 4.0 - 5.9 Fair

Notable security concerns that should be addressed

Actions: Priority fixes needed, review deployment readiness

🔴 0.0 - 3.9 Poor

Significant vulnerabilities requiring immediate attention

Actions: Critical fixes required before production deployment
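The bands above can be expressed as a small helper. This is a sketch based only on the thresholds listed here; `score_band` is not part of the SDK.

```python
def score_band(score: float) -> tuple[str, str]:
    """Map a 0-10 security score to its band and suggested action."""
    if score >= 8.0:
        return "Excellent", "Maintain current security measures, regular monitoring"
    if score >= 6.0:
        return "Good", "Address moderate findings, implement recommendations"
    if score >= 4.0:
        return "Fair", "Priority fixes needed, review deployment readiness"
    return "Poor", "Critical fixes required before production deployment"
```

For example, `score_band(7.2)` returns `("Good", ...)`, since 7.2 falls in the 6.0 - 7.9 band.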

Risk Level Classification

⚠️ Risk Level Meanings

Risk levels provide a quick assessment of the overall security posture and urgency of remediation.

🟢 LOW RISK

Minimal security concerns, safe for production

DEPLOY READY

🟡 MEDIUM RISK

Moderate vulnerabilities, review recommendations

REVIEW NEEDED

🟠 HIGH RISK

Significant security risks, fixes recommended

FIXES NEEDED

🔴 CRITICAL RISK

Severe vulnerabilities, immediate action required

URGENT
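A deployment gate based on these risk levels might look like the following sketch. `deployment_decision` and `is_deployable` are hypothetical helpers, not SDK functions; adapt the policy to your own requirements.

```python
def deployment_decision(risk_level: str) -> str:
    """Map a risk level to the status badge used above."""
    decisions = {
        "low": "DEPLOY READY",
        "medium": "REVIEW NEEDED",
        "high": "FIXES NEEDED",
        "critical": "URGENT",
    }
    return decisions.get(risk_level.lower(), "UNKNOWN")

def is_deployable(risk_level: str, require_review: bool = True) -> bool:
    """Allow deployment only for LOW risk (or MEDIUM when review is waived)."""
    level = risk_level.lower()
    if level == "low":
        return True
    if level == "medium" and not require_review:
        return True
    return False
```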

Category Analysis

📈 Understanding Category Scores

Break down results by security category to identify specific areas of strength and weakness.

PYTHON
async def analyze_category_scores(results):
    """Analyze and categorize security scores by domain"""

    print("📊 Detailed Category Analysis:")
    print("=" * 40)

    # Category descriptions
    category_info = {
        "prompt_injection": {
            "name": "Prompt Injection",
            "description": "Resistance to prompt manipulation attacks"
        },
        "jailbreak": {
            "name": "Jailbreak Prevention",
            "description": "Effectiveness of safety guardrails"
        },
        "content_safety": {
            "name": "Content Safety",
            "description": "Harmful content filtering capabilities"
        },
        "encoding_attacks": {
            "name": "Encoding Attacks",
            "description": "Handling of encoded malicious inputs"
        },
        "behavioral_manipulation": {
            "name": "Behavioral Manipulation",
            "description": "Instruction adherence and consistency"
        },
        "data_extraction": {
            "name": "Data Extraction",
            "description": "Protection against data leakage attempts"
        }
    }

    # Analyze each category
    strengths = []
    weaknesses = []

    for category, score in results.categories.items():
        info = category_info.get(category, {"name": category, "description": "Security assessment"})

        # Determine status bucket for this category score
        if score >= 8.0:
            status = "🟢 STRONG"
            strengths.append((category, score))
        elif score >= 6.0:
            status = "🟡 MODERATE"
        elif score >= 4.0:
            status = "🟠 WEAK"
            weaknesses.append((category, score))
        else:
            status = "🔴 CRITICAL"
            weaknesses.append((category, score))

        print(f"{status} {info['name']}: {score}/10")
        print(f"   {info['description']}")
        print()

    # Summary
    print("📋 Summary:")
    if strengths:
        print(f"   ✅ Strong Areas: {len(strengths)}")
        for cat, score in strengths:
            print(f"      • {category_info.get(cat, {}).get('name', cat)}: {score}/10")

    if weaknesses:
        print(f"   ⚠️  Areas for Improvement: {len(weaknesses)}")
        for cat, score in weaknesses:
            print(f"      • {category_info.get(cat, {}).get('name', cat)}: {score}/10")

    return {
        'strengths': strengths,
        'weaknesses': weaknesses,
        'avg_score': sum(results.categories.values()) / len(results.categories) if results.categories else 0.0
    }

# Usage
analysis = await analyze_category_scores(results)

Processing Recommendations

💡 Acting on Security Recommendations

Transform assessment recommendations into actionable security improvements.

PYTHON
def process_recommendations(results):
    """Process and categorize security recommendations"""

    if not results.recommendations:
        print("🎉 Excellent! No security recommendations needed.")
        return

    print(f"🔧 Security Recommendations ({len(results.recommendations)}):")
    print("=" * 50)

    # Categorize recommendations
    categories = {
        'input_validation': [],
        'content_filtering': [],
        'rate_limiting': [],
        'monitoring': [],
        'configuration': [],
        'general': []
    }

    for i, rec in enumerate(results.recommendations, 1):
        print(f"\n{i}. {rec}")

        # Categorize recommendation
        rec_lower = rec.lower()
        if any(keyword in rec_lower for keyword in ['input', 'validation', 'sanitiz']):
            categories['input_validation'].append(rec)
            print("   🔍 Category: Input Validation")
            print("   📋 Action: Implement stricter input filtering and sanitization")

        elif any(keyword in rec_lower for keyword in ['content', 'filter', 'moderat']):
            categories['content_filtering'].append(rec)
            print("   🛡️ Category: Content Filtering")
            print("   📋 Action: Add content moderation layers")

        elif any(keyword in rec_lower for keyword in ['rate', 'limit', 'throttl']):
            categories['rate_limiting'].append(rec)
            print("   ⏱️ Category: Rate Limiting")
            print("   📋 Action: Implement API rate limits and usage controls")

        elif any(keyword in rec_lower for keyword in ['monitor', 'log', 'alert']):
            categories['monitoring'].append(rec)
            print("   📊 Category: Monitoring")
            print("   📋 Action: Enhance logging and alerting systems")

        elif any(keyword in rec_lower for keyword in ['config', 'setting', 'parameter']):
            categories['configuration'].append(rec)
            print("   ⚙️ Category: Configuration")
            print("   📋 Action: Review and update model configuration")

        else:
            categories['general'].append(rec)
            print("   🔧 Category: General Security")
            print("   📋 Action: Review and implement security best practices")

    # Priority recommendations
    print(f"\n🎯 Implementation Priority:")
    priority_order = ['input_validation', 'content_filtering', 'rate_limiting', 'monitoring', 'configuration', 'general']

    for priority, category in enumerate(priority_order, 1):
        if categories[category]:
            category_name = category.replace('_', ' ').title()
            print(f"   {priority}. {category_name} ({len(categories[category])} items)")

    return categories

# Usage
recommendation_categories = process_recommendations(results)

Result Export and Reporting

📄 Export Assessment Results

Export results in various formats for reporting, compliance, and record-keeping.

PYTHON
import json
import csv
from datetime import datetime

def export_assessment_results(results, format='json'):
    """Export assessment results in various formats"""

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    model_id = results.model_id.replace('/', '_')  # Safe filename

    # Prepare export data
    export_data = {
        'assessment_id': results.assessment_id,
        'model_id': results.model_id,
        'overall_score': results.overall_score,
        'risk_level': results.risk_level.value,
        'total_tests': results.total_tests,
        'passed_tests': results.passed_tests,
        'failed_tests': results.failed_tests,
        'pass_rate': (results.passed_tests / results.total_tests * 100) if results.total_tests > 0 else 0,
        'started_at': results.started_at.isoformat() if results.started_at else None,
        'completed_at': results.completed_at.isoformat() if results.completed_at else None,
        'categories': results.categories,
        'recommendations': results.recommendations,
        'probes_used': results.probes_used or [],
        'report_url': results.report_url
    }

    if format.lower() == 'json':
        filename = f"assessment_{model_id}_{timestamp}.json"
        with open(filename, 'w') as f:
            json.dump(export_data, f, indent=2, default=str)
        print(f"✅ Results exported to {filename}")

    elif format.lower() == 'csv':
        filename = f"assessment_{model_id}_{timestamp}.csv"

        # Flatten data for CSV
        csv_data = []

        # Basic info
        basic_row = {
            'assessment_id': export_data['assessment_id'],
            'model_id': export_data['model_id'],
            'overall_score': export_data['overall_score'],
            'risk_level': export_data['risk_level'],
            'total_tests': export_data['total_tests'],
            'passed_tests': export_data['passed_tests'],
            'failed_tests': export_data['failed_tests'],
            'pass_rate': export_data['pass_rate']
        }

        # Add category scores
        for category, score in export_data['categories'].items():
            basic_row[f'category_{category}'] = score

        csv_data.append(basic_row)

        if csv_data:
            with open(filename, 'w', newline='') as f:
                writer = csv.DictWriter(f, fieldnames=csv_data[0].keys())
                writer.writeheader()
                writer.writerows(csv_data)
            print(f"✅ Results exported to {filename}")

    elif format.lower() == 'summary':
        filename = f"summary_{model_id}_{timestamp}.txt"

        with open(filename, 'w') as f:
            f.write(f"SECURITY ASSESSMENT SUMMARY\n")
            f.write(f"=" * 40 + "\n\n")
            f.write(f"Model: {export_data['model_id']}\n")
            f.write(f"Assessment ID: {export_data['assessment_id']}\n")
            f.write(f"Completed: {export_data['completed_at']}\n\n")

            f.write(f"OVERALL RESULTS\n")
            f.write(f"Security Score: {export_data['overall_score']}/10\n")
            f.write(f"Risk Level: {export_data['risk_level']}\n")
            f.write(f"Tests Passed: {export_data['passed_tests']}/{export_data['total_tests']} ({export_data['pass_rate']:.1f}%)\n\n")

            if export_data['categories']:
                f.write(f"CATEGORY SCORES\n")
                for category, score in export_data['categories'].items():
                    f.write(f"{category}: {score}/10\n")
                f.write("\n")

            if export_data['recommendations']:
                f.write(f"RECOMMENDATIONS\n")
                for i, rec in enumerate(export_data['recommendations'], 1):
                    f.write(f"{i}. {rec}\n")

        print(f"✅ Summary exported to {filename}")

    else:
        raise ValueError(f"Unsupported export format: {format}")

    return filename

# Usage examples
json_file = export_assessment_results(results, 'json')
csv_file = export_assessment_results(results, 'csv')
summary_file = export_assessment_results(results, 'summary')

Best Practices

💡 Results Analysis Best Practices

Interpretation

Focus on category-specific weaknesses
Prioritize recommendations by impact
Track improvements over time
Consider business context

Action Planning

Set minimum score thresholds
Create remediation timelines
Document security decisions
Regular reassessment schedule
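The threshold and trend-tracking practices above can be sketched as a simple release gate. `MIN_SCORE`, the blocked-level set, and the history record shape are all assumptions for illustration, not SDK conventions.

```python
MIN_SCORE = 7.0  # assumed organizational minimum; tune to your risk appetite
BLOCKED_RISK_LEVELS = {"high", "critical"}

def gate(overall_score: float, risk_level: str) -> bool:
    """Return True if the assessment meets the release threshold."""
    return overall_score >= MIN_SCORE and risk_level.lower() not in BLOCKED_RISK_LEVELS

def score_trend(history: list[dict]) -> float:
    """Change in overall score from the oldest to the newest assessment.

    Each record is assumed to carry 'completed_at' (sortable) and 'overall_score'.
    """
    if len(history) < 2:
        return 0.0
    ordered = sorted(history, key=lambda h: h["completed_at"])
    return ordered[-1]["overall_score"] - ordered[0]["overall_score"]
```

A CI job could call `gate(results.overall_score, results.risk_level.value)` after each assessment and fail the pipeline when it returns `False`.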

Next Steps