Tutorial: Overview Dashboard – Real-Time System Monitoring & Incident Feed
Monitor system health, track security incidents, and respond to threats in real-time through the main dashboard overview.
What You’ll Learn
- ✅ Understand the Overview dashboard layout and key panels
- ✅ Interpret real-time metrics (throughput, latency, error rate)
- ✅ Read and act on the incident feed
- ✅ Understand the pressure gauge and health status indicators
- ✅ Monitor protocol activity and active threat sources
- ✅ Respond to critical incidents from the overview
- ✅ Filter and drill down into specific incidents
Prerequisites
- Apparatus running — Server accessible at
http://localhost:8090 - Web dashboard open — Navigate to http://localhost:8090/dashboard
- Active traffic — Some traffic/attacks hitting the system (real or simulated)
Time Estimate
~20 minutes (walkthrough + hands-on incident response)
What You’ll Build
By the end, you’ll be able to:
- Monitor system health at a glance
- Understand real-time metrics and what they indicate
- Interpret the incident feed and prioritize threats
- Respond quickly to critical incidents
- Drill down into specific events for analysis
Section 1: The Overview Dashboard Layout
What is the Overview?
The Overview Dashboard is the main landing page of Apparatus. It provides a holistic view of system health, active threats, and incident timeline in real-time.
Think of it as your 24/7 monitoring console — a single pane of glass to understand what’s happening on your system right now.
Main Sections
Try It: Open the Overview
- Open the dashboard:
http://localhost:8090/dashboard - By default, you’ll see the Overview console
- If not, click Overview in the left sidebar
Checkpoint
- Overview dashboard visible
- Key metrics section showing (Throughput, Error Rate, Latency)
- Pressure gauge visible at top
- Incident feed visible (may be empty if no traffic yet)
- Protocol activity chart showing
Section 2: Understanding Key Metrics
Real-Time Metrics Explained
The dashboard displays four key metrics updated in real-time:
1. Throughput (Requests Per Second - RPS)
Throughput: 145 RPS ↑ Normal
What it is: Number of requests being processed per second.
What’s good:
- 🟢 Stable — Stays consistent (e.g., 100-120 RPS)
- ↑ Slightly rising — Increased legitimate traffic
- ↓ Slightly falling — Normal variation
What’s concerning:
- 🔴 Drops to 0 — System not responding
- 🔴 Spikes > 2x normal — Possible attack or load surge
- 🟠 Gradual decline — System degradation under load
Action:
- Dropping? Check if services are running
- Spiking? Look at Attacker Fingerprinting console
- If normal but error rate high? Look at incident feed
2. Error Rate (%)
Error Rate: 2.3% ↓ Decreasing
What it is: Percentage of requests that returned error status (4xx, 5xx).
What’s good:
- 🟢 < 0.1% — Excellent (only occasional errors)
- 🟡 0.1–1% — Normal (expected transient errors)
- 🟡 1–5% — Acceptable (some issues but manageable)
What’s concerning:
- 🔴 > 5% — High error rate, system struggling
- 🔴 Rising sharply — Degradation detected
- 🔴 All requests failing (100%) — Complete outage
Action:
- Rising? Check incident feed for reasons
- High? Review defense logs (WAF blocking legitimate traffic?)
- Erratic? Look for chaos engineering events
3. Average Latency (Response Time)
Latency: avg 87ms ↑ Rising
What it is: Average time to respond to a request, measured in milliseconds.
What’s good:
- 🟢 < 100ms — Excellent (fast)
- 🟡 100–350ms — Good (acceptable)
- 🟠 350–750ms — Degraded (slow but functional)
What’s concerning:
- 🔴 > 750ms — Critical slowdown (users frustrated)
- 🔴 Spiking suddenly — Possible attack (tarpit, chaos)
- 🔴 Gradually increasing — Resource exhaustion
Percentiles shown:
- P95 — 95% of requests faster than this
- P99 — 99% of requests faster than this
Example:
Avg: 87ms | P95: 220ms | P99: 450ms
→ Most requests fast, but slowest 1% take 450ms
Action:
- Rising? Check Chaos Console (CPU/memory spike?)
- Tarpit active? Check Attacker Fingerprinting
- Gradual increase? Performance degradation, investigate
4. Active Sources (IP Count)
Active Sources: 42
What it is: Number of unique IP addresses that have made requests in the last 5 minutes.
What’s good:
- 🟢 Stable count — Consistent user base
- 🟢 Slow growth — Normal organic growth
What’s concerning:
- 🔴 Sharp spike — Possible DDoS attack
- 🔴 Many unknown external IPs — Investigate in Attacker Fingerprinting
Action:
- Spike detected? Go to Attacker Fingerprinting console
- High unknown external? Check risk scores
Section 3: Reading the Pressure Gauge
What is the Pressure Gauge?
The Pressure Gauge shows overall system health on a scale from STABLE → ELEVATED → CRITICAL.
🟢 STABLE 🟡 ELEVATED 🔴 CRITICAL
━━━━━━━━ ━━━━━━━━ ━━━━━━━━
Lag < 50ms 50-200ms > 200ms
All routes OK Heavy routes Request shedding
shedding
Pressure Calculation
The gauge is calculated from event loop lag:
- Event loop lag = How long JavaScript takes to process each tick
- High lag means system is struggling to keep up
- Too much traffic or CPU-intensive operations increase lag
Interpreting Pressure States
🟢 STABLE (Green)
Pressure: STABLE | Lag: 12ms
- System running smoothly
- All routes responding normally
- No load shedding
Action: All good! Continue monitoring.
🟡 ELEVATED (Yellow)
Pressure: ELEVATED | Lag: 95ms
- System starting to struggle
- Heavier routes (like
/generate) may shed traffic - SSE connections may be delayed
Likely causes:
- High traffic spike
- CPU spike (chaos engineering)
- Memory pressure
- Complex computations
Action:
- Monitor trends
- If worsening, activate mitigation
- Check Active Resources
🔴 CRITICAL (Red)
Pressure: CRITICAL | Lag: 250ms+
- System heavily overloaded
- All heavy routes shed traffic
- Requests may be rejected with 503
Likely causes:
- Sustained DDoS attack
- Runaway CPU/memory chaos
- Cascade failure
Action: IMMEDIATE
- Check Attacker Fingerprinting for active threats
- Check Chaos Console for active experiments
- Activate rate limiting or Tarpit
- Blackhole malicious IPs if necessary
Try It: Monitor Pressure Under Load
Goal: Observe pressure gauge change as load increases.
Steps:
- Note current pressure (probably 🟢 STABLE)
- Trigger a CPU spike:
curl -X POST http://localhost:8090/chaos/cpu -d '{"duration": 15000}' - Watch the Overview dashboard as pressure changes:
- Lag increases
- Gauge moves toward 🟡 ELEVATED
- Metrics may show higher latency
- After spike completes, pressure returns to 🟢 STABLE
Checkpoint
- Understand pressure gauge meaning
- Know the three pressure states
- Understand event loop lag concept
- Can read current pressure from dashboard
Section 4: Interpreting the Incident Feed
What is the Incident Feed?
The Incident Feed is a real-time timeline of security events, anomalies, and system state changes. Each incident is color-coded by severity:
| Color | Level | Meaning |
|---|---|---|
| 🔴 Red | CRITICAL | Immediate threat or system failure |
| 🟠 Orange | ERROR | Errors or potential issues |
| 🟡 Yellow | WARNING | Anomaly or degradation detected |
| 🟢 Green | INFO | Normal activity, status updates |
Incident Types
Type 1: Defense Blocks
🔴 [CRITICAL] Defense block triggered
192.168.1.50 (unknown_external, risk: 92)
912 XSS attempts, 45 SQLi attempts
Blocked rate: 98.7%
What happened: WAF detected and blocked malicious requests.
What to do:
- Go to Attacker Fingerprinting
- Find the IP
- Review attack types
- Consider tarpit or blackhole
Type 2: High Latency
🟡 [WARNING] High latency detected
Avg response time: 850ms (>350ms threshold)
P95: 2100ms (>750ms threshold)
Affected routes: /generate, /redteam/validate
What happened: Response times exceeded normal thresholds.
Likely causes:
- Heavy load
- CPU/memory chaos active
- Database/external service slow
- Tarpit active on many requests
What to do:
- Check pressure gauge
- Check Chaos Console for active experiments
- Check Attacker Fingerprinting (high tarpit activity?)
- Check if load test is running (k6 scenario?)
Type 3: Chaos Event
🔴 [CRITICAL] Chaos event detected
CPU spike: 5000ms duration active
Memory spike: 256MB allocated
Expected impact: 30–50% latency increase
What happened: Intentional chaos experiment is running.
What to do:
- Monitor effects on error rate and latency
- If chaos is unexpected, stop it via Chaos Console
- If expected, continue observing
Type 4: Traffic Anomaly
🟠 [ERROR] Traffic pattern anomaly
RPS spike detected: 145 → 520 RPS (+259%)
New sources: 18 unknown external IPs
Possible attack: Check Attacker Fingerprinting
What happened: Unusual traffic pattern detected.
What to do:
- Check Attacker Fingerprinting for new high-risk IPs
- Review traffic patterns
- Determine if legitimate (load test) or attack
- Respond accordingly
Type 5: Defense Activation
🟡 [WARNING] Defense activated
Moving Target Defense: Prefix rotated
New prefix: "xyz-789-abc"
All calls must use new prefix
What happened: A defense mechanism was activated.
What to do:
- If expected, update clients
- If unexpected, investigate who activated it
Type 6: System Status
🟢 [INFO] Traffic normal
145 requests/sec, 2.3% error rate
Avg latency: 87ms
Active sources: 42
What happened: Periodic heartbeat showing system is healthy.
What to do:
- Just continue monitoring
- A good sign of system stability
Try It: Generate an Incident
Goal: Trigger an incident and see it appear in the feed.
Steps:
- Generate some traffic/attack (use Autopilot or payload fuzzer)
- Watch the incident feed
- You should see incidents like:
🔴 [CRITICAL] Defense block triggered... 🟡 [WARNING] High latency detected...
Checkpoint
- Understand incident feed purpose
- Know the severity color coding
- Can identify incident types
- Understand what action to take for each type
Section 5: Incident Response Workflow
Scenario: Critical Incident During Monitoring
You see this in the incident feed:
🔴 [CRITICAL] Defense block triggered
203.0.113.45 (unknown_external, risk: 88)
847 XSS attempts in 30 seconds
Blocked rate: 99.5%
...
Error Rate jumped from 2% to 8%
Latency spiked to 450ms (P99)
What do you do?
Step-by-Step Response
Step 1: Assess Severity (30 seconds)
Questions to ask:
- Is error rate still rising? (check trend)
- Is latency still high? (check dashboard)
- Are new sources attacking? (check Attacker Fingerprinting)
Expected outcome:
- Understand if incident is ongoing or contained
- Determine urgency (ongoing crisis vs. past event)
Step 2: Investigate the Source (1 minute)
Action:
1. Go to Attacker Fingerprinting console
2. Find IP 203.0.113.45
3. Review:
- Risk score (88 = high)
- Category (unknown_external)
- Attack types (XSS focus)
- Success rate (0.5% bypassed!)
- Protocol heatmap
Key finding: They bypassed WAF on 4 requests!
Step 3: Respond Immediately (1 minute)
Action:
If < 1% bypass rate (acceptable):
→ Tarpit the IP
→ Monitor for continued activity
If > 1% bypass rate (concerning):
→ Blackhole immediately
→ Note the XSS payload that bypassed
→ Update WAF rules
In this case: Blackhole (4 requests bypassed the WAF).
Step 4: Investigate the Bypass (2–5 minutes)
Action:
1. Click [Details] on attacker profile
2. Review the 4 successful requests
3. Copy the XSS payloads
4. Analyze:
- What made them bypass?
- Is it a new technique?
- Is it a known CVE?
5. Update WAF rule to prevent recurrence
Example finding:
Payload: <img src=x onerror="eval(String.fromCharCode(...))">
Issue: WAF didn't decode Unicode escapes
Fix: Update XSS rule to normalize Unicode
Step 5: Document & Monitor (ongoing)
Action:
Create incident report:
- Time: 14:32 UTC
- Attacker: 203.0.113.45
- Attack: XSS scanner (automated)
- Requests: 847 total, 4 bypassed
- Action: Blackhole
- Finding: WAF bypass via Unicode encoding
- Fix: Updated XSS rule
- Status: CONTAINED & FIXED
Monitoring:
- Watch if attacker tries again (they’re blackholed)
- Monitor error rate (should return to normal)
- Verify latency recovers
Try It: Simulate and Respond to an Incident
Prerequisites: Have Apparatus running with active traffic.
Workflow:
- Generate an attack:
curl -X POST http://localhost:8090/api/redteam/autopilot/start \ -d '{"target": "http://localhost:8090"}' - Monitor the Overview:
- Watch incident feed for events
- Note metrics changes
- Respond:
- Go to Attacker Fingerprinting
- Find top attacking IPs
- Take response action (tarpit/blackhole)
- Monitor effects on dashboard
- Document:
- Note what you found
- Describe actions taken
- Predict future similar incidents
Checkpoint
- Understand incident response workflow
- Can quickly assess severity
- Know when to tarpit vs. blackhole
- Can investigate bypass attempts
- Can document findings
Section 6: Monitoring Best Practices
✅ DO: Monitor Trends, Not Individual Data Points
❌ WRONG:
Latency is 150ms, that's high!
→ One data point, not enough context
✅ RIGHT:
Latency was 80ms, now 150ms, continuing to rise
→ Trend indicates degradation
→ Action: Investigate cause
✅ DO: Correlate Metrics
When you see:
↑ Error rate rising
↑ Latency rising
↑ RPS staying stable
Conclusion: System struggling, not attack
→ Check resource usage, not attacker logs
✅ DO: Act on Incidents Immediately
🔴 CRITICAL incident appears
→ Don't wait, investigate immediately
→ Blackhole if confirmed malicious
→ Document findings
❌ DON’T: Ignore Yellow Warnings
🟡 [WARNING] appears
→ Don't assume it's not important
→ Investigate the trend
→ Act before it becomes 🔴 CRITICAL
❌ DON’T: Trust Metrics Alone
Throughput looks normal (140 RPS)
Error rate looks normal (2%)
But latency is 1500ms
→ There's a hidden problem!
→ Investigate before declaring all-clear
Summary
You’ve learned:
- ✅ Overview dashboard layout and main sections
- ✅ Understanding real-time metrics (throughput, error rate, latency, active sources)
- ✅ Reading the pressure gauge and system health states
- ✅ Interpreting the incident feed and incident types
- ✅ Incident response workflow (assess → investigate → respond → document)
- ✅ Monitoring best practices and common pitfalls
Next Steps
- Respond to threats: Tutorial: Attacker Fingerprinting
- Inject chaos: Tutorial: Chaos Engineering
- Set up defenses: Tutorial: Defense Rules
- Deep dive metrics: Tutorial: Monitoring
Last Updated: 2026-02-22
For real-time security response, see Tutorial: Attacker Fingerprinting.