What You'll Learn
- Build a SOC analyst dashboard from scratch using Wazuh's visualization tools
- Understand which metrics matter during a shift and why
- Create the five core dashboard widgets every SOC needs
- Read alert timelines, severity distributions, and agent health panels
- Identify anomalies by comparing dashboard patterns to baseline activity
- Know when a dashboard pattern signals a real incident vs. normal fluctuation
Why Dashboards Matter
In Lessons 2.1 and 2.2, you learned what log sources feed the SIEM and how to read individual alerts. But a SOC analyst doesn't investigate alerts one at a time in a vacuum — you need the big picture. That's what dashboards give you.
A well-built dashboard is your situational awareness screen. It answers the questions you should be asking every 15 minutes during a shift:
- Are we under attack right now? (alert volume spike)
- What's being targeted? (top attacked hosts)
- Is anything broken? (agent health, data gaps)
- What's changed since last shift? (trend comparison)
Without a dashboard, you're staring at a list of individual alerts with no context. With a dashboard, you can spot a brute force campaign across 50 hosts in 2 seconds — something that would take 30 minutes scanning individual alerts.
SOC Reality: In most SOCs, the first thing an analyst does when sitting down for a shift is glance at the dashboard. If the graphs look normal, you start working your alert queue. If something looks wrong — a spike, a gap, a new pattern — that becomes your priority. The dashboard is your early warning system.
The Five Core Dashboard Widgets
Every SOC dashboard should include at minimum these five widgets. They represent the essential views that keep you aware of what's happening across your environment.
Widget 1: Alert Volume Over Time (Timeline)
This is the most important widget on any SOC dashboard. It shows alert count plotted against time — usually as a line chart or area chart with 5-minute or 15-minute buckets.
| What to Look For | What It Means | Action |
|---|---|---|
| Flat baseline | Normal activity — alerts arriving at expected rate | Continue routine triage |
| Sudden spike | Something changed — could be an attack, a misconfiguration, or a noisy rule | Investigate immediately: what alerts spiked? |
| Gradual increase | Slow-building activity — could be a scan, a brute force, or increasing system load | Monitor trend, check top alert types |
| Gap or drop to zero | Data stopped flowing — agent down, network issue, or attacker killed the agent | Critical — investigate the silence, not the noise |
| Periodic pattern | Scheduled tasks, backup jobs, or automated scans | Normal if it matches known schedules |
Silence Is More Dangerous Than Noise. A sudden drop in alert volume is often more alarming than a spike. If an agent that normally generates 200 events/hour suddenly goes silent, it could mean the endpoint was compromised and the attacker disabled logging — or the agent process was killed. Always investigate unexpected silence.
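The table above can be turned into a simple programmatic check. Here's a minimal sketch that classifies each time bucket against a rolling baseline — note that the `spike_factor` and `gap_factor` thresholds are illustrative starting points for tuning, not Wazuh defaults:

```python
def classify_bucket(count, baseline, spike_factor=3.0, gap_factor=0.1):
    """Classify one time bucket's alert count against a rolling baseline.

    baseline: expected alerts per bucket (e.g., the mean of the last 24 hours
    of buckets). The threshold values are illustrative, not Wazuh defaults.
    """
    if baseline <= 0:
        return "no-baseline"   # can't judge a bucket without history
    ratio = count / baseline
    if ratio >= spike_factor:
        return "spike"         # sudden increase: check which rule fired
    if ratio <= gap_factor:
        return "gap"           # silence: investigate the agent, not the noise
    return "normal"

# 15-minute buckets against a baseline of 200 alerts per bucket
for count in (210, 950, 4):
    print(count, classify_bucket(count, baseline=200))
```

In practice you would feed this the bucket counts coming out of the timeline widget's date histogram and alert on any "spike" or "gap" result.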
Widget 2: Alerts by Severity Distribution
A pie chart or donut chart showing the breakdown of alerts by severity level. This tells you instantly whether your queue is mostly noise or mostly fire.
| Distribution | Interpretation |
|---|---|
| 90% Low, 8% Medium, 2% High | Normal day — most events are informational noise. Focus on the 2% high alerts. |
| 60% Medium, 30% High, 10% Critical | Abnormal — something is generating a lot of medium+ alerts. Investigate the source. |
| Any Critical > 0 | Critical alerts should be rare. Even one demands immediate attention. |
In Wazuh terms:
- Low = levels 0-6
- Medium = levels 7-9
- High = levels 10-12
- Critical = levels 13-15
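The level-to-severity mapping above is easy to express as a small helper — useful if you ever post-process alerts outside the dashboard. This sketch follows the buckets used in this lesson:

```python
def wazuh_severity(level: int) -> str:
    """Map a Wazuh rule.level (0-15) to the severity buckets used in this lesson."""
    if not 0 <= level <= 15:
        raise ValueError(f"Wazuh rule levels range from 0 to 15, got {level}")
    if level <= 6:
        return "Low"
    if level <= 9:
        return "Medium"
    if level <= 12:
        return "High"
    return "Critical"
```

For example, `wazuh_severity(12)` returns `"High"`, which is why a level-12 alert lands in the high slice of the severity donut.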
Widget 3: Top Attacked Hosts
A horizontal bar chart showing which agents (endpoints) are generating the most alerts. This immediately tells you where the action is.
| Pattern | What It Means |
|---|---|
| One host dominates | Targeted attack, misconfiguration, or a noisy application on that host |
| Even distribution | Broad scan or campaign hitting many hosts equally |
| New host appears | A system you haven't seen before is generating alerts — could be a new deployment or a compromised shadow IT system |
Baseline Knowledge Is Key. You need to know what "normal" looks like for your environment. If linux-web-01 normally generates 50 alerts/hour and suddenly it's at 500, that's significant. If WIN-SERVER-01 always tops the chart because it runs more services, that's expected. Building this baseline intuition takes about a week of shift work.
Widget 4: Top Alert Rules (Rule Frequency)
A table or bar chart showing which Wazuh rules are firing most frequently. This tells you what types of events dominate your environment.
| Common Top Rules | What They Indicate |
|---|---|
| 530 — Agent started | Normal — agents heartbeating |
| 5501 — SSH auth success | Normal — legitimate logins |
| 5551 — SSH brute force | Attention — active brute force campaign |
| 80790 — Event log cleared | Investigate — potential evidence destruction |
| 550 — Integrity checksum changed | Review — file changes on monitored paths |
When a rule that's normally quiet suddenly appears in the top 10, that's your signal to investigate. Conversely, when a normally noisy rule disappears, something changed on that endpoint.
Widget 5: Agent Health & Status
A status panel showing which agents are active, disconnected, or never connected. This is your data integrity check.
| Status | Meaning | Action |
|---|---|---|
| Active | Agent is connected and sending data | Normal |
| Disconnected | Agent stopped reporting — could be a network issue, a server reboot, or a compromised endpoint | Investigate within 15 minutes |
| Never connected | Agent was registered but never sent data | Configuration issue — fix during maintenance |
Building a Dashboard in Wazuh
Wazuh uses OpenSearch Dashboards (the visualization layer) to create custom dashboards. Here's the conceptual workflow for building the SOC analyst dashboard described above.
Step 1: Understand Index Patterns
All Wazuh data is stored in indices following the pattern wazuh-alerts-*. When creating visualizations, you'll select this index pattern as your data source. Each document in the index is one alert — with all the fields you learned in Lesson 2.2 (rule.id, rule.level, agent.name, etc.).
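Under the hood, a timeline widget runs a date-histogram aggregation against that index pattern. Here's a sketch of such a query body as a Python dict — the field names (`timestamp`, `rule.level`) follow the alert schema from Lesson 2.2, but the 24-hour window, 15-minute interval, and level-10 filter are illustrative choices, not what the built-in widget necessarily uses:

```python
# Sketch of a timeline aggregation against the wazuh-alerts-* index pattern.
# The time window, interval, and severity filter are illustrative choices.
timeline_query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"timestamp": {"gte": "now-24h"}}},
                {"range": {"rule.level": {"gte": 10}}},  # high severity only
            ]
        }
    },
    "aggs": {
        "alerts_over_time": {
            "date_histogram": {"field": "timestamp", "fixed_interval": "15m"}
        }
    },
    "size": 0,  # we only want the buckets, not the raw alert documents
}
```

A body like this would be sent to the index's `_search` endpoint; each returned bucket becomes one point on the timeline chart.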
Step 2: Choose the Right Visualization Type
| Widget Goal | Visualization Type | X-Axis / Bucket | Y-Axis / Metric |
|---|---|---|---|
| Alert volume over time | Line chart | Date histogram (timestamp) | Count |
| Severity distribution | Pie / Donut chart | Terms (rule.level ranges) | Count |
| Top attacked hosts | Horizontal bar | Terms (agent.name) | Count |
| Top rule frequency | Data table | Terms (rule.id, rule.description) | Count |
| Agent health | Metric / Status | Terms (agent.name) | Count by status |
Step 3: Apply Filters and Time Ranges
Dashboards should always have:
- Global time filter: Typically "Last 24 hours" for shift overview, "Last 1 hour" for active monitoring
- Auto-refresh: Set to 30 seconds or 1 minute during active monitoring
- Saved filters: Pre-configured filters for your most common queries (e.g., severity >= 10, specific agents)
Step 4: Arrange the Layout
The standard SOC dashboard layout follows the "attention hierarchy" — the most critical information should be at the top:
┌─────────────────────────────────────────────────────────────┐
│ ALERT VOLUME TIMELINE (full width — top position) │
├───────────────────────────┬─────────────────────────────────┤
│ SEVERITY DISTRIBUTION │ AGENT HEALTH STATUS │
│ (pie/donut — left) │ (status panel — right) │
├───────────────────────────┼─────────────────────────────────┤
│ TOP ATTACKED HOSTS │ TOP ALERT RULES │
│ (bar chart — left) │ (data table — right) │
└───────────────────────────┴─────────────────────────────────┘
Dashboard Real Estate Matters. Put the timeline across the full width at the top because it's the first thing you check. Severity and health go in the middle because they're your quick-reference panels. Detailed breakdowns (hosts, rules) go at the bottom because you only drill into them when something above catches your attention.
Reading Dashboards Like a Pro
Building a dashboard is only half the skill. Reading it — quickly, accurately, under pressure — is the other half. Here are the patterns that experienced SOC analysts recognize instantly.
Pattern 1: The Spike
A sudden, sharp increase in alert volume. Your immediate questions:
- When did it start? Zoom into the timeline
- What rule is firing? Check top rules widget — if one rule dominates, it's probably one source
- Which hosts? If one host, it's targeted. If many hosts, it's a campaign or a scan
- Is it real? Compare the rule to known false positive rules. A spike in rule 530 (agent heartbeat) is routine. A spike in rule 5551 (brute force) is an attack.
Pattern 2: The Gap
Alert volume drops to zero or near-zero unexpectedly. This is often more dangerous than a spike.
- Which agents went silent? Check agent health
- Is it a network issue? Check if multiple agents from the same subnet are affected
- Did someone disable logging? Check for Event ID 1102 (audit log cleared) right before the gap
- Is it maintenance? Check the change calendar — planned reboots cause expected gaps
Pattern 3: The New Normal
A gradual shift in baseline — maybe alert volume increased 20% over the past week. This is subtle and easy to miss.
- What's contributing to the increase? Check top rules — is a new rule firing that wasn't before?
- New infrastructure? Was a new server or application deployed?
- New attack campaign? External scanners and botnets often ramp up gradually
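Because a gradual shift is easy to miss by eye, it helps to compute the week-over-week change explicitly. A minimal sketch, where the 20% threshold is an illustrative cutoff:

```python
def baseline_shift(this_week, last_week, threshold=0.20):
    """Flag a gradual 'new normal': week-over-week change in mean alert volume.

    Inputs are daily alert counts for each week; the 20% threshold is an
    illustrative cutoff, not a standard.
    """
    prev = sum(last_week) / len(last_week)
    curr = sum(this_week) / len(this_week)
    change = (curr - prev) / prev
    return change, change >= threshold

# Roughly 1,000 alerts/day last week, creeping past 1,200/day this week
change, shifted = baseline_shift(
    [1150, 1230, 1180, 1260, 1210, 1190, 1240],
    [1000, 980, 1020, 990, 1010, 1000, 1005],
)
```

When `shifted` comes back true, the follow-up questions are exactly the bullets above: which rule, which host, and what changed in the environment.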
Pattern 4: The Rotation
Periodic patterns that repeat on a schedule — daily, weekly, or monthly. These are usually benign.
- Match to known schedules — backup jobs, patch cycles, vulnerability scans, batch processing
- Unexpected rotations — malware beaconing at regular intervals (e.g., every 4 hours = possible C2)
Regular Beaconing vs. Regular Jobs. Both create periodic patterns in your timeline. The difference: legitimate jobs match known schedules and generate expected rule IDs. Malware beaconing generates unusual network connections at suspicious intervals, often to unknown external IPs. In Lesson 2.4, you'll learn how to write queries to detect beaconing patterns.
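One simple heuristic for the "suspicious intervals" part: beacons tend to fire at near-identical intervals, so the spread of the gaps between connections is tiny relative to their average. A sketch, with illustrative thresholds (a real detection would also weigh destination reputation and known job schedules):

```python
from statistics import mean, pstdev

def looks_like_beaconing(timestamps, max_jitter=0.05):
    """Heuristic: connections at near-identical intervals suggest C2 beaconing.

    timestamps: sorted epoch seconds of connections to one external IP.
    max_jitter: allowed interval variation (std dev / mean); illustrative value.
    """
    if len(timestamps) < 4:
        return False  # too few events to establish a rhythm
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(intervals)
    return avg > 0 and pstdev(intervals) / avg <= max_jitter

# Connections every ~4 hours (14,400 s) with only seconds of jitter
beacons = [0, 14405, 28797, 43210, 57601]
print(looks_like_beaconing(beacons))
```

A legitimate backup job also passes this check — which is exactly why the next step is always to match the pattern against known schedules before calling it C2.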
SOC Shift Dashboard Workflow
Here's a practical workflow that experienced analysts follow during a shift:
Start of Shift (First 5 Minutes)
- Open the dashboard — set time range to "Last 8 hours" to see the previous shift's activity
- Check the timeline — any spikes or gaps during the previous shift?
- Read the handoff notes — what did the previous analyst flag?
- Check agent health — any disconnected agents?
- Scan top rules — anything unusual in the top 10?
During Shift (Every 15-30 Minutes)
- Glance at the timeline — compare current hour to the baseline
- Check severity distribution — any increase in high/critical percentage?
- Review top attacked hosts — any host that wasn't there before?
- Auto-refresh should be running — if it stops, check your connection
End of Shift (Last 10 Minutes)
- Screenshot the dashboard — attach to your shift report
- Note anomalies — anything unusual that the next analyst should watch
- Update handoff notes — ongoing investigations, pending escalations
- Compare to baseline — was this shift busier or quieter than normal?
Key Metrics to Track Over Time
Beyond the live dashboard, mature SOCs track metrics over longer time periods (weekly, monthly) to identify trends and measure effectiveness.
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Mean Time to Detect (MTTD) | Average time from event occurrence to alert generation | Measures how fast your SIEM catches things |
| Mean Time to Respond (MTTR) | Average time from alert to resolution | Measures how fast your team acts |
| Alert Volume Trend | Week-over-week change in alert count | Increasing = expanding attack surface or failing rules |
| False Positive Rate | Percentage of alerts closed as FP | High FP rate = rules need tuning (Module 8) |
| Alert-to-Incident Ratio | Percentage of alerts that become incidents | Shows how well your rules filter signal from noise |
| Coverage Score | Number of ATT&CK techniques with detection rules | Measures breadth of your detection capability |
Track False Positive Rate Religiously. If your dashboard shows that 85% of alerts for a specific rule are false positives, that rule needs tuning. High false positive rates cause "alert fatigue" — analysts start ignoring alerts, which means they'll miss the real one hidden among the noise. Rule tuning is covered in depth in Lesson 2.5.
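The first three metrics in the table reduce to simple arithmetic over closed alert records. A sketch of that computation — the record field names (`occurred_at`, `alerted_at`, `resolved_at`, `disposition`) are illustrative, not a Wazuh schema:

```python
from statistics import mean

def soc_metrics(alerts):
    """Compute MTTD, MTTR, and false positive rate from closed alert records.

    Each record holds epoch-second timestamps and a disposition; the field
    names here are illustrative placeholders, not a Wazuh schema.
    """
    mttd = mean(a["alerted_at"] - a["occurred_at"] for a in alerts)   # event -> alert
    mttr = mean(a["resolved_at"] - a["alerted_at"] for a in alerts)   # alert -> closed
    fp_rate = sum(a["disposition"] == "false_positive" for a in alerts) / len(alerts)
    return {"mttd_s": mttd, "mttr_s": mttr, "fp_rate": fp_rate}

closed = [
    {"occurred_at": 0, "alerted_at": 60,  "resolved_at": 960,  "disposition": "false_positive"},
    {"occurred_at": 0, "alerted_at": 120, "resolved_at": 720,  "disposition": "false_positive"},
    {"occurred_at": 0, "alerted_at": 30,  "resolved_at": 1830, "disposition": "true_positive"},
    {"occurred_at": 0, "alerted_at": 90,  "resolved_at": 690,  "disposition": "false_positive"},
]
print(soc_metrics(closed))
```

A 75% false positive rate like the one in this sample is exactly the kind of number that should send a rule straight to the tuning queue.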
Common Dashboard Mistakes
| Mistake | Why It's Bad | Fix |
|---|---|---|
| Too many widgets | Information overload — you can't watch 20 panels at once | Limit to 6-8 widgets per dashboard |
| No time filter | Seeing all-time data is useless for shift monitoring | Default to "Last 24 hours" |
| No auto-refresh | Dashboard shows stale data — you miss live incidents | Set 30-60 second auto-refresh |
| All alerts, no filtering | Noise overwhelms signal | Create separate dashboards: one for overview, one for high-severity only |
| No baseline comparison | Can't tell if a pattern is normal or anomalous | Add a "same time last week" overlay or note expected baselines |
| Ignoring agent health | You don't know when data stops flowing | Always include an agent status widget |
Key Takeaways
- A SOC dashboard is your situational awareness screen — it answers "what's happening right now?" at a glance
- Five core widgets: alert timeline, severity distribution, top attacked hosts, top rules, and agent health
- Silence (a gap in data) is often more dangerous than noise (a spike in alerts) — always investigate unexpected drops
- Dashboard reading is a skill: learn to recognize spikes, gaps, new normals, and periodic patterns
- Follow the shift workflow: check dashboard at start, monitor every 15-30 minutes, document at end of shift
- Track long-term metrics (MTTD, MTTR, false positive rate) to measure and improve SOC performance
- Avoid common mistakes: too many widgets, no auto-refresh, no baseline comparison, ignoring agent health
Knowledge Check: Dashboards & Visualizations
10 questions · 70% to pass
You're starting your SOC shift and glance at the alert timeline. You notice alert volume dropped to zero for a 30-minute window, then resumed normally. What should you do?
What does Mean Time to Respond (MTTR) measure, and why does it matter for SOC operations?
Your severity distribution widget shows 30% High and 10% Critical alerts — far above the normal 2% High baseline. What pattern is this, and what do you check first?
In the standard SOC dashboard layout, why does the alert volume timeline go at the top as a full-width widget?
Your dashboard shows a periodic spike every 4 hours in network connection alerts to the same external IP. How do you distinguish this from a legitimate scheduled job?
What is the biggest risk of having a high false positive rate on your SOC dashboard, and which metric tracks this?
Which common dashboard mistake is most likely to cause you to miss an active incident during your shift?
In Lab 1.1, you investigated alerts on the Wazuh dashboard. If you were building a 'Top Attacked Hosts' widget for that lab environment, which agent would you expect to dominate the chart based on the pre-loaded data?
In Lab 1.3, you explored 10 log sources across 4 agents. If you built an agent health widget and one of the 4 agents suddenly showed 'Disconnected' status, which agent going silent would concern you most from a security perspective?
In the Wazuh lab environment from Labs 1.1 and 1.3, you saw rule 530 (Agent started) fire for each agent. On a real SOC dashboard's 'Top Rules' widget, rule 530 appears at the top with the highest count. Is this concerning?