What You'll Learn
- Build a SOC analyst dashboard from scratch using Wazuh's visualization tools
- Understand which metrics matter during a shift and why
- Create the five core dashboard widgets every SOC needs
- Read alert timelines, severity distributions, and agent health panels
- Identify anomalies by comparing dashboard patterns to baseline activity
- Know when a dashboard pattern signals a real incident vs. normal fluctuation
Why Dashboards Matter
In Lessons 2.1 and 2.2, you learned what log sources feed the SIEM and how to read individual alerts. But a SOC analyst doesn't investigate alerts one at a time in a vacuum — you need the big picture. That's what dashboards give you.
A well-built dashboard is your situational awareness screen. It answers the questions you should be asking every 15 minutes during a shift:
- Are we under attack right now? (alert volume spike)
- What's being targeted? (top attacked hosts)
- Is anything broken? (agent health, data gaps)
- What's changed since last shift? (trend comparison)
Without a dashboard, you're staring at a list of individual alerts with no context. With a dashboard, you can spot a brute force campaign across 50 hosts in 2 seconds — something that would take 30 minutes scanning individual alerts.
SOC Reality: In most SOCs, the first thing an analyst does when sitting down for a shift is glance at the dashboard. If the graphs look normal, you start working your alert queue. If something looks wrong — a spike, a gap, a new pattern — that becomes your priority. The dashboard is your early warning system.
The Five Core Dashboard Widgets
Every SOC dashboard should include at minimum these five widgets. They represent the essential views that keep you aware of what's happening across your environment.
Widget 1: Alert Volume Over Time (Timeline)
This is the most important widget on any SOC dashboard. It shows alert count plotted against time — usually as a line chart or area chart with 5-minute or 15-minute buckets.
| What to Look For | What It Means | Action |
|---|---|---|
| Flat baseline | Normal activity — alerts arriving at expected rate | Continue routine triage |
| Sudden spike | Something changed — could be an attack, a misconfiguration, or a noisy rule | Investigate immediately: what alerts spiked? |
| Gradual increase | Slow-building activity — could be a scan, a brute force, or increasing system load | Monitor trend, check top alert types |
| Gap or drop to zero | Data stopped flowing — agent down, network issue, or attacker killed the agent | Critical — investigate the silence, not the noise |
| Periodic pattern | Scheduled tasks, backup jobs, or automated scans | Normal if it matches known schedules |
Silence Is More Dangerous Than Noise. A sudden drop in alert volume is often more alarming than a spike. If an agent that normally generates 200 events/hour suddenly goes silent, it could mean the endpoint was compromised and the attacker disabled logging — or the agent process was killed. Always investigate unexpected silence.
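The table above can be turned into a simple programmatic check. Here's a minimal sketch that classifies each time bucket against a rolling baseline — note that the `spike_factor` and `gap_factor` thresholds are illustrative starting points for tuning, not Wazuh defaults:

```python
def classify_bucket(count, baseline, spike_factor=3.0, gap_factor=0.1):
    """Classify one time bucket's alert count against a rolling baseline.

    baseline: expected alerts per bucket (e.g., the mean of the last 24 hours
    of buckets). The threshold values are illustrative, not Wazuh defaults.
    """
    if baseline <= 0:
        return "no-baseline"   # can't judge a bucket without history
    ratio = count / baseline
    if ratio >= spike_factor:
        return "spike"         # sudden increase: check which rule fired
    if ratio <= gap_factor:
        return "gap"           # silence: investigate the agent, not the noise
    return "normal"

# 15-minute buckets against a baseline of 200 alerts per bucket
for count in (210, 950, 4):
    print(count, classify_bucket(count, baseline=200))
```

In practice you would feed this the bucket counts coming out of the timeline widget's date histogram and alert on any "spike" or "gap" result.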
Widget 2: Alerts by Severity Distribution
A pie chart or donut chart showing the breakdown of alerts by severity level. This tells you instantly whether your queue is mostly noise or mostly fire.
| Distribution | Interpretation |
|---|---|
| 90% Low, 8% Medium, 2% High | Normal day — most events are informational noise. Focus on the 2% high alerts. |
| 60% Medium, 30% High, 10% Critical | Abnormal — something is generating a lot of medium+ alerts. Investigate the source. |
| Any Critical > 0 | Critical alerts should be rare. Even one demands immediate attention. |
In Wazuh terms:
- Low = levels 0-6
- Medium = levels 7-9
- High = levels 10-12
- Critical = levels 13-15
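The level-to-severity mapping above is easy to express as a small helper — useful if you ever post-process alerts outside the dashboard. This sketch follows the buckets used in this lesson:

```python
def wazuh_severity(level: int) -> str:
    """Map a Wazuh rule.level (0-15) to the severity buckets used in this lesson."""
    if not 0 <= level <= 15:
        raise ValueError(f"Wazuh rule levels range from 0 to 15, got {level}")
    if level <= 6:
        return "Low"
    if level <= 9:
        return "Medium"
    if level <= 12:
        return "High"
    return "Critical"
```

For example, `wazuh_severity(12)` returns `"High"`, which is why a level-12 alert lands in the high slice of the severity donut.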
Widget 3: Top Attacked Hosts
A horizontal bar chart showing which agents (endpoints) are generating the most alerts. This immediately tells you where the action is.
| Pattern | What It Means |
|---|---|
| One host dominates | Targeted attack, misconfiguration, or a noisy application on that host |
| Even distribution | Broad scan or campaign hitting many hosts equally |
| New host appears | A system you haven't seen before is generating alerts — could be a new deployment or a compromised shadow IT system |
Baseline Knowledge Is Key. You need to know what "normal" looks like for your environment. If linux-web-01 normally generates 50 alerts/hour and suddenly it's at 500, that's significant. If WIN-SERVER-01 always tops the chart because it runs more services, that's expected. Building this baseline intuition takes about a week of shift work.
Widget 4: Top Alert Rules (Rule Frequency)
A table or bar chart showing which Wazuh rules are firing most frequently. This tells you what types of events dominate your environment.
| Common Top Rules | What They Indicate |
|---|---|
| 530 — Agent started | Normal — agents heartbeating |
| 5501 — SSH auth success | Normal — legitimate logins |
| 5551 — SSH brute force | Attention — active brute force campaign |
| 80790 — Event log cleared | Investigate — potential evidence destruction |
| 550 — Integrity checksum changed | Review — file changes on monitored paths |
When a rule that's normally quiet suddenly appears in the top 10, that's your signal to investigate. Conversely, when a normally noisy rule disappears, something changed on that endpoint.
Widget 5: Agent Health & Status
A status panel showing which agents are active, disconnected, or never connected. This is your data integrity check.
| Status | Meaning | Action |
|---|---|---|
| Active | Agent is connected and sending data | Normal |
| Disconnected | Agent stopped reporting — could be a network issue, a server reboot, or a compromised endpoint | Investigate within 15 minutes |
| Never connected | Agent was registered but never sent data | Configuration issue — fix during maintenance |
Building a Dashboard in Wazuh
Wazuh uses OpenSearch Dashboards (the visualization layer) to create custom dashboards. Here's the conceptual workflow for building the SOC analyst dashboard described above.
Step 1: Understand Index Patterns
All Wazuh data is stored in indices following the pattern wazuh-alerts-*. When creating visualizations, you'll select this index pattern as your data source. Each document in the index is one alert — with all the fields you learned in Lesson 2.2 (rule.id, rule.level, agent.name, etc.).
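Under the hood, a timeline widget runs a date-histogram aggregation against that index pattern. Here's a sketch of such a query body as a Python dict — the field names (`timestamp`, `rule.level`) follow the alert schema from Lesson 2.2, but the 24-hour window, 15-minute interval, and level-10 filter are illustrative choices, not what the built-in widget necessarily uses:

```python
# Sketch of a timeline aggregation against the wazuh-alerts-* index pattern.
# The time window, interval, and severity filter are illustrative choices.
timeline_query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"timestamp": {"gte": "now-24h"}}},
                {"range": {"rule.level": {"gte": 10}}},  # high severity only
            ]
        }
    },
    "aggs": {
        "alerts_over_time": {
            "date_histogram": {"field": "timestamp", "fixed_interval": "15m"}
        }
    },
    "size": 0,  # we only want the buckets, not the raw alert documents
}
```

A body like this would be sent to the index's `_search` endpoint; each returned bucket becomes one point on the timeline chart.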
Step 2: Choose the Right Visualization Type
| Widget Goal | Visualization Type | X-Axis / Bucket | Y-Axis / Metric |
|---|---|---|---|
| Alert volume over time | Line chart | Date histogram (timestamp) | Count |
| Severity distribution | Pie / Donut chart | Terms (rule.level ranges) | Count |
| Top attacked hosts | Horizontal bar | Terms (agent.name) | Count |
| Top rule frequency | Data table | Terms (rule.id, rule.description) | Count |
| Agent health | Metric / Status | Terms (agent.name) | Count by status |
Step 3: Apply Filters and Time Ranges
Dashboards should always have:
- Global time filter: Typically "Last 24 hours" for shift overview, "Last 1 hour" for active monitoring
- Auto-refresh: Set to 30 seconds or 1 minute during active monitoring
- Saved filters: Pre-configured filters for your most common queries (e.g., severity >= 10, specific agents)
Step 4: Arrange the Layout
The standard SOC dashboard layout follows the "attention hierarchy" — the most critical information should be at the top:
┌─────────────────────────────────────────────────────────────┐
│ ALERT VOLUME TIMELINE (full width — top position) │
├───────────────────────────┬─────────────────────────────────┤
│ SEVERITY DISTRIBUTION │ AGENT HEALTH STATUS │
│ (pie/donut — left) │ (status panel — right) │
├───────────────────────────┼─────────────────────────────────┤
│ TOP ATTACKED HOSTS │ TOP ALERT RULES │
│ (bar chart — left) │ (data table — right) │
└───────────────────────────┴─────────────────────────────────┘
Dashboard Real Estate Matters. Put the timeline across the full width at the top because it's the first thing you check. Severity and health go in the middle because they're your quick-reference panels. Detailed breakdowns (hosts, rules) go at the bottom because you only drill into them when something above catches your attention.
Reading Dashboards Like a Pro
Building a dashboard is only half the skill. Reading it — quickly, accurately, under pressure — is the other half. Here are the patterns that experienced SOC analysts recognize instantly.
Pattern 1: The Spike
A sudden, sharp increase in alert volume. Your immediate questions:
- When did it start? Zoom into the timeline
- What rule is firing? Check top rules widget — if one rule dominates, it's probably one source
- Which hosts? If one host, it's targeted. If many hosts, it's a campaign or a scan
- Is it real? Compare the rule to known false positive rules. A spike in rule 530 (agent heartbeat) is routine. A spike in rule 5551 (brute force) is an attack.
Pattern 2: The Gap
Alert volume drops to zero or near-zero unexpectedly. This is often more dangerous than a spike.
- Which agents went silent? Check agent health
- Is it a network issue? Check if multiple agents from the same subnet are affected
- Did someone disable logging? Check for Event ID 1102 (audit log cleared) right before the gap
- Is it maintenance? Check the change calendar — planned reboots cause expected gaps
Pattern 3: The New Normal
A gradual shift in baseline — maybe alert volume increased 20% over the past week. This is subtle and easy to miss.
- What's contributing to the increase? Check top rules — is a new rule firing that wasn't before?
- New infrastructure? Was a new server or application deployed?
- New attack campaign? External scanners and botnets often ramp up gradually
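Because a gradual shift is easy to miss by eye, it helps to compute the week-over-week change explicitly. A minimal sketch, where the 20% threshold is an illustrative cutoff:

```python
def baseline_shift(this_week, last_week, threshold=0.20):
    """Flag a gradual 'new normal': week-over-week change in mean alert volume.

    Inputs are daily alert counts for each week; the 20% threshold is an
    illustrative cutoff, not a standard.
    """
    prev = sum(last_week) / len(last_week)
    curr = sum(this_week) / len(this_week)
    change = (curr - prev) / prev
    return change, change >= threshold

# Roughly 1,000 alerts/day last week, creeping past 1,200/day this week
change, shifted = baseline_shift(
    [1150, 1230, 1180, 1260, 1210, 1190, 1240],
    [1000, 980, 1020, 990, 1010, 1000, 1005],
)
```

When `shifted` comes back true, the follow-up questions are exactly the bullets above: which rule, which host, and what changed in the environment.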
Pattern 4: The Rotation
Periodic patterns that repeat on a schedule — daily, weekly, or monthly. These are usually benign.
- Match to known schedules — backup jobs, patch cycles, vulnerability scans, batch processing
- Unexpected rotations — malware beaconing at regular intervals (e.g., every 4 hours = possible C2)
Regular Beaconing vs. Regular Jobs. Both create periodic patterns in your timeline. The difference: legitimate jobs match known schedules and generate expected rule IDs. Malware beaconing generates unusual network connections at suspicious intervals, often to unknown external IPs. In Lesson 2.4, you'll learn how to write queries to detect beaconing patterns.
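One simple heuristic for the "suspicious intervals" part: beacons tend to fire at near-identical intervals, so the spread of the gaps between connections is tiny relative to their average. A sketch, with illustrative thresholds (a real detection would also weigh destination reputation and known job schedules):

```python
from statistics import mean, pstdev

def looks_like_beaconing(timestamps, max_jitter=0.05):
    """Heuristic: connections at near-identical intervals suggest C2 beaconing.

    timestamps: sorted epoch seconds of connections to one external IP.
    max_jitter: allowed interval variation (std dev / mean); illustrative value.
    """
    if len(timestamps) < 4:
        return False  # too few events to establish a rhythm
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(intervals)
    return avg > 0 and pstdev(intervals) / avg <= max_jitter

# Connections every ~4 hours (14,400 s) with only seconds of jitter
beacons = [0, 14405, 28797, 43210, 57601]
print(looks_like_beaconing(beacons))
```

A legitimate backup job also passes this check — which is exactly why the next step is always to match the pattern against known schedules before calling it C2.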
SOC Shift Dashboard Workflow
Here's a practical workflow that experienced analysts follow during a shift:
Start of Shift (First 5 Minutes)
- Open the dashboard — set time range to "Last 8 hours" to see the previous shift's activity
- Check the timeline — any spikes or gaps during the previous shift?
- Read the handoff notes — what did the previous analyst flag?
- Check agent health — any disconnected agents?
- Scan top rules — anything unusual in the top 10?
During Shift (Every 15-30 Minutes)
- Glance at the timeline — compare current hour to the baseline
- Check severity distribution — any increase in high/critical percentage?
- Review top attacked hosts — any host that wasn't there before?
- Auto-refresh should be running — if it stops, check your connection
End of Shift (Last 10 Minutes)
- Screenshot the dashboard — attach to your shift report
- Note anomalies — anything unusual that the next analyst should watch
- Update handoff notes — ongoing investigations, pending escalations
- Compare to baseline — was this shift busier or quieter than normal?
Key Metrics to Track Over Time
Beyond the live dashboard, mature SOCs track metrics over longer time periods (weekly, monthly) to identify trends and measure effectiveness.
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Mean Time to Detect (MTTD) | Average time from event occurrence to alert generation | Measures how fast your SIEM catches things |
| Mean Time to Respond (MTTR) | Average time from alert to resolution | Measures how fast your team acts |
| Alert Volume Trend | Week-over-week change in alert count | Increasing = expanding attack surface or failing rules |
| False Positive Rate | Percentage of alerts closed as FP | High FP rate = rules need tuning (Module 8) |
| Alert-to-Incident Ratio | Percentage of alerts that become incidents | Shows how well your rules filter signal from noise |
| Coverage Score | Number of ATT&CK techniques with detection rules | Measures breadth of your detection capability |
Track False Positive Rate Religiously. If your dashboard shows that 85% of alerts for a specific rule are false positives, that rule needs tuning. High false positive rates cause "alert fatigue" — analysts start ignoring alerts, which means they'll miss the real one hidden among the noise. Rule tuning is covered in depth in Lesson 2.5.
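The first three metrics in the table reduce to simple arithmetic over closed alert records. A sketch of that computation — the record field names (`occurred_at`, `alerted_at`, `resolved_at`, `disposition`) are illustrative, not a Wazuh schema:

```python
from statistics import mean

def soc_metrics(alerts):
    """Compute MTTD, MTTR, and false positive rate from closed alert records.

    Each record holds epoch-second timestamps and a disposition; the field
    names here are illustrative placeholders, not a Wazuh schema.
    """
    mttd = mean(a["alerted_at"] - a["occurred_at"] for a in alerts)   # event -> alert
    mttr = mean(a["resolved_at"] - a["alerted_at"] for a in alerts)   # alert -> closed
    fp_rate = sum(a["disposition"] == "false_positive" for a in alerts) / len(alerts)
    return {"mttd_s": mttd, "mttr_s": mttr, "fp_rate": fp_rate}

closed = [
    {"occurred_at": 0, "alerted_at": 60,  "resolved_at": 960,  "disposition": "false_positive"},
    {"occurred_at": 0, "alerted_at": 120, "resolved_at": 720,  "disposition": "false_positive"},
    {"occurred_at": 0, "alerted_at": 30,  "resolved_at": 1830, "disposition": "true_positive"},
    {"occurred_at": 0, "alerted_at": 90,  "resolved_at": 690,  "disposition": "false_positive"},
]
print(soc_metrics(closed))
```

A 75% false positive rate like the one in this sample is exactly the kind of number that should send a rule straight to the tuning queue.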
Common Dashboard Mistakes
| Mistake | Why It's Bad | Fix |
|---|---|---|
| Too many widgets | Information overload — you can't watch 20 panels at once | Limit to 6-8 widgets per dashboard |
| No time filter | Seeing all-time data is useless for shift monitoring | Default to "Last 24 hours" |
| No auto-refresh | Dashboard shows stale data — you miss live incidents | Set 30-60 second auto-refresh |
| All alerts, no filtering | Noise overwhelms signal | Create separate dashboards: one for overview, one for high-severity only |
| No baseline comparison | Can't tell if a pattern is normal or anomalous | Add a "same time last week" overlay or note expected baselines |
| Ignoring agent health | You don't know when data stops flowing | Always include an agent status widget |
Key Takeaways
- A SOC dashboard is your situational awareness screen — it answers "what's happening right now?" at a glance
- Five core widgets: alert timeline, severity distribution, top attacked hosts, top rules, and agent health
- Silence (a gap in data) is often more dangerous than noise (a spike in alerts) — always investigate unexpected drops
- Dashboard reading is a skill: learn to recognize spikes, gaps, new normals, and periodic patterns
- Follow the shift workflow: check dashboard at start, monitor every 15-30 minutes, document at end of shift
- Track long-term metrics (MTTD, MTTR, false positive rate) to measure and improve SOC performance
- Avoid common mistakes: too many widgets, no auto-refresh, no baseline comparison, ignoring agent health
Knowledge Check: Dashboards & Visualizations
10 questions · 70% to pass
You're starting your SOC shift and glance at the alert timeline. You notice alert volume dropped to zero for a 30-minute window, then resumed normally. What should you do?
What does Mean Time to Respond (MTTR) measure, and why does it matter for SOC operations?
Your severity distribution widget shows 30% High and 10% Critical alerts — far above the normal 2% High baseline. What pattern is this, and what do you check first?
In the standard SOC dashboard layout, why does the alert volume timeline go at the top as a full-width widget?
Your dashboard shows a periodic spike every 4 hours in network connection alerts to the same external IP. How do you distinguish this from a legitimate scheduled job?
What is the biggest risk of having a high false positive rate on your SOC dashboard, and which metric tracks this?
Which common dashboard mistake is most likely to cause you to miss an active incident during your shift?
In Lab 1.1, you investigated alerts on the Wazuh dashboard. If you were building a 'Top Attacked Hosts' widget for that lab environment, which agent would you expect to dominate the chart based on the pre-loaded data?
In Lab 1.3, you explored 10 log sources across 4 agents. If you built an agent health widget and one of the 4 agents suddenly showed 'Disconnected' status, which agent going silent would concern you most from a security perspective?
In the Wazuh lab environment from Labs 1.1 and 1.3, you saw rule 530 (Agent started) fire for each agent. On a real SOC dashboard's 'Top Rules' widget, rule 530 appears at the top with the highest count. Is this concerning?