- Extract key artifacts from phishing emails: sender information, URLs, attachments, and embedded objects
- Defang URLs and IP addresses following safe-handling conventions before sharing or documenting
- Hash suspicious attachments using MD5 and SHA256 for comparison against threat intelligence databases
- Analyze extracted URLs using URLScan.io and VirusTotal to assess reputation and hosting infrastructure
- Analyze file attachments on VirusTotal and Hybrid Analysis for malware verdicts and behavioral indicators
- Use CyberChef to decode Base64 payloads, URL-encoded strings, and extract URLs from raw HTML
- Build a structured IOC table from a single phishing email that feeds downstream blocking and detection
- Perform OSINT enrichment of extracted IOCs using free analyst tools

## From Email to Evidence

In Lessons PH-1 through PH-3, you learned to identify phishing emails, read their headers, and verify authentication results. That analysis tells you **whether** an email is malicious. This lesson answers the next question: **what exactly is the threat, and how do we weaponize that knowledge defensively?**

Artifact extraction is the process of pulling every Indicator of Compromise (IOC) from a phishing email and analyzing each one to understand the attack's infrastructure, intent, and scope. A single phishing email can yield dozens of IOCs: sender addresses, reply-to addresses, originating IPs, URLs, domain names, attachment hashes, embedded scripts, and more.

The goal is not just to confirm "this is phishing." The goal is to build a complete picture: **who** is attacking, **what** infrastructure they are using, **how** the payload works, and **what** you need to block across your environment to protect every user, not just the one who reported it.

![Artifact extraction workflow: from raw email through extraction, analysis, and IOC table creation](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-phishing/lesson-ph-4/artifact-extraction-workflow.png)

**Every artifact you extract becomes an action.** A malicious URL becomes a firewall block. A sender domain becomes an email gateway rule. An attachment hash becomes an EDR detection. The difference between a junior analyst who says "this is phishing" and a senior analyst who neutralizes the campaign is the quality of artifact extraction.

## Extracting Sender Artifacts

Start with the envelope and header fields you learned in previous lessons:

| Artifact | Where to Find It | Why It Matters |
|---|---|---|
| **From address** | `From:` header (display name + email) | Often spoofed; compare with envelope sender |
| **Envelope sender** | `Return-Path:` or `smtp.mailfrom` in authentication headers | The actual sending address; may differ from display `From:` |
| **Reply-To address** | `Reply-To:` header | Attackers set this to a different address to capture responses |
| **Originating IP** | First `Received:` header (bottom of chain) | The IP that initiated the SMTP session |
| **X-Originating-IP** | Sometimes present in webmail-originated messages | Additional source IP indicator |
| **Message-ID domain** | `Message-ID:` header (domain after @) | Reveals the actual mail system that generated the message |

```text
From: "IT Support Team"
Return-Path:
Reply-To:
Received: from mail.evil-domain.xyz (198.51.100.42)
Message-ID:
```

From this single header block, you extract five IOCs: the spoofed From domain (`company-secure.com`), the real sender domain (`evil-domain.xyz`), the reply-to address (`credential-harvest@protonmail.com`), the originating IP (`198.51.100.42`), and the Message-ID domain confirming the true origin.

## Extracting and Defanging URLs

Phishing emails almost always contain URLs, whether in the body text, in HTML hyperlinks, or disguised behind buttons. Extracting them requires examining both the visible text and the underlying HTML source.

**Never click URLs from a phishing email on your workstation.** Always work with raw source, copy-paste into analysis tools, or use a sandboxed browser. Phishing URLs may fingerprint your browser, log your IP, or trigger drive-by downloads.

### Finding URLs in HTML Source

The visible text of a link and its actual destination are often different:

```html
<a href="https://evil-domain.xyz/harvest?id=victim123">https://company.com/secure-login</a>
```

The user sees `https://company.com/secure-login`. The actual destination is `https://evil-domain.xyz/harvest?id=victim123`. Always extract from the `href` attribute, not the display text.

### Defanging Conventions

Before writing URLs or IPs in reports, tickets, chat messages, or IOC lists, **defang** them to prevent accidental clicks or auto-linking:

| Original | Defanged | Method |
|---|---|---|
| `https://evil-domain.xyz/payload` | `hxxps://evil-domain[.]xyz/payload` | Replace `http` → `hxxp`, dots in domain → `[.]` |
| `198.51.100.42` | `198[.]51[.]100[.]42` | Wrap dots in brackets |
| `evil@attacker.com` | `evil[@]attacker[.]com` | Bracket the @ and domain dots |

CyberChef has a built-in **Defang URL** operation that handles this automatically. In Lab PH-4, you will use it extensively.
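The three conventions in the table above can be sketched in a few lines of Python. This is a minimal illustration, not CyberChef's implementation: the function name `defang` is hypothetical, and it only brackets dots in the host portion, so output for unusual inputs may differ from what CyberChef produces.

```python
import re


def defang(indicator: str) -> str:
    """Defang a URL, IP address, or email address for safe sharing."""
    # Neutralize the scheme: http:// -> hxxp://, https:// -> hxxps://
    out = re.sub(r"^http(s?)://", r"hxxp\1://", indicator, flags=re.IGNORECASE)
    scheme, sep, rest = out.partition("://")
    if not sep:
        # No scheme present (bare IP, domain, or email address).
        scheme, rest = "", out
    # Separate the host from the path so that only host dots are bracketed.
    host, slash, path = rest.partition("/")
    host = host.replace("@", "[@]").replace(".", "[.]")
    return f"{scheme}{sep}{host}{slash}{path}"


for raw in ("https://evil-domain.xyz/payload", "198.51.100.42", "evil@attacker.com"):
    print(defang(raw))
```

Running the loop prints the three defanged values shown in the table. Dots in the URL path are left alone here because the table only brackets domain dots; adjust if your team's convention is stricter.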

**Defanging is not optional — it is a professional standard.** Sharing a live malicious URL in a Slack channel, email, or Jira ticket can result in someone clicking it. Automated security scanners may also follow live URLs, tipping off the attacker that the campaign has been discovered.

## Hashing Attachments

When a phishing email contains an attachment (a Word document, PDF, Excel file, ZIP archive, or executable), the first step is generating cryptographic hashes **without opening the file**.

```bash
# Generate MD5 and SHA256 hashes (Linux)
md5sum suspicious_invoice.docx
sha256sum suspicious_invoice.docx

# On macOS
md5 suspicious_invoice.docx
shasum -a 256 suspicious_invoice.docx
```

| Hash Algorithm | Length | Primary Use |
|---|---|---|
| **MD5** | 32 hex characters | Quick lookup on VT/threat feeds (widely indexed) |
| **SHA256** | 64 hex characters | Definitive identification (collision-resistant) |

**Never open suspicious attachments on your work machine.** Even "just looking" at a Word document can trigger macros. Always hash first, check the hash against VirusTotal and your threat intel feeds, and only detonate in a sandbox if analysis is needed.
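If you prefer to hash from a script (for example, to process a folder of reported attachments), the same two digests can be computed with Python's standard `hashlib` module. The sketch below reads the file as raw bytes in chunks, so nothing is parsed or executed; the function name `hash_file` and the chunk size are illustrative choices.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Return the MD5 and SHA256 hex digests of a file without parsing it."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    # Binary mode: the bytes are only fed to the hash functions, never interpreted.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Calling `hash_file("suspicious_invoice.docx")` returns a dictionary with both values, which should match the output of `md5sum` and `sha256sum` for the same file.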

### Why Both Hashes?

MD5 is faster to compute and more widely indexed in legacy threat intelligence databases. SHA256 is cryptographically stronger and the standard for modern IOC sharing (STIX/TAXII, MISP). Always generate both.

## Analyzing URLs: URLScan.io and VirusTotal

Once you have extracted and defanged URLs, analyze them using free tools before anyone clicks them.

### URLScan.io

URLScan.io visits the URL in a sandboxed browser and captures:

- **Screenshot** of the rendered page (see the phishing page without visiting it)
- **DOM content**: the full HTML source of the destination page
- **Network requests**: every resource loaded (scripts, images, redirects)
- **Redirect chain**: the full path from initial URL to final destination
- **IP and hosting information**: where the page is hosted
- **Verdict**: community and automated classification

Submit the URL (re-fang it for the search, or use URLScan's API) and examine the results. A credential harvesting page will typically show a login form mimicking a known brand, hosted on a recently registered domain or compromised site.

### VirusTotal URL Scan

VirusTotal aggregates results from 70+ security vendors. For URL analysis:

- Paste the URL into the **URL** tab (not the file tab)
- Review the **detection ratio** (e.g., 12/87 vendors flagged it as malicious)
- Check the **Community** tab for analyst comments
- Examine **Relations**: other URLs hosted on the same IP, associated files, redirects

**Combine both tools.** URLScan.io gives you the visual context (what the victim would see) and the technical context (network behavior). VirusTotal gives you vendor consensus and historical associations. Together, they paint a complete picture.
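For scripted lookups, VirusTotal's v3 API identifies a URL by its URL-safe Base64 encoding with the padding removed. The sketch below builds, but does not send, such a lookup request. The endpoint path and the `x-apikey` header reflect my understanding of the public v3 API and should be checked against the current VirusTotal documentation; the function names and the `api_key` value are placeholders.

```python
import base64
import urllib.request

VT_API = "https://www.virustotal.com/api/v3"  # assumed base URL for API v3


def vt_url_id(url: str) -> str:
    """URL-safe Base64 of the URL with trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def build_vt_url_lookup(url: str, api_key: str) -> urllib.request.Request:
    """Prepare a GET request for an existing URL report (nothing is sent here)."""
    return urllib.request.Request(
        f"{VT_API}/urls/{vt_url_id(url)}",
        headers={"x-apikey": api_key},
        method="GET",
    )


request = build_vt_url_lookup("https://evil-domain.xyz/harvest", "YOUR_API_KEY")
print(request.full_url)
```

Note that the lookup needs the live (re-fanged) URL, so keep this step inside your analysis tooling and continue to defang the value everywhere it is written down.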

## Analyzing Attachments: VirusTotal and Hybrid Analysis

### VirusTotal File Analysis

Search VirusTotal for the file hash rather than uploading the file itself, which could share sensitive data:

- **Detection tab:** How many AV engines detect it as malicious
- **Behavior tab:** If the file has been detonated in VT's sandbox, you see process creation, file drops, network connections, and registry changes
- **Relations tab:** Other files dropped, contacted domains, similar samples
- **Community tab:** Analyst notes and YARA rule matches

### Hybrid Analysis

Hybrid Analysis (hybrid-analysis.com) by CrowdStrike provides deeper behavioral analysis:

- Submit the file for sandbox detonation (Windows 7/10, Linux)
- View process trees, network connections, DNS queries, and file system changes
- See extracted strings, embedded URLs, and dropped payloads
- Review MITRE ATT&CK technique mapping for observed behaviors

| Feature | VirusTotal | Hybrid Analysis |
|---|---|---|
| **AV vendor detections** | 70+ engines | CrowdStrike Falcon + selected engines |
| **Sandbox behavior** | Basic (VT sandbox) | Deep (full OS-level behavioral trace) |
| **Network capture** | DNS/HTTP summary | Full PCAP available for download |
| **Process tree** | Basic | Detailed with parent-child relationships |
| **ATT&CK mapping** | Limited | Comprehensive per-behavior mapping |
| **Best for** | Quick hash lookups and vendor consensus | Deep-dive behavioral analysis |

## Using CyberChef for Decoding

CyberChef (gchq.github.io/CyberChef) is the analyst's Swiss Army knife. Phishing emails frequently use encoding to evade detection, and CyberChef can decode virtually anything.

### Common Decoding Operations

**Base64 Decode:** Attackers encode payloads, URLs, or entire scripts in Base64 to bypass email gateways.

```text
Input:  aHR0cHM6Ly9ldmlsLWRvbWFpbi54eXovY3JlZC1oYXJ2ZXN0P3VzZXI9dGFyZ2V0
Recipe: From Base64
Output: https://evil-domain.xyz/cred-harvest?user=target
```

**URL Decode:** Percent-encoded URLs hide the true destination.

```text
Input:  https%3A%2F%2Fevil-domain.xyz%2Fpayload%3Fid%3D12345
Recipe: URL Decode
Output: https://evil-domain.xyz/payload?id=12345
```

**Extract URLs from HTML:** When you have raw HTML source from an email, CyberChef's "Extract URLs" operation pulls every URL from href attributes, script sources, and embedded content.

```text
Recipe: Extract URLs → Defang URL → Sort → Unique
```

This four-step recipe takes raw HTML and produces a clean, defanged, deduplicated URL list ready for your IOC table.
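To see what these operations do under the hood, here is a rough Python approximation of the three decoders and the four-step recipe, using only the standard library. It is a sketch rather than a CyberChef port: the regular expression and function names are my own, and the extraction is deliberately simple (it does not unescape HTML entities such as `&amp;`).

```python
import base64
import re
from urllib.parse import unquote

# Simplified URL pattern: stops at whitespace, quotes, and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def from_base64(text: str) -> str:
    """Approximates 'From Base64' (input must be correctly padded)."""
    return base64.b64decode(text).decode("utf-8", errors="replace")


def url_decode(text: str) -> str:
    """Approximates 'URL Decode' by resolving percent-encoding."""
    return unquote(text)


def defang_url(url: str) -> str:
    """http -> hxxp and bracketed dots in the host part only."""
    scheme, _, rest = url.partition("://")
    host, slash, path = rest.partition("/")
    return "hxxp" + scheme[4:] + "://" + host.replace(".", "[.]") + slash + path


def recipe(html: str) -> list:
    """Extract URLs -> Defang URL -> Sort -> Unique."""
    return sorted({defang_url(url) for url in URL_RE.findall(html)})


html = '<a href="https://evil-domain.xyz/harvest?id=victim123">https://company.com/secure-login</a>'
print(recipe(html))
```

Because the extraction is pattern-based, it returns both the `href` destination and the URL shown as link text, so you still need to note which one the victim would actually be sent to.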

**CyberChef recipes are shareable.** You can save a recipe as a URL and share it with your team. In Lab PH-4, you will build several recipes and save them for reuse in future investigations.

## Building the IOC Table

Every phishing investigation should produce a structured IOC table. This table becomes the input for blocking rules, SIEM detections, and threat intelligence sharing.

| IOC Type | Value | Source | Context |
|---|---|---|---|
| **Email Address** | `attacker@evil-domain[.]xyz` | Return-Path header | Envelope sender |
| **Domain** | `evil-domain[.]xyz` | Return-Path, Message-ID | Attacker-controlled sending infrastructure |
| **IP Address** | `198[.]51[.]100[.]42` | Received header | Originating mail server |
| **URL** | `hxxps://evil-domain[.]xyz/harvest` | Email body (href) | Credential harvesting page |
| **URL** | `hxxps://cdn[.]evil-domain[.]xyz/logo[.]png` | Email body (img src) | Tracking pixel / brand impersonation asset |
| **File Hash (MD5)** | `d41d8cd98f00b204e9800998ecf8427e` | Attachment | Malicious document |
| **File Hash (SHA256)** | `e3b0c44298fc1c149afbf4c8996fb924....` | Attachment | Malicious document |
| **File Name** | `Invoice_Q4_2026.docm` | Attachment | Macro-enabled document |

**Always defang IOCs in your table.** Even in internal documents. Some ticketing systems, wikis, and chat tools automatically convert URLs into clickable links. Defanging prevents accidental navigation.
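One way to keep the table consistent across investigations is to generate it from structured records and refuse any network indicator that is still live. The sketch below is one possible shape for that, not a prescribed format: the `IOC` class, `looks_live`, and `to_markdown` are hypothetical names, and the "live" check is a simple heuristic (an `http` scheme or an unbracketed dot in the host).

```python
import re
from dataclasses import dataclass

NETWORK_TYPES = {"Email Address", "Domain", "IP Address", "URL"}
BARE_DOT = re.compile(r"(?<!\[)\.(?!\])")  # a dot not wrapped as [.]


@dataclass(frozen=True)
class IOC:
    ioc_type: str
    value: str
    source: str
    context: str


def looks_live(ioc: IOC) -> bool:
    """Heuristic: True if a network indicator has not been defanged."""
    if ioc.ioc_type not in NETWORK_TYPES:
        return False  # hashes and file names are left as they are
    value = ioc.value.lower()
    if value.startswith(("http://", "https://")):
        return True
    host = value.split("://", 1)[-1].split("/", 1)[0]
    return bool(BARE_DOT.search(host))


def to_markdown(iocs) -> str:
    """Render IOC records as a table with type, value, source, and context."""
    lines = ["| IOC Type | Value | Source | Context |", "|---|---|---|---|"]
    for ioc in iocs:
        if looks_live(ioc):
            raise ValueError(f"defang before recording: {ioc.ioc_type}")
        lines.append(f"| **{ioc.ioc_type}** | `{ioc.value}` | {ioc.source} | {ioc.context} |")
    return "\n".join(lines)
```

Raising an error instead of silently defanging is a deliberate choice in this sketch: it forces the analyst to notice where a live value entered the notes.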

## OSINT Enrichment of Extracted IOCs

Raw IOCs are useful for blocking. **Enriched** IOCs tell you the story of the attack: who is behind it, how long the infrastructure has been active, and what other campaigns use the same resources.

![IOC enrichment pipeline: from raw extraction through OSINT lookups to actionable intelligence](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-phishing/lesson-ph-4/ioc-enrichment-pipeline.png)

### Domain Enrichment

| Tool | What You Learn |
|---|---|
| **WHOIS lookup** (whois.domaintools.com) | Registration date, registrar, registrant info (often privacy-protected) |
| **PassiveDNS** (VirusTotal Relations tab) | Historical IP resolutions; see if the domain recently changed hosting |
| **URLScan.io** | Hosting provider, page content, SSL certificate details |
| **Shodan** (shodan.io) | Open ports, services, technologies running on the IP |

A domain registered 48 hours ago hosting a "Microsoft 365 login page" on a bulletproof hosting provider is almost certainly malicious.

### IP Enrichment

| Tool | What You Learn |
|---|---|
| **AbuseIPDB** (abuseipdb.com) | Abuse reports from other analysts worldwide |
| **GreyNoise** (greynoise.io) | Whether the IP is a known scanner/noise vs. targeted attacker |
| **IPinfo** (ipinfo.io) | ASN, geolocation, hosting provider |
| **VirusTotal** | Files communicating with this IP, URLs hosted on it |

### File Hash Enrichment

| Tool | What You Learn |
|---|---|
| **VirusTotal** | Detection ratio, behavioral analysis, YARA matches, community notes |
| **Hybrid Analysis** | Full sandbox report: process tree, network calls, dropped files |
| **MalwareBazaar** (bazaar.abuse.ch) | Malware family classification, associated campaigns, download samples |
| **Any.Run** (any.run) | Interactive sandbox with visual process tree and network activity |

**Enrichment reveals campaign scope.** If five different phishing emails use five different sender addresses but all link to the same IP — that IP is the campaign's infrastructure. Blocking that single IP neutralizes all five variants. Without enrichment, you would block five addresses and miss the common denominator.
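Finding that common denominator is a grouping problem: collect each enriched indicator per email, then look for indicators that appear in more than one message. A minimal sketch, assuming you have already recorded `(email_id, indicator)` pairs during enrichment; the function name, the email IDs, and the sender addresses are invented for illustration.

```python
from collections import defaultdict


def shared_infrastructure(observations, min_emails: int = 2) -> dict:
    """Map each indicator seen in at least `min_emails` emails to those emails."""
    seen = defaultdict(set)
    for email_id, indicator in observations:
        seen[indicator].add(email_id)
    return {
        indicator: sorted(email_ids)
        for indicator, email_ids in seen.items()
        if len(email_ids) >= min_emails
    }


# Hypothetical data: five emails, five senders, one shared hosting IP.
observations = []
for n in range(1, 6):
    observations.append((f"email-{n}", f"sender{n}@example-{n}[.]test"))
    observations.append((f"email-{n}", "198[.]51[.]100[.]42"))

print(shared_infrastructure(observations))
```

Only the shared IP survives the filter, because each sender address appears in a single email.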

## Putting It All Together: The Extraction Workflow

Here is the systematic workflow you will follow in Lab PH-4:

1. **Save the raw email**: download the .eml file or copy the full source
2. **Extract sender artifacts**: From, Return-Path, Reply-To, originating IP, Message-ID domain
3. **Extract URLs**: both visible text and href destinations; use CyberChef "Extract URLs" on HTML source
4. **Defang everything**: all URLs, IPs, and email addresses in your working notes
5. **Hash attachments**: MD5 and SHA256 without opening the file
6. **Analyze URLs**: submit to URLScan.io and VirusTotal
7. **Analyze attachments**: submit hash to VirusTotal, detonate in Hybrid Analysis if needed
8. **Decode obfuscated content**: use CyberChef for Base64, URL encoding, HTML entities
9. **Build the IOC table**: structured, defanged, with source and context columns
10. **Enrich IOCs**: WHOIS, PassiveDNS, AbuseIPDB, GreyNoise for each indicator
11. **Document findings**: feed the IOC table into your investigation report
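Steps 1 and 2 lend themselves to scripting once the raw source is saved. The sketch below uses Python's standard `email` package to pull the sender artifacts from a raw message. It is an illustration only: the sample message is invented (including the local parts of the addresses and the second relay), and the function handles only IPv4 addresses on a single-part header block.

```python
import re
from email import message_from_string
from email.utils import parseaddr

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _domain(address: str) -> str:
    """Lowercased part after the last @, without angle brackets."""
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].strip("<> \t").lower()


def sender_artifacts(raw_email: str) -> dict:
    msg = message_from_string(raw_email)
    received = msg.get_all("Received", [])
    # Each relay prepends its own Received header, so the last entry
    # is the earliest hop (the bottom of the chain).
    match = IPV4_RE.search(received[-1]) if received else None
    return {
        "from_domain": _domain(parseaddr(msg.get("From", ""))[1]),
        "envelope_sender": parseaddr(msg.get("Return-Path", ""))[1],
        "reply_to": parseaddr(msg.get("Reply-To", ""))[1],
        "originating_ip": match.group(0) if match else None,
        "message_id_domain": _domain(msg.get("Message-ID", "")),
    }


# Hypothetical sample message for illustration only.
SAMPLE = "\n".join([
    "Received: from mx.example.net (203.0.113.5) by inbox.example.net; Tue, 6 Jan 2026 09:15:02 +0000",
    "Received: from mail.evil-domain.xyz (198.51.100.42) by mx.example.net; Tue, 6 Jan 2026 09:15:01 +0000",
    'From: "IT Support Team" <support@company-secure.com>',
    "Return-Path: <attacker@evil-domain.xyz>",
    "Reply-To: <credential-harvest@protonmail.com>",
    "Message-ID: <20260106091501.1234@evil-domain.xyz>",
    "Subject: Action required",
    "",
    "Body text.",
])

print(sender_artifacts(SAMPLE))
```

The values it returns are still live, so defang them (step 4) before they go into notes or the IOC table.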

**This workflow is not just for phishing.** The same extraction and enrichment process applies to any email-borne threat — BEC, malware delivery, invoice fraud, and even legitimate security notifications that need verification. Master this workflow once and you can apply it everywhere.

## Static Analysis of Suspicious Attachments

When a phishing email contains a suspicious attachment (a Word document, Excel spreadsheet, PDF, or ZIP file), the extraction process does not stop at hashing and uploading to VirusTotal. Static analysis lets you examine the file's internal structure without executing it, revealing embedded macros, scripts, and encoded payloads that tell you exactly what the attachment would do if a user opened it.

![Static analysis decision workflow: from phishing email to attachment analysis](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-phishing/lesson-ph-4/static-analysis-phishing-workflow.png)

### Office Documents (Word, Excel, PowerPoint)

Malicious Office documents typically use VBA macros or OLE objects as their delivery mechanism. The tool of choice is **oletools** (specifically `olevba`):

```bash
# Extract and analyze VBA macros from a Word document
olevba suspicious_invoice.docm

# Key things to look for in output:
# - AutoOpen / Document_Open (auto-execution triggers)
# - Shell / WScript.Shell (command execution)
# - PowerShell / cmd.exe references
# - URLDownloadToFile / XMLHTTP (network downloads)
# - Base64 encoded strings (obfuscated payloads)
```

Red flags in macro analysis:

- **Auto-execution triggers** (`AutoOpen`, `Workbook_Open`, `Document_Open`) that fire when the user enables macros
- **Shell commands** that spawn PowerShell or cmd.exe
- **Network functions** that download second-stage payloads
- **String obfuscation** (concatenation, chr() encoding, Base64) designed to evade static scanners

### PDF Documents

Malicious PDFs exploit JavaScript execution, embedded objects, or launch actions. Use **pdf-parser** or **peepdf** for analysis:

```bash
# List all objects in a PDF
pdf-parser suspicious_document.pdf

# Look for:
# - /JavaScript or /JS (embedded JavaScript)
# - /OpenAction or /AA (auto-execution)
# - /Launch (launch external applications)
# - /EmbeddedFile (embedded file objects)
# - /URI (external URL references)
```

### When to Escalate to Dynamic Analysis

Static analysis tells you what the attachment *could* do. Dynamic analysis (sandbox detonation) tells you what it *actually* does. Escalate to Module 11: Malware Analysis when static analysis reveals obfuscated payloads you cannot fully decode, multi-stage delivery chains where each stage downloads the next, or exploit code targeting specific vulnerabilities.
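As a quick pre-check before a full pdf-parser review, the PDF keywords listed above can be counted directly in the file's raw bytes. The sketch below is a simplified illustration of that idea and not a replacement for the dedicated tools: the function name is hypothetical, and a plain byte search will miss names that are hex-escaped (such as `/J#61vaScript`) or stored inside compressed object streams, so a zero count does not prove the file is clean.

```python
import re

PDF_KEYWORDS = ("/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile", "/URI")


def count_pdf_keywords(data: bytes) -> dict:
    """Count occurrences of risky PDF name objects in raw file bytes."""
    counts = {}
    for keyword in PDF_KEYWORDS:
        # The lookahead keeps /JS from matching inside a longer name such as /JSFoo.
        pattern = re.escape(keyword.encode()) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, data))
    return counts


# Tiny hand-written fragment standing in for a real PDF body.
fragment = (
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj "
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj"
)
print(count_pdf_keywords(fragment))
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in; nothing is rendered or executed.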

**The phishing-to-malware pipeline:** Phishing email → extract attachment → hash check (VirusTotal) → if unknown, static analysis (olevba/pdf-parser) → if suspicious, dynamic analysis (sandbox) → if confirmed malicious, extract IOCs → feed to SIEM/MISP. This workflow spans Modules 4, 7, and 11 of this course.
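During the static-analysis stage of this pipeline, the macro red flags described in the Office Documents subsection can be checked against macro source that olevba has already extracted to text. The sketch below is a simple keyword triage under that assumption. The pattern list and function name are my own, it is far less thorough than olevba's built-in analysis, and string concatenation can hide keywords from it.

```python
import re

# Hypothetical pattern list based on the red flags named in this lesson.
RED_FLAGS = {
    "auto-execution": r"\b(?:AutoOpen|Auto_Open|Document_Open|Workbook_Open)\b",
    "command execution": r"\b(?:WScript\.Shell|Shell|powershell|cmd\.exe)\b",
    "network download": r"\b(?:URLDownloadToFile|XMLHTTP)\b",
    "obfuscation": r"\bChr[WB]?\$?\(",
}


def macro_red_flags(vba_source: str) -> dict:
    """Return each red-flag category found, with the matched keywords."""
    hits = {}
    for category, pattern in RED_FLAGS.items():
        found = {m.group(0) for m in re.finditer(pattern, vba_source, re.IGNORECASE)}
        if found:
            hits[category] = sorted(found)
    return hits


# Invented macro text for illustration; it is only scanned as a string.
sample_macro = '''
Sub AutoOpen()
    Dim cmd As String
    cmd = "power" & "shell -enc " & Chr(65)
    CreateObject("WScript.Shell").Run cmd
End Sub
'''
print(macro_red_flags(sample_macro))
```

In the sample, the word `powershell` never appears intact because it is split across two strings, which is exactly the kind of concatenation that makes escalation to dynamic analysis worthwhile.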

- A single phishing email can yield **dozens of IOCs**: sender addresses, domains, IPs, URLs, file hashes, and embedded objects
- **Always defang** URLs, IPs, and email addresses before sharing; this is a professional standard, not optional
- Generate both **MD5 and SHA256** hashes for attachments: MD5 for legacy lookups, SHA256 for modern IOC sharing
- Use **URLScan.io** for visual and network analysis of URLs, and **VirusTotal** for vendor consensus and historical associations
- **CyberChef** is essential for decoding Base64, URL-encoded strings, and extracting URLs from raw HTML source
- Build a structured **IOC table** with type, value, source, and context columns; this feeds blocking rules and SIEM detections
- **OSINT enrichment** transforms raw IOCs into campaign intelligence: WHOIS, PassiveDNS, AbuseIPDB, and sandbox analysis reveal the attacker's infrastructure and scope

## What's Next

Put what you just learned into practice in **Lab 5.4: Artifact Extraction & IOC Analysis**, where you'll extract and analyze phishing artifacts hands-on, building a complete IOC table from a realistic phishing email.