Lesson 4 of 6 · 13 min read · Includes quiz

Artifact Extraction & Analysis

Extracting sender, URLs, attachments, hashes; analyzing with VirusTotal, URLScan, AbuseIPDB, CyberChef

What You'll Learn

  • Extract key artifacts from phishing emails: sender information, URLs, attachments, and embedded objects
  • Defang URLs and IP addresses following safe-handling conventions before sharing or documenting
  • Hash suspicious attachments using MD5 and SHA256 for comparison against threat intelligence databases
  • Analyze extracted URLs using URLScan.io and VirusTotal to assess reputation and hosting infrastructure
  • Analyze file attachments on VirusTotal and Hybrid Analysis for malware verdicts and behavioral indicators
  • Use CyberChef to decode Base64 payloads, URL-encoded strings, and extract URLs from raw HTML
  • Build a structured IOC table from a single phishing email that feeds downstream blocking and detection
  • Perform OSINT enrichment of extracted IOCs using free analyst tools

From Email to Evidence

In Lessons PH-1 through PH-3, you learned to identify phishing emails, read their headers, and verify authentication results. That analysis tells you whether an email is malicious. This lesson answers the next question: what exactly is the threat, and how do we weaponize that knowledge defensively?

Artifact extraction is the process of pulling every Indicator of Compromise (IOC) from a phishing email and analyzing each one to understand the attack's infrastructure, intent, and scope. A single phishing email can yield dozens of IOCs — sender addresses, reply-to addresses, originating IPs, URLs, domain names, attachment hashes, embedded scripts, and more.

The goal is not just to confirm "this is phishing." The goal is to build a complete picture: who is attacking, what infrastructure they are using, how the payload works, and what you need to block across your environment to protect every user — not just the one who reported it.

[Figure: Artifact extraction workflow, from raw email through extraction, analysis, and IOC table creation]

Every artifact you extract becomes an action. A malicious URL becomes a firewall block. A sender domain becomes an email gateway rule. An attachment hash becomes an EDR detection. The difference between a junior analyst who says "this is phishing" and a senior analyst who neutralizes the campaign is the quality of artifact extraction.

Extracting Sender Artifacts

Start with the envelope and header fields you already know from Lesson PH-3:

| Artifact | Where to Find It | Why It Matters |
|---|---|---|
| From address | From: header (display name + email) | Often spoofed; compare with envelope sender |
| Envelope sender | Return-Path: or smtp.mailfrom in authentication headers | The actual sending address; may differ from display From: |
| Reply-To address | Reply-To: header | Attackers set this to a different address to capture responses |
| Originating IP | First Received: header (bottom of chain) | The IP that initiated the SMTP session |
| X-Originating-IP | Sometimes present in webmail-originated messages | Additional source IP indicator |
| Message-ID domain | Message-ID: header (domain after @) | Reveals the actual mail system that generated the message |

From: "IT Support Team" <helpdesk@company-secure.com>
Return-Path: <attacker@evil-domain.xyz>
Reply-To: <credential-harvest@protonmail.com>
Received: from mail.evil-domain.xyz (198.51.100.42)
Message-ID: <abc123@evil-domain.xyz>

From this single header block, you extract five IOCs: the spoofed From domain (company-secure.com), the real sender domain (evil-domain.xyz), the reply-to address (credential-harvest@protonmail.com), the originating IP (198.51.100.42), and the Message-ID domain confirming the true origin.
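The same extraction can be scripted. The following is a minimal sketch using only Python's standard library; the function name `extract_sender_artifacts` and the dictionary keys are illustrative choices, and the IPv4 regex is deliberately simple (it assumes the earliest Received header contains a plain dotted-quad address).

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Header block from the example above (a blank line ends the headers).
RAW_HEADERS = """\
From: "IT Support Team" <helpdesk@company-secure.com>
Return-Path: <attacker@evil-domain.xyz>
Reply-To: <credential-harvest@protonmail.com>
Received: from mail.evil-domain.xyz (198.51.100.42)
Message-ID: <abc123@evil-domain.xyz>

"""

IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def extract_sender_artifacts(raw: str) -> dict:
    """Pull the sender-related IOCs out of a raw header block."""
    msg = message_from_string(raw)
    from_addr = parseaddr(msg.get("From", ""))[1]
    return_path = parseaddr(msg.get("Return-Path", ""))[1]
    reply_to = parseaddr(msg.get("Reply-To", ""))[1]

    # Received headers are prepended by each hop, so the earliest hop
    # is the last one in the list.
    received = msg.get_all("Received") or []
    ip_match = IPV4_RE.search(received[-1]) if received else None

    message_id = msg.get("Message-ID", "").strip().strip("<>")
    return {
        "from_domain": from_addr.rpartition("@")[2],
        "sender_domain": return_path.rpartition("@")[2],
        "reply_to": reply_to,
        "originating_ip": ip_match.group(1) if ip_match else None,
        "message_id_domain": message_id.rpartition("@")[2],
    }
```

Calling `extract_sender_artifacts(RAW_HEADERS)` returns the five values listed above. On real mail, treat Received lines added before your own mail gateway as attacker-controllable and verify them against the hops you trust.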

Extracting and Defanging URLs

Phishing emails almost always contain URLs — either in the body text, HTML hyperlinks, or disguised behind buttons. Extracting them requires examining both the visible text and the underlying HTML source.

Never click URLs from a phishing email on your workstation. Always work with raw source, copy-paste into analysis tools, or use a sandboxed browser. Phishing URLs may fingerprint your browser, log your IP, or trigger drive-by downloads.

Finding URLs in HTML Source

The visible text of a link and its actual destination are often different:

<a href="https://evil-domain.xyz/harvest?id=victim123">
  https://company.com/secure-login
</a>

The user sees https://company.com/secure-login. The actual destination is https://evil-domain.xyz/harvest?id=victim123. Always extract from the href attribute, not the display text.
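A mismatch like this can be detected automatically. The sketch below uses Python's standard `html.parser` to pair each link's href with its visible text and flags links whose displayed URL points at a different host. The class and function names are hypothetical, and the check only fires when the visible text itself looks like a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

SAMPLE_HTML = """
<a href="https://evil-domain.xyz/harvest?id=victim123">
  https://company.com/secure-login
</a>
"""


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_mismatched_links(html: str) -> list:
    """Return (href, text) pairs where the shown URL's host differs from the real one."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    mismatches = []
    for href, text in parser.links:
        shown_host = urlparse(text).hostname if "://" in text else None
        real_host = urlparse(href).hostname
        if shown_host and real_host and shown_host != real_host:
            mismatches.append((href, text))
    return mismatches
```

For the sample above, `find_mismatched_links(SAMPLE_HTML)` returns one pair: the real `evil-domain.xyz` destination and the `company.com` text shown to the user.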

Defanging Conventions

Before writing URLs or IPs in reports, tickets, chat messages, or IOC lists, defang them to prevent accidental clicks or auto-linking:

| Original | Defanged | Method |
|---|---|---|
| https://evil-domain.xyz/payload | hxxps://evil-domain[.]xyz/payload | Replace http → hxxp; dots in domain → [.] |
| 198.51.100.42 | 198[.]51[.]100[.]42 | Wrap dots in brackets |
| evil@attacker.com | evil[@]attacker[.]com | Bracket the @ and domain dots |

CyberChef has a built-in Defang URL operation that handles this automatically. In Lab PH-4, you will use it extensively.

Defanging is not optional — it is a professional standard. Sharing a live malicious URL in a Slack channel, email, or Jira ticket can result in someone clicking it. Automated security scanners may also follow live URLs, tipping off the attacker that the campaign has been discovered.
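When CyberChef is not at hand, the conventions in the table are simple enough to script. This is a minimal sketch, not a replacement for a maintained defanging tool: the function name `defang` is an illustrative choice, and it only handles the three indicator shapes shown in the table (URL, IPv4 address, email address).

```python
import re


def defang(indicator: str) -> str:
    """Defang a URL, IPv4 address, or email address per the table above."""
    if "://" in indicator:
        # URL: neutralize the scheme and bracket the dots in the host only.
        scheme, rest = indicator.split("://", 1)
        host, sep, path = rest.partition("/")
        scheme = re.sub(r"(?i)^http", "hxxp", scheme)
        return f"{scheme}://{host.replace('.', '[.]')}{sep}{path}"
    if "@" in indicator:
        # Email address: bracket the @ and the dots in the domain part.
        local, _, domain = indicator.rpartition("@")
        return f"{local}[@]{domain.replace('.', '[.]')}"
    # Bare IP address or domain: bracket every dot.
    return indicator.replace(".", "[.]")
```

For example, `defang("https://evil-domain.xyz/payload")` returns `hxxps://evil-domain[.]xyz/payload`, and `defang("198.51.100.42")` returns `198[.]51[.]100[.]42`.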

Hashing Attachments

When a phishing email contains an attachment — a Word document, PDF, Excel file, ZIP archive, or executable — the first step is generating cryptographic hashes without opening the file.

# On Linux (GNU coreutils): generate MD5 and SHA256 hashes
md5sum suspicious_invoice.docx
sha256sum suspicious_invoice.docx

# On macOS
md5 suspicious_invoice.docx
shasum -a 256 suspicious_invoice.docx

| Hash Algorithm | Length | Primary Use |
|---|---|---|
| MD5 | 32 hex characters | Quick lookup on VT/threat feeds (widely indexed) |
| SHA256 | 64 hex characters | Definitive identification (collision-resistant) |

Never open suspicious attachments on your work machine. Even "just looking" at a Word document can trigger macros. Always hash first, check the hash against VirusTotal and your threat intel feeds, and only detonate in a sandbox if analysis is needed.

Why Both Hashes?

MD5 is still widely indexed in legacy threat intelligence databases and older tooling, but it is vulnerable to collision attacks, so treat it as a lookup key rather than proof of identity. SHA256 is collision-resistant and is the standard for modern IOC sharing (STIX/TAXII, MISP). Always generate both.
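If you need both hashes inside a script rather than at the shell, Python's `hashlib` can compute them in a single pass. The sketch below reads the file as raw bytes in chunks, so the attachment is never parsed or rendered by any application. The function name `hash_file` and the 64 KiB chunk size are arbitrary choices for illustration.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Return the MD5 and SHA256 hex digests of a file, read as raw bytes."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large attachments do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The returned digests are 32 and 64 hex characters long respectively and should match the output of `md5sum` and `sha256sum` for the same file.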

Analyzing URLs: URLScan.io and VirusTotal

Once you have extracted and defanged URLs, analyze them using free tools before anyone clicks them.

URLScan.io

URLScan.io visits the URL in a sandboxed browser and captures:

  • Screenshot of the rendered page (see the phishing page without visiting it)
  • DOM content — the full HTML source of the destination page
  • Network requests — every resource loaded (scripts, images, redirects)
  • Redirect chain — the full path from initial URL to final destination
  • IP and hosting information — where the page is hosted
  • Verdict — community and automated classification

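Submit the live URL (URLScan needs the original, so re-fang it for the search, or submit through URLScan's API) and examine the results. Public scans are visible to anyone, so choose a non-public visibility option when the URL contains a victim identifier or other sensitive data. A credential harvesting page will typically show a login form mimicking a known brand, hosted on a recently registered domain or compromised site.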

VirusTotal URL Scan

VirusTotal aggregates results from 70+ security vendors. For URL analysis:

  • Paste the URL into the URL tab (not the file tab)
  • Review the detection ratio (e.g., 12/87 vendors flagged it as malicious)
  • Check the Community tab for analyst comments
  • Examine Relations — other URLs hosted on the same IP, associated files, redirects

Combine both tools. URLScan.io gives you the visual context (what the victim would see) and the technical context (network behavior). VirusTotal gives you vendor consensus and historical associations. Together, they paint a complete picture.
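Both services also expose HTTP APIs, which helps when you have more than a handful of URLs. The sketch below only builds the requests and does not send them; it assumes you hold your own API keys, and the function names are illustrative. It relies on two documented conventions: VirusTotal's v3 API identifies a URL by its unpadded URL-safe Base64 encoding, and URLScan.io's submission endpoint accepts a JSON body with `url` and `visibility` fields.

```python
import base64
import json
import urllib.request


def vt_url_id(url: str) -> str:
    """VirusTotal v3 URL identifier: URL-safe Base64 of the URL, padding stripped."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def build_vt_url_lookup(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an existing VirusTotal URL report."""
    return urllib.request.Request(
        f"https://www.virustotal.com/api/v3/urls/{vt_url_id(url)}",
        headers={"x-apikey": api_key},
    )


def build_urlscan_submission(
    url: str, api_key: str, visibility: str = "unlisted"
) -> urllib.request.Request:
    """Build (but do not send) a URLScan.io scan submission.

    "unlisted" keeps the scan out of the public feed; this default is a
    cautious assumption, not a URLScan recommendation.
    """
    body = json.dumps({"url": url, "visibility": visibility}).encode()
    return urllib.request.Request(
        "https://urlscan.io/api/v1/scan/",
        data=body,
        headers={"API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would perform the call. Check each service's rate limits and terms before automating submissions.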

Analyzing Attachments: VirusTotal and Hybrid Analysis

VirusTotal File Analysis

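Search VirusTotal for the file hash rather than uploading the file itself, because uploaded samples become available to other VirusTotal users and may contain sensitive data. A hash lookup returns: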

  • Detection tab: How many AV engines detect it as malicious
  • Behavior tab: If the file has been detonated in VT's sandbox, you see process creation, file drops, network connections, and registry changes
  • Relations tab: Other files dropped, contacted domains, similar samples
  • Community tab: Analyst notes and YARA rule matches
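A hash lookup can also be done through VirusTotal's v3 API, which returns a file report whose `last_analysis_stats` attribute holds the per-verdict engine counts. The sketch below validates the hash, builds the endpoint URL, and summarizes a report that has already been fetched and parsed into a dictionary. The helper names are hypothetical, and the ratio is a rough stand-in for the figure shown in the web interface.

```python
import re

# MD5 (32), SHA-1 (40), or SHA256 (64) hex digests.
HASH_RE = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$")


def vt_file_endpoint(file_hash: str) -> str:
    """Return the VirusTotal v3 file-report URL for a hash, rejecting malformed input."""
    if not HASH_RE.match(file_hash):
        raise ValueError(f"not an MD5/SHA-1/SHA256 hex digest: {file_hash!r}")
    return f"https://www.virustotal.com/api/v3/files/{file_hash.lower()}"


def detection_ratio(report: dict) -> str:
    """Summarize a parsed file report as 'malicious/total engines'."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    total = sum(stats.values())
    return f"{stats.get('malicious', 0)}/{total}"
```

A request to the returned URL needs an `x-apikey` header carrying your own key. A 404 response means VirusTotal has never seen the file, which is not evidence that it is safe.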

Hybrid Analysis

Hybrid Analysis (hybrid-analysis.com) by CrowdStrike provides deeper behavioral analysis:

  • Submit the file for sandbox detonation (Windows 7/10, Linux)
  • View process trees, network connections, DNS queries, and file system changes
  • See extracted strings, embedded URLs, and dropped payloads
  • Review MITRE ATT&CK technique mapping for observed behaviors

| Feature | VirusTotal | Hybrid Analysis |
|---|---|---|
| AV vendor detections | 70+ engines | CrowdStrike Falcon + selected engines |
| Sandbox behavior | Basic (VT sandbox) | Deep (full OS-level behavioral trace) |
| Network capture | DNS/HTTP summary | Full PCAP available for download |
| Process tree | Basic | Detailed with parent-child relationships |
| ATT&CK mapping | Limited | Comprehensive per-behavior mapping |
| Best for | Quick hash lookups and vendor consensus | Deep-dive behavioral analysis |

Using CyberChef for Decoding

CyberChef (gchq.github.io/CyberChef) is the analyst's Swiss Army knife. Phishing emails frequently use encoding to evade detection, and CyberChef can decode virtually anything.

Common Decoding Operations

Base64 Decode: Attackers encode payloads, URLs, or entire scripts in Base64 to bypass email gateways.

Input:  aHR0cHM6Ly9ldmlsLWRvbWFpbi54eXovY3JlZC1oYXJ2ZXN0P3VzZXI9dGFyZ2V0
Recipe: From Base64
Output: https://evil-domain.xyz/cred-harvest?user=target

URL Decode: Percent-encoded URLs hide the true destination.

Input:  https%3A%2F%2Fevil-domain.xyz%2Fpayload%3Fid%3D12345
Recipe: URL Decode
Output: https://evil-domain.xyz/payload?id=12345
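The two decodes above have direct standard-library equivalents in Python, which is handy for checking a CyberChef result or for batch work. This is a minimal sketch; the function names are illustrative, and the Base64 helper re-adds any stripped padding because attackers often omit it.

```python
import base64
from urllib.parse import unquote


def from_base64(encoded: str) -> str:
    """Decode a Base64 string to text, restoring missing '=' padding first."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.b64decode(padded).decode("utf-8", errors="replace")


def url_decode(encoded: str) -> str:
    """Decode percent-encoded characters such as %3A and %2F."""
    return unquote(encoded)
```

Running `from_base64` and `url_decode` on the two inputs shown above produces the same outputs as the CyberChef recipes.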

Extract URLs from HTML: When you have raw HTML source from an email, CyberChef's "Extract URLs" operation pulls every URL it finds in the text, including those in href attributes, script sources, and embedded content.

Recipe: Extract URLs → Defang URL → Sort → Unique

This four-step recipe takes raw HTML and produces a clean, defanged, deduplicated URL list ready for your IOC table.
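For comparison, the same extract, defang, sort, and deduplicate chain can be approximated in a few lines of Python. This is a sketch under stated assumptions: the regular expression is a simple pattern written for this example (not the one CyberChef uses), and the defanging step brackets dots in the host only, so a path such as `logo.png` is left unchanged.

```python
import re

# Simplified pattern: http(s) URLs up to the next quote, bracket, or whitespace.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def defang_url(url: str) -> str:
    """hxxp-ify the scheme and bracket the dots in the host part."""
    scheme, _, rest = url.partition("://")
    host, sep, path = rest.partition("/")
    scheme = scheme.lower().replace("http", "hxxp")
    return f"{scheme}://{host.replace('.', '[.]')}{sep}{path}"


def extract_defanged_urls(html: str) -> list:
    """Extract every URL, defang it, then return a sorted, deduplicated list."""
    return sorted({defang_url(u) for u in URL_RE.findall(html)})
```

Because the pattern matches any URL in the text, the result includes URLs that appear only as visible link text. Keep the source column of your IOC table accurate about where each one came from.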

CyberChef recipes are shareable. You can save a recipe as a URL and share it with your team. In Lab PH-4, you will build several recipes and save them for reuse in future investigations.

Building the IOC Table

Every phishing investigation should produce a structured IOC table. This table becomes the input for blocking rules, SIEM detections, and threat intelligence sharing.

| IOC Type | Value | Source | Context |
|---|---|---|---|
| Email Address | attacker@evil-domain[.]xyz | Return-Path header | Envelope sender |
| Domain | evil-domain[.]xyz | Return-Path, Message-ID | Attacker-controlled sending infrastructure |
| IP Address | 198[.]51[.]100[.]42 | Received header | Originating mail server |
| URL | hxxps://evil-domain[.]xyz/harvest | Email body (href) | Credential harvesting page |
| URL | hxxps://cdn[.]evil-domain[.]xyz/logo[.]png | Email body (img src) | Tracking pixel / brand impersonation asset |
| File Hash (MD5) | d41d8cd98f00b204e9800998ecf8427e | Attachment | Malicious document |
| File Hash (SHA256) | e3b0c44298fc1c149afbf4c8996fb924.... | Attachment | Malicious document |
| File Name | Invoice_Q4_2026.docm | Attachment | Macro-enabled document |

The two hash values in this example are placeholders: they are the MD5 and SHA256 of an empty file. In a real investigation, record the hashes you computed from the attachment.

Always defang IOCs in your table. Even in internal documents. Some ticketing systems, wikis, and chat tools automatically convert URLs into clickable links. Defanging prevents accidental navigation.
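If your blocking and SIEM tooling ingests files rather than wiki tables, the same four columns can be kept as structured records and exported as CSV. The sketch below is one possible shape, not a standard format; the `IOC` class and its field names are hypothetical and mirror the table's columns.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class IOC:
    """One row of the IOC table: type, defanged value, source, and context."""
    ioc_type: str
    value: str
    source: str
    context: str


def to_csv(iocs: list) -> str:
    """Serialize IOC records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=[f.name for f in fields(IOC)], lineterminator="\n"
    )
    writer.writeheader()
    for ioc in iocs:
        writer.writerow(asdict(ioc))
    return buf.getvalue()
```

Store the defanged value in the record and re-fang only at the point where a blocking tool needs the live indicator.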

OSINT Enrichment of Extracted IOCs

Raw IOCs are useful for blocking. Enriched IOCs tell you the story of the attack — who is behind it, how long the infrastructure has been active, and what other campaigns use the same resources.

[Figure: IOC enrichment pipeline, from raw extraction through OSINT lookups to actionable intelligence]

Domain Enrichment

| Tool | What You Learn |
|---|---|
| WHOIS lookup (whois.domaintools.com) | Registration date, registrar, registrant info (often privacy-protected) |
| PassiveDNS (VirusTotal Relations tab) | Historical IP resolutions; see if the domain recently changed hosting |
| URLScan.io | Hosting provider, page content, SSL certificate details |
| Shodan (shodan.io) | Open ports, services, technologies running on the IP |

A domain registered 48 hours ago hosting a "Microsoft 365 login page" on a bulletproof hosting provider is almost certainly malicious.
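Domain age is easy to turn into a repeatable check once you have the creation date from a WHOIS record. The sketch below assumes you have already parsed that date into a timezone-aware `datetime`; the function name and the 30-day threshold are arbitrary illustrations, not an industry cutoff.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_newly_registered(
    created: datetime,
    now: Optional[datetime] = None,
    threshold: timedelta = timedelta(days=30),
) -> bool:
    """Return True if the domain was registered less than `threshold` ago."""
    now = now or datetime.now(timezone.utc)
    return (now - created) < threshold
```

A young domain is a signal to weigh alongside hosting and page content, not a verdict on its own.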

IP Enrichment

| Tool | What You Learn |
|---|---|
| AbuseIPDB (abuseipdb.com) | Abuse reports from other analysts worldwide |
| GreyNoise (greynoise.io) | Whether the IP is a known scanner/noise vs. targeted attacker |
| IPinfo (ipinfo.io) | ASN, geolocation, hosting provider |
| VirusTotal | Files communicating with this IP, URLs hosted on it |

File Hash Enrichment

| Tool | What You Learn |
|---|---|
| VirusTotal | Detection ratio, behavioral analysis, YARA matches, community notes |
| Hybrid Analysis | Full sandbox report: process tree, network calls, dropped files |
| MalwareBazaar (bazaar.abuse.ch) | Malware family classification, associated campaigns, download samples |
| Any.Run (any.run) | Interactive sandbox with visual process tree and network activity |

Enrichment reveals campaign scope. If five different phishing emails use five different sender addresses but all link to the same IP, that IP is the campaign's shared infrastructure. Blocking that single IP neutralizes all five variants, provided enrichment shows it is not shared hosting or a CDN that also serves legitimate sites. Without enrichment, you would block five addresses and miss the common denominator.
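Finding that common denominator is a grouping problem. The sketch below assumes you have already resolved each email's link to an IP and collected the results as `(sender, ip)` pairs; the function name and the sample addresses are hypothetical.

```python
from collections import defaultdict


def shared_infrastructure(observations: list) -> dict:
    """Group senders by the IP their links resolve to.

    Returns only the IPs that more than one distinct sender points at.
    """
    by_ip = defaultdict(set)
    for sender, ip in observations:
        by_ip[ip].add(sender)
    return {ip: sorted(senders) for ip, senders in by_ip.items() if len(senders) > 1}


# Hypothetical observations: five senders sharing one IP, plus an unrelated one.
OBSERVATIONS = [
    ("billing@alpha.example", "198.51.100.42"),
    ("hr@bravo.example", "198.51.100.42"),
    ("it@charlie.example", "198.51.100.42"),
    ("docs@delta.example", "198.51.100.42"),
    ("pay@echo.example", "198.51.100.42"),
    ("news@foxtrot.example", "203.0.113.7"),
]
```

With the sample data, `shared_infrastructure(OBSERVATIONS)` returns a single key, `198.51.100.42`, mapped to the five senders that share it.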

Putting It All Together: The Extraction Workflow

Here is the systematic workflow you will follow in Lab PH-4:

  1. Save the raw email — download the .eml file or copy the full source
  2. Extract sender artifacts — From, Return-Path, Reply-To, originating IP, Message-ID domain
  3. Extract URLs — both visible text and href destinations; use CyberChef "Extract URLs" on HTML source
  4. Defang everything — all URLs, IPs, and email addresses in your working notes
  5. Hash attachments — MD5 and SHA256 without opening the file
  6. Analyze URLs — submit to URLScan.io and VirusTotal
  7. Analyze attachments — submit hash to VirusTotal, detonate in Hybrid Analysis if needed
  8. Decode obfuscated content — use CyberChef for Base64, URL encoding, HTML entities
  9. Build the IOC table — structured, defanged, with source and context columns
  10. Enrich IOCs — WHOIS, PassiveDNS, AbuseIPDB, GreyNoise for each indicator
  11. Document findings — feed the IOC table into your investigation report

This workflow is not just for phishing. The same extraction and enrichment process applies to any email-borne threat — BEC, malware delivery, invoice fraud, and even legitimate security notifications that need verification. Master this workflow once and you can apply it everywhere.

Key Takeaways

  • A single phishing email can yield dozens of IOCs: sender addresses, domains, IPs, URLs, file hashes, and embedded objects
  • Always defang URLs, IPs, and email addresses before sharing — this is a professional standard, not optional
  • Generate both MD5 and SHA256 hashes for attachments — MD5 for legacy lookups, SHA256 for modern IOC sharing
  • Use URLScan.io for visual and network analysis of URLs, and VirusTotal for vendor consensus and historical associations
  • CyberChef is essential for decoding Base64, URL-encoded strings, and extracting URLs from raw HTML source
  • Build a structured IOC table with type, value, source, and context columns — this feeds blocking rules and SIEM detections
  • OSINT enrichment transforms raw IOCs into campaign intelligence: WHOIS, PassiveDNS, AbuseIPDB, and sandbox analysis reveal the attacker's infrastructure and scope

What's Next

You now know how to tear apart a phishing email and extract every artifact it contains. In Lesson PH-5: Defensive Measures & Response, you will learn what to do with those artifacts — how to block them across your email gateway, firewall, and SIEM, how to check whether anyone else in your organization clicked the link or submitted credentials, and how to build a phishing response process that scales. In Lab PH-4, you will put this lesson into practice by extracting artifacts from a realistic phishing email, analyzing each IOC, and building a complete IOC table.

Knowledge Check: Artifact Extraction & Analysis

10 questions · 70% to pass

  1. What is the primary goal of artifact extraction from a phishing email?
  2. Why should analysts generate both MD5 and SHA256 hashes for a suspicious attachment?
  3. What does defanging a URL accomplish, and which of the following is a correctly defanged URL?
  4. In Lab PH-4, you extract URLs from a phishing email's HTML source using CyberChef. Which recipe chain produces a clean, defanged, deduplicated URL list?
  5. What key difference exists between URLScan.io and VirusTotal for URL analysis?
  6. An attacker encodes a malicious URL in Base64 within a phishing email. What CyberChef operation reveals the hidden URL?
  7. When building an IOC table, what four columns should every entry include?
  8. In Lab PH-4, you discover that five different phishing emails all link to the same IP address. What does OSINT enrichment reveal in this scenario?
  9. Which tool is best suited for deep behavioral analysis of a suspicious attachment, including full process trees and PCAP downloads?
  10. Why is the Return-Path header often more valuable than the From header when extracting sender IOCs?
