Lesson 2 of 6 · 12 min read · Includes quiz

Static Analysis: Hashing, Packing & Imports

File hashing, packer detection, import table analysis, identifying malicious API calls

What You'll Learn

  • Calculate MD5, SHA1, and SHA256 hashes for malware samples on both Windows and Linux
  • Explain why cryptographic hashing is fundamental to malware identification and tracking
  • Use fuzzy hashing (ssdeep) to identify similar samples that share partial code
  • Perform hash lookups on VirusTotal and MalwareBazaar to determine file reputation
  • Identify packed binaries through high entropy sections, minimal imports, and small .text sections
  • Recognize common packers (UPX, Themida, custom) and understand basic unpacking approaches
  • Analyze the Import Address Table (IAT) to identify malicious API call patterns
  • Assess section entropy to detect encrypted or compressed code regions

File Hashing: The Fingerprint of Every Binary

In Lesson 11.1, you extracted strings and examined PE structure. But before you share findings with your team, threat intel feeds, or incident reports, you need one thing: a unique identifier for the file. That identifier is a cryptographic hash.

A hash function takes a file of any size and produces a fixed-length string — a digital fingerprint. Change a single byte in the file, and the hash changes completely. This property makes hashes the universal language of malware identification.
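To make the single-byte claim concrete, here is a small sketch using Python's standard hashlib module. The sample bytes are made up for illustration and do not come from a real binary.

```python
import hashlib

# Hypothetical stand-in for file contents; any bytes would work.
original = b"MZ" + bytes(64) + b"example payload"
# Flip the lowest bit of the final byte to simulate a one-byte change.
modified = original[:-1] + bytes([original[-1] ^ 0x01])

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(modified).hexdigest()

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex characters differ")
```

The two digests share no useful structure, which is why an exact hash match identifies a file but says nothing about near-duplicates.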

| Algorithm | Output Length | Speed | Use Case |
|---|---|---|---|
| MD5 | 128 bits (32 hex chars) | Fastest | Legacy lookups, quick dedup; practical collisions exist, so it is not collision-resistant |
| SHA1 | 160 bits (40 hex chars) | Fast | Still widely indexed and searchable; practical collisions have been demonstrated (SHAttered, 2017) |
| SHA256 | 256 bits (64 hex chars) | Moderate | Gold standard for malware identification; no known practical collisions |

Generating Hashes

Linux:

md5sum suspicious.exe
sha1sum suspicious.exe
sha256sum suspicious.exe

# Print only the hash, without the filename
sha256sum suspicious.exe | awk '{print $1}'

Windows (PowerShell):

Get-FileHash suspicious.exe -Algorithm MD5
Get-FileHash suspicious.exe -Algorithm SHA1
Get-FileHash suspicious.exe -Algorithm SHA256

Python (for automation):

import hashlib

with open("suspicious.exe", "rb") as f:
    data = f.read()
    print(f"MD5:    {hashlib.md5(data).hexdigest()}")
    print(f"SHA1:   {hashlib.sha1(data).hexdigest()}")
    print(f"SHA256: {hashlib.sha256(data).hexdigest()}")

Always calculate all three hashes. Different platforms index samples differently. VirusTotal uses SHA256 as the primary key but also accepts MD5 and SHA1 searches. MalwareBazaar primarily uses SHA256. Some legacy SIEM rules still reference MD5 hashes. In Lab 11.2, you will generate a hash report for a sample and cross-reference it across multiple platforms.
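For large samples, reading the whole file into memory is unnecessary. The sketch below computes all three digests in a single pass over the file in fixed-size chunks. The function name hash_file and the 1 MiB chunk size are illustrative choices, not part of any standard tool.

```python
import hashlib


def hash_file(path, chunk_size=1024 * 1024):
    """Return MD5, SHA1, and SHA256 hex digests, reading the file once in chunks."""
    hashers = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling hash_file("suspicious.exe") returns a dictionary keyed by algorithm name, which is convenient to paste into a report or feed into the lookups that follow.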

Fuzzy Hashing with ssdeep

Cryptographic hashes have a limitation: change one byte and the hash changes completely. Malware authors exploit this by recompiling with minor modifications — a different C2 server, a changed variable name — producing a sample with identical behavior but a completely different SHA256 hash.

ssdeep solves this with context-triggered piecewise hashing (fuzzy hashing). Instead of hashing the entire file at once, ssdeep uses a rolling hash over the content to decide where chunk boundaries fall, then hashes each variable-length chunk independently. Because a local edit only changes the chunks around it, the resulting signature can be compared for similarity, not just equality.
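The idea can be illustrated with a deliberately simplified toy. This is not the ssdeep algorithm and its output is not compatible with ssdeep. The window size, block size, and the difflib-based score are assumptions chosen for readability, and ssdeep itself picks the block size per file and scores with an edit distance.

```python
import random
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7        # bytes in the rolling window (illustrative choice)
BLOCK_SIZE = 64   # average chunk length (fixed here for simplicity)
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193


def toy_fuzzy_hash(data):
    """Toy piecewise hash: one signature character per content-defined chunk."""
    signature = []
    rolling = 0              # sum of the last WINDOW bytes
    chunk_hash = FNV_OFFSET  # FNV-1a hash of the current chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        chunk_hash = ((chunk_hash ^ byte) * FNV_PRIME) & 0xFFFFFFFF
        # The trigger depends only on nearby bytes, so chunk boundaries
        # realign shortly after an insertion or deletion.
        if rolling % BLOCK_SIZE == BLOCK_SIZE - 1:
            signature.append(ALPHABET[chunk_hash % 64])
            chunk_hash = FNV_OFFSET
    signature.append(ALPHABET[chunk_hash % 64])  # final partial chunk
    return "".join(signature)


def similarity(sig_a, sig_b):
    """Score from 0 to 100 based on how much of the two signatures lines up."""
    return round(100 * SequenceMatcher(None, sig_a, sig_b).ratio())


rng = random.Random(1)
base = bytes(rng.getrandbits(8) for _ in range(4096))
variant = base[:2000] + b"X" + base[2000:]   # one inserted byte
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))

print(similarity(toy_fuzzy_hash(base), toy_fuzzy_hash(variant)))
print(similarity(toy_fuzzy_hash(base), toy_fuzzy_hash(unrelated)))
```

The one-byte insertion changes the SHA256 of the data completely, yet most signature characters survive, so the variant scores far higher than the unrelated data.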

ssdeep suspicious.exe
# 768:Gj4TtmMbhQovVWq+x3RTHdha9cAqhbB7GoNai:Gj4TtHbhQovVN+xBTya9FqhZ7Gi

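# Hash every sample in the repository and save the signatures
ssdeep -r /malware/samples/ > hashes.txt

# Match a new sample against the saved signatures
ssdeep -m hashes.txt suspicious.exe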

| Similarity Score | Interpretation |
|---|---|
| 100 | Identical files |
| > 75 | Very likely variants of the same malware family |
| 50–75 | Possibly related: shared code libraries or components |
| < 50 | Likely unrelated |
| 0 | No detectable similarity |

ssdeep is your variant hunter. When investigating an incident, take the fuzzy hash of the initial sample and compare it against your entire malware repository. You may discover variants the attacker deployed weeks before the one you found — giving you a fuller picture of the campaign timeline and scope.

Hash Lookups: Has Anyone Seen This Before?

Once you have the hash, the first question is: has the security community already analyzed this file?

VirusTotal

VirusTotal aggregates results from 70+ antivirus engines and provides behavioral analysis, community comments, and network indicators.

# Search by SHA256 hash (API)
curl -s "https://www.virustotal.com/api/v3/files/SHA256_HASH_HERE" \
  -H "x-apikey: YOUR_API_KEY" | python3 -m json.tool

# Web interface: https://www.virustotal.com/gui/file/SHA256_HASH_HERE

What to look for on VirusTotal:

| Tab | What It Tells You |
|---|---|
| Detection | How many engines flag it and what names they give it |
| Details | PE metadata: sections, imports, compilation timestamp, resources |
| Relations | Contacted domains, IPs, dropped files, parent samples |
| Behavior | Sandbox execution results: process tree, file operations, network connections |
| Community | Analyst comments with context, IOCs, and campaign attribution |
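For automation, the same lookup can be scripted. The sketch below builds the request for the v3 files endpoint shown in the curl example and condenses the last_analysis_stats block of a response into a detection ratio. The function names are hypothetical, and the stub response is hand-written with invented numbers purely to show the shape of the data.

```python
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def build_vt_request(sha256, api_key):
    """Build (but do not send) a hash lookup request for the VirusTotal v3 API."""
    return urllib.request.Request(
        VT_FILE_URL.format(sha256), headers={"x-apikey": api_key}
    )


def summarize_detections(report):
    """Condense a v3 file report into a 'flagged/total' detection ratio."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    # Count only engines that returned a verdict (ignore timeouts and failures).
    with_verdict = flagged + stats.get("harmless", 0) + stats.get("undetected", 0)
    return f"{flagged}/{with_verdict} engines with a verdict flagged this file"


# Hand-written stub shaped like a v3 response; the numbers are invented.
stub_report = {
    "data": {
        "attributes": {
            "last_analysis_stats": {
                "malicious": 52,
                "suspicious": 1,
                "undetected": 17,
                "harmless": 0,
                "timeout": 2,
            }
        }
    }
}

print(summarize_detections(stub_report))
# 53/70 engines with a verdict flagged this file
```

To query the live service, pass the request to urllib.request.urlopen and decode the body with json.load. An HTTP 404 response means the hash is not known to VirusTotal.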

MalwareBazaar

MalwareBazaar (by abuse.ch) is a malware sample sharing platform focused on tracking active threats:

curl -s -X POST "https://mb-api.abuse.ch/api/v1/" \
  -d "query=get_info&hash=SHA256_HASH_HERE"

MISP Integration

In Module 5, you learned to use MISP for threat intelligence. Every hash you extract during malware analysis should be checked against your MISP instance — and new hashes should be added as IOCs to share with your community.

Be careful what you upload. Submitting a file to VirusTotal makes it available to all premium subscribers — including threat actors who monitor VT for their own malware to detect when it has been discovered. For sensitive samples from active investigations, search by hash first (which does not reveal you have the file). Only upload if the hash returns no results and you need the analysis.

Packer Detection: When the Binary Hides Its True Self

Packing is a technique where malware is compressed, encrypted, or otherwise transformed so that static analysis reveals nothing useful. The packed binary contains a small unpacking stub that reconstructs the original code in memory at runtime.

[Figure: Packer detection indicators: entropy analysis, section characteristics, import count, and tool signatures]

Why Attackers Pack Binaries

  • Evade signatures: Antivirus rules written for the original strings and byte patterns will not match the packed version
  • Defeat static analysis: Strings, imports, and PE structure of the packed binary reveal the packer, not the malware
  • Reduce file size: Some packers (like UPX) compress binaries to 30-50% of original size
  • Slow down analysts: Unpacking adds time and complexity to the investigation

Identifying Packed Binaries

| Indicator | What to Check | Packed Binary |
|---|---|---|
| Import count | Number of imported functions | Very few imports (< 10), often just LoadLibrary + GetProcAddress |
| Section names | PE section table | Non-standard names: UPX0, UPX1, .themida, .vmp0, random strings |
| Section entropy | Shannon entropy per section | One or more sections with entropy > 7.0 (near-random data) |
| .text section size | Raw size of code section | Abnormally small; the real code is compressed elsewhere |
| Section characteristics | Writable + executable flags | Unpacking stub needs to write and execute code |
| String count | Total extractable strings | Very few meaningful strings; most content is encrypted |
| Entry point location | Which section contains the entry point | Points to a section other than .text |

Common Packers

| Packer | Type | Detection | Difficulty to Unpack |
|---|---|---|---|
| UPX | Open-source compressor | Section names UPX0/UPX1, or upx -t check | Easy: upx -d sample.exe |
| Themida / WinLicense | Commercial protector | Section .themida, anti-debug, VM detection | Hard: requires dedicated tools |
| VMProtect | VM-based protector | Section .vmp0/.vmp1, code virtualization | Very hard: code is virtualized |
| MPRESS | Compressor | Section .MPRESS1/.MPRESS2 | Moderate: similar to UPX |
| Custom packers | Malware-specific | No known signatures, unusual section layout | Varies: may require manual unpacking |
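The section-name signatures in the table above translate directly into a lookup. The following is a minimal sketch, assuming you already have the list of section names from a PE parser. The function name guess_packer is hypothetical, and because section names are trivial to rename, an empty result means "unknown", not "unpacked".

```python
# Section-name prefixes taken from the table above. Protectors can rename
# their sections, so a miss here does not rule out packing.
PACKER_SECTION_HINTS = {
    "UPX": "UPX",
    ".themida": "Themida / WinLicense",
    ".vmp": "VMProtect",
    ".MPRESS": "MPRESS",
}


def guess_packer(section_names):
    """Return the set of packers suggested by the given PE section names."""
    matches = set()
    for name in section_names:
        for prefix, packer in PACKER_SECTION_HINTS.items():
            if name.startswith(prefix):
                matches.add(packer)
    return matches


print(guess_packer(["UPX0", "UPX1", ".rsrc"]))      # {'UPX'}
print(guess_packer([".text", ".rdata", ".data"]))   # set()
```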

Entropy Analysis

Shannon entropy measures randomness on a scale from 0 (perfectly ordered) to 8 (perfectly random). Encrypted or compressed data has high entropy. Normal compiled code has moderate entropy.

python3 -c "
import math, sys
with open(sys.argv[1], 'rb') as f:
    data = f.read()
freq = [0]*256
for b in data: freq[b] += 1
entropy = -sum((c/len(data)) * math.log2(c/len(data)) for c in freq if c > 0)
print(f'Entropy: {entropy:.4f}')
" suspicious.exe

| Entropy Range | Interpretation |
|---|---|
| 0.0 – 1.0 | Very structured (padding, null bytes) |
| 4.0 – 6.0 | Normal compiled code or text |
| 6.5 – 7.2 | Possibly compressed (could be legitimate resources like images) |
| 7.2 – 8.0 | Almost certainly encrypted or compressed; strong packing indicator |
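The one-liner above measures the file as a whole, which can hide a single high-entropy section inside an otherwise ordinary binary. The sketch below computes entropy per section using only the standard library. It assumes a well-formed PE file and reads just the header fields it needs. The function names are illustrative, and a full parser such as pefile handles malformed headers and edge cases that this sketch does not.

```python
import math
import struct
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def section_entropies(pe_bytes):
    """Return [(section_name, entropy_of_raw_data), ...] for a PE image."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)  # e_lfanew
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header fields, located relative to the PE signature.
    (num_sections,) = struct.unpack_from("<H", pe_bytes, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", pe_bytes, pe_offset + 20)
    # The section table follows the optional header.
    table = pe_offset + 24 + opt_header_size
    results = []
    for i in range(num_sections):
        # Each 40-byte entry starts with: Name[8], VirtualSize,
        # VirtualAddress, SizeOfRawData, PointerToRawData.
        name, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", pe_bytes, table + 40 * i
        )
        raw = pe_bytes[raw_ptr:raw_ptr + raw_size]
        label = name.rstrip(b"\x00").decode("ascii", "replace")
        results.append((label, shannon_entropy(raw)))
    return results


print(shannon_entropy(bytes(1024)))             # 0.0
print(shannon_entropy(bytes(range(256)) * 4))   # 8.0
```

To use it on a sample, read the file in binary mode and pass the bytes to section_entropies, then compare each section's value against the ranges in the table.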

High entropy alone does not prove packing. Legitimate binaries can contain high-entropy sections for compressed resources (images, fonts, embedded databases). The key is correlating entropy with other indicators: few imports + high entropy + unusual sections + small .text = packed. High entropy in .rsrc with normal everything else = probably just compressed resources.
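The correlation described above can be sketched as a simple checklist that only calls a binary "likely packed" when several indicators agree. The PeSummary container, the list of standard section names, the 4096-byte cut-off for a small .text section, and the three-indicator threshold are all assumptions made for illustration. The import and entropy thresholds come from the indicator table earlier in this lesson.

```python
from dataclasses import dataclass


@dataclass
class PeSummary:
    """Hypothetical container for indicators gathered by other tools."""
    import_count: int
    max_section_entropy: float
    section_names: list
    text_raw_size: int
    entry_point_section: str


# Common compiler-generated section names (not exhaustive).
STANDARD_SECTIONS = {
    ".text", ".rdata", ".data", ".rsrc", ".reloc",
    ".idata", ".edata", ".pdata", ".bss", ".tls",
}


def packing_indicators(pe):
    """List which packing indicators fire for this binary."""
    hits = []
    if pe.import_count < 10:
        hits.append("few imports")
    if pe.max_section_entropy > 7.0:
        hits.append("high entropy")
    if any(name not in STANDARD_SECTIONS for name in pe.section_names):
        hits.append("non-standard section names")
    if pe.text_raw_size < 4096:  # assumed cut-off, not from the lesson
        hits.append("small .text")
    if pe.entry_point_section != ".text":
        hits.append("entry point outside .text")
    return hits


def likely_packed(pe):
    """Require several indicators to agree, since any single one can be benign."""
    return len(packing_indicators(pe)) >= 3


packed = PeSummary(3, 7.8, ["UPX0", "UPX1", ".rsrc"], 0, "UPX1")
benign = PeSummary(120, 7.4, [".text", ".rdata", ".data", ".rsrc"], 200000, ".text")

print(likely_packed(packed), packing_indicators(packed))
print(likely_packed(benign), packing_indicators(benign))
```

The benign example has a high-entropy section but nothing else, so it is not flagged, which mirrors the compressed-resources case.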

Basic UPX Unpacking

UPX is the most common packer encountered in malware analysis. Unpacking is trivial when UPX headers are intact:

upx -t suspicious.exe
# suspicious.exe: UPX executable packer [OK]

upx -d suspicious.exe -o unpacked.exe

strings unpacked.exe | wc -l
# Now you see the real strings

Some malware authors modify UPX headers to prevent the standard upx -d command from working. In those cases, you need to fix the headers manually or use dynamic unpacking (dump the process memory after the unpacking stub runs).

Import Address Table (IAT) Analysis

The Import Address Table (IAT) lists the external functions the binary declares it will call from Windows DLLs. It is one of the most useful static analysis artifacts because the imported APIs hint directly at the binary's capabilities, although functions resolved at runtime will not appear in it.

[Figure: Suspicious API calls organized by malware capability: process injection, network, persistence, evasion, and crypto]

Reading the Import Table

Linux (on a PE file):

# Requires an objdump build with PE support
objdump -p suspicious.exe | grep -A 100 "The Import Tables"

python3 -c "
import pefile
pe = pefile.PE('suspicious.exe')
# DIRECTORY_ENTRY_IMPORT is absent when the file has no import table
for entry in getattr(pe, 'DIRECTORY_ENTRY_IMPORT', []):
    print(f'\n{entry.dll.decode()}:')
    for imp in entry.imports:
        print(f'  {imp.name.decode() if imp.name else hex(imp.ordinal)}')
"

Windows (dumpbin, run from a Visual Studio Developer PowerShell or Command Prompt):

dumpbin /imports suspicious.exe

Suspicious API Call Categories

Process Injection:

| API | What It Does | Why Malware Uses It |
|---|---|---|
| OpenProcess | Opens a handle to another process | Required first step for injection |
| VirtualAllocEx | Allocates memory in another process | Creates space for injected code |
| WriteProcessMemory | Writes data to another process's memory | Writes the payload |
| CreateRemoteThread | Creates a thread in another process | Executes the injected payload |
| NtUnmapViewOfSection | Unmaps a section of a process | Process hollowing technique |

Execution and Download:

| API | What It Does | Why Malware Uses It |
|---|---|---|
| WinExec / ShellExecuteA | Executes a command or program | Launches second-stage payloads |
| CreateProcessA/W | Creates a new process | Spawns child processes |
| URLDownloadToFileA | Downloads a file from a URL | Retrieves additional payloads |
| InternetOpenA / HttpSendRequestA | Opens HTTP connections | C2 communication |

Persistence:

| API | What It Does | Why Malware Uses It |
|---|---|---|
| RegSetValueExA | Sets a registry value | Autorun keys, configuration storage |
| CreateServiceA | Creates a Windows service | Service-based persistence |

Defense Evasion:

| API | What It Does | Why Malware Uses It |
|---|---|---|
| IsDebuggerPresent | Checks if a debugger is attached | Anti-analysis check |
| GetTickCount / Sleep | Time-based checks | Sandbox evasion (detect fast-forwarding) |
| VirtualProtect | Changes memory protection flags | Makes memory executable for unpacking |

A binary that imports only LoadLibraryA and GetProcAddress is almost certainly packed or using dynamic API resolution. These two functions allow a program to load any DLL and resolve any function at runtime — hiding the real imports from static analysis. In Lab 11.2, you will compare the import table of a packed sample (2 imports) versus its unpacked version (50+ imports) to see exactly what was hidden.
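Once the import names are extracted, triage can be partly automated by grouping them into the capability categories from the tables above. The sketch below assumes a plain list of imported function names as input. The function names are hypothetical, and the API sets contain only the entries listed in this lesson, so they are far from complete. APIs such as Sleep and GetTickCount are also common in benign software, so a match is a lead, not a verdict.

```python
# API names copied from the category tables in this lesson (not exhaustive).
SUSPICIOUS_APIS = {
    "process injection": {
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
        "CreateRemoteThread", "NtUnmapViewOfSection",
    },
    "execution and download": {
        "WinExec", "ShellExecuteA", "CreateProcessA", "CreateProcessW",
        "URLDownloadToFileA", "InternetOpenA", "HttpSendRequestA",
    },
    "persistence": {"RegSetValueExA", "CreateServiceA"},
    "defense evasion": {"IsDebuggerPresent", "GetTickCount", "Sleep", "VirtualProtect"},
}
RESOLVER_APIS = {"LoadLibraryA", "GetProcAddress"}


def classify_imports(imports):
    """Map capability category to a sorted list of matching imported APIs."""
    imported = set(imports)
    found = {}
    for category, apis in SUSPICIOUS_APIS.items():
        matched = sorted(imported & apis)
        if matched:
            found[category] = matched
    return found


def only_dynamic_resolution(imports):
    """True when the import list is nothing but LoadLibraryA/GetProcAddress."""
    imported = set(imports)
    return bool(imported) and imported <= RESOLVER_APIS


sample = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
          "CreateRemoteThread", "Sleep", "ExitProcess"]
print(classify_imports(sample))
print(only_dynamic_resolution(["LoadLibraryA", "GetProcAddress"]))  # True
```

Seeing all four of OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread in one category is the classic injection chain and deserves a closer look.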

DLL Analysis

Malware does not always arrive as a standalone .exe. Many advanced threats use DLL side-loading or DLL injection — placing a malicious DLL where a legitimate application will load it.

When analyzing a DLL:

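python3 -c "
import pefile
pe = pefile.PE('suspicious.dll')
# DIRECTORY_ENTRY_EXPORT is absent when the DLL has no export table
export_dir = getattr(pe, 'DIRECTORY_ENTRY_EXPORT', None)
symbols = export_dir.symbols if export_dir else []
print(f'Exports ({len(symbols)}):')
for exp in symbols:
    name = exp.name.decode() if exp.name else '(ordinal only)'
    print(f'  {exp.ordinal}: {name}')
"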

| DLL Red Flag | What It Means |
|---|---|
| Export names that mimic legitimate Windows DLLs | Possible DLL hijacking/side-loading |
| Single export function with a generic name (ServiceMain, DllMain) | Minimal interface, likely a loader |
| Export by ordinal only (no function names) | Hiding function purposes |
| DLL with no exports at all | Designed to be loaded via LoadLibrary for DllMain execution only |
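Several of these red flags can be checked mechanically once the export list is extracted. The sketch below assumes exports arrive as (ordinal, name) pairs with name set to None for ordinal-only entries. The function name and flag wording are hypothetical. The "mimics a legitimate DLL" check is left out because it needs a reference list of genuine export names.

```python
GENERIC_EXPORT_NAMES = {"ServiceMain", "DllMain"}


def export_red_flags(exports):
    """Return red flags for a list of (ordinal, name_or_None) export pairs."""
    if not exports:
        return ["no exports: code likely runs from DllMain only"]
    flags = []
    names = [name for _ordinal, name in exports if name]
    if not names:
        flags.append("ordinal-only exports: function purposes hidden")
    if len(exports) == 1 and names and names[0] in GENERIC_EXPORT_NAMES:
        flags.append("single generic export: possible loader")
    return flags


print(export_red_flags([]))
print(export_red_flags([(1, None), (2, None)]))
print(export_red_flags([(1, "ServiceMain")]))
print(export_red_flags([(1, "Init"), (2, "Run")]))   # []
```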

Key Takeaways

  • Cryptographic hashing (MD5, SHA1, SHA256) provides unique file identification — always compute all three for cross-platform lookups
  • Fuzzy hashing (ssdeep) identifies malware variants that share code despite different cryptographic hashes
  • Hash lookups on VirusTotal, MalwareBazaar, and MISP reveal whether the security community has already analyzed your sample — search by hash before uploading files
  • Packing compresses or encrypts binaries to defeat static analysis — detect it through high entropy, few imports, unusual sections, and small .text
  • UPX is the most common packer and trivially unpacked with upx -d; commercial protectors like Themida and VMProtect require advanced techniques
  • Entropy analysis (Shannon entropy) quantifies randomness: sections above roughly 7.2 are almost certainly encrypted or compressed, but confirm packing by correlating with other indicators
  • Import Address Table analysis reveals a binary's capabilities: process injection APIs, network functions, persistence mechanisms, and evasion techniques
  • A binary importing only LoadLibraryA and GetProcAddress is hiding its real imports through dynamic resolution — a key packing indicator
  • DLL analysis extends the same techniques to side-loading and injection scenarios common in advanced threats

What's Next

Static analysis has told you what the file contains, how it is built, and what APIs it calls. But there are questions static analysis cannot answer: What does the malware actually do when it runs? What files does it create? What processes does it spawn? What network connections does it make? In Lesson 11.3, you will cross into dynamic analysis — executing malware in a controlled sandbox and monitoring its behavior with Process Monitor, Process Explorer, and Autoruns to build a complete behavioral profile.

Knowledge Check: Hashing, Packing & Imports

10 questions · 70% to pass

1. Which hashing algorithm is considered the gold standard for malware identification due to its resistance to collisions?

2. What problem does ssdeep (fuzzy hashing) solve that cryptographic hashing cannot?

3. In Lab 11.2, you analyze a packed binary and find it imports only LoadLibraryA and GetProcAddress. Why are these two imports significant?

4. What Shannon entropy range most strongly indicates that a PE section contains encrypted or compressed data?

5. Why should you search VirusTotal by hash rather than uploading the file during an active investigation?

6. Which packer can be trivially removed using its own command-line tool with the -d flag?

7. You find a binary in Lab 11.2 with sections named UPX0 and UPX1, high entropy in UPX1, and only 3 imported functions. What is the most likely explanation?

8. Which combination of Windows API imports most strongly suggests process injection capability?

9. On a Linux system, which command generates the SHA256 hash of a file?

10. A DLL has no named exports and only exports functions by ordinal. What does this suggest?