Lesson 1 of 6 · 12 min read · Includes quiz

Static Analysis: PE Structure & Strings

PE file format, sections, headers, extracting strings, identifying suspicious indicators

What You'll Learn

  • Explain what static analysis is and why it is the first step in any malware investigation
  • Identify the key components of the PE (Portable Executable) file format: DOS header, PE header, section table, and entry point
  • Describe the purpose of common PE sections (.text, .data, .rsrc, .reloc) and what anomalies to look for in each
  • Extract strings from a binary using FLOSS and the strings command on both Windows and Linux
  • Identify suspicious string categories: URLs, IP addresses, file paths, API calls, registry keys, and encoded data
  • Apply a string analysis workflow to perform initial triage on an unknown binary
  • Connect static analysis findings to YARA rules (Module 10) and CyberChef for deeper investigation
  • Interpret compilation timestamps and linker metadata to assess binary origin and age

Why Static Analysis Comes First

When a suspicious file lands on your desk — pulled from a quarantine folder, extracted from a phishing email, or flagged by Wazuh — you face a critical decision: do you run it, or do you read it?

Static analysis means examining a binary without executing it. You inspect its structure, read its strings, examine its imports, check its metadata — all without letting it touch a running system. This is always the first step because it is safe, repeatable, and often reveals enough to classify a sample before you ever need a sandbox.

| Analysis Type | What You Do | Risk Level | Speed |
|---|---|---|---|
| Static | Examine file structure, strings, imports, metadata | Zero: file never executes | Minutes |
| Dynamic | Execute in a sandbox and observe behavior | Contained: isolated environment | 10–30 minutes |
| Manual reverse engineering | Disassemble and read code logic | Zero: file never executes | Hours to days |

Static analysis is not a replacement for dynamic analysis — it is a prerequisite. The goal is to extract as much intelligence as possible before execution. A 10-minute static pass might reveal the C2 server, the malware family, and the persistence mechanism — all without booting a sandbox. In Lab 11.1, you will perform a complete static analysis workflow on a real PE binary and extract actionable IOCs before any execution.

The PE File Format: Windows Executables Under the Microscope

Every .exe, .dll, .sys, and .scr file on Windows follows the Portable Executable (PE) format. Understanding PE structure is fundamental because malware authors must work within this format — and every shortcut they take leaves artifacts you can detect.

PE file structure — from DOS header through PE header, section table, and section data

DOS Header and DOS Stub

Every PE file begins with the DOS header, a legacy artifact from MS-DOS compatibility. The first two bytes are always 4D 5A (the ASCII characters "MZ" — named after Mark Zbikowski, a DOS architect). This magic number is how the operating system and analysis tools recognize a file as a PE executable.

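The DOS header contains one critical field for analysts: e_lfanew, a 4-byte little-endian value at offset 0x3C that holds the file offset of the PE header. In the hex dump below, the bytes at 0x3C are E0 00 00 00, so the PE header begins at offset 0xE0. Malware authors occasionally manipulate this value to confuse basic parsers.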

Following the DOS header is the DOS stub — a small program that prints "This program cannot be run in DOS mode" if someone tries to run the executable in a DOS environment. Some malware replaces this stub with custom messages or junk data.

00000000  4D 5A 90 00 03 00 00 00  04 00 00 00 FF FF 00 00  |MZ..............|
00000010  B8 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 E0 00 00 00  |................|
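As a rough illustration of the two DOS header fields described above, the following Python sketch checks the MZ magic and reads e_lfanew from raw bytes. The function name `read_dos_header` is hypothetical, and the sample bytes are simply the 64 bytes from the hex dump.

```python
import struct

def read_dos_header(data: bytes) -> int:
    """Return e_lfanew (file offset of the PE header) after checking the MZ magic."""
    if len(data) < 0x40:
        raise ValueError("too short to contain a DOS header")
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic")
    # e_lfanew is a 4-byte little-endian value stored at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew

# The 64 bytes shown in the hex dump above.
dos_header = bytes.fromhex(
    "4D5A900003000000" "04000000FFFF0000"
    "B800000000000000" "4000000000000000"
    "0000000000000000" "0000000000000000"
    "0000000000000000" "00000000E0000000"
)
print(hex(read_dos_header(dos_header)))  # 0xe0
```

A real parser would also confirm that the bytes at the returned offset are the PE signature before trusting anything that follows.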

PE Header (IMAGE_NT_HEADERS)

The PE header starts with the signature 50 45 00 00 ("PE\0\0") and contains two sub-structures:

File Header (COFF Header) — 20 bytes of critical metadata:

| Field | What It Tells You |
|---|---|
| Machine | Target architecture: 0x14C = x86, 0x8664 = x64 |
| NumberOfSections | How many sections the binary contains |
| TimeDateStamp | Compilation timestamp (Unix epoch format) |
| Characteristics | Flags: executable, DLL, large address aware, etc. |
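The File Header layout can be sketched with Python's struct module. This is a minimal illustration, not a full parser: `parse_file_header` is a hypothetical helper, the field order follows the PE/COFF specification, and the sample bytes are synthetic rather than taken from a real binary.

```python
import struct

# Machine values listed in the table above; anything else is reported as unknown.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}

def parse_file_header(data: bytes, pe_offset: int) -> dict:
    """Parse the 20-byte COFF File Header located right after the PE signature."""
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found at the given offset")
    (machine, num_sections, timestamp, _sym_ptr, _sym_count,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})"),
        "number_of_sections": num_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "is_dll": bool(characteristics & 0x2000),  # IMAGE_FILE_DLL flag
    }

# Synthetic header for illustration only (not from a real sample).
sample = b"PE\x00\x00" + struct.pack(
    "<HHIIIHH", 0x8664, 4, 1600000000, 0, 0, 240, 0x0022)
print(parse_file_header(sample, 0))
```

In practice `pe_offset` would be the e_lfanew value read from the DOS header.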

Optional Header — despite the name, it is mandatory for executables:

| Field | What It Tells You |
|---|---|
| AddressOfEntryPoint | RVA where execution begins; malware may point this to an unusual section |
| ImageBase | Preferred load address (linker defaults for 32-bit images are 0x00400000 for EXEs and 0x10000000 for DLLs; 64-bit images default to 0x140000000 and 0x180000000) |
| SectionAlignment / FileAlignment | Memory and disk alignment values |
| SizeOfImage | Total size when loaded in memory |
| Subsystem | GUI (0x02) vs Console (0x03); a "GUI" app with no window is suspicious |
| DataDirectory | Array of 16 entries pointing to imports, exports, resources, relocations, etc. |
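A few of these fields can be read with fixed offsets once you know whether the image is PE32 or PE32+. The sketch below assumes the offsets from the PE/COFF specification (Magic 0x10B for PE32, 0x20B for PE32+); `parse_optional_header` is a hypothetical name and the test buffer is synthetic.

```python
import struct

SUBSYSTEM_NAMES = {2: "GUI", 3: "Console"}

def parse_optional_header(opt: bytes) -> dict:
    """Read selected fields; `opt` must begin at the Optional Header's Magic field."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == 0x10B:                      # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", opt, 28)
    elif magic == 0x20B:                    # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", opt, 24)
    else:
        raise ValueError(f"unexpected Optional Header magic 0x{magic:X}")
    (entry_rva,) = struct.unpack_from("<I", opt, 16)  # same offset in both layouts
    (subsystem,) = struct.unpack_from("<H", opt, 68)  # same offset in both layouts
    return {
        "format": "PE32" if magic == 0x10B else "PE32+",
        "entry_point_rva": entry_rva,
        "image_base": image_base,
        "subsystem": SUBSYSTEM_NAMES.get(subsystem, f"other ({subsystem})"),
    }

# Synthetic PE32 Optional Header for illustration only.
opt = bytearray(96)
struct.pack_into("<H", opt, 0, 0x10B)
struct.pack_into("<I", opt, 16, 0x1000)
struct.pack_into("<I", opt, 28, 0x00400000)
struct.pack_into("<H", opt, 68, 2)
print(parse_optional_header(bytes(opt)))
```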

Compilation timestamps are trivially spoofed. Malware authors routinely set fake timestamps to mislead investigators. A timestamp of January 1, 1970 (epoch zero) or a date far in the future is clearly not a real build time. Such values are not proof of malice on their own, because reproducible builds (for example, MSVC's /Brepro option) replace the timestamp with a hash-derived value that can look like an arbitrary date. A timestamp that exactly matches another known-good binary suggests timestomping. Use timestamps as one data point, never as conclusive evidence. Cross-reference with other metadata like the linker version and Rich header hash.
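The first-pass checks described above can be sketched as follows. The function name `assess_timestamp` and the one-day tolerance for clock skew are assumptions made for illustration; the verdict strings are not a standard.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def assess_timestamp(time_date_stamp: int, now: Optional[datetime] = None) -> str:
    """Give a first-pass verdict on a PE TimeDateStamp (seconds since the Unix epoch, UTC)."""
    now = now or datetime.now(timezone.utc)
    compiled = datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)
    label = compiled.strftime("%Y-%m-%d %H:%M:%S UTC")
    if time_date_stamp == 0:
        return f"{label}: epoch zero, not a real build time"
    if compiled > now + timedelta(days=1):  # one-day tolerance is an arbitrary choice
        return f"{label}: in the future, not a real build time"
    return f"{label}: plausible, corroborate with other metadata"

print(assess_timestamp(0))
print(assess_timestamp(1600000000))
```

A "plausible" verdict only means the value passed these two sanity checks; it says nothing about whether the timestamp is genuine.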

Section Table and Common Sections

After the PE header comes the section table — an array of headers describing each section in the binary. Every section has a name, virtual address, virtual size, raw size, and characteristics flags.

| Section | Purpose | What to Watch For |
|---|---|---|
| .text | Executable code | Unusually small .text + large unknown section = packed binary |
| .data | Initialized global and static variables | Strings, configuration data, embedded payloads |
| .rdata | Read-only data, import/export tables | Import table analysis reveals API usage |
| .rsrc | Resources: icons, dialogs, version info, embedded files | Embedded executables, encrypted payloads hidden as resources |
| .reloc | Relocation table for ASLR | Missing .reloc with ASLR enabled = anomaly |
| UPX0, UPX1 | UPX packer sections | Clear indicator of UPX packing |
| .themida | Themida protector | Commercial packer/protector, common in crimeware |

Tip: Section names are cosmetic; the Windows loader ignores them. Malware can name sections anything: .code, .xyz, or even an empty string. What matters is the characteristics flags. A section marked as both writable and executable (for example 0xE0000020, which combines the code, execute, read, and write flags) is a red flag, because legitimate software rarely needs self-modifying code outside of packers and JIT compilers.
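To show how a characteristics value breaks down into individual flags, here is a small sketch. The bit values come from the PE/COFF specification; the helper names are hypothetical and only four of the defined flags are decoded.

```python
# Subset of section characteristics flags from the PE/COFF specification.
SECTION_FLAGS = [
    (0x00000020, "CODE"),     # IMAGE_SCN_CNT_CODE
    (0x20000000, "EXECUTE"),  # IMAGE_SCN_MEM_EXECUTE
    (0x40000000, "READ"),     # IMAGE_SCN_MEM_READ
    (0x80000000, "WRITE"),    # IMAGE_SCN_MEM_WRITE
]

def describe_flags(characteristics: int) -> str:
    """Return the names of the decoded flags that are set, joined with '|'."""
    names = [name for bit, name in SECTION_FLAGS if characteristics & bit]
    return "|".join(names) or "NONE"

def is_writable_and_executable(characteristics: int) -> bool:
    """True when both the WRITE and EXECUTE bits are set."""
    wx = 0x80000000 | 0x20000000
    return (characteristics & wx) == wx

print(describe_flags(0xE0000020))              # CODE|EXECUTE|READ|WRITE
print(is_writable_and_executable(0x60000020))  # False
```

The second value, 0x60000020, is what a typical read-and-execute code section looks like, which is why it does not trip the check.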

Entry Point Analysis

The AddressOfEntryPoint field tells the OS where to start executing code. In legitimate software, this usually points into the .text section. Anomalies to watch for:

  • Entry point in a non-standard section (not .text) — suggests packing or injection
  • Entry point at the very end of a section — common in appended shellcode
  • Entry point at offset 0 of a section with high entropy — likely packed or encrypted
  • Entry point in a section with a suspicious name (UPX1, .packed, random characters)
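Two of the checks above, finding which section holds the entry point and measuring entropy, can be sketched in a few lines. The section tuples below are invented for illustration, and the function names are hypothetical. Entropy close to 8 bits per byte is a common heuristic for compressed or encrypted data, not a rule.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Section = Tuple[str, int, int]  # (name, virtual_address, virtual_size)

def section_for_rva(sections: List[Section], rva: int) -> Optional[str]:
    """Return the name of the section whose virtual range contains the RVA."""
    for name, va, vsize in sections:
        if va <= rva < va + vsize:
            return name
    return None

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for uniform data, 8.0 at maximum)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(c / total * math.log2(total / c) for c in Counter(data).values())

# Invented layout: the entry point falls in UPX1 rather than .text.
sections = [(".text", 0x1000, 0x5000), ("UPX1", 0x8000, 0x3000)]
print(section_for_rva(sections, 0x9000))  # UPX1
print(shannon_entropy(bytes(range(256))))  # 8.0
```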

Extracting and Analyzing Strings

Strings are the single most productive static analysis technique for initial triage. Embedded text in a binary reveals what the malware communicates with, what it modifies, and what tools or techniques it uses.

The strings Command

On Linux, the strings command extracts printable ASCII sequences of a minimum length (default 4 characters):

strings suspicious.exe | head -50

strings -n 8 suspicious.exe     # minimum 8 characters (reduces noise)

strings -e l suspicious.exe     # extract UTF-16LE strings (common in Windows binaries)

On Windows, Sysinternals strings.exe provides equivalent functionality:

strings64.exe -n 8 suspicious.exe

strings64.exe -accepteula suspicious.exe | Select-String -Pattern "http"
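To make concrete what these commands do, the sketch below reimplements the core idea in Python: find runs of printable characters, once as single bytes and once as UTF-16LE pairs. It is a simplification of the real tools (for example, it ignores tabs and non-Latin text), and `extract_strings` is a hypothetical name.

```python
import re
from typing import List, Tuple

def extract_strings(data: bytes, min_len: int = 4) -> List[Tuple[int, str, str]]:
    """Return (offset, encoding, text) for printable ASCII and UTF-16LE runs."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE stores each ASCII-range character as the byte followed by 0x00.
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [(m.start(), "ascii", m.group().decode("ascii"))
             for m in ascii_re.finditer(data)]
    found += [(m.start(), "utf-16le", m.group().decode("utf-16-le"))
              for m in utf16_re.finditer(data)]
    return sorted(found)

# Synthetic buffer: one ASCII string and one UTF-16LE string amid junk bytes.
blob = (b"\x01\x02" + b"http://example.test/a" + b"\xff\xfe"
        + "cmd.exe /c".encode("utf-16-le") + b"\xff")
for offset, encoding, text in extract_strings(blob, min_len=6):
    print(f"0x{offset:04X} {encoding:9} {text}")
```

Recording the offset alongside each string pays off later, when findings need to be documented or located in a hex editor.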

FLOSS: Beyond Basic Strings

The FLARE Obfuscated String Solver (FLOSS) from Mandiant goes far beyond strings. It combines static analysis with emulation of the sample's own decoding routines to automatically recover strings that malware stores in encoded or encrypted form, as well as strings it builds on the stack:

floss suspicious.exe

floss --no stack -- suspicious.exe             # skip stack strings for faster results (FLOSS 2.x syntax; check floss -h for your version)

floss -j -o floss_output.json suspicious.exe   # JSON output for scripting (FLOSS 2.x syntax)

| Tool | Finds Static Strings | Finds Stack Strings | Deobfuscates Encoded Strings |
|---|---|---|---|
| strings | Yes | No | No |
| FLOSS | Yes | Yes | Yes |

Warning: Do not run FLOSS against a suspected-malicious file directly on your host system. FLOSS emulates parts of the sample's code to decode strings, so treat it with the same caution as any tool that processes hostile input. Always run string extraction tools inside your analysis VM or container.

Suspicious String Categories

When reviewing extracted strings, categorize them systematically:

Network Indicators:

  • URLs: http://, https://, ftp://
  • IP addresses: 192.168., 10.0., or public IPs
  • Domain names: especially algorithmically generated (DGA) names or dynamic-DNS hostnames (xkjr2.duckdns.org)
  • User-Agent strings: Mozilla/5.0, custom agents

File System Indicators:

  • Windows paths: C:\Users\, C:\Windows\Temp\, %APPDATA%
  • Linux paths: /tmp/, /etc/cron.d/, /var/log/
  • File extensions: .bat, .ps1, .vbs, .dll
  • Known malware drop locations: C:\ProgramData\, C:\Users\Public\

Windows API Calls:

  • Process manipulation: CreateRemoteThread, VirtualAllocEx, WriteProcessMemory
  • Execution: WinExec, ShellExecute, CreateProcess
  • Network: InternetOpen, URLDownloadToFile, HttpSendRequest
  • Registry: RegSetValueEx, RegCreateKey
  • Crypto: CryptEncrypt, CryptDecrypt, BCryptEncrypt

Persistence Indicators:

  • Registry keys: SOFTWARE\Microsoft\Windows\CurrentVersion\Run
  • Service creation: CreateService, sc create
  • Scheduled tasks: schtasks, at.exe

Encoded / Obfuscated Data:

  • Base64 strings: long runs of letters, digits, + and /, often (but not always) ending in = or ==
  • Hex-encoded data: continuous hex characters
  • XOR keys: short repeated byte sequences
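The categories above map naturally onto a small set of regular expressions. The sketch below is one possible starting point, not a complete classifier: the pattern set, the category names, and the 24-character Base64 threshold are all assumptions, and patterns like the IPv4 one will also match version numbers such as 1.2.3.4.

```python
import re
from typing import List

# Illustrative patterns only; real triage needs broader and better-tuned rules.
CATEGORY_PATTERNS = {
    "url": re.compile(r"(?:https?|ftp)://\S+", re.I),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "windows_path": re.compile(r"[A-Za-z]:\\|%APPDATA%", re.I),
    "persistence": re.compile(r"CurrentVersion\\Run|schtasks|CreateService", re.I),
    "injection_api": re.compile(r"CreateRemoteThread|VirtualAllocEx|WriteProcessMemory"),
    "base64_candidate": re.compile(r"^[A-Za-z0-9+/]{24,}={0,2}$"),
}

def categorize(s: str) -> List[str]:
    """Return every category whose pattern matches the string."""
    return [name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(s)]

for s in ["http://10.0.0.5/gate.php",
          r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
          "VirtualAllocEx",
          "TVqQAAMAAAAEAAAA//8AALgAAAAAAAAA"]:
    print(categorize(s), s)
```

The last sample begins with "TVqQ", which is how the MZ magic bytes appear once Base64-encoded, so a string like that is worth decoding.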

String Analysis Workflow

Efficient string analysis follows a structured workflow that moves from broad extraction to targeted investigation:

String analysis workflow — from extraction through categorization, pivoting, and IOC generation

Step 1: Extract — Run strings (ASCII and UTF-16) and FLOSS on the binary. Pipe output to a file for reference.

strings -n 6 sample.exe > strings_ascii.txt
strings -n 6 -e l sample.exe > strings_utf16.txt
floss sample.exe > strings_floss.txt

Step 2: Filter noise — Remove common library strings, compiler artifacts, and Windows API boilerplate. Focus on unique, unusual, or contextually suspicious strings.

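# Network indicators (searches the ASCII, UTF-16, and FLOSS output files from Step 1)
grep -iE '(http|ftp|\.[a-z]{2,4}/|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)' strings_*.txt

# Injection and download APIs
grep -iE '(CreateRemoteThread|VirtualAlloc|WriteProcessMemory|URLDownload)' strings_*.txt

# Persistence mechanisms
grep -iE '(CurrentVersion\\Run|schtasks|cron)' strings_*.txt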

Step 3: Categorize — Group findings into network IOCs, file system IOCs, behavioral indicators, and encoded data.

Step 4: Pivot — Take discovered IOCs and search for them in threat intelligence platforms. A URL found in strings can be checked in VirusTotal, MISP, or URLhaus. An API call pattern can be matched against known malware family profiles.

Step 5: Document. Record every finding with the file offset where the string was found (GNU strings prints each string's hex offset when run with -t x), the category, and its significance.
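One lightweight way to handle the documentation step is to keep findings in a structured form that can be exported. The record layout below (offset, category, value, significance) mirrors the fields named in Step 5; the `Finding` class, the CSV format, and the sample row are assumptions for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Finding:
    offset: int        # file offset where the string was found
    category: str      # e.g. network, file system, persistence
    value: str         # the string itself
    significance: str  # analyst's note on why it matters

def findings_to_csv(findings: Iterable[Finding]) -> str:
    """Serialize findings to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["offset", "category", "value", "significance"])
    for f in findings:
        writer.writerow([f"0x{f.offset:X}", f.category, f.value, f.significance])
    return buf.getvalue()

# Invented example row, not from a real sample.
report = findings_to_csv([
    Finding(0x4A20, "network", "http://example.test/gate.php", "possible C2 URL"),
])
print(report)
```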

Connecting Static Analysis to Your Toolkit

Static analysis does not exist in isolation. Every finding connects to tools you already know:

| Finding | Next Step | Tool |
|---|---|---|
| Suspicious string pattern | Write a detection rule for it | YARA (Module 10) |
| Base64-encoded payload | Decode and analyze the payload | CyberChef |
| C2 domain or IP | Search threat intelligence feeds | MISP (Module 5) |
| Compilation timestamp | Correlate with campaign timelines | MISP timeline / ATT&CK |
| API call pattern | Create endpoint detection | Velociraptor (Module 6) |
| File hash (MD5/SHA256) | Check reputation databases | VirusTotal / MalwareBazaar |

Tip: YARA and static analysis are natural partners. In Module 10, you wrote YARA rules that match on strings and hex patterns. Every suspicious string you extract during static analysis is a candidate for a YARA rule. In Lab 11.1, you will practice the full loop: extract strings → write a YARA rule → scan a directory for additional samples matching the same patterns.
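As a sketch of the "extract strings, then write a YARA rule" step, the helper below turns a list of extracted strings into rule text. The function name, the rule name, and the "N of them" threshold are illustrative choices; a rule generated this way is a draft to be reviewed and tested against clean files, not a finished detection.

```python
from typing import Iterable

def build_yara_rule(rule_name: str, strings: Iterable[str], min_matches: int = 2) -> str:
    """Build draft YARA rule text that matches PE files containing the given strings."""
    items = list(strings)
    if not items:
        raise ValueError("at least one string is required")
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for i, s in enumerate(items, start=1):
        # YARA text strings need backslashes and double quotes escaped.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}" ascii wide')
    threshold = min(min_matches, len(items))
    lines += [
        "    condition:",
        # uint16(0) == 0x5A4D checks for the MZ magic at the start of the file.
        f"        uint16(0) == 0x5A4D and {threshold} of them",
        "}",
    ]
    return "\n".join(lines)

rule = build_yara_rule("Lab_Sample_Strings", [
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
    "http://example.test/gate.php",
])
print(rule)
```

The ascii wide modifiers make each string match in both single-byte and UTF-16LE form, which lines up with running strings in both modes during extraction.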

Linux ELF Binaries: The Other Side

While PE is the dominant format on Windows, Linux malware uses the ELF (Executable and Linkable Format). The same static analysis principles apply:

file suspicious_binary
# suspicious_binary: ELF 64-bit LSB executable, x86-64, dynamically linked

readelf -h suspicious_binary     # ELF header (entry point, architecture, type)

readelf -S suspicious_binary     # section headers (similar to PE sections)

readelf -d suspicious_binary     # dynamic section (shared library dependencies)

strings -n 8 suspicious_binary | grep -iE "(http|/tmp/|/bin/|socket|connect)"

| PE Concept | ELF Equivalent |
|---|---|
| .text section | .text section |
| .data section | .data / .bss sections |
| .rsrc section | No direct equivalent (resources handled differently) |
| Import Address Table | .dynsym / .plt (dynamic symbols and procedure linkage table) |
| PE header | ELF header (readelf -h) |
| DLL dependencies | Shared library dependencies (ldd or readelf -d) |
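For comparison with the PE examples, the ELF header fields that readelf -h reports can also be read directly. This sketch assumes the standard ELF layout (class and byte order in e_ident, then e_type, e_machine, and e_entry); `parse_elf_header` is a hypothetical name, only a few machine types are mapped, and the sample bytes are synthetic.

```python
import struct

ELF_MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}

def parse_elf_header(data: bytes) -> dict:
    """Read class, type, machine, and entry point from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    is_64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    # e_entry sits at offset 24 and is 8 bytes wide in ELF64, 4 bytes in ELF32.
    (entry,) = struct.unpack_from(endian + ("Q" if is_64 else "I"), data, 24)
    return {
        "class": "ELF64" if is_64 else "ELF32",
        "type": ELF_TYPES.get(e_type, f"other ({e_type})"),
        "machine": ELF_MACHINES.get(e_machine, f"unknown (0x{e_machine:X})"),
        "entry_point": entry,
    }

# Synthetic 64-bit little-endian header for illustration only.
elf_sample = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
              + struct.pack("<HHIQ", 2, 0x3E, 1, 0x401000))
print(parse_elf_header(elf_sample))
```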

Key Takeaways

  • Static analysis examines a binary without executing it — it is always the first step because it is safe, fast, and often reveals enough to classify a sample
  • The PE format has a predictable structure: DOS header (MZ magic), PE header (compilation timestamp, entry point, characteristics), section table, and section data
  • Section anomalies reveal packing and tampering: writable+executable sections, entry points outside .text, unusual section names, or entropy mismatches
  • Compilation timestamps provide timeline intelligence but are trivially spoofed — always cross-reference with other metadata
  • String extraction using strings and FLOSS is the highest-value static technique: URLs, IPs, API calls, registry keys, and encoded data all reveal malware intent
  • Follow a structured string analysis workflow: extract → filter → categorize → pivot → document
  • Every static finding connects to your existing toolkit: strings feed YARA rules, encoded data feeds CyberChef, network IOCs feed MISP, API patterns feed Velociraptor hunts
  • ELF binaries on Linux follow the same analysis principles — use readelf, file, and strings instead of PE-specific tools

What's Next

You now know how to examine a binary's structure and extract strings — the "what is this file made of?" question. But two critical questions remain: "Have we seen this file before?" and "Is this file hiding something?" In Lesson 11.2, you will learn to hash files for reputation lookups, detect packers that compress and encrypt code, and analyze the Import Address Table to understand what Windows APIs a binary calls — the next layer of static analysis that separates commodity malware from sophisticated threats.

Knowledge Check: PE Structure & String Analysis

10 questions · 70% to pass

1. What is the primary advantage of static analysis over dynamic analysis as the first step in malware investigation?

2. What are the first two bytes (magic number) of every valid PE file?

3. Which PE section typically contains the executable code of a binary?

4. In Lab 11.1, you extract strings from a PE binary and find the string 'CreateRemoteThread'. What category of suspicious activity does this API call indicate?

5. What advantage does FLOSS provide over the standard 'strings' command?

6. You find a PE section with both the writable and executable characteristics flags set. Why is this a red flag?

7. During string analysis of a suspicious binary in Lab 11.1, you find 'SOFTWARE\Microsoft\Windows\CurrentVersion\Run'. What does this indicate?

8. Why should compilation timestamps in PE headers be treated with caution during analysis?

9. What is the correct order of steps in a string analysis workflow for malware triage?

10. Which command extracts UTF-16 Little Endian strings from a binary on Linux (a critical step, since Windows binaries often store strings in this encoding)?
