In the last post, I built a basic JavaScript VM that compiles detection logic into bytecode. Push, pop, branch - stack machine fundamentals. It worked, but the VM itself was still readable JavaScript. Beautify it, read the switch cases, map the opcodes. Twenty minutes and you’ve got a decompiler.

This post is about what happens when you start stacking protection on top of that foundation. Each layer targets a different attack surface, and the compounding effect is what makes production VMs like Shape and BotGuard so painful to reverse.


The Build Pipeline

Before getting into individual layers, it’s worth understanding how the system is structured. Each protection is a standalone transform that takes bytecode in and spits modified bytecode out. They chain together:

Source JS → Compile → Fuse → Dead Code → Shuffle → Self-Modify → Encrypt → Output

Each step is a module in /protection/ with a single responsibility. Toggle any of them on or off in the config. This matters because when something breaks (and it will break), you need to isolate which layer caused it. I learned this the hard way after spending an hour debugging what turned out to be a shuffle pass corrupting jump addresses because it didn’t know about a new opcode’s operand size.

The config looks like this:

const config = {
    dispatchMode: 'function_table',
    useIndirectDispatch: true,
    shuffleHandlers: true,
    deadCode: true,
    encrypt: true,
    encryptSections: 3,
    shuffleOps: true,
    fusion: true,
    selfModify: true,
}

Everything composable. Everything toggleable.
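As a sketch of what that chaining could look like - the pass names and the `applyProtections` helper are illustrative, not the actual module API:

```javascript
// Illustrative sketch: each pass maps bytecode -> bytecode, and the config
// gates which passes run. The pass bodies here are toys; real passes
// rewrite instructions.
function applyProtections(bytecode, config, passes) {
    return passes.reduce(
        (bc, [name, fn]) => (config[name] ? fn(bc) : bc),
        bytecode
    );
}

const passes = [
    ['deadCode', bc => [...bc, 0]],            // pretend junk at the end
    ['encrypt',  bc => bc.map(b => b ^ 0x55)], // pretend single-key XOR
];
```

Flipping one flag in the config reruns the chain minus that pass, which is exactly how you isolate the layer that corrupted the output.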


Function Table Dispatch

The first version of my VM used a switch statement. Every bytecode VM tutorial does this:

while (true) {
    switch (bytecode[pc++]) {
        case 1: /* CONST */ break;
        case 2: /* ADD */ break;
        // ...
    }
}

The problem is obvious when you think about it from the attacker’s perspective. Each case label is an opcode. The structure of the switch literally maps opcodes to operations. An attacker reads the cases and has a complete instruction set reference.

Function table dispatch replaces the switch with an array of anonymous functions:

const h = new Array(256);
h[87] = function() { stack.push(constants[bytecode[pc++]]); };
h[193] = function() { const r=stack.pop()+stack.pop(); stack.push(r); };
// ...
while (true) { h[bytecode[pc++]](); }

No case labels. No structure telling you which index maps to what. Every handler is just a function at some position in a 256-element array. You can still figure it out by reading each function, but it’s more work - and more importantly, it’s amenable to further obfuscation.


Indirect Dispatch

Function table dispatch is better, but the dispatch loop itself is still readable: h[bytecode[pc++]](). An attacker knows that whatever number comes out of the bytecode array is a direct index into the handler table.

Indirect dispatch adds a mathematical transformation:

while (true) {
    const r = h[(bytecode[pc++] * 35 + 224) & 255]();
    if (r === -1) return variables;
}

Now the raw bytecode values don’t directly correspond to handler indices. The value 47 in the bytecode doesn’t call h[47] - it calls h[(47 * 35 + 224) & 255], which works out to h[77]. The multiplier and offset change every build.

This is a small thing, but it breaks the most basic assumption a reverse engineer makes: that bytecode values are opcode indices. Every tool, every script they’ve written that assumes direct mapping needs to be updated. The formula itself is easy to extract once you find it, but you have to find it first.
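One detail the formula has to get right: it’s an affine transform mod 256, which is only a bijection when the multiplier is odd (coprime to 256) - otherwise two opcodes would collide on the same handler slot. A sketch of the build-side placement and its inverse; the inverse constant 139 is the modular inverse of 35 and is tied to this build’s parameters:

```javascript
// Build side: put each handler at the slot the runtime formula will compute.
const MUL = 35, OFF = 224;   // regenerated per build; MUL must be odd
const slotFor = op => (op * MUL + OFF) & 255;

// Because MUL is odd it has a modular inverse mod 256 (139 for 35),
// so the mapping never collides and can be undone.
const INV = 139;             // 35 * 139 = 4865 = 19 * 256 + 1
const opFor = slot => (INV * (slot - OFF)) & 255;
```

This inverse is also exactly what an attacker recovers once they find the dispatch loop: one multiply-add and the whole mapping falls out.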


Handler Diversity

Here’s something that bugged me about most VM tutorials. Every handler has exactly one implementation. ADD always looks the same. Once you’ve identified it, you’ve identified it forever.

My handlers randomly select from multiple equivalent implementations at build time:

ADD: (mode) => pick([
    `const a=stack.pop();stack.push(a+stack.pop());`,
    `const b=stack.pop();const a=stack.pop();stack.push(a+b);`,
    `const r=stack.pop()+stack.pop();stack.push(r);`,
    `{let b=stack.pop(),a=stack.pop();stack.push(a+b);}`,
]),

Four ways to add two numbers. Same result, different code shape. Build twice, get different handler bodies. Any pattern-matching tool that identifies ADD by its code structure matches one variant and misses the others.
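A minimal sketch of the build-time selection - the variant list is from above, while `pick` itself is illustrative:

```javascript
// Build-time sketch: choose one equivalent handler body per build.
const pick = variants => variants[Math.floor(Math.random() * variants.length)];

const ADD_VARIANTS = [
    `const a=stack.pop();stack.push(a+stack.pop());`,
    `const b=stack.pop();const a=stack.pop();stack.push(a+b);`,
    `const r=stack.pop()+stack.pop();stack.push(r);`,
    `{let b=stack.pop(),a=stack.pop();stack.push(a+b);}`,
];

const addBody = pick(ADD_VARIANTS);  // different shape on different builds
```

All four bodies leave the same value on the stack, which is easy to sanity-check at build time by running each through `new Function('stack', body)`.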

Combine that with opcode shuffling - where ADD might be opcode 87 in one build and opcode 12 in the next - and you’ve got a situation where static analysis tools have an extremely short shelf life.


Superoperator Fusion

Some instruction sequences show up constantly. CONST followed by STORE (initializing a variable), LOAD followed by CONST followed by ADD (incrementing). Standard compiler optimization is to fuse these into single “superoperators”:

// Before: CONST 0, STORE 0 (two instructions, four bytes)
// After:  CONST_STORE 0 0 (one instruction, three bytes)

The performance benefit is real - fewer dispatch cycles, fewer stack operations. But the obfuscation benefit is what matters here. A reverse engineer expecting a standard instruction set now has to deal with compound operations that don’t exist in any reference. Their disassembler doesn’t know CONST_STORE takes two operands. Their tracer misinterprets the bytecode layout. Everything downstream breaks.

I implemented four fused operations:

CONST_STORE: 26,      // push constant + store to slot
LOAD_CONST_ADD: 27,   // load variable + push constant + add
LOAD_CONST_LT: 28,    // load variable + push constant + compare
LOAD_CONST_SUB: 29,   // load variable + push constant + subtract

These cover the hot paths in loop-heavy code. A simple for loop that used to be 8+ instructions per iteration drops to 3-4.
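A sketch of what such a fusion pass could look like. The CONST and STORE opcode numbers and the width table are assumptions - only CONST_STORE’s value comes from the list above - and a real pass also has to fix up jump targets after the stream shrinks:

```javascript
// Illustrative opcode numbers; CONST_STORE matches the post, the rest are assumed.
const CONST = 1, STORE = 5, CONST_STORE = 26;
const WIDTH = { [CONST]: 2, [STORE]: 2, [CONST_STORE]: 3 };  // opcode + operands

// One pass over a linear bytecode array:
// CONST k, STORE v  (4 bytes)  ->  CONST_STORE k v  (3 bytes)
function fuseConstStore(bytecode) {
    const out = [];
    for (let i = 0; i < bytecode.length; ) {
        if (bytecode[i] === CONST && bytecode[i + 2] === STORE) {
            out.push(CONST_STORE, bytecode[i + 1], bytecode[i + 3]);
            i += 4;                   // consumed both instructions
        } else {
            const w = WIDTH[bytecode[i]] || 1;  // step by instruction width
            out.push(...bytecode.slice(i, i + w));
            i += w;
        }
    }
    return out;
}
```

Stepping by instruction width (rather than byte-by-byte) is what keeps the scanner from mistaking an operand byte for an opcode - the same operand-size problem that bites the shuffle and encryption passes.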


Dead Code Insertion

This one is straightforward but effective. Between real instructions, insert fake instructions that never execute because they’re preceded by unconditional jumps:

CONST 0          ; real
STORE 0          ; real
JUMP [skip]      ; jump over garbage
CONST 3          ; dead - never reached
ADD              ; dead
LOAD 1           ; dead
[skip]:
CHECK_TIME       ; real - execution continues here

The dead instructions are syntactically valid. They reference real constants, real variable slots. A static analysis tool can’t distinguish them from real code without doing control flow analysis - which means building a CFG, which means understanding the jump targets, which means dealing with all the other obfuscation layers first.

I insert dead code blocks of 1-3 instructions after each real instruction, with a configurable density. More dead code means bigger bytecode and slower execution, but harder static analysis. It’s a knob you can tune.
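A sketch of the insertion pass, working on whole instructions so a junk block can never split an operand. The opcode values, the one-byte absolute jump target, and the junk pool are all illustrative; the real pass emits junk with plausible operands and fixes up pre-existing jump targets afterwards:

```javascript
const JUMP = 9;
const JUNK_OPS = [2, 3, 4];   // e.g. ADD / LOAD / NEG - illustrative values

function insertDeadCode(instrs, density, rng = Math.random) {
    // instrs: already-encoded instructions, e.g. [[CONST, 0], [STORE, 0]].
    const out = [];
    for (const ins of instrs) {
        out.push(...ins);
        if (rng() < density) {
            const junk = JUNK_OPS.slice(0, 1 + Math.floor(rng() * JUNK_OPS.length));
            // Jump target = first byte past the junk block.
            out.push(JUMP, out.length + 2 + junk.length, ...junk);
        }
    }
    return out;
}
```

Passing in the RNG makes the pass reproducible in tests - the same seed yields the same bytecode, which matters when you’re diffing builds to find a corrupting layer.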


Multi-Section Bytecode Encryption

Single-key encryption is a solved problem. Find the key, XOR the bytecode, done. Multi-section encryption makes this harder by splitting the bytecode into independently encrypted sections:

[DECRYPT header: seed=0xA3F1, length=24]
[24 bytes of encrypted bytecode - section 1]
[DECRYPT header: seed=0x7B22, length=18]
[18 bytes of encrypted bytecode - section 2]
[DECRYPT header: seed=0x5E09, length=31]
[31 bytes of encrypted bytecode - section 3]

Each section has its own seed and must be decrypted separately. The DECRYPT handler uses a PRNG that evolves as it decrypts - each byte’s key depends on all previous bytes in that section. This forces sequential processing. You can’t jump to the middle and decrypt from there.

The tricky part was making DECRYPT self-disabling. In a program with loops, the PC hits the same DECRYPT header multiple times. If it re-encrypts already-decrypted bytecode, everything explodes. The fix: the handler reads its length byte, then zeros it out before decrypting. Subsequent passes see length 0 and skip the section as a no-op.

DECRYPT: () => `const sh=bytecode[pc++];const sl=bytecode[pc++];
    let st=(sh<<8)|sl;const l=bytecode[pc++];
    bytecode[pc-1]=0;  // self-disable
    for(let i=0;i<l;i++){
        bytecode[pc+i]=bytecode[pc+i]^(st&0xFF);
        st=(st*31+7)&0xFFFF;
    }`,

This was one of the harder debugging sessions. Jump addresses need to account for the inserted DECRYPT headers, which shift everything forward. I built a centralized bytecode_utils.js that tracks operand sizes for every opcode, so the address fixup pass knows exactly how wide each instruction is.
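Because XOR is its own inverse, the build-side encryptor is just the same PRNG run forward. A sketch - the section layout matches the handler above, minus the DECRYPT opcode byte itself, and the helper names are mine:

```javascript
// Build side: emit [seedHi, seedLo, length, ...cipher] for one section.
function encryptSection(plain, seed) {
    let st = seed & 0xFFFF;
    const out = [(seed >> 8) & 0xFF, seed & 0xFF, plain.length];
    for (const b of plain) {
        out.push(b ^ (st & 0xFF));
        st = (st * 31 + 7) & 0xFFFF;   // same PRNG step as the DECRYPT handler
    }
    return out;
}

// Mirror of the handler's loop, useful for round-trip checks in tests.
function decryptSection(section) {
    let st = (section[0] << 8) | section[1];
    return section.slice(3, 3 + section[2]).map(b => {
        const p = b ^ (st & 0xFF);
        st = (st * 31 + 7) & 0xFFFF;
        return p;
    });
}
```

The evolving state is what forces sequential decryption: byte N’s key depends on N PRNG steps, so you can’t derive the key for the middle of a section without walking the whole prefix.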


Self-Modifying Opcodes

This is the layer I’m most happy with. Some handler functions aren’t installed in the handler table at build time. Instead, they’re stored as encrypted strings in the constants array. At runtime, before the main program executes, INSTALL_HANDLER instructions decrypt and eval them into the handler table:

INSTALL_HANDLER: () => `const idx=bytecode[pc++];const key=bytecode[pc++];
    let a=constants[idx];let d='';
    for(let i=0;i<a.length;i++) d+=String.fromCharCode(a[i]^key);
    eval(d);`,

An attacker doing static analysis sees an incomplete handler table. Some indices are empty. The code that fills them is buried in encrypted constant data that only makes sense at runtime. They can’t just read the output.js and build a complete opcode map - they have to actually execute (or simulate) the install phase first.

The build system decides which handlers to defer based on complexity. Simple handlers like CONST and STORE stay inline. More interesting ones - the comparison operators, the arithmetic, the control flow - get deferred. Each build randomly selects which handlers to defer, so the split is different every time.
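On the build side, deferring a handler just means encrypting its source into the constants pool and emitting the install instruction. A sketch - the opcode value and helper shape are assumptions; the XOR scheme mirrors INSTALL_HANDLER above:

```javascript
const INSTALL_HANDLER = 30;   // illustrative opcode value

// Encrypt a handler's source into the constants pool and return the
// instruction bytes that will install it at runtime.
function deferHandler(slot, body, key, constants) {
    const source = `h[${slot}]=function(){${body}};`;
    const idx = constants.push(
        Array.from(source, ch => ch.charCodeAt(0) ^ key)
    ) - 1;
    return [INSTALL_HANDLER, idx, key];   // opcode, constant index, XOR key
}
```

The source embeds its own slot assignment (`h[87]=...`) so the eval on the runtime side lands the function in the right table entry without INSTALL_HANDLER needing to know anything about it.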


Putting It All Together

Here’s what the output looks like with everything enabled:

function run(bytecode,constants){
const stack=[];const variables=[];let pc=0;const callStack=[];let lastCheck=0;
const h=new Array(256);
h[199]=function(){/* CHECK_HOOKS */};
h[181]=function(){/* GET_PROP */};
h[87]=function(){stack.push(constants[bytecode[pc++]]);};
h[77]=function(){/* DECRYPT */};
// ... 15 more inline handlers ...
while(true){const r=h[(bytecode[pc++]*35+224)&255]();if(r===-1)return variables;}
}
const BYTECODE=[47,120,238,27,243,220,94,100,168,202,19,190...];
const CONSTANTS=[0,0,10,1,1,[120,75,34,33,34,77,45,118...],...];

The bytecode is encrypted in multiple sections. Some handlers are missing from the table - they get installed from encrypted constants during the first few instructions. The dispatch uses indirect addressing through a mathematical formula. The opcode values are shuffled. Dead code is interleaved throughout.

An attacker looking at this has to:

  1. Find the dispatch formula to map bytecode values to handler indices
  2. Read each handler function to build an instruction set
  3. Notice that some handlers are missing and trace the INSTALL_HANDLER calls
  4. Decrypt the deferred handlers from the constants array
  5. Decrypt each bytecode section with its own seed
  6. Filter out dead code blocks
  7. Account for fused superoperators in their disassembler
  8. Deal with the fact that all of the above changes on the next build

Each layer alone is beatable. Stacked together, they compound. That’s the whole point.


What’s Next

The VM itself is one piece. The other piece is what it protects. Right now my compiler handles enough JavaScript to run real detection logic - variables, arithmetic, comparisons, loops, if/else, function calls, browser API access via method calls. Today I added i++/i--, !==/>=/<=, typeof, unary operators, and console.log() style method invocations.

The language coverage matters because a VM that can only add two numbers isn’t protecting anything real. The goal is compiling actual fingerprinting code - canvas hashing, WebGL renderer strings, font enumeration - into bytecode that runs through all these obfuscation layers.

On the anti-debugging side, I experimented with environment checks (detecting hooked console methods, headless browsers, webdriver flags) and timing-based debugger detection. The environment checks work but are brittle - a sophisticated attacker can spoof any API. Talking to people who’ve built production VMs, the consensus is that instruction dispatch latency and bytecode checksums are the two that actually matter. Latency catches debuggers, checksums catch patching. Everything else is noise.

I’ll be looking at bytecode integrity checksums next, along with fake source maps and stack trace pollution - tricks that target the attacker’s tooling rather than trying to detect the attacker directly.


Source: private repository (VM code not public while in development)