I’ve spent months reversing other people’s VMs - TikTok, OpenAI Sentinel. At some point you start wondering: how hard is it to actually build one?

Turns out, not that hard. But more importantly, building one taught me things that reversing never did.


The Problem with Readable Code

Here’s the dirty secret of client-side bot detection: if an attacker can read your checks, they can bypass them.

Take a basic webdriver detection:

if (navigator.webdriver === true) {
    blockUser();
}

An attacker opens DevTools, searches for “webdriver”, finds this line, and knows exactly what to spoof. Game over in 30 seconds.
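The bypass itself is a one-liner. Here's a sketch using a stand-in `nav` object (in a real browser the property lives on the Navigator prototype, but the mechanics are the same):

```javascript
// What a headless browser reports:
const nav = { webdriver: true };

// The spoof: replace the property with a getter that lies.
// Object.defineProperty works here because the property is configurable.
Object.defineProperty(nav, 'webdriver', { get: () => false });

// nav.webdriver now reads false, so the check above never fires.
```

One console command and the check is dead, which is exactly why the check can't be allowed to sit in readable source.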

This is why serious antibot vendors don’t ship readable JavaScript. They ship bytecode.


Why VMs Win

When I first started reversing TikTok’s protection, I expected obfuscated JavaScript. Variable renaming, string encoding, control flow flattening - the usual tricks.

What I found instead was this:

const BYTECODE = [47,0,89,1,23,47,2,89,3,156,12,47,4...];

Numbers. Hundreds of them. No function names, no string literals, no recognizable structure. Just an array of integers fed into an interpreter.

The detection logic - all the fingerprinting, all the checks - compiled down to opcodes that mean nothing without the VM source. And even with the VM source, you’re reading assembly, not JavaScript.

This is the same approach used by Shape Security, Google BotGuard, and most serious antibot vendors. They all converged on the same solution. That’s not a coincidence.


What Makes VMs Hard to Reverse

Three things compound to make VM-protected code painful:

1. Loss of semantics. In normal JS, a function called checkWebdriver tells you what it does. In bytecode, it’s just [6, 0, 7, 1, 1, 2, 8]. You have to trace through the VM execution to understand what those numbers mean.

2. Custom instruction sets. Every vendor uses different opcodes. TikTok’s 47 might mean PUSH_CONST. OpenAI Sentinel’s 47 might mean something completely different. Your decompiler for one doesn’t work on the other.

3. Server-controlled updates. The bytecode comes from the server. They can change the detection logic, shuffle opcodes, add new checks - all without pushing any client update. By the time you’ve reversed today’s version, they’ve deployed a new one.


Building My Own

I wanted to understand VMs from the inside. Not just “how do I reverse this” but “why did they design it this way.”

The core is deceptively simple. A compiler that turns JavaScript into bytecode:

var x = 5 + 10

Becomes:

CONST 0      // push 5
CONST 1      // push 10  
ADD          // pop both, push 15
STORE 0      // save to variable slot 0
HALT

And a VM that executes it - basically a switch statement in a while loop. Push, pop, branch, repeat.
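As a sketch, here's what that loop looks like in JavaScript. The opcode numbers and names are illustrative, not the ones from my build:

```javascript
// Minimal stack VM: a switch in a while loop, matching the
// CONST/ADD/STORE/HALT listing above. Opcode values are arbitrary.
const OP = { CONST: 0, ADD: 1, STORE: 2, HALT: 3 };

function runVM(bytecode, constants) {
  const stack = [];
  const vars = [];
  let pc = 0; // program counter
  while (pc < bytecode.length) {
    switch (bytecode[pc++]) {
      case OP.CONST: stack.push(constants[bytecode[pc++]]); break;     // push constant by index
      case OP.ADD: { const b = stack.pop(); stack.push(stack.pop() + b); break; }
      case OP.STORE: vars[bytecode[pc++]] = stack.pop(); break;        // save to variable slot
      case OP.HALT: return vars;                                       // stop execution
    }
  }
  return vars;
}

// var x = 5 + 10  →  CONST 0, CONST 1, ADD, STORE 0, HALT
const vars = runVM([OP.CONST, 0, OP.CONST, 1, OP.ADD, OP.STORE, 0, OP.HALT], [5, 10]);
// vars[0] holds 15
```

Each opcode is a few lines. The whole interpreter fits on a screen, which is part of the appeal.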

I started with 5 opcodes. By the end of the day I had 20. That’s enough to compile real detection logic:

var score = 0
var isWebdriver = navigator.webdriver === true
var pluginCount = navigator.plugins.length

if (isWebdriver) { score += 100 }
if (pluginCount === 0) { score += 30 }
if (pluginCount < 3) { score += 10 }

The output? An array of numbers and a constants table. The checks are there, but you’d never know it from looking.


The Token Trick

Detection alone isn’t enough. You need to communicate results to the server in a way that can’t be faked.

The naive approach: send a headlessScore field. Attacker intercepts, changes it to 0, done.

Better approach: generate a token that’s mathematically tied to the fingerprint data.

var token = score * 7919 + width * 31 + height * 17 + pluginCount * 13
token = token ^ 48879313  // XOR with secret
token = token % 1000000   // bound to 6 digits

The server knows the formula. It receives the fingerprint data and the token. It recalculates what the token should be. If they don’t match, someone tampered.

The attacker can’t just change score to 0 anymore. They’d also need to recalculate the token. But they don’t know the formula - it’s buried in bytecode. And even if they reverse it, the constants can change on the next build.
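Server side, validation is a single recomputation. A sketch assuming the exact formula above (in production the constants would rotate per build):

```javascript
// Server-side mirror of the client formula. The constants (7919, 48879313, ...)
// are the ones from the snippet above and would change on every build.
function makeToken({ score, width, height, pluginCount }) {
  let token = score * 7919 + width * 31 + height * 17 + pluginCount * 13;
  token = token ^ 48879313;                       // XOR with the build's secret
  return ((token % 1000000) + 1000000) % 1000000; // bound to 6 digits, keep non-negative
}

function validateToken(fingerprint, token) {
  return makeToken(fingerprint) === token;
}
```

Zeroing out `score` while keeping the old token fails validation immediately; the attacker has to forge both, and the formula lives in bytecode.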


Opcode Shuffling

Here’s something I didn’t appreciate until I built it: randomizing opcodes is trivially easy and surprisingly effective.

In my first version, CONST was always opcode 1. An attacker could pattern-match: “lots of 1s followed by small numbers? Probably pushing constants.”

One shuffle function later:

// Build 1
{"CONST":143,"ADD":17,"STORE":148,"LOAD":169...}

// Build 2
{"CONST":12,"ADD":89,"STORE":203,"LOAD":7...}

Same source code. Completely different bytecode. Any static analysis tool tuned to Build 1 is useless against Build 2.
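The shuffle itself fits in a few lines. A sketch — the mnemonic list is abbreviated, and a real build would also re-emit all the bytecode under the new mapping:

```javascript
// Per-build opcode shuffle: assign each mnemonic a unique random byte.
// Fisher-Yates over the 0..255 pool guarantees no collisions.
function shuffleOpcodes(names, rand = Math.random) {
  const pool = Array.from({ length: 256 }, (_, i) => i);
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return Object.fromEntries(names.map((name, i) => [name, pool[i]]));
}

const build1 = shuffleOpcodes(['CONST', 'ADD', 'STORE', 'LOAD', 'HALT']);
// e.g. {"CONST":143,"ADD":17,"STORE":148,...} — different every run
```

The compiler and VM both read from this table, so nothing else in the pipeline changes.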

This is standard practice for serious antibot vendors. Rotate the opcode mappings regularly, and generic solvers have a shelf life measured in days.


Integrating with Real Detection

I had an existing project - antibot-sim - with detection logic sitting in plain JavaScript. Anyone could read exactly what I was checking.

Moving it into the VM took maybe 20 minutes. Load the compiled bytecode, grab the token from window.VM_RESULT, send it with the form. Server validates.

The detection logic is now opaque. An attacker looking at my JavaScript sees:

const BYTECODE = [143,0,148,0,54,1,152,2,143,3,216...];

They’d have to reverse the VM, trace through execution, figure out what’s being checked. Doable, but it went from 30 seconds to 30 minutes. For a determined attacker, that’s a speedbump. For script kiddies copying code from GitHub, it’s a wall.


The WASM Ceiling

JavaScript VMs have a fundamental weakness: the VM itself is JavaScript. Beautify it, read it, understand the opcodes.

This is where WebAssembly enters.

I ported my VM to C and compiled it to WASM as a proof of concept. Same logic, binary output. Now an attacker needs WASM disassembly tools, not just a JS beautifier.

#include <emscripten.h>

/* Illustrative opcode values — a real build shuffles these. */
enum { OP_CONST = 1, OP_ADD = 2, OP_HALT = 3 };

EMSCRIPTEN_KEEPALIVE
int run_vm(unsigned char* bytecode, int* constants, int len) {
    int stack[256];
    int sp = 0;  /* stack pointer */
    int pc = 0;  /* program counter */

    while (pc < len) {
        switch (bytecode[pc++]) {
            case OP_CONST: stack[sp++] = constants[bytecode[pc++]]; break;
            case OP_ADD: { int b = stack[--sp]; stack[sp-1] += b; break; }
            case OP_HALT: return stack[--sp];
        }
    }
    return -1;
}

This is what BotGuard does. This is what Shape does. The VM interpreter compiled to WASM, bytecode served from the server, everything wrapped in additional obfuscation.

Getting 5 + 10 = 15 from a WASM VM took me about 15 minutes. Scaling that to full detection logic would take longer, but the path is clear.


What I Actually Learned

The complexity isn’t in the VM. Stack machines are simple. Push, pop, branch. The complexity is in everything around it - opcode shuffling, bytecode encryption, integrity checks, anti-debugging. The VM is just the foundation.

Server control is the real moat. My VM source is public. Doesn’t matter. If I control the bytecode, I control the detection logic. I can update checks without touching the client. The attacker is always reversing yesterday’s version.

Building reveals design tradeoffs. Why does TikTok use a stack machine with 77 opcodes when OpenAI gets by with 28? Why does TikTok use RC4 encryption while Sentinel uses simple XOR? When you build one yourself, these questions stop being academic.

The ceiling is high. Production antibot VMs have features I haven’t touched - stack encryption, string obfuscation, timing checks, self-modifying bytecode. Each layer adds reversing time. Stack enough layers and you’ve bought yourself meaningful protection.


Where This Goes

The JS VM works. The WASM version is a proof of concept. To make either production-grade:

  • Stack encryption (XOR values on push, decrypt on pop)
  • Encrypted constants array
  • Handler indirection (function table instead of switch)
  • Integrity verification (detect VM tampering)
  • Anti-debugging (timing checks, breakpoint detection)
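The first item is a good example of how cheap these layers are. A sketch of stack encryption, with an XOR key I made up for illustration:

```javascript
// Stack encryption sketch: values are XORed with a per-build key on push
// and decrypted on pop, so a debugger dumping the raw stack sees garbage.
// STACK_KEY is an illustrative constant; a real build rotates it.
const STACK_KEY = 0x5a3c;

class EncryptedStack {
  constructor() { this.data = []; }
  push(v) { this.data.push(v ^ STACK_KEY); }    // store encrypted
  pop() { return this.data.pop() ^ STACK_KEY; } // decrypt on the way out
}
```

Five lines per layer, and each one adds another thing the reverser has to notice and undo.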

But honestly? For learning purposes, the basic version taught me what I needed. I understand why these systems are built the way they are. Next time I’m reversing one, I’ll know what I’m looking at.


Antibot integration: github.com/B9ph0met/antibot-sim


Content on this site is licensed CC BY-NC-SA 4.0