Reverse Engineering TikTok’s Client-Side Protection: A Deep Dive into webmssdk.js

Introduction

This article documents my journey reverse engineering TikTok’s web-based anti-bot protection system. As someone new to JavaScript reverse engineering, my goal was twofold: to develop a practical understanding of how modern bot protection works from an adversarial perspective, and to document the process in a way that helps others learn these techniques.

The end objective is to fully understand and replicate the generation of TikTok’s request signatures—specifically the X-Bogus and X-Gnarly tokens—which are required to make authenticated API requests to TikTok’s servers.

Why TikTok?

TikTok presents one of the most sophisticated client-side protection systems currently deployed on the web. Unlike simpler anti-bot measures that rely on CAPTCHAs or basic fingerprinting, TikTok implements a custom virtual machine that executes obfuscated bytecode directly in the browser. This VM, contained within a file called webmssdk.js, handles fingerprint collection, request signing, and bot detection.

This makes it an ideal case study for several reasons: the protection is actively maintained and regularly updated, there’s existing community research to reference, and the techniques used represent the current state-of-the-art in client-side security.

What Does TikTok’s Protection Actually Do?

When you browse TikTok’s website, every API request is intercepted and signed before being sent to the server. The protection system:

  1. Collects device fingerprints — Hardware characteristics, browser properties, screen dimensions, installed fonts, WebGL renderer info, and dozens of other signals that help identify your specific device.

  2. Detects automation — Checks for signs of headless browsers (Puppeteer, Playwright), modified browser environments, and inconsistent JavaScript execution patterns.

  3. Signs requests — Generates cryptographic tokens (X-Bogus, X-Gnarly) based on the request parameters and collected fingerprint data. Without valid signatures, API requests return empty responses.

  4. Obfuscates everything — The code responsible for all of this runs inside a custom VM with encrypted bytecode, making static analysis extremely difficult.


Part 1: Initial Reconnaissance

Analyzing the Login Request

The first step in any reverse engineering project is understanding what you’re looking at. I started by examining a login request in Chrome DevTools.

When making a login request to TikTok, the following parameters are included:

Query String Parameters

Parameter | Example Value | Purpose
multi_login | 1 | Login mode flag
did | 7584629183746502891 | Device ID (unique per browser)
locale | en | Language setting
app_language | en | App language
aid | 1459 | Application ID
account_sdk_source | web | SDK source identifier
sdk_version | 2.1.11-tiktokbeta.3 | SDK version
verifyFp | verify_m3xkj7bn_xK503qw_vdUQ_48bl_BcZ8_9vnYtwuCYQDv | Fingerprint verification
shark_extra | {JSON blob} | Extended device info
msToken | PmcQL8oXEU6Bphd1KBt_qSTdDv_m9W07WNvY086ZeKb7kk... | Session token
X-Bogus | DFSzsIVLfyp7IY7tCYMJ1xhGbwJ5 | Request signature
X-Gnarly | M8Fn3UJZ2SfRxcbhzmFlzHmqGy4xOjuvXtlfG14CgSraZily... | Secondary signature

Request Headers

Header | Purpose or observed value
X-Mssdk-Info | Encrypted device fingerprint blob
X-Tt-Passport-Csrf-Token | CSRF protection
X-Tt-Passport-Ttwid-Ticket | Session/auth ticket
Tt-Ticket-Guard-Public-Key | Encryption public key
Tt-Ticket-Guard-Version | Observed value: 2
Tt-Ticket-Guard-Web-Version | Observed value: 1

The shark_extra parameter contains a JSON blob with detailed device information:

{
  "aid": 1459,
  "app_name": "Tik_Tok_Login",
  "channel": "tiktok_web",
  "device_platform": "web_pc",
  "device_id": "7584629183746502891",
  "region": "US",
  "os": "mac",
  "referer": "https://www.google.com/",
  "cookie_enabled": true,
  "screen_width": 2560,
  "screen_height": 1440,
  "browser_language": "en-US",
  "browser_platform": "MacIntel",
  "browser_name": "Mozilla",
  "browser_version": "5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
  "browser_online": true,
  "tz_name": "America/Denver",
  "is_page_visible": true,
  "focus_state": true,
  "is_fullscreen": false,
  "history_len": 6,
  "user_is_login": false,
  "data_collection_enabled": false
}

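This is fingerprinting in action. The blob includes static device attributes such as screen size, platform and time zone. It also includes live page state: whether the page is visible, focused or fullscreen, and how long the history stack is.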
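
Splitting captured requests with code is less error-prone than reading them by eye. The sketch below assumes you have copied the full request URL from the Network panel. parseCapturedUrl is an illustrative helper name. It uses only the standard URL and URLSearchParams APIs, which percent-decode the shark_extra value before it is parsed as JSON.

```javascript
// Split a captured request URL into its query parameters and decode the
// shark_extra JSON blob if present. Uses only the WHATWG URL API.
function parseCapturedUrl(capturedUrl) {
    const url = new URL(capturedUrl);
    const params = Object.fromEntries(url.searchParams); // later duplicates win
    let sharkExtra = null;
    if (params.shark_extra) {
        try {
            sharkExtra = JSON.parse(params.shark_extra);
        } catch {
            // Truncated or non-JSON value: leave sharkExtra as null
            sharkExtra = null;
        }
    }
    return { path: url.pathname, params, sharkExtra };
}

// Usage with a shortened, illustrative URL (not a real capture)
const example = parseCapturedUrl(
    'https://www.tiktok.com/passport/web/user/login/?aid=1459&locale=en' +
    '&shark_extra=' + encodeURIComponent(JSON.stringify({ aid: 1459, os: 'mac' }))
);
console.log(example.params.aid, example.sharkExtra.os); // 1459 mac
```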


Part 2: Setting Up the Deobfuscation Environment

Tools Used

  • Node.js — Runtime for Babel scripts
  • Babel (@babel/core, @babel/parser, @babel/traverse, @babel/generator, @babel/types) — AST manipulation
  • Chrome DevTools — Dynamic analysis and breakpoints
  • Tampermonkey (optional) — Script injection for live debugging

Project Structure

tiktok-re/
├── deobf/
│   ├── deobf.js      # Deobfuscation script
│   ├── vm.js         # Original webmssdk.js
│   └── output.js     # Deobfuscated output
├── node_modules/
└── package.json

Basic Deobfuscator

I started with a simple Babel-based deobfuscator that handles constant folding and boolean simplification:

const fs = require('fs');
const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default;
const generate = require('@babel/generator').default;
const t = require('@babel/types');

const code = fs.readFileSync('./deobf/vm.js', 'utf-8');

const ast = parser.parse(code, {
    sourceType: 'script',
    allowReturnOutsideFunction: true,
});

// Only fold literals that carry a primitive .value
// (t.isLiteral also matches null, regex, template and BigInt literals)
const isFoldable = (node) =>
    t.isNumericLiteral(node) || t.isStringLiteral(node) || t.isBooleanLiteral(node);

traverse(ast, {
    // Transform 1: Constant Folding, e.g. 3 * 4 -> 12, "a" + "b" -> "ab"
    BinaryExpression(path) {
        const { left, right, operator } = path.node;
        if (!isFoldable(left) || !isFoldable(right)) return;

        let result;
        switch (operator) {
            case '+': result = left.value + right.value; break;
            case '-': result = left.value - right.value; break;
            case '*': result = left.value * right.value; break;
            // ... other operators
            default: return;
        }

        // valueToNode turns NaN/Infinity into 0/0 or 1/0; skip them so that
        // a '/' case can't keep re-folding its own output
        if (typeof result === 'number' && !Number.isFinite(result)) return;
        path.replaceWith(t.valueToNode(result));
    },

    // Transform 2: !0 -> true, !1 -> false, void 0 -> undefined
    UnaryExpression(path) {
        const { operator, argument } = path.node;
        if (!t.isNumericLiteral(argument)) return;

        if (operator === '!') {
            path.replaceWith(t.booleanLiteral(!argument.value));
        } else if (operator === 'void' && argument.value === 0) {
            path.replaceWith(t.identifier('undefined'));
        }
    },
});

const output = generate(ast, { comments: false, compact: false });
fs.writeFileSync('./deobf/output.js', output.code);

Running this transformed !0 to true, !1 to false, and evaluated constant expressions, making the code significantly more readable.


Part 3: Understanding the Virtual Machine Architecture

The VM Structure

After deobfuscation, I identified the core VM interpreter function N:

function N(n, t, r, i, o, e) {
    var u = {
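        C: n,      // Instruction pointer (program counter), starts at the entry offset
        o: [],     // Registers/operand stack
        A: [],     // Exception-handler stack (catch jumps to the top entry's .h)
        I: [],     // Local stack frame
        u: t,      // Parent (enclosing) frame; the module table lives in u.u.o
        D: e       // Additional state
    };

    // Initialize registers: 0-3 hold constants (null, undefined, true, false),
    // 5 and 6 receive r and i (the caller's this and arguments in the Part 4 wrappers)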
    for (u.o[0] = null, u.o[1] = undefined, u.o[2] = true, u.o[3] = false, 
         u.o[4] = m, u.o[5] = r, u.o[6] = i; u.C < /*bytecode length*/; ) {
        
        // Fetch 16-bit opcode
        var f = k[u.C++] << 8 | k[u.C++];
        
        try {
            I[f](u);  // Execute opcode handler
        } catch (n) {
            if (0 === u.A.length) throw n;
            u.I = [], u.I.push({...});
            u.C = u.A[u.A.length - 1].h;  // Jump to exception handler
        }
    }
    return U(u, 4);  // Return value from register 4
}

Key Components

Component | Variable | Purpose
Instruction Pointer | u.C | Current position in bytecode
Registers | u.o | Stack-based operand storage
Call Stack | u.A | Exception/call frame stack
Local Frame | u.I | Current stack frame
Bytecode | k | The actual instructions (82,852 bytes)
Opcode Handlers | I | Array of 349 handler functions

Opcode Handlers

The I array contains 349 handler functions, each implementing one VM instruction:

I = [
    function (n) {  // Opcode 0
        var r = n.o[6][0];
        n.o[4] = (n.u.o[14].v = "function" == typeof Symbol...
    },
    function (n) {  // Opcode 1
        var t = B(n), r = B(n), i = B(n);
        F(n, r, U(n, B(n))), F(n, t, U(n, i));
    },
    // ... 347 more handlers
];

Helper Functions

  • B(n) — Reads next value from bytecode
  • U(n, x) — Gets value from register x
  • F(n, x, val) — Sets register x to val
  • O[x] — String lookup table
  • M(a, b) — Module/property access
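
A toy interpreter built on the same pattern shows the structure in runnable form. It fetches a 16-bit big-endian opcode, dispatches through a handler array, and uses B/U/F-style helpers for operands and registers. An exception-handler stack redirects the instruction pointer when a handler throws. runToyVM is hypothetical, and its opcode numbers, operand format and register usage are invented for illustration; they are not TikTok's.

```javascript
// Toy register VM mirroring the shape of N(): 16-bit opcode fetch,
// handler-array dispatch, and a catch that jumps to a registered handler.
function runToyVM(code, args) {
    // Registers 0-3: constants, 4: return value, 5: this (unused here), 6: arguments
    const frame = { C: 0, o: [null, undefined, true, false, undefined, undefined, args], A: [] };

    const B = (f) => code[f.C++];                     // read next operand byte
    const U = (f, reg) => f.o[reg];                   // read a register
    const F = (f, reg, val) => { f.o[reg] = val; };   // write a register

    const handlers = [
        /* 0 LOADK dst, k          */ (f) => F(f, B(f), B(f)),
        /* 1 ADD dst, a, b         */ (f) => { const d = B(f); F(f, d, U(f, B(f)) + U(f, B(f))); },
        /* 2 ARG dst, idx          */ (f) => { const d = B(f); F(f, d, U(f, 6)[B(f)]); },
        /* 3 TRY addrHi, addrLo    */ (f) => { f.A.push({ h: (B(f) << 8) | B(f) }); },
        /* 4 THROW                 */ () => { throw new Error('toy exception'); },
    ];

    while (frame.C < code.length) {
        const op = (code[frame.C++] << 8) | code[frame.C++]; // 16-bit big-endian opcode
        try {
            handlers[op](frame);
        } catch (err) {
            if (frame.A.length === 0) throw err;
            // This toy pops the handler; the real loop peeks at the top entry
            frame.C = frame.A.pop().h;
        }
    }
    return U(frame, 4); // like N(), the result is read from register 4
}

// ARG r7 <- arguments[0]; LOADK r8 <- 5; ADD r4 <- r7 + r8
console.log(runToyVM([0, 2, 7, 0, 0, 0, 8, 5, 0, 1, 4, 7, 8], [10])); // 15
```

Seeing the loop in miniature also shows why static analysis is hard. Program behavior lives in the bytecode array, not in the JavaScript, so the handlers alone reveal only the instruction set.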

Key Opcodes Identified

Opcode | Purpose | Description
8 | Custom Hash | Uses 0xDEADBEEF magic number, multiplier 65599
41 | UTF-8 Encoding | String to bytes conversion
100 | Bit Rotation | Left rotate operation
170 | MD5 Implementation | Full MD5 hash function
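
Plain-JavaScript references make it easy to diff against values observed in the VM when tracing opcodes 41 and 100. This sketch assumes opcode 41 is ordinary UTF-8 encoding, for which TextEncoder is the standard equivalent. It also ports the rotation expression from opcode 100. The VM's expression yields a signed 32-bit result because of the |, so the port adds >>> 0 to report it as unsigned. Whether the VM itself normalizes the sign is not confirmed. utf8Bytes and rotl32 are illustrative names.

```javascript
// Reference for opcode 41 (assumed standard UTF-8): string -> array of byte values
function utf8Bytes(str) {
    return Array.from(new TextEncoder().encode(str));
}

// Reference for opcode 100: 32-bit left rotation using the VM's expression
// (r << i | r >>> 32 - i), normalized to an unsigned value with >>> 0
function rotl32(value, shift) {
    const s = shift & 31; // JS shift counts are taken mod 32 anyway
    return ((value << s) | (value >>> (32 - s))) >>> 0;
}

console.log(utf8Bytes('é'));        // [ 195, 169 ]
console.log(rotl32(0x80000001, 1)); // 3
```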

Interesting Opcode: Custom Hash Function (Opcode 8)

// Opcode 8 - Custom hash using 0xDEADBEEF
function(n){
    for(var t=n, r=t.o[6][0], i=3735928559, o=0; o<32; o++)
        i = 65599 * i + r.charCodeAt(i % r.length) >>> 0;
    t.o[4] = i
}

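It reads its input string from the first argument of the current VM call (register 6 holds the arguments object) and seeds the state with 0xDEADBEEF (3735928559). It then runs 32 rounds of state = (65599 * state + charCode) mod 2^32. Each round picks its character at position state mod length rather than by loop counter, so the input is sampled pseudo-randomly instead of being read in order. 65599 is the multiplier of the classic sdbm string hash.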
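
A direct port of the handler lets you hash strings outside the VM and compare the results against values seen at breakpoints. opcode8Hash is an illustrative name, and the port takes the input as a parameter instead of reading register 6. An empty string yields 0. This happens because state % 0 is NaN, charCodeAt then returns NaN, and NaN >>> 0 is 0. The original expression behaves the same way.

```javascript
// Port of opcode 8: seed 0xDEADBEEF, then 32 rounds of
// state = (65599 * state + charCode) mod 2^32, where the character index
// comes from the current state, not from the round counter.
function opcode8Hash(str) {
    let state = 0xDEADBEEF; // 3735928559
    for (let round = 0; round < 32; round++) {
        // 65599 * state stays below 2^53, so the arithmetic is exact before >>> 0
        state = (65599 * state + str.charCodeAt(state % str.length)) >>> 0;
    }
    return state;
}
```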


Part 4: The Module System

TikTok’s VM uses a module object pattern rather than a traditional string array:

i.o[982].v = "X-Mssdk-Info"
i.o[986].v = "X-Mssdk-RC"
i.o[958].v = {encode: f, decode: f}  // Encoding utilities
i.o[970].v = function(n,t){return N(51076,i,this,arguments,0,96)}  // X-Bogus generator
i.o[971].v = function(n,t,r,o){return N(53059,i,this,arguments,0,134)}  // X-Gnarly generator

Key Module Indices

Index | Content | Purpose
958 | {encode, decode} | Hex encoding utilities
969 | Function | URL parameter setter
970 | Function | X-Bogus generator (entry: bytecode 51076)
971 | Function | X-Gnarly generator (entry: bytecode 53059)
982 | "X-Mssdk-Info" | Header name
986 | "X-Mssdk-RC" | Header name

Part 5: Locating the Token Generation

Using Chrome DevTools, I searched for “X-Bogus” in the Sources panel and found the exact code that generates the tokens:

// Token generation flow
v = n.u.o[970].v.call(void 0, c, i)      // Generate X-Bogus
s = n.u.o[971].v.call(void 0, c, o, t, e) // Generate X-Gnarly

// Add to URL
d = n.u.o[969].v.call(void 0, a, ["X-Bogus", v]);
n.o[4] = n.u.o[969].v.call(void 0, d, ["X-Gnarly", s])
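
Module 969's internals were not traced. From the call sites above, it takes a URL and a [name, value] pair and returns the URL with that parameter added. Assuming that behavior, a plain-JavaScript stand-in lets you rebuild the final URL outside the browser when comparing against captures. appendParam is a hypothetical name, and the X-Gnarly value below is a placeholder. The stand-in appends to the string instead of using URLSearchParams, because re-serializing the query could change the escaping of parameters that the signature covers.

```javascript
// Hypothetical stand-in for module 969: append one [name, value] pair to a URL string.
// Appending (instead of re-serializing the query) leaves existing parameters untouched.
// Assumes no #fragment; TikTok's exact escaping rules are unverified.
function appendParam(url, [name, value]) {
    let separator = '&';
    if (!url.includes('?')) separator = '?';
    else if (url.endsWith('?') || url.endsWith('&')) separator = '';
    return url + separator + encodeURIComponent(name) + '=' + encodeURIComponent(value);
}

// Mirrors the two-step flow above: X-Bogus first, then X-Gnarly
const withBogus = appendParam(
    'https://www.tiktok.com/passport/web/user/login/?multi_login=1',
    ['X-Bogus', 'DFSzsIVLfyp7IY7tCYMJ1xhGbwJ5']
);
const signed = appendParam(withBogus, ['X-Gnarly', 'GNARLY_PLACEHOLDER']);
console.log(signed);
// https://www.tiktok.com/passport/web/user/login/?multi_login=1&X-Bogus=DFSzsIVLfyp7IY7tCYMJ1xhGbwJ5&X-Gnarly=GNARLY_PLACEHOLDER
```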

Setting Breakpoints

By setting a breakpoint on the X-Bogus generation line, I was able to capture the exact inputs:

// Scope when breakpoint hit:
a: "https://www.tiktok.com/passport/web/user/login/?multi_login=..."
c: "multi_login=1&did=7584629183746502891&locale=en&app_language=en&aid=..."
i: "mix_mode=1&username=6264686068646b456268..."  // Hex-encoded credentials
o: "mix_mode=1&username=6264686068646b456268..."
e: {totalXHRRequests: 100, totalFetchRequests: 7, interceptedXHRRequests: ...}

The e object reveals behavioral fingerprinting—they track how many XHR and fetch requests you’ve made!


Part 6: Dynamic Analysis with Hooks

To capture multiple input/output pairs, I hooked the X-Bogus generator:

// Run while paused at the breakpoint above, so that n is in scope
const original970 = n.u.o[970].v;
n.u.o[970].v = function(...args) {
    console.log('X-Bogus INPUT:', args);
    const result = original970.apply(this, args);
    console.log('X-Bogus OUTPUT:', result);
    return result;
};
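
A manual hook works for a handful of calls. For collecting many pairs, a small reusable wrapper that records into an array, and can be undone, is handier. recordCalls is a hypothetical helper. In DevTools you would apply it to n.u.o[970] with the key 'v' while n is in scope, then inspect the recorded calls.

```javascript
// Wrap obj[key] so each call's arguments and return value are appended to `calls`.
// unhook() restores the original function. Calls that throw are not recorded.
function recordCalls(obj, key) {
    const original = obj[key];
    const calls = [];
    obj[key] = function (...args) {
        const output = original.apply(this, args);
        calls.push({ args, output });
        return output;
    };
    return { calls, unhook: () => { obj[key] = original; } };
}

// In DevTools, with n in scope (e.g. paused at the X-Bogus breakpoint):
//   const rec = recordCalls(n.u.o[970], 'v');
//   ...trigger a few requests, then:
//   console.table(rec.calls); rec.unhook();
```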

Captured Input/Output Pairs

Input Args | Output
['aid=1988&app_language=en&app_name=...', 'aid=1459'] | DFSzsIVYPySuS1ofCY/k5XhGbwJQ
['multi_login=1&did=7584629183746502891...', 'aid=1459&support_webview=1'] | DFSzsIVYUXVkoZofCY/k3XhGbwri
['msToken=xBARVhvrV_s6XE5raZ9i3W1Fe...', '{\"magic\":538969122,...}'] | DFSzsIVYjN0CU1ofCY/k3XhGbwnT

X-Bogus Function Signature

X-Bogus = generate(
    arg0: string,  // Query string (including msToken)
    arg1: string   // Body data OR fingerprint JSON
)

Part 7: The Fingerprint Object

When the second argument is a fingerprint object (as JSON string):

{
    "magic": 538969122,        // Constant identifier
    "version": 1,              // Protocol version
    "dataType": 8,             // Data format type
    "strData": "3dN8Qq4q0GWT...",  // Encoded fingerprint
    "tspFromClient": 1765294852747  // Client timestamp (milliseconds)
}
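
The X-Bogus generator receives either body data or this envelope as its second argument, so a classifier helps when sorting captured pairs. This sketch assumes the magic field is what distinguishes the two, based only on the samples above. In hex, 538969122 is 0x20200422, which reads like a date stamp (2020-04-22), though that reading is a guess. classifySecondArg is an illustrative name. The meanings of dataType and strData remain unknown.

```javascript
// Classify X-Bogus's second argument as a fingerprint envelope or plain body data.
// Assumption: envelopes are JSON objects whose "magic" field equals 538969122.
const FINGERPRINT_MAGIC = 538969122; // 0x20200422

function classifySecondArg(arg1) {
    try {
        const parsed = JSON.parse(arg1);
        if (parsed && parsed.magic === FINGERPRINT_MAGIC) {
            return {
                kind: 'fingerprint',
                version: parsed.version,
                dataType: parsed.dataType,
                clientTime: new Date(parsed.tspFromClient), // milliseconds since epoch
            };
        }
    } catch {
        // Not JSON (e.g. a form-encoded body): fall through
    }
    return { kind: 'body', body: arg1 };
}
```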

Encoding Analysis

Testing the encode/decode module (958):

// Hex encoding
n.u.o[958].v.encode(new Uint8Array([116, 101, 115, 116]))
// Returns: '74657374' (hex for "test")

// Decoding X-Bogus reveals binary structure
n.u.o[958].v.decode("DFSzsIVL6D-7IY7tCY/hRXhGbwJh")
// Returns: Uint8Array with the token's binary representation
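
Equivalent hex helpers let you check module 958's encode() output outside the browser. bytesToHex and hexToBytes are illustrative names, not functions from the SDK. They handle only hex, so they do not replace the custom base64 decoding covered in Part 8.

```javascript
// Lowercase hex encoding of a byte array, matching the '74657374' example above
function bytesToHex(bytes) {
    return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Inverse: hex string -> Uint8Array (rejects odd lengths and non-hex characters)
function hexToBytes(hex) {
    if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) throw new Error('invalid hex string');
    const out = new Uint8Array(hex.length / 2);
    for (let i = 0; i < out.length; i++) out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
    return out;
}

console.log(bytesToHex(new Uint8Array([116, 101, 115, 116]))); // 74657374
```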

Part 8: Cracking the Encoding

Custom Base64 Alphabet

Through analysis, I discovered TikTok uses a custom base64 alphabet instead of the standard one:

// Standard Base64:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// TikTok X-Bogus:
"Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe="

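This explains why X-Bogus tokens always start with “DFS”. Every token begins with the constant bytes [2, 255, 45] (see the byte structure below). Their 24 bits split into the 6-bit indices 0, 47, 60 and 45, and in this alphabet: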

  • D is at index 0
  • F is at index 47
  • S is at index 60
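
Working one 3-byte group through by hand makes the mapping concrete. The sketch below packs three bytes into 24 bits, slices them into four 6-bit indices, and looks each index up in the X-Bogus alphabet. sextets is an illustrative helper.

```javascript
// Split one 3-byte group into four 6-bit indices (the core step of any base64
// variant) and map them through the X-Bogus alphabet.
const XBOGUS_ALPHABET = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";

function sextets(b0, b1, b2) {
    const group = (b0 << 16) | (b1 << 8) | b2; // 24 bits
    return [18, 12, 6, 0].map((shift) => (group >> shift) & 63);
}

const idx = sextets(2, 255, 45);
console.log(idx);                                          // [ 0, 47, 60, 45 ]
console.log(idx.map((i) => XBOGUS_ALPHABET[i]).join('')); // DFSz
```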

Implementing the Decoder

const XBOGUS_ALPHABET = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";
const PAD = 64; // index of '=', the padding character

// Decode a custom-alphabet base64 string (length must be a multiple of 4)
function xbDecode(str) {
    const result = [];
    for (let i = 0; i < str.length; i += 4) {
        const [a, b, c, d] = [0, 1, 2, 3].map((k) => XBOGUS_ALPHABET.indexOf(str[i + k]));
        if (a < 0 || b < 0 || c < 0 || d < 0) throw new Error(`invalid X-Bogus input at ${i}`);
        result.push((a << 2) | (b >> 4));
        if (c !== PAD) result.push(((b & 15) << 4) | (c >> 2));
        if (d !== PAD) result.push(((c & 3) << 6) | d);
    }
    return new Uint8Array(result);
}

// Encode bytes with the custom alphabet. A trailing partial group is padded
// with '=' as in standard base64 (never triggered for 21-byte X-Bogus tokens).
function xbEncode(bytes) {
    let result = '';
    for (let i = 0; i < bytes.length; i += 3) {
        const n = bytes.length - i; // bytes available in this group
        const a = bytes[i];
        const b = n > 1 ? bytes[i + 1] : 0;
        const c = n > 2 ? bytes[i + 2] : 0;
        result += XBOGUS_ALPHABET[a >> 2];
        result += XBOGUS_ALPHABET[((a & 3) << 4) | (b >> 4)];
        result += XBOGUS_ALPHABET[n > 1 ? ((b & 15) << 2) | (c >> 6) : PAD];
        result += XBOGUS_ALPHABET[n > 2 ? c & 63 : PAD];
    }
    return result;
}

Decoding X-Bogus Tokens

When decoded, X-Bogus tokens reveal a 21-byte structure:

xbDecode("DFSzsIVYPySuS1ofCY/D6hhGbwJn")
// Returns: Uint8Array(21) [2, 255, 45, 37, 110, 40, 175, 79, 44, 241, ...]

X-Bogus Byte Structure

Bytes | Content | Purpose
0-4 | [2, 255, 45, 37, 110] | Magic header (constant; fixes the prefix “DFSzsI”)
5-20 | Variable (16 bytes) | Payload (16 bytes, the size of an MD5 digest)

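The first 5 bytes are identical in every sample. Their 40 bits fill the first six base64 characters (36 bits) plus the top four bits of the seventh. That is why every X-Bogus token starts with “DFSzsI”, and in all the samples captured here, with “DFSzsIV”.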

The Payload is MD5

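Bytes 5-20 form a 16-byte payload, exactly the size of an MD5 digest. Comparing multiple samples:

Sample | Payload (hex)
1 (the token decoded above) | 28af4f2cf17990fa8380505179952c72
2 | 0da5a02cf16086fa838b9c5179952cd1

The 16-byte length matches MD5, and opcode 170 contains a full MD5 implementation, so MD5 is very likely involved. However, the two payloads agree at 8 of 16 byte positions (payload offsets 3-4, 7-8 and 11-14). Digests of two different inputs would agree at any given position only about 1 time in 256. The payload is therefore probably not a single raw MD5 of the input. More likely it combines MD5-derived bytes with constant or slowly changing fields, possibly after a further transformation.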
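
A byte-by-byte comparison is a quick sanity check on any reading of the payload. Hash outputs for different inputs should look unrelated, so bytes shared at the same offsets point to structure. matchingOffsets is an illustrative helper.

```javascript
// List the byte offsets at which two hex strings agree.
// For two independent, uniformly random 16-byte values, about
// 16 / 256 = 0.06 matching positions would be expected by chance.
function matchingOffsets(hexA, hexB) {
    const offsets = [];
    const len = Math.min(hexA.length, hexB.length);
    for (let i = 0; i + 2 <= len; i += 2) {
        if (hexA.slice(i, i + 2).toLowerCase() === hexB.slice(i, i + 2).toLowerCase()) {
            offsets.push(i / 2);
        }
    }
    return offsets;
}

console.log(matchingOffsets(
    '28af4f2cf17990fa8380505179952c72',
    '0da5a02cf16086fa838b9c5179952cd1'
)); // [ 3, 4, 7, 8, 11, 12, 13, 14 ]
```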


Part 9: Working Model of the Algorithm

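Based on these findings, here is a working model of X-Bogus generation. It is a hypothesis, since the input transformation is still unknown (see What Remains):

X-Bogus = customBase64Encode(
    MAGIC_HEADER +                    // [2, 255, 45, 37, 110]
    MD5(transformedInput)             // 16-byte payload (assumed MD5-derived)
)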

Key Components

  1. Magic Header: [2, 255, 45, 37, 110], which fixes the token prefix “DFSzsI”
  2. MD5 Payload (hypothesis): 16 bytes derived from the transformed input (query string + body data)
  3. Custom Encoding: Uses TikTok’s shuffled base64 alphabet

Implementation Skeleton

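const crypto = require('crypto');

const XBOGUS_ALPHABET = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";
const MAGIC_HEADER = [2, 255, 45, 37, 110];

// Placeholder: the real preprocessing is still unknown (see "What Remains").
// Plain concatenation is only a stand-in; tokens built this way are not expected to validate.
function transformInput(queryString, bodyData) {
    return queryString + bodyData;
}

function generateXBogus(queryString, bodyData) {
    const combined = transformInput(queryString, bodyData);

    // 16-byte MD5 digest as a Buffer (a Uint8Array subclass), no hex round trip needed
    const hashBytes = crypto.createHash('md5').update(combined, 'utf8').digest();

    // Build token: 5-byte header + 16-byte payload = 21 bytes
    const tokenBytes = new Uint8Array(21);
    tokenBytes.set(MAGIC_HEADER, 0);
    tokenBytes.set(hashBytes, 5);

    return xbEncode(tokenBytes); // xbEncode from Part 8
}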

What Remains

To achieve 100% accuracy:

  1. Determine exact input transformation — The inputs may be preprocessed before MD5
  2. Handle fingerprint JSON format — When second argument is a fingerprint object
  3. Verify across different request types — Login, API calls, etc.

Part 10: Findings Summary

What We Discovered

Component | Finding
VM Architecture | Custom stack-based VM with 349 opcodes, 82,852 bytes of bytecode
Module System | Functions stored in i.o[index].v pattern
X-Bogus Entry | Bytecode position 51076
X-Gnarly Entry | Bytecode position 53059
Custom Alphabet | Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=
Magic Header | [2, 255, 45, 37, 110] (constant prefix; encodes to “DFSzsI”)
Token Size | 21 bytes → 28 characters encoded
Hash Algorithm | MD5 implementation present (opcode 170); the 16-byte payload matches the MD5 digest size, but whether it is a raw MD5 digest is unconfirmed
Key Opcodes | 8 (custom hash), 41 (UTF-8), 100 (bit rotate), 170 (MD5)

Key Opcodes

// Opcode 8 - Custom hash (0xDEADBEEF)
for(var i=3735928559, o=0; o<32; o++)
    i = 65599 * i + str.charCodeAt(i % str.length) >>> 0;

// Opcode 100 - Bit rotation
t.o[4] = r << i | r >>> 32 - i;

// Opcode 170 - Full MD5 implementation
// (Complete MD5 with all rounds, identical to standard MD5)
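
Running opcode 170 against the RFC 1321 test vectors would back up the claim that it is standard MD5. This sketch assumes you can call the VM's MD5 routine, or your port of it, as a function from string to lowercase hex. If it takes bytes or returns another format, wrap it accordingly. matchesStandardMd5 is an illustrative helper, and Node's crypto serves as the control.

```javascript
// Check a candidate MD5 function against RFC 1321 test vectors.
const crypto = require('crypto');

const RFC1321_VECTORS = {
    '': 'd41d8cd98f00b204e9800998ecf8427e',
    'a': '0cc175b9c0f1b6a831c399e269772661',
    'abc': '900150983cd24fb0d6963f7d28e17f72',
    'message digest': 'f96b697d7cb7938d525a2f31aaf161d0',
};

function matchesStandardMd5(candidate) {
    return Object.entries(RFC1321_VECTORS)
        .every(([input, expected]) => candidate(input) === expected);
}

// Control: Node's built-in MD5 must pass
const nodeMd5 = (s) => crypto.createHash('md5').update(s, 'utf8').digest('hex');
console.log(matchesStandardMd5(nodeMd5)); // true
```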

Conclusion

This research demonstrates the complexity of modern client-side protection systems. TikTok’s approach of using a custom VM with encrypted bytecode significantly raises the bar for reverse engineering compared to simple JavaScript obfuscation.

Key Takeaways

  1. Dynamic analysis is essential — Static analysis alone is insufficient for VM-protected code
  2. Hook and trace — Intercepting functions at runtime reveals input/output relationships
  3. Understand the architecture first — Mapping the VM structure before diving into opcodes saves time
  4. Document everything — Screenshots and logs are invaluable for complex analysis
  5. Pattern recognition — Magic numbers like 0xDEADBEEF help identify algorithm types

The Techniques

The methodology shown here forms a solid foundation for analyzing any JavaScript-based protection:

  1. Network analysis — Identify what tokens/signatures are being sent
  2. Babel deobfuscation — Make obfuscated code readable
  3. VM mapping — Understand the interpreter structure
  4. Breakpoint debugging — Capture exact inputs/outputs
  5. Function hooking — Log data flow at runtime
  6. Encoding analysis — Reverse custom base64/encoding schemes

Tools & Resources

Tools Used

  • Chrome DevTools (Network, Sources, Console)
  • Node.js + Babel (@babel/core, @babel/parser, @babel/traverse, @babel/generator)
  • VS Code

Useful Console Hooks

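// Hook X-Bogus generation (run while n is in scope, e.g. paused at the breakpoint).
// Save the original only once, so re-running this snippet doesn't make the hook call itself.
if (!window.orig970) window.orig970 = n.u.o[970].v;
n.u.o[970].v = function(...args) {
    console.log('=== X-Bogus Generation ===');
    console.log('Input 0:', args[0]);
    console.log('Input 1:', args[1]);
    const result = window.orig970.apply(this, args);
    console.log('Output:', result);
    return result;
};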

// Decode function
window.xbDecode = function(str) {
    const alphabet = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";
    let result = [];
    for (let i = 0; i < str.length; i += 4) {
        const a = alphabet.indexOf(str[i]);
        const b = alphabet.indexOf(str[i+1]);
        const c = alphabet.indexOf(str[i+2]);
        const d = alphabet.indexOf(str[i+3]);
        result.push((a << 2) | (b >> 4));
        if (c !== 64) result.push(((b & 15) << 4) | (c >> 2));
        if (d !== 64) result.push(((c & 3) << 6) | d);
    }
    return new Uint8Array(result);
};


This research is for educational purposes only. Understanding protection mechanisms helps build better security systems.