Reverse Engineering TikTok’s Client-Side Protection: A Deep Dive into webmssdk.js
Introduction
This article documents my journey reverse engineering TikTok’s web-based anti-bot protection system. As someone new to JavaScript reverse engineering, my goal was twofold: to develop a practical understanding of how modern bot protection works from an adversarial perspective, and to document the process in a way that helps others learn these techniques.
The end objective is to fully understand and replicate the generation of TikTok’s request signatures—specifically the X-Bogus and X-Gnarly tokens—which are required to make authenticated API requests to TikTok’s servers.
Why TikTok?
TikTok presents one of the most sophisticated client-side protection systems currently deployed on the web. Unlike simpler anti-bot measures that rely on CAPTCHAs or basic fingerprinting, TikTok implements a custom virtual machine that executes obfuscated bytecode directly in the browser. This VM, contained within a file called webmssdk.js, handles fingerprint collection, request signing, and bot detection.
This makes it an ideal case study for several reasons: the protection is actively maintained and regularly updated, there’s existing community research to reference, and the techniques used represent the current state-of-the-art in client-side security.
What Does TikTok’s Protection Actually Do?
When you browse TikTok’s website, every API request is intercepted and signed before being sent to the server. The protection system:
- Collects device fingerprints: hardware characteristics, browser properties, screen dimensions, installed fonts, WebGL renderer info, and dozens of other signals that help identify your specific device.
- Detects automation: checks for signs of headless browsers (Puppeteer, Playwright), modified browser environments, and inconsistent JavaScript execution patterns.
- Signs requests: generates cryptographic tokens (X-Bogus, X-Gnarly) based on the request parameters and collected fingerprint data. Without valid signatures, API requests return empty responses.
- Obfuscates everything: the code responsible for all of this runs inside a custom VM with encrypted bytecode, making static analysis extremely difficult.
Part 1: Initial Reconnaissance
Analyzing the Login Request
The first step in any reverse engineering project is understanding what you’re looking at. I started by examining a login request in Chrome DevTools.
When making a login request to TikTok, the following parameters are included:
Query String Parameters
| Parameter | Example Value | Purpose |
|---|---|---|
| multi_login | 1 | Login mode flag |
| did | 7584629183746502891 | Device ID (unique per browser) |
| locale | en | Language setting |
| app_language | en | App language |
| aid | 1459 | Application ID |
| account_sdk_source | web | SDK source identifier |
| sdk_version | 2.1.11-tiktokbeta.3 | SDK version |
| verifyFp | verify_m3xkj7bn_xK503qw_vdUQ_48bl_BcZ8_9vnYtwuCYQDv | Fingerprint verification |
| shark_extra | {JSON blob} | Extended device info |
| msToken | PmcQL8oXEU6Bphd1KBt_qSTdDv_m9W07WNvY086ZeKb7kk... | Session token |
| X-Bogus | DFSzsIVLfyp7IY7tCYMJ1xhGbwJ5 | Request signature |
| X-Gnarly | M8Fn3UJZ2SfRxcbhzmFlzHmqGy4xOjuvXtlfG14CgSraZily... | Secondary signature |
Request Headers
| Header | Purpose |
|---|---|
| X-Mssdk-Info | Encrypted device fingerprint blob |
| X-Tt-Passport-Csrf-Token | CSRF protection |
| X-Tt-Passport-Ttwid-Ticket | Session/auth ticket |
| Tt-Ticket-Guard-Public-Key | Encryption public key |
| Tt-Ticket-Guard-Version | Ticket-guard version (value in this request: 2) |
| Tt-Ticket-Guard-Web-Version | Ticket-guard web version (value in this request: 1) |
The shark_extra parameter contains a JSON blob with detailed device information:
{
"aid": 1459,
"app_name": "Tik_Tok_Login",
"channel": "tiktok_web",
"device_platform": "web_pc",
"device_id": "7584629183746502891",
"region": "US",
"os": "mac",
"referer": "https://www.google.com/",
"cookie_enabled": true,
"screen_width": 2560,
"screen_height": 1440,
"browser_language": "en-US",
"browser_platform": "MacIntel",
"browser_name": "Mozilla",
"browser_version": "5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
"browser_online": true,
"tz_name": "America/Denver",
"is_page_visible": true,
"focus_state": true,
"is_fullscreen": false,
"history_len": 6,
"user_is_login": false,
"data_collection_enabled": false
}
This blob combines static device fingerprinting (screen size, platform, timezone, language) with behavioral signals such as page visibility, window focus, and history length.
Part 2: Setting Up the Deobfuscation Environment
Tools Used
- Node.js: runtime for the Babel scripts
- Babel (@babel/core, @babel/parser, @babel/traverse, @babel/generator, @babel/types): AST manipulation
- Chrome DevTools: dynamic analysis and breakpoints
- Tampermonkey (optional): script injection for live debugging
Project Structure
tiktok-re/
├── deobf/
│ ├── deobf.js # Deobfuscation script
│ ├── vm.js # Original webmssdk.js
│ └── output.js # Deobfuscated output
├── node_modules/
└── package.json
Basic Deobfuscator
I started with a simple Babel-based deobfuscator that handles constant folding and boolean simplification:
const fs = require('fs');
const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default;
const generate = require('@babel/generator').default;
const t = require('@babel/types');

const code = fs.readFileSync('./deobf/vm.js', 'utf-8');
const ast = parser.parse(code, {
  sourceType: 'script',
  allowReturnOutsideFunction: true,
});

// t.isLiteral() also matches null, regex and template literals, which have
// no usable .value, so only fold numbers, strings and booleans.
const isFoldable = (node) =>
  t.isNumericLiteral(node) || t.isStringLiteral(node) || t.isBooleanLiteral(node);

traverse(ast, {
  // Transform 1: Constant folding. Running on exit folds inner expressions
  // first, so chains such as 1 + 2 + 3 collapse completely.
  BinaryExpression: {
    exit(path) {
      const { left, right, operator } = path.node;
      if (!isFoldable(left) || !isFoldable(right)) return;
      const leftVal = left.value;
      const rightVal = right.value;
      let result;
      switch (operator) {
        case '+': result = leftVal + rightVal; break;
        case '-': result = leftVal - rightVal; break;
        case '*': result = leftVal * rightVal; break;
        case '^': result = leftVal ^ rightVal; break;
        case '<<': result = leftVal << rightVal; break;
        case '>>>': result = leftVal >>> rightVal; break;
        // ... other operators
        default: return;
      }
      path.replaceWith(t.valueToNode(result));
    },
  },
  // Transform 2: !0 -> true, !1 -> false, void 0 -> undefined
  UnaryExpression(path) {
    const { operator, argument } = path.node;
    if (operator === '!' && t.isNumericLiteral(argument)) {
      path.replaceWith(t.booleanLiteral(!argument.value));
    } else if (operator === 'void' && t.isNumericLiteral(argument) && argument.value === 0) {
      path.replaceWith(t.identifier('undefined'));
    }
  },
});

const output = generate(ast, { comments: false, compact: false });
fs.writeFileSync('./deobf/output.js', output.code);
Running this transformed !0 to true, !1 to false, and evaluated constant expressions, making the code significantly more readable.
Part 3: Understanding the Virtual Machine Architecture
The VM Structure
After deobfuscation, I identified the core VM interpreter function N:
function N(n, t, r, i, o, e) {
var u = {
C: n, // Instruction pointer (program counter)
o: [], // Registers/operand stack
A: [], // Call stack (exception handling)
I: [], // Local stack frame
u: t, // 'this' context
D: e // Additional state
};
// Initialize registers with constants
for (u.o[0] = null, u.o[1] = undefined, u.o[2] = true, u.o[3] = false,
u.o[4] = m, u.o[5] = r, u.o[6] = i; u.C < /*bytecode length*/; ) {
// Fetch 16-bit opcode
var f = k[u.C++] << 8 | k[u.C++];
try {
I[f](u); // Execute opcode handler
} catch (n) {
if (0 === u.A.length) throw n;
u.I = [], u.I.push({...});
u.C = u.A[u.A.length - 1].h; // Jump to exception handler
}
}
return U(u, 4); // Return value from register 4
}
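To make the dispatch loop concrete, here is a toy interpreter with the same shape as N(): a state object, a 16-bit big-endian opcode fetch, a handler table indexed by opcode, and a try/catch that resumes at the innermost handler recorded on u.A. The two opcodes, their operand layout, and the readByte helper (a stand-in for B) are invented for illustration; they are not TikTok's instruction set.

```javascript
// Toy bytecode interpreter (hypothetical opcodes, NOT TikTok's).
function runToyVM(bytecode, handlers) {
  // C = instruction pointer, o = registers, A = exception-handler stack
  const u = { C: 0, o: [null, undefined, true, false, undefined], A: [] };
  while (u.C < bytecode.length) {
    // Two bytes form one opcode, like k[u.C++] << 8 | k[u.C++]
    const op = (bytecode[u.C++] << 8) | bytecode[u.C++];
    try {
      handlers[op](u, bytecode);
    } catch (err) {
      if (u.A.length === 0) throw err;   // no handler installed: propagate
      u.C = u.A[u.A.length - 1].h;       // resume at the handler offset
    }
  }
  return u.o[4];                         // register 4 holds the return value
}

// Reads one operand byte (stand-in for the real B() helper).
const readByte = (u, code) => code[u.C++];

const toyHandlers = [
  // 0: LOAD reg, value   -> o[reg] = value
  (u, code) => {
    const reg = readByte(u, code);
    u.o[reg] = readByte(u, code);
  },
  // 1: ADD dst, a, b     -> o[dst] = o[a] + o[b]
  (u, code) => {
    const d = readByte(u, code), a = readByte(u, code), b = readByte(u, code);
    u.o[d] = u.o[a] + u.o[b];
  },
];

// o[5] = 40; o[6] = 2; o[4] = o[5] + o[6]
const program = [0, 0, 5, 40, 0, 0, 6, 2, 0, 1, 4, 5, 6];
console.log(runToyVM(program, toyHandlers)); // 42
```

Tracing a real handler then reduces to the same question this toy answers: which operand bytes it consumes and which registers it reads or writes.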
Key Components
| Component | Variable | Purpose |
|---|---|---|
| Instruction Pointer | u.C | Current position in bytecode |
| Registers | u.o | Stack-based operand storage |
| Call Stack | u.A | Exception/call frame stack |
| Local Frame | u.I | Current stack frame |
| Bytecode | k | The actual instructions (82,852 bytes) |
| Opcode Handlers | I | Array of 349 handler functions |
Opcode Handlers
The I array contains 349 handler functions, each implementing one VM instruction:
I = [
function (n) { // Opcode 0
var r = n.o[6][0];
n.o[4] = (n.u.o[14].v = "function" == typeof Symbol...
},
function (n) { // Opcode 1
var t = B(n), r = B(n), i = B(n);
F(n, r, U(n, B(n))), F(n, t, U(n, i));
},
// ... 347 more handlers
];
Helper Functions
- B(n): reads the next value from the bytecode
- U(n, x): gets the value from register x
- F(n, x, val): sets register x to val
- O[x]: string lookup table
- M(a, b): module/property access
Key Opcodes Identified
| Opcode | Purpose | Description |
|---|---|---|
| 8 | Custom Hash | Uses 0xDEADBEEF magic number, multiplier 65599 |
| 41 | UTF-8 Encoding | String to bytes conversion |
| 100 | Bit Rotation | Left rotate operation |
| 170 | MD5 Implementation | Full MD5 hash function |
Interesting Opcode: Custom Hash Function (Opcode 8)
// Opcode 8 - Custom hash using 0xDEADBEEF
function(n){
for(var t=n, r=t.o[6][0], i=3735928559, o=0; o<32; o++)
i = 65599 * i + r.charCodeAt(i % r.length) >>> 0;
t.o[4] = i
}
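This takes a string input and produces an unsigned 32-bit hash over 32 rounds, starting from the seed 0xDEADBEEF (3735928559) and multiplying by 65599, the same constant used by the classic sdbm string hash. Note that the character index comes from the running hash (i % r.length) rather than the loop counter, so characters are sampled in a data-dependent order instead of sequentially.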
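Lifted out of the VM, the handler is easy to reproduce as an ordinary function, which is handy for checking values seen at a breakpoint. This is a sketch that assumes the input string is the only argument (n.o[6][0]); the name opcode8Hash is mine. For an empty string the index becomes NaN and the result collapses to 0, exactly as the original expression would.

```javascript
// Port of opcode 8 (hypothetical name): seeded with 0xDEADBEEF,
// 32 rounds of h = h * 65599 + charCode, truncated to unsigned 32 bits.
function opcode8Hash(str) {
  let h = 0xdeadbeef; // 3735928559
  for (let round = 0; round < 32; round++) {
    // The character index comes from the running hash, as in `i % r.length`.
    // 65599 * h stays below 2^53, so the double arithmetic is exact
    // before `>>> 0` reduces it modulo 2^32.
    h = (65599 * h + str.charCodeAt(h % str.length)) >>> 0;
  }
  return h;
}

console.log(opcode8Hash('X-Bogus').toString(16));
```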
Part 4: The Module System
TikTok’s VM uses a module object pattern rather than a traditional string array:
i.o[982].v = "X-Mssdk-Info"
i.o[986].v = "X-Mssdk-RC"
i.o[958].v = {encode: f, decode: f} // Encoding utilities
i.o[970].v = function(n,t){return N(51076,i,this,arguments,0,96)} // X-Bogus generator
i.o[971].v = function(n,t,r,o){return N(53059,i,this,arguments,0,134)} // X-Gnarly generator
Key Module Indices
| Index | Content | Purpose |
|---|---|---|
| 958 | {encode, decode} | Hex encoding utilities |
| 969 | Function | URL parameter setter |
| 970 | Function | X-Bogus generator (entry: bytecode 51076) |
| 971 | Function | X-Gnarly generator (entry: bytecode 53059) |
| 982 | "X-Mssdk-Info" | Header name |
| 986 | "X-Mssdk-RC" | Header name |
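When paused inside the VM, a quick way to build this kind of index is to walk the cell array and summarise each slot. The helper below is a sketch that assumes each slot is an object whose v property holds the value, as in i.o[982].v; dumpModuleTable is a hypothetical name, and the preview format is arbitrary.

```javascript
// Summarises a VM cell array shaped like i.o (each slot is { v: value }).
function dumpModuleTable(cells, { maxPreview = 40 } = {}) {
  const rows = [];
  cells.forEach((cell, index) => {
    // Skip empty slots and anything that is not a { v } cell
    if (cell === null || typeof cell !== 'object' || !('v' in cell)) return;
    const value = cell.v;
    let preview;
    if (typeof value === 'function') {
      // Wrapper source text often reveals the entry offset, e.g. N(51076, ...)
      preview = String(value).slice(0, maxPreview);
    } else if (typeof value === 'string') {
      preview = JSON.stringify(value).slice(0, maxPreview);
    } else if (value !== null && typeof value === 'object') {
      preview = '{' + Object.keys(value).join(', ') + '}';
    } else {
      preview = String(value);
    }
    rows.push({ index, type: typeof value, preview });
  });
  return rows;
}
```

In the DevTools console, console.table(dumpModuleTable(i.o)) then gives a sortable overview of strings, utility objects, and VM entry wrappers.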
Part 5: Locating the Token Generation
Using Chrome DevTools, I searched for “X-Bogus” in the Sources panel and found the exact code that generates the tokens:
// Token generation flow
v = n.u.o[970].v.call(void 0, c, i) // Generate X-Bogus
s = n.u.o[971].v.call(void 0, c, o, t, e) // Generate X-Gnarly
// Add to URL
d = n.u.o[969].v.call(void 0, a, ["X-Bogus", v]);
n.o[4] = n.u.o[969].v.call(void 0, d, ["X-Gnarly", s])
Setting Breakpoints
By setting a breakpoint on the X-Bogus generation line, I was able to capture the exact inputs:
// Scope when breakpoint hit:
a: "https://www.tiktok.com/passport/web/user/login/?multi_login=..."
c: "multi_login=1&did=7584629183746502891&locale=en&app_language=en&aid=..."
i: "mix_mode=1&username=6264686068646b456268..." // Hex-encoded credentials
o: "mix_mode=1&username=6264686068646b456268..."
e: {totalXHRRequests: 100, totalFetchRequests: 7, interceptedXHRRequests: ...}
The e object shows that request-level behavior is tracked too: it counts how many XHR and fetch requests the page has made, and how many of them were intercepted.
Part 6: Dynamic Analysis with Hooks
To capture multiple input/output pairs, I hooked the X-Bogus generator from the DevTools console, with execution paused at a point where n is in scope:
const original970 = n.u.o[970].v;
n.u.o[970].v = function(...args) {
console.log('X-Bogus INPUT:', args);
const result = original970.apply(this, args);
console.log('X-Bogus OUTPUT:', result);
return result;
};
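Patching the slot by hand works, but a small reusable helper makes it easier to collect many pairs and undo the patch afterwards. This is a generic sketch, not TikTok-specific; hookMethod is a hypothetical name. It wraps obj[key], forwards this and the arguments, records arguments and return value for every call that returns normally, and hands back an unhook function.

```javascript
// Generic logging hook: wraps obj[key] in place and records calls.
function hookMethod(obj, key, log = []) {
  const original = obj[key];
  if (typeof original !== 'function') {
    throw new TypeError(`${String(key)} is not a function`);
  }
  obj[key] = function (...args) {
    const result = original.apply(this, args); // preserve `this` and arguments
    log.push({ args, result });                // only reached if no exception
    return result;
  };
  // unhook() restores the original function
  return { log, unhook: () => { obj[key] = original; } };
}
```

For example, const { log, unhook } = hookMethod(n.u.o[970], 'v'); after browsing for a while, the DevTools copy() utility, as in copy(JSON.stringify(log)), puts the captured pairs on the clipboard.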
Captured Input/Output Pairs
| Input Args | Output |
|---|---|
| ['aid=1988&app_language=en&app_name=...', 'aid=1459'] | DFSzsIVYPySuS1ofCY/k5XhGbwJQ |
| ['multi_login=1&did=7584629183746502891...', 'aid=1459&support_webview=1'] | DFSzsIVYUXVkoZofCY/k3XhGbwri |
| ['msToken=xBARVhvrV_s6XE5raZ9i3W1Fe...', '{\"magic\":538969122,...}'] | DFSzsIVYjN0CU1ofCY/k3XhGbwnT |
X-Bogus Function Signature
X-Bogus = generate(
arg0: string, // Query string (including msToken)
arg1: string // Body data OR fingerprint JSON
)
Part 7: The Fingerprint Object
When the second argument is a fingerprint object (as JSON string):
{
"magic": 538969122, // Constant identifier
"version": 1, // Protocol version
"dataType": 8, // Data format type
"strData": "3dN8Qq4q0GWT...", // Encoded fingerprint
"tspFromClient": 1765294852747 // Client timestamp (milliseconds)
}
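Decoding the envelope's fixed fields is a quick sanity check when logging these arguments. The sketch below assumes the five field names from the captured sample; parseFingerprintEnvelope and its validation are my own, not TikTok's. strData is passed through untouched because its encoding is not yet known.

```javascript
const FIELDS = ['magic', 'version', 'dataType', 'strData', 'tspFromClient'];

// Parses the fingerprint JSON passed as X-Bogus's second argument.
function parseFingerprintEnvelope(json) {
  const env = JSON.parse(json);
  if (env === null || typeof env !== 'object') throw new Error('not an object');
  for (const key of FIELDS) {
    if (!(key in env)) throw new Error(`missing field: ${key}`);
  }
  return {
    magicHex: '0x' + (env.magic >>> 0).toString(16),
    version: env.version,
    dataType: env.dataType,
    strData: env.strData,                                  // still encoded
    clientTime: new Date(env.tspFromClient).toISOString(), // ms since epoch
  };
}

// Sample values from above (strData truncated)
const capturedEnvelope =
  '{"magic":538969122,"version":1,"dataType":8,"strData":"3dN8Qq4q0GWT","tspFromClient":1765294852747}';
const info = parseFingerprintEnvelope(capturedEnvelope);
console.log(info.magicHex);   // 0x20200422
console.log(info.clientTime); // 2025-12-09T15:40:52.747Z
```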
Encoding Analysis
Testing the encode/decode module (958):
// Hex encoding
n.u.o[958].v.encode(new Uint8Array([116, 101, 115, 116]))
// Returns: '74657374' (hex for "test")
// Decoding X-Bogus reveals binary structure
n.u.o[958].v.decode("DFSzsIVL6D-7IY7tCY/hRXhGbwJh")
// Returns: Uint8Array with the token's binary representation
Part 8: Cracking the Encoding
Custom Base64 Alphabet
Through analysis, I discovered TikTok uses a custom base64 alphabet instead of the standard one:
// Standard Base64:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
// TikTok X-Bogus:
"Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe="
This explains why X-Bogus tokens always start with “DFS”. In this alphabet:
- D is at index 0
- F is at index 47
- S is at index 60
Implementing the Decoder
const XBOGUS_ALPHABET = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";

// Standard base64 decoding with TikTok's alphabet ('=' at index 64 is padding)
function xbDecode(str) {
  const alphabet = XBOGUS_ALPHABET;
  const result = [];
  for (let i = 0; i < str.length; i += 4) {
    const a = alphabet.indexOf(str[i]);
    const b = alphabet.indexOf(str[i + 1]);
    const c = alphabet.indexOf(str[i + 2]);
    const d = alphabet.indexOf(str[i + 3]);
    if (a < 0 || b < 0 || c < 0 || d < 0) {
      throw new Error(`invalid X-Bogus character near position ${i}`);
    }
    result.push((a << 2) | (b >> 4));
    if (c !== 64) result.push(((b & 15) << 4) | (c >> 2));
    if (d !== 64) result.push(((c & 3) << 6) | d);
  }
  return new Uint8Array(result);
}

// Inverse of xbDecode: 3 bytes -> 4 characters, '=' padding for a short final group
function xbEncode(bytes) {
  const alphabet = XBOGUS_ALPHABET;
  let result = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const a = bytes[i];
    const b = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const c = i + 2 < bytes.length ? bytes[i + 2] : 0;
    result += alphabet[a >> 2];
    result += alphabet[((a & 3) << 4) | (b >> 4)];
    result += i + 1 < bytes.length ? alphabet[((b & 15) << 2) | (c >> 6)] : alphabet[64];
    result += i + 2 < bytes.length ? alphabet[c & 63] : alphabet[64];
  }
  return result;
}
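Because the scheme is ordinary base64 with a permuted alphabet, an alternative to hand-written bit packing is to translate characters into the standard alphabet and let Node's Buffer do the rest. This is a sketch for Node.js (Buffer is not available in browsers), and the function names are hypothetical. It also rejects characters outside the alphabet instead of silently producing garbage bytes.

```javascript
// Standard and X-Bogus alphabets, both with '=' padding at index 64
const STD = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
const XB = 'Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=';

// Maps each character of `str` from alphabet `from` to the same index in `to`.
function translate(str, from, to) {
  let out = '';
  for (const ch of str) {
    const idx = from.indexOf(ch);
    if (idx === -1) throw new Error(`invalid character: ${ch}`);
    out += to[idx];
  }
  return out;
}

const xbDecodeViaBuffer = (token) =>
  new Uint8Array(Buffer.from(translate(token, XB, STD), 'base64'));

const xbEncodeViaBuffer = (bytes) =>
  translate(Buffer.from(bytes).toString('base64'), STD, XB);
```

Round-tripping any 28-character token through both functions returns the original string, since 21 bytes fill seven 4-character groups exactly with no padding.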
Decoding X-Bogus Tokens
When decoded, X-Bogus tokens reveal a 21-byte structure:
xbDecode("DFSzsIVYPySuS1ofCY/D6hhGbwJn")
// Returns: Uint8Array(21) [2, 255, 45, 37, 110, 40, 175, 79, 44, 241, ...]
X-Bogus Byte Structure
| Bytes | Content | Purpose |
|---|---|---|
| 0-4 | [2, 255, 45, 37, 110] | Magic header (constant, encodes to “DFSzs”) |
| 5-20 | Variable (16 bytes) | Payload (MD5-sized) |
The first 5 bytes are identical in every sample. Since 5 bytes hold 40 bits and each character encodes 6, they fully determine the first six characters, DFSzsI, which is why every X-Bogus token starts with this prefix.
The Payload and MD5
Bytes 5-20 (16 bytes) are exactly the size of an MD5 digest, and opcode 170 implements MD5. Comparing two samples:
| Sample | Payload (hex) |
|---|---|
| 1 | 28af4f2cf17990fa8380505179952c72 |
| 2 | 0da5a02cf16086fa838b9c5179952cd1 |
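Half of the payload bytes (8 of 16) are identical across these two samples. Two MD5 digests of different inputs would almost never agree that often (each position matches with probability 1/256), so the payload is probably not a raw MD5 output. More likely, MD5 is applied to the inputs and the result is combined with constant fields or further transformed before encoding. The 16-byte length and the MD5 handler still make MD5 a strong candidate for part of the pipeline.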
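To see exactly which positions agree, a byte-by-byte comparison of the two payloads is enough. The helper below is a small sketch (sharedByteOffsets is a hypothetical name) that takes two equal-length hex strings and returns the offsets where the bytes match.

```javascript
// Returns the byte offsets at which two equal-length hex strings agree.
function sharedByteOffsets(hexA, hexB) {
  if (hexA.length !== hexB.length || hexA.length % 2 !== 0) {
    throw new Error('inputs must be hex strings of equal, even length');
  }
  const offsets = [];
  for (let i = 0; i < hexA.length; i += 2) {
    // Compare one byte (two hex digits) at a time, case-insensitively
    if (hexA.slice(i, i + 2).toLowerCase() === hexB.slice(i, i + 2).toLowerCase()) {
      offsets.push(i / 2);
    }
  }
  return offsets;
}

const sample1 = '28af4f2cf17990fa8380505179952c72';
const sample2 = '0da5a02cf16086fa838b9c5179952cd1';
console.log(sharedByteOffsets(sample1, sample2).join(', ')); // 3, 4, 7, 8, 11, 12, 13, 14
```

Collecting more samples and tracking which offsets stay fixed is a cheap way to separate constant fields from input-dependent bytes.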
Part 9: Complete Algorithm
Based on these findings, my current working model of X-Bogus generation is:
X-Bogus = customBase64Encode(
MAGIC_HEADER + // [2, 255, 45, 37, 110]
MD5(transformedInput) // 16-byte hash
)
Key Components
- Magic Header: [2, 255, 45, 37, 110], which encodes to “DFSzs”
- Payload: 16 bytes derived from the transformed input (query string + body data), presumably via MD5
- Custom Encoding: Uses TikTok’s shuffled base64 alphabet
Implementation Skeleton
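const crypto = require('crypto');

const XBOGUS_ALPHABET = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";
const MAGIC_HEADER = [2, 255, 45, 37, 110];

// md5() returns a lowercase hex digest; hexToBytes() converts it to raw bytes
const md5 = (str) => crypto.createHash('md5').update(str, 'utf8').digest('hex');
const hexToBytes = (hex) => Uint8Array.from(Buffer.from(hex, 'hex'));

// Placeholder: the real preprocessing has not been recovered yet
function transformInput(queryString, bodyData) {
  throw new Error('transformInput: exact transformation TBD');
}

function generateXBogus(queryString, bodyData) {
  // Transform inputs (exact transformation TBD)
  const combined = transformInput(queryString, bodyData);
  // Generate MD5 hash
  const hash = md5(combined);
  const hashBytes = hexToBytes(hash);
  // Build token: 5-byte magic header + 16-byte payload = 21 bytes
  const tokenBytes = new Uint8Array(21);
  tokenBytes.set(MAGIC_HEADER, 0);
  tokenBytes.set(hashBytes, 5);
  // xbEncode is the custom-alphabet encoder from Part 8
  return xbEncode(tokenBytes);
}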
What Remains
To achieve 100% accuracy:
- Determine the exact input transformation: the inputs may be preprocessed before MD5, and the digest may be transformed further before encoding
- Handle the fingerprint JSON format: when the second argument is a fingerprint object
- Verify across different request types: login, API calls, etc.
Part 10: Findings Summary
What We Discovered
| Component | Finding |
|---|---|
| VM Architecture | Custom stack-based VM with 349 opcodes, 82,852 bytes of bytecode |
| Module System | Functions stored in i.o[index].v pattern |
| X-Bogus Entry | Bytecode position 51076 |
| X-Gnarly Entry | Bytecode position 53059 |
| Custom Alphabet | Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe= |
| Magic Header | [2, 255, 45, 37, 110] (constant prefix) |
| Token Size | 21 bytes → 28 characters encoded |
| Hash Algorithm | MD5 suspected (16-byte payload = 128 bits; opcode 170 implements MD5) |
| Key Opcodes | 8 (custom hash), 41 (UTF-8), 100 (bit rotate), 170 (MD5) |
Key Opcodes
// Opcode 8 - Custom hash (0xDEADBEEF)
for(var i=3735928559, o=0; o<32; o++)
i = 65599 * i + str.charCodeAt(i % str.length) >>> 0;
// Opcode 100 - Bit rotation
t.o[4] = r << i | r >>> 32 - i;
// Opcode 170 - Full MD5 implementation
// (Complete MD5 with all rounds, identical to standard MD5)
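Two quick checks help when lifting these handlers. The opcode 100 expression returns a signed 32-bit value, so a port usually appends >>> 0; and the claim that opcode 170 is standard MD5 can be tested by feeding it the RFC 1321 test vectors and comparing against Node's crypto module. This is a sketch: rotl32 and checkMd5 are hypothetical names, and you would pass checkMd5 whatever function wraps the extracted handler (taking a string, returning a lowercase hex digest).

```javascript
const crypto = require('crypto');

// RFC 1321 test vectors
const MD5_VECTORS = {
  '': 'd41d8cd98f00b204e9800998ecf8427e',
  'a': '0cc175b9c0f1b6a831c399e269772661',
  'abc': '900150983cd24fb0d6963f7d28e17f72',
};

// Left rotate as in opcode 100, normalised to an unsigned 32-bit result.
function rotl32(x, n) {
  return ((x << n) | (x >>> (32 - n))) >>> 0;
}

// Runs a candidate MD5 function against the test vectors.
function checkMd5(candidateMd5) {
  return Object.entries(MD5_VECTORS).map(([input, expected]) => {
    const actual = candidateMd5(input);
    return { input, expected, actual, ok: actual === expected };
  });
}

// Reference implementation to validate the harness itself
const nodeMd5 = (s) => crypto.createHash('md5').update(s, 'utf8').digest('hex');
console.table(checkMd5(nodeMd5));
```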
Conclusion
This research demonstrates the complexity of modern client-side protection systems. TikTok’s approach of using a custom VM with encrypted bytecode significantly raises the bar for reverse engineering compared to simple JavaScript obfuscation.
Key Takeaways
- Dynamic analysis is essential — Static analysis alone is insufficient for VM-protected code
- Hook and trace — Intercepting functions at runtime reveals input/output relationships
- Understand the architecture first — Mapping the VM structure before diving into opcodes saves time
- Document everything — Screenshots and logs are invaluable for complex analysis
- Pattern recognition — Magic numbers like 0xDEADBEEF help identify algorithm types
The Techniques
The methodology shown here forms a solid foundation for analyzing any JavaScript-based protection:
- Network analysis — Identify what tokens/signatures are being sent
- Babel deobfuscation — Make obfuscated code readable
- VM mapping — Understand the interpreter structure
- Breakpoint debugging — Capture exact inputs/outputs
- Function hooking — Log data flow at runtime
- Encoding analysis — Reverse custom base64/encoding schemes
Tools & Resources
Tools Used
- Chrome DevTools (Network, Sources, Console)
- Node.js + Babel (@babel/core, @babel/parser, @babel/traverse, @babel/generator)
- VS Code
Useful Console Hooks
// Hook X-Bogus generation
window.orig970 = n.u.o[970].v;
n.u.o[970].v = function(...args) {
console.log('=== X-Bogus Generation ===');
console.log('Input 0:', args[0]);
console.log('Input 1:', args[1]);
const result = window.orig970.apply(this, args);
console.log('Output:', result);
return result;
};
// Decode function
window.xbDecode = function(str) {
const alphabet = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";
let result = [];
for (let i = 0; i < str.length; i += 4) {
const a = alphabet.indexOf(str[i]);
const b = alphabet.indexOf(str[i+1]);
const c = alphabet.indexOf(str[i+2]);
const d = alphabet.indexOf(str[i+3]);
result.push((a << 2) | (b >> 4));
if (c !== 64) result.push(((b & 15) << 4) | (c >> 2));
if (d !== 64) result.push(((c & 3) << 6) | d);
}
return new Uint8Array(result);
};
References
- nullpt.rs - Reverse Engineering TikTok’s VM Obfuscation
- notemrovsky/tiktok-reverse-engineering
- LukasOgunfeitimi/TikTok-ReverseEngineering
This research is for educational purposes only. Understanding protection mechanisms helps build better security systems.