Preamble
This article documents my journey reverse engineering TikTok’s web-based anti-bot protection system. As someone new to JavaScript reverse engineering, my goal was twofold: to develop a practical understanding of how modern bot protection works from an adversarial perspective, and to document the process in a way that helps others learn these techniques.
What started as curiosity about how TikTok’s anti-bot worked turned into a deep dive into virtual machine obfuscation, TLS fingerprinting, custom encoding schemes, and behavioral analysis. I’ll walk through exactly what I discovered, the dead ends I hit, and the techniques that actually worked.
The tools and scripts from this research are available at github.com/B9ph0met/tiktok-re.
Discovery
The first thing I did was load up TikTok’s login URL with DevTools open and submit a test login to see what I was dealing with.
Initial Reconnaissance
Opening the Network tab and submitting test@gmail.com / test123 revealed a surprisingly complex request:
POST https://login-nola.www.tiktok.com/passport/web/user/login/
?multi_login=1
&did=7582653297049732622
&verifyFp=verify_mj1po7mg_CaJWIYhS_NCz8_4J4r_B9jG_h2ENJl2zDczF
&msToken=UeNPPoF_a91JnnNmYtq1xtl...
&X-Bogus=DFSzsIVLDit5znB/CYeg3XhGbwrN
&X-Gnarly=MR4VNDByOO4lJ1hYVn0joNm...
The form data was obfuscated as well:
username=7160767145626864646c692b666a68
email=7160767145626864646c692b666a68
password=7160767134373636
mix_mode=1
My plaintext credentials had been transformed into hex strings. And those X-Bogus and X-Gnarly parameters? No idea where they came from.
Testing with Go
I decided to write a simple Go script to see what values I could extract from just a GET request, with no JavaScript execution:
package main

import (
    "fmt"
    "net/http"
    "net/http/cookiejar"
)

func main() {
    // The cookie jar stores any cookies the server sets.
    jar, _ := cookiejar.New(nil)
    client := &http.Client{Jar: jar}
    req, _ := http.NewRequest("GET",
        "https://www.tiktok.com/login/phone-or-email/email", nil)
    req.Header.Set("User-Agent",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("Status:", resp.Status)
    fmt.Println("\nCookies received:")
    for _, cookie := range resp.Cookies() {
        // Truncate long values for display; slicing unconditionally
        // would panic on values shorter than 50 bytes.
        value := cookie.Value
        if len(value) > 50 {
            value = value[:50] + "..."
        }
        fmt.Printf("  %s: %s\n", cookie.Name, value)
    }
}
Result: The request succeeded, returning tt_csrf_token and ttwid cookies. But this was just a GET request for the login page. The real challenge would be POST requests to protected API endpoints.
The TLS Fingerprinting Problem
My first real challenge came when I tried hitting actual API endpoints. Requests that worked in my browser failed silently from Go, returning empty responses or generic errors.
After some research, I learned about TLS fingerprinting. When a client establishes an HTTPS connection, the TLS handshake reveals:
- Supported cipher suites (and their order)
- TLS extensions
- Elliptic curves supported
- Signature algorithms
This produces a “fingerprint” (commonly summarized as a JA3 or JA4 hash) that differs between browsers and HTTP client libraries:
| Client | Behavior |
|---|---|
| Chrome | Specific cipher order, GREASE extensions |
| Go net/http | Go’s crypto/tls defaults, looks nothing like a browser |
| Python requests | OpenSSL defaults |
The solution: use a library like utls (Go) or tls-client (Python) that can impersonate browser TLS fingerprints.
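To make the fingerprint idea concrete, here is a minimal sketch of how a JA3 string is assembled: five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) are written as decimal values, joined with dashes inside each field and commas between fields, then hashed with MD5. GREASE placeholder values are skipped. The `hello` object and its numbers are hypothetical and were not captured from a real browser.

```javascript
const crypto = require('crypto');

// GREASE values (0x0a0a, 0x1a1a, ... 0xfafa) are random placeholders
// that browsers inject; JA3 ignores them.
function isGrease(v) {
  return (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);
}

// Build the comma-separated JA3 string from parsed ClientHello fields.
function ja3String(hello) {
  const join = (list) => list.filter((v) => !isGrease(v)).join('-');
  return [
    hello.version,
    join(hello.ciphers),
    join(hello.extensions),
    join(hello.curves),
    join(hello.pointFormats),
  ].join(',');
}

// The JA3 hash is the MD5 digest of that string.
function ja3Hash(hello) {
  return crypto.createHash('md5').update(ja3String(hello)).digest('hex');
}

// Hypothetical ClientHello values for illustration only.
const hello = {
  version: 771,
  ciphers: [0x1a1a, 4865, 4866, 4867],
  extensions: [0x2a2a, 0, 10, 11],
  curves: [0x3a3a, 29, 23],
  pointFormats: [0],
};

console.log(ja3String(hello)); // 771,4865-4866-4867,0-10-11,29-23,0
console.log(ja3Hash(hello));
```

Because the order of ciphers and extensions feeds straight into the string, two clients that support the same features in a different order still hash differently, which is why a default Go or Python client stands out.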
The “Small Sample” Problem
I also learned something important about anti-bot systems: most will allow a small number of bot requests through. This is intentional. It makes it much harder to understand exactly which signals trigger blocking.
If every malformed request was immediately rejected, attackers could easily binary-search their way to the correct format. By letting some requests through randomly, defenders create uncertainty. This makes determining which headers and parameters are actually enforced incredibly difficult at scale.
What I Found
From my initial discovery:
Server-side (obtainable via HTTP):
- ttwid: Device/session ID, set as cookie
- tt_csrf_token: CSRF token, set as cookie
- msToken: Appears in some API responses
Client-side (requires JavaScript):
- X-Bogus: Request signature, generated by webmssdk.js
- X-Gnarly: Secondary signature
- verifyFp: Fingerprint verification token
- Encoded credentials: Custom XOR-based encoding
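The credential encoding is the easiest of these to illustrate. The captured form fields all begin with 71607671, which is what the bytes of "test" look like after XORing each one with 0x05 and writing the result as hex. The sketch below assumes that single-byte key; it is inferred from the sample values above, not confirmed against TikTok’s code, and the function names are my own.

```javascript
// Assumed key, inferred from the captured sample (not confirmed).
const XOR_KEY = 0x05;

// XOR every UTF-8 byte with the key and emit lowercase hex.
function encodeCredential(plain) {
  return Array.from(Buffer.from(plain, 'utf8'))
    .map((b) => (b ^ XOR_KEY).toString(16).padStart(2, '0'))
    .join('');
}

// Reverse the process: hex to bytes, XOR with the same key, back to text.
function decodeCredential(hex) {
  const bytes = Uint8Array.from(Buffer.from(hex, 'hex'), (b) => b ^ XOR_KEY);
  return Buffer.from(bytes).toString('utf8');
}

console.log(encodeCredential('test'));     // 71607671
console.log(decodeCredential('71607671')); // test
```

XOR with a fixed byte is its own inverse, so the same operation encodes and decodes. That makes this obfuscation rather than encryption.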
Custom CAPTCHA:
When I triggered rate limiting, I discovered TikTok uses their own CAPTCHA system:
https://verification16-normal-nola.tiktokw.eu/captcha/verify?
subtype=whirl
&h5_check_version=3.8.27-alpha.3
Not reCAPTCHA, not hCaptcha, but their own solution with types like whirl (rotating puzzle), slide, 3d, and same (image matching). This means no off-the-shelf solving APIs work directly.
Picking a Target
With so many unknowns, I needed to focus. The credential encoding seemed like a good starting point since it was self-contained and didn’t require understanding the full VM. But the real prize was X-Bogus. Without that signature, no API request would succeed.
Time to understand what I was up against.
What is a VM?
When I started searching for information about X-Bogus, I found references to TikTok using “virtualized obfuscation.” I had no idea what that meant.
The Problem with JavaScript Obfuscation
Traditional JavaScript obfuscation uses techniques like:
- Renaming variables (username becomes _0x4a3b)
- Encoding strings (storing them in arrays, base64)
- Control flow flattening (turning linear code into switch statements)
- Dead code insertion
But all of these can be reversed with enough patience. The code is still JavaScript. You can set breakpoints, log values, and eventually understand it.
Virtual Machine Obfuscation
VM-based obfuscation takes a different approach: compile the sensitive JavaScript into custom bytecode, then ship an interpreter that executes it.
Instead of this:
function sign(data) {
return md5(data + secret);
}
You get something like this:
function N(n, t, r, i, o, e) {
var u = { C: n, o: [], A: [], I: [], u: t, D: e };
for (u.o[0] = null, u.o[1] = undefined; u.C < k.length; ) {
var f = k[u.C++] << 8 | k[u.C++];
I[f](u);
}
}
// ... plus tens of kilobytes of bytecode in array 'k'
// ... plus dozens to hundreds of opcode handlers in array 'I'
The original logic is gone. It’s been compiled into bytecode that only makes sense to the custom VM interpreter. You can’t just read the code. You have to reverse engineer an entire virtual machine.
How the VM Works
After reading existing research, here’s my understanding:
- Bytecode Array (k): An array of bytes representing the compiled program.
- Opcode Handlers (I): An array of functions, each handling one type of instruction. TikTok uses dozens to hundreds of unique opcodes, depending on the version.
- VM State (u): An object tracking:
  - C: Program counter (current position in bytecode)
  - o: Operand stack / registers
  - A: Call stack for exception handling
  - I: Local variables
- Execution Loop: Fetch two bytes, combine into 16-bit opcode, call handler, repeat.
while (u.C < k.length) {
    var opcode = k[u.C++] << 8 | k[u.C++]; // Fetch 16-bit opcode
    I[opcode](u); // Execute handler
}
The clever part: even if you understand the VM structure, you still need to trace through thousands of bytecode instructions to understand what any particular function does.
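A toy version of this design helps show why reading the interpreter alone tells you so little. The sketch below is not TikTok’s VM: the four opcodes, their numbering, and the state layout are invented for illustration. Only the loop shape (fetch two bytes, combine them, dispatch to a handler) mirrors the structure described above.

```javascript
// Toy stack VM with 16-bit big-endian opcodes and one handler per opcode.
// Opcode numbers and semantics are invented for this example.
const handlers = [];
handlers[0x0001] = (s) => { s.stack.push(s.code[s.pc++]); }; // PUSH <byte>
handlers[0x0002] = (s) => {                                   // ADD
  const b = s.stack.pop(), a = s.stack.pop();
  s.stack.push(a + b);
};
handlers[0x0003] = (s) => {                                   // XOR
  const b = s.stack.pop(), a = s.stack.pop();
  s.stack.push(a ^ b);
};
handlers[0x0004] = (s) => { s.halted = true; };               // HALT

function run(code) {
  const s = { pc: 0, stack: [], code, halted: false };
  while (!s.halted && s.pc < code.length) {
    // Fetch two bytes and combine them into one 16-bit opcode.
    const opcode = (code[s.pc++] << 8) | code[s.pc++];
    const handler = handlers[opcode];
    if (!handler) {
      throw new Error(`unknown opcode 0x${opcode.toString(16)} at ${s.pc - 2}`);
    }
    handler(s);
  }
  return s.stack.pop();
}

// Bytecode for (7 + 5) ^ 3
const program = [
  0x00, 0x01, 7,   // PUSH 7
  0x00, 0x01, 5,   // PUSH 5
  0x00, 0x02,      // ADD
  0x00, 0x01, 3,   // PUSH 3
  0x00, 0x03,      // XOR
  0x00, 0x04,      // HALT
];

console.log(run(program)); // 15
```

The interpreter is about twenty lines, yet the program’s meaning lives entirely in the `program` array. Scale that up to tens of kilobytes of bytecode and hundreds of handlers and the difficulty becomes clear.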
The Bytecode Format
TikTok’s bytecode strings follow a specific format. Each module starts with magic bytes 0x484e4f4a and 0x403f5243, followed by:
- A version/separator byte
- An XOR key for string decryption
- Bytecode instructions
- Encrypted string table
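As a first step when hunting for bytecode modules, the magic bytes make a convenient filter. The sketch below assumes the module is available as a hex string (an assumption about how it is embedded) and only checks the eight magic bytes and separates them from the rest. It makes no attempt to parse the fields after the header, since their exact sizes are not documented here. The function name is hypothetical.

```javascript
// 0x484e4f4a followed by 0x403f5243, written as one hex prefix.
const MAGIC = '484e4f4a403f5243';

// Returns null if the string does not start with the magic bytes;
// otherwise returns the magic as text plus the remaining raw bytes.
function splitModule(hex) {
  const normalized = hex.toLowerCase();
  if (!normalized.startsWith(MAGIC)) return null;
  return {
    magicAscii: Buffer.from(MAGIC, 'hex').toString('latin1'),
    body: Buffer.from(normalized.slice(MAGIC.length), 'hex'),
  };
}

const sample = splitModule('484E4F4A403F524300ff');
console.log(sample.magicAscii);  // HNOJ@?RC
console.log(sample.body.length); // 2
console.log(splitModule('deadbeef')); // null
```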
Part 2: Deobfuscating X-Bogus
Finding the Entry Point
Using Chrome DevTools, I set breakpoints on network requests and traced back to find where X-Bogus was generated. The trail led to webmssdk.js, specifically functions accessed via n.u.o[970].v and n.u.o[971].v.
The Algorithm
After extensive tracing, I reconstructed the X-Bogus algorithm:
- Collect fingerprint data: Canvas hash, screen info, timezone, user agent
- Build payload: Combine fingerprint with request URL parameters
- Encrypt: XOR with timestamp-derived key
- Encode: Custom Base64 with alphabet Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=
Note: TikTok frequently updates their protection mechanisms. The specific encryption protocols, encoding schemes, and algorithm details described here may change in future versions. Always verify against the current implementation.
function generateXBogus(queryString, userAgent) {
// Step 1: Generate fingerprint
const canvas = getCanvasFingerprint();
const screen = getScreenInfo();
// Step 2: Build payload
const payload = buildPayload(queryString, userAgent, canvas, screen);
// Step 3: Encrypt
const timestamp = Date.now();
const encrypted = xorEncrypt(payload, timestamp);
// Step 4: Encode
return customB64Encode(encrypted);
}
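The encoding step is the one piece that can be shown in full. The sketch below assumes the usual Base64 bit grouping (three bytes in, four characters out) with TikTok’s alphabet swapped in and "=" as padding. That grouping is an assumption, but it is consistent with the captured token: the X-Bogus value from the login request starts with "DFSz", which under this scheme corresponds to the bytes 2, 255, 45.

```javascript
const ALPHABET =
  'Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe';
const PAD = '=';

// Standard Base64 grouping, custom alphabet.
function customB64Encode(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += ALPHABET[b0 >> 2];
    out += ALPHABET[((b0 & 3) << 4) | (b1 >> 4)];
    out += has1 ? ALPHABET[((b1 & 15) << 2) | (b2 >> 6)] : PAD;
    out += has2 ? ALPHABET[b2 & 63] : PAD;
  }
  return out;
}

console.log(customB64Encode([0, 0, 0]));       // DDDD
console.log(customB64Encode([255, 255, 255])); // eeee
console.log(customB64Encode([2, 255, 45]));    // DFSz
```

A 21-byte input yields exactly 28 characters with no padding, which matches the length of the X-Bogus value seen in the captured request.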
Browser Verification
I tested the implementation directly in the browser console on TikTok:
// Generate our X-Bogus
const params = "keyword=gamerman&count=10&...";
const xBogus = generateXBogus(params, navigator.userAgent);
// Make a request with it
const response = await fetch(
`https://www.tiktok.com/api/search/user/full/?${params}&X-Bogus=${xBogus}`
);
const data = await response.json();
console.log(data.user_list); // Success! Got search results
The algorithm works. From the browser, with a valid session, my generated X-Bogus tokens are accepted by TikTok’s API.
Part 3: The Go Challenge
With a working X-Bogus implementation, I attempted to make requests from Go instead of the browser.
The Problem
Despite having:
- Correct X-Bogus algorithm (verified in browser)
- Real browser cookies
- Chrome TLS fingerprint spoofing (using SURF library)
- Proper HTTP headers in correct order
Every request returned 200 OK with an empty body (0 bytes).
What I Tried
| Approach | Result |
|---|---|
| Standard Go HTTP client | Empty response |
| tls-client with Chrome profile | Empty response |
| SURF library with Chrome impersonation | Empty response |
| Real browser cookies + Go | Empty response |
| Simulating browser navigation flow | Empty response |
The Root Cause
My working theory at this point was that critical cookies (ttwid, msToken, s_v_web_id) are generated by JavaScript at runtime, with TikTok’s webmssdk.js creating these tokens using browser fingerprinting APIs:
- Canvas fingerprinting
- WebGL fingerprinting
- Audio context fingerprinting
- Font enumeration
- And more…
Without a JavaScript engine executing these fingerprinting calls, the cookies simply don’t exist. And without these cookies, TikTok’s API returns nothing.
A quick test confirmed this. From the browser console, a request without X-Bogus returns an error message:
fetch('https://www.tiktok.com/api/search/user/full/?keyword=test&count=1')
.then(r => r.json())
.then(d => console.log(d));
// Returns: {"status_code": 0}
But the same request from Go returns nothing at all. Not even an error. TikTok’s edge servers are dropping the request before it even reaches the application layer.
This goes beyond just TLS fingerprinting. TikTok appears to be checking:
- HTTP/2 frame ordering and settings
- TCP/IP stack behavior
- Missing JavaScript execution context
- Device trust scores based on browsing history
Part 4: Validating the Algorithm
Before diving deeper into why my web requests were failing, I needed to confirm my X-Bogus algorithm was actually correct. I used Playwright to control a real Chrome browser, intercepted TikTok’s requests, and compared their X-Bogus tokens against mine generated with the same inputs.
const { chromium } = require('playwright');

// Assumes generateXBogus(params, userAgent) is defined in this file.
async function validateXBogus() {
    const browser = await chromium.launch({ headless: false });
    const page = await browser.newPage();
    // Capture X-Bogus tokens from real requests
    page.on('request', request => {
        const url = request.url();
        if (url.includes('X-Bogus')) {
            const captured = url.match(/X-Bogus=([^&]+)/)?.[1];
            console.log('TikTok generated:', captured);
            // Generate our own with same inputs. This handler runs in Node,
            // not in the page, so read the User-Agent from the request headers
            // instead of navigator.userAgent.
            const params = url.split('?')[1].replace(/&X-Bogus.*/, '');
            const userAgent = request.headers()['user-agent'];
            const ours = generateXBogus(params, userAgent);
            console.log('Our algorithm:', ours);
            console.log('Match:', captured === ours);
        }
    });
    await page.goto('https://www.tiktok.com/search?q=test');
    await page.waitForTimeout(5000);
    await browser.close();
}

validateXBogus().catch(console.error);
The tokens matched. My reverse-engineered algorithm was producing valid signatures. The problem wasn’t the algorithm—it was how I was making the requests.
Part 5: The Working Solution
With the algorithm validated, I went back to debug the web approach. Two discoveries made everything work:
Discovery 1: msToken Source
I had assumed msToken was generated client-side by JavaScript. Wrong. Using Chrome DevTools, I traced its origin and found it’s simply returned in Set-Cookie headers from API endpoints:
curl -s -D - 'https://www.tiktok.com/api/recommend/item_list/?aid=1988' | grep -i mstoken
set-cookie: msToken=NsonircdZIUnl8Zu0j006iCessgTGoG9kEH3I9cnj18Xx3...
The flow is simple:
- Make any API request → receive msToken in Set-Cookie
- Use that msToken in subsequent requests
- Each response includes a fresh msToken for the next request
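The rotation in that flow can be sketched as a small holder that is fed the raw response headers after every request and keeps whichever msToken it saw last. The class name and the sample header text are hypothetical. The input format matches what `curl -D -` prints.

```javascript
// Pull the last msToken out of raw response headers, or null if absent.
function extractMsToken(rawHeaders) {
  let token = null;
  for (const line of rawHeaders.split(/\r?\n/)) {
    const m = line.match(/^set-cookie:\s*msToken=([^;]+)/i);
    if (m) token = m[1];
  }
  return token;
}

// Keeps the most recent token; responses without one leave it unchanged.
class MsTokenHolder {
  constructor() {
    this.current = null;
  }
  update(rawHeaders) {
    const fresh = extractMsToken(rawHeaders);
    if (fresh) this.current = fresh;
    return this.current;
  }
}

// Hypothetical header text for illustration.
const holder = new MsTokenHolder();
holder.update('HTTP/2 200\r\nset-cookie: ttwid=abc; Path=/\r\nset-cookie: msToken=tok1; Path=/\r\n');
console.log(holder.current); // tok1
holder.update('HTTP/2 200\r\ncontent-type: application/json\r\n');
console.log(holder.current); // tok1 (unchanged)
holder.update('HTTP/2 200\r\nSet-Cookie: msToken=tok2; Secure\r\n');
console.log(holder.current); // tok2
```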
Discovery 2: URL Encoding Sensitivity
The signatures are computed on the exact URL string. Any modification—reordering parameters, changing encoding (%20 vs +)—invalidates the signature.
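This is easy to trip over because URL helpers silently re-encode. The sketch below shows Node’s URLSearchParams turning %20 into + when a query string is parsed and serialized again, which changes any hash computed over it. MD5 stands in for the real signature here purely to show that the bytes differ. The `appendParam` helper is a hypothetical name for "concatenate, never re-parse".

```javascript
const crypto = require('crypto');

const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

// The exact query string that would be signed.
const raw = 'keyword=hello%20world&count=10';

// Round-tripping through URLSearchParams changes the encoding of the space.
const reserialized = new URLSearchParams(raw).toString();

console.log(reserialized);                   // keyword=hello+world&count=10
console.log(md5(raw) === md5(reserialized)); // false

// Safer: append the signature by string concatenation so the signed
// portion of the URL is left byte-for-byte intact.
function appendParam(rawQuery, name, value) {
  return `${rawQuery}&${name}=${value}`;
}

console.log(appendParam(raw, 'X-Bogus', 'PLACEHOLDER'));
```

The practical rule is to build the query string once, sign that exact string, and pass it to the HTTP client without letting any library normalize it.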
Implementation
With these insights, the implementation is straightforward:
const crypto = require('crypto');
const { execSync } = require('child_process');
const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...';
const XBOGUS_ALPHABET = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe";
// Step 1: Get msToken from any API endpoint
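function getMsToken() {
    const result = execSync(
        `curl -s -D - 'https://www.tiktok.com/api/recommend/item_list/?aid=1988'`,
        { encoding: 'utf-8' }
    );
    // Fail with a clear message instead of a TypeError when no cookie is set
    const match = result.match(/set-cookie:\s*msToken=([^;]+)/i);
    if (!match) throw new Error('No msToken found in response headers');
    return match[1];
}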
// Step 2: Generate X-Bogus signature
function generateXBogus(url, userAgent) {
const timestamp = Math.floor(Date.now() / 1000);
const paramsHash = doubleMD5(url);
const uaHash = userAgentHash(userAgent);
// Build payload, encrypt, encode with custom Base64
const payload = buildPayload(timestamp, paramsHash, uaHash);
const encrypted = rc4Encrypt(payload);
return customBase64Encode([2, 255, ...encrypted]);
}
// Step 3: Build and sign the request
function search(keyword) {
const msToken = getMsToken();
const baseURL = buildSearchURL(keyword, msToken);
const xBogus = generateXBogus(baseURL, USER_AGENT);
const signedURL = `${baseURL}&X-Bogus=${xBogus}`;
return execSync(`curl -s '${signedURL}'`, { encoding: 'utf-8' });
}
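The helpers `doubleMD5` and `rc4Encrypt` are not shown above, so here is a sketch of what they might look like. RC4 itself is a standard algorithm and is checked below against the well-known "Key" / "Plaintext" test vector. The key TikTok uses is not given in this article and is left as a parameter. Hashing the raw 16-byte digest a second time, rather than its hex string, is an assumption on my part.

```javascript
const crypto = require('crypto');

// Standard RC4: key scheduling followed by keystream XOR.
function rc4(key, data) {
  const S = Array.from({ length: 256 }, (_, i) => i);
  let j = 0;
  for (let i = 0; i < 256; i++) {
    j = (j + S[i] + key[i % key.length]) & 255;
    [S[i], S[j]] = [S[j], S[i]];
  }
  const out = Buffer.alloc(data.length);
  let i = 0;
  j = 0;
  for (let n = 0; n < data.length; n++) {
    i = (i + 1) & 255;
    j = (j + S[i]) & 255;
    [S[i], S[j]] = [S[j], S[i]];
    out[n] = data[n] ^ S[(S[i] + S[j]) & 255];
  }
  return out;
}

// MD5 applied twice; the second pass hashes the raw digest bytes
// (assumption: the original may hash the hex string instead).
function doubleMD5(input) {
  const first = crypto.createHash('md5').update(input).digest();
  return crypto.createHash('md5').update(first).digest();
}

// Published RC4 test vector.
console.log(rc4(Buffer.from('Key'), Buffer.from('Plaintext')).toString('hex'));
// bbf316e8d940af0ad3

console.log(doubleMD5('keyword=test&count=10').length); // 16
```

RC4 is symmetric, so running the same function over the ciphertext with the same key recovers the payload. That is handy when decoding captured tokens to compare against your own output.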
Results
$ node search.js test
Getting msToken... done
Generating signature... done
Making request... done
Found 10 users:
@_powervision_ - 5,900,000 followers
@test - 100,100 followers
@te5tt - 424,800 followers
...
The key insight: get msToken from the server, sign with the exact URL bytes, and the API responds normally.
Part 6: Scaling with Proxies and Concurrency
With the signatures working, the next step was testing at scale. I built a scraper with proxy support and concurrent workers.
Architecture
┌─────────────────────────────────────────┐
│ Main Controller │
├─────────────────────────────────────────┤
│ - Load keyword list │
│ - Manage worker pool │
│ - Collect results │
└──────────────┬──────────────────────────┘
│
┌───────┴───────┐
▼ ▼
┌─────────────┐ ┌─────────────┐
│ Worker 1 │ │ Worker 2 │ ... N workers
├─────────────┤ ├─────────────┤
│ - Own session│ │ - Own session│
│ - Get msToken│ │ - Get msToken│
│ - Sign URLs │ │ - Sign URLs │
│ - Make reqs │ │ - Make reqs │
└─────────────┘ └─────────────┘
Each worker:
- Gets its own msToken and session cookies
- Processes a chunk of keywords
- Generates signatures for each request
- Routes through rotating residential proxy
Implementation
// Worker function
async function worker(keywords, workerId, useProxy) {
    // Get session for this worker
    const msToken = getMsToken(useProxy);
    const sessionData = getSessionCookies(useProxy);
    for (const keyword of keywords) {
        // Build and sign URL
        const { baseURL, queryString } = buildSearchURL(keyword, msToken, ...);
        const xBogus = generateXBogus(baseURL, USER_AGENT);
        const xGnarly = generateXGnarly(queryString);
        const signedURL = `${baseURL}&X-Bogus=${xBogus}&X-Gnarly=${xGnarly}`;
        // Make request through proxy
        // (cookies is the Cookie header string built from sessionData)
        const result = execSync(
            `curl -s --proxy "${PROXY_URL}" '${signedURL}' -H 'Cookie: ${cookies}'`
        );
        // Parse and save results
        saveResults(keyword, JSON.parse(result));
        await sleep(1000); // Rate limiting
    }
}

// Start N workers in parallel
const chunks = splitKeywords(keywords, concurrency);
await Promise.all(chunks.map((chunk, i) => worker(chunk, i, true)));
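The `splitKeywords` helper is not shown, so the sketch below gives one possible round-robin implementation together with a generic pool runner. Both are my own illustration, with a stubbed task in place of the real request. One caveat worth knowing: `execSync` blocks Node’s event loop, so workers built on it only take turns at their `await` points. For requests that genuinely overlap, the task would need to use an asynchronous call such as a promisified `child_process.execFile`.

```javascript
// Distribute keywords round-robin across at most n chunks.
function splitKeywords(keywords, n) {
  const count = Math.min(n, keywords.length);
  const chunks = Array.from({ length: count }, () => []);
  keywords.forEach((kw, i) => chunks[i % count].push(kw));
  return chunks;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run one async task per keyword, with each worker handling its own chunk.
// Failures are recorded instead of aborting the whole run.
async function runWorkers(keywords, concurrency, task) {
  const results = [];
  const chunks = splitKeywords(keywords, concurrency);
  await Promise.all(chunks.map(async (chunk, workerId) => {
    for (const keyword of chunk) {
      try {
        results.push({ keyword, workerId, value: await task(keyword) });
      } catch (err) {
        results.push({ keyword, workerId, error: String(err) });
      }
    }
  }));
  return results;
}

// Stub task standing in for "sign URL and fetch".
async function fakeSearch(keyword) {
  await sleep(10);
  if (keyword === 'bad') throw new Error('blocked');
  return keyword.toUpperCase();
}

runWorkers(['test', 'dogs', 'bad', 'golf'], 2, fakeSearch).then((results) => {
  console.log(results);
});
```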
Results
Running with 20 concurrent workers against 99 keywords:
============================================================
COMPLETE
============================================================
Total requests: 89
Successful: 88
Failed: 1
Users found: 860
Time: 285.7s
Rate: 18.7 requests/min
Results saved to: results.txt
98.9% success rate with concurrent requests through a rotating proxy. No captchas triggered.
Sample output from results.txt:
keyword username nickname followers
test @_powervision_ Power Vision Tests 5900000
test @test user39494307298 100100
dogs @hoootdogs HOOOTDOGS 14400000
dogs @funnydogsofficial Funny Cute 1000000
hair @hairby_chrissy Hairby_chrissy 3500000
golf @callawaygolf Callaway Golf 1100000
Validation
To confirm the signatures were actually being validated (not just passed through), I tested with garbage values:
const garbage = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAA';
const url = baseURL + '&X-Bogus=' + garbage + '&X-Gnarly=' + garbage;
// Result: Response length 0 (empty - blocked)
With valid signatures: 23,000+ bytes returned. With garbage: 0 bytes. The reverse-engineered algorithms are working correctly.
Tools and Resources
All the tools and scripts from this research are available on GitHub: B9ph0met/tiktok-re
Tools I Used
- Chrome DevTools (Network, Sources, Console)
- Node.js + Babel for deobfuscation
- Go for HTTP testing
- Playwright for browser automation
- Insomnia for request debugging
Useful Console Snippets
// Hook X-Bogus generation
window.orig970 = n.u.o[970].v;
n.u.o[970].v = function(...args) {
console.log('X-Bogus inputs:', args);
const result = window.orig970.apply(this, args);
console.log('X-Bogus output:', result);
return result;
};
// Track VM execution
window.thatarray = [];
// Then check window.thatarray after triggering a request
// Decode X-Bogus tokens
window.xbDecode = function(str) {
const alphabet = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";
let result = [];
for (let i = 0; i < str.length; i += 4) {
const a = alphabet.indexOf(str[i]);
const b = alphabet.indexOf(str[i+1]);
const c = alphabet.indexOf(str[i+2]);
const d = alphabet.indexOf(str[i+3]);
result.push((a << 2) | (b >> 4));
if (c !== 64) result.push(((b & 15) << 4) | (c >> 2));
if (d !== 64) result.push(((c & 3) << 6) | d);
}
return new Uint8Array(result);
};
// Hook fetch to see login requests
const _fetch = window.fetch;
window.fetch = async function(url, options) {
if (url.toString().includes('login')) {
console.log('=== LOGIN REQUEST ===');
console.log('URL:', url);
console.log('Body:', options?.body);
}
return _fetch.apply(this, arguments);
};
References
- nullpt.rs - Reverse Engineering TikTok’s VM
- notemrovsky/tiktok-reverse-engineering
- justbeluga/tiktok-web-reverse-engineering
- Ibiyemi Abiodun - Reversing TikTok Part 2
This research is for educational purposes only.