Preamble

This article documents my journey reverse engineering TikTok’s web-based anti-bot protection system. As someone new to JavaScript reverse engineering, my goal was twofold: to develop a practical understanding of how modern bot protection works from an adversarial perspective, and to document the process in a way that helps others learn these techniques.

What started as curiosity about how TikTok’s anti-bot worked turned into a deep dive into virtual machine obfuscation, TLS fingerprinting, custom encoding schemes, and behavioral analysis. I’ll walk through exactly what I discovered, the dead ends I hit, and the techniques that actually worked.

The tools and scripts from this research are available at github.com/B9ph0met/tiktok-re.


Discovery

The first thing I did was load up TikTok’s login URL with DevTools open and submit a test login to see what I was dealing with.

Initial Reconnaissance

Opening the Network tab and submitting test@gmail.com / test123 revealed a surprisingly complex request:

POST https://login-nola.www.tiktok.com/passport/web/user/login/
    ?multi_login=1
    &did=7582653297049732622
    &verifyFp=verify_mj1po7mg_CaJWIYhS_NCz8_4J4r_B9jG_h2ENJl2zDczF
    &msToken=UeNPPoF_a91JnnNmYtq1xtl...
    &X-Bogus=DFSzsIVLDit5znB/CYeg3XhGbwrN
    &X-Gnarly=MR4VNDByOO4lJ1hYVn0joNm...

The form data was encoded as well:

username=7160767145626864646c692b666a68
email=7160767145626864646c692b666a68
password=7160767134373636
mix_mode=1

My plaintext credentials had been transformed into hex strings. And those X-Bogus and X-Gnarly parameters? No idea where they came from.

Testing with Go

I decided to write a simple Go script to see what values I could extract from just a GET request, with no JavaScript execution:

package main

import (
    "fmt"
    "net/http"
    "net/http/cookiejar"
)

func main() {
    jar, _ := cookiejar.New(nil)
    client := &http.Client{Jar: jar}
    
    req, _ := http.NewRequest("GET", 
        "https://www.tiktok.com/login/phone-or-email/email", nil)
    req.Header.Set("User-Agent", 
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
    
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer resp.Body.Close()
    
    fmt.Println("Status:", resp.Status)
    fmt.Println("\nCookies received:")
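    for _, cookie := range resp.Cookies() {
        // Truncate long values for display; slicing past the end of a
        // short value (e.g. tt_csrf_token) would panic.
        value := cookie.Value
        if len(value) > 50 {
            value = value[:50] + "..."
        }
        fmt.Printf("  %s: %s\n", cookie.Name, value)
    }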
}

Result: The request succeeded, returning tt_csrf_token and ttwid cookies. But this was just a GET request for the login page. The real challenge would be POST requests to protected API endpoints.

The TLS Fingerprinting Problem

My first real challenge came when I tried hitting actual API endpoints. Requests that worked in my browser failed silently from Go, returning empty responses or generic errors.

After some research, I learned about TLS fingerprinting. When a client establishes an HTTPS connection, the TLS handshake reveals:

  • Supported cipher suites (and their order)
  • TLS extensions
  • Elliptic curves supported
  • Signature algorithms

This produces a characteristic “fingerprint” (usually summarized as a JA3 or JA4 hash) that differs between browsers and the TLS stacks of programming languages:

Client            Behavior
Chrome            Specific cipher order, GREASE extensions
Go net/http       Go’s crypto/tls defaults, looks nothing like a browser
Python requests   OpenSSL defaults

The solution: use a library like utls (Go) or tls-client (Python) that can impersonate browser TLS fingerprints.
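
To make the fingerprint idea concrete, here is a minimal sketch of how a JA3 hash is derived from ClientHello fields. The field values are made-up illustrations, not captured from Chrome or Go; only the JA3 layout (version, ciphers, extensions, curves, point formats, joined with commas and dashes, then MD5) and the GREASE filtering follow the published JA3 description.

```javascript
const crypto = require('crypto');

// GREASE values (0x0a0a, 0x1a1a, ... 0xfafa) are random placeholders that
// JA3 ignores so the hash stays stable across connections.
const isGrease = (v) => (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);

// Build the JA3 string and its MD5 hash from ClientHello fields.
function ja3(hello) {
    const join = (values) => values.filter((v) => !isGrease(v)).join('-');
    const text = [
        hello.version,
        join(hello.ciphers),
        join(hello.extensions),
        join(hello.curves),
        join(hello.pointFormats),
    ].join(',');
    return { text, hash: crypto.createHash('md5').update(text).digest('hex') };
}

// Illustrative values only: same cipher suites, different order.
const clientA = {
    version: 771,
    ciphers: [0x0a0a, 4865, 4866, 4867],
    extensions: [0, 23, 65281],
    curves: [29, 23, 24],
    pointFormats: [0],
};
const clientB = { ...clientA, ciphers: [4867, 4866, 4865] };

console.log(ja3(clientA).text); // 771,4865-4866-4867,0-23-65281,29-23-24,0
console.log(ja3(clientA).hash === ja3(clientB).hash); // false
```

Reordering the same three cipher suites is enough to change the hash, which is why a correct header set alone does not make a Go client look like Chrome.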

The “Small Sample” Problem

I also learned something important about anti-bot systems: most will allow a small number of bot requests through. This is intentional. It makes it much harder to understand exactly which signals trigger blocking.

If every malformed request were immediately rejected, attackers could easily binary-search their way to the correct format. By letting some requests through randomly, defenders create uncertainty. This makes determining which headers and parameters are actually enforced incredibly difficult at scale.
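
A rough way to see the cost of that uncertainty: if an invalid request still gets through with some probability, a single success proves little. The sketch below computes how many consecutive successes are needed before a run of lucky pass-throughs becomes an unlikely explanation. The pass-through rates are hypothetical; TikTok’s real rate is not known.

```javascript
// Smallest n for which an invalid request passing n times in a row
// has probability at or below alpha, i.e. passRate^n <= alpha.
function trialsNeeded(passRate, alpha) {
    if (passRate <= 0) return 1;         // nothing leaks: one success is conclusive
    if (passRate >= 1) return Infinity;  // everything leaks: no number of trials helps
    return Math.ceil(Math.log(alpha) / Math.log(passRate));
}

for (const p of [0.05, 0.2, 0.5]) {
    console.log(`pass-through ${p}: ${trialsNeeded(p, 0.01)} successes needed`);
}
```

That count applies to every header or parameter being tested, so the number of probe requests grows quickly.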

What I Found

From my initial discovery:

Server-side (obtainable via HTTP):

  • ttwid : Device/session ID, set as cookie
  • tt_csrf_token : CSRF token, set as cookie
  • msToken : Appears in some API responses

Client-side (requires JavaScript):

  • X-Bogus : Request signature, generated by webmssdk.js
  • X-Gnarly : Secondary signature
  • verifyFp : Fingerprint verification token
  • Encoded credentials : Custom XOR-based encoding
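
The captured form values are consistent with a very simple scheme: XOR each byte with 5 and print the result as hex. This is inferred from the sample values shown earlier, not taken from TikTok’s code, so treat the key and the function names as assumptions.

```javascript
// Assumed scheme: XOR every byte with 0x05, then hex-encode.
const XOR_KEY = 0x05;

function encodeCredential(plain) {
    const bytes = Buffer.from(plain, 'utf-8');
    return Array.from(bytes, (b) => (b ^ XOR_KEY).toString(16).padStart(2, '0')).join('');
}

function decodeCredential(hex) {
    const bytes = Buffer.from(hex, 'hex');
    return Buffer.from(Array.from(bytes, (b) => b ^ XOR_KEY)).toString('utf-8');
}

console.log(encodeCredential('test123'));           // 71607671343736
console.log(decodeCredential('7160767134373636'));  // test1233
```

Decoded this way, the captured password field reads test1233 and the email field reads test@gmaail.com, so the values submitted in that capture seem to have contained a stray character each compared with the test@gmail.com / test123 quoted above.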

Custom CAPTCHA:

When I triggered rate limiting, I discovered TikTok uses their own CAPTCHA system:

https://verification16-normal-nola.tiktokw.eu/captcha/verify?
    subtype=whirl
    &h5_check_version=3.8.27-alpha.3

Not reCAPTCHA, not hCaptcha, but their own solution with types like whirl (rotating puzzle), slide, 3d, and same (image matching). This means no off-the-shelf solving APIs work directly.

Picking a Target

With so many unknowns, I needed to focus. The credential encoding seemed like a good starting point since it was self-contained and didn’t require understanding the full VM. But the real prize was X-Bogus. Without that signature, no API request would succeed.

Time to understand what I was up against.


What is a VM?

When I started searching for information about X-Bogus, I found references to TikTok using “virtualized obfuscation.” I had no idea what that meant.

The Problem with JavaScript Obfuscation

Traditional JavaScript obfuscation uses techniques like:

  • Renaming variables (username becomes _0x4a3b)
  • Encoding strings (storing them in arrays, base64)
  • Control flow flattening (turning linear code into switch statements)
  • Dead code insertion

But all of these can be reversed with enough patience. The code is still JavaScript. You can set breakpoints, log values, and eventually understand it.

Virtual Machine Obfuscation

VM-based obfuscation takes a different approach: compile the sensitive JavaScript into custom bytecode, then ship an interpreter that executes it.

Instead of this:

function sign(data) {
    return md5(data + secret);
}

You get something like this:

function N(n, t, r, i, o, e) {
    var u = { C: n, o: [], A: [], I: [], u: t, D: e };
    for (u.o[0] = null, u.o[1] = undefined; u.C < k.length; ) {
        var f = k[u.C++] << 8 | k[u.C++];
        I[f](u);
    }
}
// ... plus tens of kilobytes of bytecode in array 'k'
// ... plus dozens to hundreds of opcode handlers in array 'I'

The original logic is gone. It’s been compiled into bytecode that only makes sense to the custom VM interpreter. You can’t just read the code. You have to reverse engineer an entire virtual machine.

How the VM Works

After reading existing research, here’s my understanding:

  1. Bytecode Array (k): An array of bytes representing the compiled program.

  2. Opcode Handlers (I): An array of functions, each handling one type of instruction. TikTok uses dozens to hundreds of unique opcodes, depending on the version.

  3. VM State (u): An object tracking:

    • C : Program counter (current position in bytecode)
    • o : Operand stack / registers
    • A : Call stack for exception handling
    • I : Local variables

  4. Execution Loop: Fetch two bytes, combine into 16-bit opcode, call handler, repeat.

while (u.C < k.length) {
    var opcode = k[u.C++] << 8 | k[u.C++];  // Fetch 16-bit opcode
    I[opcode](u);                            // Execute handler
}

The clever part: even if you understand the VM structure, you still need to trace through thousands of bytecode instructions to understand what any particular function does.
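
To see how little the dispatch loop reveals by itself, here is a toy VM in the same shape: a byte array, a handler table, and a state object. The opcodes and their numbering are invented for illustration and have nothing to do with TikTok’s actual instruction set.

```javascript
// Toy instruction set (hypothetical opcode numbers)
const OP_PUSH = 0x0001, OP_ADD = 0x0002, OP_MUL = 0x0003, OP_HALT = 0x0004;

// Handler table indexed by opcode, playing the role of array `I`
const handlers = [];
handlers[OP_PUSH] = (s) => { s.stack.push(s.code[s.pc++]); }; // operand is the next byte
handlers[OP_ADD]  = (s) => { s.stack.push(s.stack.pop() + s.stack.pop()); };
handlers[OP_MUL]  = (s) => { s.stack.push(s.stack.pop() * s.stack.pop()); };
handlers[OP_HALT] = (s) => { s.pc = s.code.length; };

function run(code) {
    const state = { pc: 0, stack: [], code };
    while (state.pc < code.length) {
        const opcode = (code[state.pc++] << 8) | code[state.pc++]; // fetch 16-bit opcode
        handlers[opcode](state);                                    // dispatch to handler
    }
    return state.stack.pop();
}

// Bytecode for (2 + 3) * 4
const program = [0, 1, 2,  0, 1, 3,  0, 2,  0, 1, 4,  0, 3,  0, 4];
console.log(run(program)); // 20
```

Nothing in `run` hints that the program computes (2 + 3) * 4; that only becomes visible by tracing the byte array through the handlers, which is the work a real VM forces on the analyst at a much larger scale.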

The Bytecode Format

TikTok’s bytecode strings follow a specific format. Each module starts with two 32-bit magic values, 0x484e4f4a and 0x403f5243 (the ASCII text “HNOJ” and “@?RC”), followed by:

  • A version/separator byte
  • An XOR key for string decryption
  • Bytecode instructions
  • Encrypted string table
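
As a small illustration, the two magic values are plain ASCII, and checking for them is a quick way to spot a bytecode blob. The sketch assumes the blob is available as a hex string and that strings are decrypted with a single-byte XOR; the format description above does not pin down either detail, and the sample inputs are fabricated.

```javascript
const MAGIC = Buffer.from('484e4f4a403f5243', 'hex'); // 0x484e4f4a, 0x403f5243

// True if the hex-encoded blob starts with the magic values
function hasMagic(hexBlob) {
    const head = Buffer.from(hexBlob.slice(0, MAGIC.length * 2), 'hex');
    return head.equals(MAGIC);
}

// Assumed single-byte XOR decryption for string-table entries
function xorDecrypt(bytes, key) {
    return Buffer.from(Array.from(bytes, (b) => b ^ key)).toString('latin1');
}

console.log(MAGIC.toString('latin1'));                         // HNOJ@?RC
console.log(hasMagic('484e4f4a403f524300ff'));                 // true
console.log(xorDecrypt(Buffer.from('4243', 'hex'), 0x2a));     // hi
```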

Part 2: Deobfuscating X-Bogus

Finding the Entry Point

Using Chrome DevTools, I set breakpoints on network requests and traced back to find where X-Bogus was generated. The trail led to webmssdk.js, specifically functions accessed via n.u.o[970].v and n.u.o[971].v.

The Algorithm

After extensive tracing, I reconstructed the X-Bogus algorithm:

  1. Collect fingerprint data: Canvas hash, screen info, timezone, user agent
  2. Build payload: Combine fingerprint with request URL parameters
  3. Encrypt: XOR with timestamp-derived key
  4. Encode: Custom Base64 with the 64-character alphabet Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe, plus = as the padding character

Note: TikTok frequently updates their protection mechanisms. The specific encryption protocols, encoding schemes, and algorithm details described here may change in future versions. Always verify against the current implementation.

function generateXBogus(queryString, userAgent) {
    // Step 1: Generate fingerprint
    const canvas = getCanvasFingerprint();
    const screen = getScreenInfo();
    
    // Step 2: Build payload
    const payload = buildPayload(queryString, userAgent, canvas, screen);
    
    // Step 3: Encrypt
    const timestamp = Date.now();
    const encrypted = xorEncrypt(payload, timestamp);
    
    // Step 4: Encode
    return customB64Encode(encrypted);
}
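
The encoding step (Step 4) can be sketched in isolation. This assumes the usual Base64 bit packing (3 bytes to 4 symbols) with only the alphabet swapped, which is the layout the xbDecode console snippet near the end of this article reverses; the function name mirrors the pseudocode above.

```javascript
const ALPHABET = 'Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe';
const PAD = '=';

// Standard Base64 bit packing with a custom alphabet
function customB64Encode(bytes) {
    let out = '';
    for (let i = 0; i < bytes.length; i += 3) {
        const hasB1 = i + 1 < bytes.length;
        const hasB2 = i + 2 < bytes.length;
        const b0 = bytes[i];
        const b1 = hasB1 ? bytes[i + 1] : 0;
        const b2 = hasB2 ? bytes[i + 2] : 0;
        out += ALPHABET[b0 >> 2];
        out += ALPHABET[((b0 & 3) << 4) | (b1 >> 4)];
        out += hasB1 ? ALPHABET[((b1 & 15) << 2) | (b2 >> 6)] : PAD;
        out += hasB2 ? ALPHABET[b2 & 63] : PAD;
    }
    return out;
}

console.log(customB64Encode([2, 255]));   // DFS=
console.log(customB64Encode([0, 0, 0]));  // DDDD
```

The bytes 2 and 255 encode to DFS, which is also how the X-Bogus value captured in the login request begins.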

Browser Verification

I tested the implementation directly in the browser console on TikTok:

// Generate our X-Bogus
const params = "keyword=gamerman&count=10&...";
const xBogus = generateXBogus(params, navigator.userAgent);

// Make a request with it
const response = await fetch(
    `https://www.tiktok.com/api/search/user/full/?${params}&X-Bogus=${xBogus}`
);
const data = await response.json();
console.log(data.user_list);  // Success! Got search results

The algorithm works. From the browser, with a valid session, my generated X-Bogus tokens are accepted by TikTok’s API.


Part 3: The Go Challenge

With a working X-Bogus implementation, I attempted to make requests from Go instead of the browser.

The Problem

Despite having:

  • Correct X-Bogus algorithm (verified in browser)
  • Real browser cookies
  • Chrome TLS fingerprint spoofing (using SURF library)
  • Proper HTTP headers in correct order

Every request returned 200 OK with an empty body (0 bytes).

What I Tried

Approach Result
Standard Go HTTP client Empty response
tls-client with Chrome profile Empty response
SURF library with Chrome impersonation Empty response
Real browser cookies + Go Empty response
Simulating browser navigation flow Empty response

The Root Cause

Through debugging, I concluded that critical cookies (ttwid, msToken, s_v_web_id) were being generated by JavaScript at runtime, a conclusion that Part 5 shows was wrong at least for msToken. My working theory was that TikTok’s webmssdk.js creates these tokens using browser fingerprinting APIs:

  • Canvas fingerprinting
  • WebGL fingerprinting
  • Audio context fingerprinting
  • Font enumeration
  • And more…

Without a JavaScript engine executing these fingerprinting calls, the cookies simply don’t exist. And without these cookies, TikTok’s API returns nothing.

A quick test confirmed this. From the browser console, a request without X-Bogus returns an error message:

fetch('https://www.tiktok.com/api/search/user/full/?keyword=test&count=1')
    .then(r => r.json())
    .then(d => console.log(d));
// Returns: {"status_code": 0}

But the same request from Go returns nothing at all. Not even an error. This suggests TikTok’s edge servers are discarding the request before it reaches the application layer.

This goes beyond just TLS fingerprinting. TikTok appears to be checking:

  • HTTP/2 frame ordering and settings
  • TCP/IP stack behavior
  • Missing JavaScript execution context
  • Device trust scores based on browsing history

Part 4: Validating the Algorithm

Before diving deeper into why my web requests were failing, I needed to confirm my X-Bogus algorithm was actually correct. I used Playwright to control a real Chrome browser, intercepted TikTok’s requests, and compared their X-Bogus tokens against mine generated with the same inputs.

const { chromium } = require('playwright');
// Assumed: the reimplementation under test is exported from a local module
const { generateXBogus } = require('./xbogus');

async function validateXBogus() {
    const browser = await chromium.launch({ headless: false });
    const page = await browser.newPage();

    // navigator only exists inside the page, so read the user agent from there
    const userAgent = await page.evaluate(() => navigator.userAgent);

    // Capture X-Bogus tokens from real requests
    page.on('request', request => {
        const url = request.url();
        if (url.includes('X-Bogus')) {
            const captured = url.match(/X-Bogus=([^&]+)/)?.[1];
            console.log('TikTok generated:', captured);
            
            // Generate our own with same inputs
            const params = url.split('?')[1].replace(/&X-Bogus.*/, '');
            const ours = generateXBogus(params, userAgent);
            console.log('Our algorithm:', ours);
            console.log('Match:', captured === ours);
        }
    });

    await page.goto('https://www.tiktok.com/search?q=test');
    await page.waitForTimeout(5000);
    await browser.close();
}

validateXBogus().catch(console.error);

The tokens matched. My reverse-engineered algorithm was producing valid signatures. The problem wasn’t the algorithm—it was how I was making the requests.


Part 5: The Working Solution

With the algorithm validated, I went back to debug the web approach. Two discoveries made everything work:

Discovery 1: msToken Source

I had assumed msToken was generated client-side by JavaScript. Wrong. Using Chrome DevTools, I traced its origin and found it’s simply returned in Set-Cookie headers from API endpoints:

curl -s -D - 'https://www.tiktok.com/api/recommend/item_list/?aid=1988' | grep -i mstoken
set-cookie: msToken=NsonircdZIUnl8Zu0j006iCessgTGoG9kEH3I9cnj18Xx3...

The flow is simple:

  1. Make any API request → receive msToken in Set-Cookie
  2. Use that msToken in subsequent requests
  3. Each response includes a fresh msToken for the next request
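
The rotation in steps 1 to 3 amounts to carrying the latest token forward. A minimal sketch, using fabricated header dumps in the format `curl -D -` prints; the helper names are illustrative and not part of the article’s scripts.

```javascript
// Pull the msToken value out of a raw header dump (as printed by `curl -D -`)
function extractMsToken(rawHeaders) {
    const match = rawHeaders.match(/^set-cookie:\s*msToken=([^;\r\n]+)/im);
    return match ? match[1] : null;
}

// Keep the most recent token; fall back to the old one if none was issued
function nextToken(current, rawHeaders) {
    return extractMsToken(rawHeaders) ?? current;
}

// Fabricated responses standing in for consecutive API calls
const first = 'HTTP/2 200\r\ncontent-type: application/json\r\nset-cookie: msToken=abc123; Path=/; Secure\r\n';
const second = 'HTTP/2 200\r\nset-cookie: msToken=def456; Path=/\r\n';
const none = 'HTTP/2 200\r\ncontent-type: application/json\r\n';

let token = null;
token = nextToken(token, first);   // abc123
token = nextToken(token, second);  // def456
token = nextToken(token, none);    // unchanged: def456
console.log(token);
```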

Discovery 2: URL Encoding Sensitivity

The signatures are computed on the exact URL string. Any modification—reordering parameters, changing encoding (%20 vs +)—invalidates the signature.
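
A short demonstration of how easily the string changes. MD5 here is only a stand-in for the real signature, since any function computed over the exact bytes behaves the same way; the query string is an invented example.

```javascript
const crypto = require('crypto');

// Stand-in for the real signature: a hash over the exact string
const sign = (s) => crypto.createHash('md5').update(s).digest('hex');

const original = 'keyword=hello%20world&count=10';

// Round-tripping through URLSearchParams re-serializes the query string
const reencoded = new URLSearchParams(original).toString();

// Sorting parameters (a common "normalization") also changes the string
const sorted = original.split('&').sort().join('&');

console.log(reencoded);                          // keyword=hello+world&count=10
console.log(sorted);                             // count=10&keyword=hello%20world
console.log(sign(original) === sign(reencoded)); // false
console.log(sign(original) === sign(sorted));    // false
```

In practice this means building the query string once, signing that exact string, and sending it without passing it through a URL library that might re-encode it.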

Implementation

With these insights, the implementation is straightforward:

const crypto = require('crypto');
const { execSync } = require('child_process');

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...';
const XBOGUS_ALPHABET = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe";

// Step 1: Get msToken from any API endpoint
function getMsToken() {
    const result = execSync(
        `curl -s -D - 'https://www.tiktok.com/api/recommend/item_list/?aid=1988'`,
        { encoding: 'utf-8' }
    );
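    // Fail loudly if the response carried no msToken cookie
    const match = result.match(/set-cookie:\s*msToken=([^;]+)/i);
    if (!match) throw new Error('msToken not found in response headers');
    return match[1];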
}

// Step 2: Generate X-Bogus signature
function generateXBogus(url, userAgent) {
    const timestamp = Math.floor(Date.now() / 1000);
    const paramsHash = doubleMD5(url);
    const uaHash = userAgentHash(userAgent);
    
    // Build payload, encrypt, encode with custom Base64
    const payload = buildPayload(timestamp, paramsHash, uaHash);
    const encrypted = rc4Encrypt(payload);
    return customBase64Encode([2, 255, ...encrypted]);
}

// Step 3: Build and sign the request
function search(keyword) {
    const msToken = getMsToken();
    const baseURL = buildSearchURL(keyword, msToken);
    const xBogus = generateXBogus(baseURL, USER_AGENT);
    
    const signedURL = `${baseURL}&X-Bogus=${xBogus}`;
    return execSync(`curl -s '${signedURL}'`, { encoding: 'utf-8' });
}
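
Two of the helpers named above, doubleMD5 and rc4Encrypt, are not shown. The sketch below is one plausible shape for them: RC4 is the standard algorithm, while hashing the raw digest a second time and passing the key as an argument are assumptions, not confirmed details of TikTok’s implementation.

```javascript
const crypto = require('crypto');

const md5 = (data) => crypto.createHash('md5').update(data).digest();

// Assumption: "double MD5" hashes the raw 16-byte digest a second time
function doubleMD5(text) {
    return md5(md5(Buffer.from(text, 'utf-8')));
}

// Standard RC4: key scheduling, then XOR the data with the keystream
function rc4(key, data) {
    const S = Array.from({ length: 256 }, (_, i) => i);
    for (let i = 0, j = 0; i < 256; i++) {
        j = (j + S[i] + key[i % key.length]) & 0xff;
        [S[i], S[j]] = [S[j], S[i]];
    }
    const out = Buffer.alloc(data.length);
    for (let n = 0, i = 0, j = 0; n < data.length; n++) {
        i = (i + 1) & 0xff;
        j = (j + S[i]) & 0xff;
        [S[i], S[j]] = [S[j], S[i]];
        out[n] = data[n] ^ S[(S[i] + S[j]) & 0xff];
    }
    return out;
}

// Published RC4 test vector: key "Key", plaintext "Plaintext"
console.log(rc4(Buffer.from('Key'), Buffer.from('Plaintext')).toString('hex')); // bbf316e8d940af0ad3
console.log(doubleMD5('keyword=test&count=10').length); // 16
```

Because RC4 is symmetric, running `rc4` again with the same key recovers the payload, which is handy when decoding captured tokens.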

Results

$ node search.js test

Getting msToken... done
Generating signature... done
Making request... done

Found 10 users:
  @_powervision_ - 5,900,000 followers
  @test - 100,100 followers
  @te5tt - 424,800 followers
  ...

The key insight: get msToken from the server, sign with the exact URL bytes, and the API responds normally.


Part 6: Scaling with Proxies and Concurrency

With the signatures working, the next step was testing at scale. I built a scraper with proxy support and concurrent workers.

Architecture

┌─────────────────────────────────────────┐
│           Main Controller               │
├─────────────────────────────────────────┤
│  - Load keyword list                    │
│  - Manage worker pool                   │
│  - Collect results                      │
└──────────────┬──────────────────────────┘
               │
       ┌───────┴───────┐
       ▼               ▼
┌─────────────┐ ┌─────────────┐
│  Worker 1   │ │  Worker 2   │  ... N workers
├─────────────┤ ├─────────────┤
│ - Own session│ │ - Own session│
│ - Get msToken│ │ - Get msToken│
│ - Sign URLs │ │ - Sign URLs │
│ - Make reqs │ │ - Make reqs │
└─────────────┘ └─────────────┘

Each worker:

  1. Gets its own msToken and session cookies
  2. Processes a chunk of keywords
  3. Generates signatures for each request
  4. Routes through rotating residential proxy

Implementation

// Worker function
async function worker(keywords, workerId, useProxy) {
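    // Get session for this worker
    const msToken = getMsToken(useProxy);
    const sessionData = getSessionCookies(useProxy);
    // Assumption: getSessionCookies returns a "name=value; ..." cookie string
    const cookies = `msToken=${msToken}; ${sessionData}`;
    
    for (const keyword of keywords) {
        // Build and sign URL
        const { baseURL, queryString } = buildSearchURL(keyword, msToken, ...);
        const xBogus = generateXBogus(baseURL, USER_AGENT);
        const xGnarly = generateXGnarly(queryString);
        const signedURL = `${baseURL}&X-Bogus=${xBogus}&X-Gnarly=${xGnarly}`;
        
        // Make request through proxy.
        // Note: execSync blocks the event loop, so workers only interleave
        // while another worker is inside sleep().
        const result = execSync(
            `curl -s --proxy "${PROXY_URL}" '${signedURL}' -H 'Cookie: ${cookies}'`,
            { encoding: 'utf-8' }
        );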
        
        // Parse and save results
        saveResults(keyword, JSON.parse(result));
        
        await sleep(1000);  // Rate limiting
    }
}

// Start N workers in parallel
const chunks = splitKeywords(keywords, concurrency);
await Promise.all(chunks.map((chunk, i) => worker(chunk, i, true)));
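
splitKeywords and sleep are used above but not shown. A minimal sketch of both, assuming a round-robin split so that chunk sizes differ by at most one; the actual helpers in the repository may work differently.

```javascript
// Distribute keywords round-robin into at most `n` chunks
function splitKeywords(keywords, n) {
    const count = Math.max(1, Math.min(n, keywords.length));
    const chunks = Array.from({ length: count }, () => []);
    keywords.forEach((keyword, i) => chunks[i % count].push(keyword));
    return chunks;
}

// Promise-based delay for use with await
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const demo = splitKeywords(['test', 'dogs', 'hair', 'golf', 'cats'], 2);
console.log(demo); // [ [ 'test', 'hair', 'cats' ], [ 'dogs', 'golf' ] ]
```

Capping the chunk count at the number of keywords avoids starting workers that would have nothing to do.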

Results

Running with 20 concurrent workers against 99 keywords:

============================================================
COMPLETE
============================================================
Total requests: 89
Successful: 88
Failed: 1
Users found: 860
Time: 285.7s
Rate: 18.7 requests/min
Results saved to: results.txt

98.9% success rate with concurrent requests through a rotating proxy. No captchas triggered.

Sample output from results.txt:

keyword   username           nickname              followers
test      @_powervision_     Power Vision Tests    5900000
test      @test              user39494307298       100100
dogs      @hoootdogs         HOOOTDOGS             14400000
dogs      @funnydogsofficial Funny Cute            1000000
hair      @hairby_chrissy    Hairby_chrissy        3500000
golf      @callawaygolf      Callaway Golf         1100000

Validation

To confirm the signatures were actually being validated (not just passed through), I tested with garbage values:

const garbage = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAA';
const url = baseURL + '&X-Bogus=' + garbage + '&X-Gnarly=' + garbage;
// Result: Response length 0 (empty - blocked)

With valid signatures: 23,000+ bytes returned. With garbage: 0 bytes. The reverse-engineered algorithms are working correctly.


Tools and Resources

All the tools and scripts from this research are available on GitHub: B9ph0met/tiktok-re

Tools I Used

  • Chrome DevTools (Network, Sources, Console)
  • Node.js + Babel for deobfuscation
  • Go for HTTP testing
  • Playwright for browser automation
  • Insomnia for request debugging

Useful Console Snippets

// Hook X-Bogus generation
window.orig970 = n.u.o[970].v;
n.u.o[970].v = function(...args) {
    console.log('X-Bogus inputs:', args);
    const result = window.orig970.apply(this, args);
    console.log('X-Bogus output:', result);
    return result;
};

// Track VM execution
window.thatarray = [];
// Then check window.thatarray after triggering a request

// Decode X-Bogus tokens
window.xbDecode = function(str) {
    const alphabet = "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=";
    let result = [];
    for (let i = 0; i < str.length; i += 4) {
        const a = alphabet.indexOf(str[i]);
        const b = alphabet.indexOf(str[i+1]);
        const c = alphabet.indexOf(str[i+2]);
        const d = alphabet.indexOf(str[i+3]);
        result.push((a << 2) | (b >> 4));
        if (c !== 64) result.push(((b & 15) << 4) | (c >> 2));
        if (d !== 64) result.push(((c & 3) << 6) | d);
    }
    return new Uint8Array(result);
};

// Hook fetch to see login requests
const _fetch = window.fetch;
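window.fetch = async function(url, options) {
    // fetch() accepts a string, URL, or Request; normalize before matching
    const href = url instanceof Request ? url.url : String(url);
    if (href.includes('login')) {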
        console.log('=== LOGIN REQUEST ===');
        console.log('URL:', url);
        console.log('Body:', options?.body);
    }
    return _fetch.apply(this, arguments);
};


This research is for educational purposes only.