Reversing TikTok's Captcha Encryption

After reversing TikTok’s web protection and building a JS VM for antibot, I wanted to tackle something more self-contained: their captcha system. TikTok uses BDTuring (ByteDance Turing), which serves several captcha types - slide puzzles, 3D rotation, icon matching. The one I went after is “whirl” - a rotation captcha where you drag a slider to align an inner circle with an outer ring.

TikTok’s whirl captcha on the login page

The plan: reverse the protocol, figure out the encryption, understand how the images work, and map out what it would take to solve these programmatically. The encryption was the interesting part. What the server actually validates goes well beyond the angle.

The Protocol

Open TikTok, trigger a login flow, and the captcha loads via the BDTuring SDK. Two requests matter:

GET verification-sg.tiktok.com/captcha/get fetches the challenge. POST verification-sg.tiktok.com/captcha/verify submits the answer.

Both carry the usual telemetry - device ID, fingerprint token, SDK version, screen dimensions. The interesting part: both the response and submission are wrapped in a field called edata - a base64 blob, not plaintext JSON.

{
  "edata": "AfOFlN/RrFePd1K/GvGrAslmaw2dXgZCFZPTZXCUbEhitHJaZ1HO...",
  "data": {
    "verify_id": "Verify_d55d6b4f-396c-4115-aa87-aa86a8b0cfac"
  }
}

No challenge data visible. Everything goes through edata. So step one was figuring out the encryption.

Breaking the Encryption

I started by hooking JSON.stringify to catch data before it enters the crypto layer:

const origStringify = JSON.stringify;
JSON.stringify = function() {
    const result = origStringify.apply(this, arguments);
    if (typeof result === 'string' && result.includes('"mode":"whirl"')) {
        console.log('PLAINTEXT:', result.substring(0, 500));
        debugger;
    }
    return result;
};

This caught the solve payload - a JSON object with the mouse trajectory (reply), challenge ID, and verify ID. The final x value in the trajectory is the answer. Stepping through the call stack led to TikTok’s VM bytecode interpreter, where I found a ct object:

ct: {key: Uint8Array(32), nonce: Uint8Array(12), reset: f}

32-byte key, 12-byte nonce. I assumed AES-256-GCM and built my first implementation around that.

It wasn’t AES. A Uint32Array proxy caught key material being loaded into the crypto engine, and the state initialization contained [1634760805, 857760878, ...] - read as little-endian ASCII, “expand 32-byte k”. The ChaCha20 constants.
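Reading the constants back takes only a couple of lines: ChaCha20 loads its state words little-endian, so packing them with struct recovers the text. The last two values below are the standard RFC 8439 constants rather than values copied from the capture, and words_to_ascii and looks_like_chacha_state are illustrative helper names.

```python
import struct

# ChaCha20's four state constants (RFC 8439, section 2.3). The first two are
# the values seen in the Uint32Array capture.
CHACHA_CONSTANTS = (1634760805, 857760878, 2036477234, 1797285236)

def words_to_ascii(words):
    """Pack 32-bit words little-endian (as ChaCha20 does) and decode as ASCII."""
    return struct.pack("<%dI" % len(words), *words).decode("ascii")

def looks_like_chacha_state(words):
    """Heuristic: does a captured word array start with the ChaCha20 constants?"""
    return tuple(words[:4]) == CHACHA_CONSTANTS

print(words_to_ascii(CHACHA_CONSTANTS))      # expand 32-byte k
print(words_to_ascii(CHACHA_CONSTANTS[:2]))  # expand 3
```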

The next question was where the key comes from. With the debugger paused one level up from the crypto function, the full data flow was visible:

raw edata bytes: [0x01, key[0], key[1], ..., key[31], nonce[0], ..., nonce[11], ciphertext...]

The key is embedded directly in every message. Byte 0 is a version flag, bytes 1-32 are the key, bytes 33-44 are the nonce, and the rest is ciphertext. No key derivation, no key exchange. The format:

[0x01] [32-byte key] [12-byte nonce] [ciphertext]
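A minimal parser for that layout, written as a sketch. It treats 0x01 as the only valid version byte, which is an assumption based on the messages observed; split_edata and Edata are hypothetical names.

```python
import base64
from typing import NamedTuple

HEADER_LEN = 1 + 32 + 12  # version byte + key + nonce

class Edata(NamedTuple):
    version: int
    key: bytes
    nonce: bytes
    ciphertext: bytes

def split_edata(edata_b64: str) -> Edata:
    """Split an edata blob into its fields, following the observed layout."""
    raw = base64.b64decode(edata_b64)
    if len(raw) < HEADER_LEN:
        raise ValueError(f"edata too short: {len(raw)} bytes")
    if raw[0] != 0x01:
        # 0x01 is the only version seen; other values may use another layout
        raise ValueError(f"unexpected version byte: {raw[0]:#04x}")
    return Edata(raw[0], raw[1:33], raw[33:45], raw[45:])
```

Since ChaCha20 is a stream cipher and no tag is appended, the ciphertext is exactly as long as the plaintext JSON, so each decoded blob is 45 bytes longer than its payload.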

I initially tried ChaCha20_Poly1305 since the JS code had Poly1305 state initialization. MAC check failed every time. Tried raw ChaCha20 without authentication - it worked immediately:

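import base64, json
from Crypto.Cipher import ChaCha20  # pycryptodome

def decrypt(edata_b64):
    raw = base64.b64decode(edata_b64)
    # [0x01 version][32-byte key][12-byte nonce][ciphertext]
    key   = raw[1:33]
    nonce = raw[33:45]
    ct    = raw[45:]
    # Raw ChaCha20 (12-byte nonce, counter starts at 0); no Poly1305 tag to verify
    cipher = ChaCha20.new(key=key, nonce=nonce)
    return json.loads(cipher.decrypt(ct))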

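The Poly1305 tag was either never computed or silently discarded. The clean decrypt also rules out standard AEAD framing: RFC 8439's ChaCha20-Poly1305 spends keystream block 0 on the Poly1305 key and starts encrypting at block counter 1, while pycryptodome's raw ChaCha20 starts at counter 0, and a 16-byte tag on the end would have broken json.loads. What's left is an unauthenticated ChaCha20 stream cipher with the key in the blob.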
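To see the counter difference without relying on a library, here is a dependency-free sketch of the RFC 8439 ChaCha20 block function with an explicit starting counter. It is a slow, non-constant-time teaching version, and chacha20_block / chacha20_xor are hypothetical names. counter=0 corresponds to the pycryptodome call above; counter=1 is where an RFC 8439 AEAD would begin encrypting.

```python
import struct

MASK32 = 0xFFFFFFFF

def _rotl32(v, c):
    return ((v << c) & MASK32) | (v >> (32 - c))

def _quarter_round(s, a, b, c, d):
    # RFC 8439 section 2.1: add, xor, rotate by 16, 12, 8, 7
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)

def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block (IETF layout: 32-bit counter, 96-bit nonce)."""
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]  # "expand 32-byte k"
    state += struct.unpack("<8I", key)
    state.append(counter & MASK32)
    state += struct.unpack("<3I", nonce)
    working = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        # column rounds
        _quarter_round(working, 0, 4, 8, 12)
        _quarter_round(working, 1, 5, 9, 13)
        _quarter_round(working, 2, 6, 10, 14)
        _quarter_round(working, 3, 7, 11, 15)
        # diagonal rounds
        _quarter_round(working, 0, 5, 10, 15)
        _quarter_round(working, 1, 6, 11, 12)
        _quarter_round(working, 2, 7, 8, 13)
        _quarter_round(working, 3, 4, 9, 14)
    return struct.pack("<16I", *((w + s) & MASK32 for w, s in zip(working, state)))

def chacha20_xor(key, nonce, data, counter=0):
    """Encrypt or decrypt (XOR with keystream) starting at block `counter`."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        block = chacha20_block(key, counter + offset // 64, nonce)
        out += bytes(x ^ y for x, y in zip(data[offset:offset + 64], block))
    return bytes(out)
```

Assuming the SDK uses this IETF variant (32-bit counter, 96-bit nonce, as the 12-byte nonce suggests), chacha20_xor(key, nonce, ciphertext) with the key and nonce from the header should produce the same plaintext as the pycryptodome decrypt.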

The Decrypted Challenge

With decryption working, the full challenge structure:

{
  "code": 200,
  "data": {
    "challenges": [{
      "challenge_code": 99996,
      "mode": "whirl",
      "question": {
        "url1": "https://p16-rc-captcha-sg.ibyteimg.com/.../outer.png",
        "url2": "https://p19-rc-captcha-sg.ibyteimg.com/.../inner.png"
      }
    }],
    "cyfreso": 41,
    "verify_id": "Verify_d55d6b4f-..."
  }
}

url1 is the outer ring image, url2 is the inner circle. The cyfreso field tracks retry state - its value shifts after failed attempts (41 → 88 → 11 → 5 in one session), suggesting the server adjusts its acceptance threshold over time.
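Pulling out the fields the solver needs is straightforward once the structure is known. This sketch assumes one challenge per response and the field names shown above; parse_whirl and WhirlChallenge are illustrative names, and cyfreso is treated as optional.

```python
from typing import NamedTuple, Optional

class WhirlChallenge(NamedTuple):
    outer_url: str          # url1: the ring with the hole in the middle
    inner_url: str          # url2: the circle that fits inside it
    verify_id: str
    cyfreso: Optional[int]  # retry-state field; may be absent

def parse_whirl(decrypted: dict) -> WhirlChallenge:
    """Extract solver inputs from a decrypted captcha/get response."""
    data = decrypted["data"]
    challenge = data["challenges"][0]  # assumes a single challenge per response
    if challenge.get("mode") != "whirl":
        raise ValueError(f"not a whirl challenge: {challenge.get('mode')!r}")
    question = challenge["question"]
    return WhirlChallenge(
        outer_url=question["url1"],
        inner_url=question["url2"],
        verify_id=data["verify_id"],
        cyfreso=data.get("cyfreso"),
    )
```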

Encryption in the other direction is the same format. Generate random key and nonce, encrypt the solve payload, prepend the header:

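import base64, json, os
from Crypto.Cipher import ChaCha20  # pycryptodome

def encrypt(data):
    # Fresh random key and nonce per message; both travel in the header
    key = os.urandom(32)
    nonce = os.urandom(12)
    cipher = ChaCha20.new(key=key, nonce=nonce)
    # Compact separators match JSON.stringify's spacing
    ct = cipher.encrypt(json.dumps(data, separators=(',', ':')).encode())
    return base64.b64encode(bytes([1]) + key + nonce + ct).decode()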

Solving the Image

The outer image (347×347) is a donut - a ring of content with a black hole in the center. The inner image (211×211) is a filled circle that fits inside that hole. Rotate the inner to the correct angle and the content continues seamlessly across the boundary.

The whirl captcha - outer ring with inner circle

I tried ORB feature matching first. Got 1 good match out of the minimum 8 needed - the images are too circular and uniform for keypoint detectors.

What worked: sample pixels along a circle at the boundary radius in both images, unwrap them into 1D signals, then cross-correlate to find the rotation offset.

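import math
import numpy as np

def sample_ring(img, cx, cy, radius, n=720):
    # Unwrap the pixels on a circle of the given radius into a 1D signal.
    # n=720 samples gives 0.5° angular resolution.
    pixels = []
    for i in range(n):
        angle = 2 * math.pi * i / n
        x = int(cx + radius * math.cos(angle))
        y = int(cy + radius * math.sin(angle))
        pixels.append(img[y, x].astype(float))  # numpy/OpenCV images index as [row, col]
    return np.array(pixels)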
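Once both rings are unwrapped, the correlation step can be sketched without dependencies: score every circular shift of the inner ring against the outer ring with normalized cross-correlation and take the peak. With n samples, one shift is 360/n degrees (0.5° at n=720). This brute-force version is O(n²) for readability; an FFT-based correlation in numpy would be the faster choice. The function names are hypothetical, and which way the inner image must turn depends on the sampling direction and image axes.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation (Pearson) of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat signal carries no alignment information
    return sum(x * y for x, y in zip(da, db)) / denom

def circular_ncc(outer_ring, inner_ring):
    """Score for each shift k, where the inner ring is moved forward by k samples."""
    outer, inner = list(outer_ring), list(inner_ring)
    if len(outer) != len(inner):
        raise ValueError("rings must be sampled with the same n")
    # inner[-k:] + inner[:-k] moves sample j to position (j + k) % n
    return [ncc(outer, inner[-k:] + inner[:-k]) for k in range(len(inner))]

def best_rotation_degrees(outer_ring, inner_ring):
    """Return (angle in degrees, peak score) for the best-aligned shift."""
    scores = circular_ncc(outer_ring, inner_ring)
    k = max(range(len(scores)), key=scores.__getitem__)
    return k * 360.0 / len(scores), scores[k]
```

For a color image this would run once per channel (and per sampling radius), which is what the averaging step works on.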

The geometry is consistent across images: the outer ring's content and the inner circle's content meet at a radius of ~105px from center. Sampling at several radii near that boundary and averaging the normalized cross-correlation across all color channels and radii makes the match robust.

NCC scores across rotation angles - clear peak at the correct angle
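Averaging across radii and channels amounts to taking the element-wise mean of the per-signal NCC curves before finding the peak. A sketch, assuming every signal was sampled with the same n so the curves line up index for index; the ±3° clustering check is an illustrative sanity test, not necessarily part of the original solver.

```python
def average_scores(score_lists):
    """Element-wise mean of NCC curves, one curve per (radius, channel) signal."""
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all score curves must cover the same number of shifts")
    return [sum(s[i] for s in score_lists) / len(score_lists) for i in range(n)]

def top_angles(scores, k=3):
    """The k best (angle_degrees, score) pairs, best first."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [(i * 360.0 / n, scores[i]) for i in order]

def circular_distance(a, b):
    """Smallest angle between two headings, in degrees (handles 359° vs 1°)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def clustered(angles, tol=3.0):
    """True if every candidate lies within tol degrees of the first (best) one."""
    return all(circular_distance(angles[0], a) <= tol for a in angles)
```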

Testing on captured images consistently produces clear peaks with the top results clustering within ±3 degrees. The final x value maps linearly: target_x = (angle / 360) * 348, where 348 is the drag_width from the protocol.
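The mapping itself is one line; the useful part is seeing how angular error turns into slider pixels. DRAG_WIDTH is hard-coded to the 348 observed in the protocol and may differ elsewhere; angle_to_x and x_to_angle are illustrative names.

```python
DRAG_WIDTH = 348  # drag_width observed in the captcha protocol

def angle_to_x(angle_deg, drag_width=DRAG_WIDTH):
    """target_x = (angle / 360) * drag_width, with the angle wrapped into [0, 360)."""
    return (angle_deg % 360) / 360.0 * drag_width

def x_to_angle(x, drag_width=DRAG_WIDTH):
    """Inverse mapping: slider x back to degrees."""
    return x / drag_width * 360.0
```

At this width a 3° error is 3/360 × 348 = 2.9px, so the ±3° spread stays within about 3px of the target; conversely, 1px of slider travel is about 1.03°.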

Here’s what the solver is doing visually - overlaying the inner circle on the outer ring at different rotation angles, searching for the alignment where the boundary pixels match:

Comparing rotation angles - the solver searches for seamless alignment

What the Server Actually Checks

With the crypto reversed and the image solver producing correct angles, I built an end-to-end pipeline. The server kept returning code: 500, msg: "VerifyFailedErr".

To isolate the problem, I set up a browser intercept using Playwright. The idea: let the real SDK handle all anti-bot token generation (X-Bogus, X-Gnarly, detail, msToken), but hook XMLHttpRequest.send to decrypt the edata, swap in the solver’s trajectory, and re-encrypt - all before the SDK computes its signatures.

Playwright intercepting the captcha - terminal shows the full pipeline running

The hook worked. The browser sent modified requests with valid anti-bot tokens. The server still rejected them.

What became clear:

The angle isn’t the bottleneck. The SDK’s own computed x value was within 1-2px of the solver’s answer. The image analysis is correct.

The trajectory matters. The reply array isn’t just checked for the final value. The server analyzes the full drag path - timing, acceleration, y-axis jitter. Generated trajectories with smooth ease-out curves don’t match the micro-pauses and overshoot-correct patterns of real human drags.

There are signals outside the payload. The SDK collects telemetry throughout the page session - mouse movements before the captcha, timing between page load and interaction. The detail query parameter on the verify request is a large encoded blob that likely carries this behavioral data.

The server tracks state across attempts. The cyfreso counter shifted across my attempts (41 → 88 → 11 → 5), suggesting the server adjusts its acceptance threshold. Getting flagged early makes subsequent attempts harder.
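To make the trajectory point concrete (“timing, acceleration, y-axis jitter”), here is a sketch that reduces a drag path to a few features and compares a cubic ease-out path with a hand-written one that pauses and overshoots before correcting. The (x, y, t_ms) point format, the feature set, and the pause threshold are all assumptions for illustration; the real reply format and whatever the server computes from it weren't recovered here.

```python
import math
from statistics import pstdev

def ease_out_path(target_x, steps=20, step_ms=25.0):
    """Cubic ease-out drag: smooth, no y jitter, no overshoot."""
    return [(target_x * (1 - (1 - i / steps) ** 3), 0.0, i * step_ms)
            for i in range(steps + 1)]

def trajectory_features(points, pause_speed=0.02):
    """Reduce (x, y, t_ms) samples to a few illustrative behavioral features."""
    if len(points) < 3:
        raise ValueError("need at least 3 samples")
    speeds, mids = [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = max(t1 - t0, 1e-9)  # guard against duplicate timestamps
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)  # px per ms
        mids.append((t0 + t1) / 2)
    accels = [(v1 - v0) / max(m1 - m0, 1e-9)
              for v0, v1, m0, m1 in zip(speeds, speeds[1:], mids, mids[1:])]
    xs = [p[0] for p in points]
    return {
        "duration_ms": points[-1][2] - points[0][2],
        "max_speed": max(speeds),
        "max_abs_accel": max(abs(a) for a in accels),
        "y_jitter": pstdev([p[1] for p in points]),
        "pauses": sum(1 for v in speeds if v < pause_speed),  # near-stationary segments
        "overshoot_px": max(xs) - xs[-1],  # how far the drag went past where it ended
    }

smooth = trajectory_features(ease_out_path(348))
corrected = trajectory_features([(0, 0, 0), (100, 2, 100), (200, -2, 200),
                                 (200, -1, 300), (360, 0, 400), (348, 0, 500)])
```

The ease-out path comes out with zero y jitter and zero overshoot, the too-clean profile described above. The pause count is deliberately crude: it also fires on the ease-out's slow tail, so where in the drag a slow segment occurs would matter for anything more serious.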

Defense in Depth

TikTok’s captcha has three layers:

Encryption - ChaCha20 with embedded keys prevents casual inspection. It forces you to reverse the crypto or use the SDK. But the key is in the blob - this is obfuscation, not security.

The visual puzzle - the rotation challenge requires computer vision to solve programmatically. Boundary cross-correlation handles it. Keypoint matching doesn’t work due to circular symmetry.

Behavioral analysis - the real defense. The server validates not just whether the answer is correct, but whether the interaction looks human. Trajectory dynamics, session telemetry, anti-bot signatures from behavioral data. This is why correct angles still get rejected programmatically.

Most captcha write-ups focus on breaking the first two layers. The third is where the actual bot detection lives, and it’s harder to defeat because the signals aren’t contained in the captcha interaction - they come from the entire browsing session.

Takeaways

Hook the boundaries, not the internals. I spent time trying to access objects inside VM closures. What worked was hooking TextEncoder, Uint32Array, and JSON.stringify at the boundary between the VM and browser APIs.

Check your constants. [1634760805, 857760878, ...] immediately identifies ChaCha20 if you recognize the “expand 32-byte k” magic. Would have saved an hour of assuming AES.

Try the dumb thing. The key being in the first 32 bytes of the blob seemed too simple. It was real.

The captcha is the easy part. Breaking encryption and solving images are tractable problems. The behavioral fingerprinting that wraps the system is where the real investment is. If you’re evaluating captcha security, look at what happens around the puzzle, not just at the puzzle itself.

Tools used: Chrome DevTools, Python, Playwright, pycryptodome, OpenCV, numpy

SDK version: captcha-ttp.0a4bb10f.js (h5_sdk_version 2.34.12)