I’ve been poking at anti-bot systems for a while now. After spending way too long on TikTok’s VM, I figured OpenAI’s Sentinel would be a nice change of pace. Spoiler: it was. Their approach is significantly simpler, though still interesting.
Finding the SDK
Open ChatGPT, pop open DevTools, look at the network tab. You’ll see requests to /backend-api/sentinel/ carrying tokens that look like this:
{
"p": "gAAAAAB...",
"t": "MjM=",
"c": "gAAAAAC...~S"
}
The SDK lives at https://cdn.oaistatic.com/sentinel/. There’s a loader script and the actual SDK. Grab the SDK, run it through beautifier.io, and you get something like this:
var SentinelSDK = function(t) {
"use strict";
const n = o,
e = function() {
let t = !0;
return function(n, e) {
const r = t ? function() {
if (e) {
const t = e[o(1)](n, arguments);
return e = null, t
}
} : function() {};
Classic string array obfuscation. Nothing fancy.
The String Arrays
Scrolling through the beautified code, I counted 9 different string arrays. Each one has the same pattern:
function c() {
  const t = ["toLowerCase", "apply", "toString", "constructor", "slice", "push", "(((.+)+)+)+$", "search"];
  return (c = function() {
    return t
  })()
}
The lookup functions subtract from the index (usually t -= 0, which does nothing, but sometimes there’s an actual offset). The obfuscator also creates aliases everywhere:
const n = o; // n is now an alias for o
const C = _; // C is now an alias for _
This is where things got annoying.
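To make the lookup-plus-alias pattern above concrete, here is a minimal sketch. The array contents are copied from c() above; the names `strings`, `lookup`, `OFFSET`, and `alias` are made up for illustration, and the offset of 0 is just the common case mentioned earlier, not something read out of a specific lookup function.

```javascript
// Self-replacing string array, same shape as c() in the SDK.
function strings() {
  const table = ["toLowerCase", "apply", "toString", "constructor", "slice", "push", "(((.+)+)+)+$", "search"];
  // After the first call, the function replaces itself with a cheap getter.
  strings = function () {
    return table;
  };
  return strings();
}

// Hypothetical offset; usually 0, occasionally a real shift.
const OFFSET = 0;

// Lookup function: subtract the offset, then index into the array.
function lookup(index) {
  index -= OFFSET;
  return strings()[index];
}

// The obfuscator sprinkles aliases like this through every scope.
const alias = lookup;

console.log(alias(1)); // "apply"
console.log(alias(5)); // "push"
```

Every call site goes through an alias rather than the lookup itself, so a deobfuscator has to know which lookup each alias points at before it can inline anything.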
Writing the Deobfuscator
I started with a simple Babel transform. Map each lookup function to its array, replace calls with the resolved string:
const stringArrays = {
c: ["toLowerCase", "apply", "toString", ...],
a: ["crypto.getRandomValues() not supported...", ...],
// ... 7 more arrays
};
const lookupToArray = {
'o': 'c', // o() looks up from c()
'd': 'a', // d() looks up from a()
'_': 'M',
// etc
};
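The resolution step those two tables feed can be sketched without Babel at all. This is only an illustration: `resolveCall` and `lookupOffsets` are hypothetical names, only the first array is filled in, and the per-lookup offset of 0 is an assumption.

```javascript
// Truncated tables: only the array shown earlier is included.
const arrays = new Map([
  ["c", ["toLowerCase", "apply", "toString", "constructor", "slice", "push", "(((.+)+)+)+$", "search"]],
]);

// Which lookup function reads from which array.
const lookups = new Map([["o", "c"]]);

// Hypothetical per-lookup index offsets (assumed to be 0 here).
const lookupOffsets = new Map([["o", 0]]);

// Returns the resolved string, or undefined when funcName is not a known lookup.
function resolveCall(funcName, index) {
  const arrayName = lookups.get(funcName);
  if (arrayName === undefined) return undefined;
  const offset = lookupOffsets.get(funcName) ?? 0;
  return arrays.get(arrayName)[index - offset];
}

console.log(resolveCall("o", 1)); // "apply"
console.log(resolveCall("n", 5)); // undefined: "n" is not in the lookup table
```

Maps are used instead of plain objects so that a function name like `constructor` can never accidentally hit something on `Object.prototype`.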
First run: 40 replacements. The code still had calls like n(5) and t(7) everywhere. The aliases weren’t being resolved.
My first fix was dumb. I just hardcoded known aliases:
const knownAliases = {
'C': '_',
'Lt': 'sn',
'n': 'o',
'p': 'w',
};
This broke everything. Turns out n and t are used as parameter names in like 50 different scopes. When a function has const t = _; at the top, that t should resolve to _. But when another function uses t as a loop variable, it shouldn’t.
The fix was letting Babel’s scope tracking do the work:
// Note: `t` below is @babel/types, not one of the obfuscated identifiers.
CallExpression(path) {
  const callee = path.node.callee;
  if (t.isIdentifier(callee)) {
    let funcName = callee.name;
    let array = resolveToArray(funcName);
    // Check if there's a local binding that aliases a lookup
    if (!array) {
      const binding = path.scope.getBinding(funcName);
      if (binding && binding.path.isVariableDeclarator()) {
        const init = binding.path.node.init;
        if (t.isIdentifier(init)) {
          array = resolveToArray(init.name);
        }
      }
    }
    // ... replace if found
  }
}
Second run: 377 replacements. Now we’re talking.
Before:
i[n(5)]((t + 256)[n(2)](16)[n(4)](1));
After:
i["push"]((t + 256)["toString"](16)["slice"](1));
What’s Actually in There
With the strings resolved, the SDK structure becomes clear. There’s a class that handles everything:
class O {
  answers = new Map;
  maxAttempts = 500000;
  errorPrefix = "wQ8Lk5FbGpA2NcR9dShT6gYjU7VxZ4D";
  async getEnforcementToken(t, n) { ... }
  async getRequirementsToken() { ... }
  getConfig() { ... }
  _runCheck(t, n, e, r, o) { ... }
}
The Fingerprint (P field)
getConfig() builds an array of 18 values:
return [
  screen?.width + screen?.height,
  "" + new Date(),
  performance?.memory?.jsHeapSizeLimit,
  Math?.random(), // placeholder for attempt number
  navigator.userAgent,
  // random script src from the page
  // build identifier
  navigator.language,
  navigator.languages?.join(","),
  // placeholder for elapsed time
  // random navigator property
  // random document key
  // random window key
  performance.now(),
  this.sid, // UUID
  // URL search params
  navigator.hardwareConcurrency,
  performance.timeOrigin
];
Nothing sophisticated. No canvas fingerprinting, no WebGL hashes. Just basic browser properties. Some fields are randomized on each call (zero-based indices 5, 10, 11, and 12), which explains why the P field varies between requests.
The Proof of Work (C field)
The PoW uses FNV-1a followed by an extra mixing step (the constants match MurmurHash3’s 32-bit finalizer):
function fnv1a(str) {
  let hash = 2166136261;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  // Additional mixing
  hash ^= hash >>> 16;
  hash = Math.imul(hash, 2246822507) >>> 0;
  hash ^= hash >>> 13;
  hash = Math.imul(hash, 3266489909) >>> 0;
  hash ^= hash >>> 16;
  return (hash >>> 0).toString(16).padStart(8, "0");
}
The solver concatenates the seed with the base64-encoded config, hashes it, and checks if the result meets the difficulty threshold. Standard hashcash pattern. They allow up to 500k attempts before giving up.
The ~S suffix on solutions indicates success. Failed attempts use the error prefix instead.
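Here is a rough sketch of what that solver loop could look like. Several details are my assumptions rather than something lifted from the SDK: the config is serialized as JSON before base64 encoding, "meets the difficulty threshold" is modeled as a prefix comparison on the hex hash, and the returned solution is just the encoded config plus ~S. The names `encodeConfig`, `meetsDifficulty`, and `solve` are made up. Slots 3 and 9 are the attempt-number and elapsed-time placeholders from getConfig().

```javascript
// FNV-1a plus the extra mixing steps, as shown above.
function fnv1a(str) {
  let hash = 2166136261;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  hash ^= hash >>> 16;
  hash = Math.imul(hash, 2246822507) >>> 0;
  hash ^= hash >>> 13;
  hash = Math.imul(hash, 3266489909) >>> 0;
  hash ^= hash >>> 16;
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Assumed serialization: JSON, then UTF-8 bytes, then base64.
function encodeConfig(config) {
  const bytes = new TextEncoder().encode(JSON.stringify(config));
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Assumed difficulty rule: the leading hex digits of the hash must be <= the difficulty string.
function meetsDifficulty(hashHex, difficulty) {
  return hashHex.slice(0, difficulty.length) <= difficulty;
}

// Hashcash-style search: bump the attempt counter until the hash is small enough.
function solve(seed, difficulty, config, maxAttempts = 500000) {
  const attemptConfig = [...config]; // work on a copy, leave the caller's array alone
  const start = Date.now();
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    attemptConfig[3] = attempt; // attempt-number slot
    attemptConfig[9] = Date.now() - start; // elapsed-time slot
    const encoded = encodeConfig(attemptConfig);
    if (meetsDifficulty(fnv1a(seed + encoded), difficulty)) {
      return { solution: encoded + "~S", attempts: attempt + 1 };
    }
  }
  // The real SDK emits an error-prefixed token here; this sketch just gives up.
  return null;
}

const demoConfig = [3000, "" + new Date(), null, 0, "Mozilla/5.0", null, null, "en-US", "en-US,en", 0];
console.log(solve("0.42", "0fffff", demoConfig));
```

With a difficulty like "0fffff" roughly one hash in sixteen passes, so the demo returns almost immediately. Shorter or smaller difficulty strings push the expected attempt count up quickly.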
The VM (T field)
This is the interesting part. Sentinel has a Map-based VM for executing server-sent bytecode. The turnstile challenge sends encrypted instructions, and the client runs them.
The VM initialization looks like this:
gt.set(I, bt); // 0: init
gt.set(N, (n, e) => gt.set(n, xor(gt.get(n), gt.get(e)))); // 1: xor
gt.set(q, (n, e) => gt.set(n, e)); // 2: set value
gt.set(D, (t) => resolve(btoa(t))); // 3: success
gt.set($, (t) => reject(btoa(t))); // 4: error
// ... 28 more opcodes
Bytecode comes in as base64, gets XOR’d with the PoW solution as the key, then parsed as JSON. Each instruction is [opcode, ...args]. The executor shifts instructions off the front of a queue and runs until it’s empty:
function vt() {
  while (gt.get(B).length > 0) {
    const [opcode, ...args] = gt.get(B).shift();
    gt.get(opcode)(...args);
  }
}
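The decode step that feeds this loop (base64, then XOR, then JSON) can be sketched like this. I'm assuming a repeating-key XOR over character codes, which is the usual shape for this kind of thing but not something I'm quoting from the SDK. `xorWithKey`, `decodeBytecode`, and `encodeBytecode` are illustrative names, and the encoder exists only so the example has a payload to decode.

```javascript
// XOR each character with the key, repeating the key as needed (assumed scheme).
function xorWithKey(text, key) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  }
  return out;
}

// base64 -> XOR with key -> JSON array of [opcode, ...args] instructions.
function decodeBytecode(payloadB64, key) {
  return JSON.parse(xorWithKey(atob(payloadB64), key));
}

// Inverse of the above, used here only to fabricate a test payload.
function encodeBytecode(instructions, key) {
  return btoa(xorWithKey(JSON.stringify(instructions), key));
}

const key = "example-pow-solution~S"; // stand-in for a real PoW solution
const payload = encodeBytecode([[2, "a", 5], [3, "done"]], key);
console.log(decodeBytecode(payload, key));
```

XOR is its own inverse, so the same helper handles both directions. The only thing that stops you reading the bytecode is having the right key.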
I mapped out 28 active opcodes. Most are data operations (set, copy, add, array access). A few handle control flow (conditional execution, try/catch). The interesting one is opcode 30, which lets the server define new handlers at runtime.
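To show why a Map-based design makes runtime-defined handlers cheap, here is a toy VM in the same style. Opcodes 1, 2, and 3 follow the handlers shown above. The semantics I give opcode 30 (register a new opcode that enqueues a stored list of instructions) are a guess for illustration only, as are `runProgram`, the `QUEUE` key, and the string XOR helper.

```javascript
// Repeating-key XOR over strings (assumed behavior of the VM's xor helper).
function xorStrings(a, b) {
  let out = "";
  for (let i = 0; i < a.length; i++) {
    out += String.fromCharCode(a.charCodeAt(i) ^ b.charCodeAt(i % b.length));
  }
  return out;
}

function runProgram(program) {
  const QUEUE = "queue"; // hypothetical key holding pending instructions
  const vm = new Map(); // handlers and data share one Map, as in the SDK
  let result;

  vm.set(QUEUE, program.map((instruction) => [...instruction]));
  vm.set(1, (dst, src) => vm.set(dst, xorStrings(vm.get(dst), vm.get(src)))); // xor
  vm.set(2, (dst, value) => vm.set(dst, value)); // set value
  vm.set(3, (value) => { result = btoa(value); }); // success
  // Hypothetical opcode 30: define a new handler that enqueues `body` when called.
  vm.set(30, (newOpcode, body) =>
    vm.set(newOpcode, () => vm.get(QUEUE).unshift(...body.map((ins) => [...ins])))
  );

  // Same executor loop as vt(): shift, dispatch, repeat until empty.
  while (vm.get(QUEUE).length > 0) {
    const [opcode, ...args] = vm.get(QUEUE).shift();
    vm.get(opcode)(...args);
  }
  return { result, vm };
}

const { result, vm } = runProgram([
  [2, "a", "hi"], // a = "hi"
  [2, "k", "\u0001"], // k = single-byte key
  [1, "a", "k"], // a = a XOR k
  [30, 100, [[3, "done"]]], // define opcode 100 at runtime
  [100], // call it, which enqueues the success instruction
]);
console.log(vm.get("a")); // "ih"
console.log(result); // "ZG9uZQ=="
```

Because handlers are just Map entries, adding an instruction is a single set() call. That is what lets the server extend the instruction set from inside the bytecode itself.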
Compared to TikTok
If you’ve read my TikTok VM writeup, Sentinel feels like a warmup exercise.
| | TikTok | Sentinel |
|---|---|---|
| Opcodes | 77 | ~28 |
| Architecture | Stack-based | Map-based |
| Encryption | RC4 | XOR |
| Fingerprinting | Extensive | Basic |
| String obfuscation | Heavy | Moderate |
TikTok’s VM is genuinely complex. Multiple register sets, control flow graphs, the works. Sentinel is straightforward by comparison. The server can still change the bytecode whenever they want, but the instruction set is small enough to fully document in an afternoon.
The Repo
I put together a toolkit with everything:
- Working PoW solver
- Fingerprint generator
- Babel deobfuscator (377 replacements)
- Opcode documentation
github.com/B9ph0met/chatgpt-re
The deobfuscated SDK is in samples/. Run npm run deobfuscate if you want to process a fresh copy.
Takeaways
Sentinel is what I’d call “appropriate security.” It’s not trying to be unbreakable. The fingerprinting is basic, the PoW is standard, the VM is simple. But it raises the bar enough that casual scrapers have to put in effort.
The real defense isn’t in the client code at all. It’s that the bytecode comes from the server. They can update fingerprinting logic, add new opcodes, or change the challenge format without pushing any client updates. By the time you’ve figured out what today’s bytecode does, they might have changed it.
For a chat interface that needs to balance security with user experience, this seems about right.