A deep dive into HUMAN Security’s PerimeterX web sensor - from obfuscated init.js to a complete understanding of the collector protocol, payload encoding, response decoding, and _px3 cookie generation.
Introduction
After spending time reversing Akamai BMP on mobile and Castle’s request tokens, I wanted to look at something on the web side. PerimeterX (now HUMAN Security) is one of the more widely deployed bot management solutions out there - they protect major sportsbooks, e-commerce platforms, and a lot of other high-value targets. Most of the public research on PX is surface-level: people sharing cookie values and basic bypass scripts. I wanted to go deeper and actually understand how the system works end to end.
The goal: reverse engineer the complete lifecycle of a _px3 cookie. How the client collects fingerprints, how it encodes and sends them, how the server responds, and how the cookie gets set. Not just “what does it do” but “how does it do it, and where could it be improved.”
The Target
A PX-protected site loads a script from https://client.px-cloud.net/<APP_ID>/main.min.js. This bootstraps PX by injecting a second script - the real sensor - from a dynamically generated URL. The sensor is init.js, about 800KB of heavily minified JavaScript. One line, 80,000+ characters.
The first thing PX does on page load is collect browser fingerprints, encode them into a payload, and POST them to a collector endpoint. The server evaluates the data and responds with cookies - including _px3, the clearance token that proves you’re human.
https://collector-<APP_ID>.px-cloud.net/api/v2/collector
The APP_ID is static per protected site.
Deobfuscation
Working with a single 80K-character line isn’t practical. I built an 8-pass Babel deobfuscator to make the code readable:
- Constant folding - evaluate static expressions like `3 + 4` → `7`
- Dead code elimination - remove `if (false)` blocks
- Void-to-undefined - replace `void 0` with `undefined`
- Comma expression unwinding - split comma sequences into individual statements
- Ternary expansion - convert `a ? b() : c()` to if/else
- Sequence expression splitting - break `(a(), b(), c())` into separate statements
- Return statement expansion - split `return a(), b` into `a(); return b`
- Variable renaming - give meaningful names where possible
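As an illustration of the simplest pass, constant folding reduces to a bottom-up walk that replaces fully-static binary expressions with their computed value. Here's a toy version over a hand-rolled ESTree-style node - the real pass is a Babel visitor, but the core transform is the same:

```javascript
// Toy constant folder: walk an expression tree bottom-up and collapse
// binary expressions whose operands are both numeric literals.
function fold(node) {
  if (node.type === "BinaryExpression") {
    const left = fold(node.left);
    const right = fold(node.right);
    if (left.type === "NumericLiteral" && right.type === "NumericLiteral") {
      const ops = {
        "+": (a, b) => a + b, "-": (a, b) => a - b,
        "*": (a, b) => a * b, "/": (a, b) => a / b,
      };
      return { type: "NumericLiteral", value: ops[node.operator](left.value, right.value) };
    }
    return { ...node, left, right };
  }
  return node;
}

// `3 + 4` as an ESTree-style fragment
const ast = {
  type: "BinaryExpression", operator: "+",
  left: { type: "NumericLiteral", value: 3 },
  right: { type: "NumericLiteral", value: 4 },
};
console.log(fold(ast)); // { type: "NumericLiteral", value: 7 }
```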
After deobfuscation, the init.js expanded to ~5,200 lines. Readable enough to work with, though PX also uses two string lookup arrays (_l and tf) with rotated indices that need runtime resolution.
String Array Obfuscation
PX uses a common pattern: function calls like _l(246) and tf(239) that index into rotated arrays of strings. The arrays get shifted at load time, so the indices don’t map to the same values between page loads.
Rather than reimplementing the rotation, I resolved them the easy way - set a breakpoint in the debugger and typed the calls directly into the console:
_l(239) // "slice"
_l(246) // "length"
_l(242) // "floor"
_l(251) // "charCodeAt"
_l(247) // "indexOf"
_l(240) // "push"
_l(236) // "sort"
_l(250) // "split"
_l(254) // "substring"
Two separate lookup functions, two separate arrays. Easy to confuse. tf(246) resolves to "sort" but _l(246) resolves to "length". Mixing these up would silently produce wrong code.
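For reference, the underlying pattern looks something like this - a toy reconstruction with made-up strings, rotation count, and base offset, not PX's actual arrays:

```javascript
// Toy version of the rotated-string-array pattern. The real _l/tf arrays,
// rotation amount, and index base differ per build; these values are invented.
const strings = ["push", "sort", "slice", "length", "floor"];

// At load time the obfuscator rotates the array until an anchor check passes.
// Here we hardcode an example rotation of 2.
for (let i = 0; i < 2; i++) strings.push(strings.shift());

// Lookup functions subtract a fixed base from the numeric argument.
const BASE = 236;
const _l = (i) => strings[(i - BASE) % strings.length];

console.log(_l(236)); // "slice"
console.log(_l(240)); // "sort"
```

Because the rotation happens at load time, resolving calls in a live debugger session (as above) sidesteps having to reimplement any of this.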
The Collector Request
Every collector POST carries both URL parameters and a payload body. Here’s the full parameter set:
| Parameter | Value | Source |
|---|---|---|
| payload | Encoded fingerprint blob | Gl() encoder |
| appId | <APP_ID> | Static per site |
| tag | <TAG> | Static per site |
| uuid | UUID v1 | Generated |
| ft | 369 | Static |
| seq | 1, 2, 3… | Incrementing |
| en | NTA | Base64 of “50” (XOR key) |
| cs | Hex string | Server echo |
| pc | Hex string | Custom MD5 |
| sid | UUID | Session ID |
| vid | UUID | Visitor ID |
| cts | UUID | Client timestamp |
| rsc | 2 | Request sequence |
The en parameter is interesting - it’s the base64 encoding of “50”, which is the XOR key used in the payload encoding. PX is literally telling the server which key it used. This makes sense architecturally: the server needs to decode the payload, and the key isn’t a secret from the server, just from anyone sniffing traffic without understanding the protocol.
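The round-trip is easy to confirm in Node (PX sends the value without the trailing base64 padding):

```javascript
// en=NTA decodes straight back to the XOR key "50"
console.log(Buffer.from("NTA", "base64").toString("ascii")); // "50"
// and encoding "50" gives NTA (plus padding, which PX drops)
console.log(Buffer.from("50").toString("base64")); // "NTA="
```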
Payload Encoding: The Gl() Function
The payload goes through a multi-step encoding pipeline. After resolving the string lookups, the Gl function reads like this:
1. events.slice() → copy the events array
2. JSON.stringify(events) → serialize
3. _xorCipher(json, 50) → XOR each byte with key 50
4. _btoa(xored) → base64 encode
5. scatter-insert(base64, sessionKey) → interleave key at pseudorandom positions
The scatter-insert is the clever part. PX generates a session key by XOR-ciphering the base64-encoded session token with key 10. Then it calculates pseudorandom insertion positions using the key’s character codes, and interleaves the key characters into the base64 payload at those positions. The server reverses this by extracting the known key from the known positions.
This isn’t encryption in any meaningful sense - the XOR key is sent as a URL parameter, and the scatter positions are deterministic from the session token. It’s obfuscation designed to prevent casual replay attacks and make the payload opaque to network-level inspection tools. Anyone who takes the time to read Gl can reverse it.
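To make the flow concrete, here's a runnable sketch of the encode pipeline in Node. The XOR keys (50 for the payload, 10 for the session key) come from the reversed code; the scatter position formula is a stand-in of my own, since the exact arithmetic is derived from the session token handling:

```javascript
// Sketch of the Gl() encode pipeline. Keys 50 and 10 match the reversed
// code; the insertion-position formula below is illustrative only.
function xorCipher(str, key) {
  return Array.from(str, (c) => String.fromCharCode(c.charCodeAt(0) ^ key)).join("");
}

function encodePayload(events, sessionToken) {
  // 1-2. copy + serialize
  const json = JSON.stringify(events.slice());
  // 3-4. XOR each byte with 50, then base64 encode
  const b64 = Buffer.from(xorCipher(json, 50), "binary").toString("base64");
  // 5. derive the session key and scatter-insert it into the payload
  const sessionKey = xorCipher(Buffer.from(sessionToken).toString("base64"), 10);
  const out = b64.split("");
  let pos = 0;
  for (const ch of sessionKey) {
    // placeholder position formula: step by the key char's code, mod length
    pos = (pos + ch.charCodeAt(0)) % (out.length + 1);
    out.splice(pos, 0, ch);
  }
  return out.join("");
}

const demoBlob = encodePayload([{ t: "demo" }], "session-token");
console.log(demoBlob.length > 0); // true
```

The server-side decode is the exact mirror: it knows the session key and the position formula, so it pulls the key characters back out, un-base64s, and un-XORs.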
Payload Structure: Three Event Types
Setting a breakpoint on uh (the payload assembly function) and clearing cookies to force a fresh PX session, I captured the raw events array before encoding. PX sends three distinct event types in a single payload:
Event 0: Page Load + Browser Fingerprint
The big one. ~100 fields covering everything PX cares about:
- Browser identity: userAgent, appVersion, platform, vendor, productSub
- Screen: width, height, availWidth, availHeight, colorDepth, devicePixelRatio, orientation
- Languages: language, languages array
- Hardware: hardwareConcurrency, deviceMemory, maxTouchPoints
- Network: connection.effectiveType, RTT, downlink
- ~15 fingerprint hashes: canvas, audio, fonts, WebGL renders
- ~10 CRC32 short hashes: various sub-fingerprints
- ~25 boolean flags: feature detection (localStorage, indexedDB, WebSocket, etc.)
- ~15 bot detection flags: webdriver, phantom, selenium, headless indicators
- Timezone, page URL, timestamps, User Agent Client Hints
One field stood out: "d41d8cd98f00b204e9800998ecf8427e" - the MD5 of an empty string. PX is checking whether certain APIs return empty results. In a normal browser they don’t, so this hash is a bot indicator.
Event 1: WebGL Deep Fingerprint
A dedicated event just for WebGL data:
- GL vendor, renderer, GLSL version
- UNMASKED_VENDOR_WEBGL and UNMASKED_RENDERER_WEBGL (the real GPU info)
- All 38 supported WebGL extensions
- 56 WebGL parameter values (max texture size, viewport dims, precision formats)
- Multiple WebGL render hashes
The WebGL fingerprint is one of the strongest signals PX has. The extension list and parameter values are highly specific to the GPU/driver combination. My M3 Pro reports values that would be impossible on an Intel integrated GPU. Spoofing these requires knowing the exact parameter set for the target hardware.
Event 2: Behavioral Events
Mouse movements captured as individual events:
{
"t": "Bz99fUJTckw=",
"d": {
"event_type": "mousemove",
"x": 785,
"y": 407,
"timestamp": 1770867293337,
"isTrusted": true,
"target": "HEADER>DIV"
}
}
The isTrusted flag is worth noting - browsers set this to true only for events initiated by actual user interaction, not by dispatchEvent(). PX records it but I suspect the server doesn’t rely on it heavily since it’s trivially spoofable in a controlled environment.
Field Key Architecture
This was one of the more interesting discoveries. Every field in the payload uses a base64-encoded key like GU1jT1wjb3k= or CzNxcU1de0o=. I initially assumed these were runtime-randomized - that each page load would generate new keys.
They’re not. Decoding them reveals random binary data:
atob("GU1jT1wjb3k=") // random bytes, not meaningful text
atob("CzNxcU1de0o=") // same - random binary
These keys are generated at build time and hardcoded into init.js as string literals. They stay the same for all visitors until PX pushes a new build. The server maintains a mapping: GU1jT1wjb3k= → “this is the userAgent field.”
This has implications for building a generator. You can extract the current field keys from init.js with a regex:
re := regexp.MustCompile(`a\.d\["([A-Za-z0-9+/=]+)"\]`)
keys := re.FindAllStringSubmatch(initJS, -1)
The key-to-meaning mapping is determined by position in the uh function, which stays consistent across builds even when the key values change. Map the positions once, and you can re-extract keys automatically when PX updates.
The cs Checksum: Not What I Expected
I traced the cs parameter expecting to find a hash computation. The function oc() is simple:
function oc() {
return ni
}
It just returns a stored variable. Tracing where ni gets set, I found three assignment locations, all in response handlers. Specifically, the server’s response includes a command llll00|cu|<checksum_value>, and the client stores that value in ni. The next collector request echoes it back as cs.
The first request has no cs (or a default value). Every subsequent request echoes what the server sent last time. It’s a simple state synchronization mechanism, not a client-side computation. One less thing to implement.
The pc Integrity Hash: Custom MD5
The pc parameter was more involved. It’s computed by ee(ft(t)) where:
- ft(t) is a custom JSON serializer for the events array
- ee() computes an MD5 hash, then applies custom post-processing
The post-processing is where it diverges from standard MD5. After hashing, ee splits the hex output by character code ranges, concatenates the groups differently, then takes every other character:
function ee(t, e) {
    var n = F(t, e); // MD5 hash (hex string)
    // Split chars by code range (_t, Ut, Ht are PX build constants)
    var r = "";
    for (var i = 0; i < n.length; i++) {
        var a = n.charCodeAt(i);
        r += a >= _t && a <= Ut ? n[i] : a % Ht;
    }
    // Take every other character
    for (var s = "", o = 0; o < r.length; o += 2)
        s += r[o];
    return s
}
I verified this in the console:
ee("test") → "0842338224290718"
ee("test", "key") → "1240663937712921"
Standard MD5 of “test” with every-other-char would give "086c42d7cd4822bf" - different. The custom post-processing means you can’t use a standard MD5 library; you need to port PX’s specific implementation.
Decoding the Server Response
This was the most satisfying part to crack. The collector responds with:
{"do": null, "ob": "PT09PWFhLTIkLy8vL2E9PT09YWE9LQ4..."}
do is for directives (captcha challenges, etc.). ob is the obfuscated response blob. Tracing through the response handler function _y:
var s = _atob(ob); // base64 decode
var f = sf(tag); // derive key from tag
var commands = _xorCipher(s, parseInt(f, 10) % 128) // XOR with derived key
.split("~~~~"); // split into command array
The full decode pipeline:
ob → base64 decode → XOR(sf(tag) % 128) → split("~~~~") → command array
Each byte of the base64-decoded blob gets XORed with the derived key to produce readable text.
Testing it against a captured ob value:
var decoded = Buffer.from(ob, 'base64');
var xored = Buffer.from(decoded.map(b => b ^ derivedKey));
console.log(xored.toString());
l00ll0|_px3|330|f0e0bf2e76e32526a16539666e9afa6f...|true|300
~~~~llll00|cu
~~~~0llll00l|_pxde|330|93ac1c4f7dd641227946db53...|true|300
Three commands, separated by ~~~~, pipe-delimited:
| Opcode | Action |
|---|---|
| l00ll0 | Set _px3 cookie |
| llll00 | Set cu (checksum/uuid) |
| 0llll00l | Set _pxde cookie |
The opcode format uses l and 0 characters - binary-looking but not actual binary. Each cookie command includes: name, TTL (330 minutes = 5.5 hours), value, secure flag, and max-age.
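The decoded stream is trivial to parse. A small sketch - the field names are my own labels for the pipe positions observed in the capture above:

```javascript
// Parse the decoded command stream. Field order follows the captured
// cookie commands: opcode|name|ttl|value|secure|maxAge. Commands with
// fewer fields (like the cu echo) just leave the trailing fields unset.
function parseCommands(decoded) {
  return decoded.split("~~~~").map((cmd) => {
    const [opcode, name, ttl, value, secure, maxAge] = cmd.split("|");
    return {
      opcode, name, value,
      ttl: Number(ttl),
      secure: secure === "true",
      maxAge: Number(maxAge),
    };
  });
}

const sample = "l00ll0|_px3|330|f0e0bf2e|true|300~~~~llll00|cu";
const cmds = parseCommands(sample);
console.log(cmds[0].name);   // "_px3"
console.log(cmds[1].opcode); // "llll00"
```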
One thing I noticed: _px3 doesn’t come back on every collector response. The first request typically returns _pxde and cu only. The _px3 clearance cookie comes on subsequent requests after PX has collected enough fingerprint and behavioral data. The server is building a risk profile across multiple requests before issuing clearance.
The Complete Round-Trip
Putting it all together:
1. Fetch init.js → extract field keys, config
2. Build fingerprint payload (3 event types, 100+ fields)
3. Encode: JSON → XOR(50) → base64 → scatter-insert session key
4. Compute cs (echo server's previous value) + pc (custom MD5)
5. POST to collector endpoint with all parameters
6. Parse response: JSON → extract ob
7. Decode ob: base64 → XOR(sf(tag) % 128) → split("~~~~")
8. Parse pipe-delimited commands → extract _px3 cookie value
9. Use _px3 in subsequent requests to the protected site
Where PX Could Be Strengthened
PX’s design puts the intelligence server-side and keeps the client lightweight. That’s a solid architecture. But it means everything on the client is static per build: the field keys, the XOR keys, the encoding logic, the response format. Reverse it once, and you’re set until the next deploy.
The core issue is that the security boundary is the build, not the session. A few changes would shift that.
VM Obfuscation
The biggest gap. Eight Babel passes and the code is readable. You can step through Gl, uh, and the response decoder in DevTools like any other JavaScript. A bytecode VM over just those critical functions would change the economics from “one afternoon of deobfuscation” to “re-trace the VM every build.” The sensor already runs async after page load, so the performance cost on the critical path would be minimal.
Session-Scoped Secrets
PX already does multiple round trips before issuing _px3. That existing flow could carry per-session keys without adding latency.
Field key mappings could come from the server’s first response instead of being hardcoded in init.js. The script stays cacheable, but the payload structure becomes unique per visitor. Same idea for the response decode key: send it per-session instead of deriving it from the tag. Captured traffic from one session becomes useless for replaying another.
The en parameter is the simplest example. en=NTA tells you the XOR key is 50. Deriving it server-side costs nothing.
Script Integrity and Timing as Scoring Signals
Rather than hard-blocking on instrumentation detection, PX could feed it into the existing scoring model. Hash the sensor at runtime and send it as a payload field. A deobfuscated or patched script produces a wrong hash, which bumps the risk score without creating an arms race around hard blocks.
Similarly, timing the execution of key functions catches breakpoints and hooks cheaply. Normal execution of Gl takes single-digit milliseconds. Stepping through it in DevTools takes minutes. That delta is an easy signal to score.
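A timing probe like this is only a few lines - the function names here are illustrative, not PX's:

```javascript
// Illustrative timing probe: wrap a critical function and record how long
// each call takes. A run paused in a debugger shows up as a huge delta,
// which the sensor could ship as one more scoring field.
function timed(fn) {
  return function (...args) {
    const start = Date.now();
    const result = fn.apply(this, args);
    const elapsedMs = Date.now() - start;
    return { result, elapsedMs };
  };
}

const probe = timed((n) => n * 2);
const { result, elapsedMs } = probe(21);
console.log(result, elapsedMs < 1000); // 42 true
```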
Both of these fit PX’s architecture naturally since the server already evaluates dozens of signals. These are just two more.
The Bigger Picture
PX’s strength is the server-side model: thorough fingerprinting, behavioral tracking, multi-request scoring. The improvements above protect that model. A VM raises the cost of static analysis. Session-scoped secrets make reversed knowledge perishable. Integrity and timing signals catch active instrumentation. Together they move the problem from “reverse the script once” to “maintain a live, per-session pipeline that evades detection.” That’s a much harder problem to scale.
Tools used: Chrome DevTools, Babel, Node.js
Content on this site is licensed CC BY-NC-SA 4.0