Building a Multi-Layer Bot Detection System

A learning project to understand how anti-bot systems work from the defender’s side. Flask, vanilla JS, no frameworks.

Why Rebuild

I had an old project called login-defender sitting on my GitHub. It worked, but it was bloated. Way too much code for what it actually did. I wanted something leaner that demonstrated understanding without the cruft.

The goal: build a multi-layer bot detection system for login forms. Each layer adds points to a score. Hit the threshold, get blocked. Simple concept, but the implementation touches a lot of different areas: API integration, client-side fingerprinting, cryptography, behavioral analysis.

Flask for the backend since I wanted to learn it. Vanilla JS for the client-side stuff. No frameworks, no unnecessary dependencies.

The Architecture

The core idea is a scoring engine. Instead of hard-blocking on any single signal, everything contributes points:

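IP fraud score:     0-100 (from API)
Geo block:          100 (if blocked country)
Headless signals:   varies (webdriver=100, no plugins=30, no languages=20)
Engine checks:      varies (patched native function=30, Puppeteer marker=50, no languages=20, missing chrome.runtime=10)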

Total hits 60+? Blocked before we even look at the password.

I wrapped everything in a BotDetector class:

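class BotDetector:
    def __init__(self, ip_address):
        self.ip_address = ip_address
        self.signals = {}        # signal name -> points
        self.totalscore = 0
        self.blocked = False
        self.country = None

    def check_ip(self):
        # API call, sets self.signals["ip"] and self.signals["geo"] (full version below)
        ...

    def headless_score(self, headless_score):
        self.signals["headless"] = headless_score

    def is_bot(self, threshold=60):
        self.totalscore = sum(self.signals.values())
        self.blocked = self.totalscore >= threshold
        return self.blocked

    def get_summary(self):
        # Minimal version of the summary the /login route logs; call is_bot() first
        return {"ip": self.ip_address, "signals": self.signals,
                "total": self.totalscore, "blocked": self.blocked}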

Each check adds to self.signals. At the end, sum them up and compare to threshold. Easy to extend - want to add a new check? Just add another key to the dict.
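To make the "add another key" idea concrete: checkEngine() computes a score in the browser, but the login route never reads it. Below is a trimmed-down, self-contained copy of the detector with a hypothetical add_signal() helper (not part of the project) showing an engine score joining the total. It also clamps negative values to 0, since scores like headless_score come from the client and a forged negative number would otherwise subtract from the IP and geo points.

```python
class BotDetector:
    """Trimmed-down detector: every check writes points into self.signals."""

    def __init__(self, ip_address):
        self.ip_address = ip_address
        self.signals = {}
        self.totalscore = 0

    def add_signal(self, name, points):
        # Hypothetical generic helper. Clamp at 0: client-supplied values
        # could be negative and cancel out other signals.
        self.signals[name] = max(0, int(points))

    def is_bot(self, threshold=60):
        self.totalscore = sum(self.signals.values())
        return self.totalscore >= threshold


detector = BotDetector("203.0.113.7")
detector.add_signal("ip", 25)        # IPQS fraud score
detector.add_signal("headless", 30)  # e.g. zero plugins
print(detector.is_bot())             # False: 25 + 30 = 55 < 60
detector.add_signal("engine", 10)    # e.g. chrome.runtime missing
print(detector.is_bot(), detector.totalscore)  # True 65
```

Neither signal alone gets near the threshold; together with a third weak signal they cross it, which is the whole point of additive scoring.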

IP Reputation

First layer. Before we even look at client-side signals, check if the IP is known bad.

Signed up for IPQualityScore. Free tier gives 5000 lookups/month, good enough for a demo. Their API returns a fraud score 0-100 plus location data.

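import requests

# IPQS_API_KEY, WITHHELD_IPS (local/dev addresses) and BLOCKED_COUNTRIES
# are module-level config values
def check_ip(self):
    # Local/dev addresses skip the lookup entirely
    if self.ip_address in WITHHELD_IPS:
        self.signals["ip"] = 0
        self.signals["geo"] = 0
        self.country = "LOCAL"
        return

    try:
        url = f"https://ipqualityscore.com/api/json/ip/{IPQS_API_KEY}/{self.ip_address}"
        # Without a timeout, a slow API response would hang the login request
        response = requests.get(url, timeout=5)
        data = response.json()

        self.signals["ip"] = data.get('fraud_score', 0)
        self.country = data.get('country_code', 'UNKNOWN')

        if self.country in BLOCKED_COUNTRIES:
            self.signals["geo"] = 100
        else:
            self.signals["geo"] = 0

    except Exception as e:
        # Fail open: if the lookup breaks, this layer adds 0 points.
        # Log only the exception type; requests error messages can include
        # the request URL, which contains the API key.
        print(f"Error checking IP: {type(e).__name__}")
        self.signals["ip"] = 0
        self.signals["geo"] = 0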

Had to whitelist localhost IPs for development. Otherwise I’d block myself every time.
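The contents of WITHHELD_IPS aren't shown. As an alternative to a hardcoded list, here is a sketch using the standard library's ipaddress module to recognise loopback, private and link-local addresses (a swapped-in approach, not necessarily what the project does). One caveat: behind a reverse proxy, request.remote_addr is usually the proxy's private address, so a check like this would exempt every request unless the real client IP is recovered first.

```python
import ipaddress

def is_local_address(ip_string):
    """True for loopback, private (RFC 1918 / IPv6 ULA) and link-local addresses.

    These never have meaningful reputation data, so there's no point
    spending an IPQS lookup on them.
    """
    try:
        addr = ipaddress.ip_address(ip_string)
    except ValueError:
        # Not a valid IP at all (e.g. a garbled header): treat as not local
        return False
    return addr.is_loopback or addr.is_private or addr.is_link_local

print(is_local_address("127.0.0.1"))     # True
print(is_local_address("::1"))           # True
print(is_local_address("192.168.1.20"))  # True
print(is_local_address("8.8.8.8"))       # False
```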

Headless Detection

Client-side checks for automation signatures. The basic stuff:

window.addEventListener('DOMContentLoaded', function() {
    let isWebdriver = navigator.webdriver === true;
    let pluginCount = navigator.plugins.length;
    let hasLanguages = navigator.languages && navigator.languages.length > 0;
    
    let headlessScore = 0;
    if (isWebdriver) headlessScore += 100;
    if (pluginCount === 0) headlessScore += 30;
    if (!hasLanguages) headlessScore += 20;
    
    document.getElementById('headless_score').value = headlessScore;
});

navigator.webdriver is the obvious one. Chrome sets this to true when controlled by automation. Easy to spoof though.

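Plugin count is better. Real browsers have plugins (PDF viewer, etc.). Older headless Chrome reported zero; newer headless modes may report the same built-in PDF plugins as regular Chrome, so this signal is weaker than it used to be.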

Languages is a sanity check. Real browsers have language preferences set.

The combined score goes in a hidden form field (headless_score), sent with the login POST.
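The hidden field carries a score the browser already computed, so a script can simply POST headless_score=0. A variant is to send the raw observations and apply the weights on the server. This is a sketch; the field names webdriver, plugin_count and has_languages are hypothetical. The raw values are still client-supplied and can be forged, but the weights stay server-side and a bare POST that never ran the page's JavaScript scores as suspicious instead of clean.

```python
WEIGHTS = {"webdriver": 100, "no_plugins": 30, "no_languages": 20}

def headless_score_from_form(form):
    """Score raw client observations server-side.

    `form` is any mapping of strings, e.g. Flask's request.form.
    Field names are hypothetical. Missing fields count as suspicious,
    since a real browser running the page script always sends them.
    """
    score = 0
    if form.get("webdriver", "true") != "false":
        score += WEIGHTS["webdriver"]
    try:
        plugin_count = int(form.get("plugin_count", "0"))
    except ValueError:
        plugin_count = 0  # garbage counts the same as "no plugins"
    if plugin_count <= 0:
        score += WEIGHTS["no_plugins"]
    if form.get("has_languages", "false") != "true":
        score += WEIGHTS["no_languages"]
    return score

print(headless_score_from_form(
    {"webdriver": "false", "plugin_count": "5", "has_languages": "true"}))  # 0
print(headless_score_from_form({}))  # 150: no JavaScript ran at all
```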

Engine-Level Checks

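The basic checks above are trivial to spoof. Redefine navigator.webdriver with Object.defineProperty in your Puppeteer script (a plain assignment doesn't stick, since it's a read-only getter), or launch Chrome with --disable-blink-features=AutomationControlled. Done.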

Engine-level checks are harder to fake because they test how the browser actually works, not just what properties it claims to have.

function checkEngine() {
    let score = 0;
    
    // Native function check
    let alertStr = Function.prototype.toString.call(window.alert);
    if (alertStr.indexOf('[native code]') === -1) {
        score += 30;
    }
    
    // Puppeteer injection
    if (window.__puppeteer_evaluation_script__ !== undefined) {
        score += 50;
    }
    
    // Languages array check (not string)
    if (!navigator.languages || navigator.languages.length === 0) {
        score += 20;
    }
    
    // Chrome runtime presence
    if (window.chrome && !window.chrome.runtime) {
        score += 10;
    }
    
    return score;
}

The Function.prototype.toString trick catches when native functions have been modified. Real browsers return function alert() { [native code] }. Some automation frameworks patch these and the output changes.

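__puppeteer_evaluation_script__ is lazy detection that only catches lazy attackers, and it's weaker than it looks: Puppeteer has used that name as the sourceURL for code run through page.evaluate, so it tends to show up in stack traces rather than as a property on window.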

Real Chrome has chrome.runtime. Headless Chrome often doesn’t, or it’s incomplete.

Browser Fingerprinting

Collect device attributes and bundle them into a single identifier.

function generateFingerprint() {
    let data = {
        userAgent: navigator.userAgent,
        platform: navigator.platform,
        screenWidth: screen.width,
        screenHeight: screen.height,
        colorDepth: screen.colorDepth,
        timezoneOffset: new Date().getTimezoneOffset(),
        canvas: simpleHash(getCanvasFingerprint())
    };
    
    return btoa(JSON.stringify(data));
}

Base64 encode the JSON and send it along. Server can decode and log it.

The fingerprint itself isn’t used for scoring in this demo. But you could track fingerprints over time - if the same user suddenly has a completely different fingerprint, that’s suspicious.
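Tracking fingerprints over time could start with decoding on the server and diffing against the last fingerprint stored for that account. A minimal sketch, assuming some per-user storage (not shown, and not part of the project); the stored values are made up. How many changed attributes count as suspicious is a judgment call: a browser update changes userAgent and an external monitor changes the screen size, so one changed field probably shouldn't count for much.

```python
import base64
import json

def decode_fingerprint(encoded):
    """Undo the client's btoa(JSON.stringify(data)).

    btoa treats each character as one byte (Latin-1), so decode the same way.
    """
    return json.loads(base64.b64decode(encoded).decode("latin-1"))

def fingerprint_drift(old, new):
    """Names of attributes that differ between two fingerprint dicts."""
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))

# Hypothetical fingerprint stored at the user's last good login
stored = {"userAgent": "Mozilla/5.0 (X11; Linux x86_64)", "screenWidth": 1920,
          "screenHeight": 1080, "timezoneOffset": 300, "canvas": -475993058}

# Simulate what the browser would send from a different machine
seen = dict(stored, screenWidth=1366, screenHeight=768, canvas=123456789)
incoming = base64.b64encode(json.dumps(seen).encode("latin-1")).decode("ascii")

changed = fingerprint_drift(stored, decode_fingerprint(incoming))
print(changed)  # ['canvas', 'screenHeight', 'screenWidth']
```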

Canvas Fingerprinting

Different devices render graphics slightly differently. GPU, drivers, fonts, anti-aliasing - they all affect the output. Drawing the same thing produces different pixel data on different machines.

function getCanvasFingerprint() {
    let canvas = document.createElement('canvas');
    let ctx = canvas.getContext('2d');
    
    ctx.textBaseline = "top";
    ctx.font = "14px 'Arial'";
    ctx.textBaseline = "alphabetic";
    ctx.fillStyle = "#f60";
    ctx.fillRect(125, 1, 62, 20);
    ctx.fillStyle = "#069";
    ctx.fillText("Cwm fjordbank glyphs vext quiz, 😃", 2, 15);
    ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
    ctx.fillText("Cwm fjordbank glyphs vext quiz, 😃", 4, 17);
    
    return canvas.toDataURL();
}

The data URL is huge though. So hash it down:

function simpleHash(str) {
    let hash = 0, i, chr;
    if (str.length === 0) return hash;
    for (i = 0; i < str.length; i++) {
        chr = str.charCodeAt(i);
        hash = ((hash << 5) - hash) + chr;
        hash |= 0;
    }
    return hash;
}

Now I get a compact integer like -475993058 instead of a data URL that runs to thousands of characters.
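(hash << 5) - hash is just hash * 31, and hash |= 0 truncates to a signed 32-bit integer, so simpleHash is the same recurrence as Java's String.hashCode. A Python port is handy if the server ever needs to recompute or sanity-check the value; this is an illustration, not code from the project. Python ints don't wrap, so the 32-bit truncation has to be done by hand. It's a non-cryptographic 32-bit hash, so collisions across many devices are expected.

```python
def simple_hash(s):
    """Python port of the JS simpleHash: h = h*31 + c, wrapped to signed 32 bits.

    JS charCodeAt() works on UTF-16 code units, so this only matches the
    browser for strings without characters outside the BMP (data URLs are ASCII).
    """
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF   # keep the low 32 bits
    # Reinterpret as a signed 32-bit integer, like `hash |= 0`
    return h - 0x100000000 if h >= 0x80000000 else h

print(simple_hash(""))    # 0
print(simple_hash("ab"))  # 3105
```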

Client-Side Encryption

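Credentials in plaintext in the POST body? Gross. Even with HTTPS, I wanted to add a layer. It's defense in depth rather than a substitute: the public key arrives over the same connection as the page, so without HTTPS anyone who can tamper with the page can just swap in their own key.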

RSA encryption. Server generates a keypair, sends the public key embedded in the page. Client encrypts username/password with it. Only the server can decrypt.

Server side:

from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import serialization, hashes

def generate_keypair():
    private_key = rsa.generate_private_key(
        public_exponent=65537,
        key_size=2048
    )
    return private_key

def get_public_key_pem(private_key):
    public_key = private_key.public_key()
    pem = public_key.public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo
    )
    return pem.decode('utf-8')

def decrypt_data(private_key, encrypted_base64):
    import base64
    encrypted_bytes = base64.b64decode(encrypted_base64)
    
    decrypted = private_key.decrypt(
        encrypted_bytes,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )
    return decrypted.decode('utf-8')
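Each field is encrypted separately, so each one has to fit in a single RSA-OAEP block. RFC 8017 caps the message at k - 2*hLen - 2 bytes, where k is the modulus length in bytes and hLen the hash output length. With the 2048-bit key and SHA-256 used here, that's 190 bytes; a longer username or password (measured in UTF-8 bytes, which is what TextEncoder produces) makes the browser-side encrypt call fail. A quick check, as a sketch:

```python
def oaep_max_plaintext_bytes(key_bits, hash_bytes):
    """Largest message RSA-OAEP can encrypt in one block: k - 2*hLen - 2 (RFC 8017)."""
    k = key_bits // 8          # modulus length in bytes
    return k - 2 * hash_bytes - 2

def fits_in_one_block(text, key_bits=2048, hash_bytes=32):
    # Count UTF-8 bytes, not characters: "é" takes 2 bytes, most emoji take 4
    return len(text.encode("utf-8")) <= oaep_max_plaintext_bytes(key_bits, hash_bytes)

print(oaep_max_plaintext_bytes(2048, 32))  # 190 (2048-bit key, SHA-256)
print(fits_in_one_block("hunter2"))        # True
print(fits_in_one_block("é" * 100))        # False: 200 bytes
```

For anything longer, the usual approach is hybrid encryption (RSA wraps a random symmetric key that encrypts the data), which is beyond this demo.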

Client side uses the Web Crypto API, which browsers only expose in secure contexts (HTTPS or localhost):

async function encryptWithPublicKey(publicKeyPem, data) {
    const pemContents = publicKeyPem
        .replace('-----BEGIN PUBLIC KEY-----', '')
        .replace('-----END PUBLIC KEY-----', '')
        .replace(/\s/g, '');
    
    const binaryKey = Uint8Array.from(atob(pemContents), c => c.charCodeAt(0));
    
    const cryptoKey = await crypto.subtle.importKey(
        'spki',
        binaryKey,
        { name: 'RSA-OAEP', hash: 'SHA-256' },
        false,
        ['encrypt']
    );
    
    const encodedData = new TextEncoder().encode(data);
    const encrypted = await crypto.subtle.encrypt(
        { name: 'RSA-OAEP' },
        cryptoKey,
        encodedData
    );
    
    return btoa(String.fromCharCode(...new Uint8Array(encrypted)));
}

Intercept form submit, encrypt credentials, clear the plaintext fields, submit:

document.querySelector('form').addEventListener('submit', async function(e) {
    e.preventDefault();
    
    const username = document.querySelector('input[name="username"]').value;
    const password = document.querySelector('input[name="password"]').value;
    
    const encryptedUsername = await encryptWithPublicKey(PUBLIC_KEY, username);
    const encryptedPassword = await encryptWithPublicKey(PUBLIC_KEY, password);
    
    document.getElementById('encrypted_username').value = encryptedUsername;
    document.getElementById('encrypted_password').value = encryptedPassword;
    
    // Clear plaintext
    document.querySelector('input[name="username"]').value = '';
    document.querySelector('input[name="password"]').value = '';
    
    this.submit();
});

One gotcha: Flask’s debug mode reloads the server on file changes. This regenerates the keypair. So the page has public key A, server now has private key B, decryption fails. Fixed by disabling the reloader:

app.run(debug=True, port=5001, use_reloader=False)
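Disabling the reloader works, but persisting the key (already on the "What's Missing" list) fixes the root cause: every restart loads the same keypair instead of minting a new one. A sketch using the same cryptography library; the function name and file path are hypothetical, and the key is stored unencrypted, which is fine for a local demo but not for production.

```python
from pathlib import Path

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

def load_or_create_keypair(path):
    """Reuse the RSA private key across restarts so reloads don't break decryption."""
    path = Path(path)
    if path.exists():
        return serialization.load_pem_private_key(path.read_bytes(), password=None)

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),  # demo only
    )
    path.write_bytes(pem)
    path.chmod(0o600)  # owner read/write only (POSIX)
    return private_key
```

At startup, something like private_key = load_or_create_keypair("dev_private_key.pem") would replace the direct generate_keypair() call, and the PEM file should stay out of version control.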

Geolocking

IPQS already returns country code. Just check against a blocklist:

BLOCKED_COUNTRIES = ['CN', 'RU', 'KP', 'IR']

# In check_ip():
self.country = data.get('country_code', 'UNKNOWN')
if self.country in BLOCKED_COUNTRIES:
    self.signals["geo"] = 100
else:
    self.signals["geo"] = 0

No database needed. Just a Python list.

The Scoring System

Everything flows through BotDetector:

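from markupsafe import escape

@main.route('/login', methods=['POST'])
def login():
    # Behind a reverse proxy, remote_addr is the proxy's IP (see Werkzeug's ProxyFix)
    ip_address = request.remote_addr

    # Score the request before touching the credentials
    detector = BotDetector(ip_address)
    detector.check_ip()
    try:
        # Client-supplied: clamp so a forged negative value can't cancel other signals
        headless = max(0, int(request.form.get('headless_score', 0)))
    except ValueError:
        headless = 100  # garbage in the field is itself a bot signal
    detector.headless_score(headless_score=headless)

    blocked = detector.is_bot()
    log_event(detector.get_summary())
    if blocked:
        return "Access Denied: Bot Detected", 403

    try:
        username = decrypt_data(private_key, request.form['encrypted_username'])
        password = decrypt_data(private_key, request.form['encrypted_password'])
    except (KeyError, ValueError):
        # Missing field, bad base64, or OAEP decryption failure
        return "Bad Request", 400

    # A real app would verify the password here; the demo just greets the user.
    # escape(): the username is user input echoed back as HTML.
    return f"Welcome {escape(username)}!", 200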

Logs look like:

2025-12-22 17:47:03 - IP: 127.0.0.1, Signals: {'ip': 0, 'geo': 0, 'headless': 0}, Total: 0 Blocked: False

Easy to see exactly why something got blocked.
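log_event() isn't shown. A sketch that reproduces that line format with the standard logging module, assuming the summary is a dict with ip, signals, total and blocked keys (a hypothetical shape). Swapping the StreamHandler for a FileHandler would keep the history on disk.

```python
import logging

logger = logging.getLogger("botdetector")
_handler = logging.StreamHandler()
# asctime with this datefmt renders as e.g. "2025-12-22 17:47:03"
_handler.setFormatter(logging.Formatter("%(asctime)s - %(message)s",
                                        datefmt="%Y-%m-%d %H:%M:%S"))
logger.addHandler(_handler)
logger.setLevel(logging.INFO)

def format_event(summary):
    """Render a detector summary in the same shape as the log line above."""
    return (f"IP: {summary['ip']}, Signals: {summary['signals']}, "
            f"Total: {summary['total']} Blocked: {summary['blocked']}")

def log_event(summary):
    logger.info(format_event(summary))

log_event({"ip": "127.0.0.1", "signals": {"ip": 0, "geo": 0, "headless": 0},
           "total": 0, "blocked": False})
```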

Lessons Learned

Building the defender side taught me things I didn’t pick up from just reversing.

Additive scoring beats hard blocks. Any single check can be spoofed. The idea is to make attackers work harder on multiple fronts.

Engine-level checks are underrated. navigator.webdriver is trivial to fake. Checking if Function.prototype.toString returns expected output is harder to spoof without actually understanding what’s being checked.

Keep it simple. My original login-defender project was way too complicated. This does the same thing in a fraction of the code.

Debug mode will ruin your crypto. Learned that one the hard way. Server restart = new keypair = decryption fails = confused debugging for 10 minutes.

ID mismatches are silent killers. Had id="generateFingerprint" in HTML and getElementById('fingerprint') in JS. Form submitted fine, field was just empty. No errors. Check your IDs.

What’s Missing

This is a learning project, not production code. Things I’d add for real use:

Rate limiting - Block IPs making too many attempts
CSRF validation - Actually verify the token server-side (currently just generated)
Timing analysis - How long did they spend on the page before submitting?
Mouse movement - Real users move the mouse. Bots often don’t.
Persistent keys - Store RSA keypair instead of regenerating
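Rate limiting is the easiest of these to bolt on. A minimal in-memory sliding-window sketch keyed by IP; the class name and the limit/window values are illustrative. Denied attempts aren't recorded, so hammering the endpoint doesn't extend the lockout. State lives in one process, so multiple workers would need a shared store. In keeping with the additive approach, the result could also feed a "rate" signal instead of hard-blocking.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` attempts per key within `window` seconds."""

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock                   # injectable for testing
        self.attempts = defaultdict(deque)   # key -> timestamps of recent attempts

    def allow(self, key):
        now = self.clock()
        q = self.attempts[key]
        # Drop timestamps that have slid out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

# Usage with a fake clock
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=60, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 61.0
print(limiter.allow("203.0.113.7"))  # True: the old attempts expired
```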

But for demonstrating the concepts? This covers it.

Code: github.com/B9ph0met/antibot-sim