A learning project to understand how anti-bot systems work from the defender’s side. Flask, vanilla JS, no frameworks.
Why Rebuild
I had an old project called login-defender sitting on my GitHub. It worked, but it was bloated. Way too much code for what it actually did. I wanted something leaner that demonstrated understanding without the cruft.
The goal: build a multi-layer bot detection system for login forms. Each layer adds points to a score. Hit the threshold, get blocked. Simple concept, but the implementation touches a lot of different areas - API integration, client-side fingerprinting, cryptography, behavioral analysis.
Flask for the backend since I wanted to learn it. Vanilla JS for the client-side stuff. No frameworks, no unnecessary dependencies.
The Architecture
The core idea is a scoring engine. Instead of hard-blocking on any single signal, everything contributes points:
- IP fraud score: 0-100 (from API)
- Geo block: 100 (if blocked country)
- Headless signals: varies (webdriver=100, no plugins=30, etc.)
- Engine checks: varies
Total hits 60+? Blocked before we even look at the password.
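Here's a minimal sketch of the threshold arithmetic, using made-up signal values rather than real measurements. A moderately bad IP on its own stays under 60. Add one weak headless signal and the total crosses the threshold:

```python
THRESHOLD = 60  # the block threshold described above

def verdict(signals, threshold=THRESHOLD):
    """Sum every signal and compare the total against the threshold."""
    total = sum(signals.values())
    return total, total >= threshold

# Illustrative inputs only
print(verdict({"ip": 40, "geo": 0, "headless": 0}))   # (40, False): suspicious IP alone passes
print(verdict({"ip": 40, "geo": 0, "headless": 30}))  # (70, True): plus zero plugins, blocked
print(verdict({"ip": 0, "geo": 100, "headless": 0}))  # (100, True): geo block alone is enough
```

The 100-point signals (webdriver, geo) act as hard blocks on their own. Everything else has to combine with another signal before it blocks anything.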
I wrapped everything in a BotDetector class:
class BotDetector:
    def __init__(self, ip_address):
        self.ip_address = ip_address
        self.signals = {}
        self.totalscore = 0
        self.country = None
    def check_ip(self):
        # API call, sets self.signals["ip"] and self.signals["geo"]
        ...
    def headless_score(self, headless_score):
        self.signals["headless"] = headless_score
    def is_bot(self, threshold=60):
        self.totalscore = sum(self.signals.values())
        return self.totalscore >= threshold
    # get_summary() (used by the login route) not shown
Each check adds to self.signals. At the end, sum them up and compare to threshold. Easy to extend - want to add a new check? Just add another key to the dict.
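For example, the architecture lists engine checks as their own signal. Here's a sketch of wiring them in through a hypothetical `engine_score` method. It uses a trimmed copy of the class with no IP lookup, so it runs standalone:

```python
class BotDetector:
    """Trimmed copy of the class above, just enough to show the extension."""

    def __init__(self, ip_address):
        self.ip_address = ip_address
        self.signals = {}
        self.totalscore = 0

    def headless_score(self, headless_score):
        self.signals["headless"] = headless_score

    # Hypothetical new check: one method, one new key in self.signals
    def engine_score(self, engine_score):
        self.signals["engine"] = engine_score

    def is_bot(self, threshold=60):
        self.totalscore = sum(self.signals.values())
        return self.totalscore >= threshold

detector = BotDetector("203.0.113.7")
detector.headless_score(30)  # e.g. zero plugins
detector.engine_score(30)    # e.g. a patched alert()
print(detector.is_bot(), detector.totalscore)  # True 60
```

The login route would need one more line, something like `detector.engine_score(int(request.form.get('engine_score', 0)))`. Here `engine_score` is a hypothetical hidden field filled from `checkEngine()`.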
IP Reputation
First layer. Before we even look at client-side signals, check if the IP is known bad.
Signed up for IPQualityScore. Free tier gives 5000 lookups/month, good enough for a demo. Their API returns a fraud score 0-100 plus location data.
def check_ip(self):
    # Local/dev addresses skip the API entirely
    if self.ip_address in WITHHELD_IPS:
        self.signals["ip"] = 0
        self.signals["geo"] = 0
        self.country = "LOCAL"
        return
    try:
        url = f"https://ipqualityscore.com/api/json/ip/{IPQS_API_KEY}/{self.ip_address}"
        # Without a timeout, a slow API would hang the login request indefinitely
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        data = response.json()
        self.signals["ip"] = data.get('fraud_score', 0)
        self.country = data.get('country_code', 'UNKNOWN')
        if self.country in BLOCKED_COUNTRIES:
            self.signals["geo"] = 100
        else:
            self.signals["geo"] = 0
    except Exception as e:
        # Fail open: if the lookup breaks, this layer contributes nothing
        print(f"Error checking IP: {e}")
        self.signals["ip"] = 0
        self.signals["geo"] = 0
Had to whitelist localhost IPs for development. Otherwise I’d block myself every time.
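A hardcoded WITHHELD_IPS list works, but the standard library can do this classification too. Below is a sketch using Python's `ipaddress` module; `is_local` is a hypothetical helper name. This only makes sense in development. Behind a reverse proxy, `request.remote_addr` is the proxy's address, which is often private, so every request would skip the IP check.

```python
import ipaddress

def is_local(ip):
    """True for loopback and private-range addresses that an IP reputation API can't score meaningfully."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a parseable IP; let the normal checks handle it
    return addr.is_loopback or addr.is_private

print(is_local("127.0.0.1"), is_local("::1"), is_local("8.8.8.8"))  # True True False
```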
Headless Detection
Client-side checks for automation signatures. The basic stuff:
window.addEventListener('DOMContentLoaded', function() {
    let isWebdriver = navigator.webdriver === true;
    let pluginCount = navigator.plugins.length;
    let hasLanguages = navigator.languages && navigator.languages.length > 0;
    let headlessScore = 0;
    if (isWebdriver) headlessScore += 100;
    if (pluginCount === 0) headlessScore += 30;
    if (!hasLanguages) headlessScore += 20;
    document.getElementById('headless_score').value = headlessScore;
});
navigator.webdriver is the obvious one. Chrome sets this to true when controlled by automation. Easy to spoof though.
Plugin count is better. Real browsers have plugins (PDF viewer, etc.). Headless Chrome has historically reported zero.
Languages is a sanity check. Real browsers have language preferences set.
The resulting score goes in a hidden form field (headless_score) and is sent with the login POST.
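On the server side, remember that this field contains whatever the client sends. Because signals are summed, a bot that posts headless_score=-100 cancels out a geo block. A non-numeric value makes `int()` raise ValueError, which produces a 500 instead of a clean rejection. Here's a minimal sketch of clamping the value. It assumes a hypothetical `parse_client_score` helper and a cap of 150, the most the three checks above can add up to:

```python
MAX_HEADLESS = 100 + 30 + 20  # webdriver + no plugins + no languages

def parse_client_score(raw, max_score=MAX_HEADLESS):
    """Turn a client-submitted score field into an int in [0, max_score]."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Missing or garbage field: treat it as maximally suspicious
        return max_score
    return max(0, min(value, max_score))

print(parse_client_score("30"), parse_client_score("-100"), parse_client_score("abc"))  # 30 0 150
```

Treating a missing field as maximally suspicious is a design choice, and it also blocks real users who have JavaScript disabled. Clamping doesn't stop a bot from simply sending 0.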
Engine-Level Checks
The basic checks above are trivial to spoof. Just redefine navigator.webdriver in your Puppeteer script (one Object.defineProperty call inside page.evaluateOnNewDocument) so it reports false. Done.
Engine-level checks are harder to fake because they test how the browser actually works, not just what properties it claims to have.
function checkEngine() {
    let score = 0;
    // Native function check
    let alertStr = Function.prototype.toString.call(window.alert);
    if (alertStr.indexOf('[native code]') === -1) {
        score += 30;
    }
    // Puppeteer injection
    if (window.__puppeteer_evaluation_script__ !== undefined) {
        score += 50;
    }
    // Languages array check (not string)
    if (!navigator.languages || navigator.languages.length === 0) {
        score += 20;
    }
    // Chrome runtime presence
    if (window.chrome && !window.chrome.runtime) {
        score += 10;
    }
    return score;
}
The Function.prototype.toString trick catches when native functions have been modified. Real browsers return function alert() { [native code] }. Some automation frameworks patch these and the output changes.
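__puppeteer_evaluation_script__ is a Puppeteer marker. At least in older Puppeteer versions, though, it's the sourceURL attached to the scripts Puppeteer evaluates, so it shows up in stack traces rather than as a window global. Checking window for it is lazy detection and may catch little beyond tooling that leaks it as a global.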
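Real Chrome has chrome.runtime. Headless Chrome often doesn't, or it's incomplete. Because of the window.chrome guard, Firefox and Safari (which have no window.chrome at all) aren't penalized.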
Browser Fingerprinting
Collect device attributes and combine them into a single identifier.
function generateFingerprint() {
    let data = {
        userAgent: navigator.userAgent,
        platform: navigator.platform,
        screenWidth: screen.width,
        screenHeight: screen.height,
        colorDepth: screen.colorDepth,
        timezoneOffset: new Date().getTimezoneOffset(),
        canvas: simpleHash(getCanvasFingerprint())
    };
    return btoa(JSON.stringify(data));
}
Base64 encode the JSON and send it along. Server can decode and log it.
The fingerprint itself isn’t used for scoring in this demo. But you could track fingerprints over time - if the same user suddenly has a completely different fingerprint, that’s suspicious.
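A sketch of what that server-side tracking could start from. It decodes what `generateFingerprint()` sends and lists which fields changed since the last sighting. The helper names are hypothetical, per-user storage isn't shown, and the sample values are made up. One or two changed fields is normal (new monitor, browser update). Platform and canvas changing at once is the suspicious case.

```python
import base64
import json

def decode_fingerprint(encoded):
    """Undo btoa(JSON.stringify(data)) from generateFingerprint()."""
    return json.loads(base64.b64decode(encoded))

def changed_fields(old, new):
    """Sorted names of fingerprint fields that differ between two sightings."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))

# Two logins by the same user (made-up values)
first = {"platform": "MacIntel", "screenWidth": 1440, "screenHeight": 900, "canvas": -475993058}
second = dict(first, screenWidth=2560, screenHeight=1440)  # e.g. plugged into an external monitor
encoded = base64.b64encode(json.dumps(second).encode()).decode()  # what the browser would send
print(changed_fields(first, decode_fingerprint(encoded)))  # ['screenHeight', 'screenWidth']
```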
Canvas Fingerprinting
Different devices render graphics slightly differently. GPU, drivers, fonts, anti-aliasing - they all affect the output. Drawing the same thing produces different pixel data on different machines.
function getCanvasFingerprint() {
    let canvas = document.createElement('canvas');
    let ctx = canvas.getContext('2d');
    ctx.textBaseline = "top";
    ctx.font = "14px 'Arial'";
    ctx.textBaseline = "alphabetic";
    ctx.fillStyle = "#f60";
    ctx.fillRect(125, 1, 62, 20);
    ctx.fillStyle = "#069";
    ctx.fillText("Cwm fjordbank glyphs vext quiz, 😃", 2, 15);
    ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
    ctx.fillText("Cwm fjordbank glyphs vext quiz, 😃", 4, 17);
    return canvas.toDataURL();
}
The data URL is huge though. So hash it down:
function simpleHash(str) {
    let hash = 0, i, chr;
    if (str.length === 0) return hash;
    for (i = 0; i < str.length; i++) {
        chr = str.charCodeAt(i);
        hash = ((hash << 5) - hash) + chr;
        hash |= 0;
    }
    return hash;
}
Now I get a nice integer like -475993058 instead of kilobytes of base64.
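If the server ever needs to recompute or check this hash, a Python port has to reproduce JavaScript's 32-bit wraparound, because Python ints never overflow. It's the same recurrence as Java's String.hashCode (h = h*31 + code unit). A sketch, with `simple_hash` as an illustrative name. It iterates UTF-16 code units so it matches `charCodeAt()` even for non-ASCII input:

```python
def simple_hash(s):
    """Python port of the JS simpleHash(): h = h*31 + charCode, wrapped to signed 32 bits."""
    h = 0
    data = s.encode("utf-16-le")  # charCodeAt() yields UTF-16 code units
    for i in range(0, len(data), 2):
        code_unit = data[i] | (data[i + 1] << 8)
        # (h << 5) - h == h * 31; masking keeps the same low 32 bits JS keeps with |= 0
        h = (h * 31 + code_unit) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed int32

print(simple_hash("hello"))  # 99162322
```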
Client-Side Encryption
Credentials in plaintext in the POST body? Gross. Even with HTTPS, I wanted to add a layer.
RSA encryption. Server generates a keypair, sends the public key embedded in the page. Client encrypts username/password with it. Only the server can decrypt.
Server side:
import base64
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import serialization, hashes
def generate_keypair():
    private_key = rsa.generate_private_key(
        public_exponent=65537,
        key_size=2048
    )
    return private_key
def get_public_key_pem(private_key):
    public_key = private_key.public_key()
    pem = public_key.public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo
    )
    return pem.decode('utf-8')
def decrypt_data(private_key, encrypted_base64):
    encrypted_bytes = base64.b64decode(encrypted_base64)
    decrypted = private_key.decrypt(
        encrypted_bytes,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )
    return decrypted.decode('utf-8')
Client side uses Web Crypto API:
async function encryptWithPublicKey(publicKeyPem, data) {
    const pemContents = publicKeyPem
        .replace('-----BEGIN PUBLIC KEY-----', '')
        .replace('-----END PUBLIC KEY-----', '')
        .replace(/\s/g, '');
    const binaryKey = Uint8Array.from(atob(pemContents), c => c.charCodeAt(0));
    const cryptoKey = await crypto.subtle.importKey(
        'spki',
        binaryKey,
        { name: 'RSA-OAEP', hash: 'SHA-256' },
        false,
        ['encrypt']
    );
    const encodedData = new TextEncoder().encode(data);
    const encrypted = await crypto.subtle.encrypt(
        { name: 'RSA-OAEP' },
        cryptoKey,
        encodedData
    );
    return btoa(String.fromCharCode(...new Uint8Array(encrypted)));
}
Intercept form submit, encrypt credentials, clear the plaintext fields, submit:
document.querySelector('form').addEventListener('submit', async function(e) {
    e.preventDefault();
    const username = document.querySelector('input[name="username"]').value;
    const password = document.querySelector('input[name="password"]').value;
    const encryptedUsername = await encryptWithPublicKey(PUBLIC_KEY, username);
    const encryptedPassword = await encryptWithPublicKey(PUBLIC_KEY, password);
    document.getElementById('encrypted_username').value = encryptedUsername;
    document.getElementById('encrypted_password').value = encryptedPassword;
    // Clear plaintext
    document.querySelector('input[name="username"]').value = '';
    document.querySelector('input[name="password"]').value = '';
    this.submit();
});
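Each field is encrypted as a single RSA-OAEP block, and OAEP caps how much fits in one block at k - 2*hLen - 2 bytes (RFC 8017). With the 2048-bit key and SHA-256 used here, that works out to 190 bytes. A quick sketch of the bound (helper names are illustrative):

```python
def oaep_max_plaintext(key_bits, hash_bytes):
    """Longest message RSA-OAEP fits in one block: k - 2*hLen - 2 bytes (RFC 8017)."""
    k = key_bits // 8  # modulus size in bytes
    return k - 2 * hash_bytes - 2

LIMIT = oaep_max_plaintext(2048, 32)  # 2048-bit key + SHA-256 digest (32 bytes)

def fits(field):
    # TextEncoder produces UTF-8, so the limit counts bytes, not characters
    return len(field.encode("utf-8")) <= LIMIT

print(LIMIT)            # 190
print(fits("hunter2"))  # True
print(fits("ü" * 100))  # False: 100 characters, 200 bytes
```

Usernames and passwords fit easily. Anything larger would need hybrid encryption: encrypt the data with a symmetric key and RSA-encrypt only that key.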
One gotcha: Flask’s debug mode reloads the server on file changes. This regenerates the keypair. So the page has public key A, server now has private key B, decryption fails. Fixed by disabling the reloader:
app.run(debug=True, port=5001, use_reloader=False)
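Disabling the reloader works. The other fix, which is the "Persistent keys" item at the end of this post, is to load the key from disk so every restart reuses it. Below is a sketch with the same cryptography library; `load_or_create_keypair` and the key path are hypothetical. The key is stored unencrypted. That's fine for a local demo, but the file must stay out of version control.

```python
from pathlib import Path

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

def load_or_create_keypair(path):
    """Reuse a PEM private key from disk, generating and saving one on first run."""
    path = Path(path)
    if path.exists():
        return serialization.load_pem_private_key(path.read_bytes(), password=None)
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),  # demo only: unencrypted on disk
    )
    path.write_bytes(pem)
    return private_key

# e.g. private_key = load_or_create_keypair("instance/rsa_key.pem")  (hypothetical path)
```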
Geolocking
IPQS already returns country code. Just check against a blocklist:
BLOCKED_COUNTRIES = ['CN', 'RU', 'KP', 'IR']
# In check_ip():
self.country = data.get('country_code', 'UNKNOWN')
if self.country in BLOCKED_COUNTRIES:
    self.signals["geo"] = 100
else:
    self.signals["geo"] = 0
No database needed. Just a Python list.
The Scoring System
Everything flows through BotDetector:
from flask import request
from markupsafe import escape
# main (the Blueprint), private_key, decrypt_data, BotDetector and log_event are defined elsewhere in the app
@main.route('/login', methods=['POST'])
def login():
    encrypted_username = request.form.get('encrypted_username')
    encrypted_password = request.form.get('encrypted_password')
    username = decrypt_data(private_key, encrypted_username)
    password = decrypt_data(private_key, encrypted_password)
    ip_address = request.remote_addr
    detector = BotDetector(ip_address)
    detector.check_ip()
    detector.headless_score(headless_score=int(request.form.get('headless_score', 0)))
    blocked = detector.is_bot()  # computes totalscore, so call it before get_summary()
    log_event(detector.get_summary())
    if blocked:
        return "Access Denied: Bot Detected", 403
    # escape() so a username like <script> isn't reflected back as live HTML
    return f"Welcome {escape(username)}!", 200
Logs look like:
2025-12-22 17:47:03 - IP: 127.0.0.1, Signals: {'ip': 0, 'geo': 0, 'headless': 0}, Total: 0 Blocked: False
Easy to see exactly why something got blocked.
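get_summary() itself isn't shown above. Here's a hypothetical standalone version that reproduces the log line format. In this sketch the timestamp comes from an assumed logger format string rather than from the summary:

```python
import logging

# Hypothetical logger config: produces the "2025-12-22 17:47:03 - " style prefix
logging.basicConfig(format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)

def get_summary(ip_address, signals, threshold=60):
    """One-line summary: which signals fired, the total, and the verdict."""
    total = sum(signals.values())
    return f"IP: {ip_address}, Signals: {signals}, Total: {total} Blocked: {total >= threshold}"

logging.info(get_summary("127.0.0.1", {"ip": 0, "geo": 0, "headless": 0}))
```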
Lessons Learned
Building the defender side taught me things I didn’t pick up from just reversing.
Additive scoring beats hard blocks. Any single check can be spoofed. The idea is to make attackers work harder on multiple fronts.
Engine-level checks are underrated. navigator.webdriver is trivial to fake. Checking if Function.prototype.toString returns expected output is harder to spoof without actually understanding what’s being checked.
Keep it simple. My original login-defender project was way too complicated. This does the same thing in a fraction of the code.
Debug mode will ruin your crypto. Learned that one the hard way. Server restart = new keypair = decryption fails = confused debugging for 10 minutes.
ID mismatches are silent killers. Had id="generateFingerprint" in HTML and getElementById('fingerprint') in JS. Form submitted fine, field was just empty. No errors. Check your IDs.
What’s Missing
This is a learning project, not production code. Things I’d add for real use:
- Rate limiting - Block IPs making too many attempts
- CSRF validation - Actually verify the token server-side (currently just generated)
- Timing analysis - How long did they spend on the page before submitting?
- Mouse movement - Real users move the mouse. Bots often don’t.
- Persistent keys - Store RSA keypair instead of regenerating
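Rate limiting is the easiest of these to bolt on. Below is a minimal in-memory sliding-window sketch; the names and limits are hypothetical. It only works within a single process and forgets everything on restart. To stay with the scoring idea, a rejection could also become another entry in signals instead of a hard block.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding window: at most max_attempts per window_seconds for each key (e.g. an IP)."""

    def __init__(self, max_attempts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.attempts = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        window = self.attempts[key]
        # Drop timestamps that have slid out of the window
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_attempts:
            return False  # rejected attempts are not recorded
        window.append(now)
        return True

limiter = RateLimiter()
print(limiter.allow("203.0.113.7"))  # True
```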
But for demonstrating the concepts? This covers it.
Code: github.com/B9ph0met/antibot-sim
Content on this site is licensed CC BY-NC-SA 4.0