We pointed Assay at OpenAI, Kubernetes,
Shopify, and Cloudflare.
It found vulnerabilities in all of them.
Verification that doesn't trust the AI it's verifying. Neural claim extraction. Symbolic oracles. Deterministic proof.
RCE via vm.Script sandbox escape
Buffer.from('').constructor.constructor('return process')()

SSRF chain across kubeadm discovery endpoints
// RetrieveValidatedConfigInfo — httpsURL passed with no host validation
client.Get(httpsURL) // user-controlled, full response consumed
Path traversal to root code execution in GCI mounter
path, _ := filepath.Split(os.Args[0]) // attacker controls argv[0]
rootfsPath := filepath.Join(path, rootfs)
// then: exec.Command("chroot", rootfsPath, "/bin/mount", ...)

SSRF + CORS + Open Redirect chain
const remote = requestHeaders.get("X-CF-Remote");
await fetch(switchRemote(url, remote), { ... }) // no assertValidURL() call

OAuth tokens written world-readable
writeFileSync(path.join(configPath), TOML.stringify(config), {
encoding: "utf-8", // missing mode option — defaults to 0o644 via umask
});

Predictable CSP nonce via Math.random fallback
// Falls back to Math.random() when crypto.getRandomValues() throws
return new Uint8Array(16).map(() => (Math.random() * 255) | 0);
Weak OAuth state parameter
const randomString = Math.random().toString(36).substring(2) // Same file correctly uses crypto.getRandomValues() elsewhere
CSRF wildcard origin bypass
// Guard intended to block *.com — but *.com has length 2, bypasses check
if (patternParts.length === 1 && patternParts[0] === '**') return false
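The bypass can be reproduced with a minimal reconstruction of the matcher. Everything around the quoted guard, including the function and variable names, is an assumption for illustration:

```javascript
// Hypothetical reconstruction of an origin matcher containing the guard.
function isAllowedOrigin(origin, pattern) {
  const patternParts = pattern.split('.');
  // The guard only rejects the bare '**' pattern...
  if (patternParts.length === 1 && patternParts[0] === '**') return false;
  // ...so '*.com' splits into ['*', 'com'] (length 2) and slips through,
  // where '*' then matches any hostname label.
  const originParts = new URL(origin).hostname.split('.');
  if (originParts.length !== patternParts.length) return false;
  return patternParts.every((p, i) => p === '*' || p === originParts[i]);
}

console.log(isAllowedOrigin('https://attacker.com', '*.com')); // true: the bypass
```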
npx tryassay assess .

Run it on your code. Not just big targets. Real developers. Real repos.
Drop a repo. We'll scan it free. Join the community →
Human-built code scores 91. Here's what AI platforms score.
4 platforms verified. 21 bugs found. 0 passed.
How it works
A neurosymbolic verification loop. Neural systems extract claims. Symbolic oracles verify them. The architecture that makes AI output trustworthy.
An LLM reads your code and identifies every implicit claim it makes. “This function handles null input.” “This API returns sorted results.” No regex or AST parser can do this — it requires semantic understanding. This is the neural layer.
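One way to picture the neural layer's output is as structured claim records that the oracle can act on. The schema below is purely illustrative, not Assay's actual format:

```javascript
// Hypothetical claim records produced by the extraction pass.
const claims = [
  {
    target: 'parseConfig', // example function name, not from a real repo
    claim: 'handles null input without throwing',
    kind: 'robustness',
  },
  {
    target: 'listUsers',
    claim: 'returns results sorted by id',
    kind: 'postcondition',
  },
];

console.log(claims.length); // 2
```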
A deterministic oracle tests each claim. For code, that means actual test execution in an isolated subprocess. The oracle doesn’t hallucinate. It runs the code and reports what happened. This is the symbolic layer.
When claims fail, the loop feeds oracle evidence back to the LLM for targeted fixes. Neural extraction, symbolic verification, deterministic remediation. Not LLM-as-judge. Not manual review. Structured verification.
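In outline, the loop might look like this. All three callbacks are stand-ins for the neural and symbolic layers, not Assay's actual API, and the round limit is arbitrary:

```javascript
// Hypothetical outer loop: extractClaims and proposeFix stand in for the
// neural layer, runOracle for the symbolic layer.
async function verify(code, { extractClaims, runOracle, proposeFix }) {
  let current = code;
  for (let round = 0; round < 3; round++) {
    const failures = [];
    for (const claim of await extractClaims(current)) {
      const verdict = runOracle(claim); // deterministic: run it, record it
      if (!verdict.passed) failures.push({ claim, evidence: verdict.evidence });
    }
    if (failures.length === 0) return { verified: true, code: current };
    // Feed concrete oracle evidence back for a targeted fix.
    current = await proposeFix(current, failures);
  }
  return { verified: false, code: current };
}
```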
Run it
npx tryassay assess .

uses: gtsbahamas/hallucination-reversing-system/github-action@main

Drop a GitHub URL and we'll run it for you. Join the community →
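In CI, the `uses:` reference above could be wired into a workflow like this. Only the action path comes from this page; the workflow name, trigger, and surrounding steps are assumptions:

```yaml
# Hypothetical workflow sketch; everything except the action's `uses:` line
# is assumed, and the action may take inputs not shown here.
name: assay
on: [pull_request]
jobs:
  assess:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: gtsbahamas/hallucination-reversing-system/github-action@main
```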