HQData / The Vault Bulletin / N. 01 / Limited Release 04 · 2026

High-grade training corpora built by hand.

Datasets that turn general-purpose models into working security practitioners. Every record read, structured, and signed off by a senior operator. Not tutorials. Real reasoning from people who do this for a living.

St. Gallen · Switzerland
[email protected]


§ 01 // the vault

Current volume.

One dataset live. Built for teams training cyber-capable models on real operator reasoning, not tutorial gloss. The founding release ships now in SQLite and JSONL for fine-tuning, evals, and tool-use workflows; additional volumes follow once the first corpus and eval assets are locked.

CTF-ZD-001 · AVAILABLE

Security Reasoning Corpus

Commercially licensable security reasoning corpus for teams training and evaluating cyber-capable models.

CAT.: OFFENSIVE SECURITY
RECORDS: 10k+
FORMAT: SQLite, JSONL
FROM: $1,990


§ 02 // process

How each record earns its place.

Quality is not marketing. Every record clears five stations before it ships. A single failure sends it back to the top.

01 / Source

We pick the well, not the tap.

Known competition sites, public writeup archives, disclosed zero-days, and primary-source repositories. No aggregators, no second-hand paraphrases.

02 / Capture

Raw in, raw preserved.

Originals are stored verbatim alongside structured extractions, so every record has a traceable provenance and can be re-parsed if our schema evolves.

03 / Verify

Read by a person. Every one.

Dedup hashes, spot checks, and a sign-off per record. If a step does not match the claimed outcome, the record is rejected, not softened.
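The dedup step can be pictured with a minimal sketch. It assumes a SHA-256 digest over whitespace-normalised, lower-cased text and a hypothetical `raw` field per record; the actual hashing scheme and field names used for the corpus are not documented here.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """SHA-256 over whitespace-normalised, lower-cased text (assumed scheme)."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def dedup(records: list[dict], field: str = "raw") -> list[dict]:
    """Keep the first record seen for each distinct content hash."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        digest = content_hash(rec[field])  # "raw" is a hypothetical field name
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique
```

An exact-hash pass like this only catches near-verbatim copies; paraphrased duplicates would still need the human read described above.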

04 / Enrich

Structured, not flattened.

Canonical technique taxonomies, tool inventories, decision traces, and quality scores. The signal gets louder; nothing valuable gets stripped.

05 / Package

Formats you actually train on.

Instruct, completion, agent tool-use, and DPO preference pairs. Delivered as JSONL plus the original SQLite source. Stratified splits included.
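As a rough illustration of the packaging step, the sketch below turns one record into DPO-style preference pairs and produces a split stratified by category. Pairing the decision trace (chosen) against each failed hypothesis (rejected) is an assumption made for this example, as are the function names and the prompt template; the shipped files may be built differently.

```python
import json
import random
from collections import defaultdict


def to_dpo_pairs(record: dict) -> list[dict]:
    """One preference pair per failed hypothesis (assumed pairing rule)."""
    prompt = f"[{record['category']}] {record['task']}: {record['vulnerability']}"
    chosen = " ".join(record["decision_trace"])
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": rejected}
        for rejected in record["failed_hypotheses"]
    ]


def stratified_split(records, key="category", eval_fraction=0.2, seed=0):
    """Split each category separately so both sides keep the same mix."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    train, evals = [], []
    for name in sorted(groups):
        group = groups[name][:]
        rng.shuffle(group)
        n_eval = round(len(group) * eval_fraction)
        evals.extend(group[:n_eval])
        train.extend(group[n_eval:])
    return train, evals


def to_jsonl(rows) -> str:
    """Serialise rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)
```

Splitting per category rather than over the whole pool keeps rare categories such as kernel or stego represented in the eval set.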

§ 03 // specimen

Ten records, unredacted below the line.

Ten examples across pwn, web, crypto, reverse, forensics, kernel, blockchain, stego, and OSINT. The reasoning shape stays consistent; technique, depth, and tools shift. Everything in orange ships in the clear on license.

file: record #01 / 10 · ctf-zd-001 · pwn · q=0.95

// challenge
event: "DownUnderCTF 2020"
task: "Return to what's revenge"
category: "pwn"
difficulty: "hard"
quality: 0.95

// solution
vulnerability: "Stack buffer overflow via gets() on 40-byte buffer. Seccomp whitelist permits open/read/write; execve blocked."
technique_chain: ["buffer-overflow", "rop-libc-leak", "libc-identification", "rop-syscall-chain-open-read-write"]
tools_used: ["checksec", "ghidra", "seccomp-tools", "pwntools", "libc-database"]
failed_hypotheses: [
  "ret2libc system('/bin/sh'): seccomp BPF kills execve (syscall 59); filter confirmed via seccomp-tools dump",
  "SROP with one sigreturn frame per syscall: ~248 bytes/frame, total ~1000 bytes vs 300-byte pop-gadget chain. Rejected on size",
  "mmap+mprotect for RWX shellcode: added complexity with a known flag path. Dropped for direct ORW"
]
decision_trace: [
  "After seccomp-tools showed open/read/write whitelisted, chose ORW ROP chain over SROP or shellcode: textbook bypass, compact, flag path known",
  "Stored filename and file contents in BSS (bss+0x30 / bss+0x40). No PIE means static addresses, no leak needed"
]
exploit_code: |
  from pwn import *
  context.binary = elf = ELF('./vuln')
  libc = ELF('./libc-2.27.so')
flag: "DUCTF{...}"

file: record #02 / 10 · ctf-zd-001 · forensics · q=0.91

// challenge
event: "TAMUctf 2026"
task: "Phantom 2"
category: "forensics"
difficulty: "medium"
quality: 0.91

// solution
vulnerability: "Flag hidden in orphan commit of a public GitHub repo. Main branch does not reference it; reflog + fsck expose it."
technique_chain: ["git-forensics", "orphan-commit-discovery", "reflog-walk"]
tools_used: ["git", "github-api", "python-requests"]
failed_hypotheses: [
  "Brute-forcing branch names against GitHub API: rate-limited at 60 req/h without token, low hit-rate",
  "GitHub code-search API: only indexes default branch HEADs, orphan commits invisible"
]
decision_trace: [
  "git clone --mirror then git fsck --unreachable, iterating git cat-file --batch over dangling commits until flag regex hits"
]
exploit_code: |
  import subprocess
  subprocess.run(['git', 'clone', '--mirror', TARGET])
  unreachable = subprocess.check_output(['git', '-C', 'repo.git', 'fsck', '--unreachable'])
flag: "gigem{...}"
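The decision trace in record #02 hinges on reading object IDs out of `git fsck --unreachable` output. A small sketch of that parsing step follows. It assumes the usual one-object-per-line output (`unreachable commit <sha>` or `dangling commit <sha>`) and the `gigem{...}` flag shape from the record; the helper names are hypothetical and not part of the record's exploit code.

```python
import re

# Matches lines such as "unreachable commit <40- or 64-hex-digit id>".
_FSCK_LINE = re.compile(r"^(unreachable|dangling) (\w+) ([0-9a-f]{40}|[0-9a-f]{64})$")


def unreachable_commits(fsck_output: str) -> list[str]:
    """Return commit IDs reported by fsck, ignoring blobs, trees, and noise."""
    commits = []
    for line in fsck_output.splitlines():
        match = _FSCK_LINE.match(line.strip())
        if match and match.group(2) == "commit":
            commits.append(match.group(3))
    return commits


def find_flags(text: str, pattern: str = r"gigem\{[^}]*\}") -> list[str]:
    """Scan object contents (already decoded to text) for flag-shaped strings."""
    return re.findall(pattern, text)
```

Each returned ID could then be fed to `git cat-file -p <sha>` and the output passed through `find_flags`.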

file: record #03 / 10 · ctf-zd-001 · web · q=0.88

// challenge
event: "HackTheBox 2025"
task: "Headless"
category: "web"
difficulty: "medium"
quality: 0.88

// solution
vulnerability: "Server-side template injection in admin error page. User-Agent header rendered directly through Jinja2 on 500 responses."
technique_chain: ["recon", "header-injection", "ssti-jinja2", "python-rce"]
tools_used: ["burp", "curl", "python"]
failed_hypotheses: [
  "Reflected XSS on product-search parameter: CSP enforces nonce-based script-src, output escaped",
  "SQLi on search endpoint: parameterised, no boolean/time-based delta measurable"
]
decision_trace: [
  "Probed {{7*7}} in User-Agent against a forced 500: response body returned '49', confirmed Jinja2 SSTI in error template"
]
exploit_code: |
  import requests
  p = "{{ request.application.__globals__.__builtins__.__import__('os').popen('id').read() }}"
  r = requests.post(LOGIN_URL, headers={'User-Agent': p}, data=BAD_CREDS)
flag: "HTB{...}"

file: record #04 / 10 · ctf-zd-001 · crypto · q=0.93

// challenge
event: "PicoCTF 2024"
task: "Smooth Criminal"
category: "crypto"
difficulty: "hard"
quality: 0.93

// solution
vulnerability: "ECDSA signing oracle reuses nonce k across two distinct messages. Private key recoverable in closed form."
technique_chain: ["nonce-reuse-detection", "ecdsa-private-key-recovery", "signature-forgery"]
tools_used: ["sage", "python", "fastecdsa"]
failed_hypotheses: [
  "Discrete log attack on secp256k1: computationally infeasible inside the competition window"
]
decision_trace: [
  "Oracle returned identical r for two distinct H(m). Recovered k = (H1 - H2) * (s1 - s2)^-1 mod n, then d = (s*k - H) * r^-1 mod n. Verified by re-signing a probe"
]
exploit_code: |
  from fastecdsa.curve import secp256k1 as C
  n = C.q
  k = ((h1 - h2) * pow(s1 - s2, -1, n)) % n
flag: "picoCTF{...}"
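The closed-form recovery in record #04 can be checked with plain modular arithmetic. The sketch below uses the public secp256k1 group order and toy values for the key, nonce, and hashes. As a simplifying assumption, r is an arbitrary stand-in, not the x-coordinate of k·G, because the algebra s = k^-1 (H + r·d) mod n holds for any r; this is not the record's full exploit.

```python
# Order of the secp256k1 base point (public curve parameter).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def sign_s(h: int, d: int, k: int, r: int, n: int = N) -> int:
    """ECDSA s-value: s = k^-1 * (h + r*d) mod n."""
    return (pow(k, -1, n) * (h + r * d)) % n


def recover_nonce(h1: int, s1: int, h2: int, s2: int, n: int = N) -> int:
    """k = (h1 - h2) * (s1 - s2)^-1 mod n, valid when both signatures share k."""
    return ((h1 - h2) * pow((s1 - s2) % n, -1, n)) % n


def recover_private_key(h: int, r: int, s: int, k: int, n: int = N) -> int:
    """d = (s*k - h) * r^-1 mod n."""
    return ((s * k - h) * pow(r, -1, n)) % n


# Toy values, chosen for illustration only.
d, k, r = 0x1234567, 0xABCDEF, 0x1111
h1, h2 = 111, 222
s1, s2 = sign_s(h1, d, k, r), sign_s(h2, d, k, r)

k_rec = recover_nonce(h1, s1, h2, s2)
d_rec = recover_private_key(h1, r, s1, k_rec)
```

Subtracting the two signing equations cancels the r·d term, which is why a shared nonce is enough to solve for k and then for d.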

file: record #05 / 10 · ctf-zd-001 · rev · q=0.96

// challenge
event: "Midnight Sun CTF 2024"
task: "Kiwi"
category: "rev"
difficulty: "insane"
quality: 0.96

// solution
vulnerability: "Custom stack-based VM with obfuscated opcode dispatch. Flag check is a bytecode routine combining XOR and bit rotation."
technique_chain: ["vm-opcode-recovery", "dispatch-table-extraction", "symbolic-execution"]
tools_used: ["ghidra", "angr", "python", "unicorn"]
failed_hypotheses: [
  "XOR key brute up to 4 bytes: keyspace too large, no structural match against captured output",
  "Ghidra native decompilation on the dispatcher: computed jump table produced noise, not recoverable pseudo-code",
  "Pattern-match against known CTF VM fingerprints: custom ISA, no signature overlap"
]
decision_trace: [
  "Extracted dispatch table via Ghidra xref on the opcode-fetch site, then ran each opcode through Unicorn with instrumented hooks to record register/memory deltas, reconstructing the ISA semantically",
  "Solved the final flag-check constraint symbolically with z3 after extracting per-byte XOR+rotation expressions"
]
exploit_code: |
  from unicorn import Uc, UC_ARCH_X86, UC_MODE_64
  from z3 import BitVec, Solver, RotateLeft
  mu = Uc(UC_ARCH_X86, UC_MODE_64); mu.mem_map(BASE, 0x4000)
flag: "midnight{...}"

file: record #06 / 10 · ctf-zd-001 · pwn · q=0.92

// challenge
event: "SECCON 2024"
task: "Backpack"
category: "pwn"
difficulty: "hard"
quality: 0.92

// solution
vulnerability: "glibc 2.36 tcache poisoning via UAF. Struct contains a function pointer; controlled next-ptr overwrite lands on __free_hook."
technique_chain: ["uaf", "tcache-poisoning", "free-hook-overwrite", "one-gadget"]
tools_used: ["pwndbg", "pwntools", "one_gadget", "ghidra"]
failed_hypotheses: [
  "Largebin attack: required at least 3 pre-UAF allocations, only 2 menu options reachable",
  "Fastbin dup: tcache reuse path disabled, fastbin consolidation not triggerable",
  "Unsorted bin split for libc leak: top-chunk protection prevented split, no viable read"
]
decision_trace: [
  "Single post-free UAF on a struct holding a callback pointer. glibc 2.36 tcache keys are bypassable by poisoning the next-ptr field after heap feng shui. Pointed next-ptr at __free_hook, wrote one_gadget, triggered free() for shell"
]
exploit_code: |
  from pwn import *
  io = remote(HOST, PORT)
  create(0, 0x60); free(0); edit(0, p64(libc.sym.__free_hook ^ key))
flag: "SECCON{...}"

file: record #07 / 10 · ctf-zd-001 · blockchain · q=0.90

// challenge
event: "DefCon Qual 2025"
task: "Quinine"
category: "blockchain"
difficulty: "hard"
quality: 0.90

// solution
vulnerability: "Solidity withdraw() executes an external call before updating the sender's balance. Reentrancy drains the pool across recursive calls."
technique_chain: ["reentrancy", "checks-effects-interactions-violation", "attacker-contract-deploy"]
tools_used: ["foundry", "slither", "anvil"]
failed_hypotheses: [
  "Stake accounting overflow: SafeMath enforced across every arithmetic op, no wrap-around",
  "Oracle manipulation via local AMM: target reads Chainlink aggregator, no tamper path in local anvil fork"
]
decision_trace: [
  "slither --detect reentrancy-eth flagged withdraw() pre-deploy. Deployed attacker contract whose receive() recursively recalls withdraw() until pool == 0. Verified in anvil fork with a 0.01 ETH seed"
]
exploit_code: |
  contract Drain {
    Target t;
    constructor(address a) { t = Target(a); }
    receive() external payable { if (address(t).balance > 0) t.withdraw(); }
flag: "OOO{...}"

file: record #08 / 10 · ctf-zd-001 · stego · q=0.86

// challenge
event: "PlaidCTF 2024"
task: "Dimension Rift"
category: "stego"
difficulty: "medium"
quality: 0.86

// solution
vulnerability: "LSB-encoded payload across the alpha channel of a PNG. Row order scrambled by a key embedded in the EXIF timestamp seconds field."
technique_chain: ["lsb-extraction", "exif-metadata-pivot", "row-permutation-recovery"]
tools_used: ["zsteg", "exiftool", "python-pillow"]
failed_hypotheses: [
  "Generic LSB across RGB channels via zsteg: output was structured noise, no flag bytes",
  "steghide brute with rockyou.txt: passphrase not in dictionary, signature byte absent from file"
]
decision_trace: [
  "EXIF DateTimeOriginal carried a seconds value of 47, out of the normal capture distribution. Treated it as a numpy.random seed to derive a row permutation; unshuffled the alpha-channel LSBs and reassembled the payload"
]
exploit_code: |
  from PIL import Image; import numpy as np, exifread
  img = np.array(Image.open('chal.png'))
  seed = int(exifread.process_file(open('chal.png','rb'))['EXIF DateTimeOriginal'].values[-2:])
flag: "PCTF{...}"

file: record #09 / 10 · ctf-zd-001 · kernel · q=0.97

// challenge
event: "RealWorldCTF 2025"
task: "KubeSploit"
category: "kernel"
difficulty: "insane"
quality: 0.97

// solution
vulnerability: "io_uring SQE submission race in Linux 6.8. Concurrent close() and recvmsg() yields a use-after-free on a socket struct refcount."
technique_chain: ["io-uring-race", "socket-uaf", "modprobe_path-overwrite", "root-shell"]
tools_used: ["syzkaller", "kvm", "gdb-kernel", "liburing"]
failed_hypotheses: [
  "pipe_buffer spray post-free: slab sits under SLAB_TYPESAFE_BY_RCU, not reclaimable that fast",
  "shm_file_data spray: allocation size mismatch (0x280 vs 0x100 victim), no overlap",
  "msg_msg spray: randomize_kmalloc_cache on 6.8 builds breaks deterministic targeting"
]
decision_trace: [
  "poll_list allocations sit in the same kmalloc-256 bucket as the victim. Held the recvmsg SQE window open via userfaultfd stall, raced close(fd), then sprayed poll_list to reclaim the freed slot",
  "Used controlled write primitive to overwrite modprobe_path to /tmp/p, triggered via an unknown-binfmt ENOEXEC; /tmp/p runs as root on first invocation"
]
exploit_code: |
  struct io_uring ring; io_uring_queue_init(8, &ring, 0);
  sqe = io_uring_get_sqe(&ring); io_uring_prep_recvmsg(sqe, fd, &msg, 0);
  uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
flag: "rwctf{...}"

file: record #10 / 10 · ctf-zd-001 · osint · q=0.84

// challenge
event: "ASIS CTF 2025"
task: "HighNote"
category: "osint"
difficulty: "medium"
quality: 0.84

// solution
vulnerability: "Target username reused on SoundCloud. Track description leaks a Discord handle; Discord profile banner embeds flag in the rightmost pixel column."
technique_chain: ["username-pivot", "platform-cross-reference", "image-pixel-extract"]
tools_used: ["sherlock", "discord-api", "python-pillow"]
failed_hypotheses: [
  "LinkedIn username pivot: account private, no posts visible without connection",
  "Twitter / web.archive.org snapshot: account suspended prior to challenge release, cache purged"
]
decision_trace: [
  "sherlock surfaced an active SoundCloud handle. Track description contained 'dm me on discord 0x??##1234'. Fetched the Discord banner via CDN, extracted the rightmost pixel column as bytes, reassembled the flag"
]
exploit_code: |
  import requests
  user = requests.get(f'https://api.soundcloud.com/users/{H}?client_id={CID}').json()
  banner = requests.get(f'https://cdn.discordapp.com/banners/{UID}/{HASH}.png?size=512')
flag: "ASIS{...}"

§ 04 // next step

Start with research.
Close team deals by email.

Research can be bought directly. Commercial and OEM buyers should expect a short human conversation so scope, intended use, and delivery format are aligned before anything expensive gets signed.

Founding release · Research self-serve
See pricing and pilot paths
From $1,990
Open file →

Sample pack, pilot scope, OEM rights
[email protected] ↳
