ransomware-policy
Ransomware Detection using LLMs
Research project exploring ransomware detection using Large Language Models, with the Active Detective Agent as the primary contribution.
Project Structure
ransomware-policy/
├── active-detective/       # Main contribution: RL-trained investigation agent
│   ├── simulator/          # HostState (FileRegistry + ProcessTable), attack/benign generators
│   ├── tools/              # 11 investigation tools + dual-format parser
│   ├── environment/        # RansomwareDetectionEnv, RLVR reward
│   ├── training/           # GRPO via TRL, scenario generation
│   ├── evaluation/         # Metrics, baselines, ablation
│   └── tests/              # 408 tests
│
├── prompting-only/         # Comparison: zero-shot/few-shot/CoT prompting
│   ├── telemetry/          # Telemetry generation (latent states, Atomic Red Team)
│   ├── prompts/            # Detection prompt templates
│   └── evaluation/         # Prompt-based evaluation
│
├── fine-tuning/            # Comparison: QLoRA fine-tuning (TinyLlama, Phi-2, Mistral)
│   ├── scripts/            # Training and evaluation scripts
│   └── data/               # Training datasets
│
└── docs/                   # Research documents and design specs
    ├── plans/              # Active Detective system design
    └── early-research/     # Brainstorming notes and proposals
Three Approaches
1. Active Detective Agent (Main Contribution)
An LLM agent trained with GRPO (Group Relative Policy Optimization) to investigate suspected ransomware activity by deciding which host evidence to examine. The agent operates in a POMDP (partially observable Markov decision process): host state is partially hidden, telemetry events are stochastically dropped, and the agent must choose which evidence to gather before rendering a verdict.
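The partially observable setup can be illustrated with a toy sketch. Everything here is hypothetical and chosen for illustration: `ToyHost`, the event names, the drop probability, and the fixed hand-written policy are not the project's `HostState`, tools, or trained agent. It assumes each look at the host re-samples which events get through.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyHost:
    """Hypothetical hidden host state (not the project's HostState)."""
    infected: bool
    events: list = field(default_factory=list)


def make_host(infected: bool, rng: random.Random) -> ToyHost:
    # Benign and infected hosts share background noise; only infected
    # hosts additionally emit mass-rename events.
    events = ["file_read"] * 5 + ["process_start"] * 2
    if infected:
        events += ["mass_rename"] * 4
    rng.shuffle(events)
    return ToyHost(infected=infected, events=events)


def observe(host: ToyHost, drop_prob: float, rng: random.Random) -> list:
    """Return the events that survive stochastic dropping (the partial view)."""
    return [e for e in host.events if rng.random() >= drop_prob]


def investigate(host: ToyHost, rng: random.Random,
                budget: int = 3, drop_prob: float = 0.5) -> str:
    """Spend up to `budget` observations, then render a verdict."""
    for _ in range(budget):
        if "mass_rename" in observe(host, drop_prob, rng):
            return "ransomware"
    return "benign"
```

In this toy version the policy is fixed (look again until the budget runs out), whereas the agent described above learns which evidence to request. The sketch only shows why a single dropped-event view can be insufficient and why spending more of the budget on observations can change the verdict.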
cd active-detective
# Run tests
python -m pytest tests/ -q
# Generate training scenarios
python -c "from training.scenarios import generate_training_scenarios, save_scenarios; save_scenarios(generate_training_scenarios(1000), 'scenarios.jsonl')"
# Train (requires GPU + trl + transformers>=5.2.0)
accelerate launch -m training.train_grpo --model Qwen/Qwen3.5-9B --output-dir ./checkpoints --n-episodes 500 --group-size 4
See docs/plans/2026-03-05-active-detective-system-design.md for the full design specification.
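For readers unfamiliar with GRPO, the role of `--group-size` can be sketched as follows: several episodes are sampled for the same scenario, and each episode's reward is normalized against the rest of its group. This is a minimal sketch of the general group-relative idea, not TRL's implementation and not this project's reward. The reward values are hypothetical (1.0 for a correct verdict, 0.0 otherwise), and the helper name is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every episode got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one group of 4 episodes on the same scenario.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Episodes that beat their group average receive a positive advantage and the rest a negative one. If all episodes in a group score the same, every advantage is zero and that group contributes no learning signal.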
2. Prompting-Only (Baseline)
Zero-shot, few-shot, and chain-of-thought prompting against pre-trained LLMs. No training required.
cd prompting-only
python prompts/detection_prompts.py
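The three prompting styles differ only in how the prompt is assembled. The sketch below is illustrative: the task wording, labels, and function names are assumptions and are not taken from `prompts/detection_prompts.py`.

```python
# Hypothetical task instruction shared by all three styles.
TASK = "Classify the host telemetry below as RANSOMWARE or BENIGN."


def zero_shot(telemetry: str) -> str:
    """Instruction plus the telemetry, with no examples."""
    return f"{TASK}\n\nTelemetry:\n{telemetry}\nVerdict:"


def few_shot(telemetry: str, examples: list) -> str:
    """Prepend labeled (telemetry, verdict) pairs before the query."""
    shots = "\n\n".join(f"Telemetry:\n{t}\nVerdict: {v}" for t, v in examples)
    return f"{TASK}\n\n{shots}\n\nTelemetry:\n{telemetry}\nVerdict:"


def chain_of_thought(telemetry: str) -> str:
    """Ask for step-by-step reasoning before the final label."""
    return (
        f"{TASK}\n"
        "Reason step by step about the evidence, then end with a line "
        "'Verdict: <label>'.\n\n"
        f"Telemetry:\n{telemetry}\n"
    )
```

For example, `few_shot(log, [(benign_log, "BENIGN"), (attack_log, "RANSOMWARE")])` yields a prompt with two worked examples followed by the unlabeled query.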
3. Fine-Tuning (Comparison)
QLoRA fine-tuning of small LLMs (TinyLlama, Phi-2, Mistral) on synthetic telemetry with expert annotations.
cd fine-tuning
python scripts/prepare_training_data.py --simple --output train.jsonl
python scripts/finetune_ransomware_llm.py --model tiny --train-data train.jsonl --epochs 3
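A training set of this kind is typically stored as one JSON object per line, pairing telemetry with its annotation. The sketch below shows one possible record shape. The field names, the telemetry line, and the annotation text are hypothetical, and the actual format written by `prepare_training_data.py` may differ.

```python
import json


def to_training_record(telemetry_lines, annotation):
    """Pair raw telemetry with an expert annotation (hypothetical schema)."""
    return {
        "prompt": "Analyze the following host telemetry:\n" + "\n".join(telemetry_lines),
        "completion": annotation,
    }


# Made-up example content, for illustration only.
record = to_training_record(
    ["proc=unknown.exe op=rename path=report.docx -> report.docx.locked"],
    "Likely ransomware: rapid renames to an unfamiliar extension.",
)

# One JSON object per line; json.dumps escapes embedded newlines.
line = json.dumps(record)
```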
Key Innovation: Multi-Layer Prediction
The model learns to predict:
- Latent State: What’s really happening (hidden)
- Observable Events: What telemetry comes next
- Causal Understanding: Why events occur
- Risk Assessment: Current threat level
- Temporal Reasoning: Time to impact
- Recommended Actions: What to do
- Uncertainty: Confidence and alternatives
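One way to picture the seven prediction layers listed above is as a single structured output. The dataclass below is a sketch under that assumption: the field names, types, and the confidence range check are illustrative and do not reflect the project's actual output schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class MultiLayerPrediction:
    """Hypothetical container with one field group per prediction layer."""
    latent_state: str                  # what is really happening (hidden)
    next_events: list                  # observable telemetry expected next
    causal_explanation: str            # why the events occur
    risk_level: str                    # current threat level
    time_to_impact_s: Optional[float]  # seconds until impact, if estimable
    recommended_actions: list          # what to do
    confidence: float                  # uncertainty, expressed in [0, 1]
    alternatives: list = field(default_factory=list)  # competing hypotheses

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Made-up values, for illustration only.
example = MultiLayerPrediction(
    latent_state="encryption_in_progress",
    next_events=["mass_rename", "shadow_copy_delete"],
    causal_explanation="Renames follow bulk reads of user documents.",
    risk_level="high",
    time_to_impact_s=120.0,
    recommended_actions=["isolate_host"],
    confidence=0.8,
    alternatives=["backup_job"],
)
```

Calling `asdict(example)` gives a plain dictionary that can be serialized for training targets or evaluation.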
Requirements
pip install -r fine-tuning/scripts/requirements_finetune.txt
Active Detective training has additional GPU requirements; see active-detective/ for details.
License
MIT License — see LICENSE file.
Citation
@software{towell_ransomware_llm_2025,
  title={Ransomware Detection using LLMs},
  author={Towell, Alexander},
  year={2025},
  url={https://github.com/queelius/ransomware-policy}
}