Shlok Gilda, Karsten Martiny, Justin Ho,
Laura Tinnel, Grit Denker, and Bonnie J. Dorr
University of Florida & SRI International
Goal
Construct a knowledge graph from cyber threat intelligence (CTI) reports by translating natural-language input into formal knowledge
Example Input1
"Sandworm was first observed in the victim's environment in June 2022, when the actor deployed the Neo-REGEORG webshell..."
Knowledge Graph
Entities connected by relationships
Sandworm
Actor
uses
Neo-REGEORG
Software
Enables temporal analysis & pattern clustering
The Problem with LLMs
LLMs are unreliable for CTI and hallucinate, creating dangerous misrepresentations.2
Performance degrades on complex reports
Inconsistent outputs across runs
High confidence in wrong answers
1 Google/Mandiant. (2022). Sandworm Disrupts Power in Ukraine Using a Novel Attack Against Operational Technology.
2 Mezzi, E., et al. (2025). Large Language Models are Unreliable for Cyber Threat Intelligence. Springer.
The Cyber Behavior Pattern Extractor (CBPE) is a neuro-symbolic framework that generates validated, temporally aware knowledge graphs from multi-modal CTI sources
Multi-modal pipeline for transparent and reliable KG construction that implements the scaffolding paradigm.1
Input
Unstructured reports
(text + visuals)
Output
Validated
Knowledge Graph
LLM extraction + Schema validation
Scaffolding: Robust structure around LLMs to ensure reliability through constraints and validation
Two-stage validation loop corrects hallucinations without reliance on pre-existing trusted KGs.2
Syntactic Validation
+
Semantic Validation
Two-stage loop: Schema conformance, source verification, hallucination correction
Extracts time-ordered TTP patterns to construct attacker playbooks for sequential prediction.
Capture temporal relationships
Graph structure enables: Path queries, pattern clustering, sequential prediction
1 Mezzi, E., et al. (2025). Large Language Models are Unreliable for Cyber Threat Intelligence. Springer.
2 Wu, Z., et al. (2024). KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment. arXiv.
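A path query over such a graph can be sketched as a breadth-first search over (subject, relation, object) triples. This is a minimal illustration, not CBPE's implementation: the triple list and the `find_path` helper are hypothetical, and only the relation names USES_SOFTWARE and USES_LIBRARY come from the examples in this deck.

```python
from collections import deque

# Hypothetical toy graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("Sandworm", "USES_SOFTWARE", "GOGETTER"),
    ("GOGETTER", "USES_LIBRARY", "Yamux"),
    ("Sandworm", "USES_SOFTWARE", "Neo-REGEORG"),
]

def find_path(triples, start, goal):
    """Breadth-first search; returns the triples linking start to goal, or None."""
    adjacency = {}
    for subj, rel, obj in triples:
        adjacency.setdefault(subj, []).append((rel, obj))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

path = find_path(TRIPLES, "Sandworm", "Yamux")
```

Edges are treated as directed here, so a query from "Yamux" back to "Sandworm" finds no path.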
MITRE ATT&CK: the globally recognized knowledge base of adversary tactics, techniques, and procedures (TTPs) based on real-world observations.1
Standard framework for describing cyberattacks
Used by security teams worldwide
Provides common language for threat intelligence
TACTIC
TECHNIQUE
SUB-TECHNIQUE
Initial Access
Phishing
Spearphishing Attachment
(T1566.001)
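The technique and sub-technique levels are encoded directly in the identifier, so an ID such as T1566.001 can be split mechanically. The sketch below assumes the usual ID shape (a "T" followed by four digits, optionally a dot and three digits); `TechniqueId` and `parse_technique_id` are illustrative names, not part of any MITRE tooling.

```python
import re
from typing import NamedTuple, Optional

class TechniqueId(NamedTuple):
    technique: str                # e.g. "T1566" (Phishing)
    sub_technique: Optional[str]  # e.g. "001" (Spearphishing Attachment), or None

# Assumed ID shape: "T" + four digits, optionally "." + three digits.
_ID_PATTERN = re.compile(r"^(T\d{4})(?:\.(\d{3}))?$")

def parse_technique_id(raw: str) -> TechniqueId:
    """Split an ATT&CK ID into its technique and optional sub-technique parts."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not an ATT&CK technique ID: {raw!r}")
    return TechniqueId(match.group(1), match.group(2))

parsed = parse_technique_id("T1566.001")
```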
CBPE extends the MITRE ATT&CK ontology to capture temporal attack sequences, creating validated knowledge graphs that model dynamic adversarial behavior over time.
We process the threat reports cited by MITRE ATT&CK with our framework to enrich the knowledge base.
1 MITRE ATT&CK. (2025). https://attack.mitre.org
Multi-modal preprocessing
LLM creates CSTs
Two-stage verification
Add validated data
Feedback Loop
Text
"Sandworm deployed Neo-REGEORG webshell..."
Visual + Vision Language Model (VLM) Output
VLM Output
"Four-step attack: ISO mounted β n.bat runs β Scilc.exe creates s1.txt β RTU via IEC"
What is a Concrete Syntax Tree? A structured, human-readable representation that captures entities, relationships, and temporal information in a format ready for validation and knowledge graph construction.
Input Text
"Sandworm deployed GOGETTER roughly one month later..."
EventEntity(
actors: ["Sandworm"],
software: ["GOGETTER"],
temporal_descriptor:
"one month later"
)
Fully Automated: The entire validation loop runs without human intervention
Original
"Sandworm deployed GOGETTER"
CST
EventEntity(
actors: ["Sandworm"],
software: ["GOGETTER"]
)
Syntactic Validation
Structure & types correct
Semantic Validation
Meaning preserved
Validated CST
EventEntity(
actors: ["Sandworm"],
software: ["GOGETTER"]
)
Enriched KG
USES_SOFTWARE(
actor: "Sandworm",
software: "GOGETTER"
)
PDF, HTML, MITRE data
Sentences with section context
w.r.t. given schema
Candidate CST
pass
Valid KG instances
Validated knowledge
fail
Key Innovation: Automated validation loop ensures data fidelity by detecting and correcting LLM hallucinations through iterative feedback
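The control flow of such a loop can be sketched as follows. This is a hedged illustration only: `validation_loop`, its `max_rounds` limit, and the three stub functions are assumptions made for the example. In the real pipeline the extractor and the semantic check are LLM calls, which are replaced here by deterministic stand-ins.

```python
def validation_loop(sentence, extract, check_syntax, check_semantics, max_rounds=3):
    """Re-run extraction with feedback until both checks pass or rounds run out."""
    feedback = None
    for _ in range(max_rounds):
        candidate = extract(sentence, feedback)
        feedback = check_syntax(candidate)            # None means "no error"
        if feedback is None:
            feedback = check_semantics(sentence, candidate)
        if feedback is None:
            return candidate                          # validated CST
    return None                                       # give up after max_rounds

# Stand-in for the extraction LLM: swaps the roles until it receives feedback.
def fake_extract(sentence, feedback):
    if feedback is None:
        return {"actors": ["GOGETTER"], "software": ["Sandworm"]}
    return {"actors": ["Sandworm"], "software": ["GOGETTER"]}

# Stand-in for schema validation: only checks that both fields exist.
def fake_syntax(cst):
    missing = [k for k in ("actors", "software") if k not in cst]
    return f"missing fields: {missing}" if missing else None

# Stand-in for the evaluator LLM: literal containment in the source sentence.
def fake_semantics(sentence, cst):
    rendered = f"{cst['actors'][0]} deployed {cst['software'][0]}"
    return None if rendered in sentence else f"source does not state '{rendered}'"

result = validation_loop("Sandworm deployed GOGETTER",
                         fake_extract, fake_syntax, fake_semantics)
```

Passing the checks in as callables keeps the loop itself independent of any particular model or schema.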
CBPE ingests reports and preserves document structure (chapters, sections, paragraphs) for coherent context.
Text Input:1
"Sandworm was first observed in the victim's environment in June 2022, when the actor deployed the Neo-REGEORG webshell on an internet-facing server."
Vision-Language Models (VLMs) extract information from diagrams, timelines, and screenshots.
Visual Input (Attack Chain):2
VLM Output Example
"A four-step attack chain is depicted. First, an ISO file (βa.isoβ) is mounted on a MicroSCADA server, leading to the execution of βn.batβ. Second, βn.batβ executes βScilc.exeβ, which is installed on the server. Third, βScilc.exeβ creates a file βs1.txtβ. Finally, the server communicates with an RTU using IEC-104/101 protocols."
1 Google/Mandiant. (2022). Sandworm Disrupts Power in Ukraine Using a Novel Attack Against Operational Technology.
The LLM transforms unstructured text into Concrete Syntax Trees (CSTs) guided by a formal schema.
Concrete Syntax Trees are parse trees defined by a formal grammar. We use them as knowledge units that formally capture the full semantic content of the input, one sentence at a time.
CSTs bridge the gap between raw text and knowledge graphs: they preserve the meaning of individual sentences in a structured form.
The CST captures entities (actors, software), actions (deployed), temporal context (June 2022), and relationships in a structured format ready for validation.
Note: This is a domain-specific modeling choice
"Sandworm was first observed in June 2022, when the actor deployed the Neo-REGEORG webshell. Roughly one month later, Sandworm deployed GOGETTER..."
ReportChunkCST(
actors: ["Sandworm"],
timeline: [
EventEntity(
id: "event_1",
temporal_descriptor: "June 2022",
actors: ["Sandworm"],
software: ["Neo-REGEORG"]
),
EventEntity(
id: "event_2",
temporal_descriptor:
"Roughly one month later",
preceding_event_ids: ["event_1"],
actors: ["Sandworm"],
software: ["GOGETTER"]
)
]
)
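One way to represent this structure in code is with plain dataclasses. The field names below mirror the example above; the Python types and the empty-list default for `preceding_event_ids` are assumptions, since the actual CBPE schema is not shown here.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EventEntity:
    id: str
    temporal_descriptor: str
    actors: List[str]
    software: List[str]
    # Assumed default: an event with no listed predecessor starts a timeline.
    preceding_event_ids: List[str] = field(default_factory=list)

@dataclass
class ReportChunkCST:
    actors: List[str]
    timeline: List[EventEntity]

chunk = ReportChunkCST(
    actors=["Sandworm"],
    timeline=[
        EventEntity(id="event_1", temporal_descriptor="June 2022",
                    actors=["Sandworm"], software=["Neo-REGEORG"]),
        EventEntity(id="event_2", temporal_descriptor="Roughly one month later",
                    actors=["Sandworm"], software=["GOGETTER"],
                    preceding_event_ids=["event_1"]),
    ],
)
```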
Is the CST structurally sound and typed correctly?
Is the CST factually accurate compared to the source?
Each field in the CST is checked for conformance to the schema's data types.
Example:
Schema requires: sequence_index: int
✗ Invalid CST:
{"sequence_index": "second"}
✓ Valid CST:
{"sequence_index": 2}
The system automatically provides specific feedback to the LLM for correction when malformed data is detected.
Common errors:
Missing required fields
Event missing "actors" field
Incorrect data types
Expected integer, got string
Invalid temporal formats
"June" instead of "2022-06-01"
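The three error classes above can be illustrated with a small checker that returns feedback strings suitable for sending back to the LLM. The `SCHEMA` dictionary, the message wording, and both helper functions are hypothetical; a production system would more likely rely on a schema-validation library.

```python
from datetime import date

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"actors": list, "software": list, "sequence_index": int}

def syntactic_errors(cst, schema=SCHEMA):
    """Return human-readable feedback strings; an empty list means the CST conforms."""
    errors = []
    for name, expected in schema.items():
        if name not in cst:
            errors.append(f'missing required field "{name}"')
        elif type(cst[name]) is not expected:  # exact match, so True is not an int
            got = type(cst[name]).__name__
            errors.append(f'field "{name}": expected {expected.__name__}, got {got}')
    return errors

def is_iso_date(text):
    """True for ISO dates such as "2022-06-01", False for free text such as "June"."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

invalid = {"actors": ["Sandworm"], "software": ["GOGETTER"], "sequence_index": "second"}
feedback = syntactic_errors(invalid)
```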
The LLM converts the validated CST back into natural language text.
Example: CST → "Sandworm deployed GOGETTER."
An evaluator LLM checks if the reformulated text is semantically equivalent to the original source.
Example: Compare "Sandworm deployed GOGETTER" with original text.
Why use an LLM to validate an LLM? Extraction is complex, while equivalence checking is a constrained comparison task where LLMs perform reliably.1
1 Y. Hayashi (2025). Evaluating LLMs' Capability to Identify Lexical Semantic Equivalence. COLING.
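The round trip (CST to text, then comparison with the source) can be sketched as below. Both steps are LLM calls in the described pipeline; here `reformulate` is a fixed template and `toy_judge` is a crude word-sequence containment test, both of which are assumptions that only stand in for the models.

```python
import string

def reformulate(event):
    """Render one event as a sentence (template stand-in for the LLM)."""
    actor = event["actors"][0]["name"]
    action = event["actions"][0]["name"]
    software = event["software"][0]["name"]
    return f"{actor} {action} {software}."

def _normalize(text):
    # Lowercase, drop punctuation, collapse whitespace.
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())

def toy_judge(original, reformulated):
    """Stand-in for the evaluator LLM: is the reformulation a word run of the source?"""
    return f" {_normalize(reformulated)} " in f" {_normalize(original)} "

source = "...Sandworm deployed GOGETTER..."
swapped = {"actors": [{"name": "GOGETTER"}], "software": [{"name": "Sandworm"}],
           "actions": [{"name": "deployed"}]}
fixed = {"actors": [{"name": "Sandworm"}], "software": [{"name": "GOGETTER"}],
         "actions": [{"name": "deployed"}]}

swapped_ok = toy_judge(source, reformulate(swapped))
fixed_ok = toy_judge(source, reformulate(fixed))
```

A containment test like this would reject valid paraphrases, which is exactly the gap an evaluator LLM is meant to close.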
"...Sandworm deployed GOGETTER..."
Let's see what the LLM extracts
{
"event_2": {
"actors": [{"name": "GOGETTER"}],
"software": [{"name": "Sandworm"}],
"actions": [{"name": "deployed"}]
}
}
"GOGETTER deployed Sandworm."
✗ Semantic Validation: FAILED
Reason: Original text states 'Sandworm deployed GOGETTER', but the CST claims 'GOGETTER deployed Sandworm'.
↳ FEEDBACK SENT TO LLM
System generates corrective feedback automatically
{
"event_2": {
"actors": [{"name": "Sandworm"}],
"software": [{"name": "GOGETTER"}],
"actions": [{"name": "deployed"}]
}
}
"Sandworm deployed GOGETTER."
✓ Semantic Validation: PASSED
The reformulated text is semantically equivalent to the source. The data is validated and ready for the Knowledge Graph.
Validated CSTs enrich the final KG. This step performs entity resolution, merging duplicates like "Sandworm" and "APT44".
Validated CST for GOGETTER
USES_SOFTWARE(Sandworm, GOGETTER)
Sandworm → USES_SOFTWARE → CaddyWiper
Sandworm → USES_SOFTWARE → CaddyWiper
Sandworm → USES_SOFTWARE → GOGETTER (NEW)
Sandworm → USES_SOFTWARE → CaddyWiper
Sandworm → USES_SOFTWARE → GOGETTER
Sandworm → USES_SOFTWARE → GOGETTER → USES_LIBRARY → Yamux (NEW DETAIL)
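The enrichment step with entity resolution can be approximated by an alias table and a set of triples. This is a simplified sketch: the `ALIASES` mapping, the choice of "Sandworm" as the canonical name, and the `enrich` helper are assumptions made for illustration.

```python
# Hypothetical alias table: alias -> canonical entity name.
ALIASES = {"APT44": "Sandworm"}

def canonical(name):
    return ALIASES.get(name, name)

def enrich(graph, triples):
    """Add resolved triples to the graph; return only those that were new."""
    added = []
    for subj, rel, obj in triples:
        triple = (canonical(subj), rel, canonical(obj))
        if triple not in graph:
            graph.add(triple)
            added.append(triple)
    return added

kg = {("Sandworm", "USES_SOFTWARE", "CaddyWiper")}
new_facts = enrich(kg, [
    ("APT44", "USES_SOFTWARE", "GOGETTER"),        # alias resolves to Sandworm
    ("Sandworm", "USES_SOFTWARE", "CaddyWiper"),   # already known, skipped
    ("GOGETTER", "USES_LIBRARY", "Yamux"),         # new detail
])
```

Storing triples in a set makes re-adding a known fact a no-op, so only genuinely new knowledge is reported.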
The order of events in a report is crucial for capturing the sequence of adversarial actions for predictive modeling.
Temporal keywords such as "first observed" and "roughly one month later" clarify attack timelines and provide insight into campaign phases.
Example: "...observed in June 2022... Roughly one month later, Sandworm deployed GOGETTER..."
"Sandworm was first observed in the victim's environment in June 2022, when the actor deployed the Neo-REGEORG webshell. Roughly one month later, Sandworm deployed GOGETTER, which proxies communications for its C2 server using Yamux over TLS."
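Given events that record their predecessors, as the `preceding_event_ids` field in the CST example does, a timeline can be recovered with a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the dictionary layout and `order_events` are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Events keyed by id, deliberately listed out of order.
events = {
    "event_2": {"preceding_event_ids": ["event_1"], "software": ["GOGETTER"]},
    "event_1": {"preceding_event_ids": [], "software": ["Neo-REGEORG"]},
}

def order_events(events):
    """Return event ids so that every event follows all of its predecessors."""
    graph = {eid: event["preceding_event_ids"] for eid, event in events.items()}
    return list(TopologicalSorter(graph).static_order())

ordered = order_events(events)
```

`static_order` raises `graphlib.CycleError` when the predecessor links contradict each other, which gives a cheap consistency check on extracted timelines.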
By clustering attack timelines, CBPE identifies recurring TTP sequences, enabling defenders to anticipate an adversary's next move.
Key Insight: Detect patterns early to predict and preempt later stages before they unfold.
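As a much simpler stand-in for timeline clustering, next-step prediction can be illustrated by counting which step most often follows another across observed timelines. The timelines below are invented for the example (the labels are ATT&CK tactic names), and `build_transitions` and `predict_next` are hypothetical helpers, not the method used by CBPE.

```python
from collections import Counter, defaultdict

def build_transitions(timelines):
    """Count how often each step is immediately followed by each other step."""
    transitions = defaultdict(Counter)
    for timeline in timelines:
        for current, following in zip(timeline, timeline[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, current):
    """Most frequently observed follow-up step, or None if never seen."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]

# Invented timelines for illustration only.
timelines = [
    ["Initial Access", "Execution", "Persistence"],
    ["Initial Access", "Execution", "Impact"],
    ["Initial Access", "Execution", "Persistence", "Impact"],
]
transitions = build_transitions(timelines)
prediction = predict_next(transitions, "Execution")
```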
1. Reliable CTI automation requires robust scaffolding around LLMs, not just scaling models alone.
2. CBPE's automated validation loop detects and corrects hallucinations without requiring pre-existing trusted knowledge graphs.
3. Temporal modeling enables dynamic attacker playbooks that support predictive defense strategies.
4. The validated KG serves as a dynamic repository for confirming known behaviors and assimilating novel threat intelligence.
GTA3 Workshop @ ICDM 2025