Computer & Information Science & Engineering
NLPC Laboratory FINS

GTA3 Workshop @ ICDM 2025

Navigating the Blue Nowhere:
A Framework for Mapping Validated
Adversarial Trajectories

Shlok Gilda, Karsten Martiny, Justin Ho,
Laura Tinnel, Grit Denker, and Bonnie J. Dorr

University of Florida & SRI International

The Challenge: From Chaos to Clarity

🎯 Goal

Construction of a Knowledge Graph from CTI reports by translating natural language inputs to formal knowledge

Example Input1

"Sandworm was first observed in the victim's environment in June 2022, when the actor deployed the Neo-REGEORG webshell..."

↓

Knowledge Graph

Entities connected by relationships

Sandworm

Actor

uses

→

Neo-REGEORG

Software

Enables temporal analysis & pattern clustering
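The graph fragment above can be sketched as plain (subject, relation, object) triples; entity names and the "uses" relation mirror the slide's example and are not a fixed CBPE API:

```python
# Minimal sketch of the example KG fragment as (subject, relation, object) triples.
# Names follow the slide's example ("Sandworm" uses "Neo-REGEORG"), not a fixed API.
triples = [("Sandworm", "uses", "Neo-REGEORG")]
entity_types = {"Sandworm": "Actor", "Neo-REGEORG": "Software"}

def neighbors(graph, subject):
    """(relation, object) pairs for a subject -- the basis for path queries."""
    return [(rel, obj) for s, rel, obj in graph if s == subject]

print(neighbors(triples, "Sandworm"))  # → [('uses', 'Neo-REGEORG')]
```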

The Problem with LLMs

LLMs are unreliable for CTI and hallucinate, creating dangerous misrepresentations.2

⚠️

Performance degrades on complex reports

🔄

Inconsistent outputs across runs

❌

High confidence in wrong answers

1 Google/Mandiant. (2022). Sandworm Disrupts Power in Ukraine Using a Novel Attack Against Operational Technology.

2 Mezzi, E., et al. (2025). Large Language Models are Unreliable for Cyber Threat Intelligence. Springer.

Core Contributions

The Cyber Behavior Pattern Extractor (CBPE) is a neuro-symbolic framework that generates validated, temporally aware knowledge graphs from multi-modal CTI sources

🔗

Neuro-Symbolic Pipeline

Multi-modal pipeline for transparent and reliable KG construction that implements the scaffolding paradigm.1

Input

Unstructured reports
(text + visuals)

→

Output

Validated
Knowledge Graph

LLM extraction + Schema validation

Scaffolding: Robust structure around LLMs to ensure reliability through constraints and validation

✓

Automated Validation

Two-stage validation loop corrects hallucinations without reliance on pre-existing trusted KGs.2

Syntactic Validation

+

Semantic Validation

Two-stage loop: Schema conformance, source verification, hallucination correction

⏱️

Time-Ordered Extraction

Extracts time-ordered TTP patterns to construct attacker playbooks for sequential prediction.

1
→
2
→
3

Capture temporal relationships

Graph structure enables: Path queries, pattern clustering, sequential prediction

1 Mezzi, E., et al. (2025). Large Language Models are Unreliable for Cyber Threat Intelligence. Springer.

2 Wu, Z., et al. (2024). KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment. arXiv.

The Foundation: MITRE ATT&CK

What is MITRE ATT&CK?

The globally recognized knowledge base of adversary tactics, techniques, and procedures (TTPs) based on real-world observations.1

✓

Standard framework for describing cyberattacks

✓

Used by security teams worldwide

✓

Provides common language for threat intelligence

ATT&CK Hierarchy Example

TACTIC

→

TECHNIQUE

→

SUB-TECHNIQUE

Initial Access

→

Phishing

→

Spearphishing Attachment

(T1566.001)

How CBPE Uses MITRE ATT&CK:

CBPE extends the MITRE ATT&CK ontology to capture temporal attack sequences, creating validated knowledge graphs that model dynamic adversarial behavior over time.

We ingest and process the cited threat reports referenced by MITRE with our framework to enrich the knowledge base.

1 MITRE ATT&CK. (2025). https://attack.mitre.org

The CBPE Pipeline

System Architecture

📚

Threat Intelligence Data

PDF, HTML, MITRE data

→
⚙️

Pre-processing

Sentences with section context

→
🤖

LLM-based Formalization

w.r.t. given schema

📋
Formal Schema
↓
→

Candidate CST

✓

Validation

pass

→
⚙️

Post-processing

Valid KG instances

→
🕸️

Threat Intelligence KG

Validated knowledge

fail

Key Innovation: Automated validation loop ensures data fidelity by detecting and correcting LLM hallucinations through iterative feedback

Step 1: Multi-Modal Preprocessing

Text Ingestion

CBPE ingests reports and preserves document structure (chapters, sections, paragraphs) for coherent context.

Text Input:1

"Sandworm was first observed in the victim's environment in June 2022, when the actor deployed the Neo-REGEORG webshell on an internet-facing server."

Visual Processing

Vision-Language Models (VLMs) extract information from diagrams, timelines, and screenshots.

Visual Input (Attack Chain):2

Attack Chain Diagram

VLM Output Example

"A four-step attack chain is depicted. First, an ISO file ('a.iso') is mounted on a MicroSCADA server, leading to the execution of 'n.bat'. Second, 'n.bat' executes 'Scilc.exe', which is installed on the server. Third, 'Scilc.exe' creates a file 's1.txt'. Finally, the server communicates with an RTU using IEC-104/101 protocols."

1 Google/Mandiant. (2022). Sandworm Disrupts Power in Ukraine Using a Novel Attack Against Operational Technology.

Pipeline: Formalization

Structuring Unstructured Text

The LLM transforms unstructured text into Concrete Syntax Trees (CSTs) guided by a formal schema.

What is a CST?

Concrete Syntax Trees are parse trees defined by a formal grammar. We use them as knowledge units that formally capture the full semantic content of individual input sentences.

Why CSTs?

CSTs bridge the gap between raw text and knowledge graphs. They preserve meaning of individual sentences in a structured form.

Key Features

The CST captures entities (actors, software), actions (deployed), temporal context (June 2022), and relationships in a structured format ready for validation.

Note: This is a domain-specific modeling choice

Source Text Example:

"Sandworm was first observed in June 2022, when the actor deployed the Neo-REGEORG webshell. Roughly one month later, Sandworm deployed GOGETTER..."

Generated CST Structure

ReportChunkCST(
  actors: ["Sandworm"],
  timeline: [
    EventEntity(
      id: "event_1",
      temporal_descriptor: "June 2022",
      actors: ["Sandworm"],
      software: ["Neo-REGEORG"]
    ),
    EventEntity(
      id: "event_2",
      temporal_descriptor: 
        "Roughly one month later",
      preceding_event_ids: ["event_1"],
      actors: ["Sandworm"],
      software: ["GOGETTER"]
    )
  ]
)
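A rough Python analogue of this structure, using stdlib dataclasses (field names mirror the slide; the actual CBPE schema is richer):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical dataclass mirror of the CST shown above; illustrative, not CBPE's schema.
@dataclass
class EventEntity:
    id: str
    temporal_descriptor: str
    actors: List[str]
    software: List[str]
    preceding_event_ids: List[str] = field(default_factory=list)

@dataclass
class ReportChunkCST:
    actors: List[str]
    timeline: List[EventEntity]

cst = ReportChunkCST(
    actors=["Sandworm"],
    timeline=[
        EventEntity("event_1", "June 2022", ["Sandworm"], ["Neo-REGEORG"]),
        EventEntity("event_2", "Roughly one month later", ["Sandworm"],
                    ["GOGETTER"], preceding_event_ids=["event_1"]),
    ],
)
```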

Pipeline: Two-Stage Validation

Validation

Stage 1:
Syntactic Validation

Is the CST structurally sound and typed correctly?

Stage 2:
Semantic Validation

Is the CST factually accurate compared to the source?

Stage 1: Syntactic Validation

Type Checking

Each field in the CST is checked for conformance to the schema's data types.

Example:

Schema requires: sequence_index: int

❌ Invalid CST:

{"sequence_index": "second"}

✓ Valid CST:

{"sequence_index": 2}

Error Detection

The system automatically provides specific feedback to the LLM for correction when malformed data is detected.

Common errors:

Missing required fields

Event missing "actors" field

Incorrect data types

Expected integer, got string

Invalid temporal formats

"June" instead of "2022-06-01"
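The checks listed above can be sketched as a small validator that emits corrective feedback strings (field names are illustrative, not CBPE's exact schema):

```python
# Sketch of syntactic validation: type/shape checks that produce human-readable
# error strings suitable as corrective feedback to the LLM. Field names are illustrative.
def validate_event(event: dict) -> list:
    errors = []
    if "actors" not in event:
        errors.append('Event missing "actors" field')
    idx = event.get("sequence_index")
    if idx is not None and not isinstance(idx, int):
        errors.append(f"sequence_index: expected integer, got {type(idx).__name__}")
    return errors

print(validate_event({"sequence_index": "second"}))
# Reports both the missing field and the type mismatch
```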

Stage 2: Semantic Validation

🔄

Reformulation

The LLM converts the validated CST back into natural language text.

Example: CST → "Sandworm deployed GOGETTER."

⚖️

Comparison

An evaluator LLM checks if the reformulated text is semantically equivalent to the original source.

Example: Compare "Sandworm deployed GOGETTER" with original text.

Why use an LLM to validate an LLM? Extraction is complex, while equivalence checking is a constrained comparison task where LLMs perform reliably.1

1 Y. Hayashi (2025). Evaluating LLMs' Capability to Identify Lexical Semantic Equivalence. COLING.

Validation in Action

"...Sandworm deployed GOGETTER..."

Let's see what the LLM extracts

β†’ Running Initial Extraction

Attempt 1: LLM Extraction

Generated CST

{
  "event_2": {
    "actors": [{"name": "GOGETTER"}],
    "software": [{"name": "Sandworm"}],
    "actions": [{"name": "deployed"}]
  }
}

Reformulation

"GOGETTER deployed Sandworm."

❌ Semantic Validation: FAILED

Reason: Original text states 'Sandworm deployed GOGETTER', but the CST claims 'GOGETTER deployed Sandworm'.

⟳ FEEDBACK SENT TO LLM

System generates corrective feedback automatically

Attempt 2: Corrected Extraction

Generated CST (Corrected)

{
  "event_2": {
    "actors": [{"name": "Sandworm"}],
    "software": [{"name": "GOGETTER"}],
    "actions": [{"name": "deployed"}]
  }
}

Reformulation

"Sandworm deployed GOGETTER."

✓ Semantic Validation: PASSED

The reformulated text is semantically equivalent to the source. The data is validated and ready for the Knowledge Graph.
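The extract/reformulate/judge loop shown here can be sketched as follows; `extract`, `reformulate`, and `judge` stand in for LLM calls and are illustrative names, not CBPE's actual interfaces:

```python
# Sketch of the semantic-validation retry loop. extract/reformulate/judge are
# stand-ins for LLM calls (illustrative names, not CBPE's actual API).
def validate_with_retries(source, extract, reformulate, judge, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        cst = extract(source, feedback)       # LLM extraction (with feedback, if any)
        restated = reformulate(cst)           # CST -> natural language
        ok, reason = judge(source, restated)  # evaluator LLM: semantically equivalent?
        if ok:
            return cst                        # validated, ready for the KG
        feedback = reason                     # retry with automatic corrective feedback
    return None                               # give up; flag for human review
```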

Pipeline: Postprocessing

Enrich the Knowledge Graph

Validated CSTs enrich the final KG. This step performs entity resolution, merging duplicates like "Sandworm" and "APT44".

Example: Add New Fact

Validated CST for GOGETTER
⬇
USES_SOFTWARE(Sandworm, GOGETTER)
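Entity resolution during this step can be sketched with a simple alias map (the APT44/Sandworm pair follows the slide; the alias source, e.g. ATT&CK group aliases, is an assumption):

```python
# Sketch of entity resolution in post-processing: canonicalize aliases before
# adding facts, so duplicates like "Sandworm"/"APT44" merge into one node.
ALIASES = {"APT44": "Sandworm"}  # illustrative alias pair from the slide

def canonical(name):
    return ALIASES.get(name, name)

def add_fact(kg, subj, rel, obj):
    kg.add((canonical(subj), rel, canonical(obj)))

kg = set()
add_fact(kg, "Sandworm", "USES_SOFTWARE", "GOGETTER")
add_fact(kg, "APT44", "USES_SOFTWARE", "GOGETTER")  # duplicate under an alias
assert kg == {("Sandworm", "USES_SOFTWARE", "GOGETTER")}
```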

Building Intelligence Over Time

MITRE ATT&CK
→
Report 1
→
Report 2
→
Unified KG
Initial State
(from MITRE ATT&CK)
Sandworm → USES_SOFTWARE → CaddyWiper
After Report 1
Sandworm → USES_SOFTWARE → CaddyWiper
Sandworm → USES_SOFTWARE → GOGETTER (NEW)
After Report 2
Sandworm → USES_SOFTWARE → CaddyWiper
Sandworm → USES_SOFTWARE → GOGETTER
GOGETTER → USES_LIBRARY → Yamux (NEW DETAIL)

Temporal Pattern Extraction

Narrative Flow

The order of events in a report is crucial for capturing the sequence of adversarial actions for predictive modeling.

Temporal Keywords

Keywords enhance the clarity of attack timelines, providing insight into campaign phases.

Example: "...observed in June 2022... Roughly one month later, Sandworm deployed GOGETTER..."

Source Text

"Sandworm was first observed in the victim's environment in June 2022, when the actor deployed the Neo-REGEORG webshell. Roughly one month later, Sandworm deployed GOGETTER, which proxies communications for its C2 server using Yamux over TLS."

→

Extracted Timeline

June 2022
Deploy Webshell
Neo-REGEORG
↓
July 2022
Deploy Tunneler
GOGETTER with Yamux
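Resolving the relative descriptor against the preceding event's date can be sketched as below; the month lookup and "one month later" rule are toy heuristics for this example, not CBPE's actual resolver:

```python
# Sketch: resolving temporal descriptors into concrete dates. The month lookup and
# the "one month later" rule are toy heuristics for the slide's example only.
from datetime import date
from typing import Optional

MONTHS = {"june": 6, "july": 7}  # toy lookup covering only this example

def resolve(descriptor: str, anchor: Optional[date] = None) -> Optional[date]:
    text = descriptor.lower()
    for name, m in MONTHS.items():
        if name in text:
            return date(2022, m, 1)          # year fixed to the example report
    if "month later" in text and anchor:
        year = anchor.year + (anchor.month == 12)
        return date(year, anchor.month % 12 + 1, 1)
    return None

t1 = resolve("June 2022")                    # 2022-06-01
t2 = resolve("Roughly one month later", t1)  # 2022-07-01
```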

Build "Attacker Playbooks"

By clustering attack timelines, CBPE identifies recurring TTP sequences, enabling defenders to anticipate an adversary's next move.

Key Insight: Detect patterns early to predict and preempt later stages before they unfold.

Example: Ransomware Campaign

1
Spearphishing
(T1566.001)
→
2
OS Credential Dumping
(T1003)
→
3
Lateral Movement
(T1021)
→
4
Data Encrypted
(T1486)
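Playbook-based prediction on this sequence can be sketched as a prefix match against the clustered TTP chain (the playbook content mirrors the slide; the matching rule is an illustrative simplification):

```python
# Sketch of playbook-based prediction: match observed techniques against a known
# clustered TTP sequence and anticipate the next step. Playbook mirrors the slide.
PLAYBOOK = ["T1566.001", "T1003", "T1021", "T1486"]  # spearphish -> dump creds -> move -> encrypt

def predict_next(observed, playbook=PLAYBOOK):
    """Return the next expected technique if the observation matches a playbook prefix."""
    n = len(observed)
    if n < len(playbook) and playbook[:n] == observed:
        return playbook[n]
    return None

print(predict_next(["T1566.001", "T1003"]))  # → T1021 (lateral movement expected next)
```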

Key Takeaways

1. Reliable CTI automation requires robust scaffolding around LLMs, not just scaling models alone.

2. CBPE's automated validation loop detects and corrects hallucinations without requiring pre-existing trusted knowledge graphs.

3. Temporal modeling enables dynamic attacker playbooks that support predictive defense strategies.

4. The validated KG serves as a dynamic repository for confirming known behaviors and assimilating novel threat intelligence.

Thank You

Contact Information

Shlok Gilda

University of Florida

shlokgilda@ufl.edu

GTA3 Workshop @ ICDM 2025