Before It Persists: Write-Time Defense for Multimodal Agent Memory

SAGE-Mem architecture: write-time admission, belief promotion, and provenance-aware retrieval — **SAGE-Mem** is a governed memory layer between heterogeneous observation sources and a downstream planner. It controls *what gets written* (write-time admission gate), *what is promoted* from evidence into durable belief (sufficiency gate), and *what retrieved memory may support reasoning* (provenance-aware retrieval).

Abstract

Persistent memory makes multimodal agents more capable, but it also creates a new attack surface: once unsupported content is written into memory, later retrieval and consolidation can reuse it as if it were reliable state. We introduce SAGE-Mem, a write-time memory layer that separates transient evidence from durable belief — observations may enter evidence, but they are promoted to belief only when sufficiently supported, independent, and non-conflicting.

On the long-horizon LoCoMo-Adv benchmark, SAGE-Mem reduces write admission from 1.000 (retrieval-time baseline) to 0.004 and retrieval contamination from 0.158 to a 95% rule-of-three upper bound of ≤ 0.002. On the broader five-attack MM-BrowseComp-Adv suite, BrowseGuard-Extended reduces Write ASR from 0.255 to 0.037 and Retrieval ASR from 0.564 to 0.369.

For persistent-memory agents, robustness should be evaluated not only at retrieval, but also at the point where observations become persistent state.

Key Contributions

Write-time admission gate. A semantic guard classifies every incoming observation (DATA / DIRECTIVE / METADATA) and blocks unsupported or injected content at the memory boundary — before it can ever be stored.
Evidence → belief promotion. Observations enter as transient evidence and are promoted to durable belief only when sufficiently supported, independent, and non-conflicting — with dependent multimodal signals (e.g. OCR + caption of the same image) not counted as independent corroboration.
Provenance-aware retrieval. A channel-trust floor, partition multiplier, and conflict gate decide which retrieved memory is allowed to support downstream reasoning.
Two adversarial benchmarks. LoCoMo-Adv (long-horizon multimodal memory poisoning) and MM-BrowseComp-Adv (five-attack multimodal browsing suite) — released to evaluate robustness at the write boundary, not only at retrieval.

Adversarial Benchmarks

Headline Results

All numbers are 3-seed means from frozen analysis artifacts. ≤ x denotes a 95% rule-of-three upper bound for 0 observed events at the evaluation budget — we do not claim exact zeros.

LoCoMo-Adv — long-horizon multimodal memory poisoning

Method	BCU ↑	Write ASR ↓	Retrieval ASR ↓
MMA (retrieval-time baseline)	0.655	1.000	0.158
RSum	0.410	0.004	0.008
SAGE-Mem (ours)	0.418	0.004	≤ 0.002

MM-BrowseComp-Adv — multimodal browsing, five-attack suite

Method	BCU ↑	Write ASR ↓	Retrieval ASR ↓
MMA (baseline)	0.000	1.000	1.000
SAGE-Mem	0.198	0.255	0.564
BrowseGuard-Extended (ours)	0.095	0.037	0.369

MM-BrowseComp-Adv — narrower 2-attack answer-overwrite specialization

Method	BCU ↑	Write ASR ↓	Retrieval ASR ↓
MMA	0.000	1.000	1.000
SAGE-Mem	0.175	0.645	0.552
SAGE-Mem-Browse (provenance prior only)	0.021	0.333	0.830
BrowseGuard (ours)	0.153	≤ 0.003	≤ 0.002

Takeaway: the write boundary, not retrieval-time filtering alone, is what keeps contamination from hardening into persistent state. Reported Retrieval ASR reflects the combined SAGE-Mem stack (write-time admission + belief promotion + provenance-aware retrieval).

BibTeX

@inproceedings{pandey2026sagemem,
  title     = {Before It Persists: Write-Time Defense for Multimodal Agent Memory},
  author    = {Pandey, Agam},
  booktitle = {ICML 2026 Workshop on SCALE},
  year      = {2026},
  url       = {https://github.com/AGAMPANDEYY/sage-mem}
}