Mem0 · Data Science Group, IIT Roorkee · IIT Roorkee
ICML 2026

Before It Persists:
Write-Time Defense for Multimodal Agent Memory

Agam Pandey¹,²

¹Indian Institute of Technology Roorkee    ²Mem0

ICML 2026 · SCALE Workshop

SAGE-Mem architecture: write-time admission, belief promotion, and provenance-aware retrieval
SAGE-Mem is a governed memory layer between heterogeneous observation sources and a downstream planner. It controls what gets written (write-time admission gate), what is promoted from evidence into durable belief (sufficiency gate), and what retrieved memory may support reasoning (provenance-aware retrieval).
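
To make the control points concrete, here is a minimal sketch of a governed memory with a write-time admission gate and a retrieval step that only surfaces promoted beliefs. It is an illustration under stated assumptions, not the SAGE-Mem implementation: the `Record` and `GovernedMemory` names, the scalar `trust` score, the 0.5 threshold, and the substring match are all hypothetical, and the sufficiency gate is reduced to a manually set flag.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    source: str              # provenance label, e.g. "browser", "ocr", "vision_caption"
    trust: float             # hypothetical trust score in [0, 1]
    is_belief: bool = False  # evidence by default; True only after promotion


class GovernedMemory:
    def __init__(self, admit_threshold=0.5):
        self.admit_threshold = admit_threshold  # assumed value, not from the paper
        self.records = []

    def write(self, record):
        """Write-time admission gate: low-trust observations never persist."""
        if record.trust < self.admit_threshold:
            return False
        self.records.append(record)
        return True

    def retrieve(self, query):
        """Provenance-aware retrieval: only promoted beliefs may support reasoning."""
        q = query.lower()
        return [(r.text, r.source) for r in self.records
                if r.is_belief and q in r.text.lower()]


memory = GovernedMemory()
ok = Record("The store closes at 9pm", source="browser", trust=0.9)
bad = Record("The store closes at 5pm", source="ocr", trust=0.1)

admitted = [memory.write(ok), memory.write(bad)]  # the low-trust record is rejected
before = memory.retrieve("store closes")          # admitted evidence is not yet usable
ok.is_belief = True                               # stand-in for the sufficiency gate
after = memory.retrieve("store closes")           # now returned with its provenance
```

The point of the sketch is the ordering: rejected content never reaches storage, and admitted content stays invisible to the planner until it is promoted.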

Abstract

Persistent memory makes multimodal agents more capable, but it also creates a new attack surface: once unsupported content is written into memory, later retrieval and consolidation can reuse it as if it were reliable state. We introduce SAGE-Mem, a write-time memory layer that separates transient evidence from durable belief — observations may enter evidence, but they are promoted to belief only when sufficiently supported, independent, and non-conflicting.
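
The promotion rule can be sketched as a filter over accumulated evidence. The code below is a simplified illustration, assuming evidence arrives as `(key, value, origin)` tuples. The `promote` function, the `min_support` parameter, and the choice to treat "distinct origins" as the independence test are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


def promote(evidence, min_support=2):
    """Return {key: value} for claims that pass all three checks.

    evidence: iterable of (key, value, origin) tuples.
    """
    # key -> value -> set of distinct origins that reported it
    origins = defaultdict(lambda: defaultdict(set))
    for key, value, origin in evidence:
        origins[key][value].add(origin)

    beliefs = {}
    for key, by_value in origins.items():
        if len(by_value) > 1:
            continue  # non-conflicting check failed: competing values stay in evidence
        value, srcs = next(iter(by_value.items()))
        # Counting distinct origins covers both support and independence:
        # the same origin repeated many times still counts once.
        if len(srcs) >= min_support:
            beliefs[key] = value
    return beliefs


evidence = [
    ("user.city", "Pune", "turn_3"),
    ("user.city", "Pune", "turn_17"),          # two independent origins agree
    ("user.pet", "cat", "turn_5"),
    ("user.pet", "cat", "turn_5"),             # same origin repeated: not independent
    ("user.job", "nurse", "turn_2"),
    ("user.job", "nurse", "turn_9"),
    ("user.job", "pilot", "image_caption_4"),  # conflicting value
]
beliefs = promote(evidence)
```

Only `user.city` is promoted here. The repeated pet claim lacks independent support, and the job claim is held back by the conflicting caption, so a single injected observation can neither create a belief nor overwrite one.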

On the long-horizon LoCoMo-Adv benchmark, SAGE-Mem reduces adversarial write admission (Write ASR) from 1.000 for the retrieval-time baseline to 0.004, and retrieval contamination (Retrieval ASR) from 0.158 to a 95% rule-of-three upper bound of ≤ 0.002. On the broader five-attack MM-BrowseComp-Adv suite, BrowseGuard-Extended reduces Write ASR from 0.255 to 0.037 and Retrieval ASR from 0.564 to 0.369, both relative to SAGE-Mem.

For persistent-memory agents, robustness should be evaluated not only at retrieval, but also at the point where observations become persistent state.


Key Contributions


Adversarial Benchmarks

Construction pipeline for the LoCoMo-Adv and MM-BrowseComp-Adv adversarial benchmarks
We extend two established benchmarks with adversarial memory-poisoning attacks. LoCoMo-Adv injects unsupported content across long multimodal conversations to test whether it hardens into persistent belief; MM-BrowseComp-Adv adds a five-attack suite over multimodal browsing traces (browser, OCR, and vision-caption channels). Both are built from frozen, reproducible pipelines checked into the repository.

Headline Results

All numbers are 3-seed means from frozen analysis artifacts. ≤ x denotes a 95% rule-of-three upper bound for 0 observed events at the evaluation budget — we do not claim exact zeros.
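
The reporting convention above can be sketched as follows. The rule of three says that when 0 events are seen in n trials, an approximate 95% upper bound on the event rate is 3/n. It follows from solving (1 - p)^n = 0.05, which gives p ≈ -ln(0.05)/n ≈ 3/n. The function names, the pooling of trials across seeds, and the trial counts below are assumptions chosen for illustration. The actual evaluation budget is not stated here.

```python
from statistics import mean


def rule_of_three_upper(n_trials):
    """Approximate 95% upper bound on a rate when 0 events occur in n_trials."""
    return 3.0 / n_trials


def exact_zero_event_upper(n_trials, confidence=0.95):
    """Exact bound: the p that solves (1 - p)**n_trials == 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


def report_rate(successes_per_seed, trials_per_seed):
    """Mean rate across seeds, or a '≤' bound if no events were observed at all."""
    if sum(successes_per_seed) == 0:
        # Assumption: trials are pooled across seeds for the bound.
        return f"≤ {rule_of_three_upper(sum(trials_per_seed)):.3f}"
    rates = [s / n for s, n in zip(successes_per_seed, trials_per_seed)]
    return f"{mean(rates):.3f}"


# Hypothetical budget: 3 seeds with 500 attack trials each.
zero_events = report_rate([0, 0, 0], [500, 500, 500])
some_events = report_rate([2, 2, 2], [500, 500, 500])
approx = rule_of_three_upper(1500)
exact = exact_zero_event_upper(1500)
```

With this hypothetical budget of 1500 pooled trials, zero observed events would be reported as `≤ 0.002`, and the approximation sits just above the exact bound.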

LoCoMo-Adv: long-horizon multimodal memory poisoning

| Method | BCU ↑ | Write ASR ↓ | Retrieval ASR ↓ |
| --- | --- | --- | --- |
| MMA (retrieval-time baseline) | 0.655 | 1.000 | 0.158 |
| RSum | 0.410 | 0.004 | 0.008 |
| SAGE-Mem (ours) | 0.418 | 0.004 | ≤ 0.002 |

MM-BrowseComp-Adv: multimodal browsing, five-attack suite

| Method | BCU ↑ | Write ASR ↓ | Retrieval ASR ↓ |
| --- | --- | --- | --- |
| MMA (baseline) | 0.000 | 1.000 | 1.000 |
| SAGE-Mem | 0.198 | 0.255 | 0.564 |
| BrowseGuard-Extended (ours) | 0.095 | 0.037 | 0.369 |

MM-BrowseComp-Adv: narrower 2-attack answer-overwrite specialization

| Method | BCU ↑ | Write ASR ↓ | Retrieval ASR ↓ |
| --- | --- | --- | --- |
| MMA | 0.000 | 1.000 | 1.000 |
| SAGE-Mem | 0.175 | 0.645 | 0.552 |
| SAGE-Mem-Browse (provenance prior only) | 0.021 | 0.333 | 0.830 |
| BrowseGuard (ours) | 0.153 | ≤ 0.003 | ≤ 0.002 |

Takeaway: the write boundary, not retrieval-time filtering alone, is what keeps contamination from hardening into persistent state. Reported Retrieval ASR reflects the combined SAGE-Mem stack (write-time admission + belief promotion + provenance-aware retrieval).


BibTeX

@inproceedings{pandey2026sagemem,
  title     = {Before It Persists: Write-Time Defense for Multimodal Agent Memory},
  author    = {Pandey, Agam},
  booktitle = {ICML 2026 Workshop on SCALE},
  year      = {2026},
  url       = {https://github.com/AGAMPANDEYY/sage-mem}
}