OpenClaw: Stripping Leaked Model Control Tokens

OpenClaw · Security · Multi-Provider · March 10, 2026 · George Zhang

Source: Commit 309162f

The Change

George Zhang's commit 309162f strips leaked model control tokens from user-facing text. Models such as GLM-5 and DeepSeek occasionally emit internal delimiter tokens in their responses, tokens that users should never see. The fix adds generic pattern matching to OpenClaw's text extraction pipeline so these artifacts are removed before display.

Author Context

George Zhang (Tengji Zhang) is a new addition to the OpenClaw maintainer roster, added in an accompanying PR (#42190) the same day. His contributions focus on internationalization and multi-provider compatibility, both essential as OpenClaw expands beyond its original Anthropic-focused roots to support a growing list of model providers.

Why This Matters

Control tokens are the hidden scaffolding that models use to structure their outputs: special markers that delimit roles and signal where a turn or sequence ends. Think of them as HTML tags for model internals: <|assistant|>, <|endoftext|>, or provider-specific delimiters. They are supposed to be consumed and stripped before the response reaches users.

When models leak these tokens, users see confusing artifacts:

"Here's your summary <|system|> of the document..."

Stray tokens like this confuse users and erode trust in the output. The fix strips them defensively, regardless of which model leaked them.
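To make the failure mode concrete, here is a minimal sketch of the kind of stripping involved. The function name `stripControlTokens` and the regular expression are illustrative assumptions, not code taken from commit 309162f; the sketch only handles the ASCII `<|name|>` form shown above.

```typescript
// Hypothetical helper for illustration; not the code from commit 309162f.
// Matches tokens such as <|system|> or <|endoftext|>, plus one preceding
// space so that removing a mid-sentence token does not leave a double space.
const CONTROL_TOKEN = / ?<\|[A-Za-z0-9_]+\|>/g;

function stripControlTokens(text: string): string {
  return text.replace(CONTROL_TOKEN, "");
}

// Usage: the leaked example from above comes out clean.
const leaked = "Here's your summary <|system|> of the document...";
console.log(stripControlTokens(leaked));
// prints "Here's your summary of the document..."
```

The pattern deliberately requires the pipe to follow the angle bracket immediately, so ordinary text such as `a <| b |> c` is left untouched.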

Multi-Provider Reality

OpenClaw supports dozens of model providers, and each uses its own control token format. A fix that handles only one provider's tokens misses the point: the pattern matching must be generic enough to catch artifacts from GLM, DeepSeek, Minimax, and others.
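One way to stay generic is to match the delimiter shape rather than a list of known token names. The sketch below is an assumption about how such a pattern could look, not the pattern the commit uses. It accepts both the ASCII pipe and the full-width vertical bar (U+FF5C), which DeepSeek's tokenizer uses in special tokens such as its end-of-sentence marker. The name `stripLeakedDelimiters` is hypothetical.

```typescript
// Hypothetical generic matcher; the real pattern in OpenClaw may differ.
// Shape: "<" + pipe + 1..64 non-space, non-delimiter chars + pipe + ">",
// where the pipe is either ASCII "|" or full-width "\uFF5C".
const GENERIC_CONTROL_TOKEN = /<[|\uFF5C][^<>|\uFF5C\s]{1,64}[|\uFF5C]>/g;

function stripLeakedDelimiters(text: string): string {
  return text.replace(GENERIC_CONTROL_TOKEN, "");
}

// ASCII-style token, as in the <|endoftext|> example.
const asciiCase = stripLeakedDelimiters("Hello<|endoftext|>");

// Full-width style token, written with escapes to avoid encoding issues.
const fullWidthCase = stripLeakedDelimiters(
  "Done.<\uFF5Cend\u2581of\u2581sentence\uFF5C>",
);

console.log(asciiCase, fullWidthCase);
```

Matching by shape means a new provider's tokens are caught without a code change, at the cost of also removing any legitimate text that happens to use the same delimiters. The length cap and the no-whitespace rule are there to keep that risk small.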

Technical Approach

The commit description notes this follows "the same architecture as stripMinimaxToolCallXml" — an existing pattern in OpenClaw's codebase for sanitizing provider-specific artifacts. The approach:

Pipeline placement: Stripping happens in the text extraction phase, after the model responds but before the text is displayed to the user
Generic patterns: Rather than provider-specific rules, it uses pattern matching that catches common control token formats
Defensive design: Even if a model rarely leaks tokens, the protection is always active
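The three points above can be sketched as a small sanitizer chain applied during text extraction. Everything here is a hypothetical illustration: the `ContentBlock` shape, `extractDisplayText`, and the sanitizer list are assumptions, and the provider-specific step is a no-op placeholder standing in for something like stripMinimaxToolCallXml, whose real implementation is not reproduced.

```typescript
// A sanitizer is any pure string-to-string cleanup step.
type Sanitizer = (text: string) => string;

// Hypothetical response block shape; OpenClaw's real types may differ.
interface ContentBlock {
  type: "text" | "tool_use";
  text?: string;
}

// Generic control-token stripping (illustrative pattern only).
const stripControlTokens: Sanitizer = (t) =>
  t.replace(/<\|[^<>|\s]{1,64}\|>/g, "");

// Placeholder for a provider-specific sanitizer; intentionally a no-op here.
const stripProviderArtifacts: Sanitizer = (t) => t;

// Always-on chain: every response passes through, whatever the provider.
const SANITIZERS: readonly Sanitizer[] = [
  stripProviderArtifacts,
  stripControlTokens,
];

function extractDisplayText(blocks: ContentBlock[]): string {
  // Keep only text blocks, then join them into one string.
  const raw = blocks
    .filter((b) => b.type === "text" && typeof b.text === "string")
    .map((b) => b.text as string)
    .join("");
  // Run each sanitizer in order before the text reaches the UI.
  return SANITIZERS.reduce((text, sanitize) => sanitize(text), raw);
}
```

Putting the chain inside extraction, rather than at each display surface, means every downstream consumer receives already-clean text.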

The Bigger Picture

This commit closes issue #40020 and supersedes an earlier attempt (#40573). The issue was open for months, suggesting this is a recurring pain point that required careful design to solve properly.

OpenClaw's multi-provider support means encountering edge cases that single-provider tools never see. GLM-5 and DeepSeek take different tokenization approaches from Anthropic or OpenAI models, and those quirks only surface when you're actually routing traffic through them.

Related Activity Today

George Zhang's maintainer addition and this security fix are part of a busy day for OpenClaw:

Sessions spawn resume: ACP session resume support via resumeSessionId
Provider cooldown fix: Avoid duplicate probes during fallback runs
Mattermost improvements: Markdown formatting preservation, DM media uploads
MS Teams fix: Bot Framework compatibility for General channel conversations

The pattern: incremental robustness improvements across OpenClaw's expanding provider and platform matrix.

Implications

For OpenClaw users:

GLM-5/DeepSeek users: Should see cleaner outputs immediately after updating
Multi-provider setups: Reduced risk of control token artifacts regardless of model
UX improvement: One less source of confusing output in conversations