OpenClaw Multi-Language Memory Search: Japanese, Spanish, Portuguese Support

Vincent Koc adds Japanese, Spanish, and Portuguese query expansion to OpenClaw's full-text search, so memory recall filters out language-specific stop words instead of treating them as search terms.

About the Contributor

Vincent Koc has been a consistent contributor to OpenClaw's memory subsystem, focusing on making semantic search work reliably across diverse use cases. This internationalization work continues that theme — memory that works in English but fails in Japanese isn't really "memory."

Why This Matters

OpenClaw's memory system uses full-text search (FTS) to recall previous conversations, decisions, and context. When you ask your AI assistant "what did we decide about the project last week?", FTS finds relevant snippets to include in the prompt.

The problem: FTS relies on query expansion, which reduces a natural-language query to its meaningful keywords. In English, that means filtering out words such as "the", "a", and "is". Japanese, Spanish, and Portuguese have their own stop words, and these were being left in the search terms, which hurt recall.
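As a rough illustration of that filtering step, here is a minimal Python sketch. The `STOP_WORDS` table and the `extract_keywords` function are hypothetical names, and the word lists are tiny samples picked for this example, not the lists that ship in OpenClaw.

```python
import re

# Illustrative stop word samples only (assumption: NOT OpenClaw's real lists).
STOP_WORDS = {
    "en": {"the", "a", "is", "what", "did", "we", "about", "of"},
    "es": {"el", "la", "los", "las", "de", "que", "qué", "en", "y", "un", "una", "sobre"},
    "pt": {"o", "a", "os", "as", "de", "que", "em", "e", "um", "uma", "sobre"},
}


def extract_keywords(query: str, lang: str) -> list[str]:
    """Lowercase the query, split it into word tokens, and drop stop words.

    Hypothetical helper for illustration; unknown languages get no filtering.
    """
    tokens = re.findall(r"\w+", query.lower())  # \w matches accented letters too
    stop = STOP_WORDS.get(lang, set())
    return [t for t in tokens if t not in stop]


if __name__ == "__main__":
    print(extract_keywords("What did we decide about the project last week?", "en"))
    print(extract_keywords("¿Qué decidimos sobre el proyecto?", "es"))
```

With these sample lists, the Spanish query is reduced to `decidimos` and `proyecto`. Without a Spanish list, `qué`, `sobre`, and `el` would all be passed to the search as if they were keywords.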

Technical Implementation

Two related commits landed today, with a third change still pending review:

Commit / PR	Language	Change
`21cbf59`	Japanese	Add query expansion support for FTS (#23156)
`35b162a`	Spanish, Portuguese	Add stop words (#23710)
#23717	Arabic	FTS query expansion filtering (pending)

The implementation adds language-specific stop word lists that are filtered out during query expansion. Japanese is the harder case: the language does not put spaces between words, so splitting on whitespace cannot find token boundaries, and a different tokenization strategy is needed before stop words can be filtered.
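One simple way to tokenize without spaces is to split on script boundaries. The Python sketch below is an assumption for illustration and may not match what the Japanese commit actually does: it keeps runs of kanji, katakana, and Latin letters or digits as keywords, and treats hiragana runs (mostly particles and inflections) as separators. The names `JA_KEYWORD_RUNS` and `extract_japanese_keywords` are hypothetical.

```python
import re

# Candidate keywords are runs of a single script. Hiragana is deliberately
# left out of the pattern, so particles such as "の" act as separators.
# This is a heuristic for illustration, not a real morphological analyzer.
JA_KEYWORD_RUNS = re.compile(
    r"[\u4e00-\u9fff]+"          # kanji (CJK Unified Ideographs)
    r"|[\u30a1-\u30fa\u30fc]+"   # katakana letters plus the long-vowel mark
    r"|[a-z0-9]+"                # Latin letters and digits (after lowercasing)
)


def extract_japanese_keywords(query: str) -> list[str]:
    """Return kanji, katakana, and Latin runs in order of appearance."""
    return JA_KEYWORD_RUNS.findall(query.lower())


if __name__ == "__main__":
    # "What we decided about last week's project"
    print(extract_japanese_keywords("先週のプロジェクトについて決めたこと"))
```

For the sample query this yields `先週`, `プロジェクト`, and `決`. The last item shows the limit of the heuristic: the verb stem is cut off from its hiragana ending, which a dictionary-based tokenizer would handle better.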

The Bigger Picture: Global AI Assistants

This work reflects a broader pattern: as AI assistants move from English-first demos to global production use, every subsystem needs internationalization. Memory search failing in Japanese means Japanese users get a degraded experience — their assistant forgets things.

OpenClaw's memory architecture uses a combination of:

Full-text search (FTS) for fast keyword matching
Semantic similarity for meaning-based recall
Recency weighting for temporal relevance

All three need to work across languages. FTS is getting fixed now. Semantic similarity already handles multiple languages via embedding models. Recency is language-agnostic.
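To make the combination concrete, here is a hedged Python sketch of one common way to blend three such signals: a weighted sum with exponential recency decay. The article does not say how OpenClaw combines them, so the weights, the seven-day half-life, and the names `Candidate`, `recency_weight`, and `combined_score` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A memory snippet with per-signal scores (hypothetical structure)."""
    text: str
    fts_score: float       # keyword match score, assumed normalized to [0, 1]
    semantic_score: float  # embedding similarity, assumed normalized to [0, 1]
    age_days: float        # time since the snippet was stored


def recency_weight(age_days: float, half_life_days: float = 7.0) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(c: Candidate, w_fts: float = 0.4,
                   w_sem: float = 0.4, w_rec: float = 0.2) -> float:
    """Weighted sum of the three signals; the weights are illustrative."""
    return (w_fts * c.fts_score
            + w_sem * c.semantic_score
            + w_rec * recency_weight(c.age_days))


if __name__ == "__main__":
    snippets = [
        Candidate("old but relevant", 0.9, 0.8, 30.0),
        Candidate("recent and relevant", 0.7, 0.7, 1.0),
    ]
    for s in sorted(snippets, key=combined_score, reverse=True):
        print(f"{combined_score(s):.3f}  {s.text}")
```

Only `fts_score` depends on stop word handling in this sketch, which is why fixing query expansion per language improves the blended ranking without touching the other two terms.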

💡 Implications

Japanese, Spanish, and Portuguese users should get better memory recall, because stop words no longer dilute their FTS queries
Pattern established for adding more languages (Arabic already pending)
Signals OpenClaw's commitment to global usability, not just English-first
Each language addition is incremental — community can contribute stop word lists

What's Next

Arabic support is already in review (#23717), and the pattern is now established for other languages. The challenge varies by language family:

CJK languages (Chinese, Japanese, Korean): Need special tokenization
Romance languages (Spanish, Portuguese, French, Italian): Similar stop word patterns
RTL languages (Arabic, Hebrew): Both stop words and display considerations

Expect more language additions as the community expands. The infrastructure is now in place.
