Building bilingual AI agents for the Gulf: Arabic NLP gotchas

Six weeks into a brokerage agent build, a native-speaker tester sent us one line of feedback that reshaped the project: 'It answers like a news anchor.' Grammatically perfect, dialectally alien. Buyers were writing Khaleeji and Egyptian; our test set was Modern Standard Arabic.

The common belief

'The model supports Arabic' — true, and dangerously incomplete. Benchmarks skew MSA; real customers text in dialect, switch to English mid-sentence, transliterate into Arabizi (3 for ع, 7 for ح), and expect the register of a helpful local, not a broadcast.

If your Arabic test set is MSA, you've tested a language your customers don't text in.

What actually bites

Dialect register: Khaleeji, Egyptian, and Levantine differ enough that a single 'Arabic prompt' can read as formal in one dialect and odd in another. We maintain dialect-specific few-shot examples and test each with native speakers before launch.

Code-switching and Arabizi: real threads mix scripts and languages mid-sentence. The intake layer must detect and follow the customer's mix rather than forcing a lane.
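
A minimal sketch of the detection half of that intake step, assuming whitespace tokenization and a deliberately crude Arabizi heuristic (a Latin-letter token that contains 3 or 7). The names `classify_token` and `profile_message` are illustrative and not taken from any library. Digit-free Arabizi would be counted as Latin here, and a token like "mp3" would be misread as Arabizi, so treat this as a starting point rather than a detector.

```python
# Digits used as letters in Arabizi; the article names 3 (ع) and 7 (ح).
# Real traffic uses more; this set is an assumption for the sketch.
ARABIZI_DIGITS = set("37")


def is_arabic_char(ch: str) -> bool:
    """True for characters in the main Arabic Unicode block (U+0600 to U+06FF)."""
    return "\u0600" <= ch <= "\u06FF"


def classify_token(token: str) -> str:
    """Label one whitespace-delimited token by script."""
    has_arabic = any(is_arabic_char(c) for c in token)
    has_latin = any(c.isascii() and c.isalpha() for c in token)
    has_arabizi_digit = any(c in ARABIZI_DIGITS for c in token)
    if has_arabic:
        return "arabic"
    if has_latin and has_arabizi_digit:
        return "arabizi"
    if has_latin:
        return "latin"
    return "other"


def profile_message(text: str) -> dict:
    """Count tokens per label so the agent can mirror the customer's mix."""
    counts = {"arabic": 0, "arabizi": 0, "latin": 0, "other": 0}
    for token in text.split():
        counts[classify_token(token)] += 1
    return counts
```

For the message `mar7aba, I want a villa في دبي`, `profile_message` returns 2 Arabic tokens, 1 Arabizi token, and 4 Latin tokens. A reply policy can then follow that mix instead of forcing a single language.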

RTL is product work, not CSS work: bidirectional text with embedded English brand names, numerals, and URLs breaks naive rendering. Budget design time, not a stylesheet pass.
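
One small piece of that work can be sketched in code: wrapping each embedded value in Unicode directional isolates so a brand name or price does not reorder the Arabic text around it. This assumes the output is plain text (in HTML, the `<bdi>` element or `dir="auto"` does a similar job). The helper names and the "Acme Homes" brand are hypothetical, and the sketch does not replace design review of real layouts.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the fragment itself
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the isolate


def isolate(fragment: str) -> str:
    """Wrap a fragment so its direction cannot leak into surrounding text."""
    return f"{FSI}{fragment}{PDI}"


def render_listing(template: str, **fields: str) -> str:
    """Fill a template, isolating every inserted value."""
    return template.format(**{key: isolate(value) for key, value in fields.items()})


message = render_listing(
    "السعر {price} من {brand}",
    price="AED 1,200,000",
    brand="Acme Homes",  # hypothetical brand name
)
```

The isolates are invisible characters, so the string looks unchanged in a log, but a bidi-aware renderer treats each value as a self-contained run.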

Evaluation needs humans: automated metrics miss register entirely. Our launch gate includes native-speaker review across the dialects the client's traffic actually shows — we re-tested two prompts post-launch on one project because they read 'too formal' to Khaleeji speakers. Small thing; mattered.
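
A launch gate like this can be expressed as a small check over reviewer verdicts. The sketch below assumes each native-speaker review is recorded as a pass/fail on register per prompt; the `Review` shape, the minimum review count, and the 0.9 pass-rate threshold are all illustrative assumptions, not figures from the project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    dialect: str       # e.g. "khaleeji", as labeled by the reviewer
    prompt_id: str
    register_ok: bool  # did the reply sound like a helpful local?


def launch_gate(reviews, traffic_dialects, min_reviews=2, min_pass_rate=0.9):
    """Return a dict of blocking dialects; an empty dict means the gate passes."""
    by_dialect = defaultdict(list)
    for review in reviews:
        by_dialect[review.dialect].append(review.register_ok)

    blockers = {}
    for dialect in traffic_dialects:
        verdicts = by_dialect.get(dialect, [])
        if len(verdicts) < min_reviews:
            blockers[dialect] = "not enough native-speaker reviews"
        elif sum(verdicts) / len(verdicts) < min_pass_rate:
            blockers[dialect] = "register pass rate below threshold"
    return blockers
```

The gate iterates over the dialects seen in the client's traffic, not over the dialects that happen to have reviews, so a dialect nobody reviewed blocks launch rather than passing silently.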

When MSA-first is fine

Government, legal, and formal-document contexts expect MSA — there, dialect tuning is wasted effort. Match register to channel: WhatsApp is dialect country; an official portal isn't.
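
Matching register to channel can be as simple as a lookup in front of the prompt builder. This is a sketch under assumptions: the channel names are hypothetical, and falling back to MSA when no dialect is detected is a choice made for the example, not a rule from the projects described here.

```python
from typing import Optional

# Channels where MSA is the expected register (hypothetical identifiers).
FORMAL_CHANNELS = {"official_portal", "legal_document"}


def choose_register(channel: str, detected_dialect: Optional[str]) -> str:
    """Pick which register's prompt and few-shot set to load."""
    if channel in FORMAL_CHANNELS:
        return "msa"
    # Conversational channels follow the customer's dialect when known.
    return detected_dialect or "msa"
```

With this in place, `choose_register("whatsapp", "khaleeji")` selects the Khaleeji examples, while any message arriving through `"official_portal"` stays in MSA regardless of how the customer writes.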

The takeaway

Build dialect test sets from real traffic, not benchmarks.
Handle Arabizi and code-switching in the intake layer.
Treat RTL as design scope.
Gate launch on native-speaker review.
