Apptology
AI · 7 min

Building bilingual AI agents for the Gulf — Arabic NLP gotchas.

Your customers don't write Modern Standard Arabic. Test for the Arabic they actually type.

Apptology · 26 February 2026

Six weeks into a brokerage agent build, a native-speaker tester sent us one line of feedback that reshaped the project: 'It answers like a news anchor.' Grammatically perfect, dialectally alien. Buyers were writing Khaleeji and Egyptian; our test set was Modern Standard Arabic.

The common belief

'The model supports Arabic' is true, and dangerously incomplete. Benchmarks skew toward MSA; real customers text in dialect, switch to English mid-sentence, transliterate into Arabizi (Arabic typed in Latin letters and digits, such as 3 for ع and 7 for ح), and expect the register of a helpful local, not a broadcast.

If your Arabic test set is MSA, you've tested a language your customers don't text in.

What actually bites

Dialect register: Khaleeji, Egyptian, and Levantine differ enough that a single 'Arabic prompt' can read as formal in one dialect and odd in another. We maintain dialect-specific few-shot examples and test each set with native speakers before launch.
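One way to keep dialect-specific few-shot examples honest is to store them with a review flag and only ever serve the reviewed ones. The sketch below is a minimal illustration, not our production code: the `FewShot` record, the `select_examples` helper, the dialect keys, and the placeholder strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FewShot:
    """One example exchange for a prompt (hypothetical structure)."""
    dialect: str    # e.g. "khaleeji", "egyptian", "levantine"
    user: str       # customer message in that dialect
    agent: str      # reply in the matching register
    reviewed: bool  # True once a native speaker has approved the pair


def select_examples(pool, dialect, limit=3):
    """Return up to `limit` reviewed examples for the requested dialect."""
    approved = [ex for ex in pool if ex.dialect == dialect and ex.reviewed]
    return approved[:limit]


# Placeholder content only; real examples should come from reviewed traffic.
POOL = [
    FewShot("khaleeji", "<Khaleeji customer message>", "<Khaleeji reply>", True),
    FewShot("khaleeji", "<unreviewed draft>", "<unreviewed draft>", False),
    FewShot("egyptian", "<Egyptian customer message>", "<Egyptian reply>", True),
]
```

Filtering on the review flag at selection time means an unreviewed draft can sit in the pool without ever reaching a prompt.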

Code-switching and Arabizi: real threads mix scripts and languages mid-sentence. The intake layer must detect and follow the customer's mix rather than forcing every message into a single language or script.
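As a rough illustration of what "detect the mix" can mean at intake, the sketch below profiles a message by script and flags likely Arabizi tokens. It is a heuristic under stated assumptions: `profile_message` is a hypothetical helper, only the basic Arabic Unicode block is checked, and only the digits 3 and 7 mentioned above are treated as Arabizi markers. Other digit conventions exist and would need adding, and a token like a product code can trigger a false positive.

```python
import re

ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")  # basic Arabic block only
LATIN_CHAR = re.compile(r"[A-Za-z]")
ARABIZI_DIGIT = re.compile(r"[37]")           # 3 for ع, 7 for ح
PUNCT = ".,!?;:\u061F\u060C"                  # includes Arabic ؟ and ،


def profile_message(text):
    """Count tokens per script and list Latin tokens that look like Arabizi."""
    tokens = [t.strip(PUNCT) for t in text.split()]
    tokens = [t for t in tokens if t]
    arabic = [t for t in tokens if ARABIC_CHAR.search(t)]
    latin = [t for t in tokens
             if LATIN_CHAR.search(t) and not ARABIC_CHAR.search(t)]
    # Heuristic: a Latin-script token containing 3 or 7 is an Arabizi candidate.
    arabizi = [t for t in latin if ARABIZI_DIGIT.search(t)]
    return {
        "arabic_script": len(arabic),
        "latin_script": len(latin),
        "arabizi_candidates": arabizi,
        "mixed": bool(arabic) and bool(latin),
    }
```

A profile like this is only a routing signal. Whether the agent replies in Arabic script, Arabizi, or English is still a product decision that should follow what the customer wrote.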

RTL is product work, not CSS work: bidirectional text with embedded English brand names, numerals, and URLs breaks naive rendering. Budget design time, not a stylesheet pass.
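For plain-text channels, one common mitigation is to wrap embedded left-to-right runs (brand names, prices, URLs) in Unicode directional isolates so they cannot reorder the surrounding Arabic. The sketch below assumes ASCII letters and digits mark an LTR run; `isolate_ltr_runs` and its pattern are illustrative rather than exhaustive, and Arabic-Indic digits are left untouched. In HTML, markup such as `<bdi>` or the `dir` attribute is usually the better tool.

```python
import re

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# A run starts with an ASCII letter or digit, may continue through spaces and
# common URL/price punctuation, and must end on a letter, digit, or slash so
# trailing spaces and commas stay outside the isolate.
LTR_RUN = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9 .,/:@_+%#&=?-]*[A-Za-z0-9/])?"
)


def isolate_ltr_runs(text):
    """Wrap each embedded Latin/digit run in LRI ... PDI."""
    return LTR_RUN.sub(lambda m: LRI + m.group(0) + PDI, text)
```

This only addresses the ordering of characters. Alignment, punctuation mirroring, and truncation of mixed-direction strings still need design attention, which is the point about scope.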

Evaluation needs humans: automated metrics miss register entirely. Our launch gate includes native-speaker review across the dialects the client's traffic actually shows. On one project we re-tested two prompts post-launch because they read 'too formal' to Khaleeji speakers. Small thing; mattered.
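A launch gate of this kind can be expressed as a small check: every dialect that shows up meaningfully in traffic needs a native-speaker sign-off. The sketch below is hypothetical throughout; the `launch_gate` name, the dialect keys, and especially the 5% `min_share` cutoff are assumptions for illustration, not figures from our projects.

```python
def launch_gate(traffic_share, signed_off, min_share=0.05):
    """Return (passed, missing) for native-speaker review coverage.

    traffic_share: dict mapping dialect -> fraction of observed traffic
    signed_off:    set of dialects a native speaker has reviewed and approved
    min_share:     assumed cutoff below which a dialect does not block launch
    """
    required = {d for d, share in traffic_share.items() if share >= min_share}
    missing = sorted(required - set(signed_off))
    return (not missing, missing)


# Illustrative numbers only.
traffic = {"khaleeji": 0.60, "egyptian": 0.30, "levantine": 0.02}
passed, missing = launch_gate(traffic, {"khaleeji"})
```

Driving the required set from measured traffic rather than a fixed list keeps the review effort pointed at the dialects customers actually use.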

When MSA-first is fine

Government, legal, and formal-document contexts expect MSA — there, dialect tuning is wasted effort. Match register to channel: WhatsApp is dialect country; an official portal isn't.

The takeaway
  • Build dialect test sets from real traffic, not benchmarks.
  • Handle Arabizi and code-switching in the intake layer.
  • Treat RTL as design scope.
  • Gate launch on native-speaker review.