NLP & Language
Darija-NLP-Resources — MoroccoAI
About
Curated collection of resources and repositories for Natural Language Processing tasks specific to Darija, the Moroccan Arabic dialect. Includes datasets, models, tools, and research. Useful for researchers and developers working with Moroccan Arabic NLP.
https://github.com/MoroccoAI/Arabic-Darija-NLP-Resources
In the same category
Goud-sum (HuggingFace) — Darija Summarization Dataset
158k articles + headlines from Goud.ma — Darija/MSA text summarization dataset
Darija Open Dataset (DODa)
100k+ Darija↔English entries — largest open-source Darija translation dataset
MA_Open_Datasets — Goud.ma
Goud news articles in CSV format — alternative distribution of Goud data
MA_Open_Datasets — LeMatin
Le Matin newspaper articles by category — nation, economy, culture, sports