NLP & Language
Offensive-Darija-Detection — a-ibrahimi
About
Moroccan Darija Offensive Language Detection Dataset. A human-labeled dataset of Moroccan Darija sentences, intended for developing offensive content detection models in Moroccan Darija.
https://github.com/a-ibrahimi/Moroccan-Darija-Offensive-Language-Detection-Dataset
In the same category
Goud-sum (HuggingFace) — Darija Summarization Dataset
158k articles + headlines from Goud.ma — Darija/MSA text summarization dataset
Darija Open Dataset (DODa)
100k+ Darija↔English entries — largest open-source Darija translation dataset
MA_Open_Datasets — Goud.ma
Goud news articles in CSV format — alternative distribution of Goud data
MA_Open_Datasets — LeMatin
Le Matin newspaper articles by category — nation, economy, culture, sports