Community Repositories
MA_Open_Datasets — Moroccan NLP Corpora
About
A repository of Moroccan NLP datasets maintained by OumaimaHourrane. It contains six sub-datasets: Goud.ma, LeMatin, MoroccoWorldNews, YouTube comments, Booking_ma, and Jumia.ma. Repository language: Jupyter Notebook. Intended for NLP research.
https://github.com/OumaimaHourrane/MA_Open_Datasets
In the same category
Moroccan-Darija-Datasets — nainiayoub
Collection of 13 categorized Moroccan Darija datasets
Darija-NLP-Resources — MoroccoAI
Curated collection of resources and repositories for Darija NLP tasks
Darija-Dataset-Builder — IlyasFardaouix
Scalable pipeline for building Moroccan Darija NLP datasets for LLM training
Offensive-Darija-Detection — a-ibrahimi
Human-labeled dataset for offensive language detection in Moroccan Darija