Community Repositories
Moroccan-Darija-Datasets — nainiayoub
About
A comprehensive collection of Moroccan Darija datasets, categorized by name, data source, region, and size. It contains 13 datasets covering Darija NLP tasks and related data, including sentiment analysis, dialect identification, translation, summarization, speech recognition, named entity recognition, text localization, offensive content detection, and rumor detection, as well as electricity consumption, news, audio data, a names database, and location data.
https://github.com/nainiayoub/moroccan-darija-datasets
In the same category
MA_Open_Datasets — Moroccan NLP Corpora
Complete collection of Moroccan NLP datasets — 6 sub-datasets
Darija-NLP-Resources — MoroccoAI
Curated collection of resources and repositories for Darija NLP tasks
Darija-Dataset-Builder — IlyasFardaouix
Scalable pipeline for building Moroccan Darija NLP datasets for LLM training
Offensive-Darija-Detection — a-ibrahimi
Moroccan Darija Offensive Language Detection Dataset — human-labeled