NLP & Language
MSAC — Moroccan Arabic Sentiment Analysis Corpus
About
A corpus of 2,000 Moroccan Arabic tweets collected from Twitter and manually annotated for sentiment analysis. It was released alongside the BDCA 2018 conference (Kenitra, Morocco) and is distributed in ARFF format. It is useful for building sentiment models for the Moroccan dialect.
https://github.com/ososs/Arabic-Sentiment-Analysis-corpus
In the same category
Goud-sum (HuggingFace) — Darija Summarization Dataset
158k Goud.ma articles with their headlines, for Darija/MSA text summarization
Darija Open Dataset (DODa)
Over 100k Darija↔English entries; the largest open-source Darija translation dataset
MA_Open_Datasets — Goud.ma
Goud.ma news articles in CSV format, an alternative distribution of the Goud data
MA_Open_Datasets — LeMatin
Le Matin newspaper articles organized by category: nation, economy, culture, sports