NLP & Languages
Goud-sum (HuggingFace) — Darija Summarization Dataset
About
Goud-sum contains 158,282 article-headline pairs extracted from the Goud.ma news website. The headlines are written in Moroccan Darija; the articles are in Darija, Modern Standard Arabic (MSA), or a code-switched mix.
Task: text summarization.
Splits: train (about 139k), validation (about 9.5k), test (about 9.5k).
Size: 326 MB.
Languages: Moroccan Arabic, Modern Standard Arabic.
Citation: Issam & Mrini, 3rd Workshop on African NLP, 2022.
https://huggingface.co/datasets/Goud/Goud-sum
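As a rough illustration of how the dataset might be used, the sketch below loads it through the Hugging Face `datasets` library and computes each split's share from the rounded sizes quoted above. The repository id `Goud/Goud-sum` is taken from the URL; the helper names `split_shares` and `load_goud_sum` are hypothetical, and the split sizes are the rounded figures from this page, not values read from the Hub.

```python
# Rounded split sizes as quoted in the description (not read from the Hub).
STATED_TOTAL = 158_282
STATED_SPLITS = {"train": 139_000, "validation": 9_500, "test": 9_500}


def split_shares(splits):
    """Return each split's share of the summed split sizes."""
    total = sum(splits.values())
    return {name: size / total for name, size in splits.items()}


def load_goud_sum():
    """Load Goud-sum from the Hugging Face Hub.

    Assumes `pip install datasets` and network access. The import is done
    lazily so the rest of this sketch runs offline.
    """
    from datasets import load_dataset

    return load_dataset("Goud/Goud-sum")


if __name__ == "__main__":
    for name, share in split_shares(STATED_SPLITS).items():
        print(f"{name}: {share:.1%}")
    # The rounded sizes will not add up exactly to the stated pair count.
    print("sum of rounded splits:", sum(STATED_SPLITS.values()))
    print("stated total:", STATED_TOTAL)
```

Calling `load_goud_sum()` should return a dictionary-like object keyed by split name. Its column names are not listed on this page, so inspect a record (for example `ds["train"][0]`) before relying on any particular field.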
In the same category
Darija Open Dataset (DODa)
100k+ Darija↔English entries; largest open-source Darija translation dataset
MA_Open_Datasets — Goud.ma
Goud news articles in CSV format — alternative distribution of Goud data
MA_Open_Datasets — LeMatin
Le Matin newspaper articles by category — nation, économie, culture, sport
MA_Open_Datasets — MoroccoWorldNews
Morocco news articles dataset from MoroccoWorldNews