NLP & Language

Darija Open Dataset (DODa)

About

DODa is the largest open-source Darija↔English translation dataset on GitHub, released under the CC BY-NC 4.0 license. It contains 100,000+ entries in total, including 1,300+ nouns, 1,000+ verbs, and 45,000+ sentences. Subcategories include food, animals, body, health, and education. It is a standard resource for Darija NLP.

https://darija-open-dataset.github.io