Flor y Canto Nahuatl exists to establish Eastern Huasteca Nahuatl as a teachable spoken standard, Modern Standard Nahuatl as a respected written norm, and a living poetic high register that reconnects Nahuatl speech, literature, song, and public life.
What this is · Why it matters · What you can do with it
This is the largest freely available structured dataset of Classical Nahuatl in existence, paired with a formal framework for standardizing the language across speech, writing, and literature. Everything is open. Everything is free. Here's where to start depending on who you are.
28,709 parsed entries from Siméon's 1885 dictionary in structured JSON. 8,465 Wiktionary lexical rows across four Nahuatl varieties. Two UD treebanks. 55,904 classical examples. Provenance-tracked, license-tagged, queryable. Use it for NLP, typology, historical linguistics, or computational work.
GitHub repository →
A three-layer framework that respects spoken Nahuatl as the foundation, establishes a clear written standard, and creates space for poetry, song, and literature. Governance documents define how the language is handled across registers, so the standard serves speakers, not the other way around.
Read the governance documents →
Structured JSON and JSONL ready for ingestion. CSV exports for quick analysis. A provenance pipeline you can fork. Build a dictionary app, a learning tool, a search engine, or a language model: the data is CC BY-SA 3.0 or public domain, depending on the source. No paywall. No API key. Just download it.
Browse the data →
Speech · Writing · Literature
Eastern Huasteca Nahuatl. The living spoken base drawn from community speech. All pronunciation, phonology, and conversational register grounded here.
Modern Standard Nahuatl. The neutral written reference norm for education, publishing, governance, and formal communication. Clear, consistent, teachable.
The elevated literary register for poetry, song, ceremony, and public oratory. Classical resonance with modern clarity. Where the language creates, not just communicates.
Open · Structured · Provenance-tracked
The first machine-readable dataset of Classical Nahuatl. Parsed from Siméon's 1885 dictionary, Wiktionary across four varieties (Classical, Central, Eastern Huasteca, Highland Puebla), and two Universal Dependencies treebanks. Every entry carries provenance, license tracking, and source confidence scoring.
All data is free and open under CC BY-SA 3.0 / GFDL. Public-domain sources remain public domain.
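As a rough illustration of what ingesting the JSONL files could look like, the sketch below reads one JSON object per line and counts records by a field. The field names (`word`, `pos`, `lang_code`) follow the usual Kaikki/Wiktextract layout and are an assumption about this project's files, and the two sample records are made up for the example rather than copied from the dataset.

```python
import io
import json
from collections import Counter


def iter_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON") from exc


def count_by_field(records, field):
    """Count records by one top-level field; missing values count as 'unknown'."""
    return Counter(record.get(field, "unknown") for record in records)


# Made-up sample records standing in for a real file.
# The field names are assumed, not confirmed against the project's schema.
SAMPLE = io.StringIO(
    '{"word": "xochitl", "pos": "noun", "lang_code": "nci"}\n'
    "\n"
    '{"word": "cuicatl", "pos": "noun", "lang_code": "nci"}\n'
)

records = list(iter_jsonl(SAMPLE))
pos_counts = count_by_field(records, "pos")
print(pos_counts)
```

To try this on a downloaded file, pass `open(path, encoding="utf-8")` in place of `SAMPLE` and check the actual field names first, since they may differ from the ones assumed here.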
Constitutional framework · Version 0.1
Code · Data · Music
Source code, parsers, governance documents, and project infrastructure. Includes fcn_source_parsers.py, fcn_legal_ingest.py, and the full pipeline.
Public data files: Siméon parsed JSON, Kaikki JSONL across four varieties, UD treebanks, classical example bank, and lexical rows.
Original compositions in Nahuatl. Worship, poetry, and song in the tradition of in xochitl in cuicatl. New uploads daily.
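For the UD treebanks listed under the data files, a minimal reader might look like the sketch below. It assumes the treebanks are distributed in the standard CoNLL-U format (ten tab-separated columns per token, `#` comment lines, a blank line between sentences), which this page does not state explicitly. The sample sentence carries only word forms, with `_` placeholders in every annotation column, so it is not a real analysis.

```python
# The ten CoNLL-U columns, in order, as defined by Universal Dependencies.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        # Multiword-token and empty-node rows are kept as ordinary rows here.
        tokens.append(dict(zip(FIELDS, columns)))
    if tokens:
        sentences.append(tokens)  # file without a trailing blank line
    return sentences


def make_row(index, form):
    """Build a placeholder row: real ID and FORM, '_' for every other column."""
    return "\t".join([str(index), form] + ["_"] * 8)


# Illustrative input only; the annotations are deliberately left unspecified.
words = ["in", "xochitl", "in", "cuicatl"]
SAMPLE = (
    "# text = in xochitl in cuicatl\n"
    + "\n".join(make_row(i, w) for i, w in enumerate(words, start=1))
    + "\n\n"
)

sentences = parse_conllu(SAMPLE)
forms = [token["form"] for token in sentences[0]]
print(forms)
```

For serious work, an established CoNLL-U library is probably a better choice than a hand-rolled parser; the sketch is only meant to show the shape of the data.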
28,709 structured entries from Rémi Siméon's 1885 Dictionnaire de la langue nahuatl. The first machine-readable version. 6.2 MB JSON.