Eastern Huasteca Nahuatl for Speech · Modern Standard Nahuatl for Writing

Flor y Canto Nahuatl

In xochitl in cuicatl — The flower, the song

Flor y Canto Nahuatl exists to establish Eastern Huasteca Nahuatl as a teachable spoken standard, Modern Standard Nahuatl as a respected written norm, and a living poetic high register that reconnects Nahuatl speech, literature, song, and public life.

Start Here

What this is · Why it matters · What you can do with it

This is the largest freely available structured dataset of Classical Nahuatl in existence, paired with a formal framework for standardizing the language across speech, writing, and literature. Everything is open. Everything is free. Here's where to start depending on who you are.

For Linguists & Researchers

The dataset you've been looking for doesn't exist anywhere else.

28,709 parsed entries from Siméon's 1885 dictionary in structured JSON. 8,465 Wiktionary lexical rows across four Nahuatl varieties. Two UD treebanks. 55,904 classical examples. Provenance-tracked, license-tagged, queryable. Use it for NLP, typology, historical linguistics, or computational work.

GitHub repository

For Nahuatl Speakers & Learners

Your language now has infrastructure.

A three-layer framework that respects spoken Nahuatl as the foundation, establishes a clear written standard, and creates space for poetry, song, and literature. Governance documents define how the language is handled across registers — so the standard serves speakers, not the other way around.

Read the governance documents

For Developers & Builders

Build on this. That's why it's open.

Structured JSON and JSONL ready for ingestion. CSV exports for quick analysis. A provenance pipeline you can fork. Build a dictionary app, a learning tool, a search engine, or a language model. The data is released under CC BY-SA 3.0 / GFDL, and public-domain sources remain public domain. No paywall. No API key. Just download it.

Browse the data
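As a starting point for ingestion, here is a minimal Python sketch that streams JSONL records into CSV using only the standard library. It assumes each line holds one JSON object with `word`, `lang`, and `pos` keys, as in Kaikki.org's wiktextract output; the FCN files may use different field names, so check a few records first. The function name `jsonl_to_csv` is illustrative and not part of the repository.

```python
import csv
import io
import json
from typing import Iterable, Sequence, TextIO

# Assumed column names (Kaikki/wiktextract style); verify against the real files.
CSV_FIELDS = ("word", "lang", "pos")


def jsonl_to_csv(lines: Iterable[str], out: TextIO,
                 fields: Sequence[str] = CSV_FIELDS) -> int:
    """Write one CSV row per non-blank JSONL line and return the row count."""
    # extrasaction="ignore" drops keys we did not ask for (e.g. "senses");
    # restval="" fills any requested key a record lacks.
    writer = csv.DictWriter(out, fieldnames=fields,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank or trailing lines
        writer.writerow(json.loads(line))
        count += 1
    return count


if __name__ == "__main__":
    sample = [
        '{"word": "xochitl", "lang": "Classical Nahuatl", "pos": "noun"}',
        '{"word": "cuicatl", "lang": "Classical Nahuatl", "pos": "noun", "senses": []}',
        "",
    ]
    buf = io.StringIO()
    print(jsonl_to_csv(sample, buf))  # 2
    print(buf.getvalue())
```

To convert a downloaded file, open the source with `encoding="utf-8"` and the destination with `newline=""` (as the `csv` module expects) and pass both handles to the function.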

The Three-Layer Framework

Speech · Writing · Literature

Spoken Foundation

EHN

Eastern Huasteca Nahuatl. The living spoken base, drawn from community speech. Pronunciation, phonology, and the conversational register are all grounded here.

Written Standard

MSN

Modern Standard Nahuatl. The neutral written reference norm for education, publishing, governance, and formal communication. Clear, consistent, teachable.

Poetic Register

MSN-P

The elevated literary register for poetry, song, ceremony, and public oratory. Classical resonance with modern clarity. Where the language creates, not just communicates.
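For tools that need to tag text by register, the three layers can be modeled as a small lookup table. This sketch is hypothetical: its domain list only paraphrases the three descriptions above, and the Register Charter's official domain assignments may differ. Falling back to MSN for unlisted domains is also an assumption, chosen because MSN is described as the neutral written norm.

```python
from enum import Enum


class Register(Enum):
    EHN = "EHN"      # spoken foundation
    MSN = "MSN"      # written standard
    MSN_P = "MSN-P"  # poetic register


# Hypothetical domain table paraphrasing the framework descriptions;
# not the Register Charter's official assignments.
DOMAINS = {
    "conversation": Register.EHN,
    "pronunciation": Register.EHN,
    "education": Register.MSN,
    "publishing": Register.MSN,
    "governance": Register.MSN,
    "formal communication": Register.MSN,
    "poetry": Register.MSN_P,
    "song": Register.MSN_P,
    "ceremony": Register.MSN_P,
    "public oratory": Register.MSN_P,
}


def register_for(domain: str) -> Register:
    """Look up a domain's register; unlisted domains fall back to MSN (assumed)."""
    return DOMAINS.get(domain.strip().lower(), Register.MSN)
```

For example, `register_for("song")` returns `Register.MSN_P`.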

The Data

Open · Structured · Provenance-tracked

28,709 Siméon Dictionary Entries
8,465 Wiktionary Lexical Rows
55,904 Classical Examples
4 Nahuatl Varieties Covered

The first machine-readable dataset of Classical Nahuatl. It draws on Siméon's 1885 dictionary, Wiktionary entries for four varieties (Classical, Central, Eastern Huasteca, Highland Puebla), and two Universal Dependencies treebanks. Every entry carries provenance, license tracking, and a source confidence score.
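A minimal sketch of how the provenance and confidence fields might be used together, for example to keep only high-confidence entries and see which sources they came from. The `Entry` fields, the source identifiers, the 0 to 1 score scale, and the 0.8 threshold are all assumptions for illustration; the published files define their own field names and scoring.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    headword: str
    source: str        # hypothetical provenance id, e.g. "simeon-1885"
    license: str       # e.g. "public domain", "CC BY-SA 3.0 / GFDL"
    confidence: float  # assumed to lie in [0, 1]


def confident(entries: Iterable[Entry], threshold: float = 0.8) -> List[Entry]:
    """Keep entries whose source-confidence score meets the threshold."""
    return [e for e in entries if e.confidence >= threshold]


def by_source(entries: Iterable[Entry]) -> Counter:
    """Count entries per provenance source."""
    return Counter(e.source for e in entries)
```

Chaining the two, `by_source(confident(entries))` shows how many trusted entries each source contributes.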

All data is free and open under CC BY-SA 3.0 / GFDL. Public-domain sources remain public domain.
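Because licensing differs by source, it helps to split records by license before redistributing a derived dataset. This sketch assumes each record is a dict with a `license` string (a hypothetical field name) and treats CC BY-SA 3.0 and GFDL, the share-alike licenses named above, as the ones that carry their terms forward; dual-licensed strings such as "CC BY-SA 3.0 / GFDL" are split on the slash.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Share-alike licenses named on this page.
SHARE_ALIKE = {"CC BY-SA 3.0", "GFDL"}


def partition_by_license(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by their "license" field (assumed name)."""
    groups = defaultdict(list)
    for rec in records:
        # Records without a license land under "unknown" for manual review.
        groups[rec.get("license", "unknown")].append(rec)
    return dict(groups)


def is_share_alike(license_str: str) -> bool:
    """True if any part of a (possibly dual) license string is share-alike."""
    return any(part.strip() in SHARE_ALIKE for part in license_str.split("/"))


def needs_share_alike(groups: Dict[str, List[dict]]) -> bool:
    """True if any license group would impose share-alike terms."""
    return any(is_share_alike(lic) for lic in groups)
```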

Governance Documents

Constitutional framework · Version 0.1

Founding Charter
Identity, mission, principles, and governing rules of Flor y Canto Nahuatl (FCN). Version 0.1, adopted March 25, 2026.
Register Charter
The official register system: ten statuses, domain assignments, conversion principles, publication rules.
Mission Statements
One-sentence, one-paragraph, one-page, and website versions of the FCN mission.
Core Premises
The non-negotiable assumptions and constraints that govern all project decisions.
Success Criteria
Measurable milestones for versions 0.1, 0.5, and 1.0 of the FCN framework.

Resources

Code · Data · Music

GitHub Repository

Source code, parsers, governance documents, and project infrastructure. Includes fcn_source_parsers.py, fcn_legal_ingest.py, and the full pipeline.

S3 Data Bucket

Public data files: Siméon parsed JSON, Kaikki JSONL across four varieties, UD treebanks, classical example bank, and lexical rows.
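Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, blank lines between sentences, and `#` comment lines. Assuming the treebank files in the bucket keep that format unchanged, a minimal standard-library reader might look like this. It keeps only basic-tree tokens and skips multiword-token ranges (`1-2`) and empty nodes (`1.1`); the name `read_conllu` is illustrative.

```python
from typing import Dict, Iterable, List

# The ten CoNLL-U columns, lowercased.
CONLLU_COLUMNS = ("id", "form", "lemma", "upos", "xpos",
                  "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines: Iterable[str]) -> List[List[Dict[str, str]]]:
    """Parse CoNLL-U lines into sentences of token dicts."""
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        current.append(dict(zip(CONLLU_COLUMNS, cols)))
    if current:
        sentences.append(current)  # file may end without a blank line
    return sentences
```

Any iterable of lines works, including an open file handle: `with open(path, encoding="utf-8") as fh: sentences = read_conllu(fh)`, where `path` points at a downloaded `.conllu` file.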

Music — Sam Itzli

Original compositions in Nahuatl. Worship, poetry, and song in the tradition of in xochitl in cuicatl. New uploads daily.

Siméon Dataset

28,709 structured entries from Rémi Siméon's 1885 Dictionnaire de la langue nahuatl. The first machine-readable version. 6.2 MB JSON.
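At 6.2 MB the file can be loaded in a single pass. The sketch below builds a headword index that groups homographs and NFC-normalizes keys so that precomposed and decomposed accented characters match. It assumes the file is a top-level JSON array of objects with a `headword` key; both the layout and the key name are assumptions to verify against the actual file.

```python
import json
import unicodedata
from collections import defaultdict
from typing import Dict, List


def normalize(word: str) -> str:
    """NFC-normalize, trim, and lowercase a headword for lookup."""
    return unicodedata.normalize("NFC", word).strip().lower()


def build_index(entries: List[dict], key: str = "headword") -> Dict[str, List[dict]]:
    """Map each normalized headword to every entry sharing it (homographs)."""
    index = defaultdict(list)
    for entry in entries:
        if key in entry:  # skip records without the assumed key
            index[normalize(entry[key])].append(entry)
    return dict(index)


def load_index(path: str) -> Dict[str, List[dict]]:
    """Read the whole JSON array (assumed layout) and index it."""
    with open(path, encoding="utf-8") as fh:
        return build_index(json.load(fh))
```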
