A collaborative lexical game based on the TLFi dictionary (ATILF XML release)
LexiGraph lets players connect distant words through meaningful intermediate steps. While playing, users collectively build a graph of semantic relations extracted from real dictionary data.
Example challenge:
LIGNIFIER → BOIS → FORÊT → OMBRE → FRAÎCHEUR → GLACE
(to lignify) → (wood) → (forest) → (shadow) → (freshness) → (ice)
Shorter chains = better score.
Every validated relation enriches a lexical knowledge base.
LexiGraph is both a game and a research/teaching tool. It shows how structured lexical resources (XML dictionaries) can be turned into:
- An interactive web application
- A navigable semantic graph
- A crowdsourced linguistic dataset
The game uses the newly released XML version of the TLFi dictionary (Trésor de la Langue Française informatisé).
This could be a good starting point for a master's thesis project in the new master's program on linguistic and digital accessibility (LIIAN) at Université de Lille.
LexiGraph/
├── standalone/ # Offline demo (limited vocabulary and challenges)
│ └── index.html
├── server/ # Flask application
│ ├── app.py
│ └── templates/
│ └── index.html # Full interface (91,000+ words)
├── scripts/
│ ├── tlfi_to_json_v2.py # XML → JSON converter (hierarchical)
│ ├── merge_tlfi_json.py # Merges multiple JSON files
│ └── convert_and_merge_tlfi.sh # Full conversion pipeline
├── data/ # Generated data (not included)
│ └── tlfi_sample.json
├── requirements.txt
└── README.md
- Convert the TLFi XML files into JSON
- Install Python dependencies
- Launch a local web server
- Play in your browser
Total time: ~10 minutes (excluding TLFi download)
git clone https://github.com/YOUR_USERNAME/LexiGraph.git
cd LexiGraph
Or download and extract the ZIP archive.
Requirements: Python 3.8+
Create and activate a virtual environment (you can give it another name, e.g. "lexigraph" instead of "venv"):
# Linux / macOS
python3 -m venv venv
source venv/bin/activate
# Windows
python -m venv venv
venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
The game requires the ATILF TLFi XML release:
https://www.ortolang.fr/market/lexicons/xml-tlfi/v1
You will get more than 80 XML files.
Alternatively, you can use the ready-to-use JSON sample converted from the current version. Keep in mind that this sample is only a tiny fraction of the whole TLFi dictionary.
From the project root:
chmod +x scripts/convert_and_merge_tlfi.sh
./scripts/convert_and_merge_tlfi.sh /path/to/tlfi_xml/ data/
This script:
- Converts each XML file to JSON (preserving hierarchical sense structure)
- Merges all files into a single dataset
- Produces data/tlfi_complet.json
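The merge step can be pictured with a short Python sketch. This is not the code of scripts/merge_tlfi_json.py: it assumes that each per-file JSON document holds a top-level "entries" list (the key used by the verification command in this README), and the function name merge_entries is hypothetical.

```python
import json
from pathlib import Path


def merge_entries(json_dir, output_path):
    """Merge every *.json file in json_dir into one {"entries": [...]} file.

    Assumption: each input file has a top-level "entries" list.
    Returns the number of merged entries.
    """
    output_path = Path(output_path)
    merged = []
    for path in sorted(Path(json_dir).glob("*.json")):
        # Skip a previous merge result if it lives in the same directory.
        if path.resolve() == output_path.resolve():
            continue
        with open(path, encoding="utf-8") as handle:
            merged.extend(json.load(handle).get("entries", []))
    with open(output_path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps accented French lemmas readable.
        json.dump({"entries": merged}, handle, ensure_ascii=False)
    return len(merged)
```

Under these assumptions, a call such as merge_entries("data/json_parts", "data/tlfi_complet.json") would write the merged file and return the entry count; the directory name data/json_parts is only an example.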
Then rename for the server:
mv data/tlfi_complet.json data/tlfi_sample.json
Verify the conversion:
python -c "import json; d=json.load(open('data/tlfi_sample.json', encoding='utf-8')); print(f'{len(d[\"entries\"])} entries loaded')"
cd server
python app.py
You should see:
✓ Lexicon loaded: 91476 entries
✓ Database: /path/to/LexiGraph/data/LexiGraph.db
✓ Server started on http://localhost:5000
Open your browser:
http://localhost:5000
You can now:
- Generate random challenges
- Build new semantic links
- Explore word neighbors (derivatives, related terms)
- Submit your links
All proposed relations are stored locally in a SQLite database.
Stop the server: Ctrl + C
Restart later:
cd LexiGraph
source venv/bin/activate # Linux/macOS
cd server
python app.py
For a quick demo without installation, open:
standalone/index.html
This version:
- Works offline
- Requires no installation
- Uses a limited built-in vocabulary (~25 words)
- Does not save player contributions
| Endpoint | Method | Description |
|---|---|---|
| /api/challenge | GET | Generate a new word pair |
| /api/word/<word> | GET | Get word info (definition, derivatives, relations) |
| /api/check-relation | POST | Check if two words are already linked |
| /api/submit-chain | POST | Submit a completed chain |
| /api/submit-relation | POST | Submit a single relation |
| /api/export/relations | GET | Export all collected relations (JSON) |
| /api/stats | GET | Game statistics |
| /api/leaderboard | GET | Player rankings |
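The GET endpoints can be called from a small Python client. The sketch below uses only the standard library; the helper names word_url and get_json are hypothetical, and it assumes the endpoints return JSON. The shape of the responses and the request bodies expected by the POST endpoints are not documented here, so they are left out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # address printed by the server at startup


def word_url(word, base_url=BASE_URL):
    """Build the /api/word/<word> URL, percent-encoding accented characters."""
    return f"{base_url}/api/word/{quote(word, safe='')}"


def get_json(url):
    """Fetch a URL and decode its JSON body (assumes the endpoint returns JSON)."""
    with urlopen(url) as response:
        return json.load(response)


print(word_url("forêt"))  # http://localhost:5000/api/word/for%C3%AAt
```

With the server running, get_json(f"{BASE_URL}/api/challenge") would fetch a new word pair, and get_json(word_url("forêt")) the information stored for that word.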
The server stores in SQLite:
- Proposed relations: with player ID, timestamp, and vote counts
- Completed chains: full path, score, player
- Aggregated votes: relations confirmed by multiple players gain trust
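To make the idea of aggregated votes concrete, here is a minimal sqlite3 sketch. The table and column names are hypothetical and the actual schema in server/app.py may differ; the sketch simply counts how many distinct players proposed the same relation.

```python
import sqlite3

# Hypothetical schema: one row per (relation, player) proposal.
SCHEMA = """
CREATE TABLE IF NOT EXISTS relations (
    source    TEXT NOT NULL,
    target    TEXT NOT NULL,
    player_id TEXT NOT NULL,
    created   TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (source, target, player_id)
);
"""


def propose(conn, source, target, player_id):
    """Record one proposal; a repeat by the same player is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO relations (source, target, player_id) "
        "VALUES (?, ?, ?)",
        (source, target, player_id),
    )


def vote_count(conn, source, target):
    """Number of distinct players who proposed source -> target."""
    row = conn.execute(
        "SELECT COUNT(*) FROM relations WHERE source = ? AND target = ?",
        (source, target),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
conn.executescript(SCHEMA)
propose(conn, "BOIS", "FORÊT", "alice")
propose(conn, "BOIS", "FORÊT", "bob")
propose(conn, "BOIS", "FORÊT", "alice")  # duplicate, ignored
print(vote_count(conn, "BOIS", "FORÊT"))  # 2
```

Using the composite primary key to deduplicate proposals is one possible design: a player cannot inflate the trust of a relation by submitting it repeatedly.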
curl http://localhost:5000/api/export/relations > relations.json
This allows linguists to:
- Review proposed relations
- Validate or reject them
- Integrate confirmed relations into lexical resources
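A first review pass over the export could look like the sketch below. The export format is not specified in this README, so the sketch assumes a list of records with "source", "target" and "votes" fields; these names and the helper confirmed_relations are hypothetical and would need adapting to the real file.

```python
def confirmed_relations(relations, min_votes=2):
    """Keep relations with at least min_votes votes, strongest first."""
    kept = [r for r in relations if r.get("votes", 0) >= min_votes]
    return sorted(kept, key=lambda r: r["votes"], reverse=True)


# Hypothetical records; the real export may use other field names.
sample = [
    {"source": "BOIS", "target": "FORÊT", "votes": 3},
    {"source": "OMBRE", "target": "GLACE", "votes": 1},
    {"source": "FORÊT", "target": "OMBRE", "votes": 5},
]

for relation in confirmed_relations(sample):
    print(relation["source"], "->", relation["target"], relation["votes"])
# FORÊT -> OMBRE 5
# BOIS -> FORÊT 3
```

The threshold of two votes is arbitrary; a reviewer would tune it to the number of active players.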
A dictionary is not just text.
In XML format, each TLFi entry contains structured linguistic information:
| Element | Example |
|---|---|
| Lemma | LIGNIFIER |
| Part of speech | verbe pronominal |
| Domain | BOT. (botany) |
| Definition | Se transformer en bois |
| Literary examples | With author, title, date |
| Derivatives | LIGNIFIÉ, LIGNIFICATION |
| Etymology | Latin lignum (wood) |
LexiGraph extracts these relations and turns them into a navigable semantic network. The game then lets users discover relations the dictionary doesn't explicitly encode: synonyms, hypernyms, associations, etc.
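The extraction idea can be illustrated on a toy entry. The markup below is invented for the example: the element names (entry, lemma, derivative, and so on) are assumptions and do not reproduce the actual TLFi XML schema, and entry_to_edges is a hypothetical helper, not a function from the conversion scripts.

```python
import xml.etree.ElementTree as ET

# Invented markup for illustration only; real TLFi element names may differ.
SAMPLE = """
<entry>
  <lemma>LIGNIFIER</lemma>
  <pos>verbe pronominal</pos>
  <domain>BOT.</domain>
  <definition>Se transformer en bois</definition>
  <derivative>LIGNIFIÉ</derivative>
  <derivative>LIGNIFICATION</derivative>
</entry>
"""


def entry_to_edges(xml_text):
    """Turn one entry into (lemma, related word, relation type) edges."""
    entry = ET.fromstring(xml_text.strip())
    lemma = entry.findtext("lemma")
    return [
        (lemma, derivative.text, "derivative")
        for derivative in entry.findall("derivative")
    ]


for edge in entry_to_edges(SAMPLE):
    print(edge)
# ('LIGNIFIER', 'LIGNIFIÉ', 'derivative')
# ('LIGNIFIER', 'LIGNIFICATION', 'derivative')
```

Each edge is a candidate link in the graph; relations that the dictionary does not encode are the ones players add on top.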
- Some challenges are unsolvable. Lexical graphs have a "small world" structure: dense clusters connected by a few bridges. If two words belong to distant clusters, there may be no natural semantic chain between them. But maybe this is a feature and not a problem?
- Relations are untyped. In the current version, the game records that A → B is a relation, but not what kind (synonym? hypernym? association?). Projects like JeuxDeMots use 180+ relation types.
- Single-user prototype. No authentication, no anti-cheating mechanisms. For production use, the stack would need hardening. Like, a lot.
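Whether a challenge is solvable can be checked with a breadth-first search over the known relations. The sketch below is one possible approach, not the game's current behavior: the graph is assumed to be a plain dictionary mapping each word to its neighbors, and the function name shortest_chain is hypothetical. The sample graph reuses the example chain from the top of this README.

```python
from collections import deque


def shortest_chain(graph, start, goal):
    """Return the shortest list of words from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # word -> word it was reached from
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = word
            if neighbor == goal:
                # Walk back from the goal to rebuild the chain.
                chain = [goal]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(neighbor)
    return None


# Directed toy graph built from the example challenge.
graph = {
    "LIGNIFIER": ["BOIS"],
    "BOIS": ["FORÊT"],
    "FORÊT": ["OMBRE"],
    "OMBRE": ["FRAÎCHEUR"],
    "FRAÎCHEUR": ["GLACE"],
}
print(shortest_chain(graph, "LIGNIFIER", "GLACE"))
print(shortest_chain(graph, "GLACE", "LIGNIFIER"))  # None: no path this way
```

A challenge generator could discard word pairs for which the search returns None, or keep them deliberately as open problems for players.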
This project was developed in the context of the LIIAN Master's program (Linguistique Informatique pour l'Inclusion et l'Accessibilité Numérique) at the University of Lille.
It illustrates core skills taught in the program:
| Skill | Application in LexiGraph |
|---|---|
| XML processing | Parsing TLFi structure |
| Python scripting | Conversion pipeline |
| Web APIs | Flask REST endpoints |
| Databases | SQLite storage |
| Data visualization | Graph rendering (Canvas) |
| Lexical semantics | Relation extraction |
| Crowdsourcing | Collaborative annotation |
Lexical data: TLFi: ATILF (CNRS – Université de Lorraine)
Code: MIT License
Developed by Antonio BALVET in the context of the LIIAN Master's program at the University of Lille and UMR STL 8163.
Inspired by:
- JeuxDeMots (LIRMM, Montpellier)
- Semantris (Google)
Contributions are welcome! Possible improvements:
- Filter challenges by feasibility (BFS path check)
- Add relation typing (synonym, hypernym, etc.)
- Implement word autocompletion
- Visualize subgraph between start and end words
- Add multilingual support
- WCAG accessibility compliance
Please open an issue to discuss major changes before submitting a PR.