- First you'll have to download the Tatoeba dataset. Run `./download-data.sh`. You will find the data under `data-tatoeba/` inside this folder.
- Download any sentence pairs you are interested in and place them in `data-tatoeba/`. As these sentence pairs are always created on the fly, this step has to happen manually. Sentence pair data should be named `sentences_{SRC_LANG}_{TGT_LANG}.tsv`, where `SRC_LANG` and `TGT_LANG` are lowercased two-letter language codes, for example `sentences_uk_de.tsv`.
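The naming convention can be sketched as a small helper. This function is purely illustrative (it is not part of the repository); it builds and validates a file name under the convention described above.

```python
import re

# Matches the convention: sentences_{SRC_LANG}_{TGT_LANG}.tsv with
# lowercased two-letter language codes, e.g. sentences_uk_de.tsv.
PAIR_PATTERN = re.compile(r"^sentences_([a-z]{2})_([a-z]{2})\.tsv$")

def sentence_pair_filename(src_lang: str, tgt_lang: str) -> str:
    """Build a sentence-pair file name from two language codes."""
    name = f"sentences_{src_lang.lower()}_{tgt_lang.lower()}.tsv"
    if not PAIR_PATTERN.match(name):
        raise ValueError(f"expected two-letter codes, got {src_lang!r}, {tgt_lang!r}")
    return name

print(sentence_pair_filename("UK", "de"))  # sentences_uk_de.tsv
```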
- Run the `./generate_data.ipynb` notebook to process raw data into TSVs. Change input variables as needed.
- Run the `./import_sql.ipynb` notebook to import the generated TSVs into a local SQLite database.
- Run the `./generate_exercise_precursors.ipynb` notebook to generate exercise precursors.
- Run the `./similar_words.ipynb` notebook to generate similar words.
- Optionally do a manual quality-control check over the generated data.
- To populate the exercise table, which is the main entity in the API server, run `python3 populate_exercise_table.py`. The script expects the following two files to exist (they have to be moved to that folder manually):
  - `data-import/exercise-import.tsv`: the output of the `./generate_exercise_precursors.ipynb` notebook
  - `data-import/similar-words-import.tsv`: the output of the `./similar_words.ipynb` notebook
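Since both inputs have to be moved into place by hand, a quick preflight check can save a failed run. This is a hypothetical helper, not part of the repository; it only reports which of the two expected files are missing.

```python
from pathlib import Path

# The two inputs populate_exercise_table.py expects, per the list above.
REQUIRED = [
    Path("data-import/exercise-import.tsv"),
    Path("data-import/similar-words-import.tsv"),
]

def missing_inputs(required=REQUIRED) -> list[str]:
    """Return the paths from `required` that do not exist as files."""
    return [str(p) for p in required if not p.is_file()]

missing = missing_inputs()
if missing:
    print("move these files into place first:", missing)
```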
- Optionally run the `./generate_sentence_audio` notebook to generate audio files.
After all of the above steps, you should have a ready-to-use `taskpool.db` SQLite DB in the parent folder, which will be used by the API server.