- Tokenization and Bag of Words Creation
- Data Preprocessing
- Neural Network Model
- Training the Model
- Model Evaluation
- Telegram Bot Integration
- Make it more intelligent and flexible
- intents.json ↓
- Read patterns & tags ↓
- Khmer Tokenization (khmernltk.word_tokenize) ↓
- Build Vocabulary (all_words) ↓
- Bag of Words Encoding ↓
- Create Training Data: X_train (features), y_train (labels) ↓
- PyTorch Dataset & DataLoader ↓
- Neural Network Model (SimpleNet) ↓
- Training Loop
  - Forward pass
  - Loss computation
  - Backpropagation
  - Weight update ↓
- Save Model Weights & Metadata (data.pth)
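
The first half of the pipeline above (reading patterns and tags, tokenizing, building the vocabulary, bag-of-words encoding, and creating the training data) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `INTENTS` sample, its schema, and the helper names `tokenize`, `build_vocabulary`, and `bag_of_words` are assumptions made for the example, and a whitespace splitter stands in for `khmernltk.word_tokenize` so that the sketch runs without extra packages.

```python
import json

# Hypothetical stand-in for intents.json; the real file's schema is assumed
# here to be a list of intents, each with a tag and example patterns.
INTENTS = json.loads("""
{"intents": [
  {"tag": "greeting", "patterns": ["hello there", "hi"]},
  {"tag": "goodbye",  "patterns": ["bye", "see you there"]}
]}
""")


def tokenize(sentence):
    # Placeholder tokenizer (assumption): splits on whitespace.
    # Khmer is written without spaces between words, so for real data
    # this would be replaced by khmernltk.word_tokenize.
    return sentence.lower().split()


def build_vocabulary(intents):
    """Collect sorted unique words, sorted tags, and (tokens, tag) pairs."""
    words, tags, pairs = set(), set(), []
    for intent in intents["intents"]:
        tags.add(intent["tag"])
        for pattern in intent["patterns"]:
            tokens = tokenize(pattern)
            words.update(tokens)
            pairs.append((tokens, intent["tag"]))
    return sorted(words), sorted(tags), pairs


def bag_of_words(tokens, all_words):
    """Return a 0/1 vector marking which vocabulary words occur in tokens."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in all_words]


all_words, tags, pairs = build_vocabulary(INTENTS)

# One feature vector per pattern, and the index of its tag as the label.
X_train = [bag_of_words(tokens, all_words) for tokens, _ in pairs]
y_train = [tags.index(tag) for _, tag in pairs]
```

Words that are not in `all_words` are simply ignored by `bag_of_words`, so every encoded message has the same length as the vocabulary. The resulting lists can then be converted to tensors (for example with `torch.tensor`) for the Dataset and DataLoader stage.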
    pip install numpy
    pip install khmer-nltk
    pip install python-telegram-bot
    pip install torch

[Official Blog Post Website Url]
For a quick start, install the packages listed in requirement.txt and read the comments in each script's header before running it.
- Check the folder directory
- Create a virtual environment in the "myEnv" folder
- Install the required modules at the versions specified in the requirements file
- Check the Bot API
- Check the Bot Username
- Make sure you have already installed the libraries mentioned above. (If you run into any issues, you can contact me directly on Telegram: @SOYTET)
    python ./App/ChatApp.py

While the ChatApp.py script is running, you can chat with the Telegram bot that you integrated yourself, for example:

សួស្ដីបង (Khmer for "Hello, brother")

@misc{Chatbot,
author = {SOY TET},
title = {Khmer Telegram Traditional Chatbot},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository}
}

- stopes: A library for preparing data for machine translation research
- LASER Language-Agnostic SEntence Representations
- Pretrained Models and Evaluation Data for the Khmer Language
- Multilingual Open Text 1.0: Public Domain News in 44 Languages
- ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System
- Shared Task on Cross-lingual Open-Retrieval QA
- No Language Left Behind: Scaling Human-Centered Machine Translation
- Wordless
- A Simple and Fast Strategy for Handling Rare Words in Neural Machine Translation

