Abror Shopulatov

I'm a Machine Learning Engineer at tilmoch.ai, where I mostly work on NLP applications for Turkic languages. At tilmoch.ai, I've worked on Tahrirchi Editor, tilmoch.ai Translator and Misollar.

We also contribute to the open-source Turkic NLP community with the biggest Uzbek text corpora (UzBooks and UzCrawl), the SOTA Uzbek encoder models (tahrirchi-bert-base and tahrirchi-bert-small), and Machine Translation models and datasets for Karakalpak and Southern Uzbek.

At the same time, I am a BSc in AI student at MBZUAI, where I am digging deeper into the fundamentals of AI engineering.

Research

I'm interested in NLP, Grammatical Error Correction and Machine Translation. Most of my research is on Uzbek and other Turkic languages.

Filling the Gap for Uzbek: Creating Translation Resources for Southern Uzbek
Mukhammadsaid Mamasaidov, Azizullah Aral, Abror Shopulatov, Mironshoh Inomjonov
WMT, 2025
arXiv / model / dataset / code / bibtex
Using modern Machine Translation techniques, we collected a dataset and trained an MT model for Southern Uzbek, a language with around 5 million speakers.

UzLiB - Uzbek Linguistic Benchmark
Abror Shopulatov
In draft, 2025
blogpost (in Uzbek) / blogpost (in English) / benchmark / code / bibtex
I created UzLiB, a comprehensive linguistic benchmark to evaluate LLMs on Uzbek. The results revealed that even top models fail to surpass 70% accuracy, highlighting a key gap in their capabilities.

Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak
Mukhammadsaid Mamasaidov, Abror Shopulatov
WMT, 2024
paper / arXiv / models / dataset / bibtex
We translated the FLORES+ dev set into Karakalpak. We then collected 100,000 pairs involving Karakalpak to fine-tune a machine translation model and improve upon existing baselines.

Grammatical Error Correction (GEC) for Agglutinative Languages: Case of Uzbek
Mukhammadsaid Mamasaidov, Abror Shopulatov
In draft, 2023
project page (in Uzbek) / demo
We developed a hybrid GEC model for Uzbek by integrating a rule-based morphological analyzer with a neural network, significantly improving its ability to handle complex agglutinative grammar.

Sports	I am an avid football (the real one) and table tennis player.
Blogging	I enjoy translating popular Machine Learning blog posts into Uzbek. You can find my translations on my Substack.
Tutoring	I have experience as a Math tutor for 4th-grade students, preparing them for the entrance exams of a prestigious school in Uzbekistan.

Feel free to steal this website's source code.