Abror Shopulatov

I'm a Machine Learning Engineer at tilmoch.ai, where I mostly work on NLP applications for Turkic languages. There, I've worked on Tahrirchi Editor, tilmoch.ai Translator and Misollar.

We also contribute to the open-source Turkic NLP community with the biggest Uzbek text corpora (UzBooks and UzCrawl), the SOTA Uzbek encoder models (tahrirchi-bert-base and tahrirchi-bert-small), and Karakalpak and Southern Uzbek machine translation models and datasets.

At the same time, I am pursuing a BSc in AI at MBZUAI, where I am digging deeper into the fundamentals of AI engineering.

Email / CV / Scholar / Github / Huggingface

Research

I'm interested in NLP, Grammatical Error Correction and Machine Translation. Most of my research is on Uzbek and other Turkic languages.

Filling the Gap for Uzbek: Creating Translation Resources for Southern Uzbek
Mukhammadsaid Mamasaidov, Azizullah Aral, Abror Shopulatov, Mironshoh Inomjonov
WMT, 2025
arXiv / model / dataset / code / bibtex
Using modern machine translation techniques, we collected a dataset and trained an MT model for Southern Uzbek, a language with around 5 million speakers.

UzLiB - Uzbek Linguistic Benchmark
Abror Shopulatov
In draft, 2025
blogpost (in Uzbek) / blogpost (in English) / benchmark / code / bibtex
I created UzLiB, a comprehensive linguistic benchmark for evaluating LLMs on Uzbek. The results revealed that even top models fail to surpass 70% accuracy, highlighting a key gap in their capabilities.

Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak
Mukhammadsaid Mamasaidov, Abror Shopulatov
WMT, 2024
paper / arXiv / models / dataset / bibtex
We translated the FLORES+ dev set into Karakalpak. We then collected 100,000 pairs involving Karakalpak and fine-tuned a machine translation model to improve upon existing baselines.

Grammatical Error Correction (GEC) for Agglutinative Languages: Case of Uzbek
Mukhammadsaid Mamasaidov, Abror Shopulatov
In draft, 2023
project page (in Uzbek) / demo
We developed a hybrid GEC model for Uzbek by integrating a rule-based morphological analyzer with a neural network, significantly improving its ability to handle complex agglutinative grammar.

UzBooks and UzCrawl: the biggest open-sourced Uzbek corpora
Mukhammadsaid Mamasaidov, Abror Shopulatov
2023
UzBooks / UzCrawl
We built and released UzBooks and UzCrawl by processing over 35,000 books and web data. At 36 GB, the resulting collection is now the largest publicly available, high-quality text corpus for the Uzbek language.

Miscellanea

Sports	I am an avid football (the real one) and table tennis player.
Blogging	I enjoy translating popular Machine Learning blog posts into Uzbek. You can find my translations on my Substack.
Tutoring	I have tutored 4th-grade students in math, preparing them for the entrance exams of a prestigious school in Uzbekistan.

Feel free to steal this website's source code.
