Abror Shopulatov

I'm a Machine Learning Engineer at tilmoch.ai, where I mostly work on NLP applications for Turkic languages. There, I've worked on Tahrirchi Editor, tilmoch.ai Translator, and Misollar.

We also contribute to the open-source Turkic NLP community with the biggest Uzbek text corpora (UzBooks and UzCrawl), the SOTA Uzbek encoder models (tahrirchi-bert-base and tahrirchi-bert-small), and Machine Translation models and datasets for Karakalpak and Southern Uzbek.

At the same time, I am pursuing a BSc in AI at MBZUAI, where I am digging deeper into the fundamentals of AI Engineering.

Email  /  CV  /  Scholar  /  Github  /  Huggingface


Research

I'm interested in NLP, Grammatical Error Correction and Machine Translation. Most of my research is on Uzbek and other Turkic languages.

Filling the Gap for Uzbek: Creating Translation Resources for Southern Uzbek
Mukhammadsaid Mamasaidov, Azizullah Aral, Abror Shopulatov, Mironshoh Inomjonov
WMT, 2025
arXiv / model / dataset / code / bibtex

Using modern techniques from Machine Translation research, we collected a dataset and trained an MT model for Southern Uzbek, a language with around 5 million speakers.

UzLiB - Uzbek Linguistic Benchmark
Abror Shopulatov
In draft, 2025
blogpost (in Uzbek) / blogpost (in English) / benchmark / code / bibtex

I created UzLiB, a comprehensive linguistic benchmark to evaluate LLMs on Uzbek. The results revealed that even top models fail to surpass 70% accuracy, highlighting a key gap in their capabilities.

Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak
Mukhammadsaid Mamasaidov, Abror Shopulatov
WMT, 2024
paper / arXiv / models / dataset / bibtex

We translated the FLORES+ dev set into Karakalpak. We then collected 100,000 pairs involving Karakalpak and used them to fine-tune a machine translation model to improve upon existing baselines.

Grammatical Error Correction (GEC) for Agglutinative Languages: Case of Uzbek
Mukhammadsaid Mamasaidov, Abror Shopulatov
In draft, 2023
project page (in Uzbek) / demo

We developed a hybrid GEC model for Uzbek by integrating a rule-based morphological analyzer with a neural network, significantly improving its ability to handle complex agglutinative grammar.

UzBooks and UzCrawl: the biggest open-sourced Uzbek corpora
Mukhammadsaid Mamasaidov, Abror Shopulatov
2023
UzBooks / UzCrawl

We built and released UzBooks and UzCrawl by processing over 35,000 books and web data. At 36 GB, the collection is now the largest publicly available, high-quality text corpus for the Uzbek language.

Miscellanea

Sports

I am an avid football (the real one) and table tennis player.

Blogging

I enjoy translating popular Machine Learning blog posts into Uzbek. You can find my translations on my Substack.

Tutoring

I have experience as a Math tutor for 4th-grade students, whom I prepared for the entrance exams of a prestigious school in Uzbekistan.

Feel free to steal this website's source code.