Using modern machine translation techniques, we collected a dataset and trained an MT model for Southern Uzbek, a language with around 5 million speakers.
I created UzLiB, a comprehensive linguistic benchmark to evaluate LLMs on Uzbek. The results revealed that even top models fail to surpass 70% accuracy, highlighting a key gap in their capabilities.
We translated the FLORES+ dev set into Karakalpak. We then collected 100,000 pairs involving Karakalpak to fine-tune a machine translation model and improve upon existing baselines.
We developed a hybrid grammatical error correction (GEC) model for Uzbek by integrating a rule-based morphological analyzer with a neural network, significantly improving its ability to handle the language's complex agglutinative grammar.
Miscellanea
Sports
I am an avid football (the real one) and table tennis player.
Blogging
I enjoy translating popular Machine Learning blog posts into Uzbek. You can find my translations on my Substack.
Tutoring
I have experience as a math tutor for 4th-grade students, preparing them for the entrance exams of a prestigious school in Uzbekistan.