Highly motivated researcher and software engineer with a PhD in Physical and Mathematical Sciences and 10+ years of experience in academic research, teaching, and full-stack development. Senior Lecturer at Kazan Federal University teaching ML, NLP, and time series forecasting. Research focuses on differential equations, dynamical systems, and applied machine learning. Published in peer-reviewed journals and conferences. Supervised 25+ Master's projects in AI.
EDUCATION
PhD (Candidate of Sciences) in Physical and Mathematical Sciences, Institute of Mathematics, Academy of Sciences of Tajikistan (2012–2016).
Specialist Degree in Computer Science & Software Engineering, Tajik National University (2007–2012).
RESEARCH, TEACHING, or OTHER INTERESTS
Applied Mathematics, Artificial Intelligence, Theoretical Computer Science, General Mathematics
Scopus Publications: 7
Scholar Citations: 87
Scholar h-index: 5
SCOPUS PUBLICATIONS
Character-Level Transformer for Tajik-Persian Transliteration with a Parallel Lexical Corpus Arabov Mullosharaf Kurbonovich EACL 2026, 19th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2026), 2026 This study addresses automatic transliteration from Tajik (Cyrillic script) to Persian (Perso-Arabic script). We present a curated, lexicographically verified parallel corpus of 52,152 Tajik-Persian words and short phrases, compiled from printed dictionaries, encyclopedic sources, and manually verified online resources. To the best of our knowledge, this is one of the largest publicly available word-level corpora for Tajik-Persian transliteration. Using this corpus, we train a character-level sequence-to-sequence Transformer model and evaluate it using Character Error Rate (CER) and exact-match accuracy. The Transformer achieves a CER of 0.3216 and an exact-match accuracy of 0.3133, outperforming both dictionary-based rule-based and recurrent neural baselines. With beam search (k = 3), performance further improves to CER 0.3182 and accuracy 0.3215. We describe the data collection and preprocessing pipeline, model architecture, and experimental protocol, and report a part-of-speech analysis showing performance differences across lexical categories. All preprocessing scripts, deterministic splits into training, validation, and test sets, and training configurations are released to support reproducibility and further research on Tajik and related Persian dialects. The corpus supports research in character-level transliteration, cross-script NLP, and lexicographic applications.
Identification of the Original Author of a Social Media Post Based on Text Analysis, Time Dependencies, and the Structure of Reposts Using Combined Neural Networks Mullosharaf Arabov, Adelya Shaydullina Proceedings of the 2025 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM 2025), 2025 This article examines the task of identifying the original author of a post on social networks by analyzing the textual content and structure of reposts. A mathematical model was proposed that describes the relationship between the original posts and their reposts, and an algorithm based on a combination of recurrent neural networks (RNN), convolutional neural networks (CNN) and graph neural networks (GNN) was presented. To train and evaluate the model, a synthetic dataset was created containing 50,000 records generated using the Faker library. The developed model has shown high accuracy (RNN: 1.000, CNN: 0.9180, GNN: 0.9080) in identifying authors and can be used to analyze the dissemination of information on social networks, as well as to counter misinformation.
Comparative Analysis of Intelligent Methods for Automatic Anomaly Detection in Industrial and Distributed Systems Based on Machine Learning and Deep Learning Algorithms M. K. Arabov, R. A. Burnashev, O. A. Medvedeva RusAutoCon, Proceedings of the International Russian Automation Conference, 2025 In modern distributed and industrial systems, an important task is automatic anomaly detection based on system logs. This study compares approaches to model building using machine learning and deep learning techniques. The research is conducted on real-world data consisting of event sequences from various types of information systems, including logs from computing clusters, cloud platforms, and distributed storage systems. Unlike common solutions based on LSTM/GRU networks, a Transformer-based model is proposed that can capture long-term dependencies between events and offers improved interpretability. An evaluation of method performance was carried out using accuracy, recall, and F1-score metrics. The results demonstrate the superiority of the Transformer architecture, especially in handling long sequences and under conditions of limited temporal delay information. The models developed in this work can be applied in the design of automated monitoring systems for industrial information infrastructure.
Comparative Analysis of FCN and U-Net for Retinal Blood Vessels Segmentation: A Performance Evaluation Anis Amirouche, M. K. Arabov RusAutoCon, Proceedings of the International Russian Automation Conference, 2024 Accurate retinal blood vessel segmentation is essential for early detection and diagnosis of several diseases. This study presents a comparative analysis between U-Net and FCN for retinal blood vessel segmentation. The aim is to assess their performance across four key metrics: sensitivity, accuracy, precision, and specificity, alongside training time. The analysis revealed differences in performance between the two models in terms of segmentation quality and efficiency. U-Net consistently outperforms FCN in terms of accuracy and precision, showcasing its detailed understanding of vessel characteristics and superior segmentation performance. However, FCN demonstrates competitive accuracy and specificity, and shorter training time, suggesting its utility in situations prioritizing computational efficiency.
Comparative Analysis of Ensemble and Linear Machine Learning Models in the Task of House Price Prediction Ihcene Zitoune, M. K. Arabov RusAutoCon, Proceedings of the International Russian Automation Conference, 2024 In today's evolving real estate market, accurately predicting real estate prices is important given the crucial role real estate plays in any country's economy. With numerous methods and techniques available, navigating the complexities of data types and the multitude of factors influencing prices requires careful consideration. This research paper presents a comparative analysis of ensemble and linear machine learning models in predicting real estate prices using a wide range of predictive models, including linear regression, ridge regression, lasso regression, random forest, decision trees, XGBoost and LightGBM. The study evaluates their effectiveness using metrics such as MAE (mean absolute error), RMSE (root mean squared error), RMSLE (root mean squared logarithmic error) and R² (coefficient of determination), and also compares their training time. The experimental results showed that ensemble models outperformed linear models at the cost of longer training time. These results can be useful both for researchers in the field of real estate and finance, as well as for investors interested in predicting and optimizing real estate investment strategies in a volatile market.
Algorithm Application of Machine Learning Algorithms to Predict Energy Demand M. K. Arabov, A. F. Nazipova, R. A. Burnashev Proceedings of the 2024 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM 2024), 2024 This article presents an analysis of the application of machine learning algorithms to predict energy demand based on temperature and humidity data. Extensive research has been conducted on the effectiveness and accuracy of various machine learning methods in the context of energy demand forecasting, including the support vector machine (SVM) method with various kernels (e.g., ‘rbf’ and ‘poly’) and RandomForestRegressor. The article also discusses in detail the selection of the optimal model and parameter optimization to achieve the best forecasting results for energy demand. The presented research holds significant practical importance for advancing the field of energy resource management, enhancing their utilization efficiency, and optimizing decision-making processes. This enables the identification of the most effective and accurate approaches to forecast energy demand across diverse conditions and scenarios.
Existence tests for limiting cycles of second order differential equations Mullosharaf Kurbonovich Arabov, Ergashboy Mirzoevich Muhamadiev, Iskhokboi Dzhumaevich Nurov, Khurshed Ilkhomiddinovich Sobirov Ufa Mathematical Journal, 2017 This work is devoted to finding limiting cycles in the vicinity of equilibria of second order nonlinear differential equations. We obtain new conditions for the coefficients of the equations ensuring the existence of a limiting cycle by employing the methods of qualitative analysis and computer modeling. We study the behavior of a singular point under variation of the parameters and we apply the Lyapunov stability theory. On the basis of the obtained results, we make a sector partition of the plane. This partition allows us to predict the behavior of the solutions in various parts of the plane. We develop a package of computer programs for constructing a phase portrait in the corresponding domains.
RECENT SCHOLAR PUBLICATIONS
Tatarstan Toponyms: A Bilingual Dataset and Hybrid RAG System for Geospatial Question Answering MK Arabov arXiv preprint arXiv:2605.05962, 2026
Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir MK Arabov, SS Khaybullina arXiv preprint arXiv:2605.04948, 2026 Citations: 1
TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script) MK Arabov arXiv preprint arXiv:2605.04583, 2026
Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus MK Arabov arXiv preprint arXiv:2605.04576, 2026
Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF MK Arabov arXiv preprint arXiv:2605.03799, 2026
Benchmarking Parameter-Efficient Fine-Tuning of Large Language Models for Low-Resource Tajik Text Generation with the Tajik Web Corpus MK Arabov arXiv preprint arXiv:2605.03742, 2026
A Systematic Benchmark of Machine Transliteration Models for the Tajik-Farsi Language Pair: A Comparative Study from Rule-Based to Transformer Architectures MK Arabov arXiv preprint arXiv:2605.02270, 2026 Citations: 1
Анализ эффективности субсловных токенизаторов в малоресурсной лингвистической среде: опыт реализации на таджикском языке МК Арабов, СС Хайбуллина Электронные библиотеки 29 (2), 546-564, 2026
Hybrid Residual Fuzzy Time Series Method and Comparison with Modern Machine Learning Algorithms for Global Temperature Forecasting M Arabov, R Burnashev, D Lenchenko 2026 International Russian Smart Industry Conference (SmartIndustryCon), 68-73, 2026
Modelling Subword Embeddings of Low-Resource Languages for the Digitalisation of Industry 4.0 Industrial Systems MK Arabov, RN Gainullin, AI Khusaenov 2026 International Russian Smart Industry Conference (SmartIndustryCon), 74-79, 2026
Программа для ЭВМ «ML Pipeline PRO Suite» МК Арабов, НИ Фатхуллоев RU Patent 2,026,617,263, 2026
Программа для ЭВМ «TimeFlow Pro» МК Арабов, НИ Фатхуллоев RU Patent 2,026,616,961, 2026
The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family R Merchant, K Megerdoomian The Proceedings of the First Workshop on NLP and LLMs for the Iranian …, 2026
TajPersLexon: A Tajik–Persian lexical resource and hybrid model for cross-script low-resource NLP MK Arabov The Proceedings of the First Workshop on NLP and LLMs for the Iranian …, 2026 Citations: 5
Character-level transformer for Tajik–Persian transliteration with a parallel lexical corpus AM Kurbonovich Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, 75-83, 2026 Citations: 5
Суверенитет данных как основа цифрового будущего Республики Таджикистан в эпоху LLM: оценка возможностей, рисков и путей построения национальной экосистемы МК Арабов Развитие искусственного интеллекта в условиях информатизации общества …, 2026
Программа для ЭВМ «PectinProductionPredictor» МК Арабов, ШЁ Холов RU Patent 2,025,697,729, 2026
Программа для ЭВМ «TatarTokenizers» МК Арабов RU Patent 2,026,611,049, 2026
Программа для ЭВМ «Tatar2Vec» МК Арабов RU Patent 2,026,610,619, 2026
MOST CITED SCHOLAR PUBLICATIONS
Comparative analysis of ensemble and linear machine learning models in the task of house price prediction I Zitoune, MK Arabov 2024 International Russian Automation Conference (RusAutoCon), 50-55, 2024 Citations: 9
Анализ устойчивости особой точки квазилинейного уравнения второго порядка МК Арабов Известия Академии наук Республики Таджикистан. Отделение физико …, 2015 Citations: 7
TajPersLexon: A Tajik–Persian lexical resource and hybrid model for cross-script low-resource NLP MK Arabov The Proceedings of the First Workshop on NLP and LLMs for the Iranian …, 2026 Citations: 5
Character-level transformer for Tajik–Persian transliteration with a parallel lexical corpus AM Kurbonovich Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, 75-83, 2026 Citations: 5
Algorithm application of machine learning algorithms to predict energy demand MK Arabov, AF Nazipova, RA Burnashev 2024 International Conference on Industrial Engineering, Applications and …, 2024 Citations: 5
Компьютерная визуализация поведения решений негладкой динамической системы МК Арабов, АМ Гулов, ИД Нуров Доклады Академии наук Республики Таджикистан 57 (9-10), 739-745, 2014 Citations: 5
Creating a multiformat text corpus for the Tajik language to train modern language models KKK Arabov M.K., Makhmadaliev Kh.S. Science and Innovation. Series of Geological and Technical Sciences, 131-136, 2025 Citations: 3
Роҷеъ ба зарурати ҷорӣ намудани усулҳои муосири таълими веб-барномасозӣ бо донишҷӯёни ихтисосҳои IT МҚ Арабов, АЭ Сатторов, КҚ Ҳабибуллозода Вестник Бохтарского государственного университета имени Носира Хусрава …, 2022 Citations: 3
Package of programs on automation of hydraulic accounting of hydraulic structures and determination of the amount of material consumption. Certification about registration of … NI Fathulloev, MK Arabov From, 2020 Citations: 3
О методики обучения ООП на основе графических возможностей языка C++ ХК Хабибулло, МК Арабов, АЭ Сатторов Вестник педагогического университета (Серия 2: Педагогики и психологии …, 2020 Citations: 3
Existence tests for limiting cycles of second order differential equations M Arabov Ufa Mathematical Journal, 2017 Citations: 3
Признаки бифуркации Андронова-Хопфа для динамических систем, содержащих негладкие нелинейности МГ Юмагулов, МК Арабов Известия Академии наук Республики Таджикистан. Отделение физико …, 2016 Citations: 3
Identification of the Original Author of a Social Media Post Based on Text Analysis, Time Dependencies, and the Structure of Reposts Using Combined Neural Networks M Arabov, A Shaydullina 2025 International Conference on Industrial Engineering, Applications and …, 2025 Citations: 2
Comparative analysis of methods for modelling semantic word representations under low-resource language conditions: The case of Tajik MK Arabov, V Sedykh Scientific and Technical Bulletin of the Volga Region 6, 196-198, 2025 Citations: 2
Developing the Tajik language in the era of large language models: Corpus infrastructure, linguistic challenges, and safety alignment MK Arabov Modern Science, 85-93, 2025 Citations: 2
Comparative Analysis of FCN and U-Net for Retinal Blood Vessels Segmentation: A Performance Evaluation A Amirouche, MK Arabov 2024 International Russian Automation Conference (RusAutoCon), 89-94, 2024 Citations: 2
"Comparative Analysis of Ensemble and Linear Machine Learning Models in the Task of House Price Prediction", 2024 International Russian Automation Conference (RusAutoCon … I Zitoune, MK Arabov 2024 Citations: 2
Анализ локальных бифуркаций динамических систем, содержащих негладкие нелинейности МК Арабов Вестник Таджикского национального университета. Серия естественных наук, 45-48, 2015 Citations: 2