Arabov Mulosharaf Kurbanovich

@kpfu.ru

Department of Data Analysis and Software Technology, Kazan Federal University

Highly motivated researcher and software engineer with a PhD in Physical and Mathematical Sciences and 10+ years of experience in academic research, teaching, and full-stack development. Senior Lecturer at Kazan Federal University teaching ML, NLP, and time series forecasting. Research focuses on differential equations, dynamical systems, and applied machine learning. Published in peer-reviewed journals and conferences. Supervised 25+ Master's projects in AI.

EDUCATION

PhD (Candidate of Sciences) in Physical and Mathematical Sciences, Institute of Mathematics, Academy of Sciences of Tajikistan (2012–2016).
Specialist Degree in Computer Science & Software Engineering, Tajik National University (2007–2012).

RESEARCH, TEACHING, or OTHER INTERESTS

Applied Mathematics, Artificial Intelligence, Theoretical Computer Science, General Mathematics

Scopus Publications: 7
Scholar Citations: 87
Scholar h-index: 5

Scopus Publications

  • Character-Level Transformer for Tajik-Persian Transliteration with a Parallel Lexical Corpus
    Arabov Mullosharaf Kurbonovich
    EACL 2026: 19th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2026), 2026
    This study addresses automatic transliteration from Tajik (Cyrillic script) to Persian (Perso-Arabic script). We present a curated, lexicographically verified parallel corpus of 52,152 Tajik-Persian words and short phrases, compiled from printed dictionaries, encyclopedic sources, and manually verified online resources. To the best of our knowledge, this is one of the largest publicly available word-level corpora for Tajik-Persian transliteration. Using this corpus, we train a character-level sequence-to-sequence Transformer model and evaluate it using Character Error Rate (CER) and exact-match accuracy. The Transformer achieves a CER of 0.3216 and an exact-match accuracy of 0.3133, outperforming both dictionary-based rule-based and recurrent neural baselines. With beam search (k = 3), performance further improves to CER 0.3182 and accuracy 0.3215. We describe the data collection and preprocessing pipeline, model architecture, and experimental protocol, and report a part-of-speech analysis showing performance differences across lexical categories. All preprocessing scripts, deterministic splits into training, validation, and test sets, and training configurations are released to support reproducibility and further research on Tajik and related Persian dialects. The corpus supports research in character-level transliteration, cross-script NLP, and lexicographic applications.
  • Identification of the Original Author of a Social Media Post Based on Text Analysis, Time Dependencies, and the Structure of Reposts Using Combined Neural Networks
    Mullosharaf Arabov, Adelya Shaydullina
    Proceedings of the 2025 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM 2025), 2025
    This article examines the task of identifying the original author of a post on social networks by analyzing the textual content and structure of reposts. A mathematical model is proposed that describes the relationship between the original posts and their reposts, and an algorithm based on a combination of recurrent neural networks (RNN), convolutional neural networks (CNN) and graph neural networks (GNN) is presented. To train and evaluate the model, a synthetic dataset was created containing 50,000 records generated using the Faker library. The developed model has shown high accuracy (RNN: 1.000, CNN: 0.9180, GNN: 0.9080) in identifying authors and can be used to analyze the dissemination of information on social networks, as well as to counter misinformation.
  • Comparative Analysis of Intelligent Methods for Automatic Anomaly Detection in Industrial and Distributed Systems Based on Machine Learning and Deep Learning Algorithms
    M. K. Arabov, R. A. Burnashev, O. A. Medvedeva
    RusAutoCon: Proceedings of the International Russian Automation Conference, 2025
    In modern distributed and industrial systems, automatic anomaly detection based on system logs remains a critical challenge. This study compares approaches to model building using machine learning and deep learning techniques. The research is conducted on real-world data consisting of event sequences from various types of information systems, including logs from computing clusters, cloud platforms, and distributed storage systems. Unlike common solutions based on LSTM/GRU networks, a Transformer-based model is proposed that can capture long-term dependencies between events and offers improved interpretability. Method performance was evaluated using accuracy, recall, and F1-score metrics. The results demonstrate the superiority of the Transformer architecture, especially in handling long sequences and under conditions of limited temporal delay information. The models developed in this work can be applied in the design of automated monitoring systems for industrial information infrastructure.
  • Comparative Analysis of FCN and U-Net for Retinal Blood Vessels Segmentation: A Performance Evaluation
    Anis Amirouche, M. K. Arabov
    RusAutoCon: Proceedings of the International Russian Automation Conference, 2024
    Accurate retinal blood vessels segmentation is essential for early detection and diagnosis of several diseases. This study presents a comparative analysis between U-Net and FCN for retinal blood vessel segmentation. The aim is to assess their performance across four key metrics: sensitivity, accuracy, precision, and specificity, alongside training time. The analysis revealed differences in performance between the two models in terms of segmentation quality and efficiency. U-Net consistently outperforms FCN in terms of accuracy and precision, showcasing its detailed understanding of vessel characteristics and superior segmentation performance. However, FCN demonstrates competitive accuracy and specificity, and shorter training time, suggesting its utility in situations prioritizing computational efficiency.
  • Comparative Analysis of Ensemble and Linear Machine Learning Models in the Task of House Price Prediction
    Ihcene Zitoune, M. K. Arabov
    RusAutoCon: Proceedings of the International Russian Automation Conference, 2024
    In today's evolving real estate market, accurately predicting real estate prices is important given the crucial role real estate plays in any country's economy. With numerous methods and techniques available, navigating the complexities of data types and the multitude of factors influencing prices requires careful consideration. This research paper presents a comparative analysis of ensemble and linear machine learning models in predicting real estate prices using a wide range of predictive models, including linear regression, ridge regression, lasso regression, random forest, decision trees, XGBoost and LightGBM. The study evaluates their effectiveness using metrics such as MAE (mean absolute error), RMSE (RMS error), RMSLE (RMS logarithmic error) and R2 (coefficient of determination) and evaluates their training time. The experimental results showed that ensemble models outperformed linear models with a higher training time. These results can be useful both for researchers in the field of real estate and finance, as well as for investors interested in predicting and optimizing real estate investment strategies in a volatile market.
  • Algorithm Application of Machine Learning Algorithms to Predict Energy Demand
    M. K. Arabov, A. F. Nazipova, R. A. Burnashev
    Proceedings of the 2024 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM 2024), 2024
    This article presents an analysis of the application of machine learning algorithms to predict energy demand based on temperature and humidity data. Extensive research has been conducted on the effectiveness and accuracy of various machine learning methods in the context of energy demand forecasting, including the support vector machine (SVM) method with various kernels (e.g., ‘rbf’ and ‘poly’) and RandomForestRegressor. The article also discusses in detail the selection of the optimal model and parameter optimization to achieve the best forecasting results for energy demand. The presented research holds significant practical importance for advancing the field of energy resource management, enhancing their utilization efficiency, and optimizing decision-making processes. This enables the identification of the most effective and accurate approaches to forecast energy demand across diverse conditions and scenarios.
  • Existence tests for limiting cycles of second order differential equations
    Mullosharaf Kurbonovich Arabov, Ergashboy Mirzoevich Muhamadiev, Iskhokboi Dzhumaevich Nurov, Khurshed Ilkhomiddinovich Sobirov
    Ufa Mathematical Journal, 2017
    This work is devoted to finding limiting cycles in the vicinity of equilibria of second order nonlinear differential equations. We obtain new conditions on the coefficients of the equations that ensure the existence of a limiting cycle by employing the methods of qualitative analysis and computer modeling. We study the behavior of a singular point under variation of the parameters and apply the Lyapunov stability theory. Based on the obtained results, we partition the plane into sectors. This partition allows us to predict the behavior of the solutions in various parts of the plane. We develop a package of computer programs for constructing a phase portrait in the corresponding domains.
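
The idea of constructing a phase portrait to detect a limiting cycle, mentioned in the last abstract above, can be illustrated with a minimal sketch. It does not reproduce the authors' equations or software: it assumes the classical van der Pol oscillator as a stand-in second order nonlinear equation and a hand-written fixed-step Runge-Kutta integrator. The function names, step size, and initial points are all illustrative choices.

```python
def van_der_pol(x, y, mu=1.0):
    """Van der Pol equation x'' - mu*(1 - x^2)*x' + x = 0 as a first-order system.

    This equation is an assumed example, not the one studied in the paper.
    """
    return y, mu * (1.0 - x * x) * y - x


def rk4_step(f, x, y, h):
    """One classical fourth-order Runge-Kutta step for a planar system."""
    k1x, k1y = f(x, y)
    k2x, k2y = f(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = f(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = f(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0)


def trajectory(x0, y0, h=0.01, steps=10000):
    """Integrate from (x0, y0) and return the list of phase-plane points."""
    points = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        x, y = rk4_step(van_der_pol, x, y, h)
        points.append((x, y))
    return points


def late_amplitude(points, tail=1000):
    """Largest |x| over the final `tail` samples (longer than one period here)."""
    return max(abs(x) for x, _ in points[-tail:])


# One trajectory starts close to the equilibrium, the other far outside.
inner = trajectory(0.1, 0.0)
outer = trajectory(4.0, 0.0)

print("late amplitude, inner start:", late_amplitude(inner))
print("late amplitude, outer start:", late_amplitude(outer))
```

If trajectories started inside and outside settle onto the same closed orbit, as both do here with an amplitude close to 2, that is numerical evidence of a stable limiting cycle around the unstable equilibrium at the origin. Plotting the returned points in the (x, y) plane gives the phase portrait.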

RECENT SCHOLAR PUBLICATIONS

  • Tatarstan Toponyms: A Bilingual Dataset and Hybrid RAG System for Geospatial Question Answering
    MK Arabov
    arXiv preprint arXiv:2605.05962 , 2026
    2026
  • Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir
    MK Arabov, SS Khaybullina
    arXiv preprint arXiv:2605.04948 , 2026
    2026
    Citations: 1
  • TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)
    MK Arabov
    arXiv preprint arXiv:2605.04583 , 2026
    2026
  • Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus
    MK Arabov
    arXiv preprint arXiv:2605.04576 , 2026
    2026
  • Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF
    MK Arabov
    arXiv preprint arXiv:2605.03799 , 2026
    2026
  • Benchmarking Parameter-Efficient Fine-Tuning of Large Language Models for Low-Resource Tajik Text Generation with the Tajik Web Corpus
    MK Arabov
    arXiv preprint arXiv:2605.03742 , 2026
    2026
  • A Systematic Benchmark of Machine Transliteration Models for the Tajik-Farsi Language Pair: A Comparative Study from Rule-Based to Transformer Architectures
    MK Arabov
    arXiv preprint arXiv:2605.02270 , 2026
    2026
    Citations: 1
  • Analysis of the effectiveness of subword tokenizers in a low-resource linguistic environment: implementation experience with the Tajik language
    MK Arabov, SS Khaybullina
    Электронные библиотеки 29 (2), 546-564 , 2026
    2026
  • Hybrid Residual Fuzzy Time Series Method and Comparison with Modern Machine Learning Algorithms for Global Temperature Forecasting
    M Arabov, R Burnashev, D Lenchenko
    2026 International Russian Smart Industry Conference (SmartIndustryCon), 68-73 , 2026
    2026
  • Modelling Subword Embeddings of Low-Resource Languages for the Digitalisation of Industry 4.0 Industrial Systems
    MK Arabov, RN Gainullin, AI Khusaenov
    2026 International Russian Smart Industry Conference (SmartIndustryCon), 74-79 , 2026
    2026
  • Computer program "ML Pipeline PRO Suite"
    MK Arabov, NI Fathulloev
    RU Patent 2,026,617,263 , 2026
    2026
  • Software suite "TatCorp-222M"
    MK Arabov, RN Gainullin
    RU Patent 2,026,617,224 , 2026
    2026
  • Computer program "TimeFlow Pro"
    MK Arabov, NI Fathulloev
    RU Patent 2,026,616,961 , 2026
    2026
  • The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
    R Merchant, K Megerdoomian
    The Proceedings of the First Workshop on NLP and LLMs for the Iranian … , 2026
    2026
  • TajPersLexon: A Tajik–Persian lexical resource and hybrid model for cross-script low-resource NLP
    MK Arabov
    The Proceedings of the First Workshop on NLP and LLMs for the Iranian … , 2026
    2026
    Citations: 5
  • Character-level transformer for Tajik–Persian transliteration with a parallel lexical corpus
    AM Kurbonovich
    Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, 75-83 , 2026
    2026
    Citations: 5
  • Data sovereignty as the foundation of the digital future of the Republic of Tajikistan in the LLM era: an assessment of opportunities, risks, and paths to building a national ecosystem
    MK Arabov
    Развитие искусственного интеллекта в условиях информатизации общества … , 2026
    2026
  • Computer program "PectinProductionPredictor"
    MK Arabov, ShYo Kholov
    RU Patent 2,025,697,729 , 2026
    2026
  • Computer program "TatarTokenizers"
    MK Arabov
    RU Patent 2,026,611,049 , 2026
    2026
  • Computer program "Tatar2Vec"
    MK Arabov
    RU Patent 2,026,610,619 , 2026
    2026

MOST CITED SCHOLAR PUBLICATIONS

  • Comparative analysis of ensemble and linear machine learning models in the task of house price prediction
    I Zitoune, MK Arabov
    2024 International Russian Automation Conference (RusAutoCon), 50-55 , 2024
    2024
    Citations: 9
  • Stability analysis of a singular point of a second-order quasilinear equation
    MK Arabov
    Известия Академии наук Республики Таджикистан. Отделение физико … , 2015
    2015
    Citations: 7
  • TajPersLexon: A Tajik–Persian lexical resource and hybrid model for cross-script low-resource NLP
    MK Arabov
    The Proceedings of the First Workshop on NLP and LLMs for the Iranian … , 2026
    2026
    Citations: 5
  • Character-level transformer for Tajik–Persian transliteration with a parallel lexical corpus
    AM Kurbonovich
    Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, 75-83 , 2026
    2026
    Citations: 5
  • Algorithm application of machine learning algorithms to predict energy demand
    MK Arabov, AF Nazipova, RA Burnashev
    2024 International Conference on Industrial Engineering, Applications and … , 2024
    2024
    Citations: 5
  • Computer visualization of the behavior of solutions of a nonsmooth dynamical system
    MK Arabov, AM Gulov, ID Nurov
    Доклады Академии наук Республики Таджикистан 57 (9-10), 739-745 , 2014
    2014
    Citations: 5
  • On the use of modern technologies and programming in various branches of mathematics at higher education institutions of the Republic of Tajikistan
    MQ Arabov, MSh Khalilova
    ПАЁМИ, 277 , 1990
    1990
    Citations: 5
  • Creating a multiformat text corpus for the Tajik language to train modern language models
    KKK Arabov M.K., Makhmadaliev Kh.S.
    Science and Innovation. Series of Geological and Technical Sciences, 131-136 , 2025
    2025
    Citations: 3
  • On the need to introduce modern methods of teaching web programming to students in IT specialties
    MQ Arabov, AE Sattorov, KQ Habibullozoda
    Вестник Бохтарского государственного университета имени Носира Хусрава … , 2022
    2022
    Citations: 3
  • Some fundamental issues in the methodology of teaching informatics in the lower grades of general secondary education institutions of the Republic of Tajikistan
    МСС Арабов М.Қ., Розиқов П.Ш.
    Паёми Пажӯҳишгоҳи рушди маориф, 183-188 , 2021
    2021
    Citations: 3
  • Package of programs on automation of hydraulic accounting of hydraulic structures and determination of the amount of material consumption. Certification about registration of …
    NI Fathulloev, MK Arabov
    From , 2020
    2020
    Citations: 3
  • On a methodology for teaching OOP based on the graphics capabilities of the C++ language
    KhK Khabibullo, MK Arabov, AE Sattorov
    Вестник педагогического университета (Серия 2: Педагогики и психологии … , 2020
    2020
    Citations: 3
  • Existence tests for limiting cycles of second order differential equations
    M Arabov
    Ufa Mathematical Journal , 2017
    2017
    Citations: 3
  • Criteria for the Andronov-Hopf bifurcation in dynamical systems containing nonsmooth nonlinearities
    MG Yumagulov, MK Arabov
    Известия Академии наук Республики Таджикистан. Отделение физико … , 2016
    2016
    Citations: 3
  • Identification of the Original Author of a Social Media Post Based on Text Analysis, Time Dependencies, and the Structure of Reposts Using Combined Neural Networks
    M Arabov, A Shaydullina
    2025 International Conference on Industrial Engineering, Applications and … , 2025
    2025
    Citations: 2
  • Comparative analysis of methods for modelling semantic word representations under low-resource language conditions: The case of Tajik
    MK Arabov, V Sedykh
    Scientific and Technical Bulletin of the Volga Region 6, 196-198 , 2025
    2025
    Citations: 2
  • Developing the Tajik language in the era of large language models: Corpus infrastructure, linguistic challenges, and safety alignment
    MK Arabov
    Modern Science, 85-93 , 2025
    2025
    Citations: 2
  • Comparative Analysis of FCN and U-Net for Retinal Blood Vessels Segmentation: A Performance Evaluation
    A Amirouche, MK Arabov
    2024 International Russian Automation Conference (RusAutoCon), 89-94 , 2024
    2024
    Citations: 2
  • "Comparative Analysis of Ensemble and Linear Machine Learning Models in the Task of House Price Prediction", 2024 International Russian Automation Conference (RusAutoCon …
    I Zitoune, MK Arabov
    2024
    Citations: 2
  • Analysis of local bifurcations of dynamical systems containing nonsmooth nonlinearities
    MK Arabov
    Вестник Таджикского национального университета. Серия естественных наук, 45-48 , 2015
    2015
    Citations: 2