Tleubayeva Arailym

@astanait.edu.kz

Department of Computer Engineering
Astana IT University

RESEARCH, TEACHING, or OTHER INTERESTS

Computer Engineering, Computer Science, Artificial Intelligence, Information Systems

8

Scopus Publications

49

Scholar Citations

4

Scholar h-index

1

Scholar i10-index

Scopus Publications

  • Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language
    Arailym Tleubayeva, Zhansaya Makhambetova, Aigerim Mansurova, Adai Shomanov
    Proceedings of the 18th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2025), 2025
  • A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering
    Aigerim Mansurova, Arailym Tleubayeva, Aliya Nugumanova, Adai Shomanov, Sadi Evren Seker
    Information (Switzerland), 2025
    This paper presents a systematic evaluation of large language models (LLMs) and retrieval-augmented generation (RAG) approaches for question answering (QA) in the low-resource Kazakh language. We assess the performance of existing proprietary (GPT-4o, Gemini 2.5-flash) and open-source Kazakh-oriented models (KazLLM-8B, Sherkala-8B, Irbis-7B) across closed-book and RAG settings. Within a three-stage evaluation framework, we benchmark retriever quality, examine LLM abilities such as knowledge-gap detection, external truth integration and context grounding, and measure gains from realistic end-to-end RAG pipelines. Our results show a clear pattern: proprietary models lead in closed-book QA, but RAG narrows the gap substantially. Under the Ideal RAG setting, KazLLM-8B improves from its closed-book baseline of 0.427 to reach answer correctness of 0.867, closely matching GPT-4o’s score of 0.869. In the end-to-end RAG setup, KazLLM-8B paired with the Snowflake retriever achieved answer correctness up to 0.754, surpassing GPT-4o’s best score of 0.632. Despite improvements, RAG outcomes show an inconsistency: high retrieval metrics do not guarantee high QA system accuracy. The findings highlight the importance of retrievers and context grounding strategies in enabling open-source Kazakh models to deliver competitive QA performance in a low-resource setting.
  • Text Similarity Detection in Agglutinative Languages: A Case Study of Kazakh Using Hybrid N-Gram and Semantic Models
    Svitlana Biloshchytska, Arailym Tleubayeva, Oleksandr Kuchanskyi, Andrii Biloshchytskyi, Yurii Andrashko, et al.
    Applied Sciences (Switzerland), 2025
    This study presents an advanced hybrid approach for detecting near-duplicate texts in the Kazakh language, addressing the specific challenges posed by its agglutinative morphology. The proposed method combines statistical and semantic techniques, including N-gram analysis, TF-IDF, LSH, LSA, and LDA, and is benchmarked against the bert-base-multilingual-cased model. Experiments were conducted on the purpose-built Arailym-aitu/KazakhTextDuplicates corpus, which contains over 25,000 manually modified text fragments using typical techniques, such as paraphrasing, word order changes, synonym substitution, and morphological transformations. The results show that the hybrid model achieves a precision of 1.00, a recall of 0.73, and an F1-score of 0.84, significantly outperforming traditional N-gram and TF-IDF approaches and demonstrating comparable accuracy to the BERT model while requiring substantially lower computational resources. The hybrid model proved highly effective in detecting various types of near-duplicate texts, including paraphrased and structurally modified content, making it suitable for practical applications in academic integrity verification, plagiarism detection, and intelligent text analysis. Moreover, this study highlights the potential of lightweight hybrid architectures as a practical alternative to large transformer-based models, particularly for languages with limited annotated corpora and linguistic resources. It lays the foundation for future research in cross-lingual duplicate detection and deep model adaptation for the Kazakh language.
  • Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments
    Arailym Tleubayeva, Sultan Aubakirov, Aisultan Tabuldin, Aday Shomanov
    SIST 2025: 2025 IEEE 5th International Conference on Smart Information Systems and Technologies, Conference Proceedings, 2025
  • Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh
    Arailym Tleubayeva, Aigerim Mansurova, Sultan Aubakirov, Aisultan Tabuldin, Adai Shomanov, et al.
    Proceedings of the 29th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2025 Summer), 2025
  • Effective detection of breast pathology using machine learning methods
    Ainur Orazayeva, Jamalbek Tussupov, Gulmira Shangytbayeva, Assem Galymova, Ulzhalgas Zhunissova, et al.
    International Journal of Electrical and Computer Engineering, 2024
    This work is devoted to the research and development of methods for effectively identifying breast pathologies using modern machine learning technologies, such as You Only Look Once (YOLOv8) and Faster Region-based Convolutional Neural Network (Faster R-CNN). The paper presents an analysis of existing approaches to the diagnosis of breast diseases and an assessment of their effectiveness. YOLOv8 and Faster R-CNN architectures are then applied to create pathology detection models in mammography images. The work analyzed and classified identified breast pathologies at six levels, taking into account different degrees of severity and characteristics of the diseases. This approach allows for more accurate determination of disease progression and provides additional data for more individualized treatment planning. Classification results at various levels can improve the quality of medical decisions and provide more accurate information to doctors, which in turn improves the overall efficiency of diagnosis and treatment of breast diseases. Experimental results demonstrate high accuracy and speed of image processing, providing fast and reliable detection of potential breast pathologies. The data obtained confirm the effectiveness of the use of machine learning algorithms in the field of medical diagnostics, providing prospects for the further development of automated systems for detecting breast diseases in order to improve early diagnosis and treatment efficiency.
  • Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process
    Arailym Tleubayeva, Alina Mitroshina, Alpar Arman, Arystan Shokan, Aigul Shaikhanova
    SIST 2024: 2024 IEEE 4th International Conference on Smart Information Systems and Technologies, Proceedings, 2024
    Implementation of artificial intelligence (AI) has transformative potential for various sectors, including higher education. This study is focused on an AI system's effective development and accuracy in streamlining university admissions. The focal point is the AITU Admissions Advisor, an AI solution crafted to navigate the complexities of the admissions process. The study examines the problem of operational inefficiencies and inaccuracies that plague traditional admissions methods, and it positions the AI system as a remedy by offering automation and intelligent decision-making capabilities. The essence of the findings derives from a methodical evaluation of the AITU system against conventional practices, revealing its enhanced efficiency and precision in handling admissions procedures. Distinguishing features of these results include the system's adept use of natural language processing (NLP), sophisticated machine learning models, and a dynamic feedback system that collectively elevate its performance metrics. These technological strides underscore the system's reliability and responsiveness to the nuanced needs of applicants and administrators alike. The paper concludes that for practical implementation, seamless integration with existing university infrastructures, thorough staff training, and continuous system monitoring are imperative. This study provides a blueprint for the application of AI in higher education, showcasing a system that not only meets but anticipates the demands of modern university admissions.
  • Machine Learning Expert System for Recognizing Emotions in text 'Umai Cloud Services'
    Arailym Tleubayeva, Aigul Shaikhanova, Baurzhan Ospan, Ayan Sultan, Mariyam Abu, et al.
    SIST 2023: 2023 IEEE International Conference on Smart Information Systems and Technologies, Proceedings, 2023
    In this research, the focus is on recognizing 28 emotions in a text using the RoBERTa model, which is a state-of-the-art pre-trained language model that has achieved outstanding results in various natural language processing tasks. The study explores the effectiveness of the RoBERTa model for emotion recognition and compares it with other approaches, such as CNNs and RNNs. In addition, the research investigates the problem of toxicity detection, which involves identifying and flagging potentially harmful or offensive language in a given text. Various techniques for toxicity detection are considered, including supervised learning and deep learning methods. The study also explores the process of extracting key phrases and words from a text using machine learning algorithms. This involves applying NLP techniques such as part-of-speech tagging, named entity recognition, and text summarization. All of these methods are implemented and tested using a cloud service provided by Umai Cloud Services, a Kazakh startup company that offers machine learning and artificial intelligence solutions. The results of the study demonstrate the effectiveness of the RoBERTa model for emotion recognition and show promising results for toxicity detection and text summarization.

RECENT SCHOLAR PUBLICATIONS

  • Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language
    A Tleubayeva, Z Makhambetova, A Mansurova, A Shomanov
    Proceedings of the 18th IEEE/ACM International Conference on Utility and … , 2025
    2025
  • DETECTING DUPLICATES IN KAZAKH TEXTS: A COMPARISON OF TF-IDF, WORD AND SENTENCE EMBEDDINGS
    ABNAB Nugumanovа
    МЕЖДУНАРОДНЫЙ ЖУРНАЛ ИНФОРМАЦИОННЫХ И КОММУНИКАЦИОННЫХ ТЕХНОЛОГИЙ 6 (4 … , 2025
    2025
  • A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering
    A Mansurova, A Tleubayeva, A Nugumanova, A Shomanov, SE Seker
    Information 16 (11), 943 , 2025
    2025
    Citations: 3
  • COMPARATIVE ANALYSIS OF EMBEDDING MODELS FOR MATCHING QUESTIONS AND CONTEXTS IN THE KAZAKH LANGUAGE
    ZK K. Mazhitova, A. Tleubayeva, S. Mukhammediya, A. Tanirbergenova, A ...
    Vestnik KazUTB 3 (28), 13-23 , 2025
    2025
  • Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh
    A Tleubayeva, A Mansurova, S Aubakirov, A Tabuldin, A Shomanov, ...
    2025 IEEE/ACIS 29th International Conference on Software Engineering … , 2025
    2025
    Citations: 1
  • Text similarity detection in agglutinative languages: A case study of Kazakh using hybrid n-gram and semantic models
    S Biloshchytska, A Tleubayeva, O Kuchanskyi, A Biloshchytskyi, ...
    Applied Sciences 15 (12), 6707 , 2025
    2025
    Citations: 7
  • Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments
    A Tleubayeva, S Aubakirov, A Tabuldin, A Shomanov
    2025 IEEE 5th International Conference on Smart Information Systems and … , 2025
    2025
    Citations: 2
  • Protege ontology in computer science
    AOT K.M. Maksutova, R.S. Niyazova, A.K. Shaikhanova
    Вестник Национальной инженерной академии Республики Казахстан 4 (94), 112-123 , 2024
    2024
  • Enhancing fingerprint recognition systems: Comparative analysis of biometric authentication algorithms and techniques for improved accuracy and reliability
    T Meiramkhanov, A Tleubayeva
    arXiv preprint arXiv:2412.14404 , 2024
    2024
    Citations: 14
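  • Интеграция искусственного интеллекта для обнаружения респираторных заболеваний в программно-аппаратный комплекс «Диагностика на дому» [Integration of artificial intelligence for detecting respiratory diseases into the "Home Diagnostics" hardware-software complex]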
    А Шайханова, И Поз, Э Кусембаева, С Даулеткалиулы, А Тлеубаева
    Вестник КазАТК 135 (6), 272-282 , 2024
    2024
    Citations: 3
  • Comparative analysis of multilingual QA models and their adaptation to the Kazakh language
    A Tleubayeva, A Shomanov
    Scientific Journal of Astana IT University, 89-97 , 2024
    2024
    Citations: 8
  • Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process
    A Tleubayeva, A Mitroshina, A Arman, A Shokan, A Shaikhanova
    2024 IEEE 4th International Conference on Smart Information Systems and … , 2024
    2024
    Citations: 2
  • Effective detection of breast pathology using machine learning methods
    ZK Ainur Orazayeva, Jamalbek Tussupov, Gulmira Shangytbayeva, Assem Galymova ...
    International Journal of Electrical and Computer Engineering (IJECE) 14 (5 … , 2024
    2024
    Citations: 4
  • INNOVATIVE ARCHITECTURAL SOLUTIONS AND INTERDISCIPLINARY IMPLEMENTATION OF THE BULT CLOUD PLATFORM FOR WEB APPLICATION ORCHESTRATION
    AK Shaikhanova, ZA Bermukhambetov, VV Kim, AO Tleubayeva
    Вестник Университета Шакарима. Серия технические науки, 40-48 , 2024
    2024
  • Machine learning expert system for recognizing emotions in text “Umai Cloud Services”
    A Tleubayeva, A Shaikhanova, B Ospan, A Sultan, M Abu, N Darmenkyzy
    2023 IEEE International Conference on Smart Information Systems and … , 2023
    2023
    Citations: 2
  • Удаленная диагностика–польза для узкоспециализированных врачей [Remote diagnostics: a benefit for highly specialized physicians]
    АК Шайханова, И Поз, ЭА Кусембаева, АО Тлеубаева
    Вестник Университета Шакарима. Серия технические науки, 5-13 , 2023
    2023
  • A model of an autonomous smart lighting system using sensors
    A Tleubayeva, A Maidanov, A Kantayeva
    Scientific Journal of Astana IT University, 34-44 , 2022
    2022
    Citations: 2
  • Практика преподавания курса «Робототехника» в образовательной среде LEGO Education [Practice of teaching the "Robotics" course in the LEGO Education learning environment]
    СМВ Тулегулов А. Д., Ешпанов В. С., Тлеубаева А. О., Серикбай А. Т., Ержуман ...
    https://phsreda.com/ru/article/97068/discussion_platformhttps://phsreda.com … , 2020
    2020
    Citations: 1
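  • Математикалық модельдеу әдістерімен екі механикалық дененің соқтығысу ықтималдығын есептеу [Calculating the probability of a collision between two mechanical bodies using mathematical modeling methods]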
    ТАО ДЖУМАМУХАМБЕТОВ Н.Г., ТУЛЕГУЛОВ А.Д., НУРГАЛИЕВА Р.М.
    Журнал «Промышленный транспорт Казахстана». 3 (68), 87-92 , 2020
    2020
  • Освоение практических цифровых навыков в сфере информационной безопасности [Mastering practical digital skills in the field of information security]
    АД Тулегулов, ВС Ешпанов, АО Тлеубаева, СМ Меирбекулы, ...
    Рецензенты: Жданова Светлана Николаевна, д-р пед. наук, 69 , 2020
    2020

MOST CITED SCHOLAR PUBLICATIONS

  • Enhancing fingerprint recognition systems: Comparative analysis of biometric authentication algorithms and techniques for improved accuracy and reliability
    T Meiramkhanov, A Tleubayeva
    arXiv preprint arXiv:2412.14404 , 2024
    2024
    Citations: 14
  • Comparative analysis of multilingual QA models and their adaptation to the Kazakh language
    A Tleubayeva, A Shomanov
    Scientific Journal of Astana IT University, 89-97 , 2024
    2024
    Citations: 8
  • Text similarity detection in agglutinative languages: A case study of Kazakh using hybrid n-gram and semantic models
    S Biloshchytska, A Tleubayeva, O Kuchanskyi, A Biloshchytskyi, ...
    Applied Sciences 15 (12), 6707 , 2025
    2025
    Citations: 7
  • Effective detection of breast pathology using machine learning methods
    ZK Ainur Orazayeva, Jamalbek Tussupov, Gulmira Shangytbayeva, Assem Galymova ...
    International Journal of Electrical and Computer Engineering (IJECE) 14 (5 … , 2024
    2024
    Citations: 4
  • A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering
    A Mansurova, A Tleubayeva, A Nugumanova, A Shomanov, SE Seker
    Information 16 (11), 943 , 2025
    2025
    Citations: 3
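  • Интеграция искусственного интеллекта для обнаружения респираторных заболеваний в программно-аппаратный комплекс «Диагностика на дому» [Integration of artificial intelligence for detecting respiratory diseases into the "Home Diagnostics" hardware-software complex]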
    А Шайханова, И Поз, Э Кусембаева, С Даулеткалиулы, А Тлеубаева
    Вестник КазАТК 135 (6), 272-282 , 2024
    2024
    Citations: 3
  • Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments
    A Tleubayeva, S Aubakirov, A Tabuldin, A Shomanov
    2025 IEEE 5th International Conference on Smart Information Systems and … , 2025
    2025
    Citations: 2
  • Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process
    A Tleubayeva, A Mitroshina, A Arman, A Shokan, A Shaikhanova
    2024 IEEE 4th International Conference on Smart Information Systems and … , 2024
    2024
    Citations: 2
  • Machine learning expert system for recognizing emotions in text “Umai Cloud Services”
    A Tleubayeva, A Shaikhanova, B Ospan, A Sultan, M Abu, N Darmenkyzy
    2023 IEEE International Conference on Smart Information Systems and … , 2023
    2023
    Citations: 2
  • A model of an autonomous smart lighting system using sensors
    A Tleubayeva, A Maidanov, A Kantayeva
    Scientific Journal of Astana IT University, 34-44 , 2022
    2022
    Citations: 2
  • Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh
    A Tleubayeva, A Mansurova, S Aubakirov, A Tabuldin, A Shomanov, ...
    2025 IEEE/ACIS 29th International Conference on Software Engineering … , 2025
    2025
    Citations: 1
  • Практика преподавания курса «Робототехника» в образовательной среде LEGO Education [Practice of teaching the "Robotics" course in the LEGO Education learning environment]
    СМВ Тулегулов А. Д., Ешпанов В. С., Тлеубаева А. О., Серикбай А. Т., Ержуман ...
    https://phsreda.com/ru/article/97068/discussion_platformhttps://phsreda.com … , 2020
    2020
    Citations: 1
  • Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language
    A Tleubayeva, Z Makhambetova, A Mansurova, A Shomanov
    Proceedings of the 18th IEEE/ACM International Conference on Utility and … , 2025
    2025
  • DETECTING DUPLICATES IN KAZAKH TEXTS: A COMPARISON OF TF-IDF, WORD AND SENTENCE EMBEDDINGS
    ABNAB Nugumanovа
    МЕЖДУНАРОДНЫЙ ЖУРНАЛ ИНФОРМАЦИОННЫХ И КОММУНИКАЦИОННЫХ ТЕХНОЛОГИЙ 6 (4 … , 2025
    2025
  • COMPARATIVE ANALYSIS OF EMBEDDING MODELS FOR MATCHING QUESTIONS AND CONTEXTS IN THE KAZAKH LANGUAGE
    ZK K. Mazhitova, A. Tleubayeva, S. Mukhammediya, A. Tanirbergenova, A ...
    Vestnik KazUTB 3 (28), 13-23 , 2025
    2025
  • Protege ontology in computer science
    AOT K.M. Maksutova, R.S. Niyazova, A.K. Shaikhanova
    Вестник Национальной инженерной академии Республики Казахстан 4 (94), 112-123 , 2024
    2024
  • INNOVATIVE ARCHITECTURAL SOLUTIONS AND INTERDISCIPLINARY IMPLEMENTATION OF THE BULT CLOUD PLATFORM FOR WEB APPLICATION ORCHESTRATION
    AK Shaikhanova, ZA Bermukhambetov, VV Kim, AO Tleubayeva
    Вестник Университета Шакарима. Серия технические науки, 40-48 , 2024
    2024
  • Удаленная диагностика–польза для узкоспециализированных врачей [Remote diagnostics: a benefit for highly specialized physicians]
    АК Шайханова, И Поз, ЭА Кусембаева, АО Тлеубаева
    Вестник Университета Шакарима. Серия технические науки, 5-13 , 2023
    2023
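  • Математикалық модельдеу әдістерімен екі механикалық дененің соқтығысу ықтималдығын есептеу [Calculating the probability of a collision between two mechanical bodies using mathematical modeling methods]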
    ТАО ДЖУМАМУХАМБЕТОВ Н.Г., ТУЛЕГУЛОВ А.Д., НУРГАЛИЕВА Р.М.
    Журнал «Промышленный транспорт Казахстана». 3 (68), 87-92 , 2020
    2020
  • Освоение практических цифровых навыков в сфере информационной безопасности [Mastering practical digital skills in the field of information security]
    АД Тулегулов, ВС Ешпанов, АО Тлеубаева, СМ Меирбекулы, ...
    Рецензенты: Жданова Светлана Николаевна, д-р пед. наук, 69 , 2020
    2020