A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering Aigerim Mansurova, Arailym Tleubayeva, Aliya Nugumanova, Adai Shomanov, Sadi Evren Seker Information (Switzerland), 2025 This paper presents a systematic evaluation of large language models (LLMs) and retrieval-augmented generation (RAG) approaches for question answering (QA) in the low-resource Kazakh language. We assess the performance of existing proprietary (GPT-4o, Gemini 2.5-flash) and open-source Kazakh-oriented models (KazLLM-8B, Sherkala-8B, Irbis-7B) across closed-book and RAG settings. Within a three-stage evaluation framework we benchmark retriever quality, examine LLM abilities such as knowledge-gap detection, external truth integration and context grounding, and measure gains from realistic end-to-end RAG pipelines. Our results show a clear pattern: proprietary models lead in closed-book QA, but RAG narrows the gap substantially. Under the Ideal RAG setting, KazLLM-8B improves from its closed-book baseline of 0.427 to reach answer correctness of 0.867, closely matching GPT-4o’s score of 0.869. In the end-to-end RAG setup, KazLLM-8B paired with the Snowflake retriever achieved answer correctness up to 0.754, surpassing GPT-4o’s best score of 0.632. Despite improvements, RAG outcomes show an inconsistency: high retrieval metrics do not guarantee high QA system accuracy. The findings highlight the importance of retrievers and context grounding strategies in enabling open-source Kazakh models to deliver competitive QA performance in a low-resource setting.
Text Similarity Detection in Agglutinative Languages: A Case Study of Kazakh Using Hybrid N-Gram and Semantic Models Svitlana Biloshchytska, Arailym Tleubayeva, Oleksandr Kuchanskyi, Andrii Biloshchytskyi, Yurii Andrashko, et al. Applied Sciences Switzerland, 2025 This study presents an advanced hybrid approach for detecting near-duplicate texts in the Kazakh language, addressing the specific challenges posed by its agglutinative morphology. The proposed method combines statistical and semantic techniques, including N-gram analysis, TF-IDF, LSH, LSA, and LDA, and is benchmarked against the bert-base-multilingual-cased model. Experiments were conducted on the purpose-built Arailym-aitu/KazakhTextDuplicates corpus, which contains over 25,000 manually modified text fragments using typical techniques, such as paraphrasing, word order changes, synonym substitution, and morphological transformations. The results show that the hybrid model achieves a precision of 1.00, a recall of 0.73, and an F1-score of 0.84, significantly outperforming traditional N-gram and TF-IDF approaches and demonstrating comparable accuracy to the BERT model while requiring substantially lower computational resources. The hybrid model proved highly effective in detecting various types of near-duplicate texts, including paraphrased and structurally modified content, making it suitable for practical applications in academic integrity verification, plagiarism detection, and intelligent text analysis. Moreover, this study highlights the potential of lightweight hybrid architectures as a practical alternative to large transformer-based models, particularly for languages with limited annotated corpora and linguistic resources. It lays the foundation for future research in cross-lingual duplicate detection and deep model adaptation for the Kazakh language.
Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh Arailym Tleubayeva, Aigerim Mansurova, Sultan Aubakirov, Aisultan Tabuldin, Adai Shomanov, et al. Proceedings - 29th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2025 Summer, 2025
Effective detection of breast pathology using machine learning methods Ainur Orazayeva, Jamalbek Tussupov, Gulmira Shangytbayeva, Assem Galymova, Ulzhalgas Zhunissova, et al. International Journal of Electrical and Computer Engineering, 2024 This work is devoted to the research and development of methods for effectively identifying breast pathologies using modern machine learning technologies, such as You Only Look Once (YOLOv8) and Faster Region-based Convolutional Neural Network (Faster R-CNN). The paper presents an analysis of existing approaches to the diagnosis of breast diseases and an assessment of their effectiveness. YOLOv8 and Faster R-CNN architectures are then applied to create pathology detection models in mammography images. The work analyzed and classified identified breast pathologies at six levels, taking into account different degrees of severity and characteristics of the diseases. This approach allows for more accurate determination of disease progression and provides additional data for more individualized treatment planning. Classification results at various levels can improve the quality of medical decisions and provide more accurate information to doctors, which in turn improves the overall efficiency of diagnosis and treatment of breast diseases. Experimental results demonstrate high accuracy and speed of image processing, providing fast and reliable detection of potential breast pathologies. The data obtained confirm the effectiveness of the use of machine learning algorithms in the field of medical diagnostics, providing prospects for the further development of automated systems for detecting breast diseases in order to improve early diagnosis and treatment efficiency.
Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process Arailym Tleubayeva, Alina Mitroshina, Alpar Arman, Arystan Shokan, Shaikhanova Aigul SIST 2024 - 2024 IEEE 4th International Conference on Smart Information Systems and Technologies, Proceedings, 2024 The implementation of artificial intelligence (AI) has transformative potential for various sectors, including higher education. This study is focused on an AI system's effective development and accuracy in streamlining university admissions. The focal point is the AITU Admissions Advisor, an AI solution crafted to navigate the complexities of the admissions process. The study examines the problem of operational inefficiencies and inaccuracies that plague traditional admissions methods, and it positions the AI system as a remedy by offering automation and intelligent decision-making capabilities. The essence of the findings derives from a methodical evaluation of the AITU system against conventional practices, revealing its enhanced efficiency and precision in handling admissions procedures. Distinguishing features of these results include the system's adept use of natural language processing (NLP), sophisticated machine learning models, and a dynamic feedback system that collectively elevate its performance metrics. These technological strides underscore the system's reliability and responsiveness to the nuanced needs of applicants and administrators alike. The paper concludes that for practical implementation, seamless integration with existing university infrastructures, thorough staff training, and continuous system monitoring are imperative. This study provides a blueprint for the application of AI in higher education, showcasing a system that not only meets but anticipates the demands of modern university admissions.
Machine Learning Expert System for Recognizing Emotions in text 'Umai Cloud Services' Arailym Tleubayeva, Aigul Shaikhanova, Baurzhan Ospan, Ayan Sultan, Mariyam Abu, et al. SIST 2023 - 2023 IEEE International Conference on Smart Information Systems and Technologies, Proceedings, 2023 In this research, the focus is on recognizing 28 emotions in a text using the RoBERTa model, which is a state-of-the-art pre-trained language model that has achieved outstanding results in various natural language processing tasks. The study explores the effectiveness of the RoBERTa model for emotion recognition and compares it with other approaches, such as CNNs and RNNs. In addition, the research investigates the problem of toxicity detection, which involves identifying and flagging potentially harmful or offensive language in a given text. Various techniques for toxicity detection are considered, including supervised learning and deep learning methods. The study also explores the process of extracting key phrases and words from a text using machine learning algorithms. This involves applying NLP techniques such as part-of-speech tagging, named entity recognition, and text summarization. All of these methods are implemented and tested using a cloud service provided by Umai Cloud Services, a Kazakh startup company that offers machine learning and artificial intelligence solutions. The results of the study demonstrate the effectiveness of the RoBERTa model for emotion recognition and show promising results for toxicity detection and text summarization.
RECENT SCHOLAR PUBLICATIONS
Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language A Tleubayeva, Z Makhambetova, A Mansurova, A Shomanov Proceedings of the 18th IEEE/ACM International Conference on Utility and …, 2025
DETECTING DUPLICATES IN KAZAKH TEXTS: A COMPARISON OF TF-IDF, WORD AND SENTENCE EMBEDDINGS ABNAB Nugumanova МЕЖДУНАРОДНЫЙ ЖУРНАЛ ИНФОРМАЦИОННЫХ И КОММУНИКАЦИОННЫХ ТЕХНОЛОГИЙ 6 (4 …, 2025
A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering A Mansurova, A Tleubayeva, A Nugumanova, A Shomanov, SE Seker Information 16 (11), 943, 2025 Citations: 3
COMPARATIVE ANALYSIS OF EMBEDDING MODELS FOR MATCHING QUESTIONS AND CONTEXTS IN THE KAZAKH LANGUAGE ZK K. Mazhitova, A. Tleubayeva, S. Mukhammediya, A. Tanirbergenova, A ... Vestnik KazUTB 3 (28), 13-23, 2025
Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh A Tleubayeva, A Mansurova, S Aubakirov, A Tabuldin, A Shomanov, ... 2025 IEEE/ACIS 29th International Conference on Software Engineering …, 2025 Citations: 1
Text similarity detection in agglutinative languages: A case study of Kazakh using hybrid n-gram and semantic models S Biloshchytska, A Tleubayeva, O Kuchanskyi, A Biloshchytskyi, ... Applied Sciences 15 (12), 6707, 2025 Citations: 7
Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments A Tleubayeva, S Aubakirov, A Tabuldin, A Shomanov 2025 IEEE 5th International Conference on Smart Information Systems and …, 2025 Citations: 2
Protege ontology in computer science AOT K.M. Maksutova, R.S. Niyazova, A.K. Shaikhanova Вестник Национальной инженерной академии Республики Казахстан 4 (94), 112-123, 2024
Enhancing fingerprint recognition systems: Comparative analysis of biometric authentication algorithms and techniques for improved accuracy and reliability T Meiramkhanov, A Tleubayeva arXiv preprint arXiv:2412.14404, 2024 Citations: 14
Интеграция искусственного интеллекта для обнаружения респираторных заболеваний в программно-аппаратный комплекс «Диагностика на дому» А Шайханова, И Поз, Э Кусембаева, С Даулеткалиулы, А Тлеубаева Вестник КазАТК 135 (6), 272-282, 2024 Citations: 3
Comparative analysis of multilingual QA models and their adaptation to the Kazakh language A Tleubayeva, A Shomanov Scientific Journal of Astana IT University, 89-97, 2024 Citations: 8
Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process A Tleubayeva, A Mitroshina, A Arman, A Shokan, S Aigul 2024 IEEE 4th International Conference on Smart Information Systems and …, 2024 Citations: 2
Effective detection of breast pathology using machine learning methods ZK Ainur Orazayeva, Jamalbek Tussupov, Gulmira Shangytbayeva, Assem Galymova ... International Journal of Electrical and Computer Engineering (IJECE) 14 (5 …, 2024 Citations: 4
INNOVATIVE ARCHITECTURAL SOLUTIONS AND INTERDISCIPLINARY IMPLEMENTATION OF THE BULT CLOUD PLATFORM FOR WEB APPLICATION ORCHESTRATION AK Shaikhanova, ZA Bermukhambetov, VV Kim, AO Tleubayeva Вестник Университета Шакарима. Серия технические науки, 40-48, 2024
Machine learning expert system for recognizing emotions in text “Umai Cloud Services” A Tleubayeva, A Shaikhanova, B Ospan, A Sultan, M Abu, N Darmenkyzy 2023 IEEE International Conference on Smart Information Systems and …, 2023 Citations: 2
Удаленная диагностика–польза для узкоспециализированных врачей АК Шайханова, И Поз, ЭА Кусембаева, АО Тлеубаева Вестник Университета Шакарима. Серия технические науки, 5-13, 2023
A model of an autonomous smart lighting system using sensors A Tleubayeva, A Maidanov, A Kantayeva Scientific Journal of Astana IT University, 34-44, 2022 Citations: 2
Практика преподавания курса «Робототехника» в образовательной среде LEGO Education СМВ Тулегулов А. Д., Ешпанов В. С., Тлеубаева А. О., Серикбай А. Т., Ержуман ... https://phsreda.com/ru/article/97068/discussion_platform https://phsreda.com …, 2020 Citations: 1
Математикалық модельдеу әдістерімен екі механикалық дененің соқтығысу ықтималдығын есептеу ТАО ДЖУМАМУХАМБЕТОВ Н.Г., ТУЛЕГУЛОВ А.Д., НУРГАЛИЕВА Р.М. Журнал «Промышленный транспорт Казахстана». 3 (68), 87-92, 2020
Освоение практических цифровых навыков в сфере информационной безопасности АД Тулегулов, ВС Ешпанов, АО Тлеубаева, СМ Меирбекулы, ... Рецензенты: Жданова Светлана Николаевна, д-р пед. наук, 69, 2020
MOST CITED SCHOLAR PUBLICATIONS
Enhancing fingerprint recognition systems: Comparative analysis of biometric authentication algorithms and techniques for improved accuracy and reliability T Meiramkhanov, A Tleubayeva arXiv preprint arXiv:2412.14404, 2024 Citations: 14
Comparative analysis of multilingual QA models and their adaptation to the Kazakh language A Tleubayeva, A Shomanov Scientific Journal of Astana IT University, 89-97, 2024 Citations: 8
Text similarity detection in agglutinative languages: A case study of Kazakh using hybrid n-gram and semantic models S Biloshchytska, A Tleubayeva, O Kuchanskyi, A Biloshchytskyi, ... Applied Sciences 15 (12), 6707, 2025 Citations: 7
Effective detection of breast pathology using machine learning methods ZK Ainur Orazayeva, Jamalbek Tussupov, Gulmira Shangytbayeva, Assem Galymova ... International Journal of Electrical and Computer Engineering (IJECE) 14 (5 …, 2024 Citations: 4
A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering A Mansurova, A Tleubayeva, A Nugumanova, A Shomanov, SE Seker Information 16 (11), 943, 2025 Citations: 3
Интеграция искусственного интеллекта для обнаружения респираторных заболеваний в программно-аппаратный комплекс «Диагностика на дому» А Шайханова, И Поз, Э Кусембаева, С Даулеткалиулы, А Тлеубаева Вестник КазАТК 135 (6), 272-282, 2024 Citations: 3
Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments A Tleubayeva, S Aubakirov, A Tabuldin, A Shomanov 2025 IEEE 5th International Conference on Smart Information Systems and …, 2025 Citations: 2
Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process A Tleubayeva, A Mitroshina, A Arman, A Shokan, S Aigul 2024 IEEE 4th International Conference on Smart Information Systems and …, 2024 Citations: 2
Machine learning expert system for recognizing emotions in text “Umai Cloud Services” A Tleubayeva, A Shaikhanova, B Ospan, A Sultan, M Abu, N Darmenkyzy 2023 IEEE International Conference on Smart Information Systems and …, 2023 Citations: 2
A model of an autonomous smart lighting system using sensors A Tleubayeva, A Maidanov, A Kantayeva Scientific Journal of Astana IT University, 34-44, 2022 Citations: 2
Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh A Tleubayeva, A Mansurova, S Aubakirov, A Tabuldin, A Shomanov, ... 2025 IEEE/ACIS 29th International Conference on Software Engineering …, 2025 Citations: 1
Практика преподавания курса «Робототехника» в образовательной среде LEGO Education СМВ Тулегулов А. Д., Ешпанов В. С., Тлеубаева А. О., Серикбай А. Т., Ержуман ... https://phsreda.com/ru/article/97068/discussion_platform https://phsreda.com …, 2020 Citations: 1
Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language A Tleubayeva, Z Makhambetova, A Mansurova, A Shomanov Proceedings of the 18th IEEE/ACM International Conference on Utility and …, 2025
DETECTING DUPLICATES IN KAZAKH TEXTS: A COMPARISON OF TF-IDF, WORD AND SENTENCE EMBEDDINGS ABNAB Nugumanova МЕЖДУНАРОДНЫЙ ЖУРНАЛ ИНФОРМАЦИОННЫХ И КОММУНИКАЦИОННЫХ ТЕХНОЛОГИЙ 6 (4 …, 2025
COMPARATIVE ANALYSIS OF EMBEDDING MODELS FOR MATCHING QUESTIONS AND CONTEXTS IN THE KAZAKH LANGUAGE ZK K. Mazhitova, A. Tleubayeva, S. Mukhammediya, A. Tanirbergenova, A ... Vestnik KazUTB 3 (28), 13-23, 2025
Protege ontology in computer science AOT K.M. Maksutova, R.S. Niyazova, A.K. Shaikhanova Вестник Национальной инженерной академии Республики Казахстан 4 (94), 112-123, 2024
INNOVATIVE ARCHITECTURAL SOLUTIONS AND INTERDISCIPLINARY IMPLEMENTATION OF THE BULT CLOUD PLATFORM FOR WEB APPLICATION ORCHESTRATION AK Shaikhanova, ZA Bermukhambetov, VV Kim, AO Tleubayeva Вестник Университета Шакарима. Серия технические науки, 40-48, 2024
Удаленная диагностика–польза для узкоспециализированных врачей АК Шайханова, И Поз, ЭА Кусембаева, АО Тлеубаева Вестник Университета Шакарима. Серия технические науки, 5-13, 2023
Математикалық модельдеу әдістерімен екі механикалық дененің соқтығысу ықтималдығын есептеу ТАО ДЖУМАМУХАМБЕТОВ Н.Г., ТУЛЕГУЛОВ А.Д., НУРГАЛИЕВА Р.М. Журнал «Промышленный транспорт Казахстана». 3 (68), 87-92, 2020
Освоение практических цифровых навыков в сфере информационной безопасности АД Тулегулов, ВС Ешпанов, АО Тлеубаева, СМ Меирбекулы, ... Рецензенты: Жданова Светлана Николаевна, д-р пед. наук, 69, 2020