Natural Language Processing
Data Mining
Machine Learning
145
Scopus Publications
5663
Scholar Citations
30
Scholar h-index
79
Scholar i10-index
Scopus Publications
Disentangling Technical and Content Attributes in Search Engine Ranking: A Comparative Study of Google and Bing Göker Cebeci, Banu Diri IEEE Access, 2026 This study presents a novel empirical methodology to characterize and compare the ranking environments of major information retrieval systems, specifically Google and Bing. By analyzing technical and content attributes from a dataset of 14,465 Search Engine Results Page (SERP) items collected from a homogeneous commercial discount domain comprising 500 queries, we aim to characterize observable associative patterns between resource attributes and ranking outcomes. The dataset includes Lighthouse performance metrics and advanced content features, such as Sentence-BERT-based semantic similarity. Using K-Means clustering, we identify five resource profiles representing emergent optimization archetypes. The analysis revealed that content-related factors had a higher aggregate importance for both systems (Google: 70.1%, Bing: 61.8%) than technical factors. Specifically, Random Forest feature importance analysis highlighted that for Bing, content volume was a dominant predictor, whereas for Google, semantic relevance signals outweighed pure keyword targeting. We further contextualize these findings within an "Authority–Optimization Trade-off" framework, suggesting that Google’s negative associations for certain on-page optimization signals likely reflect a ranking function that heavily weights latent domain authority over explicit on-page compliance. These findings highlight how modern learning-to-rank systems may differentially weight explicit content features and latent authority signals when balancing relevance, diversity, and quality.
Healthcare-Focused Turkish Medical LLM: Training on Real Patient-Doctor Question-Answer Data for Enhanced Medical Insight M. Ali Bayram, Banu Diri, Savas Yildirim ACM Transactions on Asian and Low Resource Language Information Processing, 2025 The development of a Turkish-specific Large Language Model (LLM) for healthcare presents a unique opportunity to enhance AI’s accessibility and relevance for Turkish-speaking medical practitioners and patients. This study introduces a specialized Turkish Medical LLM fine-tuned on over 167,732 real patient-doctor question-answer pairs sourced from a trusted medical platform and capturing authentic linguistics in Turkish medical language. Utilizing models like LLAMA 3, the fine-tuning process was supported by Low-Rank Adaptation (LoRA) and involved innovative methods to mitigate catastrophic forgetting, including spherical linear interpolation (Slerp) merging. Evaluation of the model’s performance through similarity scores, GPT-3.5 assessments, and expert reviews indicates significant improvement in the model’s ability to generate medically accurate responses. This Turkish Medical LLM demonstrates potential to support medical decision-making and patient interaction in Turkish healthcare settings, offering an essential resource for enhancing AI inclusivity across languages.
A hybrid approach for the detection of images generated with multi generator MS-DCGAN Selim Sürücü, Banu Diri Engineering Science and Technology an International Journal, 2025 Over the past few years, there have been significant advances in remote sensing technology that have considerably expanded the range of research that can be conducted using remote sensing systems. Various fields, from agriculture to defense applications, use remote sensing imagery, primarily acquired by sensors mounted on vehicles like satellites and UAVs. In addition to advances in remote sensing technology, there have also been major advancements in deep learning. In recent years, there has been a substantial increase in the studies on these two topics. Generative Adversarial Networks (GAN) technology, another area of artificial intelligence and deep learning research, has taken the generation of fake satellite images to a new level. Users can use these artificial images for a variety of purposes, including information concealment and data expansion. Malicious uses of the generated fake images could trigger international crises. In this paper, we propose a new method for the generation and detection of fake satellite images. The MultiSpectral Deep Convolutional GAN (MS-DCGAN) model is developed to generate fake multispectral images, and the TransStacking model is proposed to distinguish between fake images and real images. This model is tested both as a single generator and multi generator model. The TransStacking (DenseNet201+stacking) model showed a very high success rate achieving 100% accuracy for single generator and 98% accuracy for multi generator MS-DCGAN, respectively. The proposed model is an advanced hybrid model that provides the best results in multi-spectral images and can be applied in diverse domains. Since the TransStacking model is a modular hybrid model, it can be used with many different old and new models. Furthermore, the effect of the models in the base part of the stacking module on the results was also analyzed by performing ablation analysis on the DenseNet201+stacking model, where the best results were obtained.
Notification Text Generation with Large Language Models Hakan Taşköprü, Banu Diri 33rd IEEE Conference on Signal Processing and Communications Applications Siu 2025 Proceedings, 2025 The delivery of notifications with engaging and contextually appropriate content plays a critical role in increasing user engagement. Although Large Language Models (LLMs) are successful in text generation, systematic research on short, action-oriented content such as notification text generation is limited. In this study, Manual Prompt Creation, Automatic Prompt Creation, and Optimized Automatic Prompt Creation approaches for LLM-based notification text generation were compared. The outputs were generated using Google Gemini Pro and evaluated with LLM-based metrics such as grammatical accuracy, title-text coherence, and naturalness. The results revealed that Optimized Automatic Prompt Creation outperformed other approaches, while Manual Prompt Creation provided more diverse texts. This study contributes to the literature on notification text generation by advancing prompt engineering and introducing new evaluation metrics.
Accessible Hyperlinks and Search Engine Rankings: An Empirical Investigation Göker Cebeci, Banu Diri 33rd IEEE Conference on Signal Processing and Communications Applications Siu 2025 Proceedings, 2025 This study investigates the impact of accessibility features of internal links on a website’s Search Engine Results Page (SERP) ranking. To test the hypothesis that accessibility influences SERP ranking, a labeled dataset was created to measure internal link accessibility. In creating this dataset, internal links on websites were matched with their corresponding subpages listed by search engines to complete the labeling process. Accessibility metrics such as "is_unique", "contrast_ratio", and "text_clarity" were extracted for the internal links using machine learning techniques. Analysis of the data suggests that accessibility features may indeed influence SERP rankings. The labeled dataset created in this study will serve as a resource for future research on internal link accessibility and SERP ranking.
Comparison of Different Segmentation Models for Image Reconstruction: A Preliminary Study Saadet Aytaç Arpacı, Songül Varlı, Banu Diri International Conference on Electrical Computer Communications and Mechatronics Engineering Iceccme 2025, 2025 The need to produce high-quality and more images has increased the importance of image reconstruction methods. However, the reconstruction of images by protecting complete information is still a difficult task. Therefore, image reconstruction is an important research area in current computer science, and in recent years, encoder-decoder architectures based on artificial neural networks have been investigated as an alternative to classical computer vision methods. The U-Net model is the most well-known architecture for semantic segmentation, and until today, various segmentation architectures have been developed on the basis of the U-Net model. In order to find efficient models that can perform the image reconstruction task, in this study, U-Net, which has been more studied in image segmentation, and Attention U-Net, DC-UNet, Sharp U-Net, Residual U-Net (ResUnet), and Recurrent Residual U-Net (R2U-Net) models, which were developed on the basis of U-Net, were compared and evaluated for the image reconstruction task. Furthermore, the effect of “sigmoid” and “softmax” output activation functions on each model was analyzed within the scope of the image reconstruction task. Experiments were carried out on two publicly available datasets. The evaluations were performed according to the Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), and Structure Similarity Index Method (SSIM) criteria. As a result, it is shown that the use of the sigmoid output activation function increases the efficiency of all models. Furthermore, the R2U-Net model with a sigmoid output activation function was observed to be the most successful model. According to the results of this preliminary research, new models can be examined using different datasets for the image reconstruction task in the future.
The Effect of Convolutional Encoder-Decoder Architecture on Lossless Image Compression Applications Saadet Aytaç Arpacı, Banu Diri, Songül Varlı 2025 Innovations in Intelligent Systems and Applications Conference Asyu 2025, 2025 According to the results we obtained, we can state that although the Huffman method fully performs the lossless compression task, quality measurements decrease depending on the synthetic images processed by the method. Therefore, this situation increases the responsibility of the image reconstruction models in the system in terms of being able to create a similar, quality image to the original image. Within the scope of the evaluation, it is shown more clearly with this study that image encoder-decoder (image reconstruction) methods, which are integrated into most current lossless compression methods, affect the performance of lossless compression methods.
Tokenization Standards and Evaluation in Natural Language Processing: A Comparative Analysis of Large Language Models on Turkish M. Ali Bayram, Ali Arda Fincan, Ahmet Semih Gümüş, Sercan Karakaş, Banu Diri, Savaş Yıldırım 33rd IEEE Conference on Signal Processing and Communications Applications Siu 2025 Proceedings, 2025 Tokenization is a fundamental preprocessing step in Natural Language Processing (NLP), significantly impacting the capability of large language models (LLMs) to capture linguistic and semantic nuances. This study introduces a novel evaluation framework addressing tokenization challenges specific to morphologically-rich and low-resource languages such as Turkish. Utilizing the Turkish MMLU (TR-MMLU) dataset, comprising 6,200 multiple-choice questions from the Turkish education system, we assessed tokenizers based on vocabulary size, token count, processing time, language-specific token percentages (%TR), and token purity (%Pure). These newly proposed metrics measure how effectively tokenizers preserve linguistic structures. Our analysis reveals that language-specific token percentages exhibit a stronger correlation with downstream performance (e.g., MMLU scores) than token purity. Furthermore, increasing model parameters alone does not necessarily enhance linguistic performance, underscoring the importance of tailored, language-specific tokenization methods. The proposed framework establishes robust and practical tokenization standards for morphologically complex languages.
TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement M. Ali Bayram, Ali Arda Fincan, Ahmet Semih Gümüş, Banu Diri, Savaş Yıldırım, Öner Aytaş 33rd IEEE Conference on Signal Processing and Communications Applications Siu 2025 Proceedings, 2025 Language models have made significant advancements in understanding and generating human language, achieving remarkable success in various applications. However, evaluating these models remains a challenge, particularly for resource-limited languages like Turkish. To address this issue, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is based on a meticulously curated dataset comprising 6,200 multiple-choice questions across 62 sections within the Turkish education system. This benchmark provides a standard framework for Turkish NLP research, enabling detailed analyses of LLMs’ capabilities in processing Turkish text. In this study, we evaluated state-of-the-art LLMs on TR-MMLU, highlighting areas for improvement in model design. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations.
Automatic Radiology Report Generation with Deep Learning Based Model Fusion Approach Rukiye Başkara, Osman Furkan Karakuş, Ali Can Karaca, Banu Diri, Arzu Serpil Arslan, Burcu Alparslan, Berk Sünnetçi 2025 Innovations in Intelligent Systems and Applications Conference Asyu 2025, 2025 The preparation of radiology reports from medical images is a time-consuming process and carries a risk of error, especially for inexperienced radiologists. In this study, a fusion process was performed to enable chest X-ray images to be reported more accurately and in greater detail using artificial intelligence supported models. In this context, reports generated by the high performance R2Gen and CvT2DistilGPT2 models were combined to obtain comprehensive reports. Necessary prompts were given to GPT-4 to improve the accuracy of these reports. The widely used IU X-Ray dataset was selected as the dataset. The reports were analyzed through expert evaluations and language generation metrics. Radiologist evaluations determined that the proposed model achieved a score of 2.65 out of 5, which is the closest performance to the reference score of 2.90. Additionally, the scores obtained with RadGraph_F1 and ChatGPT4o, which are important in terms of contextual and clinical accuracy, indicate that the proposed model is effective in improving the accuracy and coherence of the reports it generates.
Hate Speech Dataset from Turkish Tweets Islam Mayda, Yunus Emre Demir, Tugba Dalyan, Banu Diri Proceedings 2021 Innovations in Intelligent Systems and Applications Conference Asyu 2021, 2021
Analysis of English Language Groups with Regular Expressions Ismail Duru, Banu Diri, M. Emir Özçevik, Kerim Ataseven, Gülüstan Dogan, Su White Proceedings 2018 Innovations in Intelligent Systems and Applications Conference Asyu 2018, 2018
Robust feature selection and classification using heuristic algorithms based on correlation feature groups Turkish Online Journal of Educational Technology, 2017
A comparative study on binary artificial bee colony optimization methods for feature selection Focus on Swarm Intelligence Research and Applications, 2017
Document classification using ontology graph Abdullah Tellioğlu, Faruk Rahmet, Banu Diri 2016 24th Signal Processing and Communication Application Conference Siu 2016 Proceedings, 2016
Question identification on Turkish tweets Zeynep Banu Ozger, Banu Diri, Canan Girgin Inista 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications Proceedings, 2014
Sentence selection methods for text summarization Aysun Guran, Sumeyra Nur Arslan, Esma Kilic, Banu Diri 2014 22nd Signal Processing and Communications Applications Conference Siu 2014 Proceedings, 2014
Extraction of part-whole relations from Turkish corpora Tuğba Yıldız, Savaş Yıldırım, Banu Diri Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2013
Content mining of microblogs M. O. Cingiz, B. Diri Proceedings of the 2012 IEEE ACM International Conference on Advances in Social Networks Analysis and Mining Asonam 2012, 2012
Classification of microblogging users M. Ozgur Cingiz, Banu Diri 2012 20th Signal Processing and Communications Applications Conference Siu 2012 Proceedings, 2012
Named entity recognition from Turkish texts Faik Erdem Dalkılıç, Semih Gelişli, Banu Diri Siu 2010 IEEE 18th Signal Processing and Communications Applications Conference, 2010
Software defect prediction using artificial immune recognition system Proceedings of the IASTED International Conference on Software Engineering Se 2007, 2007
Software process ontology Lecture Notes in Engineering and Computer Science, 2007
Author attribution of Turkish texts by feature mining Filiz Türkoğlu, Banu Diri, M. Fatih Amasyalı Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2007
Color image compression using self organizing feature map Proceedings of the IASTED International Conference on Artificial Intelligence and Applications Aia 2006, 2006
Automatic Turkish text categorization in terms of author, genre and gender Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2006
RECENT SCHOLAR PUBLICATIONS
Disentangling Technical and Content Attributes in Search Engine Ranking: A Comparative Study of Google and Bing G Cebeci, B Diri IEEE Access, 2026
Healthcare-Focused Turkish Medical LLM: Training on Real Patient-Doctor Question-Answer Data for Enhanced Medical Insight MA Bayram, B Diri, S Yildirim ACM Transactions on Asian and Low-Resource Language Information Processing …, 2025
Comparison of Different Segmentation Models for Image Reconstruction: A Preliminary Study SA Arpacı, S Varlı, B Diri 2025 5th International Conference on Electrical, Computer, Communications …, 2025
The Effect of Convolutional Encoder-Decoder Architecture on Lossless Image Compression Applications SA Arpacı, B Diri, S Varlı 2025 Innovations in Intelligent Systems and Applications Conference (ASYU), 1-6, 2025 Citations: 1
Tokens with Meaning: A Hybrid Tokenization Approach for NLP MA Bayram, AA Fincan, AS Gümüş, S Karakaş, B Diri, S Yıldırım, D Çelik arXiv preprint arXiv:2508.14292, 2025 Citations: 2
Doğal Dil İşlemede Tokenizasyon Standartları ve Ölçümü: Türkçe Üzerinden Büyük Dil Modellerinin Karşılaştırmalı Analizi MA Bayram, AA Fincan, A Semih Gümüş, S Karakaş, B Diri, S Yıldırım arXiv e-prints, arXiv:2508.13058, 2025
Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları MA Bayram, AA Fincan, A Semih Gümüş, B Diri, S Yıldırım, Ö Aytaş arXiv e-prints, arXiv:2508.13044, 2025
TR-MMLU benchmark for large language models: Performance evaluation, challenges, and opportunities for improvement MA Bayram, AA Fincan, AS Gümüş, B Diri, S Yıldırım, Ö Aytaş 2025 33rd Signal Processing and Communications Applications Conference (SIU …, 2025 Citations: 2
Accessible Hyperlinks and Search Engine Rankings: An Empirical Investigation G Cebeci, B Diri 2025 33rd Signal Processing and Communications Applications Conference (SIU …, 2025
Tokenization Standards and Evaluation in Natural Language Processing: A Comparative Analysis of Large Language Models on Turkish MA Bayram, AA Fincan, AS Gümüş, S Karakaş, B Diri, S Yıldırım 2025 33rd Signal Processing and Communications Applications Conference (SIU …, 2025
Notification Text Generation with Large Language Models H Taşköprü, B Diri 2025 33rd Signal Processing and Communications Applications Conference (SIU …, 2025
TÜRKÇE SAĞLIK DANIŞMANLIĞINDA BÜYÜK DİL MODELLERİNİN HASTA-DOKTOR İLETİŞİMİNDE KULLANIM POTANSİYELİ MK Bulut, B Diri Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi 28 (2 …, 2025 Citations: 2
Occupation prediction from Twitter data T İzdaş, H İskifoğlu, B Diri Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27 …, 2025 Citations: 1
A hybrid approach for the detection of images generated with multi generator MS-DCGAN S Sürücü, B Diri Engineering Science and Technology, an International Journal 63, 101969, 2025 Citations: 12
Tokenization standards for linguistic integrity: Turkish as a benchmark MA Bayram, AA Fincan, AS Gümüş, S Karakaş, B Diri, S Yıldırım arXiv preprint arXiv:2502.07057, 2025 Citations: 10
Setting standards in Turkish NLP: TR-MMLU for large language model evaluation MA Bayram, AA Fincan, AS Gümüş, B Diri, S Yıldırım, Ö Aytaş arXiv preprint arXiv:2501.00593, 2024 Citations: 12
Artificial intelligence revolution in Turkish health consultancy: Development of LLM-based virtual doctor assistants MK Bulut, B Diri 2024 8th International Artificial Intelligence and Data Processing Symposium …, 2024 Citations: 2
Enhanced feature selection using genetic algorithm for machine-learning-based phishing URL detection E Kocyigit, M Korkmaz, OK Sahingoz, B Diri Applied Sciences 14 (14), 6081, 2024 Citations: 61
MOST CITED SCHOLAR PUBLICATIONS
A systematic review of software fault prediction studies C Catal, B Diri Expert Systems with Applications 36 (4), 7346-7354, 2009 Citations: 1238
Machine learning based phishing detection from URLs OK Sahingoz, E Buber, O Demir, B Diri Expert Systems with Applications 117, 345-357, 2019 Citations: 1121
Derin öğrenme yöntemleri ve uygulamaları hakkında bir inceleme A Şeker, B Diri, HH Balık Gazi Mühendislik Bilimleri Dergisi 3 (3), 47-64, 2017 Citations: 282
Detection of phishing websites by using machine learning-based URL analysis M Korkmaz, OK Sahingoz, B Diri 2020 11th international conference on computing, communication and …, 2020 Citations: 173
Practical development of an Eclipse-based software fault prediction tool using Naive Bayes algorithm C Catal, U Sevim, B Diri Expert Systems with Applications 38 (3), 2347-2353, 2011 Citations: 141
Clustering and metrics thresholds based software fault prediction of unlabeled program modules C Catal, U Sevim, B Diri 2009 Sixth international conference on information technology: new …, 2009 Citations: 135
Automatic Turkish text categorization in terms of author, genre and gender MF Amasyalı, B Diri International Conference on Application of Natural Language to Information …, 2006 Citations: 128
Comparative proteogenomic analysis of right-sided colon cancer, left-sided colon cancer and rectal cancer reveals distinct mutational profiles R Imperial, Z Ahmed, OM Toor, C Erdoğan, A Khaliq, P Case, J Case, ... Molecular Cancer 17 (1), 177, 2018 Citations: 124
A corpus-based semantic kernel for text classification by using meaning values of terms B Altınel, MC Ganiz, B Diri Engineering Applications of Artificial Intelligence 43, 54-66, 2015 Citations: 96
Web page classification using RNN E Buber, B Diri Procedia Computer Science 154, 62-72, 2019 Citations: 88
NLP based phishing attack detection from URLs E Buber, B Diri, OK Sahingoz International Conference on Intelligent Systems Design and Applications, 608-618, 2017 Citations: 82
Software fault prediction with object-oriented metrics based artificial immune recognition system C Catal, B Diri International Conference on Product Focused Software Process Improvement …, 2007 Citations: 64
Enhanced feature selection using genetic algorithm for machine-learning-based phishing URL detection E Kocyigit, M Korkmaz, OK Sahingoz, B Diri Applied Sciences 14 (14), 6081, 2024 Citations: 61
An artificial immune system approach for fault prediction in object-oriented software C Catal, B Diri, B Ozumut 2nd International Conference on Dependability of Computer Systems (DepCoS …, 2007 Citations: 60
Twitter verileri ile duygu analizi. ES Akgül, C Ertano, B Diri Pamukkale University Journal of Engineering Sciences 22 (2), 2016 Citations: 56
Sentiment analysis on Twitter M Meral, B Diri 2014 22nd Signal Processing and Communications Applications Conference (SIU …, 2014 Citations: 55
Automatic author detection for Turkish texts B Diri, MF Amasyalı Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP …, 2003 Citations: 54
Feature selections for the classification of webpages to detect phishing attacks: a survey M Korkmaz, OK Sahingoz, B Diri 2020 International Congress on Human-Computer Interaction, Optimization and …, 2020 Citations: 53
Phishing web page detection using N-gram features extracted from URLs M Korkmaz, E Kocyigit, OK Sahingoz, B Diri 2021 3rd International Congress on Human-Computer Interaction, Optimization …, 2021 Citations: 51
A hybrid phishing detection system using deep learning-based URL and content analysis M Korkmaz, E Kocyigit, O Sahingoz, B Diri Elektronika ir Elektrotechnika 28 (5), 2022 Citations: 45