Banu Diri

@yildiz.edu.tr

Electrical and Electronic Faculty / Computer Engineering
Yildiz Technical University

https://researchid.co/banu_diri

RESEARCH INTERESTS

Natural Language Processing
Data Mining
Machine Learning

127

Scopus Publications

4316

Scholar Citations

27

Scholar h-index

66

Scholar i10-index

Scopus Publications

  • Transferemble: A classification method for the detection of fake satellite images created with deep convolutional generative adversarial network
    Selim Sürücü and Banu Diri

    SPIE-Intl Soc Optical Eng
    As the number of government and commercial satellites increases, there is a large increase in Earth observation (EO) imagery. Using different locations and tools, images can be taken from more than one satellite. These images are manipulated using a variety of methods, yet very few studies have addressed the manipulation of EO images. In recent years, generative adversarial networks (GANs), a major breakthrough in deep learning, have made it very easy to obtain fake images. In this study, fake scene images were generated scene by scene from the original scenes of the EuroSAT dataset, one of the EO image sets, using a deep convolutional GAN. The resulting dataset, called RF-EuroSAT, consists of 14 classes and 36,000 images. Five transfer learning models (VGG-16, DenseNet201, MobileNetV2, RegNetY320, and ResNet152V2) were used to classify this dataset. Using these models as feature extractors and ensemble models (XGBoost, CatBoost, and LightGBM) as classifiers, classification was performed with our proprietary transferemble model. The best result, an accuracy of 91.55%, was obtained with our transferemble model, which has a modular structure.

  • Automatic Fake News Detection in Social Networks
    Doğa Bahar and Banu Diri

    IEEE
    With the development of technology, the spread of fake news on social networks is increasing. Many researchers and organizations have taken action to detect fake news manually or automatically. In this study, various Machine Learning Algorithms and Transformer based approaches are used to select the best performing model that can distinguish news as fake or real. In order to contribute to the Turkish literature in the field of Natural Language Processing (NLP), the dataset is specifically prepared in Turkish. The words were vectorized using Word2Vec, BERT and SBERT and classified using Machine Learning Algorithms such as Support Vector Machines, Naive Bayes, Logistic Regression, KNN and BERT/SBERT deep learning models. The highest F1 score of 0.99 was obtained from the transformer-based BERT and SBERT.

  • Detection of Style-Transfer Images Generated Using GAN with Deep Learning models
    Selim Sürücü and Banu Diri

    IEEE
    In recent years, with the development of various GAN architectures, studies on image reconstruction from images have accelerated. GAN architectures are used in many different fields, from health to the entertainment sector, for purposes such as image colouring, fake image generation, style transfer, and increasing the number of images. This study aims to detect fake images created by style transfer using CycleGAN in remote sensing images. Fake and real images are classified with DenseNet121, one of the transfer learning models, and with a voting model, an ensemble learning model that uses DenseNet121 for feature extraction. The proposed model, with 98.33% accuracy and an MCC of 0.9677, outperforms the classical transfer learning model, which obtained 86.10% and 0.7220.

  • Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition
    Cevahir Parlak, Banu Diri, and Yusuf Altun

    Springer Science and Business Media LLC

  • Diagnosis of Lung Diseases From Radiography Images Using Deep Learning Methods
    Onur Porsuk, Ahmet Elbir, and Banu Diri

    IEEE
    Nowadays, huge amounts of image data are being created for use in various fields. The use of these data in fields such as medicine is of great importance. Thanks to the use of image analysis in medicine for many years, there have been considerable improvements in the detection and treatment of diseases. In this study, Convolutional Neural Networks (CNN), one of the deep learning techniques, were used to extract distinctive features from radiography images. Lung diseases were then classified from these features using deep learning and machine learning models. In this context, steps such as data type conversion, normalization, and digitizing class information were used as pre-processing. The CNN's multilayer perceptron was applied as the deep learning classifier, while SVM and XGBoost were applied as the machine learning classifiers. When all methods were compared, the highest accuracy in multi-class classification, 92.2%, was achieved by using CNN and SVM together.

  • A Hybrid Phishing Detection System Using Deep Learning-based URL and Content Analysis
    Mehmet Korkmaz, Emre Kocyigit, Ozgur Koray Sahingoz, and Banu Diri

    Kaunas University of Technology (KTU)
    Phishing attacks are one of the most preferred types of attacks for cybercriminals, who can easily contact a large number of victims through the use of social networks, particularly through email messages. To protect end users, most of the security mechanisms control Uniform Resource Locator (URL) addresses because of their simplicity of implementation and execution speed. However, due to sophisticated attackers, this mechanism can miss some phishing attacks and has a relatively high false positive rate. In this research, a hybrid technique is proposed that uses not only URL features, but also content-based features as the second level of detection mechanism, thus improving the accuracy of the detection system while also minimizing the number of false positives. Additionally, most phishing detection algorithms use datasets that contain easily differentiated data pieces, either phishing or legitimate. However, in order to implement a more secure protection mechanism, we aimed to collect a larger and high-risk dataset. The proposed approaches were tested on this High-Risk URL and Content-Based Phishing Detection Dataset that only contains suspicious websites from PhishTank. According to experimental studies, an accuracy rate of 98.37 percent was achieved on a more realistic dataset for phishing detection.

  • IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION
    Zekeriya Anil Guven, Banu Diri, and Tolgahan Cakaloglu

    AGHU University of Science and Technology Press
    Data analysis becomes difficult with the increase of large amounts of data. More specifically, extracting meaningful insights from this vast amount of data and grouping them based on their shared features without human intervention requires advanced methodologies. There are topic modeling methods to overcome this problem in text analysis for downstream tasks, such as sentiment analysis, spam detection, and news classification. In this research, we benchmark several classifiers, namely Random Forest, AdaBoost, Naive Bayes, and Logistic Regression, using the classical LDA and n-stage LDA topic modeling methods for feature extraction in headline classification. We run our experiments on publicly available Turkish and English datasets with 3 and 5 classes. We demonstrate that n-stage LDA as a feature extractor obtains state-of-the-art performance for any downstream classifier. It should also be noted that Random Forest was the most successful algorithm for both datasets.

  • Special issue: Innovations in Intelligent Systems and Applications
    Petia Koprinkova-Hristova, Mirjana Ivanovic, and Banu Diri

    Informa UK Limited
    Intelligent Systems can be thought of as a concept with a very broad scope. They can have hardware from small microprocessors to large processors, micro-mechanics to macromechanics. Intelligent Systems can have software from low-level simple codes to much more complex codes, can be connected locally or via Internet or can work offline independently. They can be operated or managed remotely or can be autonomous and able to act as a rule-based system or as a learning capable Artificial Intelligence system. We can see ‘Intelligent Systems’ in every area that comes to mind such as Robotics, Finance, Industry, Space technologies, Education, Home Appliances, Health, Communication, Security, Military, Aviation, Energy, and so on. The 14th International Conference on Innovations in Intelligent Systems and Applications (INISTA 2020) took place during the period between 24 August and 26 August 2020 as an online event, organized by the University of Novi Sad, Serbia. Previously, INISTA had been hosted at Sofia, Bulgaria (2019), Thessaloniki, Greece (2018), Gdynia, Poland (2017), Sinaia, Romania (2016), Madrid, Spain (2015), Alberobello, Italy (2014), Albena, Bulgaria (2013), Trabzon, Turkey (2012), Istanbul, Turkey (2011), Kayseri, Turkey (2010), Trabzon, Turkey (2009), Istanbul, Turkey (2007), and Istanbul, Turkey (2005). Authors had been invited to submit high-quality, original research papers on the range of topics including, but not limited to, the following:

  • The Performance Comparison of Gene Co-expression Networks of Breast and Prostate Cancer using Different Selection Criteria
    Mustafa Özgür Cingiz, Göksel Biricik, and Banu Diri

    Springer Science and Business Media LLC
    Gene co-expression networks (GCN) present undirected relations between genes to understand molecular structures behind the diseases, including cancer. The utilization of various biological datasets and gene network inference (GNI) algorithms can reveal meaningful gene-gene interactions of GCNs. This study applies three GNI algorithms on mRNA gene expression, RNA-Seq, and miRNA-target genes datasets to infer GCNs of breast and prostate cancers. To evaluate the performance of the GCNs, we utilize overlap analysis via literature data, topological assessment, and Gene Ontology-based biological assessment. The results emphasize how the selection of biological datasets and GNI algorithms affect the performance results on different evaluation criteria. GCNs on microarray gene expression data slightly outperform in overlap analysis. Also, GCNs on RNA-Seq and gene expression datasets follow scale-free topology. The biological assessment results are close to each other on all biological datasets. C3NET algorithm-based GCNs did not contain any biological assessment modules; therefore, it is not optimal for biological assessment. GNI algorithms' selection did not change the overlap analysis and topological assessment results. Our primary objective is to compare the performance results of biological datasets and GNI algorithms based on different evaluation criteria. For this purpose, we developed the GNIAP R package that enables users to select different GNI algorithms to infer GCNs. The GNIAP R package also provides literature-based overlap analysis, and topological and biological analyses on GCNs. Users can access the GNIAP R package via https://github.com/ozgurcingiz/GNIAP .

  • Phishing Web Page Detection Using N-gram Features Extracted from URLs
    Mehmet Korkmaz, Emre Kocyigit, Ozgur Koray Sahingoz, and Banu Diri

    IEEE
    Recently, cyber-attacks have increased worldwide, especially during the pandemic period. The number of connected devices in the world and the anonymous structure of the internet create this security deficit not only for computer networks but also for single computing devices. With computing devices in use anytime and anywhere, many real-world activities have been transferred to the digital world and adapted to new lifestyles. Thus, cybersecurity has drawn more attention not only from security admins but also from academicians and researchers. Phishing attacks, which hackers have mostly preferred in the last decade, have become even more harmful because they focus on the weakest part of the security chain: the computer user. Therefore, it is extremely important to prevent these cyber-attacks before they reach users. Based on this idea, we aimed to implement a phishing detection system by using a Convolutional Neural Network with n-gram features that are extracted from URLs. There are different n-gram feature extraction techniques, and this work aims to determine which of them is more effective for our proposal. A second goal is to discover which n-gram parameters work best. In experiments, unigrams gave the highest accuracy rate. Instead of all the characters obtained as unigrams, a specified set of 70 characters (regardless of case) gave the highest accuracy rate of 88.90% on a High-Risk URL dataset. Experimental results also showed that a URL can be classified (either as legitimate or phishing) in about 0.008 seconds. These results can be considered very good in both accuracy and run-time efficiency.

  • The effect of transfer learning on Turkish text classification
    Gurkan Sahin and B. Diri


    Text classification is one of the most important issues in natural language processing. In this study, texts belonging to different problems were classified using classical machine learning and deep learning methods. Additionally, transformer-based classifiers using transfer learning were also used, and the effects of transfer learning on classification success were examined. The experiments showed that the transfer learning-based BERT classifier achieved higher performance than the other methods. The study examines the effect of transfer learning on Turkish text classification in detail.

  • Deep Learning for Discussion-Based Cross-Domain Performance Prediction of MOOC Learners Grouped by Language on FutureLearn
    Ismail Duru, Ayse Saliha Sunar, Su White, and Banu Diri

    Springer Science and Business Media LLC
    Analysing learners’ behaviours in MOOCs has been used to identify predictive features associated with positive outcomes in engagement and learning success. Early methods predominantly analysed numerical features of behaviours such as page views, video views, and assessment grades. Analysing extracted numeric features using baseline machine learning algorithms performed well in predicting learners’ future performance in MOOCs. We propose categorising learners by likely English language proficiency and extending the range of data to include the content of comment texts. We compare results to a model trained with a combined set of extracted features. Not all platforms provide this rich variety of data. We analysed a series of FutureLearn language-focused MOOCs. Our data were from discussions embedded into each lesson’s content. To analyse whether we gained any additional insights, over 420,000 comments were used to train the algorithm. We created a method for identifying a learner’s possible first language from their country. We found that using comments alone is a weaker predictive approach than using a combination that includes features extracted from learners’ activities. Our study contributes to research on the generalisability of learning algorithms. We replicated the method across different MOOCs; the performance varies with the model, though it always remained over 50%. One of the deep learning architectures, Bidirectional LSTM, trained with discussions from a language-learning MOOC, predicted learners’ performance on a different MOOC with 73% success.

  • New developer metrics for open source software development challenges: An empirical study of project recommendation systems
    Abdulkadir Şeker, Banu Diri, and Halil Arslan

    MDPI AG
    Software collaboration platforms where millions of developers from diverse locations can contribute to common open source projects have recently become popular. On these platforms, various information is obtained from developer activities that can then be used as developer metrics to solve a variety of challenges. In this study, we proposed new developer metrics extracted from the issue, commit, and pull request activities of developers on GitHub. We created developer metrics from the individual activities and combined certain activities according to some common traits. To evaluate these metrics, we created an item-based project recommendation system. In order to validate this system, we calculated the similarity score using two methods and assessed top-n hit scores using two different approaches. The results for all scores with these methods indicated that the most successful metrics were binary_issue_related, issue_commented, binary_pr_related, and issue_opened. To verify our results, we compared our metrics with another metric generated from a very similar study and found that most of our metrics gave better scores than that metric. In conclusion, the issue feature is more crucial for GitHub compared with other features. Moreover, commenting activity in projects can be equally as valuable as code contributions. Most of the binary metrics, which were generated regardless of the number of activities, also showed remarkable results. In this context, we presented improvable and noteworthy developer metrics that can be used for a wide range of open-source software development challenges, such as user characterization, project recommendation, and code review assignment.

  • Open Source Software Development Challenges: A Systematic Literature Review on GitHub
    Abdulkadir Seker, Banu Diri, Halil Arslan, and Mehmet Fatih Amasyalı

    IGI Global
    GitHub is the most common code hosting and repository service for open-source software (OSS) projects. Thanks to its great variety of features, researchers benefit from GitHub to solve a wide range of OSS development challenges. In this context, the authors thought that it was important to conduct a literature review on studies that used GitHub data. To reach these studies, they based this literature review on a GitHub dataset source study instead of a keyword-based search in digital libraries. Since GHTorrent is the most widely known GitHub dataset according to the literature, they considered the studies that cite this dataset for the systematic literature review. In this study, they reviewed, according to some criteria, the 172 selected studies that used the dataset as a data source. They classified them within the scope of OSS development challenges using the information extracted from the metadata of the studies. They raised some issues about the dataset and identified the focused and attention-grabbing fields, as well as open challenges that they encourage researchers to study.

  • Hate Speech Dataset from Turkish Tweets
    Islam Mayda, Yunus Emre Demir, Tugba Dalyan, and Banu Diri

    IEEE
    Today, while the content produced by users on online platforms increases rapidly due to the spread of the internet, hate speech expressions on these platforms also increase similarly. Social media platforms with millions of users are especially among the areas where hate speech expressions are shared frequently. Popular social media companies form their own policies within the scope of combating hate speech. However, the size of the data on the internet makes it almost impossible to do this manually. Consequently, especially in recent years, many studies have been conducted on the automatic detection of hate speech. While most of the studies in the literature are on English, there are published studies on hate speech detection in many languages such as German, French, Arabic, Indonesian, and Portuguese. One of the main reasons for fewer studies in languages other than English is the smaller number and size of publicly shared hate speech datasets in those languages. There is a similar situation for Turkish. Therefore, within the scope of the study, a hate speech dataset comprising 10,224 Turkish tweets was generated and shared publicly. Tweets were labeled as hate, offensive, and none, and tweets tagged as hate were assigned subclass labels such as ethnic, religious, sexist, and political, which express the type of hate. In the first step of the labeling process, two annotators labeled all tweets separately. In the comparison made after this process, the agreement rate in the given labels was 92.5%. Afterwards, the two annotators discussed the tweets to which they had given different labels and increased the agreement rate to 98.4%. For the remaining tweets, the opinion of a third evaluator was sought. After the labeling process, the rate of hate speech in the dataset was 22.8%. This publicly available dataset, which is a first for Turkish in terms of its scope and size, is expected to be an important resource for automatic hate speech detection studies in Turkish.

  • Deep Neural Network Based Phishing Classification on a High-Risk URL Dataset
    Mehmet Korkmaz, Emre Kocyigit, Ozgur Koray Sahingoz, and Banu Diri

    Springer International Publishing

  • A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up
    Emre Delibaş, Ahmet Arslan, Abdulkadir Şeker, and Banu Diri

    Elsevier BV
    DNA sequence similarity analysis is an essential task in computational biology and bioinformatics. In nearly all research that explores evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieving, it is necessary to perform similarity calculations. As an alternative to alignment-based sequence comparison methods, which result in high computational cost, alignment-free methods have emerged that calculate similarity by digitizing the sequence in a different space. In this paper, we proposed an alignment-free DNA sequence similarity analysis method based on top-k n-gram matches, with the prediction that common repeating DNA subsections indicate high similarity between DNA sequences. In our method, we determined DNA sequence similarities by measuring similarity among feature vectors created according to top-k n-gram match-up scores without the use of similarity functions. We applied the similarity calculation for three different DNA data sets of different lengths. The phylogenetic relationships revealed by our method show that our trees coincide almost completely with the results of the MEGA software, which is based on sequence alignment. Our findings show that a certain number of frequently recurring common sequence patterns have the power to characterize DNA sequences.

  • Sentiment Analysis for Hotel Attributes from Online Reviews
    Yunus Emre Demir, Semih Durmaz, Ahmet Elbir, Ibrahim Onur Sigirci, and Banu Diri

    IEEE
    Users evaluate hotels through online reviews according to the hotels' various attributes. In this study, sentiment analysis of reviews covering these attributes was carried out using 11 attributes that were determined beforehand. Thanks to this analysis, users' overall assessments were determined and summarized from reviews regarding any aspect of the hotels. The Word2Vec method was employed to determine words with similar meaning to represent the 11 attributes. Additionally, the FastText method was used in an effort to disregard possible spelling errors. Since the data set is not labeled, VADER, a dictionary-based method, was employed to perform sentiment analysis. The performance score was calculated by evaluating the comments in three categories: positive, negative, and neutral.

  • MBTI Personality Prediction with Machine Learning
    Kaan Sonmezoz, Ozgur Ugur, and Banu Diri

    IEEE
    Personality traits continuously affect our lives, from our behavior to our career decisions. With the help of personality traits, it is possible to design more accurate recommendation systems and develop more efficient digital marketing strategies. In this work, people's MBTI personality traits were predicted from their social media posts. Although this is the first such study to use the Turkish language, the results show that personality type prediction can be applied in Turkish as well. The best results were obtained when the MBTI dimensions were predicted. The F-scores for all but one dimension are approximately 60%.

  • Irony Detection with Deep Learning in Turkish Microblogs
    Ahmet Karabas and Banu Diri

    IEEE
    The number of people sharing on social media is constantly increasing. One of the most popular microblogging sites is Twitter, where 500 million tweets are posted every day. Categorizing such large data manually is a challenging task. Therefore, classification using autonomous systems is of great importance. Irony is a figure of speech in which the opposite of what is said is meant. Under a serious appearance, the utterance aims to say the opposite or to push the statement to the point of contradiction. Recently, following the successful results of emotion analysis on tweets, studies on irony detection have been carried out. While it is easier to detect irony in face-to-face conversation, it can be difficult even for humans to understand it in written communication. The character limit on Twitter leads some people to make spelling mistakes and to be careless with punctuation, which hinders classification methods. For this reason, preprocessing became a mandatory first step. After correcting the data, machine learning and deep learning algorithms were applied with different parameters, and the results were examined and compared.

  • Open source software development challenges: A systematic literature review on GitHub
    Abdulkadir Seker, Banu Diri, Halil Arslan, and Mehmet Fatih Amasyalı

    IGI Global
    GitHub is the most common code hosting and repository service for open-source software (OSS) projects. Thanks to its great variety of features, researchers benefit from GitHub to solve a wide range of OSS development challenges. In this context, the authors thought that it was important to conduct a literature review on studies that used GitHub data. To reach these studies, they based this literature review on a GitHub dataset source study instead of a keyword-based search in digital libraries. Since GHTorrent is the most widely known GitHub dataset according to the literature, they considered the studies that cite this dataset for the systematic literature review. In this study, they reviewed, according to some criteria, the 172 selected studies that used the dataset as a data source. They classified them within the scope of OSS development challenges using the information extracted from the metadata of the studies. They raised some issues about the dataset and identified the focused and attention-grabbing fields, as well as open challenges that they encourage researchers to study.

  • An Evolutionary Approach to Multiple Traveling Salesman Problem for Efficient Distribution of Pharmaceutical Products
    Emre Kocyigit, Ozgur Koray Sahingoz, and Banu Diri

    IEEE
    Considerable growth of computer science has created novel solutions for various problem fields and has increased the efficiency of available solutions. Evolutionary algorithms are quite successful in dealing with real-world problems that require optimization. In this article, we implemented a Genetic Algorithm, a well-known evolutionary algorithm, in order to provide an efficient solution for the Distribution of Pharmaceutical Products, which is a vital optimization problem, especially in situations such as a pandemic. The Multiple Traveling Salesman Problem approach was used to distribute pharmaceutical products as soon as possible. Moreover, we strengthened our proposed algorithm with the 2-Opt Algorithm to get optimal results in earlier iterations. Different datasets from a library were used to measure the quality of solutions and computation time. At the end of the work, we observed that our proposed algorithm generates successful solutions in an acceptable running time. This study will be extended with a new mutation concept as future work.

  • Detection of Phishing Websites by Using Machine Learning-Based URL Analysis
    Mehmet Korkmaz, Ozgur Koray Sahingoz, and Banu Diri

    IEEE
    In recent years, with the increasing use of mobile devices, there is a growing trend to move almost all real-world operations to the cyberworld. Although this makes our daily lives easier, it also brings many security breaches due to the anonymous structure of the Internet. Antivirus programs and firewall systems can prevent most of the attacks. However, experienced attackers target the weakness of computer users by trying to phish them with bogus webpages. These pages imitate popular banking, social media, e-commerce, and similar sites to steal sensitive information such as user IDs, passwords, and bank account and credit card numbers. Phishing detection is a challenging problem, and many different solutions have been proposed in the market, such as blacklists, rule-based detection, and anomaly-based detection. In the literature, current works tend toward machine learning-based anomaly detection due to its dynamic structure, especially for catching “zero-day” attacks. In this paper, we proposed a machine learning-based phishing detection system that uses eight different algorithms to analyze URLs and three different datasets to compare the results with other works. The experimental results depict that the proposed models have an outstanding performance with a success rate.

  • Feature Selections for the Classification of Webpages to Detect Phishing Attacks: A Survey
    Mehmet Korkmaz, Ozgur Koray Sahingoz, and Banu Diri

    IEEE
    In recent years, due to the increased number of Internet-connected devices, almost all real-world interactions have been transferred to the cyberworld. Therefore, most commerce (especially in the e-commerce format) is executed over webpages. The anonymous and uncontrollable structure of the Internet enables the malicious use of this cyber environment for a relatively new form of crime, named e-crime, which mainly aims at illegal financial gain by cheating standard end-users. Phishing attacks are one of the most preferred fraudulent techniques, used to obtain confidential information (such as user IDs, passwords, and credit card information) from end-users. Therefore, network security admins try to decrease the number of victims in their companies. One principal protection mechanism is the use of blacklists to detect phishing webpages. However, blacklists have a significant deficiency: they do not protect against attacks from new pages. Most security admins use learning systems that are trained on a pre-collected dataset by extracting features from the URL and content of the web pages. The performance of such a system is directly related to the features used for classification. In this work, we aimed to analyze the features previously used in the classification of web pages through a comparative analysis of the literature. This study aims to produce a general survey resource for researchers who work on the classification of webpages or the security of networks.
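
Several of the phishing abstracts above rely on character n-gram features extracted from URLs. As a rough illustration only, the sketch below shows one way such features could be computed in Python. The `ALPHABET` constant and both function names are hypothetical: the abstracts do not list the 70-character set the authors used, and this is not their implementation.

```python
from collections import Counter
import string

# Hypothetical alphabet (an assumption): the abstracts do not list the
# authors' 70-character set, so lowercase letters, digits, and common
# URL punctuation are used here instead.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"


def char_ngrams(url, n=1):
    """Return the overlapping character n-grams of a URL, lowercased."""
    text = url.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def unigram_vector(url):
    """Count how often each ALPHABET character occurs in the URL."""
    counts = Counter(char_ngrams(url, 1))
    return [counts.get(ch, 0) for ch in ALPHABET]
```

A fixed-length vector of this kind could be fed to any classifier; characters outside the assumed alphabet are simply ignored.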

RECENT SCHOLAR PUBLICATIONS

  • Automatic Fake News Detection in Social Networks
    D Bahar, B Diri
    2023 Innovations in Intelligent Systems and Applications Conference (ASYU), 1-5 2023

  • Detection of Style-Transfer Images Generated Using GAN with Deep Learning models
    S Sürücü, B Diri
    2023 8th International Conference on Computer Science and Engineering (UBMK 2023

  • Transferemble: a classification method for the detection of fake satellite images created with deep convolutional generative adversarial network
    S Sürücü, B Diri
    Journal of Electronic Imaging 32 (4), 043004-043004 2023

  • Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition
    C Parlak, B Diri, Y Altun
    Arabian Journal for Science and Engineering, 1-15 2023

  • Haberler Üzerinden Ülkeler Arası İlişkilerin Analizi
    AB UZUN, Alperen, D Banu
    Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 11 (1), 106-118 2022

  • A hybrid phishing detection system using deep learning-based URL and content analysis
    M Korkmaz, E Kocyigit, O Sahingoz, B Diri
    Elektronika ir Elektrotechnika 28 (5) 2022

  • Impact of n-stage latent Dirichlet allocation on analysis of headline classification
    ZA Guven, B Diri, T Cakaloglu
    Computer Science 23 2022

  • Open Source Software Development Challenges: A Systematic Literature Review on GitHub
    A Seker, B Diri, H Arslan, MF Amasyalı
    Research anthology on agile software, software development, and testing 2022

  • Aspect Based Opinion Mining on Hotel Reviews
    YE DEMİR, S DURMAZ, A ELBİR, İO SIĞIRCI, D Banu
    International Journal of Advances in Engineering and Pure Sciences 33, 28-34 2021

  • Türkçe Haberlerin Tür Tespiti İçin Konu Modelleme Yöntemlerinin Karşılaştırılması
    Z Güven, B Diri, T Cakaloglu
    2021

  • N-stage Latent Dirichlet allocation: a novel approach for LDA
    ZA Guven, B Diri, T Cakaloglu
    arXiv preprint arXiv:2110.08591 2021

  • Hate speech dataset from turkish tweets
    İ Mayda, YE Demir, T Dalyan, B Diri
    2021 Innovations in Intelligent Systems and Applications Conference (ASYU), 1-6 2021

  • Evaluation of Non-Negative Matrix Factorization and n-stage Latent Dirichlet Allocation for Emotion Analysis in Turkish Tweets
    ZA Guven, B Diri, T Cakaloglu
    arXiv preprint arXiv:2110.00418 2021

  • The performance comparison of gene co-expression networks of breast and prostate cancer using different selection criteria
    M Cingiz, G Biricik, B Diri
    Interdisciplinary Sciences: Computational Life Sciences 13 (3), 500-510 2021

  • Phishing web page detection using N-gram features extracted from URLs
    M Korkmaz, E Kocyigit, OK Sahingoz, B Diri
    2021 3rd International Congress on Human-Computer Interaction, Optimization 2021

  • The effect of transfer learning on Turkish text classification
    G Şahin, B Diri
    2021 29th Signal Processing and Communications Applications Conference (SIU 2021

  • Türkçe tweetler üzerinde makine öğrenmesi ile nefret söylemi tespiti
    İ Mayda, D Banu, T YILDIZ
    Avrupa Bilim ve Teknoloji Dergisi, 328-334 2021

  • Deep learning for discussion-based cross-domain performance prediction of MOOC learners grouped by language on FutureLearn
    I Duru, AS Sunar, S White, B Diri
    Arabian Journal for Science and Engineering 46 (4), 3613-3629 2021

  • New developer metrics for open source software development challenges: An empirical study of project recommendation systems
    A Şeker, B Diri, H Arslan
    Applied Sciences 11 (3), 920 2021

  • Deep neural network based phishing classification on a high-risk url dataset
    M Korkmaz, E Kocyigit, OK Sahingoz, B Diri
    International conference on soft computing and pattern recognition, 648-657 2020

MOST CITED SCHOLAR PUBLICATIONS

  • A systematic review of software fault prediction studies
    C Catal, B Diri
    Expert systems with applications 36 (4), 7346-7354 2009
    Citations: 1124

  • Machine learning based phishing detection from URLs
    OK Sahingoz, E Buber, O Demir, B Diri
    Expert Systems with Applications 117, 345-357 2019
    Citations: 607

  • Derin öğrenme yöntemleri ve uygulamaları hakkında bir inceleme
    A Şeker, B Diri, HH Balık
    Gazi Mühendislik Bilimleri Dergisi 3 (3), 47-64 2017
    Citations: 240

  • Practical development of an Eclipse-based software fault prediction tool using Naive Bayes algorithm
    C Catal, U Sevim, B Diri
    Expert Systems with Applications 38 (3), 2347-2353 2011
    Citations: 137

  • Automatic Turkish text categorization in terms of author, genre and gender
    MF Amasyalı, B Diri
    International Conference on Application of Natural Language to Information 2006
    Citations: 130

  • Clustering and metrics thresholds based software fault prediction of unlabeled program modules
    C Catal, U Sevim, B Diri
    2009 Sixth international conference on information technology: new 2009
    Citations: 123

  • Comparative proteogenomic analysis of right-sided colon cancer, left-sided colon cancer and rectal cancer reveals distinct mutational profiles
    R Imperial, Z Ahmed, OM Toor, C Erdoğan, A Khaliq, P Case, J Case, ...
    Molecular cancer 17, 1-7 2018
    Citations: 92

  • A corpus-based semantic kernel for text classification by using meaning values of terms
    B Altınel, MC Ganiz, B Diri
    Engineering Applications of Artificial Intelligence 43, 54-66 2015
    Citations: 91

  • Detection of phishing websites by using machine learning-based URL analysis
    M Korkmaz, OK Sahingoz, B Diri
    2020 11th International Conference on Computing, Communication and 2020
    Citations: 75

  • Web page classification using RNN
    E Buber, B Diri
    Procedia Computer Science 154, 62-72 2019
    Citations: 64

  • Software fault prediction with object-oriented metrics based artificial immune recognition system
    C Catal, B Diri
    International Conference on Product Focused Software Process Improvement 2007
    Citations: 62

  • An artificial immune system approach for fault prediction in object-oriented software
    C Catal, B Diri, B Ozumut
    2nd International Conference on Dependability of Computer Systems (DepCoS 2007
    Citations: 61

  • Automatic author detection for Turkish texts
    B Diri, MF Amasyalı
    Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP 2003
    Citations: 54

  • NLP based phishing attack detection from URLs
    E Buber, B Diri, OK Sahingoz
    Intelligent Systems Design and Applications: 17th International Conference 2018
    Citations: 53

  • Sentiment analysis on Twitter
    M Meral, B Diri
    2014 22nd Signal Processing and Communications Applications Conference (SIU 2014
    Citations: 49

  • A fault prediction model with limited fault data to improve test process
    C Catal, B Diri
    Product-Focused Software Process Improvement: 9th International Conference 2008
    Citations: 48

  • Twitter verileri ile duygu analizi
    ES Akgül, C Ertano, D Banu
    Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 22 (2), 106-110 2016
    Citations: 45

  • Unlabelled extra data do not always mean extra performance for semi‐supervised fault prediction
    C Catal, B Diri
    Expert Systems 26 (5), 458-471 2009
    Citations: 42

  • Abstract feature extraction for text classification
    G BİRİCİK, B Diri, AC SÖNMEZ
    Turkish Journal of Electrical Engineering and Computer Sciences 20 (7), 1137 2012
    Citations: 41

  • Software defect prediction using artificial immune recognition system
    C Catal, B Diri
    Proceedings of the 25th conference on IASTED international multi-conference 2007
    Citations: 41