@en.knu.ac.kr
Researcher at Center for ICT & Automobile Convergence (CITAC)
Kyungpook National University, Daegu, South Korea
Dr. Alwin Poulose was born in Manjapra, Kerala, India, in 1992. He received a B.Sc. degree in computer maintenance and electronics from the Union Christian College, Aluva, Kerala, India, in 2012, the M.Sc. degree in electronics from the MES College Marampally, Kerala, India, in 2014, the M. Tech degree in communication systems from Christ University, Bangalore, India in 2017, and the Ph.D. degree in electronics and electrical engineering from Kyungpook National University, Daegu, South Korea in 2021. His research interests include indoor localization, human activity recognition, facial emotion recognition, and human behavior prediction. He is a reviewer of prominent engineering and science international journals and has served as a technical program committee member at several international conferences. He is currently a researcher at the Center for ICT & Automobile Convergence (CITAC), Kyungpook National University, Daegu, South Korea.
2017/08/21 – 2021/08/31: Ph.D. Degree in Electronic and Electrical Engineering, Kyungpook National University, Daegu, South Korea.
2015/06/01 – 2017/05/21: M. Tech Degree in Electronics and Communication Engineering, Christ University, Bangalore, India.
2014/08/05 – 2015/04/25: Trainee for International English Language Testing System (IELTS), Newman's Academy, Angamaly, Kerala, India.
2012/07/02 – 2014/07/31: M.Sc. Degree in Electronics, MES College, Marampally, Kerala, India.
2009/07/01 – 2012/04/30: B.Sc. Degree in Computer Maintenance and Electronics, Union Christian College, Aluva, Kerala, India.
indoor localization, human activity recognition, facial emotion recognition, and human behavior prediction
Anagha Muralidharan, Amrutha Swaminathan, and Alwin Poulose
Elsevier BV
Chittathuru Himala Praharsha and Alwin Poulose
Elsevier BV
Chetan M Badgujar, Alwin Poulose, and Hao Gan
Elsevier BV
Anagha Muralidharan, Amrutha Swaminathan, and Alwin Poulose
Elsevier BV
Chittathuru Himala Praharsha and Alwin Poulose
IEEE
Sign language is the primary means of communication for people with speech and hearing impairments. Sign language includes hand gestures, facial emotions, and lip movements. Learning sign language enables accessible communication with people who have these impairments and promotes inclusivity. Sign language translation can be implemented using machine learning techniques to help them access surrounding facilities on equal terms with others. Indian Sign Language (ISL) has standard signs and expressions for numbers (1–9) and the English alphabet (A–Z). Image classification using convolutional neural networks (CNNs) is effective in computer vision and its applications. The classification of ISL is challenging to implement through deep neural architectures in the real world due to their high computational complexity. This paper proposes a shallow convolutional neural network (SCNN) to extract features and classify static sign-gesture images. To validate the performance of SCNN, we have tested the proposed model on the Kaggle Indian Sign Language (ISL) dataset. Our experimental results show that the proposed SCNN has achieved 98% validation accuracy and a 98% F1 score. We have proposed a composite score metric for performance analysis and hyper-parameter tuning. We have also validated the performance of SCNN on Sobel features of the Kaggle ISL dataset. Data augmentation techniques and their effects on SCNN performance are explored through overfitting scenarios. The results indicate that SCNN performs well compared to existing deep learning models, with lower computational cost and shorter training time.
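The abstract above mentions validating the model on Sobel features. As a rough illustration of what such a feature map looks like, the following is a minimal sketch of a Sobel gradient-magnitude filter in plain Python. The function name `sobel_magnitude`, the list-of-rows image format, and the zeroed border are assumptions made for this example; the paper's actual preprocessing pipeline is not described here.

```python
def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a 2-D grayscale image.

    `image` is a list of equal-length rows of numbers (hypothetical format).
    Border pixels are left at 0.0 for simplicity.
    """
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += gx_kernel[dr][dc] * pixel
                    gy += gy_kernel[dr][dc] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# A vertical step edge produces a strong response along the edge.
step_edge = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_magnitude(step_edge)
```

The resulting edge map could then be fed to a classifier in place of the raw pixels; whether that helps is an empirical question for the dataset at hand.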
Farsana K S and Alwin Poulose
IEEE
This paper explores developing and validating a hybrid classification model for assessing the risk of diabetes among the PIMA Indian population, a community known for its higher susceptibility to diabetes. Leveraging a comprehensive dataset that includes demographic information, genetic markers, lifestyle factors, and historical health records, our proposed hybrid convolutional neural networks (CNNs) employ advanced machine learning algorithms to classify the likelihood of diabetes onset. The proposed hybrid CNN models utilize unique features of machine learning models and a CNN architecture. The research aims to contribute to early detection and prevention strategies, ultimately improving health outcomes within the PIMA Indian community. Our analysis explores traditional machine learning models such as logistic regression, k-nearest neighbors (kNN), Gaussian Naive Bayes (GNB), support vector machine (SVM), decision tree (DT), random forest (RF), AdaBoost, LightGBM, XGBoost, CatBoost, and hybrid CNN models applied to the PIMA Indians Diabetes Database, focusing on classifying diabetes onset. The study provides insights into model performance, classification accuracy, and relevant evaluation metrics, allowing for a comprehensive assessment of the dataset's utility in diabetes prediction tasks. The outcomes of this study are anticipated to contribute significantly to the field of diabetes prediction, specifically tailored to the distinct characteristics of the PIMA Indian population. The developed hybrid CNN models promise to inform targeted interventions, preventive strategies, and personalized healthcare approaches, aiming to mitigate the burden of diabetes within the PIMA Indian community. As we delve into machine learning for health prediction, this research seeks to refine predictive capabilities and empower communities to manage health risks proactively.
Jadov Menaka and Alwin Poulose
IEEE
Customer personality analysis serves as an in-depth exploration of a company's ideal customers, meticulously examining their characteristics, behaviors, and preferences. This paper's analytical approach aims to empower businesses by unraveling the intricacies of customer bases and enabling the customization of products based on diverse customer segments' specific needs and concerns. By leveraging advanced visualization techniques like histograms, bar charts, donut charts, and table charts within Tableau, this study seeks to enhance the understanding of customer attributes, facilitating more targeted and effective marketing strategies. Understanding customers is pivotal for businesses seeking to tailor their products and marketing efforts. Customer personality analysis delves into the wealth of data available, allowing for a nuanced understanding of customer segments. The objective is to shift from a generalized approach to a more targeted one, where resources are allocated efficiently and marketing efforts are tailored to resonate with specific customer groups.
Govindram Neware, John Paul, and Alwin Poulose
IEEE
Effective data visualization is essential for extracting meaningful insights from complex datasets in today's data-driven world. This paper showcases the power of data visualization by transforming the IMDB movies dataset into an interactive and informative dashboard using Tableau, a leading data visualization platform. The paper delves into the global distribution of movie releases, identifies the top-performing movie genres, examines the correlation between budget and box office success, highlights the most financially successful movies, and assesses the linguistic diversity of the dataset. The dashboard presented in the paper conveys insights concisely through maps, treemaps, bar graphs, area charts, scatter plots, and pie charts. This paper also highlights the versatility of Tableau as a tool for creating interactive and visually appealing dashboards that a wide range of audiences can use.
Jovita Biju, Chetan Badgujar, and Alwin Poulose
IEEE
Ensuring access to safe drinking water is a critical global concern with significant implications for public health. This paper investigates the application of the hybrid machine learning model in assessing water potability, offering a comprehensive review of current methodologies and prospects. With water quality assessment a critical component of public health management, integrating machine learning techniques shows promising avenues for improving accuracy, efficiency, and predictive capabilities. This paper synthesizes existing literature on machine learning models in water quality analysis, highlighting various approaches, such as supervised and hybrid machine learning models utilized for water potability assessment. Furthermore, it examines using diverse data sources, including the pH level of the water, water hardness, total dissolved solids in the water, Chloramines concentration, sulfate concentration, electrical conductivity, organic carbon content, Trihalomethanes concentration, and turbidity level to enhance model performance and robustness. Our experiment results on the Water Quality and Potability dataset show that the proposed hybrid machine learning model achieved 68% classification accuracy compared to traditional supervised machine learning techniques. By critically evaluating the strengths and limitations of supervised and hybrid machine learning models, our research contributes to the ongoing discourse on leveraging technology to safeguard water quality and public health, ultimately fostering sustainable water management practices.
Faheena Thesni, Goutham Raj, and Alwin Poulose
IEEE
This paper employs Tableau as a robust analytical tool to examine Netflix’s content distribution, utilizing the Kaggle dataset. Objectives include exploring the country of origin, genre categorizations, and audience ratings through visualization techniques like total counts, geographical mapping, and trend analysis. Our study aims to unveil nuanced insights into Netflix’s global content landscape. Analyzing content distribution by country reveals distinct patterns, emphasizing the prevalence of American productions and illustrating Netflix’s extensive global influence. Tableau captures regional dominance and provides a detailed understanding of content preferences across demographics and locations. For genre preferences, Tableau visually represents content distribution across genres, identifying popular genres and offering insights into potential shifts in audience preferences. This contributes to a holistic understanding of viewer choices and consumption patterns. The analysis extends to audience ratings, unraveling viewer perceptions through Tableau’s visualization capabilities. This sheds light on correlations between genres and ratings, providing insights into audience opinions that may influence strategic content decisions. Through trend analysis, our study provides a thorough overview of how Netflix’s content production has evolved. This reveals emerging trends and highlights strategic decisions made by the streaming platform in response to viewer preferences and industry dynamics.
Swetha Rajeevan, Sourav Ramachandran, and Alwin Poulose
IEEE
In this paper, we present a comprehensive analysis and visualization of COVID-19 data in India using Tableau, emphasizing key metrics such as the total number of cases, the distribution of confirmed cases by age group and gender, and the total number of deaths across states. Additionally, the paper delves into state-wise testing details, the count of testing labs in each state, and the administration of COVID-19 vaccines. The visualizations of this report dynamically show the pandemic’s trajectory, accompanied by interactive graphs showing how many cases there are over time. Further insights are derived from demographic breakdowns, illustrating the proportion of confirmed cases by age group and gender. This analysis not only aids in understanding the impact on various demographic segments but also aids in making targeted public health interventions. A specific focus is placed on mortality, with state-wise graphs detailing the total number of deaths. These visualizations contribute to a nuanced understanding of the geographic distribution of the severity of the pandemic, aiding policymakers in resource allocation and response planning. The report also incorporates a detailed examination of testing strategies, providing state-wise testing details and highlighting the number of testing labs in each region. This information is crucial for assessing the robustness of testing infrastructure and guiding future testing strategies. The integration of Tableau in this analysis ensures that the visualizations are informative and interactive, allowing users to explore the data in real-time and customize views based on their specific interests or inquiries. By presenting this comprehensive analysis, the paper aims to contribute valuable insights to the collective understanding of the COVID-19 pandemic in India and provide a practical resource for decision-makers, researchers, and the general public.
Abhishek J Chandran, Jovita Biju, and Alwin Poulose
IEEE
This paper presents an extensive seismic data analysis using Tableau as a gateway. Earthquakes bring on landslides, tsunamis, and other deadly natural disasters. Therefore, for a more successful analysis, it is essential to comprehend the components associated with earthquakes. We start our data analysis by examining the trends or periodicity of earthquake occurrence on a monthly and annual basis. This aids in determining the precise time at which earthquakes occur. Next, we discuss the various categories that exist for earthquakes. Seismic occurrences are classified into classes according to their magnitude and range, with each class displaying a different degree of destruction. Understanding the frequency at which these various classes occur is essential to having a thorough understanding of seismic patterns. Subsequently, we show the relationship between earthquake frequency, depth, and magnitude. Since these two variables are essential to the research on the subject, we next look for any correlation between magnitude and depth. After that, we examine a worldwide map to show how earthquakes are distributed geographically, offering an in-depth understanding of areas with a high frequency of seismic activity. Developing techniques for efficient risk mitigation requires this exploration. Tableau’s integration into this study ensures dynamic and informative visualizations, allowing users to explore the data in real time and customize views to meet their interests or questions. This comprehensive analysis is presented in a way that advances the global understanding of earthquakes and makes the paper an invaluable resource for researchers, policymakers, and the general public.
A J Aaqilah, K S Farsana, Ephream Jude George, and Alwin Poulose
IEEE
This paper comprehensively explores credit card complaints in the United States spanning 2015 to 2021, utilizing Tableau’s robust visualization capabilities. The primary objective is to unravel insights into multiple dimensions, including trends, geographical patterns, issuer performance, temporal variations, complaint categories, resolutions, and submission channels associated with credit card grievances. Our objective is to identify overarching trends in credit card complaints over the specified period, fostering a nuanced understanding of the dynamic nature of consumer issues. A pivotal aspect of our examination involves geographical analysis, delving into the distribution of complaints across different states to pinpoint regions with higher complaint rates. This geographic scrutiny offers valuable insights into the regional dynamics of credit card-related concerns, providing stakeholders with localized perspectives and facilitating targeted interventions. Temporal patterns are also under our investigative lens as we seek to uncover spikes during specific months or years. This temporal analysis adds a critical dimension to the credit card complaint landscape, offering a nuanced understanding of when particular issues may be more prevalent. By aligning patterns with temporal occurrences, our analysis contributes to a richer comprehension of the cyclical nature of credit card grievances. The study presented in this paper stands to benefit stakeholders, including financial institutions, regulatory bodies, and consumers alike. It guides informed decision-making, empowering stakeholders to navigate the complexities of credit card challenges. As we navigate through the visual narratives, the symbiotic relationship between data insights and actionable decisions becomes apparent, paving the way for a more transparent, responsive, and robust credit card ecosystem.
Jung Hwan Kim, Alwin Poulose, and Dong Seog Han
Institute of Electrical and Electronics Engineers (IEEE)
Facial emotion recognition (FER) detects a user’s facial expression with camera sensors so that a system can respond according to the user’s emotions. FER can be applied to entertainment, security, and traffic safety. An FER system requires a highly accurate and efficient algorithm to classify the driver’s emotions. State-of-the-art architectures for FER, such as visual geometry group (VGG), Inception-V1, ResNet, and Xception, achieve some level of classification performance. Nevertheless, the original VGG architectures suffer from the vanishing gradient problem, limited performance improvement, and expensive computational cost. In this paper, we propose the customized visual geometry group-19 (CVGG-19), which adopts the designs of the VGG, Inception-V1, ResNet, and Xception. Our proposed CVGG-19 architecture outperforms the conventional VGG-19 architecture by 59.29% while reducing the computational cost by 89.5%. Moreover, the CVGG-19 architecture’s F1-score, which represents the real-time classification performance, is superior to those of the Inception-V1, ResNet50, and Xception architectures by 3.86% on average.
Rutika Sansaria, Krishanu Dey Das, and Alwin Poulose
Elsevier BV
Mayank Kumar, Sachin Kumar, Shubhro Chakrabartty, Alwin Poulose, Hala Mostafa, and Bhawna Goyal
MDPI AG
This paper creates an approximate three-dimensional model for normal and cancerous cervical cells using image processing and computer-aided design (CAD) tools. The model is then exposed to low-frequency electric pulses to verify the work with experimental data. The transmembrane potential, pore density, and pore radius evolution are analyzed. This work adds a study of the electrodeformation of cells under an electric field to investigate cytoskeleton integrity. The Maxwell stress tensor is calculated for the dispersive bi-lipid layer plasma membrane. The solid displacement is calculated under electric stress to observe cytoskeleton integrity. After verifying the results with previous experiments, the cells are exposed to a nanosecond pulsed electric field. The nanosecond pulse is applied using a drift-step rectifier diode (DSRD)-based generator circuit. The cells’ transmembrane voltage (TMV), pore density, pore radius evolution, displacement of the membrane under electric stress, and strain energy are calculated. A thermal analysis of the cells under a nanosecond pulse is also carried out to prove that it constitutes a non-thermal process. The results showed differences in normal and cancerous cell responses to electric pulses due to changes in morphology and differences in the cells’ electrical and mechanical properties. This work is a model-driven microdosimetry method that could be used for diagnostic and therapeutic purposes.
Shubhro Chakrabartty, Abdulkarem H. M. Almawgani, Sachin Kumar, Mayank Kumar, Suvojit Acharjee, Alaaddin Al-Shidaifat, Alwin Poulose, and Turki Alsuwian
MDPI AG
Memristive devices have garnered significant attention in the field of electronics over the past few decades. The reason behind this immense interest lies in the ubiquitous nature of memristive dynamics within nanoscale devices, offering the potential for revolutionary applications. These applications span from energy-efficient memories to the development of physical neural networks and neuromorphic computing platforms. In this research article, the angle toppling technique (ATT) was employed to fabricate titanium dioxide (TiO2) nanoparticles with an estimated size of around 10 nm. The nanoparticles were deposited onto a 50 nm SiOx thin film (TF), which was situated on an n-type Si substrate. Subsequently, the samples underwent annealing processes at temperatures of 550 °C and 950 °C. The structural studies of the sample were done by field emission gun-scanning electron microscope (FEG-SEM) (JEOL, JSM-7600F). The as-fabricated sample exhibited noticeable clusters of nanoparticles, which were less prominent in the samples annealed at 550 °C and 950 °C. The element composition revealed the presence of titanium (Ti), oxygen (O2), and silicon (Si) from the substrate within the samples. X-ray diffraction (XRD) analysis revealed that the as-fabricated sample predominantly consisted of the rutile phase. The comparative studies of charge storage and endurance measurements of as-deposited, 550 °C, and 950 °C annealed devices were carried out, where as-grown device showed promising responses towards brain computing applications. Furthermore, the teaching–learning-based optimization (TLBO) technique was used to conduct further comparisons of results.
Jung Hwan Kim, Alwin Poulose, Savina Jassica Colaco, Suresh Neethirajan, and Dong Seog Han
MDPI AG
The advent of artificial intelligence (AI) in animal husbandry, particularly in pig interaction recognition (PIR), offers a transformative approach to enhancing animal welfare, promoting sustainability, and bolstering climate resilience. This innovative methodology not only mitigates labor costs but also significantly reduces stress levels among domestic pigs, thereby diminishing the necessity for constant human intervention. However, the raw PIR datasets often encompass irrelevant porcine features, which pose a challenge for the accurate interpretation and application of these datasets in real-world scenarios. The majority of these datasets are derived from sequential pig imagery captured from video recordings, and an unregulated shuffling of data often leads to an overlap of data samples between training and testing groups, resulting in skewed experimental evaluations. To circumvent these obstacles, we introduced a groundbreaking solution—the Semi-Shuffle-Pig Detector (SSPD) for PIR datasets. This novel approach ensures a less biased experimental output by maintaining the distinctiveness of testing data samples from the training datasets and systematically discarding superfluous information from raw images. Our optimized method significantly enhances the true performance of classification, providing unbiased experimental evaluations. Remarkably, our approach has led to a substantial improvement in the isolation after feeding (IAF) metric by 20.2% and achieved higher accuracy in segregating IAF and paired after feeding (PAF) classifications exceeding 92%. This methodology, therefore, ensures the preservation of pertinent data within the PIR system and eliminates potential biases in experimental evaluations. As a result, it enhances the accuracy and reliability of real-world PIR applications, contributing to improved animal welfare management, elevated food safety standards, and a more sustainable and climate-resilient livestock industry.
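The leakage problem described in the abstract above (neighbouring video frames landing on both sides of a random split) can be illustrated with a short sketch. The code below is not the authors' SSPD method. It is a minimal, assumed variant of the general idea: hold out one contiguous block of an ordered frame sequence and shuffle only inside each partition. The function name `semi_shuffle_split` and its parameters are hypothetical.

```python
import random


def semi_shuffle_split(frames, test_fraction=0.2, seed=0):
    """Split an ordered frame sequence into train/test without interleaving.

    The last `test_fraction` of the sequence is held out as one contiguous
    block, and shuffling happens only inside each partition, so adjacent
    frames straddle the train/test boundary at a single cut point only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = int(len(frames) * (1.0 - test_fraction))
    train, test = list(frames[:cut]), list(frames[cut:])
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Frame indices stand in for images extracted from one video.
train_frames, test_frames = semi_shuffle_split(list(range(100)))
```

With several videos, the same function would be applied per video (or whole videos assigned to one side) before concatenating the partitions.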
Savina Jassica Colaco, Jung Hwan Kim, Alwin Poulose, Suresh Neethirajan, and Dong Seog Han
MDPI AG
Thermal imaging is increasingly used in poultry, swine, and dairy animal husbandry to detect disease and distress. In intensive pig production systems, early detection of health and welfare issues is crucial for timely intervention. Using thermal imaging for pig treatment classification can improve animal welfare and promote sustainable pig production. In this paper, we present a depthwise separable inception subnetwork (DISubNet), a lightweight model for classifying four pig treatments. Based on the modified model architecture, we propose two DISubNet versions: DISubNetV1 and DISubNetV2. Our proposed models are compared to other deep learning models commonly employed for image classification. The thermal dataset captured by a forward-looking infrared (FLIR) camera is used to train these models. The experimental results demonstrate that the proposed models for thermal images of various pig treatments outperform other models. In addition, both proposed models achieve approximately 99.96–99.98% classification accuracy with fewer parameters.
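The "fewer parameters" claim in the abstract above rests on depthwise separable convolutions. As a hedged illustration of why they are lighter, the sketch below compares weight counts for a standard convolution and a depthwise separable one using the usual textbook formulas. Biases are ignored, and the layer sizes in the example are arbitrary assumptions, not values from the DISubNet architecture.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 conv mixes the channels
    return depthwise + pointwise


# Example layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard, separable)
```

For this example layer the separable form needs 8,768 weights against 73,728 for the standard convolution, roughly an 8.4-fold reduction.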
Samuel Kakuba, Alwin Poulose, and Dong Seog Han
Institute of Electrical and Electronics Engineers (IEEE)
Though acoustic speech emotion recognition has been studied for a while, bimodal speech emotion recognition using both acoustic and text modalities has gained momentum, since speech emotion recognition does not involve only the acoustic modality. However, there is less review work on the available bimodal speech emotion recognition (SER) research. The review works available mostly concentrate on the use of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, recent deep learning techniques like attention mechanisms and fusion strategies have shaped bimodal SER research without explicit analysis of their significance when used singly or in combination with the traditional deep learning techniques. We therefore review the recently published literature that involves these deep learning techniques in this paper to ascertain the current trends and challenges of bimodal SER research that have prevented it from being fully deployed in the natural environment for off-the-shelf SER applications. In addition, we carried out experiments to ascertain the optimal combination of acoustic features and the significance of the attention mechanisms and their combination with the traditional deep learning techniques. We propose a multi-technique model called the deep learning-based multi-learning model for emotion recognition (DBMER) that operates with multi-learning capabilities of CNNs, RNNs, and multi-head attention mechanisms. We noted that attention mechanisms play a pivotal role in the performance of bimodal dyadic SER systems. However, the scarcity of publicly available datasets, the difficulty of acquiring bimodal SER data, and cross-corpus and multilingual studies remain open problems in bimodal SER research. Our experiments on the proposed DBMER model showed that though each of the deep learning techniques benefits the task, the results are more accurate and robust when they are used in careful combination with multi-level fusion approaches.
Alwin Poulose
MDPI AG
Visible light communication (VLC) is an emerging research area in wireless communication. The system works the same way as optical fiber-based communication systems. However, the VLC system uses free space as its transmission medium. The invention of the light-emitting diode (LED) significantly updated the technologies used in modern communication systems. In VLC, the LED acts as a transmitter and sends data in the form of light when the receiver is in the line of sight (LOS) condition. The VLC system sends data by blinking the light at high speed, which is challenging to identify by human eyes. The detector receives the flashlight at high speed and decodes the transmitted data. One significant advantage of the VLC system over other communication systems is that it is easy to implement using an LED and a photodiode or phototransistor. The system is economical, compact, inexpensive, small, low power, prevents radio interference, and eliminates the need for broadcast rights and buried cables. In this paper, we investigate the performance of an indoor VLC system using Optisystem simulation software. We simulated an indoor VLC system using LOS and non-line-of-sight (NLOS) propagation models. Our simulation analyzes the LOS propagation model by considering the direct path with a single LED as a transmitter. The NLOS propagation model-based VLC system analyses two scenarios by considering single and dual LEDs as its transmitter. The effect of incident and irradiance angles in an LOS propagation model and an eye diagram of LOS/NLOS models are investigated to identify the signal distortion. We also analyzed the impact of the field of view (FOV) of an NLOS propagation model using a single LED as a transmitter and estimated the bitrate (Rb). Our theoretical results show that the system simulated in this paper achieved bitrates in the range of 2.1208 × 10^7 to 4.2147 × 10^7 bits/s when the FOV changes from 30° to 90°. 
A VLC hardware design is further considered for real-time implementations. Our VLC hardware system achieved an average of 70% data recovery rate in the LOS propagation model and a 40% data recovery rate in the NLOS propagation model. This paper’s analysis shows that our simulated VLC results are technically beneficial in real-world VLC systems.
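The abstract above discusses how incident and irradiance angles and the receiver FOV affect an LOS link. A common textbook way to express that dependence is the Lambertian LOS channel gain; the sketch below assumes that model, which may differ from the propagation model configured in the paper's Optisystem simulation. The function names, the default detector area, the 60° half-power angle, and the unit filter and concentrator gains are all assumptions for illustration.

```python
import math


def lambertian_order(half_power_angle_deg):
    """Lambertian mode number m from the LED half-power semi-angle."""
    return -math.log(2) / math.log(math.cos(math.radians(half_power_angle_deg)))


def los_channel_gain(distance_m, irradiance_deg, incidence_deg, fov_deg,
                     detector_area_m2=1e-4, half_power_angle_deg=60.0,
                     filter_gain=1.0, concentrator_gain=1.0):
    """DC gain of an LOS optical link under the Lambertian emitter model.

    Returns 0.0 when the incidence angle falls outside the receiver FOV.
    All default values are illustrative assumptions.
    """
    if incidence_deg > fov_deg:
        return 0.0
    m = lambertian_order(half_power_angle_deg)
    phi = math.radians(irradiance_deg)   # angle at the transmitter
    psi = math.radians(incidence_deg)    # angle at the receiver
    return ((m + 1) * detector_area_m2 / (2 * math.pi * distance_m ** 2)
            * math.cos(phi) ** m
            * filter_gain * concentrator_gain * math.cos(psi))


# Receiver directly under the LED at 2 m, then offset by 30 degrees.
print(los_channel_gain(2.0, 0.0, 0.0, fov_deg=60.0))
print(los_channel_gain(2.0, 30.0, 30.0, fov_deg=60.0))
```

Received optical power is then the transmitted power multiplied by this gain, so the gain falls with the square of distance and with both angles.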
Raj Mouli Jujjavarapu and Alwin Poulose
MDPI AG
Microprocessor designs have become a revolutionary technology in almost every industry. They made automation and electronic gadgets a reality. As designers try to improve these hardware modules to handle heavy computational loads, the modules have substantially reached a limit in size, power efficiency, and similar avenues. Due to these constraints, many manufacturers and corporate entities are trying many ways to optimize these mini beasts. One such approach is to design microprocessors based on the specified operating system. This approach came to the limelight when many companies launched their microprocessors. In this paper, we will look into one method of using an arithmetic logic unit (ALU) module for internet of things (IoT)-enabled devices. A specific set of operations is added to the classical ALU to help fast computational processes in IoT-specific programs. We integrated a compression module and a fast multiplier based on the Vedic algorithm in the 16-bit ALU module. The designed ALU module is also synthesized under a 32-nm HVT cell library from the Synopsys database to generate an overview of the areal efficiency, logic levels, and layout of the designed module; it also gives us a netlist from this database. The synthesis provides a complete overview of how the module will be manufactured if sent to a foundry.
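The abstract above refers to a fast multiplier "based on the Vedic algorithm" without naming the exact scheme. Hardware Vedic multipliers are commonly built from the Urdhva Tiryagbhyam ("vertically and crosswise") rule, so the sketch below assumes that rule and models it in software: an N-bit product is assembled from four N/2-bit products. This is only a behavioural illustration with a hypothetical function name, not the paper's RTL design.

```python
def vedic_multiply(a, b, bits=16):
    """Multiply two unsigned `bits`-wide integers (bits must be a power of 2).

    Splits each operand into high and low halves and combines the two
    'vertical' products (low*low, high*high) with the 'crosswise' sum
    (high*low + low*high), recursing down to 1-bit AND gates.
    """
    if bits == 1:
        return a & b  # a 1-bit product is a single AND gate
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    vertical_lo = vedic_multiply(a_lo, b_lo, half)
    vertical_hi = vedic_multiply(a_hi, b_hi, half)
    crosswise = (vedic_multiply(a_hi, b_lo, half)
                 + vedic_multiply(a_lo, b_hi, half))
    # Shift each partial product to its bit position and add.
    return vertical_lo + (crosswise << half) + (vertical_hi << (2 * half))


product = vedic_multiply(1234, 5678)  # 16-bit operands, 32-bit result
```

In hardware the four sub-products are computed in parallel and only the final additions are on the critical path, which is where the speed advantage usually attributed to this structure comes from.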
Samuel Kakuba, Alwin Poulose, and Dong Seog Han
Institute of Electrical and Electronics Engineers (IEEE)
The detection and classification of emotional states in speech involves the analysis of audio signals and text transcriptions. The features extracted at different time intervals have complex relationships that must be analyzed to infer the emotions in speech. These relationships can be represented as spatial, temporal and semantic tendency features. In addition to the emotional features that exist in each modality, the text modality carries semantic and grammatical tendencies in the uttered sentences. Deep learning-based models have typically extracted spatial and temporal features sequentially, using convolutional neural networks (CNN) followed by recurrent neural networks (RNN), an arrangement that may be weak at detecting not only the separate spatial-temporal feature representations but also the semantic tendencies in speech. In this paper, we propose a deep learning-based model named the concurrent spatial-temporal and grammatical (CoSTGA) model, which concurrently learns spatial, temporal and semantic representations in the local feature learning block (LFLB); these are fused into a latent vector that forms the input to the global feature learning block (GFLB). We also investigate the performance of multi-level feature fusion compared to single-level fusion using the multi-level transformer encoder model (MLTED), which we also propose in this paper. The proposed CoSTGA model uses multi-level fusion, first at the LFLB level, where similar features (spatial or temporal) are separately extracted from a modality, and second at the GFLB level, where the spatial-temporal features are fused with the semantic tendency features. The proposed CoSTGA model uses a combination of dilated causal convolutions (DCC), bidirectional long short-term memory (BiLSTM), transformer encoders (TE), and multi-head and self-attention mechanisms. Acoustic and lexical features were extracted from the interactive emotional dyadic motion capture (IEMOCAP) dataset. The proposed model achieves weighted and unweighted accuracies of 75.50% and 75.82%, and recall and F1 scores of 75.32% and 75.57%, respectively. These results imply that concurrently learned spatial-temporal features with semantic tendencies learned in a multi-level approach improve the model’s effectiveness and robustness.
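The abstract reports both weighted and unweighted accuracy without defining them. The sketch below assumes the convention common in speech emotion recognition work: weighted accuracy is the overall fraction of correct predictions, and unweighted accuracy is the mean of the per-class recalls. The labels and predictions are made-up toy data, not results from the paper.

```python
from collections import defaultdict

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: every utterance counts equally."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall: every emotion class counts equally."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)

# Toy, imbalanced example (hypothetical labels, not IEMOCAP data).
y_true = ["ang", "ang", "ang", "ang", "hap", "sad"]
y_pred = ["ang", "ang", "ang", "hap", "hap", "ang"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On imbalanced data the two metrics can diverge, which is one reason both are usually reported together.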
Samuel Kakuba, Alwin Poulose, and Dong Seog Han
Institute of Electrical and Electronics Engineers (IEEE)
The success of deep learning in speech emotion recognition has led to its application in resource-constrained devices. It has been applied in human-to-machine interaction applications such as social living assistance, authentication, health monitoring and alertness systems. To ensure a good user experience, robust, accurate and computationally efficient deep learning models are necessary. Recurrent neural networks (RNN) such as long short-term memory (LSTM), gated recurrent units (GRU) and their variants, which operate sequentially, are often used to learn time series sequences of the signal and to analyze long-term dependencies and the contexts of the utterances in the speech signal. However, because of their sequential operation, they converge poorly, train slowly, use a lot of memory and suffer from the vanishing gradient problem. In addition, they do not consider spatial cues that may exist in the speech signal. Therefore, we propose an attention-based multi-learning model (ABMD) that uses residual dilated causal convolution (RDCC) blocks and dilated convolution (DC) layers with multi-head attention. The proposed ABMD model achieves comparable performance while capturing global contextualized long-term dependencies between features in parallel, using a large receptive field with only a small increase in the number of parameters relative to the number of layers, and it considers spatial cues among the speech features. Spectral and voice quality features extracted from the raw speech signals are used as inputs. The proposed ABMD model obtained a recognition accuracy and F1 score of 93.75% and 92.50% on the SAVEE dataset, 85.89% and 85.34% on the RAVDESS dataset, and 95.93% and 95.83% on the EMODB dataset. The model’s robustness, measured by the confusion ratio of the individual discrete emotions, also improved when validated on the same datasets, especially for happiness, which is often confused with emotions that lie in the same dimensional plane.
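The receptive-field claim can be illustrated with a minimal pure-Python sketch of a 1D dilated causal convolution. It is not the ABMD architecture: the residual connections, multi-head attention and learned weights are omitted, and the kernel size, dilation rates and function names are assumptions chosen for illustration.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - i * dilation], treating x as zero before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            src = t - i * dilation
            if src >= 0:  # causal: only current and past samples contribute
                acc += w * x[src]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Assumed configuration: three layers, kernel size 2, dilations doubling per layer.
dilations = [1, 2, 4]
kernel = [1.0, 1.0]

# Push a unit impulse through the stack; the number of non-zero outputs
# shows how far a single input sample reaches.
signal = [1.0] + [0.0] * 11
out = signal
for d in dilations:
    out = dilated_causal_conv1d(out, kernel, d)
reach = sum(1 for v in out if v != 0.0)

print(receptive_field(2, dilations), reach)
```

With these assumed settings, three layers holding six weights in total cover eight samples, whereas the same three layers without dilation (`dilations = [1, 1, 1]`) would cover only four.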
Alwin Poulose, Minjin Baek, and Dong Seog Han
IEEE
Autonomous vehicles are the intelligent vehicles of the future, which are expected to reduce the number of human drivers, improve efficiency, avoid collisions, and become the ideal city vehicles. To achieve this goal, vehicle manufacturers have started to work in this field to harness its potential and solve current challenges. The first challenge is transforming conventional vehicles into autonomous ones that meet users’ expectations. The evolution of conventional vehicles into autonomous vehicles includes the adoption and improvement of different technologies and computer algorithms. Alongside perception, path planning, and control, localization is an essential task affecting an autonomous vehicle’s performance, and its accuracy and efficiency play a crucial role in autonomous driving. In this paper, we describe the implementation of map-based localization using point cloud matching for autonomous vehicles. The Robot Operating System (ROS) and Autoware, an open-source software platform for autonomous vehicles, are used to implement the vehicle localization system presented in this paper. Point cloud maps are generated from 3D lidar points, and a normal distributions transform (NDT) matching algorithm localizes the test vehicle by matching real-time lidar measurements with the pre-built point cloud maps. The experimental results show that the map-based localization system using 3D lidar scans enables real-time localization that is sufficiently accurate and efficient for autonomous driving in a campus environment. The paper covers the methods used for point cloud map generation and vehicle localization as well as the step-by-step procedure for implementing them with a ROS-based system for autonomous driving.
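The idea behind NDT matching can be sketched in a few lines. This is a simplified 2D toy version, not the Autoware or ROS implementation used in the paper: the map is divided into grid cells, each cell is summarized by a Gaussian, and a candidate pose is scored by how well the transformed scan points fit those Gaussians. A coarse grid search over translations stands in for the iterative optimization a real NDT matcher would use, and the map, cell size, regularization term and function names are all assumptions.

```python
import math
from collections import defaultdict

def build_ndt_cells(points, cell_size):
    """Fit a Gaussian (mean, inverse covariance) to the map points in each grid cell."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y))
    cells = {}
    for key, pts in buckets.items():
        if len(pts) < 3:  # too few points for a meaningful covariance
            continue
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        # Small diagonal term keeps the covariance invertible for flat walls.
        sxx = sum((p[0] - mx) ** 2 for p in pts) / n + 1e-3
        syy = sum((p[1] - my) ** 2 for p in pts) / n + 1e-3
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
        det = sxx * syy - sxy * sxy
        cells[key] = ((mx, my), (syy / det, -sxy / det, sxx / det))
    return cells

def ndt_score(scan, pose, cells, cell_size):
    """Sum of Gaussian likelihood terms for scan points transformed by pose (tx, ty, theta)."""
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    score = 0.0
    for x, y in scan:
        wx, wy = c * x - s * y + tx, s * x + c * y + ty
        cell = cells.get((math.floor(wx / cell_size), math.floor(wy / cell_size)))
        if cell is None:
            continue
        (mx, my), (ixx, ixy, iyy) = cell
        dx, dy = wx - mx, wy - my
        score += math.exp(-0.5 * (dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy))
    return score

# Toy map: an L-shaped wall (hypothetical data, not a lidar point cloud).
map_points = [(i / 10, 0.0) for i in range(101)] + [(0.0, i / 10) for i in range(1, 101)]
cells = build_ndt_cells(map_points, 1.0)

# Scan: the same wall as seen from the true vehicle pose (2, 3, 0).
scan = [(x - 2.0, y - 3.0) for x, y in map_points]

candidates = [(tx, ty, 0.0) for tx in (1.5, 2.0, 2.5) for ty in (2.5, 3.0, 3.5)]
best = max(candidates, key=lambda p: ndt_score(scan, p, cells, 1.0))
print(best)
```

Because each cell stores only a mean and a covariance, scoring a pose requires no point-to-point correspondence search, which is part of what makes this family of methods attractive for real-time use.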