Associate Professor at Electrical and Computer Engineering Department
Democritus University of Thrace
Nikolaos Mitianoudis received the Diploma in Electronic and Computer Engineering from the Aristotle University of Thessaloniki, Greece in 1998. He received the MSc in Communications and Signal Processing from Imperial College London, UK in 2000 and the PhD in Audio Source Separation using Independent Component Analysis from Queen Mary, University of London, UK in 2004. Between 2003 and 2009, he was a Research Associate at the Electrical and Electronic Engineering Department at Imperial College London, UK working on the Data Information Fusion-Defense Technology Centre project "Applied Multi-Dimensional Fusion", sponsored by General Dynamics UK and QinetiQ. From 2009 until 2010, he was an Academic Assistant at the International Hellenic University in Thermi, Greece. In 2010, he joined the Electrical and Computer Engineering Department at the Democritus University of Thrace, Greece, where he currently serves as an Associate Professor in Audio and Image Processing.
Scopus Publications
Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, and Nikos Papamarkos
Institute of Electrical and Electronics Engineers (IEEE)
Estimation of depth from two-dimensional images is among the challenging topics in Computer Vision. It is a well-studied but ill-posed problem that has long been the focus of intense research. This paper is an in-depth review of the topic, presenting two aspects: one that considers the mechanisms of human depth perception, and another that covers the various Deep Learning approaches. The methods are presented in a compact and structured way that outlines the topic and categorizes the approaches according to the lines of research followed in the recent decade. Although there has been significant advancement in the topic, it has largely proceeded without any connection to human depth perception and the potential benefits that this area could offer.
Thomas Stogiannopoulos and Nikolaos Mitianoudis
MDPI AG
This study investigates the potential of low-cost infrared cameras for non-contact monitoring of blood pressure (BP) in individuals with fragile health, particularly the elderly. Previous research has shown success in developing non-contact BP monitoring using RGB cameras. In this study, the Eulerian Video Magnification (EVM) technique is employed to enhance minor variations in skin pixel intensity in specific facial regions captured by an infrared camera from the forehead and palm. The primary focus of this study is to explore the possibility of using infrared cameras for non-contact BP monitoring under low-light or night-time conditions. We show that, by employing a series of straightforward signal processing techniques and regression analysis, commendable outcomes can be achieved in our experimental setup; specifically, we surpass the stringent accuracy standards set forth by the British Hypertension Society (BHS) and the Association for the Advancement of Medical Instrumentation (AAMI) protocol.
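The core mechanism described above (temporal band-pass filtering of subtle skin-intensity variations, the heart of EVM-style processing) can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: ROI selection, feature design, and the regression stage are omitted, and the band edges and synthetic signal below are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_ppg(frames, fps, low=0.7, high=4.0):
    """Band-pass the mean intensity of a skin ROI over time, keeping only
    the typical heart-rate band (0.7-4 Hz), in the spirit of EVM-style
    temporal filtering."""
    # frames: (T, H, W) array of grayscale ROI crops (e.g. forehead)
    signal = frames.reshape(frames.shape[0], -1).mean(axis=1)
    signal = signal - signal.mean()
    b, a = butter(2, [low / (fps / 2), high / (fps / 2)], btype="band")
    return filtfilt(b, a, signal)

# synthetic demo: 30 fps, 10 s ROI with a 1.2 Hz "pulse" plus slow drift
fps = 30
t = np.arange(300) / fps
intensity = 100 + 5 * np.sin(2 * np.pi * 0.2 * t) + 0.5 * np.sin(2 * np.pi * 1.2 * t)
frames = intensity[:, None, None] * np.ones((1, 8, 8))
ppg = extract_ppg(frames, fps)
dom = np.fft.rfftfreq(len(ppg), 1 / fps)[np.abs(np.fft.rfft(ppg)).argmax()]
print(dom)  # dominant frequency, expected near 1.2 Hz
```

The slow 0.2 Hz drift is rejected by the band-pass filter, leaving the cardiac-band component as the dominant frequency.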
Nikolaos Detsikas, Nikolaos Mitianoudis, and Nikolaos Papamarkos
Elsevier BV
Grigorios-Aris Cheimariotis and Nikolaos Mitianoudis
MDPI AG
This work describes a methodology for sound event detection in domestic environments. Efficient solutions to this task can support the autonomous living of the elderly. The methodology addresses the “Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE)” 2023, and more specifically Task 4a, “Sound event detection of domestic activities”. This task involves the detection of 10 common events in domestic environments in 10 s sound clips; the events may have arbitrary duration within the clip. The main components of the methodology are data augmentation on the mel-spectrograms that represent the sound clips, feature extraction by passing spectrograms through a frequency-dynamic convolution network with an extra attention module in sequence with each convolution, concatenation of these features with BEATs embeddings, and the use of a BiGRU for sequence modeling. In addition, a mean-teacher model is employed to leverage unlabeled data. This research focuses on the effects of data augmentation techniques, feature extraction models, and self-supervised learning. The main contribution is the proposed feature extraction model, which uses weighted attention on frequency in each convolution, combined in sequence with a local attention module adopted from computer vision. The proposed system features promising and robust performance.
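The mel-spectrogram augmentation step can be illustrated with a SpecAugment-style sketch: random frequency bands and time spans are zeroed out. This is a generic illustration of the technique, not the paper's exact augmentation; the mask counts and sizes are assumptions.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, num_time_masks=2,
                 max_f=8, max_t=20, rng=None):
    """SpecAugment-style masking on a mel-spectrogram (freq_bins x frames):
    zero out random frequency bands and time spans."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = spec.copy()
    n_mels, n_frames = out.shape
    for _ in range(num_freq_masks):
        f = rng.integers(1, max_f + 1)       # band width
        f0 = rng.integers(0, n_mels - f + 1)  # band start
        out[f0:f0 + f, :] = 0.0
    for _ in range(num_time_masks):
        w = rng.integers(1, max_t + 1)        # span length
        t0 = rng.integers(0, n_frames - w + 1)
        out[:, t0:t0 + w] = 0.0
    return out

mel = np.random.default_rng(1).random((64, 313))  # ~10 s clip at a typical hop size
aug = spec_augment(mel)
print(aug.shape, (aug == 0).any())
```

The original spectrogram is left untouched; the augmented copy has the same shape with masked regions set to zero.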
Thomas Stogiannopoulos, Grigorios-Aris Cheimariotis, and Nikolaos Mitianoudis
MDPI AG
This work explores the use of low-cost infrared cameras for monitoring peripheral oxygen saturation (SpO2), a vital sign that is particularly important for individuals with fragile health, such as the elderly. The development of contactless SpO2 monitoring utilizing RGB cameras has already proven successful. This study utilizes the Eulerian Video Magnification (EVM) technique to enhance minor variations in skin pixel intensity in particular facial regions. More specifically, the emphasis of this study is on the utilization of infrared cameras, in order to explore the possibility of contactless SpO2 monitoring under low-light or night-time conditions. A range of machine learning regression methods was studied, including a Generalized Additive Model (GAM) and an Extra Trees Regressor, based on 12 novel features extracted from the amplified photoplethysmography (PPG) signal. Deep learning methods were also explored, including a 3D Convolution Neural Network (CNN) and a Video Vision Transformer (ViViT) architecture applied to the amplified forehead/cheeks video. The SpO2 estimates of the best performing method achieve a low root mean squared error of 1.331 and an R2 score of 0.465, which fall within the acceptable range for these applications.
Thomas Stogiannopoulos, Grigorios-Aris Cheimariotis, and Nikolaos Mitianoudis
IEEE
Peripheral oxygen saturation (SpO2) is an important vital sign to monitor in individuals whose health is fragile, such as the elderly. Contactless SpO2 monitoring using RGB cameras has already been developed with satisfactory results. This work explores whether an acceptable level of performance can be achieved when the lighting conditions are not optimal, particularly during night-time, by processing solely infrared low-cost camera recordings. The Eulerian Video Magnification (EVM) technique was used to enhance the subtle differences in skin pixel intensity in the facial area. Two approaches were explored for performing regression: one using 12 novel features extracted from the amplified photoplethysmography (PPG) signal with Generalized Additive Models, and a second using a 3D Convolution Neural Network (CNN) architecture on the raw amplified forehead video. The root mean square error in the estimated SpO2 levels using both methods is minimal and within the accepted range for these applications.
Pantazis Anagnostou and Nikolaos Mitianoudis
IEEE
In this paper, we explore the problem of automatic detection of dangerous or distracted driving using multi-modal cameras. A deep convolutional model with Gated Recurrent Unit (GRU) layers for classification on the Driver Anomaly Detection (DAD) dataset is proposed. The key features are the limited use of 3D convolutions and the replacement of 2D convolutions with depth-wise separable convolutions, which reduce the computational complexity of the model to a small fraction of previous architectures with a small decrease in AUC performance. In addition, the threshold for binary classification between safe and distracted driving is adaptively estimated through the training data. Finally, an ensemble of all multi-modal inputs yields the final classification with favourable performance.
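The parameter savings from replacing standard convolutions with depth-wise separable ones follow from a simple count: a standard k×k convolution mapping C_in to C_out channels needs k·k·C_in·C_out weights, while the separable version needs only k·k·C_in (depth-wise) plus C_in·C_out (point-wise). A quick arithmetic check:

```python
# Weight counts for a single convolution layer (bias terms omitted).
def conv_params(k, cin, cout):
    """Standard k x k convolution: every output channel sees every input channel."""
    return k * k * cin * cout

def dw_separable_params(k, cin, cout):
    """Depth-wise separable: one k x k filter per input channel,
    then a 1 x 1 point-wise convolution to mix channels."""
    return k * k * cin + cin * cout

std = conv_params(3, 64, 128)
sep = dw_separable_params(3, 64, 128)
print(std, sep, round(std / sep, 1))  # 73728 8768 8.4
```

For a typical 3×3 layer the separable form uses roughly 8× fewer weights, which is the kind of reduction that lets the model run at a small fraction of the cost of earlier architectures.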
Odysseas Bouzos, Ioannis Andreadis, and Nikolaos Mitianoudis
Institute of Electrical and Electronics Engineers (IEEE)
The limited depth of field of optical lenses makes multi-focus image fusion (MFIF) algorithms of vital importance. Lately, Convolutional Neural Networks (CNN) have been widely adopted in MFIF methods; however, their predictions mostly lack structure and are limited by the size of the receptive field. Moreover, since images are corrupted by noise from various sources, the development of MFIF methods robust to image noise is required. A novel noise-robust Convolutional Neural Network-based Conditional Random Field (mf-CNNCRF) model is introduced. The model takes advantage of the powerful input-output mapping of CNN networks and the long-range interactions of CRF models in order to reach structured inference. Rich priors for both unary and smoothness terms are learned by training CNN networks. The α-expansion graph-cut algorithm is used to reach structured inference for MFIF. A new dataset, which includes clean and noisy image pairs, is introduced and used to train the networks of both CRF terms. A low-light MFIF dataset is also developed to demonstrate real-life noise introduced by the camera sensor. Qualitative and quantitative evaluation proves that mf-CNNCRF outperforms state-of-the-art MFIF methods for clean and noisy input images, while being more robust to different noise types without requiring prior knowledge of the noise.
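For context, the unstructured per-pixel decision that CRF inference improves on can be sketched as a classical baseline: pick, for each pixel, the input with the larger local focus measure (here, local Laplacian energy). This is a generic baseline, not mf-CNNCRF; the window size and focus measure are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def naive_multifocus_fuse(img_a, img_b, win=9):
    """Baseline spatial-domain multi-focus fusion: per-pixel selection by
    local Laplacian energy. Structured (CRF) methods replace this
    independent decision with joint inference, removing boundary artifacts."""
    act_a = uniform_filter(laplace(img_a.astype(float)) ** 2, win)
    act_b = uniform_filter(laplace(img_b.astype(float)) ** 2, win)
    mask = act_a >= act_b              # decision map: True -> take img_a
    return np.where(mask, img_a, img_b), mask

# toy example: left half sharp in A, right half sharp in B
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = uniform_filter(sharp, 7)
a = sharp.copy()
a[:, 32:] = blurred[:, 32:]            # A is defocused on the right
b = sharp.copy()
b[:, :32] = blurred[:, :32]            # B is defocused on the left
fused, mask = naive_multifocus_fuse(a, b)
print(mask[:, :16].mean(), mask[:, 48:].mean())  # mostly True / mostly False
```

Away from the focus boundary, the decision map correctly selects the sharp source on each side; the errors such a baseline makes near the boundary are exactly what the structured inference is designed to clean up.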
Odysseas Bouzos, Ioannis Andreadis, and Nikolaos Mitianoudis
MDPI AG
Multi-focus image fusion is of great importance in coping with the limited depth of field of optical lenses. Since input images contain noise, multi-focus image fusion methods that support denoising are important. Transform-domain methods have been applied to image fusion; however, they are likely to produce artifacts. To cope with these issues, we introduce the CRF-Guided fusion method, based on a Conditional Random Field (CRF). A novel Edge Aware Centering method is proposed and employed to extract the low and high frequencies of the input images. The Independent Component Analysis (ICA) transform is applied to the high-frequency components, and a CRF model is created from the low frequency and the transform coefficients. The CRF model is solved efficiently with the α-expansion method. The estimated labels are used to guide the fusion of the low-frequency components and the transform coefficients. Inverse ICA is then applied to the fused transform coefficients. Finally, the fused image is the sum of the fused low-frequency and high-frequency components. CRF-Guided fusion does not introduce artifacts during fusion and supports image denoising during fusion by applying transform-domain coefficient shrinkage. Quantitative and qualitative evaluation demonstrates the superior performance of CRF-Guided fusion compared to state-of-the-art multi-focus image fusion methods.
Andreas Maniatopoulos, Paraskevi Alvanaki, and Nikolaos Mitianoudis
MDPI AG
The recent boom of artificial Neural Networks (NN) has shown that NNs can provide viable solutions to a variety of problems. However, their complexity and the lack of efficient interpretation of NN architectures (commonly considered black-box techniques) have adverse effects on the optimization of each NN architecture. One cannot simply use a generic topology and expect the best performance in every application field, since the network topology is commonly fine-tuned to the problem/dataset in question. In this paper, we introduce a novel method of computationally assessing the complexity of the dataset. The NN is treated as an information channel, and information theory is thus used to estimate the optimal number of neurons for each layer, reducing the memory and computational load while achieving the same, if not greater, accuracy. Experiments using common datasets confirm the theoretical findings, and the derived algorithm appears to improve the performance of the original architecture.
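The information-channel view can be made concrete with its simplest ingredient: the Shannon entropy of the target distribution, a lower bound on the information the network must carry about the labels. This is only an illustrative proxy; the paper's actual estimator is not reproduced here.

```python
import numpy as np

def label_entropy_bits(labels):
    """Shannon entropy of the class distribution, in bits: a crude
    lower bound on the information a layer must convey about the target,
    which in turn bounds how small the layer can be made."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# a balanced 10-class problem (MNIST-like labels) needs log2(10) bits
labels = np.repeat(np.arange(10), 100)
print(label_entropy_bits(labels))  # ~3.32 bits
```

An imbalanced dataset yields lower entropy, suggesting (under this view) that smaller layers can suffice for the same task.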
Dionysios Filippas, Nikolaos Margomenos, Nikolaos Mitianoudis, Chrysostomos Nicopoulos, and Giorgos Dimitrakopoulos
Institute of Electrical and Electronics Engineers (IEEE)
Managing random hardware faults requires the faults to be detected online, thus simplifying recovery. Algorithm-based fault tolerance has been proposed as a low-cost mechanism to check online the result of computations against random hardware failures. In this case, the checksum of the actual result is checked against a predicted checksum computed in parallel by a hardware checker. In this work, we target the design of such checkers for convolution engines that are currently the most critical building block in image processing and computer vision applications. The proposed convolution checksum checker, named ConvGuard, utilizes a newly introduced invariance condition of convolution to predict implicitly the output checksum using only the pixels at the border of the input image. In this way, ConvGuard reduces the power required for accumulating the input pixels without requiring large buffers to hold intermediate checksum results. The design of ConvGuard is generic and can be configured for different output sizes and strides. The experimental results show that ConvGuard utilizes only a small percentage of the area/power of an efficient convolution engine while being significantly smaller and more power efficient than a state-of-the-art checksum checker for various practical cases.
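The checksum-checking idea can be illustrated in simplified 1-D form: for a full convolution, the sum of the output equals the product of the input and kernel sums, so a cheap predictor computed from the inputs alone can be compared against the actual result. This is the underlying algorithm-based fault tolerance principle, not ConvGuard's specific border-pixel invariant for strided 2-D engines.

```python
import numpy as np

def checked_conv(x, k):
    """Algorithm-based fault tolerance for convolution, simplified:
    for 'full' convolution, sum(conv(x, k)) == sum(x) * sum(k), so the
    output checksum is predictable from the inputs alone."""
    y = np.convolve(x, k, mode="full")   # the computation being protected
    predicted = x.sum() * k.sum()        # cheap checksum predictor
    ok = bool(np.isclose(y.sum(), predicted))
    return y, ok

rng = np.random.default_rng(0)
x, k = rng.random(128), rng.random(5)
y, ok = checked_conv(x, k)
print(ok)  # True: checksums agree for a fault-free computation

# inject a "hardware fault" into one output value and re-check
y_faulty = y.copy()
y_faulty[10] += 1.0
print(bool(np.isclose(y_faulty.sum(), x.sum() * k.sum())))  # False: fault detected
```

The predictor costs one pass over the inputs, which is why such checkers add only a small fraction of the engine's area and power.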
Andreas Maniatopoulos, Alexandros Gazis, and Nikolaos Mitianoudis
Inderscience Publishers
Thomas Sgouros, Angelos Bousis, and Nikolaos Mitianoudis
Institute of Electrical and Electronics Engineers (IEEE)
The music source separation problem, where the task at hand is to estimate the audio components present in a mixture, has been at the centre of research activity for a long time. In more recent frameworks, the problem is tackled with deep learning models, which attempt to extract information about each component using Short-Time Fourier Transform (STFT) spectrograms as input. Most approaches assume that one source is present at each time-frequency point, which allows that point of the mixture to be allocated to the desired source. This assumption is strong and is reported not to hold in practice. A further problem arises from the use of the magnitude of the STFT as input to these networks: the Fourier phase information is absent during the reconstruction of the separated sources, and its recovery is neither easily tractable nor computationally efficient. In this paper, we propose a novel Attentive MultiResUNet architecture that uses real-valued Short-Time Discrete Cosine Transform data as input. This avoids the phase-recovery problem, since the appropriate values are estimated within the network itself, rather than by complex estimation or post-processing algorithms. The proposed network features a U-Net type structure with residual skip connections and an attention mechanism that correlates each skip connection with the decoder output at the previous level. The network is used for the first time in source separation and achieves performance favourable compared to the state of the art at a fraction of the computational cost.
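The advantage of a real-valued transform can be shown directly: the DCT of a real frame is itself real and perfectly invertible, so a network operating on DCT frames never has to recover a phase. A minimal sketch (the frame length and orthonormal normalisation are arbitrary illustrative choices):

```python
import numpy as np
from scipy.fft import dct, idct

# A real frame has a real DCT "spectrum"; the orthonormal DCT-II / DCT-III
# pair reconstructs it exactly, with no phase to estimate.
frame = np.random.default_rng(0).standard_normal(1024)
coeffs = dct(frame, norm="ortho")   # real-valued transform coefficients
recon = idct(coeffs, norm="ortho")  # exact inverse
print(np.isreal(coeffs).all(), np.allclose(recon, frame))  # True True
```

By contrast, an STFT magnitude alone is not invertible: the complex phase must be supplied or estimated before resynthesis.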
Andreas Maniatopoulos and Nikolaos Mitianoudis
MDPI AG
In neural networks, a vital component of the learning and inference process is the activation function. There are many different approaches, but only nonlinear activation functions (nonlinearities) allow such networks to compute non-trivial problems using only a small number of nodes. With the emergence of deep learning, the need has arisen for competent activation functions that can enable or expedite learning in deeper layers. In this paper, we propose a novel activation function that combines features of several successful activation functions, achieving 2.53% higher accuracy than the industry-standard ReLU in a variety of test cases.
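The proposed activation function itself is not reproduced here; as a hedged illustration of the design space it draws on, the sketch below contrasts ReLU with SiLU (x·sigmoid(x)), a smooth activation that, like many "combined" designs, keeps a small response and gradient for negative inputs instead of zeroing them.

```python
import numpy as np

def relu(x):
    """Industry-standard baseline: hard zero for negative inputs."""
    return np.maximum(0.0, x)

def silu(x):
    """x * sigmoid(x): smooth, non-monotonic near zero, leaky for x < 0.
    Shown only as an example of the design space, not the paper's function."""
    return x / (1.0 + np.exp(-x))

print(relu(-1.0), silu(-1.0))  # ReLU kills negative inputs; SiLU leaks a little
```

For large positive inputs both behave almost identically; the differences that matter for deep-layer training are concentrated around and below zero.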
K. Gkentsidis, T. Pistola, N. Mitianoudis, and N. V. Boulgouris
IEEE
We explore the capabilities of a new biometric trait, which is based on information extracted through facial motion amplification. Unlike traditional facial biometric traits, the new biometric does not require the visibility of facial features, such as the eyes or nose, that are critical in common facial biometric algorithms. In this paper we propose the formation of a spatiotemporal facial blood flow map, constructed using small motion amplification. Experiments show that the proposed approach provides significant discriminatory capacity over different training and testing days and can be potentially used in situations where traditional facial biometrics may not be applicable.
Andreas Maniatopoulos, Alexandros Gazis, Venetis P. Pallikaras, and Nikolaos Mitianoudis
North Atlantic University Union (NAUN)
Pattern recognition and classification is considered one of the most promising applications of Artificial Neural Networks (ANN). However, despite vast scientific advances in almost every aspect of the underlying technology and mathematics, neural networks still need to be fairly large and complex (i.e., deep) in order to provide robust results. In this article, we propose a novel ANN architecture that combines two fairly small neural networks based on an introduced probability term of correct classification. Additionally, we present a second ANN, used to reclassify the potentially incorrect results by using the most probable error-free results as additional training data with the predicted labels. The proposed method achieves a rapid decrease in the mean square error compared to other large and complex ANN architectures with a similar execution time. Our approach demonstrates increased effectiveness when applied to various databases, related to wine, iris, the Modified National Institute of Standards and Technology (MNIST) database, the Canadian Institute for Advanced Research (Cifar32), and Fashion MNIST classification problems.
Thomas Sgouros and Nikolaos Mitianoudis
Institute of Electrical and Electronics Engineers (IEEE)
The audio source separation problem is a well-known problem that has been addressed using a variety of techniques. A common limitation of these techniques is that the total number of sound sources in the audio mixture must be known beforehand. However, this knowledge is not always available and thus needs to be estimated. Many approaches have attempted to estimate the number of sources in an audio mixture. Several clustering techniques can count the sources in a mixture; nonetheless, there are cases where the directionality of the audio data may lead these techniques to failure. In this article, we propose a generalised Directional Fuzzy C-Means (DFCM) framework that offers a complete multi-dimensional, directional solution to this problem. Our proposal shows remarkably high performance in estimating the correct number of sources in the majority of cases and, in addition, can be used as an effective mechanism to separate the sources. The complete source counting and separation framework can act as a robust, low-complexity simultaneous solution to both problems.
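A hard-clustering simplification of the directional idea can be sketched with spherical k-means, where assignment uses cosine similarity instead of Euclidean distance. This is not DFCM (no fuzzy memberships, and k is fixed rather than estimated); the initialisation and the toy data are illustrative assumptions.

```python
import numpy as np

def spherical_kmeans(X, k=2, iters=20):
    """Hard spherical k-means on unit-norm data: assignment by cosine
    similarity, centres renormalised onto the unit sphere."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # farthest-point initialisation (robust when directions are well separated)
    C = [X[0]]
    for _ in range(k - 1):
        sims = np.max(np.stack([X @ c for c in C]), axis=0)
        C.append(X[np.argmin(sims)])
    C = np.stack(C)
    for _ in range(iters):
        labels = (X @ C.T).argmax(axis=1)       # nearest centre by cosine
        for j in range(k):
            if (labels == j).any():
                m = X[labels == j].sum(axis=0)
                C[j] = m / np.linalg.norm(m)    # renormalised mean direction
    return labels

# two tight bundles of directions on the circle, 90 degrees apart
rng = np.random.default_rng(1)
ang = np.concatenate([rng.normal(0.0, 0.05, 100), rng.normal(np.pi / 2, 0.05, 100)])
X = np.stack([np.cos(ang), np.sin(ang)], axis=1)
labels = spherical_kmeans(X, k=2)
print((labels[:100] == labels[0]).all() and (labels[100:] == labels[100]).all())
```

Because only the direction of each sample matters, scaling a sample by any positive factor leaves its assignment unchanged, which is the property that plain Euclidean clustering lacks on directional mixture data.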
Nikolaos Kilis and Nikolaos Mitianoudis
MDPI AG
This paper presents a novel scheme for speech dereverberation. The core of our method is a two-stage single-channel speech enhancement scheme. In the first stage, a sparser representation of the linear prediction residual of the degraded speech is obtained by applying orthogonal matching pursuit on overcomplete bases trained with the K-SVD algorithm. Our method includes an estimation of the reverberation and mixing time from a recorded hand clap or a simulated room impulse response, which is used to create a time-domain envelope. In the second stage, late reverberation is suppressed by estimating its energy from this envelope and removing it with spectral subtraction. Further speech enhancement minimizes the background noise, based on optimal smoothing and minimum statistics. Experimental results indicate favorable quality compared to two state-of-the-art methods, especially in real reverberant environments with increased reverberation and background noise.
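The suppression mechanism in the second stage, spectral subtraction, can be sketched generically: subtract an estimate of the interfering magnitude spectrum from each frame, floor the result to avoid negative magnitudes, and resynthesise with the noisy phase. The parameters, STFT settings, and the oracle noise estimate below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, noise_est, fs, alpha=2.0, floor=0.02):
    """Generic magnitude spectral subtraction with over-subtraction
    factor `alpha` and a spectral floor; the noisy phase is reused."""
    f, t, X = stft(x, fs=fs, nperseg=512)
    _, _, N = stft(noise_est, fs=fs, nperseg=512)
    noise_mag = np.abs(N).mean(axis=1, keepdims=True)   # average noise spectrum
    mag = np.maximum(np.abs(X) - alpha * noise_mag, floor * np.abs(X))
    _, y = istft(mag * np.exp(1j * np.angle(X)), fs=fs, nperseg=512)
    return y

fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)                       # 1 s test tone
noise = 0.3 * np.random.default_rng(0).standard_normal(fs)
enhanced = spectral_subtract(clean + noise, noise, fs)
# residual error vs. the clean tone should be well below the injected noise power
print(np.mean((enhanced[:fs] - clean) ** 2) < np.mean(noise ** 2))
```

In the paper's setting the subtracted quantity is the late-reverberation energy predicted from the decay envelope, rather than a stationary noise spectrum, but the frame-wise mechanics are the same.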
T. Pistola, A. Papadopoulos, N. Mitianoudis, and N. V. Boulgouris
IEEE
We propose a new biometric trait based on facial motion amplification. The main advantage of the new biometric characteristic is that it does not rely on the visibility of critical facial features, such as nose, mouth, iris, or eyebrows. This makes it effective even when the respective areas are covered. Using the proposed system, facial image sequences are captured using an ordinary video camera and facial blood flow is calculated by means of small motion amplification. The calculated blood flow is captured from limited facial areas and is represented as a template that is suitable for identification purposes. Experiments on a new database show promising performance of the proposed approach, and provide evidence of the discriminatory capacity of the proposed biometric.
Ioannis Merianos and Nikolaos Mitianoudis
MDPI AG
Modern imaging applications have increased the demand for High Dynamic Range (HDR) imaging. Nonetheless, HDR imaging is not easily available with low-cost imaging sensors, since their dynamic range is rather limited. A viable route to HDR imaging with low-cost sensors is the synthesis of multiple-exposure images: the sensor captures the observed scene at multiple exposure settings, and an image-fusion algorithm combines these images to form an image of increased dynamic range. In this work, two image-fusion methods are combined to tackle multiple-exposure fusion. The luminance channel is fused using the Mitianoudis and Stathaki (2008) method, while the color channels are combined using the method proposed by Mertens et al. (2007). The proposed fusion algorithm performs well without the halo artifacts that exist in other state-of-the-art methods. This paper is an extended version of a conference paper, with more analysis of the derived method and more experimental results that confirm its validity.
Odysseas Bouzos, Ioannis Andreadis, and Nikolaos Mitianoudis
Institute of Electrical and Electronics Engineers (IEEE)
In this paper, a novel multi-focus image fusion algorithm based on conditional random field optimization (mf-CRF) is proposed. It is based on a unary term that includes the combined activity estimation of both high and low frequencies of the input images, while a spatially varying smoothness term is introduced in order to align the graph-cut solution with the boundaries of focused and defocused pixels. The proposed model retains the advantages of both spatial-domain and multi-spectral methods and, by solving an energy minimization problem, finds an optimal solution to the multi-focus image fusion problem. Experimental results demonstrate the effectiveness of the proposed method, which outperforms current state-of-the-art multi-focus image fusion algorithms in both qualitative and quantitative comparisons. The successful application of the mf-CRF model to multi-modal image fusion (visible-infrared and medical) is also presented.
Dimitrios Mallis, Thomas Sgouros, and Nikolaos Mitianoudis
Springer Science and Business Media LLC
George E. Tsekouras, Vasilis Trygonis, Andreas Maniatopoulos, Anastasios Rigos, Antonios Chatzipavlis, John Tsimikas, Nikolaos Mitianoudis, and Adonis F. Velegrakis
Elsevier BV
Dimitrios Alexiadis, Nikolaos Mitianoudis, and Tania Stathaki
MDPI AG
In this paper, the problem of joint disparity and motion estimation from stereo image sequences is formulated in the spatiotemporal frequency domain, and a novel steerable filter-based approach is proposed. Our rationale for coupling the two problems is that, according to experimental evidence in the literature, the biological visual mechanisms for depth and motion are not independent of each other. Furthermore, our motivation to study the problem in the frequency domain and search for a filter-based solution rests on the fact that, according to early experimental studies, the biological visual mechanisms can be modelled on frequency-domain or filter-based grounds, for both the perception of depth and the perception of motion. The proposed framework constitutes the first attempt to solve the joint estimation problem through a filter-based solution based on frequency-domain considerations. Thus, the presented ideas provide a new direction of work and could be the basis for further developments. From an algorithmic point of view, we additionally extend state-of-the-art ideas from the disparity estimation literature to handle the joint disparity-motion estimation problem and formulate an algorithm that is evaluated through a number of experimental results. Comparisons with state-of-the-art methods demonstrate the accuracy of the proposed approach.
Dimitrios S. Alexiadis, Nikolaos Mitianoudis, and Tania Stathaki
Elsevier BV