Nikolaos Mitianoudis received the Diploma in Electronic and Computer Engineering from the Aristotle University of Thessaloniki, Greece in 1998. He received the MSc in Communications and Signal Processing from Imperial College London, UK in 2000 and the PhD in Audio Source Separation using Independent Component Analysis from Queen Mary, University of London, UK in 2004. Between 2003 and 2009, he was a Research Associate at the Electrical and Electronic Engineering Department at Imperial College London, UK working on the Data Information Fusion-Defense Technology Centre project "Applied Multi-Dimensional Fusion", sponsored by General Dynamics UK and QinetiQ. From 2009 until 2010, he was an Academic Assistant at the International Hellenic University in Thermi, Greece. In 2010, he joined the Electrical and Computer Engineering Department at the Democritus University of Thrace, Greece, where he currently serves as an Associate Professor in Audio and Image Processing.
Scopus Publications: 79
Scholar Citations: 2353
Scholar h-index: 24
Scholar i10-index: 45
Scopus Publications
Interpretable Vision Transformers in Monocular Depth Estimation via SVDA Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos Mathematics, 2026 Monocular depth estimation is a central problem in computer vision with applications in robotics, augmented reality, and autonomous driving, yet the self-attention mechanisms used by modern Transformer architectures remain opaque. In this work, we integrate SVD-Inspired Attention (SVDA) into the Dense Prediction Transformer (DPT), introducing a spectrally structured attention formulation for dense prediction that decouples directional alignment from spectral modulation through a learnable diagonal matrix embedded in normalized query–key interactions. Experiments on KITTI and NYU-v2 show that SVDA preserves competitive predictive performance while enabling intrinsic interpretability: on KITTI, AbsRel improves from 0.058 to 0.056 and δ1 from 0.976 to 0.979, while on NYU-v2, AbsRel improves from 0.133 to 0.124 and δ1 from 0.865 to 0.872. This is achieved with only 0.01% additional parameters, at the cost of a measurable runtime overhead associated with the added normalization and spectral modulation. More importantly, SVDA enables six spectral indicators that quantify entropy, rank, sparsity, alignment, selectivity, and robustness, revealing consistent cross-dataset and depth-wise patterns in how attention organizes during training. These properties make the model easier to inspect and better suited to applications where transparency and reliability are important, such as robotics and autonomous navigation.
On Segment-Aware Monocular Depth Estimation Using Vision Transformers Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos Information Switzerland, 2026 Monocular Depth Estimation (MDE) infers per-pixel scene geometry from a single RGB image. Despite recent progress, global MDE models often blur depth discontinuities at object boundaries and fail to capture object-level structure. Segment-aware depth estimation addresses this limitation by exploiting semantic segmentation to decompose depth prediction into simpler, class-specific subproblems. In this work, we study semantic-aware MDE in a multi-branch design where each semantic class is handled by a lightweight Vision Transformer (ViT) branch that predicts dense depth for its class while suppressing interference from other regions. We further examine fusion strategies that merge the branch outputs into a single prediction: (i) a learnable cross-attention fusion module that predicts depth from the stack of per-class proposals and masks, and (ii) a parameter-free stitched summation that sums mask-gated outputs. The proposed architecture is simple, scalable, end-to-end trainable, and compatible with arbitrary transformer backbones. Experiments on Virtual KITTI 2, where ground-truth depth and semantic labels are available, show that segment-aware modeling produces sharper depth boundaries and improves standard error metrics compared to a single-branch baseline (AbsRel 0.243→0.152; RMSE 11.952→9.101). Finally, we find that the parameter-free summation matches, and in most cases improves upon, the accuracy of learned fusion while adding no computational overhead.
Interpretable Vision Transformers in Image Classification via SVDA Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos IEEE Access, 2026 Vision Transformers (ViTs) have achieved state-of-the-art performance in image classification, yet their attention mechanisms often remain opaque and exhibit dense, non-structured behaviors. In this work, we adapt our previously proposed SVD-Inspired Attention (SVDA) mechanism to the ViT architecture, introducing a geometrically grounded formulation that enhances interpretability, sparsity, and spectral structure. We apply the use of interpretability indicators—originally proposed with SVDA—to monitor attention dynamics during training and assess structural properties of the learned representations. Experimental evaluations on four widely used benchmarks—CIFAR-10, FashionMNIST, CIFAR-100, and ImageNet-100—together with an additional pretrained fine-tuning study in a standard ViT setting show that SVDA preserves competitive classification behavior in our experimental settings while providing descriptive diagnostics of attention structure. In the pretrained setting, we integrate the exact SVDA operator into the late transformer blocks of a standard pretrained ViT and fine-tune on ImageNet-100, providing additional evidence that the proposed mechanism remains viable beyond compact from-scratch training. While the current framework offers descriptive insights rather than prescriptive guidance, our results establish SVDA as a comprehensive and informative tool for analyzing and developing structured attention models in computer vision. This work lays the foundation for future advances in explainable AI, spectral diagnostics, and attention-based model design.
Geometry Meets Attention: Interpretable Transformers via SVD Inspiration Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos IEEE Access, 2025 Self-attention is a cornerstone of modern deep learning, yet its dense dot-product formulation offers limited interpretability and lacks explicit structural constraints. We propose SVD-inspired Attention (SVDA), a novel self-attention mechanism that introduces normalized query/key projections and a learnable diagonal spectral modulation, drawing direct motivation from the structure of Singular Value Decomposition (SVD). This formulation separates directional alignment from spectral emphasis, offering a geometrically grounded and interpretable variant of attention. We formalize SVDA within a standard multi-head Transformer architecture and introduce a suite of structure-aware indicators, such as spectral entropy, effective rank, and selectivity, that quantify interpretability and sparsity in attention dynamics. Our analysis highlights SVDA’s capacity for structured, energy-aware attention without compromising architectural compatibility or expressiveness. This work provides a theoretical foundation and diagnostic framework for structured attention models aimed at interpretability, compression, and semantic transparency.
Towards Explainability in Monocular Depth Estimation Vasileios Arampatzakis, George Pavlidis, Kyriakos Pantoglou, Nikolaos Mitianoudis, Nikos Papamarkos Communications in Computer and Information Science, 2025
Person Identification Using Temporal Analysis of Facial Blood Flow Maria Raia, Thomas Stogiannopoulos, Nikolaos Mitianoudis, Nikolaos V. Boulgouris Electronics Switzerland, 2024 Biometrics play an important role in modern access control and security systems. The need for novel biometrics to complement traditional biometrics has been at the forefront of research. The Facial Blood Flow (FBF) biometric trait, recently proposed by our team, is a spatio-temporal representation of facial blood flow, constructed using motion magnification from facial areas where skin is visible. Due to its design and construction, the FBF does not need information from the eyes, nose, or mouth, and, therefore, it yields a versatile biometric of great potential. In this work, we evaluate the effectiveness of novel temporal partitioning and Fast Fourier Transform-based features that capture the temporal evolution of facial blood flow. These new features, along with a “time-distributed” Convolutional Neural Network-based deep learning architecture, are experimentally shown to increase the performance of FBF-based person identification compared to our previous efforts. This study provides further evidence of FBF’s potential for use in biometric identification.
MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation Nikolaos Detsikas, Nikolaos Mitianoudis, Ioannis Pratikakis Journal of Imaging, 2024 A fundamental task in computer vision is the process of differentiation and identification of different objects or entities in a visual scene using semantic segmentation methods. The advancement of transformer networks has surpassed traditional convolutional neural network (CNN) architectures in terms of segmentation performance. The continuous pursuit of optimal performance, with respect to the popular evaluation metric results, has led to very large architectures that require a significant amount of computational power to operate, making them prohibitive for real-time applications, including autonomous driving. In this paper, we propose a model that leverages a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections working in parallel. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets, maintaining a low-complexity network that can be used in real-time applications.
Monocular Depth Estimation: A Thorough Review Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 Estimation of depth in two-dimensional images is among the challenging topics in Computer Vision. This is a well-studied but also an ill-posed problem, which has long been the focus of intense research. This paper is an in-depth review of the topic, presenting two aspects, one that considers the mechanisms of human depth perception, and another that includes the various Deep Learning approaches. The methods are presented in a compact and structured way that outlines the topic and categorizes the approaches according to the line of research followed in the recent decade. Although there has been significant advancement in the topic, it was without any connection with human depth perception and the potential benefits from this sector.
Non-Contact Blood Pressure Estimation Using Forehead and Palm Infrared Video Thomas Stogiannopoulos, Nikolaos Mitianoudis BioMedInformatics, 2024 This study investigates the potential of low-cost infrared cameras for non-contact monitoring of blood pressure (BP) in individuals with fragile health, particularly the elderly. Previous research has shown success in developing non-contact BP monitoring using RGB cameras. In this study, the Eulerian Video Magnification (EVM) technique is employed to enhance minor variations in skin pixel intensity in specific facial regions captured by an infrared camera from the forehead and palm. The primary focus of this study is to explore the possibility of using infrared cameras for non-contact BP monitoring under low-light or night-time conditions. We have successfully shown that by employing a series of straightforward signal processing techniques and regression analysis, we were able to achieve commendable outcomes in our experimental setup. Specifically, we were able to surpass the stringent accuracy standards set forth by the British Hypertension Society (BHS) and the Association for the Advancement of Medical Instrumentation (AAMI) protocol.
A memristive circular buffer for real-time signal processing Christos Sichonidis, Ioannis Vourkas, Nikolaos Mitianoudis, Georgios Ch. Sirakoulis 2016 5th International Conference on Modern Circuits and Systems Technologies (MOCAST 2016), 2016
Real time hand detection in a complex background Ekaterini Stergiopoulou, Kyriakos Sgouropoulos, Nikos Nikolaou, Nikos Papamarkos, Nikos Mitianoudis Engineering Applications of Artificial Intelligence, 2014
Applied multi-dimensional fusion A. Mahmood, P. M. Tudor, W. Oxford, R. Hansford, J. D. B. Nelson, N. G. Kingsbury, A. Katartzis, M. Petrou, N. Mitianoudis, T. Stathaki, A. Achim, D. Bull, N. Canagarajah, S. Nikolov, A. Loza, N. Cvejic Computer Journal, 2007
A fixed point solution for convolved audio source separation IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 2001
RECENT SCHOLAR PUBLICATIONS
Interpretable Vision Transformers in Image Classification via SVDA V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Access, 2026
Interpretable Vision Transformers in Monocular Depth Estimation via SVDA V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos Mathematics 14 (8), 1272, 2026
When Slots Compete: Slot Merging in Object-Centric Learning C Chatzisavvas, P Rigas, G Ioannakis, V Katsouros, N Mitianoudis arXiv preprint arXiv:2603.11246, 2026
On Segment-Aware Monocular Depth Estimation Using Vision Transformers V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos Information 17 (2), 145, 2026
Geometry meets attention: Interpretable transformers via SVD inspiration V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Access, 2025 Citations: 3
Person Identification Using Temporal Analysis of Facial Blood Flow M Raia, T Stogiannopoulos, N Mitianoudis, NV Boulgouris Electronics 13 (22), 4499, 2024
Automatic processing of dynamic responses via wavelet transform for the development of dynamic equivalent models EP Saroudis, TA Papadopoulos, GA Barzegkar-Ntovom, EO Kontis, ... Sustainable Energy, Grids and Networks 38, 101383, 2024 Citations: 4
MResTNet: a multi-resolution transformer framework with CNN extensions for semantic segmentation N Detsikas, N Mitianoudis, I Pratikakis Journal of Imaging 10 (6), 125, 2024 Citations: 6
A Dilated MultiRes Visual Attention U-Net for historical document image binarization N Detsikas, N Mitianoudis, N Papamarkos Signal Processing: Image Communication 122, 117102, 2024 Citations: 6
Non-Contact Blood Pressure Estimation Using Forehead and Palm Infrared Video T Stogiannopoulos, N Mitianoudis BioMedInformatics 4 (1), 437-453, 2024 Citations: 4
Monocular depth estimation: A thorough review V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (4), 2396-2414, 2023 Citations: 128
Sound event detection in domestic environment using frequency-dynamic convolution and local attention GA Cheimariotis, N Mitianoudis Information 14 (10), 534, 2023 Citations: 5
Historical Document Image Binarization using a lightweight U-Net derivative architecture N Detsikas, N Mitianoudis 2023 18th International Workshop on Cellular Nanoscale Networks and their …, 2023
Contactless Blood Pressure Estimation using infrared video from facial and hand regions T Stogiannopoulos, N Mitianoudis 2023 18th International Workshop on Cellular Nanoscale Networks and their …, 2023
Towards explainability in monocular depth estimation V Arampatzakis, G Pavlidis, K Pantoglou, N Mitianoudis, N Papamarkos Joint European Conference on Machine Learning and Knowledge Discovery in …, 2023 Citations: 3
A lightweight ConvGRU network for Distracted Driving detection P Anagnostou, N Mitianoudis 2023 24th International Conference on Digital Signal Processing (DSP), 1-5, 2023 Citations: 4
A non-contact SpO2 estimation using video magnification and infrared data T Stogiannopoulos, GA Cheimariotis, N Mitianoudis ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 Citations: 9
A Study of Machine Learning Regression Techniques for Non-Contact SpO2 Estimation from Infrared Motion-Magnified Facial Video T Stogiannopoulos, GA Cheimariotis, N Mitianoudis Information 14 (6), 301, 2023 Citations: 29
A convolutional neural network-based conditional random field model for structured multi-focus image fusion robust to noise O Bouzos, I Andreadis, N Mitianoudis IEEE Transactions on Image Processing 32, 2915-2930, 2023 Citations: 37
Technical analysis forecasting and evaluation of stock markets: the probabilistic recovery neural network approach A Maniatopoulos, A Gazis, N Mitianoudis International journal of economics and business research 25 (1), 64-100, 2023 Citations: 6
MOST CITED SCHOLAR PUBLICATIONS
Pixel-based and region-based image fusion schemes using ICA bases N Mitianoudis, T Stathaki Information fusion 8 (2), 131-142, 2007 Citations: 453
Audio source separation of convolutive mixtures N Mitianoudis, ME Davies IEEE transactions on Speech and Audio processing 11 (5), 489-497, 2003 Citations: 188
Simple mixture model for sparse overcomplete ICA M Davies, N Mitianoudis IEE Proceedings-Vision, Image and Signal Processing 151 (1), 35-43, 2004 Citations: 132
Monocular depth estimation: A thorough review V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (4), 2396-2414, 2023 Citations: 128
Learnable leaky ReLU (LeLeLU): An alternative accuracy-optimized activation function A Maniatopoulos, N Mitianoudis Information 12 (12), 513, 2021 Citations: 90
Converting a plant to a battery and wireless sensor with scatter radio and ultra-low cost C Konstantopoulos, E Koutroulis, N Mitianoudis, A Bletsas IEEE Transactions on Instrumentation and Measurement 65 (2), 388-398, 2015 Citations: 87
Document image binarization using local features and Gaussian mixture modeling N Mitianoudis, N Papamarkos Image and Vision Computing 38, 33-51, 2015 Citations: 71
Optimal contrast correction for ICA-based fusion of multimodal images N Mitianoudis, T Stathaki IEEE sensors journal 8 (12), 2016-2026, 2008 Citations: 68
Spatial kernel K-harmonic means clustering for multi-spectral image segmentation Q Li, N Mitianoudis, T Stathaki IET Image Processing 1 (2), 156-167, 2007 Citations: 57
Audio source separation: Solutions and problems N Mitianoudis, ME Davies International Journal of Adaptive Control and Signal Processing 18 (3), 299-314, 2004 Citations: 51
Overcomplete source separation using Laplacian mixture models N Mitianoudis, T Stathaki IEEE Signal Processing Letters 12 (4), 277-280, 2005 Citations: 48
Batch and online underdetermined source separation using laplacian mixture models N Mitianoudis, T Stathaki IEEE transactions on audio, speech, and language processing 15 (6), 1818-1832, 2007 Citations: 45
Conditional random field model for robust multi-focus image fusion O Bouzos, I Andreadis, N Mitianoudis IEEE Transactions on Image Processing 28 (11), 5636-5648, 2019 Citations: 44
New fixed-point ica algorithms for convolved mixtures N Mitianoudis, M Davies Proc. 3rd International Workshop on Independent Component Analysis and Blind …, 2001 Citations: 43
Adaptive image fusion using ICA bases N Mitianoudis, T Stathaki 2006 IEEE International Conference on Acoustics Speech and Signal Processing …, 2006 Citations: 42
MASTERS: A virtual lab on multimedia systems for telecommunications, medical, and remote sensing applications DS Alexiadis, N Mitianoudis IEEE Transactions on Education 56 (2), 227-234, 2012 Citations: 39
A convolutional neural network-based conditional random field model for structured multi-focus image fusion robust to noise O Bouzos, I Andreadis, N Mitianoudis IEEE Transactions on Image Processing 32, 2915-2930, 2023 Citations: 37
Permutation alignment for frequency domain ICA using subspace beamforming methods N Mitianoudis, M Davies International Conference on Independent Component Analysis and Signal …, 2004 Citations: 36
Audio source separation using independent component analysis N Mitianoudis University of London, 2004 Citations: 33
Low-cost online convolution checksum checker D Filippas, N Margomenos, N Mitianoudis, C Nicopoulos, ... IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30 (2), 201-212, 2021 Citations: 30