Masoumeh received her Ph.D. in computer science from Jamia Hamdard University, New Delhi, India. She has been a researcher at East China Normal University and Shanghai Jiao Tong University, Shanghai, China. She also worked as a visiting researcher at ETS in Montreal, conducting research on deep learning models specialized for image geo-localization. Her research focuses mainly on machine learning, computer vision, and image processing.
RESEARCH, TEACHING, or OTHER INTERESTS
Artificial Intelligence, Computer Vision and Pattern Recognition, Computers in Earth Sciences
Scopus Publications: 66
Scholar Citations: 4318
Scholar h-index: 30
Scholar i10-index: 54
Scopus Publications
BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning Masoumeh Zareapoor, Pourya Shamsolmoali, Yue Lu Proceedings of the AAAI Conference on Artificial Intelligence, 2025 Achieving robust performance in vision-language tasks requires strong multimodal alignment, where textual and visual data interact seamlessly. Existing frameworks often combine contrastive learning with image captioning to unify visual and textual representations. However, reliance on global representations and unidirectional information flow from images to text limits their ability to reconstruct visual content accurately from textual descriptions. To address this limitation, we propose BiMAC, a novel framework that enables bidirectional interactions between images and text at both global and local levels. BiMAC employs advanced components to simultaneously reconstruct visual content from textual cues and generate textual descriptions guided by visual features. By integrating a text-region alignment mechanism, BiMAC identifies and selects relevant image patches for precise cross-modal interaction, reducing information noise and enhancing mapping accuracy. BiMAC achieves state-of-the-art performance across diverse vision-language tasks, including image-text retrieval, captioning, and classification.
ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer Yifan Xu, Pourya Shamsolmoali, Masoume Zareapoor, Jie Yang IEEE Transactions on Artificial Intelligence, 2025 Visual place recognition (VPR) is a highly challenging task that has a wide range of applications, including robot navigation and self-driving vehicles. VPR is a difficult task due to duplicate regions and insufficient attention to small objects in complex scenes, resulting in recognition deviations. In this article, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects. Different from existing methods that rely on convolutional neural networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called clustering-based weighted transformer network (CWTNet). CWTNet uses the power of clustering-based weighted feature maps and integrates global dependencies to effectively address visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer, which significantly reduces the number of parameters and enhances model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy focuses on extracting representative and diverse features from scale-wise image patches rather than from entire images. This approach is essential for capturing a broader range of information required for robust VPR. Extensive experiments on four VPR datasets show our model's superior performance compared to existing models while being less complex.
ShapeMorph: 3D Shape Completion via Blockwise Discrete Diffusion Jiahui Li, Pourya Shamsolmoali, Yue Lu, Masoumeh Zareapoor Proceedings of the 2025 IEEE Winter Conference on Applications of Computer Vision (WACV 2025), 2025 We introduce ShapeMorph, a diffusion-based method specifically designed for generating precise and diverse 3D shape completions. By integrating an irregular discrete representation with a novel blockwise discrete diffusion model, ShapeMorph can produce multiple, high-quality shape completions while maintaining fidelity to the input. In particular, each 3D shape is encoded into a compact sequence of irregularly distributed discrete variables, ensuring an accurate capture of the object's topological details. We then propose a blockwise discrete diffusion model to precisely learn the shape completion distribution based on various incompleteness. We also introduce a Flow transformer into our diffusion process, serving as a denoising network, to enhance the modeling adaptability and flexibility. ShapeMorph addresses common challenges in existing methods, such as poor completion, limited diversity, and misalignment with the input. Results show ShapeMorph outperforms state-of-the-art methods and effectively processes a variety of input types and levels of incompleteness.
Hybrid Gromov-Wasserstein Embedding for Capsule Learning Pourya Shamsolmoali, Masoumeh Zareapoor, Swagatam Das, Eric Granger, Salvador García IEEE Transactions on Neural Networks and Learning Systems, 2025 Capsule networks (CapsNets) aim to parse images into a hierarchy of objects, parts, and their relationships using a two-step process involving part-whole transformation and hierarchical component routing. However, this hierarchical relationship modeling is computationally expensive, which has limited the wider use of CapsNet despite its potential advantages. The current state of CapsNet models primarily focuses on comparing their performance with capsule baselines, falling short of achieving the same level of proficiency as deep convolutional neural network (CNN) variants in intricate tasks. To address this limitation, we present an efficient approach for learning capsules that surpasses canonical baseline models and even demonstrates superior performance compared with high-performing convolution models. Our contribution can be outlined in two aspects: first, we introduce a group of subcapsules onto which an input vector is projected. Subsequently, we present the hybrid Gromov-Wasserstein (HGW) framework, which initially quantifies the dissimilarity between the input and the components modeled by the subcapsules, followed by determining their alignment degree through optimal transport (OT). This innovative mechanism capitalizes on new insights into defining alignment between the input and subcapsules, based on the similarity of their respective component distributions. This approach enhances CapsNets' capacity to learn from intricate, high-dimensional data while retaining their interpretability and hierarchical structure. Our proposed model offers two distinct advantages: 1) its lightweight nature facilitates the application of capsules to more intricate vision tasks, including object detection; and 2) it outperforms baseline approaches in these demanding tasks. Our empirical findings illustrate that HGW capsules (HGWCapsules) exhibit enhanced robustness against affine transformations, scale effectively to larger datasets, and surpass CNN and CapsNet models across various vision tasks.
From Missing Pieces to Masterpieces: Image Completion With Context-Adaptive Diffusion Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Michael Felsberg, Dacheng Tao, Xuelong Li IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025 Image completion is a challenging task, particularly when ensuring that generated content seamlessly integrates with existing parts of an image. While recent diffusion models have shown promise, they often struggle with maintaining coherence between known and unknown (missing) regions. This issue arises from the lack of explicit spatial and semantic alignment during the diffusion process, resulting in content that does not smoothly integrate with the original image. Additionally, diffusion models typically rely on global learned distributions rather than localized features, leading to inconsistencies between the generated and existing image parts. In this work, we propose ConFill, a novel framework that introduces a Context-Adaptive Discrepancy (CAD) model to ensure that intermediate distributions of known and unknown regions are closely aligned throughout the diffusion process. By incorporating CAD, our model progressively reduces discrepancies between generated and original images at each diffusion step, leading to contextually aligned completion. Moreover, ConFill uses a new Dynamic Sampling mechanism that adaptively increases the sampling rate in regions with high reconstruction complexity. This approach enables precise adjustments, enhancing detail and integration in restored areas. Extensive experiments demonstrate that ConFill outperforms current methods, setting a new benchmark in image completion.
Fractional Correspondence Framework in Detection Transformer Masoumeh Zareapoor, Pourya Shamsolmoali, Huiyu Zhou, Yue Lu, Salvador García MM 2024: Proceedings of the 32nd ACM International Conference on Multimedia, 2024 The Detection Transformer (DETR), by incorporating the Hungarian algorithm, has significantly simplified the matching process in object detection tasks. This algorithm facilitates optimal one-to-one matching of predicted bounding boxes to ground-truth annotations during training. While effective, this strict matching process does not inherently account for the varying densities and distributions of objects, leading to suboptimal correspondences such as failing to handle multiple detections of the same object or missing small objects. To address this, we propose the Regularized Transport Plan (RTP). RTP introduces a flexible matching strategy that captures the cost of aligning predictions with ground truths to find the most accurate correspondences between these sets. By utilizing the differentiable Sinkhorn algorithm, RTP allows for soft, fractional matching rather than strict one-to-one assignments. This approach enhances the model’s capability to manage varying object densities and distributions effectively. Our extensive evaluations on the MS-COCO and VOC benchmarks demonstrate the effectiveness of our approach: RTP-DETR surpasses Deform-DETR and the recently introduced DINO-DETR, achieving absolute gains in mAP of +3.8% and +1.7%, respectively.
SeTformer Is What You Need for Vision and Language Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, Michael Felsberg Proceedings of the AAAI Conference on Artificial Intelligence, 2024 The dot product self-attention (DPSA) is a fundamental component of transformers. However, scaling them to long sequences, like documents or high-resolution images, becomes prohibitively expensive due to the quadratic time and memory complexities arising from the softmax operation. Kernel methods are employed to simplify computations by approximating softmax but often lead to performance drops compared to softmax attention. We propose SeTformer, a novel transformer where DPSA is purely replaced by Self-optimal Transport (SeT) for achieving better performance and computational efficiency. SeT is based on two essential softmax properties: maintaining a non-negative attention matrix and using a nonlinear reweighting mechanism to emphasize important tokens in input sequences. By introducing a kernel cost function for optimal transport, SeTformer effectively satisfies these properties. In particular, with small and base-sized models, SeTformer achieves impressive top-1 accuracies of 84.7% and 86.2% on ImageNet-1K. In object detection, SeTformer-base outperforms the FocalNet counterpart by +2.2 mAP, using 38% fewer parameters and 29% fewer FLOPs. In semantic segmentation, our base-size model surpasses NAT by +3.5 mIoU with 33% fewer parameters. SeTformer also achieves state-of-the-art results in language modeling on the GLUE benchmark. These findings highlight SeTformer's applicability for vision and language tasks.
A Hybrid Model for Container-code Detection Cai Sun, Kuikun Liu, Haoyuan Chi, Mesoume Zareapoor Proceedings of the 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI 2020), 2020
Deep supervised auto-encoder hashing for image retrieval Sanli Tang, Haoyuan Chi, Jie Yang, Xiaolin Huang, Masoumeh Zareapoor Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2018
ShapeMorph: 3D Shape Completion via Blockwise Discrete Diffusion. J Li, P Shamsolmoali, Y Lu, M Zareapoor. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025. Citations: 1
ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer. Y Xu, P Shamsolmoali, M Zareapoor, J Yang. IEEE Transactions on Artificial Intelligence, 2024. Citations: 4
Learning Region-Word Alignment with Attentive Masking for Open-Vocabulary Object Detection. M Zareapoor, P Shamsolmoali, Y Lu. NeurIPS 2024 Workshop on Open-World Agents, 2024. Citations: 3
Efficient Routing in Sparse Mixture-of-Experts. M Zareapoor, P Shamsolmoali, F Vesaghati. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8, 2024. Citations: 4
Distance-based Weighted Transformer Network for image completion. P Shamsolmoali, M Zareapoor, H Zhou, X Li, Y Lu. Pattern Recognition 147, 110120, 2024. Citations: 14
SeTformer is What You Need for Vision and Language. P Shamsolmoali, M Zareapoor, E Granger, M Felsberg. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4713-4721, 2024. Citations: 12
Hybrid Gromov–Wasserstein Embedding for Capsule Learning. P Shamsolmoali, M Zareapoor, S Das, E Granger, S Garcia. IEEE Transactions on Neural Networks and Learning Systems, 2024. Citations: 4
Fractional correspondence framework in detection transformer. M Zareapoor, P Shamsolmoali, H Zhou, Y Lu, S García. ACM Multimedia, 5498-5506, 2024. Citations: 3
Rethinking Fast Adversarial Training: A Splitting Technique to Overcome Catastrophic Overfitting. M Zareapoor, P Shamsolmoali. European Conference on Computer Vision, 34-51, 2024. Citations: 3
Training Mixture-of-Experts: A Focus on Expert-Token Matching. F Vesaghati, M Zareapoor. Tiny Papers Track at ICLR 2024, 2024. Citations: 4
Image completion via dual-path cooperative filtering. P Shamsolmoali, M Zareapoor, E Granger. ICASSP 2023-2023 IEEE international conference on acoustics, speech and …, 2023. Citations: 4
What influences news learning and sharing on mobile platforms? An analysis of multi-level informational factors. J Wang, M Zareapoor, YC Chen, P Shamsolmoali, J Xie. Library Hi Tech 41 (5), 1395-1419, 2023. Citations: 26
TransInpaint: Transformer-based Image Inpainting with Context Adaptation. P Shamsolmoali, M Zareapoor, E Granger. Proceedings of the IEEE/CVF International Conference on Computer Vision, 849-858, 2023. Citations: 45
VTAE: Variational Transformer Autoencoder with Manifolds Learning. P Shamsolmoali, M Zareapoor, H Zhou, D Tao, X Li. IEEE Transactions on Image Processing 32, 4486-4500, 2023. Citations: 23
Self-organized design of virtual reality simulator for identification and optimization of healthcare software components. AK Srivastava, S Kumar, M Zareapoor. Journal of Ambient Intelligence and Humanized Computing, 1-15, 2023. Citations: 16
Distance Weighted Trans Network for Image Completion. P Shamsolmoali, M Zareapoor, H Zhou, X Li, Y Lu. arXiv e-prints, arXiv:2310.07440, 2023.
Entropy Transformer Networks: A Learning Approach via Tangent Bundle Data Manifold. P Shamsolmoali, M Zareapoor. 2023 International Joint Conference on Neural Networks (IJCNN), 1-8, 2023. Citations: 2
GEN: Generative equivariant networks for diverse image-to-image translation. P Shamsolmoali, M Zareapoor, S Das, S Garcia, E Granger, J Yang. IEEE Transactions on Cybernetics 53 (2), 874-886, 2023. Citations: 21
Salient skin lesion segmentation via dilated scale-wise feature fusion network. P Shamsolmoali, M Zareapoor, J Yang, E Granger, H Zhou. 2022 26th International Conference on Pattern Recognition (ICPR), 4219-4225, 2022. Citations: 5
MOST CITED SCHOLAR PUBLICATIONS
Application of credit card fraud detection: Based on bagging ensemble classifier. M Zareapoor, S Pourya. Procedia Computer Science 48, 679-685, 2015. Citations: 552
Hybrid Deep Neural Networks for Face Emotion Recognition. N Jain, S Kumar, A Kumar, P Shamsolmoali, M Zareapoor. Pattern Recognition Letters, 2019. Citations: 383
Rotation Equivariant Feature Image Pyramid Network for Object Detection in Optical Remote Sensing Imagery. P Shamsolmoali, M Zareapoor, J Chanussot, H Zhou, J Yang. IEEE Transactions on Geoscience and Remote Sensing 60, 2021. Citations: 266
Fraudminer: A novel credit card fraud detection model based on frequent itemset mining. KR Seeja, M Zareapoor. The Scientific World Journal 2014 (1), 252797, 2014. Citations: 243
Image synthesis with adversarial networks: A comprehensive survey and case studies. P Shamsolmoali, M Zareapoor, E Granger, H Zhou, R Wang, ME Celebi, ... Information Fusion 72, 126-146, 2021. Citations: 211
A novel deep structure u-net for sea-land segmentation in remote sensing images. P Shamsolmoali, M Zareapoor, R Wang, H Zhou, J Yang. IEEE Journal of Selected Topics in Applied Earth Observations and Remote …, 2019. Citations: 183
Analysis on Credit Card Fraud Detection Techniques: Based on Certain Design Criteria. M Zareapoor, KR Seeja, MA Alam. Foundation of Computer Science 52 (3), 2013. Citations: 180
Oversampling Adversarial Network for Class-Imbalanced Fault Diagnosis. M Zareapoor, P Shamsolmoali, J Yang. Mechanical Systems and Signal Processing, 2020. Citations: 167
Road Segmentation for Remote Sensing Images using Adversarial Spatial Pyramid Networks. P Shamsolmoali, M Zareapoor, H Zhou, R Wang, J Yang. IEEE Transactions on Geoscience and Remote Sensing, 2020. Citations: 162
Feature extraction or feature selection for text classification: A case study on phishing email detection. M Zareapoor, KR Seeja. International Journal of Information Engineering and Electronic Business 7 …, 2015. Citations: 157
Deep Learning based Small Surface Defect Detection via Exaggerated Local Variation-based Generative Adversarial Network. J Lian, W Jia, M Zareapoor, Y Zheng, R Luo, Jain. IEEE Transactions on Industrial Informatics, 2019. Citations: 150
Kernelized support vector machine with deep learning: an efficient approach for extreme multiclass dataset. M Zareapoor, P Shamsolmoali, DK Jain, H Wang, J Yang. Pattern Recognition Letters, 2017. Citations: 110
Multi-scale convolutional neural network for multi-focus image fusion. HT Mustafa, J Yang, M Zareapoor. Image and Vision Computing, 2019. Citations: 108
Pattern Recognit. AK Jain, S Kumar, A Kumar, P Shamsolmoali, M Zareapoor. Lett 31, 651-666, 2019. Citations: 104
Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks. P Shamsolmoali, M Zareapoor, L Shen, AH Sadka, J Yang. arXiv preprint arXiv:2004.02182, 2020. Citations: 93
G-GANISR: Gradual generative adversarial network for image super resolution. P Shamsolmoali, M Zareapoor, R Wang, DK Jain, J Yang. Neurocomputing 366, 140-153, 2019. Citations: 85
Statistical-based filtering system against DDOS attacks in cloud computing. P Shamsolmoali, M Zareapoor. 2014 8th International Conference on Communications and Informatics, 1234-1239, 2014. Citations: 68
Deep convolution network for surveillance records super-resolution. P Shamsolmoali, M Zareapoor, DK Jain, VK Jain, J Yang. Multimedia Tools and Applications, 1-15, 2018. Citations: 67
Multipatch feature pyramid network for weakly supervised object detection in optical remote sensing images. P Shamsolmoali, J Chanussot, M Zareapoor, H Zhou, J Yang. IEEE Transactions on Geoscience and Remote Sensing 60, 1-13, 2021. Citations: 57
Image super resolution by dilated dense progressive network. P Shamsolmoali, M Zareapoor, J Zhang, J Yang. Image and Vision Computing 88, 9-18, 2019. Citations: 54
Publications
M Zareapoor, P Shamsolmoali, F Vesaghati, An Efficient Sparse Mixture of Experts. International Joint Conference on Neural Networks (IJCNN), 2024.
M. Zareapoor, P. Shamsolmoali, Y. Lu, E. Granger, J. Yang, Mapping the Invisible: Object Detection in Remote Sensing Imagery via Cost-Regularized Optimal Transport, ISPRS Journal of Photogrammetry and Remote Sensing, 2024.