Advancing Hindi Text Summarization: Named Entity Recognition and Content Augmentation Strategies Saumay Gupta, Sukomal Pal ACM Transactions on Asian and Low-Resource Language Information Processing, 2025 We explore advancements in Hindi text summarization, a critical area in natural language processing that aids in managing information overload. Despite a growing corpus of Hindi data, there is a significant gap in practical summarization tools due to intricate linguistic features and limited resources compared to English. Previous works focused on extractive methods, but recent shifts towards abstractive approaches promise more natural and coherent summaries by understanding and paraphrasing content. Our research introduces two novel methodologies, Named Entity Aware-Abstractive Text Summarization (NEA-ATS) and Query-Driven Content Augmentation for Summarization (QDCAS), aimed at enhancing the accuracy and richness of Hindi summaries. NEA-ATS integrates Named Entity Recognition to prioritize crucial information, improving the language model’s attention to critical details. However, it occasionally disrupts the text’s context, leading to only marginal gains in summary quality. Meanwhile, QDCAS addresses extrinsic hallucinations, which are common in state-of-the-art models, by augmenting source documents with relevant content gathered through focused web crawling (a technique for selectively collecting topic-specific web pages), broadening contextual understanding and refining outputs. Empirical results demonstrate the effectiveness of QDCAS, showing marginal improvements in ROUGE and BERTScore over traditional language models. This work advances Hindi text summarization and explores content-rich strategies, potentially expanding to other languages and domains.
A case study on decompounding in Indian language IR Siba Sankar Sahu, Sukomal Pal Natural Language Processing, 2025 Decompounding is an essential preprocessing step in text-processing tasks such as machine translation, speech recognition, and information retrieval (IR). Here, the IR issues are explored from five viewpoints. (A) Does word decompounding impact Indian language IR? If yes, to what extent? (B) Can corpus-based decompounding models be used in Indian language IR? If yes, how? (C) Can machine learning and deep learning-based decompounding models be applied in Indian language IR? If yes, how? (D) Among the different decompounding models (corpus-based, hybrid machine learning-based, and deep learning-based), which provides the best effectiveness in the IR domain? (E) Among the different IR models, which provides the best effectiveness from the IR perspective? This study proposes different corpus-based, hybrid machine learning-based, and deep learning-based decompounding models for three Indian languages (Marathi, Hindi, and Sanskrit). Moreover, we evaluate the effectiveness of each activity from an IR perspective only. It is observed that the different decompounding models improve IR effectiveness. The deep learning-based decompounding models outperform the corpus-based and hybrid machine learning-based models in Indian language IR. Among the different deep learning-based models, the Bi-LSTM-A model performs best in Marathi and improves mean average precision (MAP) by 28.02%. Similarly, the Bi-RNN-A model improves MAP by 18.18% and 6.1% in Hindi and Sanskrit, respectively. Among the retrieval models, the In_expC2 model outperforms others in Marathi and Hindi, and the BB2 model outperforms others in Sanskrit.
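The abstract above reports its gains in mean average precision (MAP). For readers unfamiliar with the metric, the following is a minimal sketch of the standard MAP definition, not the authors' evaluation code. The function names, the document and query identifiers, and the toy scores are all hypothetical. The abstract does not say whether its percentages are relative or absolute, so the `relative_improvement` helper shows the relative reading as an assumption only.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    Sums precision@k at every rank k that holds a relevant document,
    then divides by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP: the mean of per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    total = sum(
        average_precision(ranking, qrels.get(query_id, set()))
        for query_id, ranking in runs.items()
    )
    return total / len(runs)


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage change of `improved` over `baseline` (assumed relative reading)."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical toy data: two queries with made-up rankings and judgments.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
map_score = mean_average_precision(runs, qrels)
```

In this toy run, `q1` has relevant documents at ranks 1 and 3 and `q2` has one at rank 2, so MAP is the mean of the two per-query values.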
Overview of the shared task on code-mixed information retrieval from social media data CEUR Workshop Proceedings, 2025
Advancing Vision and Language in GI Diagnosis: Florence2 for Question Answering and Stable Diffusion for Image Synthesis CEUR Workshop Proceedings, 2025
IReL, IIT(BHU) at MEDIQA-MAGIC 2025: Tackling Multimodal Dermatology with CLIPSeg-Based Segmentation and BERT-Swin Question Answering CEUR Workshop Proceedings, 2025
Arcturus at CheckThat! 2025: DeBERTa-v3-Base for Multilingual Subjectivity Detection in News Articles CEUR Workshop Proceedings, 2025
Findings of the Code-Mixed Information Retrieval from Social Media Data (CMIR) Shared Task at FIRE 2025 CEUR Workshop Proceedings, 2025
SAViOR: Sentiment, Sarcasm, Abuse, and Vulgarity in Online Realities (Memes) CEUR Workshop Proceedings, 2025