@iiita.ac.in
PhD scholar in Department of Information Technology
Indian Institute of Information Technology Allahabad
I worked on dialogue systems as a research scholar in the Human-Computer Interaction department of the Indian Institute of Information Technology Allahabad (IIITA), India, under Prof. U.S. Tiwary, and I am also a member of the SILP (Speech, Image and Language Processing) Lab.
My work focuses on natural language understanding and state tracking in dialogue. I am keen to work in artificial intelligence, machine learning, applied mathematics, and statistics.
I have been working as an AI Scientist at Uniphore Private Limited since Sept 2022.
BTech in the CSE department, Guru Ghasidas Vishwavidyalaya (Central University)
MTech-PhD in the IT department, Indian Institute of Information Technology Allahabad
Human-Computer Interaction, Artificial Intelligence, Computer Engineering, Computer Science
Scopus Publications
Shrikant Malviya, Rohit Mishra, Santosh Kumar Barnwal, and Uma Shanker Tiwary
Springer Science and Business Media LLC
Rohit Mishra, Shrikant Malviya, Sumit Singh, Varsha Singh, and Uma Shanker Tiwary
Springer Science and Business Media LLC
Rohit Mishra, Shrikant Malviya, Rudra Chandra Ghosh, and Uma Shanker Tiwary
IOS Press
Imprecision and uncertainty are the fabric that makes life interesting. For decades, human beings have developed strategies to cope with uncertainties and to automate them. In personnel selection for the I.T. field, selectors often find it very difficult to choose candidates from a set of resumes listing similar skills. The selection task therefore becomes a fuzzy decision-making problem under uncertainty. A combination of fuzzy clustering and Interval Type-2 fuzzy sets (IT2FS) is proposed for such scenarios. An experiment is conducted on a dataset of fifteen hundred resumes for a particular job description. First, Fuzzy C-means clustering (FCM) is applied for selective clustering, while decision-making under uncertainty is carried out through IT2FS. The candidates in the selected cluster are given a score for ranking according to the skillset criteria. The final decision for shortlisting the resumes is made through IT2FS. The model achieves an average accuracy of 88.2% and an F1-score of 0.76, compared with an F1-score of 0.72 for the K-means + IT2FS model. Thus, the proposed model performs better at decision-making under uncertainty.
Dipam Goswami, Shrikant Malviya, Rohit Mishra, and Uma Shanker Tiwary
IEEE
This paper presents an analysis of non-contextual word embeddings trained on the AI4Bharat-IndicNLP corpus, which contains 2.7 billion words covering 10 Indian languages. We share the pre-trained embeddings for research and development in Indic languages. These embeddings are evaluated on several tasks, including word similarity, word analogy, and classification on multiple datasets. The analysis of word embeddings is expected to give researchers a better understanding of the Indic languages. We show that Word2Vec skip-gram and FastText skip-gram embeddings are the best-performing models for NLP tasks on Indic languages. All the embeddings are made freely available.
Shrikant Malviya, Rohit Mishra, Santosh Kumar Barnwal, and Uma Shanker Tiwary
Institute of Electrical and Electronics Engineers (IEEE)
Due to the rapid increase in the development of task-oriented dialogue systems, the need for labelled dialogue corpora has become unavoidable. For the Hindi language, no such dialogue corpus is available yet. As a first attempt, we release a Hindi Dialogue Restaurant Search (HDRS) corpus and compare various state-of-the-art dialogue state tracking (DST) models on it. The corpus consists of 1.4k human-to-human typed dialogues collected using the Wizard-of-Oz paradigm. The paper starts with a brief description of the corpus, giving details of its features, the collection process, and a statistical analysis; then the performance of baseline NLU and DST models is investigated. Further, we experimented with two categories of state-of-the-art belief state trackers: (1) non-contextual pre-trained word embedding based DST models; (2) contextual pre-trained BERT based DST models. All belief trackers follow a three-layered generic architecture. The category-1 models use a static domain ontology, while the category-2 models can handle a dynamic ontology. The DST models are compared on joint-goal and turn-request accuracy. Global encoder and Slot-ATtentive decoders (GSAT) outperforms all the other models with 83.25% joint-goal accuracy, followed by SUMBT.
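For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch using the definitions commonly found in the DST literature: a turn counts as correct for joint-goal accuracy only if every slot in the predicted belief state matches the gold state, and for turn-request accuracy only if the set of requested slots matches. The function names, the slot names (`food`, `area`, `price`) and the toy values are hypothetical and are not taken from the HDRS corpus or the paper's evaluation code.

```python
from typing import Dict, List, Set


def joint_goal_accuracy(predicted: List[Dict[str, str]],
                        gold: List[Dict[str, str]]) -> float:
    """Fraction of turns whose whole predicted belief state equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    # A turn is correct only when all slot-value pairs match exactly.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


def turn_request_accuracy(predicted: List[Set[str]],
                          gold: List[Set[str]]) -> float:
    """Fraction of turns whose predicted set of requested slots equals the gold set."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical three-turn dialogue (illustrative values only).
gold_goals = [
    {"food": "chinese"},
    {"food": "chinese", "area": "north"},
    {"food": "chinese", "area": "north", "price": "cheap"},
]
pred_goals = [
    {"food": "chinese"},
    {"food": "chinese", "area": "south"},  # one wrong slot makes the turn wrong
    {"food": "chinese", "area": "north", "price": "cheap"},
]
gold_requests = [set(), {"phone"}, {"phone", "address"}]
pred_requests = [set(), {"phone"}, {"phone", "address"}]

jga = joint_goal_accuracy(pred_goals, gold_goals)
tra = turn_request_accuracy(pred_requests, gold_requests)
```

Because a single wrong slot invalidates the whole turn, joint-goal accuracy is the stricter of the two measures.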
Reetu Kumari, Rohit Mishra, Shrikant Malviya, and Uma Shanker Tiwary
Springer International Publishing
Rohit Mishra, Santosh Kumar Barnwal, Shrikant Malviya, Varsha Singh, Punit Singh, Sumit Singh, and Uma Shanker Tiwary
Springer International Publishing
Sumit Singh, Shrikant Malviya, Rohit Mishra, Santosh Kumar Barnwal, and Uma Shanker Tiwary
Springer International Publishing
Rohit Mishra, Santosh Kumar Barnwal, Shrikant Malviya, Prasoon Mishra, and Uma Shanker Tiwary
Springer International Publishing
Shrikant Malviya, Rohit Mishra, and Uma Shanker Tiwary
IEEE
Automatic speech recognition (ASR) and text-to-speech (TTS) are two prominent areas of research in human-computer interaction (HCI) today. A set of phonetically rich sentences is important for developing these two interactive modules of HCI. Essentially, the set of phonetically rich sentences has to cover all possible phone units, distributed uniformly. Selecting such a set from a large corpus while preserving similarity in phonetic characteristics is still a challenging problem. The major objective of this paper is to devise a criterion for selecting a set of sentences that encompasses all phonetic aspects of a corpus while keeping the set as small as possible. First, this paper presents a statistical analysis of Hindi phonetics by observing its structural characteristics. Further, a two-stage algorithm is proposed to extract phonetically rich sentences with a high variety of triphones from the EMILLE Hindi corpus. The algorithm includes a distance-measuring criterion for selecting a sentence so as to improve the triphone distribution. Moreover, a special preprocessing method is proposed that scores each triphone in terms of inverse probability in order to speed up the algorithm. The results show that the approach efficiently builds a uniformly distributed, phonetically rich corpus with an optimal number of sentences.
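The idea of scoring triphones by inverse probability and picking sentences that improve coverage can be illustrated with a simplified greedy sketch. This is not the paper's two-stage algorithm and omits its distance-measuring criterion; it only assumes that each sentence is already available as a list of phone symbols, and the function names and toy corpus are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Triphone = Tuple[str, ...]


def triphones(phones: List[str]) -> List[Triphone]:
    """All consecutive three-phone windows of a phone sequence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def select_sentences(corpus: List[List[str]], max_sentences: int) -> List[int]:
    """Greedily pick sentence indices that add the most rare, uncovered triphones."""
    counts = Counter(t for sent in corpus for t in triphones(sent))
    total = sum(counts.values())
    # Inverse-probability score: rare triphones are worth more.
    score = {t: total / c for t, c in counts.items()}

    covered: Set[Triphone] = set()
    chosen: List[int] = []
    remaining = set(range(len(corpus)))

    def gain(i: int) -> float:
        # Summed score of the triphones sentence i would newly cover.
        return sum(score[t] for t in set(triphones(corpus[i])) - covered)

    while remaining and len(chosen) < max_sentences:
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # every triphone in the corpus is already covered
        chosen.append(best)
        covered |= set(triphones(corpus[best]))
        remaining.discard(best)
    return chosen


# Toy corpus: single characters stand in for phone symbols (an assumption).
toy_corpus = [list("abcd"), list("abc"), list("xyz")]
selected = select_sentences(toy_corpus, max_sentences=3)
```

In this toy run the second sentence is never selected, because its only triphone is already covered by the first; this is the kind of redundancy a coverage-driven selection is meant to avoid.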
Currently working at Uniphore Private Limited, Bangalore (1 year and 4 months)