Verified @iiita.ac.in
Pankaj Tyagi, Rahul Semwal, Anju Sharma, Uma Shanker Tiwary, and Pritish Varadwaj
PAGEPress Publications
All fruits emit specific volatile organic compounds (VOCs) during their life cycle. These VOCs have specific characteristics that can be used to identify the fruit ripening stage without destroying the fruit. In this study, an application-specific electronic nose device was designed for monitoring fruit ripeness. The proposed electronic nose is cost-efficient and does not require any modern or costly laboratory instruments. Metal oxide semiconductor (MOS) sensors were used for designing the proposed electronic nose. These MOS sensors were integrated with a microcontroller board to detect and extract the meaningful features of VOCs, and an artificial neural network (ANN) algorithm was used for pattern recognition. Measurements were done with apples, bananas, oranges, grapes, and pomegranates. The designed electronic nose proved to be reliable in classifying fruit samples into three different fruit ripening stages (unripe, ripe, and over-ripe) with high precision and recall. The proposed electronic nose performed uniformly on all three fruit ripening stages with an average accuracy of ≥ 95%.
Pankaj Tyagi, Anju Sharma, Rahul Semwal, Uma Shanker Tiwary, and Pritish Kumar Varadwaj
Informa UK Limited
Determining the structure-odor relationship has always been a very challenging task. The main challenge in investigating the correlation between the molecular structure and its associated odor is the ambiguous and obscure nature of verbally defined odor descriptors, particularly when the odorant molecules are from different sources. With the recent developments in machine learning (ML) technology, ML and data analytic techniques are increasingly used for quantitative structure-activity relationship (QSAR) modeling in the chemistry domain, toward knowledge discovery where the traditional Edisonian methods have not been useful. The smell perception of odorant molecules is one such task, as olfaction is one of the least understood senses. In this study, an XGBoost odor prediction model was generated to classify smells of odorant molecules from their SMILES strings. We first collected a dataset of 1278 odorant molecules with seven basic odor descriptors, and then 1875 physicochemical properties of the odorant molecules were calculated. To obtain relevant physicochemical features, principal component analysis (PCA) was employed for feature reduction. The ML model developed in this study was able to predict all seven basic smells with high precision (>99%) and high sensitivity (>99%) when tested on an independent test dataset. The results of the proposed study were also compared with three recently conducted studies. The results indicate that the XGBoost-PCA model performed better than the other models for predicting common odor descriptors. The methodology and ML model developed in this study may be helpful in understanding the structure-odor relationship. Communicated by Ramaswamy H. Sarma.
Rahul Semwal, Imlimaong Aier, Pankaj Tyagi, Utkarsh Raj, and Pritish Kumar Varadwaj
IEEE
In the recent past, with the improvement of high throughput technology, the availability of protein structural data has increased exponentially. All these structural data have to be correctly mapped to their functional attributes to decode their biological role. However, to perform the functional annotation of these structural entities, the essential step is to locate the ligand-binding site (LBS). Although many approaches have been proposed to locate the LBS, most have low performance in terms of predictive quality. In this work, we present a deep neural network-based approach, DeepLBS, which uses geometrical as well as pharmacophoric properties to quantify the LBS with high accuracy. To determine the efficiency of our work, DeepLBS was compared with the most recently developed deep learning tools. The results demonstrated that DeepLBS outperformed the existing state-of-the-art tools in terms of predictive quality.
Rahul Semwal, Imlimaong Aier, Utkarsh Raj, and Pritish Kumar Varadwaj
Institute of Electrical and Electronics Engineers (IEEE)
Motifs are evolutionarily conserved patterns that are reported to serve crucial structural and functional roles. Identification of motif patterns in a set of protein sequences has been a prime concern for researchers in computational biology. The discovery of such protein motifs using existing algorithms is purely based on parameters derived from sequence composition and length. However, the discovery of variable-length motifs remains a challenging task, as it is not possible to determine the length of a motif in advance. In the current work, a k-mer based motif discovery approach called Pr[m] is proposed for the detection of statistically significant un-gapped motif patterns, with or without wildcard characters. In order to analyze the performance of the proposed approach, a comparative study was performed with MEME and GLAM2, which are two widely used non-discriminative methods for motif discovery. A set of 7,500 test datasets was used to compare the performance of the proposed tool and the ones mentioned above. Pr[m] outperformed the existing methods in terms of predictive quality and performance. The proposed approach is hosted at https://bioserver.iiita.ac.in/Pr[m].
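As a rough illustration of what a k-mer based search involves, the sketch below counts how many sequences each k-mer occurs in and keeps those that reach a minimum support. It is not the Pr[m] algorithm: the statistical significance testing, wildcard handling, and variable-length search described in the abstract are omitted, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count in how many sequences each k-mer occurs (at most once per sequence)."""
    support = Counter()
    for seq in sequences:
        # A set removes repeats within one sequence, so counts measure support.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return support


def frequent_kmers(sequences, k, min_support):
    """Return the k-mers found in at least min_support sequences, sorted."""
    support = count_kmers(sequences, k)
    return sorted(kmer for kmer, n in support.items() if n >= min_support)


# Hypothetical toy sequences, not data from the paper.
toy = ["MKTAYIAK", "GGTAYIQQ", "TAYIPLMN"]
print(frequent_kmers(toy, 4, 3))  # ['TAYI']
```

Running the same function over a range of k values would be one simple way to probe motifs of different lengths, although a real tool would also need to score each candidate against a background model.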
Anju Sharma, Rajnish Kumar, Rahul Semwal, Imlimaong Aier, Pankaj Tyagi, and Pritish K. Varadwaj
Institute of Electrical and Electronics Engineers (IEEE)
The olfaction transduction mechanism is triggered by the binding of odorants to specific olfactory receptors (ORs) present in the nasal cavity. Different odorants stimulate different ORs due to differences in shape and in physical and chemical properties. In this paper, a deep neural network architecture, DeepOlf, based on molecular features and fingerprints of odorants and ORs, is proposed to predict whether a chemical compound is a potential odorant and, if so, its interacting OR. Odorant identification and odorant-OR interaction were modeled as binary classification problems using multiple classifiers. The evaluation of these classifiers' performance showed that the deep neural network framework not only fits the data with better accuracy than other classical methods (SVM, RF, k-NN) but is also able to predict odorant-OR interactions more accurately. To our knowledge, this study is the first realization of deep learning ideas for the problem of odorant and interacting OR prediction. The accuracy of DeepOlf was found to be 94.83 and 99.92 percent for the prediction of odorants and odorant-OR interactions, respectively. Comparison of DeepOlf predictions with the existing SVM-based prediction server, ODORactor, showed that better performance can be achieved with the proposed deep learning approach. The DeepOlf tool can be accessed at https://bioserver.iiita.ac.in/deepolf/.
Viswajit Mulpuru, Rahul Semwal, Pritish Kumar Varadwaj, and Nidhi Mishra
Bentham Science Publishers Ltd.
Background: Antimicrobial peptides (AMPs) can defend their hosts against various pathogens and are found in almost every life form, from microorganisms to humans. As the rapid increase of drug-resistant strains in recent years presents a serious challenge to healthcare, AMPs can revolutionize antimicrobial development against drug-resistant microbes. Objective: The objective was to encourage the study of the human microbiome towards the inhibition of drug-resistant bacteria by developing a database containing antimicrobial peptides from the human microbiome. Methods: This database is an outcome of an extended analysis of the human metagenome, involving the prediction of coding regions, extraction of peptides, prediction of antimicrobial peptides, and modeling of their structures using different in silico tools. Furthermore, an intelligent hash function-based query engine was designed to validate the novelty of a specific candidate peptide against the reported Knowledge-base. Result and Discussion: This Knowledge-base currently focuses on AMP sequences predicted from the human microbiome along with their 3D structures modeled using various modeling and molecular dynamics approaches. It includes a total of 1087 unique AMPs from various body sites, with 454 AMPs from the oral cavity, 180 AMPs from the gastrointestinal tract, 42 AMPs from the skin, 12 AMPs from the airway, 6 AMPs from the urogenital tract and 393 AMPs from undefined body locations. A scoring matrix has been generated based on the similarity scores of the sequences that have been incorporated into the Knowledge-base. Furthermore, a Jmol applet is included in the website to help users visualize the 3D structures. Conclusion: The information and functions of the Knowledge-base can offer great help in finding novel antimicrobial drugs, especially inhibitors for drug-resistant bacteria.
The HAMP is freely available at https://bioserver.iiita.ac.in/amp/index.html.
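The abstract does not describe how the hash function-based query engine works. One plausible reading, sketched here purely as an assumption, is an exact-match lookup: each known peptide is hashed once, and a candidate counts as novel when its hash is absent from the index. The function names, the choice of SHA-256, and the example peptides are all hypothetical.

```python
import hashlib


def peptide_key(sequence):
    """Hash a peptide after normalizing whitespace and letter case."""
    normalized = sequence.strip().upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_index(known_peptides):
    """Build a set of hashes for constant-time membership checks."""
    return {peptide_key(p) for p in known_peptides}


def is_novel(candidate, index):
    """A candidate is novel if no identical sequence is already indexed."""
    return peptide_key(candidate) not in index


# Hypothetical peptides, not entries from the database.
index = build_index(["GIGKFLHSAKKF", "KWKLFKKIEK"])
print(is_novel("gigkflhsakkf ", index))  # False: same sequence after normalization
print(is_novel("AAKKWLLPRR", index))     # True
```

An exact-match index like this cannot detect near-duplicates, which is presumably where the similarity scoring matrix mentioned in the abstract comes in.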
Imlimaong Aier, Rahul Semwal, Anju Sharma, and Pritish Kumar Varadwaj
Informa UK Limited
Pancreatic ductal adenocarcinoma (PDAC) is a major health issue that has been eluding efforts to identify viable therapeutic treatment options. Besides having the lowest survival rate among all types of cancer, almost all conventional methods of treatment are futile against this condition, leaving patients to succumb to this ailment faster than ever. As it is becoming increasingly difficult to come up with new compounds for the treatment of various diseases, alternative solutions are required for tackling these problems. In this study, publicly available miRNA and gene expression data were used to identify common elements that were present in gemcitabine-resistant PDAC cell lines. By selecting overexpressed genes involved in pancreatic cancer and cancer pathways in general, potential drug candidates for the treatment of PDAC were identified. In this study, 21 differentially expressed miRNAs were identified from the PANC-1 cell line treated with gemcitabine. Pathway analysis revealed that MET and PPARG were overexpressed in cancer-related pathways, including pancreatic cancer, and could be targeted for PDAC treatment. Using CMap, fisetin was identified as a likely candidate drug for the treatment of PDAC. Docking studies indicated that fisetin bound to c-Met and PPARG with XP G scores of –12.819 and –7.021 kcal/mol, respectively. As miRNAs have increasingly been shown to partake in important cancer-related processes and pathways, researching drug development methods based on miRNA targets could be beneficial for pharmaceutical industries. Communicated by Ramaswamy H. Sarma.
Rahul Semwal, Imlimaong Aier, Pankaj Tyagi, and Pritish Kumar Varadwaj
Informa UK Limited
With the advancement of high throughput techniques, the discovery rate of enzyme sequences has increased significantly in the recent past. All of these raw sequences need to be precisely mapped to their respective functional attributes, which helps in deciphering their biological role. Various prediction models have been proposed to predict the enzyme functional class; however, all of these models were able to quantify at most six functional enzyme classes (EC1 to EC6) out of the seven existing functional classes, making these approaches inappropriate for handling enzymes corresponding to the seventh functional class (EC7). In this study, a deep neural network-based approach, DeEPn, has been proposed, which can quantify enzymes corresponding to all seven functional classes with high precision and accuracy. The proposed model was compared with two recently developed tools, ECPred and SVM-Prot. The results demonstrated that DeEPn outperformed ECPred and SVM-Prot in terms of predictive quality. The DeEPn tool has been hosted as a web-based tool at https://bioserver.iiita.ac.in/DeEPn/. Communicated by Ramaswamy H. Sarma.
Imlimaong Aier, Rahul Semwal, Utkarsh Raj, and Pritish Kumar Varadwaj
Informa UK Limited
Pancreatic ductal adenocarcinoma (PDAC) is a pancreatic malignancy with a poor prognosis, the worst among all types of cancer. Chemotherapy, which is the standard regimen for treatment in most cases, is often rendered useless as drug resistance quickly sets in after prolonged exposure to the drug. The implication of the PAX2 transcription factor in regulating several ATP-binding cassette (ABC) transporter proteins that are responsible for the acquisition of drug resistance in PDAC makes it a potential target for treatment purposes. In this study, the 3D structure of the PAX2 protein was modeled, and the response of key amino acids to perturbation was identified. Subsequently, kappadione, a vitamin K derivative, was found to bind efficiently to PAX2 with a binding energy of −9.819 kcal/mol. The mechanism and mode of binding were studied by docking the protein with DNA in the presence and absence of the drug. The presence of kappadione disrupted DNA binding with key effector residues, preventing the DNA from coming into contact with the binding region essential for protein translation. By occupying the DNA binding region and replacing it with a ligand, the mechanism by which DNA interacts with PAX2 could be manipulated. Inhibition of PAX2-DNA binding using kappadione and other small molecules can prove to be beneficial for combating chemoresistance in PDAC, as proposed through in silico approaches. Communicated by Ramaswamy H. Sarma.
Utkarsh Raj, Imlimaong Aier, Rahul Semwal, and Pritish Kumar Varadwaj
Springer Science and Business Media LLC
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Mausami Mondal, Rahul Semwal, Utkarsh Raj, Imlimaong Aier, and Pritish Kumar Varadwaj
Springer Science and Business Media LLC
Gene expression levels obtained from microarray data provide a promising basis for classifying cancerous samples. Due to the high dimensionality of microarray datasets, redundant genes need to be removed so that only significant genes are used for building the classifier. In this work, an entropy-based method was used with supervised learning to differentiate between normal tissue and breast tumor based on their gene expression profiles. This work employs four widely used machine learning techniques for breast cancer prediction, namely support vector machine (SVM), random forest, k-nearest neighbor (KNN) and naive Bayes. The performance of these techniques was evaluated on four different classification performance measures, and SVM achieved higher accuracy than the other machine learning algorithms. A classification accuracy of 91.5% was achieved by the support vector machine, with an F1 measure of 0.833. Furthermore, these techniques were evaluated on the basis of the ROC curve and calibration graph.
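The abstract does not specify which entropy-based criterion was used. A common choice, shown here only as an assumption, is information gain: the drop in class-label entropy after splitting samples on one gene's expression level. The threshold split, function names, and toy values below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting samples at an expression threshold."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    # Weight each non-empty side by its share of the samples.
    weighted = sum(len(part) / len(labels) * entropy(part) for part in (low, high) if part)
    return entropy(labels) - weighted


# Hypothetical expression values for two genes across four samples.
labels = ["normal", "normal", "tumor", "tumor"]
print(information_gain([1.0, 2.0, 8.0, 9.0], labels, 5.0))  # 1.0: separates the classes
print(information_gain([1.0, 8.0, 2.0, 9.0], labels, 5.0))  # 0.0: uninformative
```

Ranking genes by such a score and keeping the top few is one way to shrink the feature set before training a classifier such as an SVM.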
Rahul Semwal and Pritish Kumar Varadwaj
Bentham Science Publishers Ltd.
Aims: To develop a tool that can annotate subcellular localization of human proteins. Background: With the progression of high throughput human proteomics projects, an enormous amount of protein sequence data has been discovered in the recent past. All these raw sequence data require precise mapping and annotation for their respective biological role and functional attributes. The functional characteristics of protein molecules are highly dependent on the subcellular localization/compartment. Therefore, a fully automated and reliable protein subcellular localization prediction system would be very useful for current proteomic research. Objective: To develop a machine learning-based predictive model that can annotate the subcellular localization of human proteins with high accuracy and precision. Methods: In this study, we used the PSI-CD-HIT homology criterion and utilized the sequence-based features of protein sequences to develop a powerful subcellular localization predictive model. The dataset used to train the HumDLoc model was extracted from a reliable data source, the UniProt knowledge base, which helps the model to generalize on unseen data. Result: The proposed model, HumDLoc, was compared with two of the most widely used techniques, CELLO and DeepLoc, and other machine learning-based tools. The results demonstrated promising predictive performance of the HumDLoc model based on various machine learning parameters such as accuracy (≥97.00%), precision (≥0.86), recall (≥0.89), MCC score (≥0.86), ROC curve (0.98 square unit), and precision-recall curve (0.93 square unit). Conclusion: HumDLoc was able to outperform several alternative tools for correctly predicting subcellular localization of human proteins. HumDLoc has been hosted as a web-based tool at https://bioserver.iiita.ac.in/HumDLoc/.
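For readers unfamiliar with the reported measures, the sketch below computes accuracy, precision, recall, and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix, using the standard definitions. The counts are hypothetical and are not results from the paper; a multi-class tool such as HumDLoc would typically report these per compartment or averaged.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }


# Hypothetical counts for one compartment treated as the positive class.
print(binary_metrics(tp=90, fp=10, fn=10, tn=90))
```

With these counts, accuracy, precision, and recall all come out at 0.9 while MCC is 0.8, which shows why MCC is often reported alongside accuracy.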
Imlimaong Aier, Rahul Semwal, Aiindrila Dhara, Nirmalya Sen, and Pritish Kumar Varadwaj
Public Library of Science (PLoS)
Pancreatic ductal adenocarcinoma (PDAC) is notoriously difficult to treat due to its aggressive, ever resilient nature. A major drawback lies in its tumor grade, a phenomenon observed across various carcinomas, where highly differentiated and undifferentiated tumor grades, termed low and high grade respectively, are found in the same tumor. One prominent problem arising from such heterogeneity is drug resistance in PDAC. This has been attributed to the ABC transporter family of proteins, which are upregulated in PDAC patients. However, the regulation of these transporters with respect to tumor grade in PDAC is not well understood. To address these issues, a study was designed to identify novel genes that might regulate the drug resistance phenotype and be used as targets. By integrating epigenome with transcriptome data, several genes associated with high grade PDAC were identified. Further analysis indicated the oncogenic PAX2 transcription factor as a novel regulator of drug resistance in high grade PDAC cell lines. It was observed that silencing of PAX2 resulted in increased susceptibility of high grade PDAC cells to various chemotherapeutic drugs. Mechanistically, the study showed that the PAX2 protein can bind and transcriptionally alter the expression of many ABC transporter genes in high grade PDAC cell lines. Overall, the study indicated that PAX2 significantly upregulated the ABC family of genes, resulting in drug resistance and poor survival in PDAC.
Imlimaong Aier, Rahul Semwal, Anju Sharma, and Pritish Kumar Varadwaj
Elsevier BV
Anju Sharma, Rajnish Kumar, Imlimaong Aier, Rahul Semwal, Pankaj Tyagi, and Pritish Varadwaj
Bentham Science Publishers Ltd.
Olfaction, the sense of smell, detects and discriminates odors as well as social cues which influence our innate responses. The olfactory system in human beings is found to be weak as compared to other animals; however, it seems to be very precise. It can detect and discriminate millions of chemical moieties (odorants) even in minuscule quantities. The process initiates with the binding of odorants to specialized olfactory receptors, encoded by a large family of Olfactory Receptor (OR) genes belonging to the G-protein-coupled receptor superfamily. Stimulation of ORs converts the chemical information encoded in the odorants into respective neuronal action potentials, which cause depolarization of olfactory sensory neurons. The olfactory bulb relays this signal to different parts of the brain for processing. Odors are encrypted using a combinatorial approach to detect a variety of chemicals and encode their unique identity. The discovery of functional OR genes and proteins provided important information to decipher the genomic, structural and functional basis of olfaction. ORs constitute 17 gene families, out of which 4 families were reported to contain more than a hundred members each. The olfactory machinery is not limited to GPCRs; a number of non-GPCRs are also employed to detect chemosensory stimuli. The article provides detailed information about such olfaction machinery, structures, transduction mechanism, theories of odor perception, and challenges in olfaction research. It covers the structural, functional and computational studies carried out in olfaction research in the recent past.
Rahul Semwal, Imlimaong Aier, Pritish Kumar Varadwaj, and Slava Antsiperov
Springer International Publishing
To carry out functional annotation of proteins, the most crucial step is to identify the ligand binding site (LBS). Although several algorithms have been reported to identify the LBS, most have limited accuracy and efficiency, given the number and type of geometrical and physicochemical features used for such predictions. In this work, a fast and accurate algorithm, “PROcket”, has been implemented and discussed. The algorithm uses a grid-based approach to cluster the local residue neighbors that are present on the solvent accessible surface of proteins. Further, with the inclusion of selected physicochemical properties and phylogenetically conserved residues, the algorithm enables accurate detection of the LBS. A comparative study with well-known tools (LIGSITE, LIGSITECS, PASS and CASTptool) was performed to analyze the performance of our tool. A set of 48 ligand-bound protein structures from different families was used to compare the performance of the tools. The PROcket algorithm outperformed the existing methods in terms of quality and processing speed, with 91% accuracy when considering the top 3 ranked pockets and 98% accuracy when considering the top 5 ranked pockets.
Rahul Semwal, Imlimaong Aier, Utkarsh Raj, and Pritish Kumar Varadwaj
Springer Science and Business Media LLC
The term pharmacophore is used to define the important features of one or more molecules having the same biological activity. Pharmacophores are selected based on several common features, such as the type of functional groups present, the distance between each atom or group of atoms and the angle between such groups or an individual atom. In this paper, we present the design and implementation of a pharmacophore searching tool, Pharmadoop, using the Hadoop framework. Due to its Hadoop implementation, Pharmadoop is faster than the existing standalone pharmacophore search tools. It utilizes the MapReduce algorithm to support the comparison of millions of conformers in a short time span. We further demonstrated and compared the utility of Pharmadoop on ten distinct chemical datasets of ligand molecules by running a common substructure searching job on standalone and multi-system Hadoop platforms. These results were further used to perform pharmacophore searching applications on standalone and multi-node Hadoop distributions. The performance, speed and accuracy of the tool were evaluated through time-scale analysis and the receiver operating characteristic (ROC) curve. The Pharmadoop tool can be accessed at http://bioserver.iiita.ac.in/Pharmadoop/.
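To make the MapReduce idea concrete, the sketch below mimics the map and reduce phases in plain Python on a single machine. It is not Pharmadoop's implementation and does not use Hadoop: molecules are reduced to hypothetical sets of feature labels, and a query matches when all of its required features are present, whereas a real pharmacophore search would also compare distances and angles between features.

```python
from collections import defaultdict


def map_phase(record, queries):
    """Emit (query_id, molecule_id) for every query the molecule satisfies."""
    mol_id, features = record
    for query_id, required in queries.items():
        if required <= features:  # subset test: all required features present
            yield query_id, mol_id


def reduce_phase(pairs):
    """Group emitted pairs by query and collect the matching molecule ids."""
    grouped = defaultdict(list)
    for query_id, mol_id in pairs:
        grouped[query_id].append(mol_id)
    return {query_id: sorted(ids) for query_id, ids in grouped.items()}


def run_job(records, queries):
    """Run map over every record, then reduce the combined output."""
    pairs = (pair for record in records for pair in map_phase(record, queries))
    return reduce_phase(pairs)


# Hypothetical molecules and queries, not data from the paper.
records = [
    ("mol1", {"donor", "acceptor", "aromatic"}),
    ("mol2", {"donor", "hydrophobic"}),
    ("mol3", {"acceptor", "aromatic"}),
]
queries = {"q1": {"acceptor", "aromatic"}, "q2": {"donor"}}
print(run_job(records, queries))  # {'q1': ['mol1', 'mol3'], 'q2': ['mol1', 'mol2']}
```

Because each record is mapped independently, the map phase can be spread across many nodes, which is the property a Hadoop deployment exploits.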
Utkarsh Raj, Imlimaong Aier, Rahul Semwal, and Pritish Kumar Varadwaj
Springer Science and Business Media LLC
Breast cancer is the most common cancer in women in both developed and less developed countries, and it poses a considerable threat to human health. Therefore, in order to develop effective targeted therapies against breast cancer, a deep understanding of its underlying molecular mechanisms is required. Deep transcriptional sequencing has been reported to provide an efficient genomic assay for gaining insight into diseases and may prove useful in the study of breast cancer. In this study, ChIP-Seq data for normal and breast cancer samples were compared, and differential peaks were identified based upon fold enrichment (with P-values obtained via t-tests). Protein–protein interaction (PPI) network analysis was carried out, following which the highly connected genes were screened and studied, and the most promising ones were selected. The biological pathways involved in the process were then identified. Our findings regarding potential breast cancer-related genes enhance the understanding of the disease and provide prognostic information, in addition to standard tumor prognostic factors, for future research.