@pace.ac.in
Assistant Professor, Electronics and Communication Engineering
PACE INSTITUTE OF TECHNOLOGY & SCIENCES
Assistant Professor in ECE Dept., PACE INSTITUTE OF TECHNOLOGY & SCIENCES. My research interests are computer vision and image processing.
I received my Ph.D. in electronics and communication engineering with a specialization in machine learning from Koneru Lakshmaiah Education Foundation, India, in September 2020, supervised by Prof. A. S. C. S. Sastry and Prof. P. V. V. Kishore. I received my M.Tech in electronics and communication engineering with a specialization in signal processing from Koneru Lakshmaiah Education Foundation, India, in June 2016, supervised by Prof. P. V. V. Kishore, and my B.Tech in electronics and communication engineering from Guntur Engineering College (JNTU Kakinada), India, in 2014.
Computer Vision, Machine Learning, Deep Learning, Gesture Recognition.
P. V. V. Kishore, D. Anil Kumar, and K. Srinivasa Rao
Springer Science and Business Media LLC
Anil Kumar D., Kishore P.V.V., Chaithanya T.R., and Sravani K.
Elsevier BV
D. Anil Kumar, E. Kiran Kumar, M. Suneetha, and L. Rajasekhar
AIP Publishing
E. Kiran Kumar, B. Pavan Kumar, L. Rajasekhar, K. Siri Chandana, and D. Anil Kumar
AIP Publishing
E. Kiran Kumar, D. Anil Kumar, T. Manwitha, and G. Yaswanth Sai
AIP Publishing
E. Kiran Kumar, D. Anil Kumar, K. Murali, P. Sasi Kiran, and M. Teja Kiran Kumar
AIP Publishing
P. V. V. Kishore, D. Anil Kumar, P. Praveen Kumar, D. Srihari, N. Sasikala, and L. Divyasree
Institute of Electrical and Electronics Engineers (IEEE)
P. V. V. Kishore, D. Anil Kumar, Rama Chaithanya Tanguturi, K. Srinivasarao, P. Praveen Kumar, and D. Srihari
Institute of Electrical and Electronics Engineers (IEEE)
Previous work on 3D joint based feature representations of the human body as colour coded images (maps) built the maps from joint positions, distances and angles, or a combination of them, for applications such as human action (sign language) recognition. These 3D joint maps have been shown to singularly characterize both the spatial and temporal relationships between skeletal joints describing an action (sign). Consequently, the joint position and motion identification problem is transformed into an image classification problem for 3D skeletal sign language (action) recognition. However, the previously proposed transformation of 3D skeletal joints into colour coded maps has a negative proportionality component, which results in maps with small pixel densities when the joint relationships are strong. This drawback greatly impairs the ability of classifiers to learn the joint relationships within the colour coded maps. We hypothesize that a positive proportionality between joint motions and the corresponding maps would improve classifier performance; hence we propose joint motion affinity maps (JMAMs). JMAMs apply a radial basis kernel to joint distances, which assures a positive proportionality constant between joint motions and the pixel densities of the colour coded maps. To further improve the classification of 3D sign language, this work proposes congruent body part joints, which result in motion directed JMAMs with maximally discriminating positive definite spatio-temporal features. Finally, JMAMs are trained on the proposed multi-resolution convolutional neural network with spatial attention (MRCNNSA) architecture, which produces strong results on the constructed 3D sign language dataset, KL3DISL. Online 3D datasets and standard deep learning models are used to benchmark the proposed method on both sign and action recognition.
The results conclude that JMAMs with clustered joints characterize subtle relationships that are otherwise difficult for a classifier to learn.
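The central idea above, passing pairwise joint distances through a radial basis kernel so that map intensity relates positively to joint relationships, can be sketched in numpy. The map layout, the kernel bandwidth `sigma`, and the function name `jmam` are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def jmam(joints, sigma=0.5):
    """Sketch of a joint motion affinity map: per-frame pairwise joint
    distances are passed through a radial basis (Gaussian) kernel, so
    affinity is bounded in (0, 1] and highest for the closest joints.
    joints: (T, J, 3) array of 3D joint positions over T frames."""
    T, J, _ = joints.shape
    maps = np.empty((T, J, J))
    for t in range(T):
        diff = joints[t][:, None, :] - joints[t][None, :, :]   # (J, J, 3)
        d2 = np.sum(diff ** 2, axis=-1)                        # squared pairwise distances
        maps[t] = np.exp(-d2 / (2 * sigma ** 2))               # RBF affinities in (0, 1]
    return maps  # stack of T affinity maps, one per frame
```

Each frame yields one J-by-J affinity image; the T images can then be colour coded and stacked into the classifier input.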
S Sampath, Mudarakola Lakshmi Prasad, Mohammad Manzoor Hussain, R Parameswari, D Anil Kumar, and Pundru Chandra Shaker Reddy
IEEE
Living in a major metropolitan area has been linked to an increased risk of developing multiple forms of chronic kidney disease (CKD). In developed nations, predicting CKD is a top priority, and predictive analytics for forecasting CKD is the primary focus of this work. However, it is increasingly difficult to forecast outcomes for massive samples. The MapReduce architecture makes it possible to write predictive algorithms by combining map and reduce operations, and its comparatively straightforward programming interface alleviates the scalability and effectiveness problems of anticipative learning approaches. To efficiently handle small subsets of massive datasets, the authors propose an iterative weighted MapReduce approach. Ensemble nonlinear support vector machines (ENSVM) and random forests (RF) are used to address a binary classification problem. As a result, the suggested approach generates nonlinear blends of kernel activations in example prototypes, as opposed to the conventional linear combination of activations. In addition, an ensemble of deep SVMs is utilized to integrate the descriptors, with the product rule employed to merge the classifiers' likelihood estimates. Prediction accuracy and interpretability of the results are used to gauge performance.
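The product rule used to merge the classifiers' likelihood estimates is a standard fusion scheme and can be illustrated with a small numpy sketch; the function name and input layout are assumed for illustration.

```python
import numpy as np

def product_rule_fusion(prob_list):
    """Product-rule fusion: per-class likelihoods from several
    classifiers are multiplied elementwise and renormalized per sample.
    prob_list: list of (n_samples, n_classes) probability arrays."""
    fused = np.ones_like(prob_list[0])
    for p in prob_list:
        fused *= p                                   # combine evidence multiplicatively
    return fused / fused.sum(axis=1, keepdims=True)  # renormalize to a distribution
```

A class supported by all ensemble members keeps a high fused score, while a single low likelihood strongly suppresses a class, which is the characteristic behaviour of the product rule.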
D. Anil Kumar, P. V. V. Kishore, G.V.K. Murthy, T. R. Chaitanya, and SK. Subhani
IEEE
In action recognition, varied viewpoints are very challenging because the same action appears differently from different views. To address this problem, we propose a novel ActionNet framework for perspective view invariant human action recognition based on convolutional neural networks (CNNs) trained on a multi-view dataset captured by five depth cameras. Recently, maps characterized by geometric features such as joint locations, distances, angles, velocities, or combinations of these have been used for skeleton based action recognition. Despite their success, these earlier feature representations struggle to represent relative variations in 3D actions. Hence, we introduce novel spatio-temporal colour coded image maps called joint relational surface maps (JRSMs). JRSMs are calculated over subsets of three joints in a sequential order covering all joints. Prior work used single view depth data with multi-stream CNNs to recognize human actions, but could not accurately recognize view invariant actions. In this work, we train a single stream deep CNN model on multi-view action data for recognizing view invariant actions. To test the performance of the proposed architecture, we compare our results with other state-of-the-art action recognition architectures using our own multi-view 3D skeleton action dataset, named KLU3DAction, and two benchmark skeleton action datasets, NTU RGB-D and PKU-MMD.
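As a rough illustration of relational features over sequential joint triples (the JRSM idea), the sketch below encodes, per frame, the area of the triangle spanned by each consecutive triple of joints. The paper's actual surface-map construction and colour coding are not reproduced here; the triangle-area encoding and function name are assumptions.

```python
import numpy as np

def jrsm(joints):
    """Sketch of a joint relational surface feature: for each sequential
    triple of joints (i, i+1, i+2), compute the area of the triangle
    they span, per frame.
    joints: (T, J, 3) -> (T, J-2) array of surface features."""
    a, b, c = joints[:, :-2], joints[:, 1:-1], joints[:, 2:]
    cross = np.cross(b - a, c - a)               # (T, J-2, 3) normal vectors
    return 0.5 * np.linalg.norm(cross, axis=-1)  # triangle areas
```

Stacking these per-frame rows over time yields a 2D map that can be colour coded as the CNN input.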
D. Anil Kumar, T. Suresh Babu, E. Sai Gowtham, M. Anusha Chandana, G. V. Vineelka, and K. Narendra Reddy
IEEE
The classification of Indian classical dance is complex because it involves intricate movements of many body parts. Identifying and recognizing classical dance is challenging because every dance gesture consists of difficult single-hand, double-hand, body, and leg movements, facial emotions, and background music. In this work, we experimented with five different songs, each consisting of thirty to forty categories. For the identification and classification of dance, we propose a convolutional neural network, which performs better than various state-of-the-art works. The inputs to the proposed model were captured from Microsoft Kinect, and we compare classical machine learning models and deep learning models. The achieved recognition score of our work is 95.63%.
P. V. V. Kishore, D. Anil Kumar, SK. Khwaja Moinuddin, L. Divyasree, and E. Kiran Kumar
IEEE
The goal of this work is to develop a deep Indian classical dance classifier for online bharatanatyam videos to assist amateur dance learners. Previous learning models demonstrated that global feature representations of dance poses in videos with unpredictable backgrounds have unreliable performance metrics. Therefore, tiny dance datasets with few dancers in controlled environments were used rather than recordings from live performances. In this work, the random pixel distributions of the dancer in online videos with cluttered backgrounds are emphasized using multi frame multi head layer attention (MFMHLA) on deep ResNet features at different resolutions across deep layers. This results in a chronological enhancement of the pose at multiple resolutions across the depth. The experiments were conducted on our online bharatanatyam ICD dataset, BOICDVD22, with 10 songs. The results conclude that MFMHLA improves pose feature representations of online dance videos burdened with deformations.
Rejeti Hima Sameer, S. Rambabu, P. V. V. Kishore, D. Anil Kumar, and M. Suneetha
Springer Nature Singapore
D. Anil Kumar, A.S.C.S. Sastry, P.V.V. Kishore, and E. Kiran Kumar
Elsevier BV
3D sign language recognition is challenging from capture to recognition. 3D signs are a set of spatio-temporal variations of the hands and fingers with respect to the face, head and torso. 3D motion capture technology has enabled us to capture these complex 3D human motions while preserving 95% of the visual information required for recognition. A twin motion algorithm is proposed to recognize 3D signs with variable motion joints. Variable motions in joints arise due to non-uniform distances between the joints; for example, finger motions differ from hand motions. A common measure to extract motion features from 3D skeletal data is the relative range of joint relative distance (RRJRD). However, RRJRD cannot quantify all the relative joint motions characterizing a sign because of the difference in motion ranges between the different body parts used in defining a sign. Hence, we propose a wide RRJRD and narrow RRJRD based characterization to project the motion features onto a graph. Each sign is characterized by a set of spatio-temporal projections onto a constructed sign graph. The experimental results show that the proposed method is signer invariant, motion invariant and faster compared to state-of-the-art graph kernel methods.
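The baseline RRJRD measure described above can be sketched as the per-pair range of joint-to-joint distances over time, normalized to a common scale. The exact normalization and the wide/narrow split proposed in the paper are not shown; this is a minimal illustrative version.

```python
import numpy as np

def rrjrd(joints):
    """Sketch of relative range of joint relative distance: for every
    joint pair, take the range (max minus min over time) of their
    Euclidean distance, then divide by the largest range so body parts
    with different motion scales become comparable.
    joints: (T, J, 3) -> (J, J) matrix of relative ranges in [0, 1]."""
    diff = joints[:, :, None, :] - joints[:, None, :, :]  # (T, J, J, 3)
    d = np.linalg.norm(diff, axis=-1)                     # pairwise distances per frame
    rng = d.max(axis=0) - d.min(axis=0)                   # motion range over time
    return rng / max(rng.max(), 1e-12)                    # scale-normalized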
Tummala Chandra Suhas, Pattem Om Prakash Ravi Teja, Nakirikanti Sai Rakesh, D. Anil Kumar, and P. V. V. Kishore
IEEE
Developing a highly accurate 3D action recognition framework using traditional feature maps on top of convolutional neural networks has been limited by poor inter-class discrimination of similar looking action sequences. We approach this problem through local joint perimeter maps (LJPM) on skeletal action datasets, learned by a deep metric learning (DML) process. We propose to close gaps in training pipelines and attain higher accuracies using a feature embedding space learned with the triplet loss function. To test our approach, we applied our 3D motion captured action dataset, KLHA3D-102, and two other benchmarks, HDM05 and NTU RGB-D. The results show that the embedding features performed better because the triplet loss maximized the separation between similar features of multiple classes. Further, the triplet loss embedding has minimal false positive effects on 3D skeletal action data recognition tasks.
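The triplet loss that shapes the embedding space is standard in deep metric learning and can be written directly; this numpy version uses squared Euclidean distances and an assumed margin of 0.2.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull the anchor embedding toward a same-class
    positive and push it away from a different-class negative by at
    least `margin`. Inputs: (..., dim) embedding arrays."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_ap - d_an + margin)      # hinge on the margin
```

Training minimizes this quantity, so similar-looking actions from different classes (the failure mode the abstract describes) are driven apart in the embedding space.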
D. Arpitha, M. Balasubrahmanyam, and D. Anil Kumar
IEEE
We present an interesting application of computer vision techniques to Indian classical dance hasta mudra recognition. Dance mudras form complex human gestures that are complicated for a machine to interpret. To solve this problem, we propose a new framework for Indian classical dance recognition using a depth sensor. First, datasets of the dance mudras of various classical dances were created using a Microsoft Kinect (depth) sensor. Second, histogram of oriented gradients (HOG) features of the dance mudras are extracted from the input depth images. Third, the dance mudras are classified using a support vector machine (SVM), converting the dance mudras to text labels. The proposed framework was tested on 50 dance mudras from different classical dance videos. The performance of the proposed algorithm was evaluated against different features and state-of-the-art methods.
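A simplified version of the HOG features mentioned above can be computed in plain numpy; the cell size, bin count, and per-cell normalization below are illustrative defaults, not the parameters used in the paper.

```python
import numpy as np

def hog_descriptor(img, n_bins=9, cell=8):
    """Simplified histogram-of-oriented-gradients descriptor.
    img: 2D grayscale (or depth) array with sides divisible by `cell`.
    Returns one L2-normalized orientation histogram per cell."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)         # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = img.shape
    feats = []
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            hist = np.zeros(n_bins)
            b = bins[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            np.add.at(hist, b, m)                   # magnitude-weighted orientation votes
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))
    return np.concatenate(feats)
```

The resulting vector would then be fed to a multiclass SVM (for example `sklearn.svm.SVC`) to map each mudra image to its text label.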
E. Kiran Kumar, P.V.V. Kishore, D. Anil Kumar, and M. Teja Kiran Kumar
Elsevier BV
Machine translation of sign language is a critical task in computer vision. In this work, we propose to use 3D motion capture technology for sign capture and graph matching for sign recognition. Two problems related to 3D sign matching are addressed: (1) how to identify the same sign across different numbers of motion frames, and (2) sign extraction from a clutter of non-sign hand motions. These two problems make 2D and 3D sign language machine translation a challenging task. We propose graph matching with an early estimation model to address these problems in two phases. The first phase consists of intra graph matching for motion frame extraction, which retains motion intensive frames in the database and query 3D videos. The second phase applies inter graph matching with the early estimation model on the motion extracted query and dataset 3D videos. The proposed model increases the speed of the graph matching algorithm by estimating a sign from fewer frames. To test the graph matching model, we recorded 350 words of Indian sign language with 3D motion capture technology. For testing, 4 variations per sign were captured for all signs with 5 different signers at the same, slower, and faster hand speeds and with sign mixed cluttered hand motions. The early estimation graph matching model is tested for accuracy and efficiency in classifying 3D signs under the two induced real time constraints. In addition to the 3D sign language dataset, the proposed method is validated on five benchmark datasets and against state-of-the-art graph matching methods.
M. Teja Kiran Kumar, P. V. V. Kishore, B. T. P. Madhav, D. Anil Kumar, N. Sasi Kala, K. Praveen Kumar Rao, and B. Prasad
Institute of Electrical and Electronics Engineers (IEEE)
3D skeletal action recognition is commonly practiced with features extracted from joint positional sequence modeling on deep learning frameworks. However, the spatial ordering of skeletal joints during the entire action recognition lifecycle is fixed across datasets and frameworks. Intuition inspired us to investigate, through experimentation, the influence of multiple random skeletal joint orderings on the performance of deep learning systems. Hence the question: is joint order independent learning for skeletal action recognition practicable? If so, the goal is to discover how many randomly ordered joint feature representations are sufficient for training deep networks. We further investigated which features and deep networks record the highest performance on jumbled joints. This work proposes the novel idea of learning skeletal joint volumetric features on a spectrally graded CNN to achieve joint order independence. We propose 4 joint features, called quad joint volumetric features (QJVF), which offer better spatio-temporal relationships between time series joint data than existing features. Consequently, we propose a spectrally graded convolutional neural network (SgCNN) to characterize spatially divergent features extracted from jumbled skeletal joints. Finally, the proposed hypothesis was evaluated on our 3D skeletal action datasets, KLHA3D102 and KLYOGA3D, along with the benchmarks HDM05, CMU and NTU RGB-D. The results demonstrate that joint order independent feature learning is achievable on CNNs trained on quantified spatio-temporal feature maps extracted from randomly shuffled skeletal joints of action sequences.
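Generating the randomly ordered joint representations that the experiments above rely on reduces to permuting the joint axis of a skeleton sequence. A minimal sketch, with an assumed `(T, J, C)` sequence layout and an illustrative function name:

```python
import numpy as np

def shuffled_joint_orders(joints, n_orders, seed=0):
    """Produce several randomly joint-ordered copies of a skeleton
    sequence for joint-order-independent training.
    joints: (T, J, C) array; returns a list of n_orders arrays of the
    same shape, each with the J joints reindexed by a random permutation."""
    rng = np.random.default_rng(seed)
    J = joints.shape[1]
    return [joints[:, rng.permutation(J), :] for _ in range(n_orders)]
```

Each permuted copy is then converted to feature maps and fed to the network, so the classifier never sees a single canonical joint ordering.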
D. Anil Kumar, P. Sudheer Chakravarthi, and K. Suresh Babu
IEEE
In general, the Indian economy depends heavily on agricultural productivity, and in agriculture the identification and classification of leaf diseases play an important role. In developing countries, manual identification of plant leaf diseases by naked-eye observation can be prohibitively expensive. The proposed research work develops a framework to identify and classify different plant leaf diseases using K-means segmentation with multiclass support vector machine (SVM) based classification. The framework is implemented in four steps: step I performs the RGB to HSI colour transformation; in step II, image segmentation using K-means clustering is performed; next, colour, texture and shape features are extracted in step III; finally, in step IV, a multiclass SVM classifies the extracted features. Experimental results indicate that the proposed approach yields improved detection and classification compared to other existing methods, recognizing leaf diseases with an accuracy of about 95.7%.
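Step II's K-means colour clustering can be sketched in a few lines of numpy; `k`, the iteration count, and clustering in raw colour space (rather than the HSI space produced by step I) are simplifications for illustration.

```python
import numpy as np

def kmeans_segment(img, k=3, iters=10, seed=0):
    """Minimal K-means colour clustering for lesion segmentation:
    pixels are clustered in colour space and each pixel is labelled
    with its nearest cluster centre.
    img: (H, W, 3) array -> (H, W) integer label map."""
    h, w, c = img.shape
    X = img.reshape(-1, c).astype(float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]  # initial centres from pixels
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
        labels = d.argmin(axis=1)                      # assign to nearest centre
        for j in range(k):
            if (labels == j).any():
                centres[j] = X[labels == j].mean(axis=0)  # recompute centres
    return labels.reshape(h, w)
```

In the full pipeline, the cluster containing the diseased region would be selected and its pixels passed to feature extraction (step III) and the SVM (step IV).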
D. Srihari, P. V. V. Kishore, E. Kiran Kumar, D. Anil Kumar, M. Teja Kiran Kumar, M. V. D. Prasad, and Ch. Raghava Prasad
Springer Science and Business Media LLC
Appearance and depth based action recognition has been researched extensively for improving recognition accuracy by considering motion and shape recovery particulars from RGB-D video data. Convolutional neural networks (CNNs) have shown evidence of superiority on action classification problems with spatial and apparent motion inputs. The current generation of CNNs uses spatial RGB videos and depth maps to recognize action classes from RGB-D video. In this work, we propose a 4-stream CNN architecture that has two spatial RGB-D video data streams and two apparent motion streams, with inputs extracted from the optical flow of RGB-D videos. Each CNN stream is packed with 8 convolutional layers, two dense layers and one softmax layer, and a score fusion model merges the scores from the four streams. The performance of the proposed 4-stream action recognition framework is tested on our own action dataset and three benchmark datasets for action recognition. The usefulness of the proposed model is evaluated against state-of-the-art CNN architectures for action recognition.
E. Kiran Kumar, P.V.V. Kishore, M. Teja Kiran Kumar, and D. Anil Kumar
Elsevier BV
Currently, one of the most challenging and interesting human action recognition (HAR) problems is 3D sign language recognition. The sign in a 3D video can be characterized by 3D joint location information in 3D space over time. Therefore, the objective of this study is to construct colour coded topographical descriptors from joint distances and angles computed from the joint locations. We call these two colour coded images the joint distance topographical descriptor (JDTD) and the joint angle topographical descriptor (JATD), respectively. For classification we propose a two stream convolutional neural network (2CNN) architecture, which takes the colour coded images JDTD and JATD as input. The two independent streams were merged by concatenating the features from both streams in the dense layer. For a given query 3D sign (or action), a list of class scores is obtained as a text label corresponding to the sign. The results show improved classifier performance over the predecessors due to the mixing of distance and angular features for predicting closely related spatio-temporal discriminative features. To benchmark the performance of our proposed model, we compared our results with state-of-the-art baseline action recognition frameworks using our own 3D sign language dataset and two publicly available 3D mocap action datasets, HDM05 and CMU.
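A minimal sketch of a JDTD-style descriptor: per-frame pairwise joint distances stacked over time into a single 2D map ready for colour coding. The row ordering (upper-triangle pairs) and the min-max normalization are assumptions; the paper's descriptor may be arranged differently, and the angle-based JATD would follow the same pattern with joint angles in place of distances.

```python
import numpy as np

def jdtd(joints):
    """Sketch of a joint distance topographical descriptor: each
    frame's pairwise joint distances (upper triangle) become one
    column of a 2D image (rows = joint pairs, columns = time),
    scaled to [0, 1] for colour mapping.
    joints: (T, J, 3) -> (J*(J-1)/2, T) image."""
    T, J, _ = joints.shape
    iu = np.triu_indices(J, k=1)                  # unique joint pairs
    cols = []
    for t in range(T):
        diff = joints[t][:, None, :] - joints[t][None, :, :]
        d = np.linalg.norm(diff, axis=-1)         # (J, J) distance matrix
        cols.append(d[iu])                        # keep upper triangle
    img = np.stack(cols, axis=1)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)
```

The normalized map can then be passed through a colour map and fed to one stream of the 2CNN, with the angle-based map feeding the other.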
P. Vasavi, Suman Maloji, E. Kiran, D. Anil, and N. Sasikala
The Science and Information Organization
Hand gestures with finger relationships are among the toughest features to extract for machine recognition. In this paper, this research challenge is addressed with 3D hand joint features extracted from distance measurements, which are then colour mapped as spatio-temporal features. These patterns are learned using an 8-layer convolutional neural network (CNN) to estimate the hand gesture. The results showed a higher degree of recognition accuracy when compared to similar 3D hand gesture methods. The recognition accuracy for our dataset KL 3DHG, with 220 classes, was around 94.32%. Robustness of the proposed method was validated on the only available benchmark 3D skeletal hand gesture dataset, DHG 14/28.