Associate Professor in Information and Communications Technology
Central Queensland University
Computer Science, Artificial Intelligence, Computer Vision and Pattern Recognition, Multidisciplinary
Wenju Zhang, Yaowu Wang, Leifeng Guo, Greg Falzon, Paul Kwan, Zhongming Jin, Yongfeng Li, and Wensheng Wang
MDPI AG
Standing and lying are the fundamental behaviours of quadrupedal animals, and the ratio of their durations is a significant indicator of calf health. In this study, we proposed a computer vision method for non-invasive monitoring of calves’ behaviours. Cameras were deployed at four viewpoints to monitor six calves on six consecutive days. YOLOv8n was trained to detect standing and lying calves. The daily behavioural budget was then summarised and analysed based on automatic inference on data not used for training. The results show a mean average precision of 0.995 and an average inference speed of 333 frames per second. The maximum error in the estimated daily standing and lying time for a total of 8 calf-days is less than 14 min. Calves with diarrhoea had about 2 h more daily lying time (p < 0.002), 2.65 more daily lying bouts (p < 0.049), and 4.3 min less daily lying bout duration (p = 0.5) compared to healthy calves. The proposed method can help in understanding calves’ health status based on automatically measured standing and lying time, thereby improving their welfare and management on the farm.
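The daily behavioural budget described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes one posture label per sampled frame for a single calf and a fixed sampling interval, and the names `summarise_budget` and `seconds_per_frame` are hypothetical.

```python
from itertools import groupby


def summarise_budget(labels, seconds_per_frame=1.0):
    """Summarise standing/lying time and lying bouts from per-frame labels.

    `labels` is a sequence of "standing" / "lying" strings, one per sampled
    frame for a single calf (an assumed input format, not the paper's).
    """
    # Collapse consecutive identical labels into (label, run_length) bouts.
    bouts = [(label, sum(1 for _ in run)) for label, run in groupby(labels)]

    # Duration in seconds of each uninterrupted lying bout.
    lying_bouts = [n * seconds_per_frame for label, n in bouts if label == "lying"]
    standing_time = sum(
        n * seconds_per_frame for label, n in bouts if label == "standing"
    )
    lying_time = sum(lying_bouts)

    return {
        "standing_s": standing_time,
        "lying_s": lying_time,
        "lying_bouts": len(lying_bouts),
        "mean_lying_bout_s": lying_time / len(lying_bouts) if lying_bouts else 0.0,
    }


if __name__ == "__main__":
    # Three frames lying, two standing, five lying, sampled once per minute.
    frames = ["lying"] * 3 + ["standing"] * 2 + ["lying"] * 5
    print(summarise_budget(frames, seconds_per_frame=60.0))
```

In practice, detector noise would probably need smoothing before bouts are counted, since a single misclassified frame splits one bout into two.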
Andrew J. Shepley, Greg Falzon, Paul Kwan, and Ljiljana Brankovic
Institute of Electrical and Electronics Engineers (IEEE)
Confluence is a novel non-Intersection over Union (IoU) alternative to Non-Maxima Suppression (NMS) in bounding box post-processing in object detection. It overcomes the inherent limitations of IoU-based NMS variants to provide a more stable, consistent predictor of bounding box clustering by using a proximity metric inspired by normalized Manhattan Distance. Unlike Greedy and Soft NMS, it does not rely solely on classification confidence scores to select optimal bounding boxes, instead selecting the box which is closest to every other box within a given cluster and removing highly confluent neighboring boxes. Confluence is experimentally validated on the MS COCO and CrowdHuman benchmarks, improving Average Precision by 0.2–2.7% and 1–3.8% respectively and Average Recall by 1.3–9.3% and 2.4–7.3% when compared against Greedy and Soft-NMS variants. Quantitative results are supported by extensive qualitative analysis, and threshold sensitivity experiments support the conclusion that Confluence is more robust than NMS variants. Confluence represents a paradigm shift in bounding box processing, with potential to replace IoU in bounding box regression processes.
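As a rough illustration of a proximity measure of this kind, the sketch below min-max normalizes the coordinates of two boxes and sums the Manhattan distances between matching corners. It is an assumption-laden simplification rather than the published algorithm: the function names are hypothetical, and confidence scores, which the abstract says also play a role, are omitted.

```python
def normalised_manhattan_proximity(box_a, box_b):
    """Proximity of two (x1, y1, x2, y2) boxes; smaller means more confluent."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    xs = (ax1, ax2, bx1, bx2)
    ys = (ay1, ay2, by1, by2)
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 so degenerate (zero-extent) pairs do not divide by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    def nx(v):
        return (v - x_min) / x_span

    def ny(v):
        return (v - y_min) / y_span

    # Sum of Manhattan distances between matching corners after scaling.
    return (
        abs(nx(ax1) - nx(bx1)) + abs(ny(ay1) - ny(by1))
        + abs(nx(ax2) - nx(bx2)) + abs(ny(ay2) - ny(by2))
    )


def most_confluent(boxes):
    """Index of the box with the smallest total proximity to all other boxes."""
    totals = [
        sum(normalised_manhattan_proximity(b, other)
            for j, other in enumerate(boxes) if j != i)
        for i, b in enumerate(boxes)
    ]
    return min(range(len(boxes)), key=totals.__getitem__)


if __name__ == "__main__":
    cluster = [(0, 0, 10, 10), (1, 1, 11, 11), (5, 5, 15, 15)]
    print(most_confluent(cluster))
```

Because both boxes are rescaled to a shared unit range, the measure does not depend on the absolute size of the objects, which is one plausible reading of why a normalized distance is used.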
Paul Kwan, Tayab D. Memon, Saad S. Hashmi, Flemming Rhode, and Rajan Kadel
MDPI AG
Results of recent studies have suggested that intensive methods of delivery might improve engagement, attendance, and achievement for students from diverse backgrounds. Contributing to this area of inquiry, this study assesses how students perceived their experience studying a certificate course that was delivered using an online intensive block mode and flipped classroom (BMFC) pedagogy amidst COVID-19 restrictions. The subjects were students enrolled at Melbourne Institute of Technology between July 2021 and January 2022 across four certificate courses, three at postgraduate and one at undergraduate level. These certificate courses differed from normal degree courses in several aspects: (a) a shorter 4-week (undergraduate) or 5-week (postgraduate) duration instead of 12 weeks, (b) subjects were taken sequentially instead of concurrently as in a normal semester, (c) taught using an online flipped classroom rather than the in-class approach, and (d) open to both high-school leavers and mature aged students who did not study full-time. A questionnaire involving 10 perception-based questions was used to survey students’ satisfaction with the BMFC delivery, in relation to their learning and engagement experience. The mean, median, and mode calculated from the responses revealed that students were more satisfied than not with the BMFC approach on a 5-star rating scale in 7 out of the 10 questions. This is further supported by high correlations among the questions (the lowest at r = 0.48 and the highest at r = 0.87). Multiple regression analysis using the first nine questions as predictors of the 10th question (overall satisfaction) revealed that six of these are statistically significant predictors (p < 0.05) of the overall satisfaction, implying that an increase in the overall satisfaction can potentially be achieved by improving these key factors of the BMFC delivered certificate courses.
Our findings agree with existing research suggesting that student learning and engagement might be improved by intensive modes of delivery. Furthermore, the BMFC pedagogy proposed in our study differs from existing research, where block scheduling was used only in face-to-face delivery in a pre-COVID-19 environment. Our study, therefore, contributes a novel delivery method for learning and teaching that is suitable for both online and face-to-face modes in a post-COVID-19 era.
Yulin Shen, Benoît Mercatoris, Zhen Cao, Paul Kwan, Leifeng Guo, Hongxun Yao, and Qian Cheng
MDPI AG
Yield prediction is of great significance in agricultural production. Remote sensing technology based on unmanned aerial vehicles (UAVs) offers the capacity of non-intrusive crop yield prediction with low cost and high throughput. In this study, a winter wheat field experiment with three levels of irrigation (T1 = 240 mm, T2 = 190 mm, T3 = 145 mm) was conducted in Henan province. Multispectral vegetation indices (VIs) and canopy water stress indices (CWSI) were obtained using a UAV equipped with multispectral and thermal infrared cameras. A framework combining a long short-term memory neural network and random forest (LSTM-RF) was proposed for predicting wheat yield using VIs and CWSI from multiple growth stages as predictors. Validation results showed that an R² of 0.61 and an RMSE of 878.98 kg/ha were achieved in predicting grain yield using LSTM. The LSTM-RF model obtained better prediction results than the LSTM, with an R² of 0.78 and an RMSE of 684.1 kg/ha, which is equivalent to a 22% reduction in RMSE. The results showed that LSTM-RF considered both the time-series characteristics of the winter wheat growth process and the non-linear characteristics between remote sensing data and crop yield data, providing an alternative for accurate yield prediction in modern agricultural management.
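The 22% figure follows directly from the two RMSE values reported in the abstract. The short check below reproduces the arithmetic; `relative_reduction` is a hypothetical helper name.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


rmse_lstm = 878.98     # kg/ha, LSTM alone (from the abstract)
rmse_lstm_rf = 684.1   # kg/ha, LSTM-RF (from the abstract)

# (878.98 - 684.1) / 878.98 is roughly 0.222
print(f"{relative_reduction(rmse_lstm, rmse_lstm_rf):.1%}")  # prints 22.2%
```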
Abdulaziz Salamah Aljaloud, Diaa Mohammed Uliyan, Adel Alkhalil, Magdy Abd Elrhman, Azizah Fhad Mohammed Alogali, Yaser Mohammed Altameemi, Mohammed Altamimi, and Paul Kwan
Institute of Electrical and Electronics Engineers (IEEE)
Learning Management Systems (LMSs) are increasingly utilized for the administration, tracking, and reporting of educational activities. One such widely used LMS in higher education institutions around the world is Blackboard. This is due to its capabilities of aligning items of learning content, student-student and student-teacher interactions, and assessment tasks to specified goals and student learning outcomes. This study aimed to determine how certain Key Performance Indicators (KPIs) based on student interactions with Blackboard helped to forecast the learning outcomes of students. A mixed-methods study design was used which included analysis of four deep learning models for predicting student performance. Data were collected from reports on seven general preparation courses. They were analyzed using a documentary analysis approach to establish possible predictive KPIs associated with the electronic Blackboard report. Correlational analyses were performed to examine the extent to which these factors are linearly correlated with the performance indicators of students. Results indicated that a predictive model which combined convolutional neural networks and long short-term memory (CNN-LSTM) was the optimal method among the four models tested. The main conclusion drawn from this finding is that the combined CNN-LSTM approach may lead to interventions that optimize and expand use of the Blackboard LMS in universities.
Tayab D. Memon, Monica Jurin, Paul Kwan, Tony Jan, Nandini Sidnal, and Nazmus Nafi
MDPI AG
This article describes an empirical study to evaluate how the flipped learning (FL) approach has impacted a learner’s perception in attaining the graduate attributes (GAs) of five capstone project units offered at Melbourne Institute of Technology in Australia, where the authors are affiliated. The subjects include one undergraduate and one postgraduate business unit, and one undergraduate and two postgraduate units in networking. Our study is distinguished from previous research in two novel aspects. First, the subject matter concerns capstone project units which are taken by students in the final year of their degree. In these units, students are expected to apply a variety of knowledge and skills that they have acquired thus far in carrying out an industry-based project of substantial complexity. The learning outcomes (LOs) require students to apply skills and knowledge that they have learned across completed units and connect them with real-world problems. Second, the FL approach has been applied wholly in an online virtual classroom setting due to the social distancing restrictions enforced by local authorities in response to the COVID-19 pandemic. Our hypothesis is that FL has positively influenced the perception of learners in their attaining the GAs. We tested this hypothesis by using data collected by an online survey administered to the student cohorts of the five chosen units at the end of Trimester 1 of 2021. The survey, which comprised 14 questions, assesses a student’s perception of achieving the LOs through developments in three dimensions, including cognitive, affective, and behavioural, acquired in a real-world client setting. Statistical analyses of the survey data reveal that the FL approach resulted in a positive perception by students of their attaining the GAs through achieving the LOs of the capstone project units, which in turn is supported by the responses to the three measured dimensions.
Ali Shojaeipour, Greg Falzon, Paul Kwan, Nooshin Hadavi, Frances C. Cowley, and David Paul
MDPI AG
Livestock welfare and management could be greatly enhanced by the replacement of branding or ear tagging with less invasive visual biometric identification methods. Biometric identification of cattle from muzzle patterns has previously indicated promising results. Significant barriers exist in the translation of these initial findings into a practical precision livestock monitoring system, which can be deployed at scale for large herds. The objective of this study was to investigate and address key limitations to the autonomous biometric identification of cattle. The contributions of this work are fourfold: (1) provision of a large publicly-available dataset of cattle face images (300 individual cattle) to facilitate further research in this field, (2) development of a two-stage YOLOv3-ResNet50 algorithm that first detects and extracts the cattle muzzle region in images and then applies deep transfer learning for biometric identification, (3) evaluation of model performance across a range of cattle breeds, and (4) utilizing few-shot learning (five images per individual) to greatly reduce both the data collection requirements and duration of model training. Results indicated excellent model performance. Muzzle detection accuracy was 99.13% (1024 × 1024 image resolution) and biometric identification achieved 99.11% testing accuracy. Overall, the two-stage YOLOv3-ResNet50 algorithm proposed has substantial potential to form the foundation of a highly accurate automated cattle biometric identification system, which is applicable in livestock farming systems. The obtained results indicate that utilizing livestock biometric monitoring in an advanced manner for resource management at multiple scales of production is possible for future agriculture decision support systems, including providing useful information to forecast acceptable stocking rates of pastures.
Andrew Shepley, Greg Falzon, Paul Meek, and Paul Kwan
Wiley
A time‐consuming challenge faced by camera trap practitioners is the extraction of meaningful data from images to inform ecological management. An increasingly popular solution is automated image classification software. However, most solutions are not sufficiently robust to be deployed on a large scale due to lack of location invariance when transferring models between sites. This prevents optimal use of ecological data resulting in significant expenditure of time and resources to annotate and retrain deep learning models. We present a method ecologists can use to develop optimized location invariant camera trap object detectors by (a) evaluating publicly available image datasets characterized by high intradataset variability in training deep learning models for camera trap object detection and (b) using small subsets of camera trap images to optimize models for high accuracy domain‐specific applications. We collected and annotated three datasets of images of striped hyena, rhinoceros, and pigs, from the image‐sharing websites Flickr and iNaturalist (FiN), to train three object detection models. We compared the performance of these models to that of three models trained on the Wildlife Conservation Society and Camera CATalogue datasets, when tested on out‐of‐sample Snapshot Serengeti datasets. We then increased FiN model robustness by infusing small subsets of camera trap images into training. In all experiments, the mean Average Precision (mAP) of the FiN trained models was significantly higher (82.33%–88.59%) than that achieved by the models trained only on camera trap datasets (38.5%–66.74%). Infusion further improved mAP by 1.78%–32.08%. Ecologists can use FiN images for training deep learning object detection solutions for camera trap image processing to develop location invariant, robust, out‐of‐the‐box software. Models can be further optimized by infusion of 5%–10% camera trap images into training data.
This would allow AI technologies to be deployed on a large scale in ecological applications. Datasets and code related to this study are open source and available on this repository: https://doi.org/10.5061/dryad.1c59zw3tx.
Andrew Shepley, Greg Falzon, Christopher Lawson, Paul Meek, and Paul Kwan
MDPI AG
Image data is one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is time and resource expensive, particularly in the context of camera trapping. Deep learning models have been used to achieve this task but are often not suited to specific applications due to their inability to generalise to new environments and inconsistent performance. Models need to be developed for specific species cohorts and environments, but the technical skills required to achieve this are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratize access to deep learning technologies by providing an easy-to-use software application allowing non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images without specific technical expertise. Auto-annotation and annotation editing functionalities minimize the constraints of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single class training and object detection, allowing ecologists to access deep learning technologies usually only available to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report, and other useful statistics, (ii) custom train deep learning models using publicly available and custom training data, (iii) achieve supervised auto-annotation of images for further training, with the benefit of editing annotations to ensure quality datasets. 
Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and use of transfer learning means domain-specific models can be trained rapidly, and frequently updated without the need for computer science expertise, or data sharing, protecting intellectual property and privacy.
Beibei Xu, Wensheng Wang, Greg Falzon, Paul Kwan, Leifeng Guo, Zhiguo Sun, and Chunlei Li
Informa UK Limited
Quadcopters equipped with machine learning vision systems are bound to become an essential tool for precision agriculture applications in pastures in the near future. This paper presents a low-cost approach for livestock counting jointly with classification and semantic segmentation, which offers potential for real-time biometric and welfare monitoring of animals. The method adopts the state-of-the-art deep-learning technique known as Mask R-CNN for feature extraction and training on images captured by quadcopters. Key parameters such as the IoU (Intersection over Union) threshold, the quantity of training data, and the performance of the proposed system at various livestock densities have been evaluated to optimize the model. A real pasture surveillance dataset is used to evaluate the proposed method, and experimental results show that our proposed system can classify the livestock with an accuracy of 96% and estimate the number of cattle and sheep to within 92% of the visual ground truth, demonstrating that the approach is competitive and feasible for livestock monitoring.
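The IoU threshold mentioned in this abstract depends on the standard Intersection over Union measure between two boxes. The sketch below computes it for axis-aligned boxes given as (x1, y1, x2, y2); the function name and the 0.5 threshold in the usage lines are illustrative assumptions, not values taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    score = iou(predicted, ground_truth)
    # With an assumed threshold of 0.5 this pair would not count as a match.
    print(score, score >= 0.5)
```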
Beibei Xu, Wensheng Wang, Greg Falzon, Paul Kwan, Leifeng Guo, Guipeng Chen, Amy Tait, and Derek Schneider
Elsevier BV
Khalifa M Al Kindi, Paul Kwan, Nigel R Andrew, and Mitchell Welch
Wiley
BACKGROUND: Ommatissus lybicus de Bergevin (Hemiptera: Tropiduchidae) (Dubas Bug, DB) is an insect pest attacking date palms. It occurs in Arab countries including Oman. In this paper, logistic, ordinary least squares (OLS), and geographically weighted regressions (GWR) were applied to model the absence/presence and density of DB against climate factors. A method is proposed for modelling spatially correlated prorations annually over the study period, based on annual and seasonal outbreaks. The historical 2006–2015 climate data were obtained from weather stations located in nine governorates in northern Oman, while dataloggers collected the 2017 microclimate data in eight of these nine governorates. RESULTS: The logistic regression model showed that the percentages of correctly predicted values using a cut‐off point of 0.5 were 90%, 88% and 84%, indicating good classification accuracy. The OLS and GWR models showed an overall trend of strong linear correlation between DB infestation levels and short‐ and long‐term climate factors. The three models suggested that precipitation, elevation, temperature, humidity, wind direction and wind speed are important in influencing the spatial distribution and the presence/absence of dense DB populations. CONCLUSION: The results provide an improved understanding of climate factors that impact DB's spread and are considered useful for managing DB infestations in date palm plantations. © 2019 Society of Chemical Industry
James C. Bishop, Greg Falzon, Mark Trotter, Paul Kwan, and Paul D. Meek
Elsevier BV
Ramendra Prasad, Mumtaz Ali, Paul Kwan, and Huma Khan
Elsevier BV
Abdulaziz Aljaloud, William Billingsley, and Paul Kwan
Informa UK Limited
Smartphone clicker apps are increasingly used in university classrooms to facilitate teacher–student interaction and collaborative learning. This study aimed to identify the factors that influence teachers’ decisions to adopt smartphone clicker app technology to enhance teacher–student interactions in university classrooms in Saudi Arabia. A mixed-method study design was employed in this study. Thirty-three teachers from a Computer Science faculty completed a questionnaire and 14 of them participated in focus group interviews to provide their views. Two main findings emerged in this study: positive and significant relationships between teachers’ perceptions of the smartphone clicker app’s ease of use and its perceived usefulness; and a significant relationship between teachers’ perceptions of the usefulness of the smartphone clicker app and their attitude towards its use in the classroom. This study also identified that training on how to implement the smartphone clicker app effectively in lesson activities is a significant influence on teachers’ perceptions of the usefulness of, and their decision to use, the app. The main implication of these findings is that smartphone clicker app developers and user training coordinators must consider teachers’ perceptions of the suitability of the technology and their desire to design learning tasks to facilitate student participation and engagement.
Abdulaziz Salamah Aljaloud, Nicolas Gromik, Paul Kwan, and William Billingsley
Australasian Society for Computers in Learning in Tertiary Education
This study aimed to investigate how the use of a smartphone clicker app by a group of 390 Saudi Arabian male undergraduate students would impact their learning performance while participating in a computer science class. The smartphone clicker app was used by the students during peer group discussions and to respond to teacher questions. A conceptual framework identified teacher-student and student-student interactions, collaborative learning, and student engagement as three primary practices that could improve student performance when a smartphone clicker app was used. The relationships between these factors were tested empirically by participant completion of a self-administered online survey. This study found the use of a smartphone clicker app promoted increased teacher-student and student-student interactivity, leading to active collaborative learning by students and improved learning performance. No positive relationship was found between smartphone clicker app use and increased student engagement. These results demonstrated the role of the smartphone clicker app in enhancing the learning experience of the Saudi undergraduate students included in this study, but not the overall student engagement. Further research into how use of a smartphone clicker app in classroom settings might promote student engagement to improve the overall learning performance is needed.
Houssem Chatbri, Keisuke Kameyama, Paul Kwan, Suzanne Little, and Noel E. O’Connor
Springer Science and Business Media LLC
Leifeng Guo, Mitchell Welch, Robin Dobos, Paul Kwan, and Wensheng Wang
Elsevier BV
N. Agarwal, P. Kwan and David Paul
Addleton Academic Publishers
Merger & Acquisition pricing utilises traditional financial models like Discounted Cash Flow analysis and industry multiples. These methods do not consider behavioural finance biases, for example, prospect theory (Kahneman and Tversky 1979). This paper analyses merger & acquisition pricing using the behavioural biases of risk aversion (acquiring company behavioural trait) and optimism (target company trait). It then extends the study to include loss aversion from prospect theory, differences in the way humans view gains and losses based on low or high probability based on cumulative prospect theory, and finally the certainty effect (where humans prefer certain outcomes to probabilistic outcomes). All these factors have an impact on merger & acquisition pricing for potential deals as acquiring and target companies behave differently and such impacts are not considered by traditional finance models. Results show that as loss aversion reduces, the positive impact of risk taking and optimism behaviours improves. Also, probabilistic gains and losses can have a positive impact, but certainty has the greatest impact. Humans prefer certain outcomes, and acquirer and target company behaviours are more effective in such conditions, with increasing utility for both parties under such circumstances. However, in the multiple acquirer setting, competition between the acquirers significantly increases the utility, and the loss aversion co-efficient works in the opposite direction as the perceptive difference between gains and losses decreases.
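Loss aversion in prospect theory is commonly expressed through a value function that is concave for gains and steeper for losses. The sketch below uses the standard functional form with the parameter estimates reported by Tversky and Kahneman (1992); it is a generic illustration only, and how the paper's own model uses such a function is not stated in this abstract. The function name `prospect_value` is hypothetical.

```python
def prospect_value(x, alpha=0.88, beta=0.88, loss_aversion=2.25):
    """Prospect theory value of an outcome `x` relative to a reference point.

    Gains are valued as x**alpha; losses as -loss_aversion * (-x)**beta.
    Defaults are the Tversky and Kahneman (1992) estimates.
    """
    if x >= 0:
        return x ** alpha
    return -loss_aversion * (-x) ** beta


if __name__ == "__main__":
    gain = prospect_value(100.0)
    loss = prospect_value(-100.0)
    # A loss weighs more heavily than an equal-sized gain.
    print(gain, loss, abs(loss) / gain)
```

Setting `loss_aversion=1.0` makes gains and losses of equal size carry equal weight, which may help when reading statements about the loss aversion co-efficient changing.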
Khalifa M. Al‐Kindi, Ali K. Al‐Wahaibi, Paul Kwan, Nigel R. Andrew, Mitchell Welch, Mohammed Al‐Oufi, and Zakariya Al‐Hinai
Wiley
The Dubas bug (Ommatissus lybicus de Bergevin) is a pest species whose entire life cycle occurs on date palms, Phoenix dactylifera L., causing serious damage and reducing date palm growth and yield. Pseudoligosita babylonica Viggiani, Aprostocetus nr. beatus, and Bocchus hyalinus Olmi are very important parasitic natural enemies of Ommatissus lybicus in northern Oman. In this study, random farms were selected to (a) model the link between occurrences of Pseudoligosita babylonica, Aprostocetus nr. beatus, and Bocchus hyalinus (the dependent variables) and environmental factors, climatological factors, and Dubas bug infestation levels (the independent variables), and (b) produce distribution and predictive maps of these natural enemies in northern Oman. The multiple R² values showed the model explained 63%, 89%, and 94% of the presence of P. babylonica, A. nr. beatus, and B. hyalinus, respectively. However, the distribution of each species appears to be influenced by distinct and geographically associated climatological and environmental factors, as well as habitat characteristics. This study reveals that spatial analysis and modeling can be highly useful for studying the distribution and the presence or absence of Dubas bugs and their natural enemies. It is anticipated to help contribute to the reduction in the extent and costs of aerial and ground insecticidal spraying needed in date palm plantations.
Nipun Agarwal, Paul Kwan, and David Paul
Wiley
Mergers and acquisitions (M&A) are important to companies as they allow them to acquire capabilities that they cannot create internally and to grow quickly. M&A transaction pricing relates to the pricing of these M&A deals, and this article analyzes whether behavioral finance factors like risk aversion, optimism, and loss aversion have an impact on this pricing. Prospect Theory and Cumulative Prospect Theory are applied to an agent‐based model to solve this problem. Results of this article show that the M&A transaction price does respond to a change in the risk aversion and optimism traits of the acquirer and target companies respectively, as well as to loss aversion and certainty (probability of gains and losses). When the acquirer is risk taking and the target company is optimistic, the M&A transaction price increases. However, with increasing certainty of gains and reducing loss aversion (increasing loss aversion co‐efficient; as gains are perceived to have the same weight as losses), the M&A transaction price seems to reduce. These results are compared with the recent mergers of Verizon and AOL as well as Verizon and Yahoo, to understand if these results would occur in practice. Analyzing these mergers, it seems that the outcomes from this model do provide insight into the pricing of these M&A transactions. This article also analyzes how these behaviors would impact the pricing when three different acquirers are trying to take over a target company. Results show that loss aversion has a significant effect on this pricing, with risk aversion and optimism also having some minor impact. But the existence of multiple acquirers does increase the M&A transaction price.
Nipun Agarwal and Paul Kwan
Wiley
Mergers and acquisitions (M&A) transaction pricing is a negotiation between the acquirer (buyer) and the target firm (seller). The two firms have different estimates of the synergies that can be obtained from the merger and, as a result, value the target firm differently. This perception of synergies can be easily impacted by the behavior of the acquirer and the target. This article analyzes the pricing of M&A transactions based on differential synergy perceptions, while looking at risk‐averse–risk‐taking behavior of acquirers and optimistic–pessimistic behavior of the target firm. Results show that the acquirer’s risk‐taking behavior and perception of merger synergies determine the price offered for the M&A transaction. The target firm’s perception of synergies is less relevant (if at all) and their optimistic behavior is most useful when the acquirer perceives high synergies existing in the potential M&A transaction.
Houssem Chatbri, Kevin McGuinness, Suzanne Little, Jiang Zhou, Keisuke Kameyama, Paul Kwan, and Noel E. O'Connor
ACM
The amount of MOOC video materials has grown exponentially in recent years. Therefore, their storage and analysis need to be made as fully automated as possible in order to maintain their management quality. In this work, we present a method for automatic topic classification of MOOC videos using speech transcripts and convolutional neural networks (CNN). Our method works as follows: First, speech recognition is used to generate video transcripts. Then, the transcripts are converted into images using a statistical co-occurrence transformation that we designed. Finally, a CNN is used to produce video category labels for a transcript image input. For our data, we use the Khan Academy on a Stick dataset that contains 2,545 videos, where each video is labeled with one or two of 13 categories. Experiments show that our method is strongly competitive against other methods that are also based on transcript features and supervised learning.
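The abstract does not specify the co-occurrence transformation the authors designed, so the following is only a generic sketch of the idea of turning text into a fixed-size matrix that an image classifier could consume. It counts how often one character follows another within a short window; the function name, the character-level alphabet, and the window size are all assumptions made for illustration.

```python
def cooccurrence_image(text, window=2):
    """Map text to a 27x27 matrix of normalised character co-occurrence counts.

    Illustrative stand-in only: the published transformation is not described
    in the abstract and may differ substantially.
    """
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    index = {ch: i for i, ch in enumerate(alphabet)}
    # Keep only lowercase letters and spaces so every symbol has a row/column.
    chars = [c for c in text.lower() if c in index]

    size = len(alphabet)
    counts = [[0] * size for _ in range(size)]
    for i, c in enumerate(chars):
        # Count each character that follows `c` within the window.
        for j in range(i + 1, min(i + 1 + window, len(chars))):
            counts[index[c]][index[chars[j]]] += 1

    # Scale to [0, 1] so the matrix can be treated as a greyscale image.
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0.0] * size for _ in range(size)]
    return [[v / peak for v in row] for row in counts]


if __name__ == "__main__":
    image = cooccurrence_image("the quick brown fox jumps over the lazy dog")
    print(len(image), len(image[0]))
```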
Houssem Chatbri, Marlon Oliveira, Kevin McGuinness, Suzanne Little, Keisuke Kameyama, Paul Kwan, Alistair Sutherland, and Noel E. O’Connor
IEEE
In this work, we present a method for automatic topic classification of educational videos using a speech transcript transform. Our method works as follows: First, speech recognition is used to generate video transcripts. Then, the transcripts are converted into images using a statistical co-occurrence transformation that we designed. Finally, a classifier is used to produce video category labels for a transcript image input. For our classifiers, we report results using a convolutional neural network (CNN) and a principal component analysis (PCA) model. In order to evaluate our method, we used the Khan Academy on a Stick dataset that contains 2,545 videos, where each video is labeled with one or two of 13 categories. Experiments show that our method is effective and strongly competitive against other supervised learning-based methods.
Jacob Foley, Paul Kwan, and Mitchell Welch
Association for Computing Machinery (ACM)
Annotations provide a valuable perspective on the semantic information present in digital heritage collections, and in recent years they've been employed in a number of innovative, user-centric techniques that can personalise a user's experience of heritage materials, such as by actively adapting exhibits as a user reveals their interests, or by guiding users to explore collections which are meaningfully linked to what they have previously encountered. Despite the captivating opportunities offered by these techniques, collecting annotations for a large heritage collection is no trivial task. A significant amount of work is required to manually annotate large quantities of heritage materials, and automated, computational approaches leave much to be desired regarding the level of insight and semantic richness that they can currently provide. By analysing the emergent relationships between the initial annotations in a collection, we propose a metadata-driven algorithm for assisting and augmenting the annotation process. This algorithm, called SAGA (Semantically-Annotated Graph Analysis), allows for semi-automatic annotation, which balances the value of the contributions of human annotators with the time and effort-saving benefits of an automatic, suggestion-driven process. SAGA uses an entity relationship-driven approach to make annotation suggestions. It is used in the context of a web-based infrastructure called SAGE (Semantic Annotation by Group Exploration), a multiagent environment which assists groups of experts in creating comprehensive annotation sets for heritage collections. SAGA and SAGE are evaluated from the perspectives of suggestion accuracy, explicit user acceptance and implicit user acceptance, and demonstrate strong results in each evaluation.