@University of Sousse
Electronic engineering / ISSATSo
Electrical and Electronic Engineering, Hardware and Architecture, Computer Vision and Pattern Recognition, Signal Processing
Scopus Publications
Hana Ben Fredj, Amani Chabbah, Jamel Baili, Hassen Faiedh, and Chokri Souani
Elsevier BV
Hana Ben Fredj, Rim Ghozzi, and Chokri Souani
Springer Nature Switzerland
Rim Ghozzi, Samer Lahouar, and Chokri Souani
Springer Nature Switzerland
Wajdi Farhat, Olfa Ben Rhaiem, Hassene Faiedh, and Chokri Souani
Springer Science and Business Media LLC
Safa Bouguezzi, Hana Ben Fredj, Hassene Faiedh, and Chokri Souani
Springer Science and Business Media LLC
Rim Ghozzi, Samer Lahouar, and Chokri Souani
Pleiades Publishing Ltd
Rim Ghozzi, Samer Lahouar, and Chokri Souani
Institution of Engineering and Technology (IET)
Anis Ammar, Amani Chebbah, Hana Ben Fredj, and Chokri Souani
IEEE
Deep learning is continuously evolving and making significant advances in several applications, and it has had a remarkable influence on the field of image processing. Recently, deep learning has also made strong inroads into motion estimation. Optical flow estimation is a mature, ever-growing, and multidisciplinary field of research. However, it is not easy to obtain datasets suitable for training deep learning models for this task. While these models have made fundamental contributions, it remains unclear how to generate more data and how to generalize to live scene videos. In this paper, we carry out extensive analyses and categorize various deep learning-based optical flow estimation techniques. Lately, hybrid methods have been very successful; despite their high performance, and even though they have set the state of the art on certain datasets, most bibliographic studies have not taken these methods into account. For this reason, we add a comparative section on these hybrid algorithms to this study. While describing the datasets commonly used by the scientific community, we identify the differences and correspondences between deep methods and conventional methods. We hope that this extensive survey will be a fundamental resource for researchers in the field of image processing and help them better understand and use motion estimation methods.
Nabiha Ben Abid, Rim Ghozzi, Samer Lahouar, and Chokri Souani
Springer International Publishing
Safa Bouguezzi, Hana Ben Fredj, Tarek Belabed, Carlos Valderrama, Hassene Faiedh, and Chokri Souani
MDPI AG
Convolutional Neural Networks (CNNs) continue to dominate research in the area of hardware acceleration using Field Programmable Gate Arrays (FPGAs), proving their effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic sign recognition, among others. However, there are numerous constraints on deploying CNNs on FPGAs, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an advanced CNN model inspired by the baseline MobileNet model. The proposed model uses an Ad-depth engine, which is an improved version of the depth-wise separable convolution unit. Moreover, we propose an FPGA-based implementation model that supports the Mish, TanhExp, and ReLU activation functions. The experimental results on the CIFAR-10 dataset show that our Ad-MobileNet achieves a classification accuracy of 88.76% while requiring few computational hardware resources. Compared to state-of-the-art methods, our proposed method has a fairly high recognition rate while using fewer computational hardware resources. Indeed, the proposed model reduces hardware resources by more than 41% compared to the baseline model.
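The Ad-depth engine itself is not detailed in this abstract, but the standard depth-wise separable convolution unit it builds on can be sketched in plain NumPy (a minimal CPU illustration, not the paper's FPGA implementation; the tensor layout, 'valid' padding, and stride 1 are assumptions):

```python
import numpy as np

def depthwise_separable_conv(x, depth_k, point_k):
    """x: (H, W, C) input; depth_k: (k, k, C) per-channel filters;
    point_k: (C, C_out) 1x1 channel-mixing weights."""
    H, W, C = x.shape
    k = depth_k.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    # Depthwise step: each channel is filtered with its own k x k kernel,
    # instead of a full (k, k, C) kernel per output channel.
    dw = np.zeros((Ho, Wo, C))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * depth_k[:, :, c])
    # Pointwise step: a 1x1 convolution mixes the channels.
    return dw @ point_k
```

The factorization is what saves resources: k*k*C + C*C_out weights instead of k*k*C*C_out for a standard convolution.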
Tarek Belabed, Vitor Ramos Gomes da Silva, Alexandre Quenon, Carlos Valderamma, and Chokri Souani
MDPI AG
Deploying Deep Neural Networks (DNNs) for IoT Edge applications requires strong skills in hardware and software. In this paper, a fully automated design framework for Edge applications is proposed to perform such deployments on System-on-Chips. Based on a high-level Python interface that mimics the leading Deep Learning software frameworks, it offers an easy way to implement a hardware-accelerated DNN on an FPGA. To do this, our design methodology covers three main phases: (a) customization, where the user specifies the optimizations needed on each DNN layer; (b) generation, where the framework generates on the Cloud the necessary binaries for both the FPGA and software parts; and (c) deployment, where the SoC on the Edge receives the resulting files used to program the FPGA, along with the related Python libraries for user applications. Among the case studies, an optimized DNN for the MNIST database runs more than 60× faster than a software version on the ZYNQ 7020 SoC while consuming less than 0.43 W. A comparison with state-of-the-art frameworks demonstrates that our methodology offers the best trade-off between throughput, power consumption, and system cost.
Anis Ammar, Hana Ben Fredj, and Chokri Souani
MDPI AG
Motion estimation has become one of the most important techniques used in real-time computer vision applications. There are several algorithms to estimate object motion. One of the most widespread techniques consists of calculating the apparent velocity field observed between two successive images of the same scene, known as the optical flow. However, highly accurate dense optical flow estimation is costly in run time. In this context, we designed an accurate motion estimation system based on calculating the optical flow of a moving object using the Lucas–Kanade algorithm. Our approach applies the processing to a local region and was implemented on a Raspberry Pi 4, with several improvements. The efficiency of our accurate real-time implementation was demonstrated by the experimental results, which show better performance than the conventional calculation.
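The core Lucas–Kanade step such a system relies on can be sketched as a small least-squares solve over a local window (a minimal NumPy illustration without pyramids, iterative refinement, or the paper's embedded optimizations; the window size and frame setup are assumptions):

```python
import numpy as np

def lucas_kanade(I1, I2, y, x, win=9):
    """Estimate the (vx, vy) displacement at pixel (y, x) between frames
    I1 and I2 by solving the Lucas-Kanade least-squares system over a
    win x win window."""
    r = win // 2
    # Spatial derivatives of the first frame, temporal derivative between frames.
    Iy, Ix = np.gradient(I1.astype(float))
    It = I2.astype(float) - I1.astype(float)
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    # Brightness constancy per pixel: Ix*vx + Iy*vy = -It, stacked over the window.
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (vx, vy)
```

A restriction to a local treatment region, as in the paper, amounts to calling this only on pixels inside that region rather than densely over the frame.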
Tarek Belabed, Alexandre Quenon, Vitor Ramos Gomes Da Silva, Carlos A. Valderrama Sakuyama, and Chokri Souani
IEEE
FPGAs are gaining popularity as the target of choice for the efficient implementation of Deep Neural Network (DNN) approaches. Modern SoCs with integrated FPGAs have low-power on-chip processors and sufficient interfaces to accommodate the most commonly deployed Internet of Things (IoT) devices. However, developing DNN hardware accelerators using integrated FPGAs remains a complicated task due to the complexity of reconfigurable computing and the limited hardware resources of embedded devices. In addition, it is necessary to master High-Level Synthesis (HLS) tools and the hidden philosophy driving their RTL design. This paper presents our Python framework to fully customize and automate the generation and deployment of FPGA-based DNN topologies for Edge Computing. Our framework environment, Jupyter Notebooks, allows users to customize their desired hardware DNN and its related applications on Xilinx's Pynq boards. Subsequently, the framework automatically generates TCL (Tool Command Language) scripts driving the HLS tools on the host server or cloud. Once the desired FPGA-based architecture is generated, the framework retrieves the bitstream to configure the FPGA. The user can then deploy this bitstream to accelerate any Python application that runs the same DNN model. The experimental results show that our framework achieves a 59.8× speed-up on a 784-32-32-10 topology while consuming less than 0.266 W.
Safa Bouguezzi, Hassene Faiedh, and Chokri Souani
IEEE
The Convolutional Neural Network (CNN) dominates the research area of Field Programmable Gate Arrays (FPGAs) and has demonstrated its efficiency in computer vision applications. The prediction accuracy of a CNN depends strongly on the choice of activation function. Thus, we deploy a CNN model on a Virtex-7 while varying the activation function among ReLU, PReLU, and Tanh Exponential (TanhExp). To this end, we use a fixed-point representation for the arithmetic and a piecewise linear approximation of the TanhExp activation function. We report the speed, accuracy, and hardware resource usage of each CNN variant.
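A piecewise linear approximation of TanhExp can be sketched as interpolation over a table of precomputed segment values (a NumPy illustration of the general idea, not the paper's fixed-point FPGA design; the [-4, 4] active region and the 32-segment count are illustrative choices trading accuracy against LUT size):

```python
import numpy as np

def tanhexp(x):
    # TanhExp activation: f(x) = x * tanh(exp(x)). The argument of exp is
    # clipped because tanh(exp(x)) saturates to 1 for large positive x anyway.
    return x * np.tanh(np.exp(np.minimum(x, 20.0)))

# Breakpoints and exact values over the curved region [-4, 4]; outside it,
# TanhExp is ~0 (for x << 0) or ~x (for x >> 0).
BREAKS = np.linspace(-4.0, 4.0, 33)  # 32 linear segments
VALUES = tanhexp(BREAKS)

def tanhexp_pwl(x):
    x = np.asarray(x, dtype=float)
    inner = np.interp(x, BREAKS, VALUES)
    return np.where(x < -4.0, 0.0, np.where(x > 4.0, x, inner))
```

In hardware, the same idea maps to a small lookup table of slopes and intercepts indexed by the upper bits of the fixed-point input.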
Hana Ben Fredj, Souhir Sghaier, and Chokri Souani
IEEE
Convolutional Neural Networks (CNNs) have shown great success in the field of face recognition. In this paper, we propose a robust face recognition method based on Principal Component Analysis (PCA) and a CNN. In our method, PCA is employed to reduce the size of the data. Afterwards, we use a CNN as a classifier for face recognition. We also reduce the number of layers used in the CNN architecture, utilizing the dropout regularization technique. Most importantly, we implement the classification step on the GPU. Several experiments were conducted on well-known publicly available databases. The experimental results verify the effectiveness of our approach, which maintains good recognition accuracy while achieving a substantial acceleration of face classification compared to the standard CNN implementation without data reduction. It also achieves lower memory consumption thanks to the smaller amount of data processed after the PCA step. Moreover, our model is intentionally designed so that both its running time and its memory requirements are reduced.
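The PCA reduction step ahead of the classifier can be sketched via an SVD of the centered data (a minimal NumPy illustration; the component count and data layout are assumptions, not the paper's configuration):

```python
import numpy as np

def pca_fit(X, n_components):
    """X: (n_samples, n_features) flattened face images.
    Returns (mean, components) for projecting onto the top
    n_components principal axes, computed via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    # Project onto the principal subspace; this reduced vector is what
    # would be fed to the classifier in place of the raw pixels.
    return (X - mean) @ components.T
```

The memory saving follows directly: each face shrinks from n_features values to n_components coefficients before classification.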
Safa Bouguezzi, Hassene Faiedh, and Chokri Souani
IEEE
The Convolutional Neural Network (CNN) is dominant in computer vision applications such as object detection, traffic sign recognition, image classification, and face recognition. The MobileNet model is a CNN architecture designed to be implemented on an embedded board. However, such architectures face a hardware deployment constraint: the limited memory of microcontroller units. This paper proposes an enhanced version of MobileNet that meets the conditions for deployment on an embedded board while improving accuracy. The proposed model is named Slim MobileNet because of its small size of 7.3 MB. Slim MobileNet has fewer layers, improved accuracy, a smaller overall model size, and a lower average inference time compared to the MobileNet-V1 model. We achieve a significant accuracy gain by replacing the ReLU activation function with the Tanh Exponential (TanhExp) activation function and by modifying the depthwise separable convolution unit. The small size of Slim MobileNet results from dropping some layers from the original baseline MobileNet architecture. The experiments are carried out on the CIFAR-10 database.
Safa Bouguezzi, Hassene Faiedh, and Chokri Souani
IEEE
The most active research area for Field Programmable Gate Arrays is the Convolutional Neural Network (CNN), and at the heart of any CNN is its activation function; deeper CNNs require various non-linear activation functions. In this paper, we implement the Tanh Exponential (TanhExp) activation function on Artix-7 and Zynq-7000 devices. To this end, we use a piecewise linear approximation and a second-order polynomial approximation, together with the IEEE 754-2008 floating-point representation. We present an investigation of the required hardware resources and evaluate the efficiency of each approximation method and of its derivative.
Anis Ammar, Hana Ben Fredj, and Chokri Souani
IEEE
Motion estimation is among the most important applications in the field of computer vision. Indeed, in the last decade, optical flow has become an alternative technique for estimating motion across successive frames. However, dense and precise optical flow estimates are usually costly in computing time. In this context, our approach focuses on calculating a moving object's optical flow in real time using the Lucas-Kanade algorithm, implemented on a Raspberry Pi 4. The efficiency of our real-time implementation is demonstrated by the experimental results, which show better performance than the conventional calculation.
Hana Ben Fredj, Safa Bouguezzi, and Chokri Souani
Springer Science and Business Media LLC
Wajdi Farhat, Olfa Ben Rhaiem, Hassene Faiedh, and Chokri Souani
IEEE
Self-driving vehicles can move autonomously, without a human pilot, by sensing the surrounding environment. A forward collision avoidance system helps improve road safety and prevent car accidents. However, such systems have drawbacks in terms of crash avoidance (i.e., lack of warning messages, complexity of driving situations, and weather conditions). Recently, deep learning algorithms have become well suited to overcoming this issue, offering better accuracy and the ability to adapt to different environments. In this paper, we propose a Cooperative Forward Collision Avoidance (CFCA) system based on a deep learning method. In particular, the system alerts the driver and broadcasts multi-hop warning messages using vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communication based on ITS-G5. The experimental results show that the proposed system performs better than existing systems and can efficiently help drivers avoid collisions. We considered two databases, KITTI and a private database: our model achieved 94.04% accuracy with approximately a 5% loss rate on KITTI, and approximately 92.42% accuracy on the private database.
Souhir Sghaier, Sabrine Hamdi, Anis Ammar, and Chokri Souani
IEEE
3D image acquisition technology has become feasible and economical. The primary advantage of 3D data, which closely represents real-world scenes, is that it retains the geometric information of the object. Hence, a good 3D scan of a person's face is needed to enhance the performance of a 3D face recognition system. This study proposes a methodology for building a novel 3D face database. The work is based on the Microsoft Kinect, whose sensor is affected by noise and outliers beyond a certain distance. Thus, preprocessing and segmentation phases are necessary to detect and extract the region of interest (the person's face). Then, 3D surface reconstruction takes place. All the simulations were performed on faces of people acquired easily in our laboratory in real time. As a result, an acceptable 3D face database with different poses and variations of facial expression was obtained.
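The depth-based preprocessing and segmentation stage can be sketched as simple distance-band thresholding on a Kinect-style depth map (an illustrative NumPy sketch of one common approach; the millimeter thresholds are hypothetical, not the paper's calibrated values):

```python
import numpy as np

def segment_face_region(depth, near=400.0, far=900.0):
    """Crude region-of-interest extraction from a depth map: keep pixels
    within a plausible subject-distance band (depth in mm; thresholds are
    illustrative) and return the mask and the bounding box of its extent.
    This also discards far-range pixels, where the Kinect sensor is
    noisiest."""
    mask = (depth > near) & (depth < far)
    if not mask.any():
        return mask, None
    ys, xs = np.nonzero(mask)
    bbox = (ys.min(), xs.min(), ys.max(), xs.max())
    return mask, bbox
```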
Mouna Letaief, Anis Ammar, Hana Ben Fredj, and Chokri Souani
IEEE
This paper presents HEVC (High-Efficiency Video Coding) video compression, or transcoding, using a hybrid algorithm. The hybrid compression technique combines the DWT (Discrete Wavelet Transform) with the DCT (Discrete Cosine Transform). In previous work, video compression was tested using the DCT and the DWT separately; here, testing is performed using the hybrid DWT-DCT, so that the input coding is not overlapped by other 2-D blocks. In the proposed work, the PSNR (Peak Signal-to-Noise Ratio) is improved compared to the existing work. We also show hardware implementation results for a one-frame/image real-time HEVC decoder using the hybrid algorithm on a field-programmable gate array (FPGA). The hardware implementation is appealing because the FPGA latency for executing this hybrid algorithm is lower than the CPU (Central Processing Unit) latency. A pipelined hybrid decoder architecture is also used to absorb variations in processing time; this architecture achieves a target operating frequency of 150 MHz.
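Two building blocks of such a scheme can be sketched in NumPy: a one-level 2-D Haar DWT (after which a DCT would be applied to the subbands in the hybrid approach) and the PSNR metric used for evaluation (an illustrative sketch of the general DWT-DCT idea, not the paper's exact codec):

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: splits the image into LL (approximation),
    LH, HL, HH (detail) subbands. The scaling by 1/2 makes the transform
    orthonormal, so signal energy is preserved."""
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    LL = (a + b + c + d) / 2.0
    LH = (a - b + c - d) / 2.0
    HL = (a + b - c - d) / 2.0
    HH = (a - b - c + d) / 2.0
    return LL, LH, HL, HH

def psnr(orig, recon, peak=255.0):
    # Peak Signal-to-Noise Ratio in dB between an original and its reconstruction.
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```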
Anis Ammar, Hana Ben Fredj, Souhir Sghaier, and Chokri Souani
IEEE
Owing to the massive quantity of information and the extensive use of filters, image processing is increasingly costly at run time. It remains a very complex operation at the programming level, especially in real-time applications. However, several methods have been developed to improve system performance. In this work, we investigate the performance of an object tracking algorithm implemented on a Raspberry Pi 3. We use a hybrid method that combines a color-based and an edge-based method. We then show that this embedded system delivers high performance with optimal energy conservation. Simulation results are presented in support of the theoretical analysis.
Tarek Belabed, Maria Gracielly F. Coutinho, Marcelo A. C. Fernandes, Carlos Valderrama Sakuyama, and Chokri Souani
Institute of Electrical and Electronics Engineers (IEEE)
Deep Learning techniques have been successfully applied to solve many Artificial Intelligence (AI) application problems. However, owing to topologies with many hidden layers, Deep Neural Networks (DNNs) have high computational complexity, which makes their deployment difficult in contexts highly constrained by requirements such as performance, real-time processing, or energy efficiency. Numerous hardware/software optimization techniques using GPUs, ASICs, and reconfigurable computing (i.e., FPGAs) have been proposed in the literature. With FPGAs, very specialized architectures have been developed to provide an optimal balance between high speed and low power. However, when targeting edge computing, user requirements and hardware constraints must be efficiently met. Therefore, in this work, we focus on reconfigurable embedded systems based on the Xilinx ZYNQ SoC and popular DNNs that can be implemented on embedded Edge devices, improving performance per watt while maintaining accuracy. In this context, we propose an automated framework for the implementation of hardware-accelerated DNN architectures. This framework provides an end-to-end solution that facilitates the efficient deployment of topologies on FPGAs by combining custom hardware scalability with optimization strategies. Cutting-edge comparisons and experimental results demonstrate that the architectures developed by our framework offer the best compromise between performance, energy consumption, and system cost. For instance, the low-power (0.266 W) DNN topologies generated for the MNIST database achieved a high throughput of 3,626 FPS.
Hana Ben Fredj, Souhir Sghaier, and Chokri Souani
IEEE
Face detection is a highly efficient component in diverse domains such as security surveillance. In particular, the Viola-Jones algorithm has achieved significant performance in the field of face detection. In recent years, graphics processors have quickly become the mainstay for face detection applications and for accelerating data-parallel computing. This is due to their flexibility and, in particular, to the single-instruction, multiple-data execution model exploited for streaming processors by a Graphics Processing Unit (GPU). Therefore, in this paper, we develop a robust face detection implementation based on the GPU. The implementation is optimized by a strategy that exploits the different GPU memory resources and the warp scheduler to accelerate memory access, making better use of the resources provided by the GPU. The results show that the proposed method is effective and consumes less execution time than the standard sequential implementation.
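The speed of the Viola-Jones detector rests on the integral image, whose constant-time box sums are what make Haar-feature evaluation fast and map naturally onto data-parallel GPU threads. A minimal CPU-side NumPy sketch of this primitive (not the paper's GPU kernel) is:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] is the sum of all pixels above and to
    the left of (y, x), inclusive."""
    return img.astype(float).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] in O(1) using four table lookups,
    regardless of the rectangle's size."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s
```

Each Haar feature is then a signed combination of a handful of such box sums, which is why thousands of features can be evaluated per detection window in real time.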