Information Systems, Computer Science, Signal Processing
45
Scopus Publications
Learning-Based Image Compression With Parameter-Adaptive Rate-Constrained Loss. Nilson D. Guerin, Renam Castro da Silva, Bruno Macchiavello. IEEE Signal Processing Letters, 2024.
In recent years, the crucial task of image compression has been addressed by end-to-end neural network methods. However, achieving fine-grained rate control in this new paradigm has proven challenging. In our previous work, we explored mismatches in rate estimation during target-rate-oriented training and proposed heuristics involving costly parameter searches as a solution. This work proposes a lightweight approach that dynamically adapts the loss parameters to mitigate rate-estimation issues, ensuring that the target rate is attained precisely. Inspired by reinforcement learning, our method achieves PSNR performance comparable to that of preceding approaches on the Kodak dataset while reducing the computational cost of training.
Rate-constrained learning-based image compression. Nilson D. Guerin, Renam Castro da Silva, Matheus C. de Oliveira, Henrique C. Jung, Luiz Gustavo R. Martins, et al. Signal Processing: Image Communication, 2022.
Trust and reputation multiagent-driven model for distributed transcoding on fog-edge. CEUR Workshop Proceedings, 2021.
Learning-based End-to-End Video Compression Using Predictive Coding. Matheus C. de Oliveira, Luiz G. R. Martins, Henrique Costa Jung, Nilson Donizete Guerin, Renam Castro da Silva, et al. Proceedings, 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI 2021), 2021.
Driven by the growing demand for video applications, deep learning techniques have become alternatives for implementing end-to-end encoders that achieve practical compression rates. Conventional video codecs exploit both spatial and temporal correlation. However, due to some restrictions (e.g., computational complexity), they are commonly limited to linear transformations and translational motion estimation. Autoencoder models open the way to predictive end-to-end video codecs without such limitations. This paper presents a fully learning-based video codec that exploits spatial and temporal correlations, extending the idea of P-frame prediction presented in our previous work. The I-frame coding architecture is a variational autoencoder with non-parametric entropy modeling. Besides an entropy model parameterized by a hyperprior, the inter-frame encoder architecture has two other independent networks, responsible for motion estimation and residue prediction. Experimental results indicate that further improvements still have to be incorporated into our codec for it to outperform the all-intra configurations of the traditional High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards.
Multi-Mode Intra Prediction for Learning-Based Image Compression. Henrique Costa Jung, Nilson Donizete Guerin, Raphael Soares Ramos, Bruno Macchiavello, Eduardo Peixoto, et al. Proceedings, International Conference on Image Processing (ICIP), 2020.
In recent years, image compression techniques based on deep learning have achieved great success, and their performance is gradually approaching that of methods crafted by experts, such as JPEG, WebP, and Better Portable Graphics (BPG). Intra prediction, a fundamental technique in modern image and video codecs, takes advantage of local redundancy to predict pixels from previously encoded neighbors. In this paper, we use Convolutional Neural Networks (CNNs) to develop a new intra-picture prediction mode. More specifically, we propose a multi-mode intra prediction approach that uses two CNN-based prediction modes together with all intra modes previously implemented in the High Efficiency Video Coding (HEVC) standard. We also propose a bit allocation technique that increases the bitstream size only if the reconstruction error is significantly reduced. Experimental results show a significant and consistent performance gain over other approaches that use a similar backbone architecture, with a 28% bitrate reduction compared to the baseline codec.
Joint motion and residual information latent representation for P-frame coding. Renam Castro da Silva, Nilson Donizete Guerin, Pedro Sanches, Henrique Costa Jung, Eduardo Peixoto, et al. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2020.
This paper proposes an inter-frame prediction encoder for the P-frame video compression challenge of the Workshop and Challenge on Learned Image Compression (CLIC). In this challenge, an uncompressed reference (previous) frame is used to compress the current frame, so this is not a complete solution for learning-based video compression. The main goal is to represent a set of frames with an average of 0.075 bpp (bits per pixel), which is a very low bitrate. A restriction on the model size is also imposed to avoid overfitting. Here, we propose an autoencoder architecture that jointly represents the motion and residue information in the latent space. Three trained models are used to achieve the target bpp, and a bit allocation algorithm is proposed to optimize the quality of the encoded dataset.
A sub-aperture image selection refinement method for progressive light field transmission. Wallace Bruno S. de Souza, Bruno Macchiavello, Eduardo Peixoto, Edson M. Hung, Gene Cheung. 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP 2018), 2018.
Light field cameras capture the light emanating from a scene. Processing this type of image allows the point of view or the focal point to be changed. Recently, Progressive Light Field Communication (PLFC) was proposed. PLFC is an interactive Light Field (LF) streaming framework in which a client requests a certain view or focal point, and a server synthesizes and transmits each requested image as a linear combination of Sub-Aperture Images (SAIs). The main idea of PLFC is that, as virtual views are transmitted, the client gradually learns information about the LF, so that eventually it may possess enough information to create the requested virtual view locally at the required quality, avoiding the transmission of a new image. For PLFC to work, an optimization algorithm is required to select the SAIs used to create each virtual view. Here, we improve on the previous PLFC proposal with a refinement algorithm for SAI selection, a dynamic Quantization Parameter (QP) during encoding, an automatic method for determining the Lagrangian multiplier during optimization, and a modified procedure for creating the initial required cache. These changes produce significant gains: up to 85.8% in BD-rate compared to trivial LF transmission, and up to 32.8% compared to the previous PLFC.
Progressive sub-aperture image recovery for interactive light field data streaming. Eduardo Peixoto, Bruno Macchiavello, Edson Mintsu Hung, Gene Cheung. Proceedings, International Conference on Image Processing (ICIP), 2018.
Due to the large size of a light field image, compressing and transmitting the entire data to a client before rendering any image for observation would incur a significant startup delay. In response, in interactive light field streaming (ILFS), a server synthesizes and transmits a new viewpoint image as a combination of sub-aperture images (SAIs) per user request. However, in doing so, the client relies entirely on the server for the reconstruction of every requested image. In this paper, we extend a previous proposal of a progressive light field data transmission strategy, in which the client can incrementally learn SAIs over time. Specifically, requested focal-point images are synthesized using carefully chosen weighted linear combinations of SAIs, so that recovering the SAIs amounts to inverting a lower-triangular weight matrix, a matrix structure that enables SAI recovery without amplifying the quantization noise introduced by lossy image coding. We design an objective function that encourages specific combinations of SAIs to increase the rank of the lower-triangular weight matrix for fast SAI recovery. Compared to our previous work, this new proposal reduces the size of the user's initial cache and the total number of transmitted images. Experimental results show that our scheme can outperform ILFS by up to 70% in terms of BD-rate.
S-EMG Signal Compression in One-Dimensional and Two-Dimensional Approaches. Marcel H. Trabuco, Marcus V. C. Costa, Bruno Macchiavello, Francisco Assis de O. Nascimento. IEEE Journal of Biomedical and Health Informatics, 2018.
This paper presents algorithms designed for one-dimensional (1-D) and two-dimensional (2-D) surface electromyographic (S-EMG) signal compression. The 1-D approach is a wavelet-transform-based encoder applied to isometric and dynamic S-EMG signals. An adaptive estimate of the spectral shape is used to perform dynamic bit allocation for vector quantization of the transformed coefficients. Then, entropy coding is applied to minimize redundancy in the quantized coefficient vector and to pack the data. In the 2-D approach, the isometric or dynamic S-EMG signal is segmented and arranged to build a 2-D representation, which is encoded with the High Efficiency Video Coding (HEVC) codec using 16-bit-depth precision, all possible coding/prediction unit sizes, and all intra-coding modes. The encoders are evaluated with objective metrics on a real signal data bank, and performance comparisons show that the proposed methods outperform other efficient encoders reported in the literature.
Predicting vehicle trajectories from surveillance video in a real scenario with Histogram of Oriented Gradient. Computer Science Research Notes, 2017.
Human action recognition in videos: A comparative evaluation of the classical and velocity adaptation space-time interest points techniques. Computer Science Research Notes, 2017.
Handwritten text verification on mobile devices. VISAPP 2015, 10th International Conference on Computer Vision Theory and Applications; VISIGRAPP, Proceedings, 2015.
CQR codes: Colored quick-response codes. Max E. Vizcarra Melgar, Alexandre Zaghetto, Bruno Macchiavello, Anderson C. A. Nascimento. IEEE International Conference on Consumer Electronics, Berlin (ICCE-Berlin), 2012.
HEVC-based scanned document compression. Alexandre Zaghetto, Bruno Macchiavello, Ricardo L. de Queiroz. Proceedings, International Conference on Image Processing (ICIP), 2012.
Compression of touchless multiview fingerprints. Nelson C. Francisco, Alexandre Zaghetto, Bruno Macchiavello, Eduardo A. B. da Silva, Mamede Lima-Marques, et al. BIOMS 2011, 2011 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications, Proceedings, 2011.
Semi-automatic detection of the left ventricular border. Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS'08: Personalized Healthcare through Technology, 2008.
A statistical model for a mixed resolution Wyner-Ziv Framework. PCS 2007, 26th Picture Coding Symposium, 2007.
A simple reversed-complexity Wyner-Ziv video coding mode based on a spatial reduction framework. Proceedings of SPIE, The International Society for Optical Engineering, 2007.