Ruben Pascual Casas

Scopus Publications

Few-shot multi-token DreamBooth with LoRa for style-consistent character generation
Ruben Pascual, Mikel Sesma-Sara, Aranzazu Jurio, Daniel Paternain, Mikel Galar
Expert Systems with Applications, 2026
The audiovisual industry is undergoing a profound transformation as it is integrating AI developments not only to automate routine tasks but also to inspire new forms of art. This paper addresses the problem of producing a virtually unlimited number of novel characters that preserve the artistic style and shared visual traits of a small set of human-designed reference characters, thus broadening creative possibilities in animation, gaming, and related domains. Our solution builds upon DreamBooth, a well-established fine-tuning technique for text-to-image diffusion models, and adapts it to tackle two core challenges: capturing intricate character details beyond textual prompts and the few-shot nature of the training data. To achieve this, we propose a multi-token strategy, using clustering to assign separate tokens to individual characters and their collective style, combined with LoRA-based parameter-efficient fine-tuning. By removing the class-specific regularization set and introducing random tokens and embeddings during generation, our approach allows for unlimited character creation while preserving the learned style. We evaluate our method on five small specialized datasets, comparing it to relevant baselines using both quantitative metrics and a human evaluation study. Our results demonstrate that our approach produces high-quality, diverse characters while preserving the distinctive aesthetic features of the reference characters, with human evaluation further reinforcing its effectiveness and highlighting the potential of our method.
Speeding-up diffusion models for remote sensing semantic segmentation
Ruben Pascual, Christian Ayala, Ruben Sesma, Aranzazu Jurio, Daniel Paternain, Mikel Galar
International Journal of Applied Earth Observation and Geoinformation, 2025
Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated exceptional potential across various generative modeling tasks. Despite evident promise in semantic segmentation, their adoption for remote sensing remains limited primarily due to computationally demanding inference. While initial approaches using DDPMs in remote sensing achieve competitive accuracy with state-of-the-art models, the multi-step nature of their image generation process poses a major bottleneck. To address this limitation, this paper investigates three key strategies for accelerating inference: (1) optimizing training and inference steps, (2) applying DDPM acceleration techniques adapted to the segmentation task (including Denoising Diffusion Implicit Models, Improved Denoising Diffusion Models, and Progressive Distillation), and (3) thoroughly analyzing the trade-off between accuracy improvement and additional inference time when using test-time augmentation. These strategies are extensively tested with two established remote sensing semantic segmentation datasets focused on buildings and roads. Finally, we compare the optimized diffusion-based model with state-of-the-art convolutional-based models in terms of accuracy and inference times, showing the narrowing gap between both approaches and the increasing viability of diffusion-based segmentation for practical applications.
• Optimized DDPMs with fewer steps for speeding up remote sensing segmentation.
• Progressive Distillation beats other DDPM acceleration methods in segmentation.
• Test-time augmentation benefits diffusion models more than convolutional models.
• Accelerated DDPM achieves SOTA building segmentation results 32x faster.
• Road segmentation with DDPMs is challenging and requires future research.
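The accuracy versus inference-time trade-off of test-time augmentation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the flip-based transforms, the `predict_with_tta` helper, and the `toy_model` stand-in are all assumptions made for illustration. The point it shows is that each extra augmentation costs one more forward pass, and the per-pixel predictions are averaged after undoing the transform.

```python
from typing import Callable, List

Grid = List[List[float]]  # a single-channel image or probability map


def identity(grid: Grid) -> Grid:
    return [row[:] for row in grid]


def hflip(grid: Grid) -> Grid:
    # Mirror left-right.
    return [row[::-1] for row in grid]


def vflip(grid: Grid) -> Grid:
    # Mirror top-bottom.
    return [row[:] for row in grid[::-1]]


# Assumption: only flips are used, and every flip is its own inverse,
# so the same function maps the prediction back to the original frame.
TRANSFORMS: List[Callable[[Grid], Grid]] = [identity, hflip, vflip]


def predict_with_tta(model: Callable[[Grid], Grid], image: Grid) -> Grid:
    """Average the model's predictions over all transforms.

    The cost is len(TRANSFORMS) forward passes instead of one.
    """
    height, width = len(image), len(image[0])
    total = [[0.0] * width for _ in range(height)]
    for transform in TRANSFORMS:
        # Predict on the augmented input, then undo the augmentation.
        pred = transform(model(transform(image)))
        for i in range(height):
            for j in range(width):
                total[i][j] += pred[i][j]
    n = len(TRANSFORMS)
    return [[value / n for value in row] for row in total]


def toy_model(image: Grid) -> Grid:
    # Hypothetical stand-in for a segmentation network: each output pixel
    # averages the pixel with its left neighbour, so it is not flip-invariant.
    return [
        [(row[j] + (row[j - 1] if j > 0 else 0.0)) / 2 for j in range(len(row))]
        for row in image
    ]
```

With a multi-step diffusion model, each of those forward passes is itself a full sampling run, which is why the extra inference time matters more there than for a single-pass convolutional network.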
Accelerating Diffusion Models with Progressive Distillation for Remote Sensing Semantic Segmentation
Ruben Pascual, Christian Ayala, Ruben Sesma, Aranzazu Jurio, Daniel Paternain, Mikel Galar
International Geoscience and Remote Sensing Symposium IGARSS, 2025
Denoising Diffusion Probabilistic Models (DDPMs) have shown significant potential in a variety of generative modeling applications. While they hold promise for semantic segmentation, their use in remote sensing remains limited due to the computational intensity of the inference process. Early attempts using DDPMs in remote sensing have achieved results comparable to state-of-the-art models, but their reliance on a multi-step generation process creates a notable performance bottleneck. To address this issue, this paper explores the application of Progressive Distillation, an acceleration method tailored for DDPMs. This approach is evaluated on a well-known remote sensing dataset focused on building segmentation. The optimized diffusion-based model is compared to state-of-the-art convolutional models in terms of accuracy and inference speed, showing a shrinking performance gap and highlighting the growing practicality of diffusion-based methods for real-world segmentation tasks.
Enhancing DreamBooth with LoRA for Generating Unlimited Characters with Stable Diffusion
Rubén Pascual, Adrián Maiza, Mikel Sesma-Sara, Daniel Paternain, Mikel Galar
Proceedings of the International Joint Conference on Neural Networks, 2024
This paper addresses the challenge of generating unlimited new and distinct characters that encompass the style and shared visual characteristics of a limited set of human-designed characters. This is a relevant problem in the audiovisual industry, as the ability to rapidly produce original characters that adhere to specific characteristics greatly increases the possibilities in the production of movies, series, or video games. Our solution is built upon DreamBooth, a widely used fine-tuning method for text-to-image models. We propose an adaptation focusing on two main challenges: the impracticality of relying on detailed image prompts for character description and the few-shot learning scenario with a limited set of characters available for training. To solve these issues, we introduce additional character-specific tokens to DreamBooth training and remove its class-specific regularization dataset. For an unlimited generation of characters, we propose the use of random tokens and random embeddings. This proposal is tested on two specialized datasets, and the results show our method’s capability to produce diverse characters that adhere to a style and visual characteristics. An ablation study analyzing the contributions of the proposed modifications is also conducted.
