M3: 3D-Spatial Multimodal Memory. 13th International Conference on Learning Representations (ICLR 2025), 2025
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images. 13th International Conference on Learning Representations (ICLR 2025), 2025
Scaling Vision Pre-Training to 4K Resolution. Baifeng Shi, Boyi Li, Han Cai, Yao Lu, Sifei Liu, et al. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2025
NVILA: Efficient Frontier Visual Language Models. Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, et al. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2025
COLMAP-Free 3D Gaussian Splatting. Yang Fu, Xiaolong Wang, Sifei Liu, Amey Kulkarni, Jan Kautz, et al. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2024
AGG: Amortized Generative 3D Gaussians for Single Image to 3D. Transactions on Machine Learning Research, 2024
3D Reconstruction with Generalizable Neural Fields Using Scene Priors. 12th International Conference on Learning Representations (ICLR 2024), 2024
Context-aware synthesis and placement of object instances. D Lee, S Liu, J Gu, MY Liu, J Kautz. US Patent App. 19/433,543, 2026
Scaling RL to long videos. Y Chen, W Huang, B Shi, Q Hu, H Ye, L Zhu, Z Liu, P Molchanov, J Kautz, ... Advances in Neural Information Processing Systems 38, 172842-172870, 2026. Citations: 72
Diffusion-based open-vocabulary segmentation. J Xu, S De Mello, S Liu, A Vahdat, W Byeon. US Patent 12,586,199, 2026. Citations: 8
Compositional 3D-consistent freeview image generation with 3D blobs. C Liu, W Nie, S Liu, AH Badki, H Su, M Mardani, BD Eckart, A Vahdat. US Patent App. 19/227,222, 2026
Techniques for fine-tuning a machine learning model to reconstruct a three-dimensional scene. Y Fu, S Liu, J Kautz, X Li, S De Mello, A Kulkarni, M Naphade. US Patent 12,548,234, 2026. Citations: 2
Techniques for training a machine learning model to reconstruct different three-dimensional scenes. Y Fu, S Liu, J Kautz, X Li, S De Mello, A Kulkarni, M Naphade. US Patent 12,548,258, 2026
Learnable Fourier series for image restoration. S Liu, S De Mello, J Kautz. US Patent App. 18/975,124, 2026
Training and inferencing using a neural network to predict orientations of objects in images. SK Mustikovela, V Jampani, S De Mello, S Liu, U Iqbal, J Kautz. US Patent App. 19/094,621, 2025
Context-aware synthesis and placement of object instances. D Lee, S Liu, J Gu, MY Liu, J Kautz. US Patent 12,462,453, 2025. Citations: 1
Segmentation using an unsupervised neural network training technique. V Jampani, WC Hung, S Liu, P Molchanov, J Kautz. US Patent 12,450,748, 2025
Token-Efficient VLM: High-Resolution Image Understanding Via Dynamic Region Proposal. Y Jiang, J Gu, T Xue, KC Cheung, P Molchanov, H Yin, S Liu. 2025 IEEE/CVF International Conference on Computer Vision (ICCV), 24147-24158, 2025. Citations: 5
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM. H Ye, CHH Yang, A Goel, W Huang, L Zhu, Y Su, S Lin, AC Cheng, Z Wan, ... arXiv preprint arXiv:2510.15870, 2025. Citations: 10
QeRL: Beyond Efficiency--Quantization-enhanced Reinforcement Learning for LLMs. W Huang, Y Ge, S Yang, Y Xiao, H Mao, Y Lin, H Ye, S Liu, KC Cheung, ... arXiv preprint arXiv:2510.11696, 2025. Citations: 7
Compositional text-to-image generation with dense blob representations. W Nie, S Liu, MM Korani, C Liu, BD Eckart, A Vahdat. US Patent App. 18/889,975, 2025
3D aware region prompted vision language model. AC Cheng, Y Fu, Y Chen, Z Liu, X Li, S Radhakrishnan, S Han, Y Lu, ... arXiv preprint arXiv:2509.13317, 2025. Citations: 19
Region-aware vision language processor. Q Guo, S De Mello, H Yin, W Byeon, KC Cheung, SCW See, J Kautz, ... US Patent App. 19/065,367, 2025
Machine learning framework applied in a semi-supervised setting to perform instance tracking in a sequence of image frames. Y Fu, S Liu, U Iqbal, S De Mello, J Kautz. US Patent 12,400,341, 2025. Citations: 1
SSE: Multimodal semantic data selection and enrichment for industrial-scale data assimilation. M Shen, N Chang, S Liu, JM Alvarez. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and …, 2025. Citations: 4
EgoVLA: Learning vision-language-action models from egocentric human videos. R Yang, Q Yu, Y Wu, R Yan, B Li, AC Cheng, X Zou, Y Fang, X Cheng, ... arXiv preprint arXiv:2507.12440, 2025. Citations: 72
View synthesis using camera poses learned from a video. Y Fu, S Liu, A Kulkarni, J Kautz. US Patent App. 18/963,075, 2025. Citations: 1
MOST CITED SCHOLAR PUBLICATIONS
Learning continuous image representation with local implicit image function. Y Chen, S Liu, X Wang. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021. Citations: 1182
A face antispoofing database with diverse attacks. Z Zhang, J Yan, S Liu, Z Lei, D Yi, SZ Li. 2012 5th IAPR international conference on Biometrics (ICB), 26-31, 2012. Citations: 1120
GroupViT: Semantic segmentation emerges from text supervision. J Xu, S De Mello, S Liu, W Byeon, T Breuel, J Kautz, X Wang. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022. Citations: 868
Generative face completion. Y Li, S Liu, J Yang, MH Yang. Proceedings of the IEEE conference on computer vision and pattern …, 2017. Citations: 849
Open-vocabulary panoptic segmentation with text-to-image diffusion models. J Xu, S Liu, A Vahdat, W Byeon, X Wang, S De Mello. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023. Citations: 752
Low-light image enhancement via a deep hybrid network. W Ren, S Liu, L Ma, Q Xu, X Xu, X Cao, J Du, MH Yang. IEEE Transactions on Image Processing 28 (9), 4364-4375, 2019. Citations: 592
SpatialRGPT: Grounded spatial reasoning in vision-language models. AC Cheng, H Yin, Y Fu, Q Guo, R Yang, J Kautz, X Wang, S Liu. Advances in Neural Information Processing Systems 37, 135062-135093, 2024. Citations: 431
Learning affinity via spatial propagation networks. S Liu, S De Mello, J Gu, G Zhong, MH Yang, J Kautz. Advances in Neural Information Processing Systems 30, 2017. Citations: 372
Learning linear transformations for fast image and video style transfer. X Li, S Liu, J Kautz, MH Yang. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019. Citations: 338
COLMAP-Free 3D Gaussian Splatting. Y Fu, S Liu, A Kulkarni, J Kautz, AA Efros, X Wang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024. Citations: 307
Self-supervised single-view 3D reconstruction via semantic consistency. X Li, S Liu, K Kim, S De Mello, V Jampani, MH Yang, J Kautz. European Conference on Computer Vision, 677-693, 2020. Citations: 307
Deep cascaded bi-network for face hallucination. S Zhu, S Liu, CC Loy, X Tang. European conference on computer vision, 614-630, 2016. Citations: 297
Learning dual convolutional neural networks for low-level vision. J Pan, S Liu, D Sun, J Zhang, Y Liu, J Ren, Z Li, J Tang, H Lu, YW Tai, ... Proceedings of the IEEE conference on computer vision and pattern …, 2018. Citations: 266
Semi-supervised 3D hand-object poses estimation with interactions in time. S Liu, H Jiang, J Xu, S Liu, X Wang. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021. Citations: 256
Learning recursive filters for low-level vision via a hybrid neural network. S Liu, J Pan, MH Yang. European conference on computer vision, 560-576, 2016. Citations: 211
Joint-task self-supervised learning for temporal correspondence. X Li, S Liu, S De Mello, X Wang, J Kautz, MH Yang. Advances in Neural Information Processing Systems 32, 2019. Citations: 209
SCOPS: Self-supervised co-part segmentation. WC Hung, V Jampani, S Liu, P Molchanov, MH Yang, J Kautz. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019. Citations: 204
No pose, no problem: Surprisingly simple 3D Gaussian splats from sparse unposed images. B Ye, S Liu, H Xu, X Li, M Pollefeys, MH Yang, S Peng. International Conference on Learning Representations 2025, 54009-54033, 2025. Citations: 194
Synthesizing long-term 3D human motion and interaction in 3D scenes. J Wang, H Xu, J Xu, S Liu, X Wang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021. Citations: 192
NVILA: Efficient frontier visual language models. Z Liu, L Zhu, B Shi, Z Zhang, Y Lou, S Yang, H Xi, S Cao, Y Gu, D Li, X Li, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2025. Citations: 190