Wenzhao Zheng
I am currently a postdoctoral fellow in the Department of EECS at the University of California, Berkeley, affiliated with the Berkeley Artificial Intelligence Research Lab (BAIR) and Berkeley Deep Drive (BDD), supervised by Prof. Kurt Keutzer.
Prior to that, I received my Ph.D. degree from the Department of Automation at Tsinghua University, advised by Prof. Jie Zhou and Prof. Jiwen Lu.
In 2018, I received my BS degree from the Department of Physics, Tsinghua University.
I am interested in computer vision and deep learning. My current research focuses on:
Vision-centric autonomous driving that efficiently perceives and predicts the complex 3D world based on images.
Omni-supervised representation learning that exploits various types of supervision signals to learn discriminative and generalizable visual representations.
Explainable artificial intelligence that builds comprehensible and trustworthy AI systems with high performance.
If you want to work with me (in person or remotely) as an intern at BAIR, feel free to drop me an email at wzzheng@berkeley.edu. I can provide GPU resources if we are a good fit.
Email  / 
CV  / 
Google Scholar  / 
GitHub
|
|
News
2024-07: Four papers are accepted to ECCV 2024.
2024-05: One paper on lane detection is accepted to T-IP.
2024-04: One paper on 3D object detection is accepted to T-MM.
2024-02: Two papers on 3D occupancy prediction are accepted to CVPR 2024.
2024-01: One paper on explainable deep learning is accepted to ICLR 2024.
2023-09: One paper on deep metric learning is accepted to T-PAMI.
2023-09: One paper on unsupervised indoor depth completion is accepted to T-CSVT.
2023-07: Three papers on representation learning and 3D occupancy prediction are accepted to ICCV 2023.
2023-01: Two papers on 3D occupancy prediction and deep metric learning are accepted to CVPR 2023.
2023-01: One paper on explainable deep networks is accepted to ICLR 2023.
2023-01: One paper on deep metric learning is accepted to T-PAMI.
|
*Equal contribution †Project leader/Corresponding author.
|
|
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
Lening Wang* ,
Wenzhao Zheng*, †,
Yilong Ren ,
Han Jiang ,
Zhiyong Cui ,
Haiyang Yu ,
Jiwen Lu
arXiv, 2024.
[arXiv]
[Code]
[Project Page]
With trajectory-aware 4D generation, OccSora has the potential to serve as a world simulator for the decision-making of autonomous driving.
|
|
S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving
Nan Huang ,
Xiaobao Wei ,
Wenzhao Zheng†,
Pengju An ,
Ming Lu ,
Wei Zhan ,
Masayoshi Tomizuka ,
Kurt Keutzer ,
Shanghang Zhang
arXiv, 2024.
[arXiv]
[Code]
[Project Page]
S3Gaussian employs 3D Gaussians to model dynamic scenes for autonomous driving without additional supervision (e.g., 3D bounding boxes).
|
|
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Yuanhui Huang ,
Wenzhao Zheng†,
Yunpeng Zhang ,
Jie Zhou ,
Jiwen Lu
European Conference on Computer Vision (ECCV), 2024.
[arXiv]
[Code]
[Project Page]
[中文解读 (in Chinese)]
GaussianFormer proposes 3D semantic Gaussians as a more efficient object-centric representation for driving scenes compared with 3D occupancy.
|
|
Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection
Shuai Zeng ,
Wenzhao Zheng†,
Jiwen Lu ,
Haibin Yan
IEEE Transactions on Multimedia (T-MM, IF: 7.3), 2024.
[arXiv]
[Code]
HASS proposes a scene synthesis strategy to adaptively generate challenging synthetic scenes for more generalizable semi-supervised 3D object detection.
|
|
GenAD: Generative End-to-End Autonomous Driving
Wenzhao Zheng*,
Ruiqi Song* ,
Xianda Guo* ,
Chenming Zhang ,
Long Chen
European Conference on Computer Vision (ECCV), 2024.
[arXiv]
[Code]
[中文解读 (in Chinese)]
GenAD casts autonomous driving as a generative modeling problem.
|
Autonomous Driving
|
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng*,
Weiliang Chen* ,
Yuanhui Huang ,
Borui Zhang ,
Yueqi Duan,
Jiwen Lu
European Conference on Computer Vision (ECCV), 2024.
[arXiv]
[Code]
[Project Page]
[中文解读 (in Chinese)]
OccWorld models the joint evolution of 3D scenes and ego movements and paves the way for interpretable end-to-end large driving models.
|
|
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Yuanhui Huang* ,
Wenzhao Zheng*,
Borui Zhang ,
Jie Zhou ,
Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[arXiv]
[Code]
[Project Page]
[中文解读 (in Chinese)]
SelfOcc is the first self-supervised work that produces reasonable 3D occupancy for surround cameras.
|
|
PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction
Sicheng Zuo* ,
Wenzhao Zheng*,
Yuanhui Huang ,
Jie Zhou ,
Jiwen Lu
arXiv, 2023.
[arXiv]
[Code]
[中文解读 (in Chinese)]
As the first 2D-projection-based method on the 3D semantic occupancy prediction task, PointOcc outperforms all other methods by a large margin while running much faster.
|
|
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
Yi Wei*,
Linqing Zhao*,
Wenzhao Zheng,
Zheng Zhu,
Jie Zhou ,
Jiwen Lu
IEEE International Conference on Computer Vision (ICCV), 2023.
[arXiv]
[Code]
[中文解读 (in Chinese)]
We design a pipeline to generate dense occupancy ground truths without expensive occupancy annotations, which enables the training of denser 3D occupancy prediction models.
|
|
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
Yuanhui Huang* ,
Wenzhao Zheng*,
Yunpeng Zhang ,
Jie Zhou ,
Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[arXiv]
[Code]
[Project Page]
[中文解读 (in Chinese)]
Given only surround-camera RGB images as inputs, our model (trained using only sparse LiDAR point supervision) can predict the semantic occupancy for all volumes in the 3D space.
|
|
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
Yunpeng Zhang ,
Zheng Zhu,
Wenzhao Zheng,
Junjie Huang,
Guan Huang,
Jie Zhou ,
Jiwen Lu
arXiv, 2022.
[arXiv]
[Code]
[中文解读 (in Chinese)]
We propose a unified framework for 3D perception and prediction based on multi-camera systems. The multi-task BEVerse outperforms existing single-task methods on 3D object detection, semantic map construction, and motion prediction.
|
|
SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
Yi Wei*,
Linqing Zhao*,
Wenzhao Zheng,
Zheng Zhu,
Yongming Rao,
Guan Huang,
Jiwen Lu ,
Jie Zhou
Conference on Robot Learning (CoRL), 2022.
[arXiv]
[Code]
[中文解读 (in Chinese)]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
|
Representation Learning
|
Introspective Deep Metric Learning
Chengkun Wang* ,
Wenzhao Zheng*,
Zheng Zhu,
Jie Zhou ,
Jiwen Lu
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023.
[arXiv]
[Code]
We propose an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images.
|
|
OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions
Chengkun Wang* ,
Wenzhao Zheng*,
Zheng Zhu,
Jie Zhou ,
Jiwen Lu
IEEE International Conference on Computer Vision (ICCV), 2023.
[arXiv]
[Code]
We unify fully supervised and self-supervised contrastive learning and exploit both supervisions from labeled and unlabeled data for training.
|
|
Token-Label Alignment for Vision Transformers
Han Xiao*,
Wenzhao Zheng*,
Zheng Zhu,
Jie Zhou ,
Jiwen Lu
IEEE International Conference on Computer Vision (ICCV), 2023.
[arXiv]
[Code]
We identify a token fluctuation phenomenon that has suppressed the potential of data mixing strategies for vision transformers. To address this, we propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token.
|
|
Deep Metric Learning with Adaptively Composite Dynamic Constraints
Wenzhao Zheng,
Jiwen Lu ,
Jie Zhou
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023.
[PDF]
This paper formulates deep metric learning under a unified framework and proposes a dynamic constraint generator to produce adaptive composite constraints to train the metric towards good generalization.
|
|
Hardness-Aware Deep Metric Learning
Wenzhao Zheng,
Zhaodong Chen ,
Jiwen Lu ,
Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (oral).
[PDF]
[Code]
We perform linear interpolation on embeddings to adaptively manipulate their hardness levels and generate corresponding label-preserving synthetics for recycled training.
|
|
Deep Adversarial Metric Learning
Yueqi Duan ,
Wenzhao Zheng,
Xudong Lin ,
Jiwen Lu ,
Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (spotlight).
[PDF]
[Code]
We generate potential hard negatives adversarial to the learned metric as complements.
|
Explainable Artificial Intelligence
|
Path Choice Matters for Clear Attribution in Path Methods
Borui Zhang,
Wenzhao Zheng,
Jie Zhou ,
Jiwen Lu
International Conference on Learning Representations (ICLR), 2024.
[arXiv]
[Code]
To address the ambiguity in attributions caused by different path choices, we introduced the Concentration Principle and developed SAMP, an efficient model-agnostic interpreter. By incorporating the infinitesimal constraint (IC) and momentum strategy (MS), SAMP provides superior interpretations.
|
|
Exploring Unified Perspective For Fast Shapley Value Estimation
Borui Zhang*,
Baotong Tian*,
Wenzhao Zheng,
Jie Zhou,
Jiwen Lu
arXiv, 2023.
[arXiv]
[Code]
This paper analyzes the consistency of existing Shapley value estimators and proposes the simple amortized estimator, SimSHAP.
Extensive experiments conducted on tabular and image datasets validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
|
|
Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint
Borui Zhang,
Wenzhao Zheng,
Jie Zhou ,
Jiwen Lu
International Conference on Learning Representations (ICLR), 2023.
[arXiv]
[Code]
This paper proposes Bort, an optimizer for improving model explainability with boundedness and orthogonality constraints on model parameters, derived from the sufficient conditions of model comprehensibility and transparency.
|
|
Attributable Visual Similarity Learning
Borui Zhang,
Wenzhao Zheng,
Jie Zhou ,
Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[arXiv]
[Code]
This paper proposes an attributable visual similarity learning (AVSL) framework, which employs a generalized similarity learning paradigm to represent the similarity between two images with a graph for a more accurate and explainable similarity measure between images.
|
|
SPTR: Structure-Preserving Transformer for Unsupervised Indoor Depth Completion
Linqing Zhao,
Wenzhao Zheng,
Yueqi Duan,
Jie Zhou ,
Jiwen Lu
IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT, IF: 8.4), 2023.
[PDF]
We propose a Structure-Preserving Encoding (SPE) module to reformulate depth completion as a process of 3D structure generation.
|
|
Deep Factorized Metric Learning
Chengkun Wang* ,
Wenzhao Zheng*,
Junlong Li,
Jie Zhou ,
Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[PDF]
We factorize the backbone network into different sub-blocks and learn an adaptive route for each sample to achieve the diversity of features.
|
|
Probabilistic Deep Metric Learning for Hyperspectral Image Classification
Chengkun Wang ,
Wenzhao Zheng,
Xian Sun ,
Jiwen Lu ,
Jie Zhou
arXiv, 2022.
[arXiv]
[Code]
We propose a probabilistic deep metric learning framework to model the categorical uncertainty of the spectral distribution of an observed pixel for hyperspectral image classification.
|
|
Dynamic Metric Learning with Cross-Level Concept Distillation
Wenzhao Zheng,
Yuanhui Huang ,
Borui Zhang,
Jie Zhou ,
Jiwen Lu
European Conference on Computer Vision (ECCV), 2022.
[PDF]
[Code]
This paper proposes a hierarchical concept refiner to construct multiple levels of concept embeddings of an image and then pulls the corresponding concepts closer together to facilitate the cross-level semantic structure of the image representations.
|
|
A Simple Baseline for Multi-Camera 3D Object Detection
Yunpeng Zhang ,
Wenzhao Zheng,
Zheng Zhu,
Guan Huang,
Jie Zhou ,
Jiwen Lu
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
[arXiv]
[Code]
We propose a simple baseline for multi-camera object detection to adapt existing monocular 3D object detection methods with a two-stage propose-and-fuse framework.
|
|
Dimension Embeddings for Monocular 3D Object Detection
Yunpeng Zhang ,
Wenzhao Zheng,
Zheng Zhu,
Guan Huang,
Dalong Du,
Jie Zhou ,
Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[PDF]
We propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection.
|
|
Deep Relational Metric Learning
Wenzhao Zheng*,
Borui Zhang*,
Jiwen Lu ,
Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021.
[arXiv]
[Code]
We construct a graph to represent each image and perform relational inference to infer the visual similarity.
|
|
Deep Compositional Metric Learning
Wenzhao Zheng,
Chengkun Wang ,
Jiwen Lu ,
Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[PDF]
[Code]
We adaptively learn a set of composites of embeddings to receive supervision signals from different tasks to improve the generalization of the learned embeddings without sacrificing the discriminativeness.
|
|
Structural Deep Metric Learning for Room Layout Estimation
Wenzhao Zheng,
Jiwen Lu ,
Jie Zhou
European Conference on Computer Vision (ECCV), 2020.
[PDF]
We are the first to apply deep metric learning to prediction tasks with structured labels.
|
|
Deep Metric Learning via Adaptive Learnable Assessment
Wenzhao Zheng,
Jiwen Lu ,
Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[PDF]
We learn a sample assessment strategy for deep metric learning to maximize the generalization of the trained metric.
|
|
Hardness-Aware Deep Metric Learning
Wenzhao Zheng,
Jiwen Lu ,
Jie Zhou
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2021.
[PDF]
[Code]
We extend the previous conference version of HDML to generate multiple synthetics for each sample.
|
|
Deep Adversarial Metric Learning
Yueqi Duan ,
Jiwen Lu ,
Wenzhao Zheng,
Jie Zhou
IEEE Transactions on Image Processing (T-IP, IF: 11.041), 2020.
[PDF]
[Code]
We propose a deep adversarial multi-metric learning (DAMML) method by learning multiple local transformations for a more complete description.
|
Honors and Awards
Tsinghua Excellent Doctoral Dissertation Award
2023 Beijing Outstanding Graduate
2023 Tsinghua Outstanding Graduate
2022 Xuancheng Scholarship
2021 National Scholarship (highest scholarship given by the government of China)
CVPR 2021 Outstanding Reviewer
2020 Changtong Scholarship (highest scholarship in the Dept. of Automation)
2019 National Scholarship (highest scholarship given by the government of China)
2017 Tung OOCL Scholarship
2016 German Scholarship
|
Academic Services
Conference Reviewer / PC Member: CVPR 2019-2024, ICCV 2019-2023, ECCV 2020-2022, NeurIPS 2023, ICLR 2024, IJCAI 2020-2022, WACV 2020-2022, ICME 2019-2022
Senior PC Member: IJCAI 2021
Journal Reviewer: T-PAMI, T-NNLS, T-IP, T-BIOM, T-IST, Pattern Recognition, Pattern Recognition Letters
|
|