Wenzhao Zheng

I am currently a postdoctoral fellow in the Department of EECS at the University of California, Berkeley, affiliated with the Berkeley Artificial Intelligence Research Lab (BAIR) and Berkeley Deep Drive (BDD), supervised by Prof. Kurt Keutzer. Prior to that, I received my Ph.D. degree from the Department of Automation at Tsinghua University, advised by Prof. Jie Zhou and Prof. Jiwen Lu. In 2018, I received my B.S. degree from the Department of Physics, Tsinghua University.

I am interested in computer vision and deep learning. My current research focuses on:

  • Vision-centric autonomous driving that efficiently perceives and predicts the complex 3D world based on images.
  • Omni-supervised representation learning that exploits various types of supervision signals to learn discriminative and generalizable visual representations.
  • Explainable artificial intelligence that builds comprehensible and trustworthy AI systems with high performance.

    If you want to work with me (in person or remotely) as an intern at BAIR, feel free to drop me an email at wzzheng@berkeley.edu. I can provide GPU resources if we are a good fit.

    Email  /  CV  /  Google Scholar  /  GitHub

    News

  • 2024-07: Four papers are accepted to ECCV 2024.
  • 2024-05: One paper on lane detection is accepted to T-IP.
  • 2024-04: One paper on 3D object detection is accepted to T-MM.
  • 2024-02: Two papers on 3D occupancy prediction are accepted to CVPR 2024.
  • 2024-01: One paper on explainable deep learning is accepted to ICLR 2024.
  • 2023-09: One paper on deep metric learning is accepted to T-PAMI.
  • 2023-09: One paper on unsupervised indoor depth completion is accepted to T-CSVT.
  • 2023-07: Three papers on representation learning and 3D occupancy prediction are accepted to ICCV 2023.
  • 2023-01: Two papers on 3D occupancy prediction and deep metric learning are accepted to CVPR 2023.
  • 2023-01: One paper on explainable deep networks is accepted to ICLR 2023.
  • 2023-01: One paper on deep metric learning is accepted to T-PAMI.

    *Equal contribution    †Project leader/Corresponding author.

    Newest Papers

    OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
    Lening Wang*, Wenzhao Zheng*†, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu
    arXiv, 2024.
    [arXiv] [Code] [Project Page]

    With trajectory-aware 4D generation, OccSora has the potential to serve as a world simulator for the decision-making of autonomous driving.

    S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving
    Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang
    arXiv, 2024.
    [arXiv] [Code] [Project Page]

    S3Gaussian employs 3D Gaussians to model dynamic scenes for autonomous driving without additional supervision (e.g., 3D bounding boxes).

    GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
    Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu
    European Conference on Computer Vision (ECCV), 2024.
    [arXiv] [Code] [Project Page] [中文解读 (in Chinese)]

    GaussianFormer proposes 3D semantic Gaussians as a more efficient object-centric representation for driving scenes compared with 3D occupancy.

    Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection
    Shuai Zeng, Wenzhao Zheng, Jiwen Lu, Haibin Yan
    IEEE Transactions on Multimedia (T-MM, IF: 7.3), 2024.
    [arXiv] [Code]

    HASS proposes a scene synthesis strategy to adaptively generate challenging synthetic scenes for more generalizable semi-supervised 3D object detection.

    GenAD: Generative End-to-End Autonomous Driving
    Wenzhao Zheng*, Ruiqi Song*, Xianda Guo*, Chenming Zhang, Long Chen
    European Conference on Computer Vision (ECCV), 2024.
    [arXiv] [Code] [中文解读 (in Chinese)]

    GenAD casts autonomous driving as a generative modeling problem.

    Selected Papers

    Autonomous Driving

    OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
    Wenzhao Zheng*, Weiliang Chen*, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu
    European Conference on Computer Vision (ECCV), 2024.
    [arXiv] [Code] [Project Page] [中文解读 (in Chinese)]

    OccWorld models the joint evolutions of 3D scenes and ego movements and paves the way for interpretable end-to-end large driving models.

    SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
    Yuanhui Huang*, Wenzhao Zheng*, Borui Zhang, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
    [arXiv] [Code] [Project Page] [中文解读 (in Chinese)]

    SelfOcc is the first self-supervised work that produces reasonable 3D occupancy for surround cameras.

    PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction
    Sicheng Zuo*, Wenzhao Zheng*, Yuanhui Huang, Jie Zhou, Jiwen Lu
    arXiv, 2023.
    [arXiv] [Code] [中文解读 (in Chinese)]

    As the first 2D-projection-based method on the 3D semantic occupancy prediction task, PointOcc outperforms all other methods by a large margin while running much faster.

    SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
    Yi Wei*, Linqing Zhao*, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
    IEEE International Conference on Computer Vision (ICCV), 2023.
    [arXiv] [Code] [中文解读 (in Chinese)]

    We design a pipeline to generate dense occupancy ground truths without expensive occupancy annotations, which enables the training of denser 3D occupancy prediction models.

    Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
    Yuanhui Huang*, Wenzhao Zheng*, Yunpeng Zhang, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
    [arXiv] [Code] [Project Page] [中文解读 (in Chinese)]

    Given only surround-camera RGB images as inputs, our model (trained using only sparse LiDAR point supervision) can predict the semantic occupancy for all volumes in the 3D space.

    BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
    Yunpeng Zhang, Zheng Zhu, Wenzhao Zheng, Junjie Huang, Guan Huang, Jie Zhou, Jiwen Lu
    arXiv, 2022.
    [arXiv] [Code] [中文解读 (in Chinese)]

    We propose a unified framework for 3D perception and prediction based on multi-camera systems. The multi-task BEVerse outperforms existing single-task methods on 3D object detection, semantic map construction, and motion prediction.

    SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
    Yi Wei*, Linqing Zhao*, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou
    Conference on Robot Learning (CoRL), 2022.
    [arXiv] [Code] [中文解读 (in Chinese)]

    We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.

    Representation Learning

    Introspective Deep Metric Learning
    Chengkun Wang*, Wenzhao Zheng*, Zheng Zhu, Jie Zhou, Jiwen Lu
    IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023.
    [arXiv] [Code]

    We propose an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images.

    OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions
    Chengkun Wang*, Wenzhao Zheng*, Zheng Zhu, Jie Zhou, Jiwen Lu
    IEEE International Conference on Computer Vision (ICCV), 2023.
    [arXiv] [Code]

    We unify fully supervised and self-supervised contrastive learning and exploit both supervisions from labeled and unlabeled data for training.

    Token-Label Alignment for Vision Transformers
    Han Xiao*, Wenzhao Zheng*, Zheng Zhu, Jie Zhou, Jiwen Lu
    IEEE International Conference on Computer Vision (ICCV), 2023.
    [arXiv] [Code]

    We identify a token fluctuation phenomenon that has suppressed the potential of data mixing strategies for vision transformers. To address this, we propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token.

    Deep Metric Learning with Adaptively Composite Dynamic Constraints
    Wenzhao Zheng, Jiwen Lu, Jie Zhou
    IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023.
    [PDF]

    This paper formulates deep metric learning under a unified framework and proposes a dynamic constraint generator to produce adaptive composite constraints to train the metric towards good generalization.

    Hardness-Aware Deep Metric Learning
    Wenzhao Zheng, Zhaodong Chen, Jiwen Lu, Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (oral).
    [PDF] [Code]

    We perform linear interpolation on embeddings to adaptively manipulate their hardness levels and generate corresponding label-preserving synthetics for recycled training.

    Deep Adversarial Metric Learning
    Yueqi Duan, Wenzhao Zheng, Xudong Lin, Jiwen Lu, Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (spotlight).
    [PDF] [Code]

    We generate potential hard negatives adversarial to the learned metric as complements.

    Explainable Artificial Intelligence

    Path Choice Matters for Clear Attribution in Path Methods
    Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu
    International Conference on Learning Representations (ICLR), 2024.
    [arXiv] [Code]

    To address the ambiguity in attributions caused by different path choices, we introduce the Concentration Principle and develop SAMP, an efficient model-agnostic interpreter. By incorporating the infinitesimal constraint (IC) and momentum strategy (MS), SAMP provides superior interpretations.

    Exploring Unified Perspective For Fast Shapley Value Estimation
    Borui Zhang*, Baotong Tian*, Wenzhao Zheng, Jie Zhou, Jiwen Lu
    arXiv, 2023.
    [arXiv] [Code]

    This paper analyzes the consistency of existing Shapley value estimators and proposes the simple amortized estimator, SimSHAP. Extensive experiments conducted on tabular and image datasets validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.

    Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint
    Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu
    International Conference on Learning Representations (ICLR), 2023.
    [arXiv] [Code]

    This paper proposes Bort, an optimizer for improving model explainability with boundedness and orthogonality constraints on model parameters, derived from the sufficient conditions of model comprehensibility and transparency.

    Attributable Visual Similarity Learning
    Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    [arXiv] [Code]

    This paper proposes an attributable visual similarity learning (AVSL) framework, which employs a generalized similarity learning paradigm to represent the similarity between two images with a graph for a more accurate and explainable similarity measure between images.

    Other Papers

    SPTR: Structure-Preserving Transformer for Unsupervised Indoor Depth Completion
    Linqing Zhao, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu
    IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT, IF: 8.4), 2023.
    [PDF]

    We propose a Structure-Preserving Encoding (SPE) module to reformulate depth completion as a process of 3D structure generation.

    Deep Factorized Metric Learning
    Chengkun Wang*, Wenzhao Zheng*, Junlong Li, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
    [PDF]

    We factorize the backbone network into different sub-blocks and learn an adaptive route for each sample to achieve the diversity of features.

    Probabilistic Deep Metric Learning for Hyperspectral Image Classification
    Chengkun Wang, Wenzhao Zheng, Xian Sun, Jiwen Lu, Jie Zhou
    arXiv, 2022.
    [arXiv] [Code]

    We propose a probabilistic deep metric learning framework to model the categorical uncertainty of the spectral distribution of an observed pixel for hyperspectral image classification.

    Dynamic Metric Learning with Cross-Level Concept Distillation
    Wenzhao Zheng, Yuanhui Huang, Borui Zhang, Jie Zhou, Jiwen Lu
    European Conference on Computer Vision (ECCV), 2022.
    [PDF] [Code]

    This paper proposes a hierarchical concept refiner that constructs multiple levels of concept embeddings of an image and then reduces the distance between corresponding concepts to facilitate the cross-level semantic structure of the image representations.

    A Simple Baseline for Multi-Camera 3D Object Detection
    Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu
    Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
    [arXiv] [Code]

    We propose a simple baseline for multi-camera object detection to adapt existing monocular 3D object detection methods with a two-stage propose-and-fuse framework.

    Dimension Embeddings for Monocular 3D Object Detection
    Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Dalong Du, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    [PDF]

    We propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection.

    Deep Relational Metric Learning
    Wenzhao Zheng*, Borui Zhang*, Jiwen Lu, Jie Zhou
    IEEE International Conference on Computer Vision (ICCV), 2021.
    [arXiv] [Code]

    We construct a graph to represent each image and perform relational inference to infer the visual similarity.

    Deep Compositional Metric Learning
    Wenzhao Zheng, Chengkun Wang, Jiwen Lu, Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
    [PDF] [Code]

    We adaptively learn a set of composites of embeddings to receive supervision signals from different tasks to improve the generalization of the learned embeddings without sacrificing the discriminativeness.

    Structural Deep Metric Learning for Room Layout Estimation
    Wenzhao Zheng, Jiwen Lu, Jie Zhou
    European Conference on Computer Vision (ECCV), 2020.
    [PDF]

    We are the first to apply deep metric learning to prediction tasks with structured labels.

    Deep Metric Learning via Adaptive Learnable Assessment
    Wenzhao Zheng, Jiwen Lu, Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
    [PDF]

    We learn a sample assessment strategy for deep metric learning to maximize the generalization of the trained metric.

    Hardness-Aware Deep Metric Learning
    Wenzhao Zheng, Jiwen Lu, Jie Zhou
    IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2021.
    [PDF] [Code]

    We extend the previous conference-version HDML to generate multiple synthetics for each sample.

    Deep Adversarial Metric Learning
    Yueqi Duan, Jiwen Lu, Wenzhao Zheng, Jie Zhou
    IEEE Transactions on Image Processing (T-IP, IF: 11.041), 2020.
    [PDF] [Code]

    We propose a deep adversarial multi-metric learning (DAMML) method by learning multiple local transformations for a more complete description.

    Honors and Awards

  • Tsinghua Excellent Doctoral Dissertation Award
  • 2023 Beijing Outstanding Graduate
  • 2023 Tsinghua Outstanding Graduate
  • 2022 Xuancheng Scholarship
  • 2021 National Scholarship (highest scholarship given by the government of China)
  • CVPR 2021 Outstanding Reviewer
  • 2020 Changtong Scholarship (highest scholarship in the Dept. of Automation)
  • 2019 National Scholarship (highest scholarship given by the government of China)
  • 2017 Tung OOCL Scholarship
  • 2016 German Scholarship

    Academic Services

  • Conference Reviewer / PC Member: CVPR 2019-2024, ICCV 2019-2023, ECCV 2020-2022, NeurIPS 2023, ICLR 2024, IJCAI 2020-2022, WACV 2020-2022, ICME 2019-2022
  • Senior PC Member: IJCAI 2021
  • Journal Reviewer: T-PAMI, T-NNLS, T-IP, T-BIOM, T-IST, Pattern Recognition, Pattern Recognition Letters



    © Wenzhao Zheng | Last updated: June 1, 2024.