|
Bingchen Zhao
I am a Ph.D student at the University of Edinburgh, supervised by Dr Oisin Mac Aodha.
I am interested in Concept/Category Discovery, Self-Supervised Learning, and Interpretable AI.
Please feel free to drop me an email if you are interested in what I do and looking for possible collaborations.
Contact: zhaobc.gm/bingchen.zhaox@gmail.com
|
02/2024
|
One paper accepted by CVPR 2024, my first CVPR paper!
|
01/2024
|
One paper accepted by ICLR 2024 as Spotlight!, see you in Vienna!
|
03/2023
|
Our 2nd OOD-CV workshop is accepted at ICCV, stay tuned for more details, see you in Paris!
|
10/2022
|
Recognised as a Top Reviewer for NeurIPS 2022!.
|
07/2022
|
Two papers accepted by ECCV 2022 with one selected as Oral!.
|
04/2022
|
We are organizing a workshop at ECCV 2022, check it out here.
|
09/2021
|
One paper accepted into NeurIPS 2021!
|
07/2021
|
One paper accepted into ICCV 2021 as Oral!
|
09/2019 - 05/2022
|
I am working as a teaching assistant for Prof. Yin Wang's Deep Learning Course at Tongji University.
|
Preprints / Recent Papers
|
|
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li*, Haoqin Tu*, Mude Hui*, Zeyu Wang*, Bingchen Zhao*, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie
arXiv /
Code
Preprint
TL;DR: We generated the caption of for the DataComp-1B dataset and demonstrate an improved performance on various tasks.
|
|
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Mude Hui*, Siwei Yang*, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Cihang Xie, Yuyin Zhou
arXiv /
Code
Preprint
TL;DR: A high quality dataset for image editing is proposed.
|
|
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
arXiv /
Code
Preprint
TL;DR: We find CLIP to be relatively robust to pre-training data imbalance, design and conduct controlled experiments to identify the underlying mechanisms and provide insights for recognition and SSL models..
|
|
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu*, Bingchen Zhao*, Chen Wei, Cihang Xie
arXiv /
Code
Preprint
TL;DR: We present a surprising finding that multi-modal tuning improves LLMs in truthfulness and ethical behaviors.
|
|
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
RWKV Project
arXiv /
Code
COLM 2024
TL;DR: An RNN-based LLM in the era of Transformers! And now being deployed to half a billion systems worldwide!
|
|
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao, Quan Cui, Hao Wu, Osamu Yoshie, Cheng Yang, Oisin Mac Aodha
arXiv /
Website
TMLR 2024
TL;DR: We present a visual representation pre-training method for scalable web image-text data and it achieves state-of-the-art performance on various tasks with promising scaling behavior.
|
|
Labeled Data Selection for Category Discovery
Bingchen Zhao, Nico Lang, Serge Belongie, Oisin Mac Aodha
arXiv
ECCV 2024
TL;DR: What labeled data to use in category discovery matters, and we present two methods for selecting the right labeled data to use.
|
|
PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
Fernando Cendra, Bingchen Zhao, Kai Han
arXiv /
Code
ECCV 2024
TL;DR: We present a framework for the Continual Category Discovery (CCD) task, and demonstrate state-of-the-art performance.
|
|
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu*, Chenhang Cui*, Zijun Wang*, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie
arXiv /
Code
ECCV 2024
TL;DR: We provide the first comprehensive safety benchmark for VLLMs, including OOD scenarios and Redteaming attacks.
|
|
Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations
Yongshuo Zong, Tingyang Yu, Bingchen Zhao, Ruchika Chavhan, Timothy Hospedales
arXiv /
Code
ICML 2024
TL;DR: We observed that a surprisingly simple answer permutation degrades performance of LLMs.
|
|
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, Bingchen Zhao
arXiv /
Code /
Page
CVPR 2024
TL;DR: Vision-Language Models cannot handle counterfactual questions very well.
|
|
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao*, Haoqin Tu*, Chen Wei, Jieru Mei, Cihang Xie
arXiv /
Code
ICLR 2024 Spotlight
TL;DR: We explored a parameter efficient fine-tuning method that reduces the parameters by 10x than the LoRA fine-tuning while achieving similar performance.
|
|
Incremental Generalized Category Discovery
Bingchen Zhao, Oisin Mac Aodha
arXiv
Code /
Page
ICCV 2023
TL;DR: We propose a new setting named incremental generalized category discovery that requires the model to learn to discover novel categories at each new incremental stages, a simple baseline model is also proposed based on non-parametric classifiers.
|
|
Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery
Bingchen Zhao, Xin Wen, Kai Han
arXiv /
Code
ICCV 2023
TL;DR: We tackle GCD without knowing the class number, propose a semi-supervised variant of GMM with stochastic splitting and merging to dynamically determine prototypes, and leverage PCL for representation learning on partially labelled data.
|
|
Parametric Classification for Generalized Category Discovery: A Baseline Study
Xin Wen*, Bingchen Zhao*, Xiaojuan Qi
arXiv /
Code
ICCV 2023
TL;DR: We revisit the reason that makes previous parametric classifiers fail to recognise new classes for GCD, identify the prediction biases between and within seen and novel classes as the key issue, and propose a simple yet strong framework that addresses these limitations and achieves state-of-the-art performance in this field.
|
|
XCon: Learning with Experts for Fine-grained Category Discovery
Yixin Fei, Zhongkai Zhao, Siwei Yang, Bingchen Zhao
arXiv /
Slides /
Code
BMVC 2022 Oral (34/770=4.4%)
TL;DR: Learning to do category discovery within a fine-grained dataset is challenging, we present a method that learn to do that by partition the dataset into k sub-groups, and show improved performance on several fine-grained datasets.
|
|
Self-Supervised Visual Representation Learning with Semantic Grouping
Xin Wen, Bingchen Zhao, Anlin Zheng, Xiangyu Zhang, Xiaojuan Qi
arXiv /
Website /
Code
NeurIPS 2022
TL;DR: Our model can do scene decomposition and representation learning at the same time and shows strong generalization ability pretrained on scene-centric data.
|
|
OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski.
arXiv /
Website /
Download /
Slides /
TPAMI Verison
ECCV 2022 Oral (158/5803=2.7%) TPAMI 2024
TL;DR: We collected a dataset where we have the control over the individual OOD attribute in the test examples.
|
|
Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective
Quan Cui*, Bingchen Zhao*, Zhao-Min Chen, Borui Zhao, Renjie Song, Jiajun Liang, Boyan Zhou, Osamu Yoshie.
arXiv /
Code /
Slides
ECCV 2022
TL;DR: We study the transferability and the discriminability of deep representations and found a trade-off between these two properties.
|
|
Novel Visual Category Discovery with Dual Ranking Statistics and Mutual Knowledge Distillation
Bingchen Zhao, Kai Han.
arXiv /
Code /
Slides
NeurIPS 2021
TL;DR: We extend novel category discovery to discover fine-grained classes by leverging information from image parts.
|
|
Improving Contrastive Learning by Visualizing Feature Transformation
Rui Zhu*, Bingchen Zhao*, Jingen Liu, Zhenglong Sun, Chang Wen Chen.
arXiv /
Code /
Slides
ICCV 2021 Oral (210/6236=3.4%)
TL;DR: We explore the training dynamics of self-supervised contrastive learning, and proposed two simple method for improving the performance of the model.
|
|
Temporal Context Aggregation for Video Retrieval with Contrastive Learning
Jie Shao*, Xin Wen*, Bingchen Zhao, Xiangyang Xue.
arXiv /
Code /
Slides
WACV 2021
TL;DR: Video retrieval methods can be improved by modeling long-range temporal information with transformer and contrastive learning.
|
|
One Venue, Two Conferences: The Separation of Chinese and American Citation Networks
Bingchen Zhao*, Yuling Gu*, Jessica Zosa Forde, Naomi Saphra
arXiv
NeurIPS 2022 AI Cultures Workshop
TL;DR: At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously. We build a citation graph to quantify this divide, compare it to European connectivity, and discuss the causes and consequences of the separation.
|
|
Distilling Visual Priors from Self-Supervised Learning
Bingchen Zhao, Xin Wen
arXiv /
Code /
Slides
ECCV 2020 VIPriors Workshop
TL;DR: Learning a model self-supervisedly and then do self-distillation helps in the data-deficient domain.
|
2022 |
Top-Reviewer for NeurIPS 2022.
|
2020 |
First-place in the FGVC7 workshop iWildcam challenge track.
|
2020 |
Second-place in the ECCV 2020 VIPrior workshop image classification challenge track.
|
2020 |
Best Undergraduate Prize in the NeurIPS 2020 SpaceNet 7 challenge.
|
2016 |
Bronze medal in the Asia-Pacific Informatics Olympiad.
|
2015 |
First Prize in the National Olympiad in Informatics in Provinces.
|
I have been a reviewer for ICLR, NeurIPS, CVPR, ICCV, ECCV, WACV, FGVC, and SIGSPATIAL.
|