Yuki M. Asano

Computer Vision | Machine Learning | Complex Systems

I'm an assistant professor for computer vision and machine learning and Science Manager of the QUVA lab at the University of Amsterdam, where I work with Cees Snoek, Max Welling and Efstratios Gavves. My PhD was at the Visual Geometry Group (VGG) at the University of Oxford, where I worked with Andrea Vedaldi and Christian Rupprecht. Prior to this, I studied physics at the University of Munich (LMU) and economics in Hagen, and completed an MSc in Mathematical Modelling and Scientific Computing at the Mathematical Institute in Oxford. Also, I love running, the mountains ⛰️ and their combination.

Email / Google Scholar / GitHub / Twitter / LinkedIn / CV

News

  • Welcome to my new MSc thesis students: Lukas, Luc, Apostolos and Alfonso!
  • I'm honored to serve as an Area Chair for CVPR'23!
  • If you are an MSc in AI student at the UvA and want to write your thesis with me, please contact me. I have some exciting projects.
  • Two papers accepted at ECCV'22!
  • One paper accepted at ICML'22!
  • I'm honored to serve as an Area Chair for ECCV'22!
  • Two papers accepted at CVPR'22!
  • One paper accepted at ICLR'22!
  • Qualcomm UvA Deep Vision public seminar talk in Dec. 2021. link
  • New preprint on single-image learning.
  • Starting as an Assistant Professor at the UvA from Oct 2021.
  • Two papers accepted at NeurIPS'21 (including the first as supervisor)
  • One paper accepted at NeurIPS'21-Datasets Track: the PASS dataset, incl. pretrained models.
  • Passed my PhD with "no corrections"; my examiners were Phillip Isola and Philip Torr.
  • Two papers accepted to ICCV'21! (GDT and STiCA)
  • One paper I supervised accepted to ACL'21's Workshop on online abuse and harms. More details to follow.
  • One paper accepted to PNAS; my Erdős Number is now 3 via Jobst Heitzig.
  • New preprint: using clustering & contrastive SSL we find objects without any supervision
  • The OxAI team I supervised has published its results at the ICLR'21 SDG workshop
  • Our new preprint on intersectional occupational biases of GPT-2 is out
  • Our paper on video-text representation learning got accepted as a Spotlight at ICLR 2021!
  • I've started volunteering my time at OxAI to help interdisciplinary teams work on AI projects.
  • Our paper on Self-Labelling Videos (SeLaVi) was accepted to NeurIPS! Code
  • Starting my summer internship June 22nd at FAIR and working with Armand Joulin and Ishan Misra.
  • I am Co-PI on an Amazon Machine Learning Award project with Christian Rupprecht and Andrea Vedaldi.
  • Awarded the 2020 Qualcomm Innovation Fellowship.
  • I'll be co-organizing a workshop at ECCV 2020: Self-Supervised Learning: What Is Next?
  • Two papers have been accepted to ICLR 2020 (incl. one Spotlight)

Teaching

I'm teaching the Deep Learning Course for the MSc in AI at the University of Amsterdam.

Research

I'm interested in computer vision, self-supervised and multi-modal learning as well as privacy and ethics in AI.

Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers
Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano
arXiv 2022
bibtex

We propose to adapt frozen vision transformers by providing input-dependent prompts, computed by a lightweight network. We surpass linear and full finetuning on multiple benchmarks.

Self-Guided Diffusion Models
Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek
arXiv 2022
bibtex

We propose to use self-supervision to provide diffusion models with a guidance signal; this works better than label guidance.

VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu, Yuki M. Asano, James Thewlis, Christian Rupprecht
ECCV 2022
bibtex

We propose to utilize the "comments" modality, which is common for internet data, and show that it can improve vision-language learning.

Less than Few: Self-Shot Video Instance Segmentation
Pengwan Yang, Yuki M. Asano, Pascal Mettes, Cees G. M. Snoek
ECCV 2022
bibtex

We propose to tackle the task of video instance segmentation by leveraging self-supervised learning to generate support samples at inference time for improved performance.

Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Kirk
Workshop on Gender Bias in Natural Language Processing at NAACL 2022 (Oral)
bibtex

We investigate bias and mitigation strategies when using GPT-3 to generate job advertisements.

CITRIS: Causal Identifiability from Temporal Intervened Sequences
Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves
ICML, 2022
bibtex

We do visual causal representation learning using videos. Our method is able to identify causal variables by intervening on them and observing their effects in time.

Self-Supervised Learning of Object Parts for Semantic Segmentation
Adrian Ziegler, Yuki M. Asano
CVPR, 2022
bibtex

We self-supervisedly learn how to detect objects by learning to detect and combine self-segmented object parts starting from SSL pretrained ViTs.

Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras*, Yuki M. Asano*, Francois Fagan, Andrea Vedaldi, Florian Metze
CVPR, 2022
bibtex

We detect objects without any supervisory signal by leveraging multi-modal signals from videos and combining self-supervised contrastive- and clustering-based learning. Our model learns from video and detects objects in images.

Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing
Iro Laina, Yuki M. Asano, Andrea Vedaldi
ICLR, 2022
bibtex

We propose quantized reverse probing as an information-theoretic measure to assess the degree to which self-supervised visual representations align with human-interpretable concepts. This measure is also able to detect when the representation correlates with combinations of labelled concepts (e.g. "red apple") instead of just individual attributes ("red" and "apple" separately).

Extrapolating from a Single Image to a Thousand Classes using Distillation
Yuki M. Asano*, Aaqib Saeed*
arXiv, 2021
website | code | bibtex

We show that it is possible to extrapolate to semantic classes such as those of ImageNet using just a single datum as visual input. We leverage knowledge distillation for this and achieve accuracies of 94%/74% on CIFAR-10/100, 59% on ImageNet and, by extending this method to audio, 84% on SpeechCommands.

Keeping Your Eye On the Ball: Trajectory Attention in Video Transformers
Mandela Patrick*, Dylan Campbell*, Yuki M. Asano*, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques
NeurIPS, 2021   (Oral)
code | bibtex

We present trajectory attention, a drop-in self-attention block for video transformers that implicitly tracks space-time patches along motion paths. We set SOTA results on a number of action recognition datasets: Kinetics-400, Something-Something V2, and Epic-Kitchens.

Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models
Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano
NeurIPS, 2021  
code | bibtex

We analyze the biases and distributions of GPT-2's output w.r.t. occupations. This is especially interesting as AI finds its way into hiring and automated application assessments.

PASS: An ImageNet replacement for self-supervised pretraining without humans.
Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi
NeurIPS Datasets and Benchmarks, 2021  
webpage | data | bibtex | pretrained models

We introduce PASS, a large-scale image dataset that does not include any humans, and show that it can be used for high-quality model pretraining while significantly reducing privacy concerns.

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning.
Mandela Patrick*, Yuki M. Asano*, Bernie Huang*, Ishan Misra, Florian Metze, João F. Henriques, Andrea Vedaldi
ICCV, 2021  
code | bibtex

We better leverage latent time and space for video representation learning by computing efficient multi-crops in embedding space and using a shallow transformer to model time. This yields SOTA performance and allows for training with longer videos.

On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick*, Yuki M. Asano*, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
ICCV, 2021  
code | bibtex

We give transformations the prominence they deserve by introducing a systematic framework suitable for contrastive learning. SOTA video representation learning by learning (in)variances systematically.

Emergent inequality and business cycles in a simple behavioral macroeconomic model
Yuki M. Asano, Jakob J. Kolb, Jobst Heitzig, J. Doyne Farmer
Proceedings of the National Academy of Sciences (PNAS), 2021
code | bibtex

We build an agent-based version of a fundamental macroeconomic model and include simple decision-making heuristics. We find highly complex behavior and business cycles.

Privacy-preserving object detection
Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano
ICLR, 2021 SDG workshop
bibtex

We evaluate the potential of conducting object detection with blurred and GAN-swapped faces. It works well and can potentially even alleviate biases.

Support-set bottlenecks for video-text representation learning
Mandela Patrick*, Po-Yao Huang*, Yuki M. Asano*, Florian Metze, Alexander Hauptmann, João F. Henriques, Andrea Vedaldi
ICLR, 2021   (Spotlight)
bibtex | talk

We use a generative objective to improve the instance discrimination limitations of contrastive learning to set new state-of-the-art results in text-to-video retrieval.

Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano*, Mandela Patrick*, Christian Rupprecht, Andrea Vedaldi
NeurIPS, 2020
code | homepage | bibtex | talk

Unsupervised clustering of videos via self-supervision. We show that clustering videos well does not come for free from good representations. Instead, we learn a multi-modal clustering function that treats the audio and visual streams as augmentations.

Self-labelling via simultaneous clustering and representation learning
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020   (Spotlight)
code | blog | bibtex | ICLR talk

We propose a self-supervised learning formulation that simultaneously learns feature representations and useful dataset labels by optimizing the common cross-entropy loss for features and labels, while maximizing information.

A critical analysis of self-supervision, or what we can learn from a single image
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020
bibtex | code | ICLR talk

We evaluate self-supervised feature learning methods and find that with sufficient data augmentation early layers can be learned using just one image. This is informative about self-supervision and the role of augmentations.

Rising adoption and retention of meat-free diets in online recipe data
Yuki M. Asano* and Gesa Biermann*
Nature Sustainability, 2019
PDF | code | bibtex

We investigate dietary transitions by analysing a large-scale dataset of recipes and user ratings. We detect a consistent increase in the number of users switching to vegetarian diets and maintaining them. We show that the transition is eased by initially switching to vegetarian diets.

Monte Carlo Study of the Precision and Accuracy of Proton CT Reconstructed Relative Stopping Power Maps
G. Dedes, YM. Asano, N. Arbor, D. Dauvergne, J. Letang, E. Testa, S. Rit, K. Parodi
Medical Physics, 2016
bibtex

In my BSc thesis, I investigated how we can model proton computed tomography (pCT) using Monte Carlo-based software. We simulated an ideal pCT scanner and scans of several cylindrical phantoms with various tissue-equivalent inserts of different sizes.

Other activities
In Munich, I was the founder and president of a student-run management consultancy for non-profits, 180DC Munich. With great interdisciplinary colleagues, we have already helped more than 30 NGOs improve their impact measurement and effectiveness.
I am a curious person.
I got the chance to gain some valuable experience in consulting and, more recently, in the technology sector, including internships at Facebook AI Research and Transferwise.
More to come.

Great template from Jon Barron