Yuki M. Asano

Computer Vision | Machine Learning | Complex Systems

I'm an assistant professor of computer vision and machine learning at the QUVA Lab at the University of Amsterdam, where I work with Cees Snoek, Max Welling and Efstratios Gavves. I did my PhD at the Visual Geometry Group (VGG) at the University of Oxford, where I worked with Andrea Vedaldi and Christian Rupprecht. Prior to this, I studied physics at the University of Munich (LMU) and economics in Hagen, and completed an MSc in Mathematical Modelling and Scientific Computing at the Mathematical Institute in Oxford. Also, I love running, the mountains ⛰️ and their combination.

Email  /  Google Scholar  /  Github /  Twitter /  LinkedIn /  CV

profile photo

  • Qualcomm UvA Deep Vision public seminar talk in Dec. 2021. link
  • New preprint on single-image learning.
  • Starting as an Assistant Professor at the UvA from Oct 2021.
  • Two papers accepted at NeurIPS'21 (including my first as a supervisor).
  • One paper accepted at NeurIPS'21-Datasets Track: the PASS dataset, incl. pretrained models.
  • Passed my PhD viva with "no corrections"; my examiners were Phillip Isola and Philip Torr.
  • Two papers accepted to ICCV'21! (GDT and STiCA)
  • One paper I supervised accepted to the ACL'21 Workshop on Online Abuse and Harms. More details to follow.
  • One paper accepted to PNAS; my Erdős number is now 3 via Jobst Heitzig.
  • New preprint: using clustering & contrastive SSL, we find objects without any supervision.
  • The OxAI team I supervised has published its results at the ICLR'21 SDG workshop.
  • Our new preprint on intersectional occupational biases of GPT-2 is out.
  • Our paper on video-text representation learning got accepted as a Spotlight into ICLR 2021!
  • I've started volunteering my time at OxAI to help interdisciplinary teams work on AI projects.
  • Our paper on Self-Labelling Videos (SeLaVi) was accepted to NeurIPS! Code
  • Starting my summer internship at FAIR on June 22nd, working with Armand Joulin and Ishan Misra.
  • I am Co-PI on an Amazon Machine Learning Award project with Christian Rupprecht and Andrea Vedaldi.


I'm interested in computer vision, self-supervised learning and multi-modal learning. More specifically, I want to understand the necessity and scope of prior knowledge and supervision for training good neural networks. To this end, I work on self-supervised learning and try to understand what makes it work and how far we can go without labels. I'm excited about what we can learn from data alone, from data augmentation, and from videos.

method Extrapolating from a Single Image to a Thousand Classes using Distillation
Yuki M. Asano*, Aaqib Saeed*
arXiv, 2021
website | code | bibtex

We show that it is possible to extrapolate to semantic classes such as those of ImageNet using just a single datum as visual input. We leverage knowledge distillation for this and achieve accuracies of 94%/74% on CIFAR-10/100, 59% on ImageNet and, by extending the method to audio, 84% on SpeechCommands.

trajectory attention Keeping Your Eye On the Ball: Trajectory Attention in Video Transformers
Mandela Patrick*, Dylan Campbell*, Yuki M. Asano*, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques
NeurIPS, 2021   (Oral)
code | bibtex

We present trajectory attention, a drop-in self-attention block for video transformers that implicitly tracks space-time patches along motion paths. We set SOTA results on a number of action recognition datasets: Kinetics-400, Something-Something V2, and Epic-Kitchens.

predictions vs ground-truth (US) Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models
Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano
NeurIPS, 2021  
code | bibtex

We analyze the biases and distributions of GPT-2's outputs with respect to occupations. This is especially relevant as AI finds its way into hiring and automated application assessments.

the pass dataset PASS: An ImageNet replacement for self-supervised pretraining without humans.
Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi
NeurIPS Datasets and Benchmarks, 2021  
webpage | data | bibtex | pretrained models

We introduce PASS, a large-scale image dataset that does not include any humans, and show that it can be used for high-quality model pretraining while significantly reducing privacy concerns.

crops help training speed Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning.
Mandela Patrick*, Yuki M. Asano*, Bernie Huang*, Ishan Misra, Florian Metze, João F. Henriques, Andrea Vedaldi
ICCV, 2021  
code | bibtex

We better leverage latent time and space for video representation learning by computing efficient multi-crops in embedding space and using a shallow transformer to model time. This yields SOTA performance and allows for training with longer videos.

hierarchical transformations On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick*, Yuki M. Asano*, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
ICCV, 2021  
code | bibtex

We give transformations the prominence they deserve by introducing a systematic framework for composing them in contrastive learning. Systematically learning (in)variances yields SOTA video representation learning.

ramsey model Emergent inequality and business cycles in a simple behavioral macroeconomic model
Yuki M. Asano, Jakob J. Kolb, Jobst Heitzig, J. Doyne Farmer
Proceedings of the National Academy of Sciences (PNAS), 2021
code | bibtex

We build an agent-based version of a fundamental macroeconomic model and include simple decision-making heuristics. We find highly complex behavior and business cycles.

Detecting objects without supervision Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras*, Yuki M. Asano*, Francois Fagan, Andrea Vedaldi, Florian Metze

We detect objects without any supervisory signal by leveraging multi-modal signals from videos, combining contrastive and clustering-based self-supervised learning. Our model learns from video and detects objects in images.

Schematic of our method Privacy-preserving object detection
Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano
ICLR SDG workshop, 2021

We evaluate the potential of conducting object detection with blurred and GAN-swapped faces. It works well and can potentially even alleviate biases.

Schematic of our method Support-set bottlenecks for video-text representation learning
Mandela Patrick*, Po-Yao Huang*, Yuki M. Asano*, Florian Metze, Alexander Hauptmann, João F. Henriques, Andrea Vedaldi
ICLR, 2021   (Spotlight)
bibtex | talk

We use a generative objective to overcome the instance-discrimination limitations of contrastive learning, setting new state-of-the-art results in text-to-video retrieval.

clustered videos Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano*, Mandela Patrick*, Christian Rupprecht, Andrea Vedaldi
NeurIPS, 2020
code | homepage | bibtex | talk

We cluster videos from scratch via multi-modal self-supervision. We show that clustering videos well does not come for free from good representations; instead, we learn a multi-modal clustering function that treats the audio and visual streams as augmentations.

learned clusters Self-labelling via simultaneous clustering and representation learning
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020   (Spotlight)
code | blog | bibtex | ICLR talk

We propose a self-supervised learning formulation that simultaneously learns feature representations and useful dataset labels by optimizing the common cross-entropy loss for features and labels, while maximizing information.

ameyoko A critical analysis of self-supervision, or what we can learn from a single image
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020
bibtex | code | ICLR talk

We evaluate self-supervised feature learning methods and find that with sufficient data augmentation early layers can be learned using just one image. This is informative about self-supervision and the role of augmentations.

recipes Rising adoption and retention of meat-free diets in online recipe data
Yuki M. Asano* and Gesa Biermann*
Nature Sustainability, 2019
code | bibtex

We investigate dietary transitions by analysing a large-scale dataset of recipes and user ratings. We detect a consistent increase in the number of users switching to vegetarian diets and maintaining them, and show that the transition is eased by initially switching to vegetarian diets.

protonCT Monte Carlo Study of the Precision and Accuracy of Proton CT Reconstructed Relative Stopping Power Maps
G. Dedes, Y. M. Asano, N. Arbor, D. Dauvergne, J. Letang, E. Testa, S. Rit, K. Parodi
Medical Physics, 2016

In my BSc thesis, I investigated how proton computed tomography (pCT) can be modelled using Monte-Carlo-based software. We simulated an ideal pCT scanner and scans of several cylindrical phantoms with tissue-equivalent inserts of various sizes.

Other activities
In Munich, I was the founder and president of a student-run management consultancy for non-profits, 180DC Munich. With great interdisciplinary colleagues, we helped more than 30 NGOs improve their impact measurement and effectiveness.
internships I am a curious person.
I have gained valuable experience in consulting and, more recently, in the technology sector, including internships at Facebook AI Research and TransferWise.
More to come.

Great template from Jon Barron