Yuki M. Asano

Computer Vision | Machine Learning | Complex Systems

I'm a PhD student in the Visual Geometry Group (VGG) working with Andrea Vedaldi and Christian Rupprecht at the University of Oxford. Prior to this I studied physics at the University of Munich (LMU) and economics in Hagen, and completed an MSc in Mathematical Modelling and Scientific Computing at the Mathematical Institute in Oxford. I also love running, the mountains ⛰️ and their combination.

Email  /  Google Scholar  /  Github /  Twitter /  LinkedIn /  CV

  • One paper I supervised was accepted to ACL's Workshop on Online Abuse and Harms. More details to follow.
  • One paper accepted to Proceedings of the National Academy of Sciences (PNAS). More details to follow.
  • New preprint: using clustering & contrastive SSL, we find objects without any supervision
  • An OxAI team I supervised has published its results at the ICLR'21 SDG workshop
  • Our new preprint on intersectional occupational biases of GPT-2 is out
  • Our paper on video-text representation learning was accepted as a Spotlight at ICLR 2021!
  • I've started volunteering my time at OxAI to help interdisciplinary teams work on AI projects.
  • Our paper on Self-Labelling Videos (SeLaVi) was accepted to NeurIPS! Code
  • Starting my summer internship June 22nd at FAIR and working with Armand Joulin and Ishan Misra.
  • I am Co-PI on an Amazon Machine Learning Award project with Christian Rupprecht and Andrea Vedaldi.
  • Awarded the 2020 Qualcomm Innovation Fellowship.
  • I'll be co-organizing a workshop at ECCV 2020: Self-Supervised Learning: What Is Next?
  • Two papers have been accepted into ICLR 2020 (incl. one Spotlight)


I'm interested in computer vision, self-supervised learning and multi-modal learning. More specifically, I want to understand the necessity and scope of prior knowledge and supervision for good neural networks. To this effect, I work with self-supervised learning and try to understand what makes things work and how far we can go without labels. I'm excited about what we can learn from data alone, from data augmentation, and videos.

Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras*, Yuki M. Asano*, Francois Fagan, Andrea Vedaldi, Florian Metze


We detect objects without any supervisory signal by leveraging multi-modal signals from videos and combining self-supervised contrastive- and clustering-based learning. Our model learns from video and detects objects in images, without the need for audio during inference.

Privacy-preserving object detection
Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano
ICLR, 2021 SDG workshop


We evaluate the potential of conducting object detection with blurred and GAN-swapped faces. It works well and can potentially even alleviate biases.

How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases
Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano


We analyze the biases and distributions of GPT-2's output with respect to occupations. This is especially relevant as AI finds its way into hiring and automated application assessments.

Support-set bottlenecks for video-text representation learning
Mandela Patrick*, Po-Yao Huang*, Yuki M. Asano*, Florian Metze, Alexander Hauptmann, João F. Henriques, Andrea Vedaldi
ICLR, 2021 (Spotlight)


We use a generative objective to address the instance-discrimination limitations of contrastive learning and set new state-of-the-art results in text-to-video retrieval.

Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano*, Mandela Patrick*, Christian Rupprecht, Andrea Vedaldi
NeurIPS, 2020

code | homepage | bibtex

Unsupervised clustering of videos via self-supervision. We show that clustering videos well does not come for free from good representations. Instead, we learn a multi-modal clustering function that treats the audio and visual streams as augmentations.

Multi-modal Self-Supervision from Generalized Data Transformations
Mandela Patrick*, Yuki M. Asano*, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi


We give transformations the prominence they deserve by introducing a systematic framework suitable for contrastive learning. We achieve state-of-the-art video representation learning by learning (in)variances systematically.

Self-labelling via simultaneous clustering and representation learning
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020 (Spotlight)
code | blog | bibtex | ICLR presentation

We propose a self-supervised learning formulation that simultaneously learns feature representations and useful dataset labels by optimizing the common cross-entropy loss for features and labels, while maximizing information.

A critical analysis of self-supervision, or what we can learn from a single image
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020
bibtex | ICLR presentation

We evaluate self-supervised feature learning methods and find that, with sufficient data augmentation, early layers can be learned using just one image. This is informative about self-supervision and the role of augmentations.

Rising adoption and retention of meat-free diets in online recipe data
Yuki M. Asano* and Gesa Biermann*
Nature Sustainability, 2019
code | bibtex

We investigate dietary transitions by analysing a large-scale dataset of recipes and user ratings. We detect a consistent increase in the number of users switching to vegetarian diets and maintaining them. We show that the transition is eased by initially switching to vegetarian diets.

Emergent inequality and endogenous dynamics in a simple behavioral macroeconomic model
Yuki M. Asano, Jakob J. Kolb, Jobst Heitzig, J. Doyne Farmer
ArXiv, 2019

We build an agent-based version of a fundamental macroeconomic model and include simple decision-making heuristics. We find highly complex behavior and business cycles.

Monte Carlo Study of the Precision and Accuracy of Proton CT Reconstructed Relative Stopping Power Maps
G. Dedes, Y. M. Asano, N. Arbor, D. Dauvergne, J. Letang, E. Testa, S. Rit, K. Parodi
Medical Physics, 2016

In my BSc thesis, I investigated how we can model proton computed tomography (pCT) using Monte Carlo-based software. We simulated an ideal pCT scanner and scans of several cylindrical phantoms with various tissue-equivalent inserts of different sizes.

Other activities
In Munich, I was the founder and president of a student-run management consultancy for non-profits, 180DC Munich. With great interdisciplinary colleagues, we have already helped more than 30 NGOs improve their impact measurement and effectiveness.
I am a curious person.
I have had the chance to gain valuable experience in consulting and, more recently, in the technology sector, including internships at Facebook AI Research and Transferwise.
More to come.

Great template from Jon Barron