Research
I'm interested in computer vision, self-supervised and multi-modal learning, as well as causality and privacy in AI.
|
|
Self-Guided Diffusion Models
Vincent Tao Hu*, David W Zhang*, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek
CVPR 2023
bibtex
We propose using self-supervision to provide diffusion models with a guidance signal; this works better than label guidance.
|
|
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano*, Aaqib Saeed*
ICLR, 2023
website | code | talk | bibtex
We show that it is possible to extrapolate to semantic classes such as those of ImageNet using just a single datum as visual input. We leverage knowledge distillation for this and achieve performances of 94%/74% on CIFAR-10/100, 66% on ImageNet, and, by extending this method to audio, 84% on SpeechCommands.
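The distillation step can be illustrated with the generic temperature-softened objective of Hinton et al.; this is a sketch of the standard loss, not necessarily the paper's exact formulation, and the temperature T=4.0 is an illustrative value:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Temperature-softened KL divergence between teacher and student.

    Generic knowledge-distillation objective (Hinton et al.);
    T=4.0 is illustrative, not necessarily the paper's setting.
    """
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    # KL(p_t || p_s) per sample, averaged over the batch; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=1)
    return T * T * kl.mean()
```

With a single image, the student would see heavily augmented crops while the teacher provides the soft targets; matching logits exactly drives the loss to zero.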
|
|
Causal Representation Learning for Instantaneous and Temporal Effects
Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves
ICLR, 2023
bibtex
A causal representation learning method that can identify causal variables with instantaneous effects and their graph from temporal sequences with interventions.
|
|
Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers
Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano
arXiv 2022
bibtex
We propose to adapt frozen vision transformers by providing input-dependent prompts computed by a lightweight network. We surpass linear and full finetuning on multiple benchmarks.
|
|
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu, Yuki M. Asano, James Thewlis, Christian Rupprecht
ECCV 2022
bibtex
We propose to utilize the "comments" modality, which is common for internet data, and show that it can improve vision-language learning.
|
|
Less than Few: Self-Shot Video Instance Segmentation
Pengwan Yang, Yuki M. Asano, Pascal Mettes, Cees G. M. Snoek
ECCV 2022
bibtex
We propose to tackle the task of video instance segmentation by leveraging self-supervised learning to generate support samples at inference time for improved performance.
|
|
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Kirk
Workshop on Gender Bias in Natural Language Processing at NAACL, 2022   (Oral)
bibtex
We investigate bias and mitigation strategies when using GPT-3 for generating job advertisements.
|
|
CITRIS: Causal Identifiability from Temporal Intervened Sequences
Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves
ICML, 2022
bibtex
We do visual causal representation learning using videos. Our method is able to identify causal variables by intervening on them and observing their effects in time.
|
|
Self-Supervised Learning of Object Parts for Semantic Segmentation
Adrian Ziegler, Yuki M. Asano
CVPR, 2022
bibtex
We learn to segment objects in a self-supervised manner by detecting and combining self-segmented object parts, starting from SSL-pretrained ViTs.
|
|
Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras*, Yuki M. Asano*, Francois Fagan, Andrea Vedaldi, Florian Metze
CVPR, 2022
bibtex
We detect objects without any supervisory signal by leveraging multi-modal signals from videos and combining self-supervised contrastive- and clustering-based learning. Our model learns from video and detects objects in images.
|
|
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing
Iro Laina, Yuki M. Asano, Andrea Vedaldi
ICLR, 2022
bibtex
We propose quantized reverse probing as an information-theoretic measure to assess the degree to which self-supervised visual representations align with human-interpretable concepts. This measure can also detect when the representation correlates with combinations of labelled concepts (e.g. "red apple") instead of just individual attributes ("red" and "apple" separately).
|
|
Keeping Your Eye On the Ball: Trajectory Attention in Video Transformers
Mandela Patrick*, Dylan Campbell*, Yuki M. Asano*, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques
NeurIPS, 2021   (Oral)
code | bibtex
We present trajectory attention, a drop-in self-attention block for video transformers that implicitly tracks space-time patches along motion paths. We set SOTA results on a number of action recognition datasets: Kinetics-400, Something-Something V2, and Epic-Kitchens.
|
|
Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models
Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano
NeurIPS, 2021  
code | bibtex
We analyze the biases and distributions of GPT-2's outputs w.r.t. occupations. This is especially relevant as AI finds its way into hiring and automated application assessment.
|
|
PASS: An ImageNet replacement for self-supervised pretraining without humans.
Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi
NeurIPS Datasets and Benchmarks, 2021  
webpage | data | bibtex | pretrained models
We introduce PASS, a large-scale image dataset that does not include any humans, and show that it can be used for high-quality model pretraining while significantly reducing privacy concerns.
|
|
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning.
Mandela Patrick*, Yuki M. Asano*, Bernie Huang*, Ishan Misra, Florian Metze, João F. Henriques, Andrea Vedaldi
ICCV, 2021  
code | bibtex
We better leverage latent time and space for video representation learning by computing efficient multi-crops in embedding space and using a shallow transformer to model time. This yields SOTA performance and allows for training with longer videos.
|
|
On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick*, Yuki M. Asano*, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
ICCV, 2021  
code | bibtex
We give transformations the prominence they deserve by introducing a systematic framework suitable for contrastive learning. We achieve SOTA video representation learning by learning (in)variances systematically.
|
|
Emergent inequality and business cycles in a simple behavioral macroeconomic model
Yuki M. Asano, Jakob J. Kolb, Jobst Heitzig, J. Doyne Farmer
Proceedings of the National Academy of Sciences (PNAS), 2021
code | bibtex
We build an agent-based version of a fundamental macroeconomic model and include simple decision-making heuristics. We find highly complex behavior and business cycles.
|
|
Privacy-preserving object detection
Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano
ICLR, 2021 SGD workshop
bibtex
We evaluate the potential of conducting object detection with blurred and GAN-swapped faces. It works well and can potentially even alleviate biases.
|
|
Support-set bottlenecks for video-text representation learning
Mandela Patrick*, Po-Yao Huang*, Yuki M. Asano*, Florian Metze, Alexander Hauptmann, João F. Henriques, Andrea Vedaldi
ICLR, 2021   (Spotlight)
bibtex | talk
We use a generative objective to overcome the instance-discrimination limitations of contrastive learning, setting new state-of-the-art results in text-to-video retrieval.
|
|
Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano*, Mandela Patrick*, Christian Rupprecht, Andrea Vedaldi
NeurIPS, 2020
code | homepage | bibtex | talk
We cluster videos from scratch via multi-modal self-supervision. We show that clustering videos well does not come for free from good representations. Instead, we learn a multi-modal clustering function that treats the audio and visual streams as augmentations.
|
|
Self-labelling via simultaneous clustering and representation learning
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020   (Spotlight)
code | blog | bibtex | ICLR talk
We propose a self-supervised learning formulation that simultaneously learns feature representations and useful dataset labels by optimizing the common cross-entropy loss for features and labels, while maximizing information.
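The balanced label-assignment step at the core of this formulation can be sketched with a Sinkhorn-Knopp iteration over the model's class scores; the matrix sizes, lambda, and iteration count below are illustrative values, not the paper's settings:

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=50, lam=25.0):
    """Approximate an equipartitioned soft label assignment.

    Given per-sample class scores, produce an assignment matrix Q whose
    columns (classes) each receive roughly the same total mass -- the
    balanced-assignment idea behind self-labelling. `lam` and `n_iters`
    are illustrative hyperparameters.
    """
    # P[i, k] proportional to exp(lam * score of sample i for class k)
    P = np.exp(lam * (logits - logits.max()))
    P /= P.sum()
    N, K = P.shape
    r = np.ones(N) / N   # target row marginals: one unit per sample
    c = np.ones(K) / K   # target column marginals: equal-sized classes
    for _ in range(n_iters):
        P *= (r / P.sum(axis=1))[:, None]   # normalize rows to r
        P *= (c / P.sum(axis=0))[None, :]   # normalize columns to c
    return P

rng = np.random.default_rng(0)
Q = sinkhorn_knopp(rng.normal(size=(512, 10)))
# after the final column step, each class holds exactly 1/10 of the mass
```

The equipartition constraint is what prevents the degenerate solution of assigning every sample to a single cluster, which plain cross-entropy self-labelling would otherwise collapse to.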
|
|
A critical analysis of self-supervision, or what we can learn from a single image
Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
ICLR, 2020
bibtex | code | ICLR talk
We evaluate self-supervised feature learning methods and find that with sufficient data augmentation early layers can be learned using just one image. This is informative about self-supervision and the role of augmentations.
|
|
Rising adoption and retention of meat-free diets in online recipe data
Yuki M. Asano* and Gesa Biermann*
Nature Sustainability, 2019
PDF | code | bibtex
We investigate dietary transitions by analysing a large-scale dataset of recipes and user ratings. We detect a consistent increase in the number of users switching to vegetarian diets, and maintaining them. We show that the transition is eased by initially switching to vegetarian diets.
|
|
Monte Carlo Study of the Precision and Accuracy of Proton CT Reconstructed Relative Stopping Power Maps
G. Dedes, Y. M. Asano, N. Arbor, D. Dauvergne, J. Letang, E. Testa, S. Rit, K. Parodi
Medical Physics, 2016
bibtex
In my BSc thesis, I investigated how we can model proton computed tomography (pCT) using Monte-Carlo-based software. We simulated an ideal pCT scanner and scans of several cylindrical phantoms with various tissue-equivalent inserts of different sizes.
|
 |
In Munich, I was the founder and president of 180DC Munich, a student-run management consultancy for non-profits. With great interdisciplinary colleagues, we have helped more than 30 NGOs improve their impact measurement and effectiveness.
|
|
I am a curious person.
I have had the chance to gain valuable experience in consulting and, more recently, in the technology sector, including internships at Facebook AI Research and Transferwise.
More to come.
|
|