DINOv2: Learning Robust Visual Features without Supervision

Editor: Nweon (映维) | Category: HCI / XR | October 19, 2023

Note: We do not have the ability to review papers.

PubDate: Apr 2023

Teams: Meta AI Research; Inria

Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski

PDF: DINOv2: Learning Robust Visual Features without Supervision

Project: DINOv2: Learning Robust Visual Features without Supervision

Abstract

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021), on most of the benchmarks at image and pixel levels.
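The abstract names an automatic data-curation pipeline but does not describe how it works. As a rough illustration only, assuming curation operates on precomputed image embeddings, a toy version could (1) drop near-duplicates from an uncurated pool and (2) keep the pool images that are most similar to a small curated seed set. The function names, the 0.95 duplicate threshold, `k`, and the 2-D toy vectors are all hypothetical. This is a sketch of a generic embedding-retrieval approach, not the authors' pipeline.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def deduplicate(pool: List[Vector], dup_threshold: float) -> List[int]:
    """Greedy near-duplicate removal (hypothetical step): keep an image
    only if it is less similar than dup_threshold to every image kept so far."""
    kept: List[int] = []
    for i, emb in enumerate(pool):
        if all(cosine(emb, pool[j]) < dup_threshold for j in kept):
            kept.append(i)
    return kept


def retrieve(curated: List[Vector], pool: List[Vector],
             candidates: List[int], k: int) -> List[int]:
    """For each curated seed embedding, keep its k most similar
    deduplicated pool images; return the union as sorted pool indices."""
    selected = set()
    for seed in curated:
        ranked = sorted(candidates, key=lambda i: cosine(seed, pool[i]), reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)


def curate(curated: List[Vector], pool: List[Vector],
           dup_threshold: float = 0.95, k: int = 1) -> List[int]:
    """Hypothetical two-stage curation: deduplicate, then retrieve around seeds."""
    candidates = deduplicate(pool, dup_threshold)
    return retrieve(curated, pool, candidates, k)


# Toy example: two curated seeds and four uncurated images (2-D stand-ins for real embeddings).
curated_seeds = [[1.0, 0.0], [0.0, 1.0]]
uncurated_pool = [[0.9, 0.1], [0.91, 0.1], [0.1, 0.9], [-1.0, 0.0]]
print(curate(curated_seeds, uncurated_pool))  # prints [0, 2]
```

Here image 1 is dropped as a near-duplicate of image 0, and image 3 is never retrieved because it is far from both seeds. The brute-force loops are quadratic; at the scale the paper targets, one would instead use an approximate nearest-neighbour library such as Faiss.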

Article link: https://paper.nweon.com/14847

You may also like...

A Low-cost Approach Towards Streaming 3D Videos of Large-scale Sport Events to Mixed Reality Headsets in Real-time

A Point-to-Distribution Joint Geometry and Color Metric for Point Cloud Quality Assessment

Prototyping of glove that turns metallic tableware into electric taste devices
