RA-Swin: A RefineNet Based Adaptive Model Using Swin Transformer for Monocular Depth Estimation
PubDate: Aug 2022
Teams: Yunnan Normal University
Writers: Mengnan Chen; Jiatao Liu; Yaping Zhang; Qiaosheng Feng
Transformer-based deep learning networks have achieved extraordinary success in natural language processing (NLP) in recent years. However, Transformers face practical challenges when transferred to dense visual prediction, owing to the differences between the two fields. This paper employs a hierarchical Transformer as the feature-extraction encoder for monocular depth estimation to bridge these differences. The encoder takes the full-resolution image as input and computes self-attention within non-overlapping local windows of the feature map; shifting the windows between successive layers lets information flow across window boundaries. Different variants of the encoder are paired with an adaptive decoder built on a spatial resampling module and RefineNet. Combined with skip connections, the adaptive decoder fuses the encoder's multi-scale output features while keeping the parameter count low. Experiments show that this encoder-decoder structure, fine-tuned on the NYU Depth v2 dataset, yields substantial improvements in monocular depth estimation. Compared with the current state-of-the-art Transformer model DPT-Hybrid, the Swin-B and Swin-L based models reduce root mean square error (RMSE) by 1.12% and 2.97%, respectively, achieving better depth estimation results.
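The windowed self-attention described above can be illustrated with a minimal NumPy sketch. This is not the paper's code: the function names (`window_partition`, `shifted_windows`) and the toy 8x8 feature map are illustrative assumptions; it only shows how a feature map is split into non-overlapping windows, and how a cyclic shift (as in Swin Transformer) makes the next layer's windows straddle the previous layer's boundaries so that cross-window information can interact.

```python
import numpy as np

def window_partition(x, win):
    # x: (H, W, C) feature map; H and W assumed divisible by win.
    # Returns (num_windows, win, win, C): one local region per window,
    # inside which self-attention would be computed.
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def shifted_windows(x, win):
    # Cyclically shift the map by win//2 before partitioning, so each new
    # window mixes tokens from four of the previous layer's windows.
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

# Toy 8x8 single-channel map, 4x4 windows -> 4 windows of 16 tokens each.
fm = np.arange(64, dtype=float).reshape(8, 8, 1)
regular = window_partition(fm, 4)   # shape (4, 4, 4, 1)
shifted = shifted_windows(fm, 4)    # same shape, different token grouping
```

In the actual model, attention weights are computed independently inside each window, and alternating regular/shifted layers give the encoder a growing effective receptive field at linear cost in image size.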