
Designing Parameter and Compute Efficient Diffusion Transformers using Distillation

Note: We do not have the ability to review papers.

PubDate: Feb 2025

Teams: University of Illinois Urbana-Champaign

Writers: Vignesh Sundaresha

PDF: Designing Parameter and Compute Efficient Diffusion Transformers using Distillation

Abstract

Diffusion Transformers (DiTs) with billions of model parameters form the backbone of popular image and video generation models such as DALL·E, Stable Diffusion, and Sora. Although these models are needed in many low-latency applications like Augmented/Virtual Reality, they cannot be deployed on resource-constrained Edge devices (such as the Apple Vision Pro or Meta Ray-Ban glasses) due to their huge computational complexity. To overcome this, we turn to knowledge distillation and perform a thorough design-space exploration to achieve the best DiT for a given parameter size. In particular, we provide principles for how to choose design knobs such as depth, width, attention heads, and distillation setup for a DiT. During this process, a three-way trade-off emerges between model performance, size, and speed that is crucial for Edge implementations of diffusion. We also propose two distillation approaches, the Teaching Assistant (TA) method and the Multi-In-One (MI1) method, to perform feature distillation in the DiT context. Unlike existing solutions, we demonstrate and benchmark the efficacy of our approaches on practical Edge devices such as the NVIDIA Jetson Orin Nano.
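The abstract does not spell out the distillation objective, but feature distillation in this setting typically trains the student to match the teacher's intermediate activations. Below is a minimal PyTorch-style sketch under that assumption; the class name, the layer mapping, and the dimensions (384 for a DiT-S-sized student, 1152 for a DiT-XL-sized teacher) are illustrative and do not reproduce the paper's exact TA or MI1 formulations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureDistillLoss(nn.Module):
    """Match selected student hidden states to teacher hidden states.

    A linear projection bridges the width gap when the student DiT is
    narrower than the teacher, a common setup in feature distillation.
    (Hypothetical sketch; not the paper's exact TA / MI1 method.)
    """

    def __init__(self, student_dim, teacher_dim, layer_map):
        super().__init__()
        self.layer_map = layer_map  # (student_block, teacher_block) index pairs
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feats, teacher_feats):
        # Each feats list holds one [batch, tokens, dim] tensor per block.
        loss = 0.0
        for s_idx, t_idx in self.layer_map:
            loss = loss + F.mse_loss(
                self.proj(student_feats[s_idx]),
                teacher_feats[t_idx].detach(),  # no gradients flow into the teacher
            )
        return loss / len(self.layer_map)


# Hypothetical usage: a 12-block student distilling from a 28-block teacher,
# mapping each student block to an evenly spaced teacher block.
distill_loss = FeatureDistillLoss(
    student_dim=384,
    teacher_dim=1152,
    layer_map=[(i, round(i * 27 / 11)) for i in range(12)],
)
```

In practice this feature-matching term would be added, with some weight, to the student's usual denoising loss; the even-spacing layer map above is just one plausible choice for pairing a shallow student with a deeper teacher.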

