Video Self-distillation for Single-image Encoders: Learning Temporal Priors from Unlabeled Video
This blog post was originally published at Nota AI’s website. It is reprinted here with the permission of Nota AI.

The work proposes a simple next-frame prediction task that uses unlabeled video to enhance single-image encoders. It injects 3D geometric and temporal priors into image-based models without requiring optical flow or object tracking. It outperforms state-of-the-art self-supervised methods like DoRA […]










