[Paper Review] Tokens-to-Token ViT
This post reviews the paper Tokens-to-Token ViT.
https://arxiv.org/abs/2101.11986
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Transformers, which are popular for language modeling, have been explored for solving vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens with fixed length and ..
2022.05.05