Because a high-resolution image can contain millions of pixels, split into thousands of patches, the attention map quickly becomes enormous. Self-attention compares every patch with every other patch, so the amount of computation grows quadratically with the number of patches as the image resolution increases.

The Vision Transformer marks an important advance in the field of computer vision,