Posted 2025-01-09 · Updated 2025-01-15 · Note

AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
https://www.youtube.com/watch?v=j3VNqtJUoz0&t=16s
Posted 2024-09-27 · Updated 2025-01-15 · Review

Attention Is All You Need

Summary: The Transformer is a sequence prediction model based entirely on attention mechanisms; it needs no recurrent or convolutional networks and is easier to train.

Background: The paper first reviews the basic logic of gated RNNs/LSTMs [[Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling]] and points out that their inherently sequential nature precludes parallelization within a training example, which becomes critical at longer sequence lengths, since memory constraints limit batching across examples. Later work improved performance in places, but this fundamental limitation was never removed.
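For context, the operation that replaces recurrence in the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k)·V. Below is a minimal NumPy sketch (the function name, shapes, and example data are illustrative, not from the note) showing that all positions in a sequence are processed in a few matrix products, with no step-by-step loop:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V: arrays of shape (seq_len, d_k). Every position attends
    to every other position at once; there is no recurrence.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)     # row-wise softmax
    return weights @ V                            # weighted sum of values

# Illustrative usage: 4 positions, width 8, self-attention (Q = K = V = x)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                  # (4, 8)
```

Because the attention weights for all queries are computed in one matrix multiplication, the whole sequence parallelizes on modern hardware, which is exactly the property the sequential RNN/LSTM computation described above lacks.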