Chen Yulin's Blog

Posted 2024-09-29 | Updated 2025-07-24 | Review | a minute read (About 185 words)

Background

LSTM mainly addresses the problem of gradients vanishing exponentially, or exploding, in recurrent networks.

https://www.youtube.com/watch?v=YCzL96nL7j0&t=267s
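The main difference between an LSTM and a plain RNN is that an LSTM carries two memory paths: a short-term memory (the hidden state) and a long-term memory (the cell state).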

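It consists of three modules:

- Forget Gate: decides how much of the long-term memory to forget.
- Input Gate: decides how much of the current input to store in the long-term memory.
- Output Gate: decides, from the short-term memory and the input, what percentage to output. Multiplying that percentage by the activated long-term memory gives the new short-term memory, which is also the output.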
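The three gates can be sketched as a single scalar LSTM step. This is a minimal illustration rather than code from the video: the parameter dictionary `p`, its key names, and the use of scalar (instead of vector) states are assumptions made for brevity.

```python
import math


def sigmoid(x: float) -> float:
    """Squash x into (0, 1), so it can act as a percentage."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step.

    x       current input
    h_prev  previous short-term memory (hidden state)
    c_prev  previous long-term memory (cell state)
    p       hypothetical parameter dict: name -> (w_x, w_h, bias)
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: share of long-term memory kept
    i = sigmoid(pre("input"))         # input gate: share of the candidate stored
    g = math.tanh(pre("candidate"))   # candidate long-term memory from the input
    o = sigmoid(pre("output"))        # output gate: share of activated memory emitted

    c = f * c_prev + i * g            # new long-term memory
    h = o * math.tanh(c)              # new short-term memory, also the output
    return h, c


if __name__ == "__main__":
    params = {k: (0.5, 0.5, 0.0) for k in ("forget", "input", "candidate", "output")}
    h, c = 0.0, 0.0
    for x in [1.0, -0.5, 0.25]:
        h, c = lstm_step(x, h, c, params)
    print(h, c)
```

Note that the long-term memory `c` is only ever scaled and added to, never passed through a weight matrix, which is why gradients along that path decay more slowly than in a plain RNN.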

The gate concept here inspired grConv [[On the Properties of Neural Machine Translation= Encoder–Decoder Approaches]]

Posted 2024-09-27 | Updated 2025-07-24 | Review | 3 minutes read (About 376 words)

On the Properties of Neural Machine Translation= Encoder–Decoder Approaches

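Summary

Compares the translation ability of the RNN Encoder-Decoder with the newly proposed grConv (gated recursive convolutional neural network), and finds that grConv has the advantage of learning the grammatical structure of a sentence on its own.

Background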

RNN Encoder–Decoder

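Because the sentence to be translated is mapped to a fixed-length vector, the memory the model needs is fixed and small: about 500 MB, in contrast with the tens of GB needed by traditional statistical machine translation systems.
But there are also problems: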

As this approach is relatively new, there has not been much work on analyzing the properties and behavior of these models. For instance: What are the properties of sentences on which this approach performs better? How does the choice of source/target vocabulary affect the performance? In which cases does the neural machine translation fail?

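Where it falls short:

- Translation performance drops rapidly as the source sentence gets longer.
- Vocabulary size has a large effect on translation quality.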

Encoder For Variable-Length Sequences

RNN

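A recurrent neural network (RNN) works on a variable-length sequence $x = (x_1, x_2, \dots, x_T)$ by maintaining a hidden state $h$ that is updated over time.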
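A minimal sketch of that idea, assuming a scalar hidden state and a `tanh` update (the weights `w_x`, `w_h`, `b` and their default values are illustrative, not taken from the paper). Whatever the length $T$ of the input, the encoder returns a single fixed-size summary:

```python
import math


def rnn_encode(xs, w_x=0.5, w_h=0.8, b=0.0):
    """Fold a variable-length sequence into one hidden state.

    Each step mixes the current input with the previous hidden state:
        h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    """
    h = 0.0  # initial hidden state
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h  # fixed-size summary of the whole sequence


if __name__ == "__main__":
    # Sequences of different lengths map to the same kind of output: one float.
    print(rnn_encode([1.0, 0.5]))
    print(rnn_encode([1.0, 0.5, -0.3, 0.9, 0.1]))
```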

grConv

This is the new neural network the paper proposes to replace the Encoder in the RNN Encoder-Decoder. The paper calls it the gated recursive convolutional neural network (grConv).

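Figure (a) shows the recursive convolutional NN (what is this?) #question. Figure (b) shows grConv. grConv lets each hidden unit learn weights $\omega$ with which it picks among three inputs: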

where $\omega_c+\omega_l+\omega_r=1$. This gives the network the ability to learn grammatical structure on its own, as shown in figures (c) and (d).
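The three-way choice can be sketched as a convex combination of a freshly computed candidate, the left child, and the right child, with the weights normalised so they sum to 1. This is a simplified scalar illustration: the function names are hypothetical, and the fixed `gate_logits` stand in for the gate values, which the actual model computes from the two children.

```python
import math


def softmax(zs):
    """Normalise logits into weights that are positive and sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def grconv_unit(left, right, w_l=0.5, w_r=0.5, gate_logits=(0.0, 0.0, 0.0)):
    """Combine two neighbouring units from the level below."""
    candidate = math.tanh(w_l * left + w_r * right)   # new activation from both children
    omega_c, omega_l, omega_r = softmax(gate_logits)  # omega_c + omega_l + omega_r = 1
    return omega_c * candidate + omega_l * left + omega_r * right


def grconv_encode(xs, **kw):
    """Merge neighbours level by level until one unit remains (xs must be non-empty)."""
    level = list(xs)
    while len(level) > 1:
        level = [grconv_unit(level[j], level[j + 1], **kw)
                 for j in range(len(level) - 1)]
    return level[0]


if __name__ == "__main__":
    print(grconv_encode([0.2, -0.4, 0.9, 0.1]))
```

When one weight dominates, a unit simply copies its left or right child, so the pattern of dominant gates across levels traces out a tree over the input words, which is one way to read the structure in figures (c) and (d).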

A very intuitive figure #paradigm

Posted 2024-09-27 | Updated 2025-07-24 | Review | a minute read (About 112 words)

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Background: RNN

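The paper first explains how an RNN uses its hidden state to implement memory.

It then points out that RNN training suffers from vanishing/exploding gradients, and that memory decays exponentially as the sequence gets longer, so the network lacks long-term memory. Existing remedies for vanishing/exploding gradients include gradient clipping and second-order methods, but neither has been notably effective.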
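Both points can be made concrete with a small sketch. It assumes a scalar linear recurrence $h_t = w \cdot h_{t-1}$, for which the gradient of the last state with respect to the first is exactly $w^T$, and it shows clipping by global norm as one common form of gradient clipping. The function names are illustrative.

```python
import math


def gradient_factor(w, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}."""
    return w ** steps


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)          # already small enough, keep as is
    scale = max_norm / norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    # |w| < 1 shrinks the gradient exponentially, |w| > 1 blows it up.
    print(gradient_factor(0.9, 100), gradient_factor(1.1, 100))
    # Clipping keeps the direction but bounds the size of an exploding gradient.
    print(clip_by_norm([300.0, 400.0], max_norm=1.0))
```

Clipping only bounds exploding gradients. It does nothing for the vanishing case, which is part of the motivation for gated units.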

Gated RNN

[[On the Properties of Neural Machine Translation= Encoder–Decoder Approaches]]

Posted 2024-09-27 | Updated 2025-07-24 | Review | a minute read (About 203 words)

Attention Is All You Need

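Summary

The Transformer is a sequence prediction model based on the attention mechanism that needs no recurrent or convolutional networks at all, and it is easier to train.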
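These notes do not go into the mechanism itself. As a rough illustration, the paper's core operation is scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$. The sketch below uses plain Python lists in place of matrices, and the helper names are assumptions made for this example.

```python
import math


def softmax(zs):
    """Normalise scores into weights that sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0], [20.0]]
    print(attention([[5.0, 0.0], [0.0, 5.0]], keys, values))
```

Each query is processed independently of the others, so all positions can be computed in parallel. There is no step-by-step hidden state as in an RNN.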

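Background

The paper reviews the basic logic of gated RNNs/LSTMs [[Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling]] and points out:
This inherently sequential nature prevents parallelization within a training example, which becomes critical at longer sequence lengths because memory constraints limit batching across examples. Later work has improved performance somewhat, but the fundamental constraint remains.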