Posted 2024-09-29 | Updated 2025-08-15 | Review | a minute read (About 185 words)

Background

LSTM is mainly used to address the exponentially vanishing or exploding gradients that arise in recurrent networks.

https://www.youtube.com/watch?v=YCzL96nL7j0&t=267s
The main difference between LSTM and RNN is that LSTM has two memory paths: one for short-term memory and one for long-term memory.

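It consists of three main modules:
- Forget Gate: decides how much of the long-term memory to forget.
- Input Gate: decides how much of the current input to store in the long-term memory.
- Output Gate: decides, from the short-term memory and the input, what percentage to output; that percentage is multiplied by the activated long-term memory to give the new short-term memory, which is also the output.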
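The three gates can be sketched as a single scalar LSTM step. This is only a minimal illustration, assuming one weight per connection; the parameter names (`w_fx`, `b_f`, and so on) are hypothetical and do not come from the video or from any library.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, short_term, long_term, p):
    """One scalar LSTM step; p is a dict of hypothetical weights and biases."""
    # Forget gate: fraction of the long-term memory that is kept.
    forget = sigmoid(p["w_fx"] * x + p["w_fh"] * short_term + p["b_f"])
    # Input gate: fraction of the candidate value written to long-term memory.
    write = sigmoid(p["w_ix"] * x + p["w_ih"] * short_term + p["b_i"])
    candidate = math.tanh(p["w_cx"] * x + p["w_ch"] * short_term + p["b_c"])
    new_long = forget * long_term + write * candidate
    # Output gate: fraction of the activated long-term memory that is emitted.
    out = sigmoid(p["w_ox"] * x + p["w_oh"] * short_term + p["b_o"])
    new_short = out * math.tanh(new_long)
    return new_short, new_long


PARAM_NAMES = ("w_fx", "w_fh", "b_f", "w_ix", "w_ih", "b_i",
               "w_cx", "w_ch", "b_c", "w_ox", "w_oh", "b_o")
zero_params = dict.fromkeys(PARAM_NAMES, 0.0)
short, long_ = lstm_step(x=1.0, short_term=0.0, long_term=2.0, p=zero_params)
print(short, long_)
```

With all parameters set to zero every gate evaluates to 0.5, so half of the long-term memory is kept and nothing new is written.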

The gate concept here inspired grConv [[On the Properties of Neural Machine Translation= Encoder–Decoder Approaches]]

Posted 2024-09-27 | Updated 2025-08-15 | Review | 3 minutes read (About 376 words)

On the Properties of Neural Machine Translation= Encoder–Decoder Approaches

Summary

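Compares the translation ability of the RNN Encoder-Decoder with the newly proposed grConv, and finds that grConv has the advantage of being able to learn grammatical structure.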

Background

RNN Encoder–Decoder

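Because the sentence to be translated is mapped to a fixed-length vector, the memory needed for training is fixed and small: about 500 MB, as opposed to the tens of GB that existing SMT systems often require.
But there are also open questions: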

As this approach is relatively new, there has not been much work on analyzing the properties and behavior of these models. For instance: What are the properties of sentences on which this approach performs better? How does the choice of source/target vocabulary affect the performance? In which cases does the neural machine translation fail?

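Where it falls short:

- The performance of the neural machine translation model degrades rapidly as the length of the source sentence increases.
- The size of the vocabulary has a large effect on translation quality.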

Encoder For Variable-Length Sequences

RNN

A recurrent neural network (RNN) works on a variable-length sequence $x = (x_1, x_2, \dots, x_T)$ by maintaining a hidden state $h$ that changes over time.
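A minimal scalar sketch of this recurrence, assuming the common form $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$; the function name and the weight values are illustrative assumptions, not taken from the paper.

```python
import math


def rnn_hidden_states(xs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Return h_1..h_T for a scalar RNN with h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)."""
    h = h0
    states = []
    for x_t in xs:
        # The same weights are reused at every time step.
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states


# The same function handles sequences of any length T.
short_seq = rnn_hidden_states([1.0, -1.0])
long_seq = rnn_hidden_states([1.0, -1.0, 0.5, 0.0, 2.0])
print(len(short_seq), len(long_seq))
```

The last hidden state can then serve as the fixed-length summary of the whole sequence.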

grConv

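This is the new neural network proposed in the paper to replace the Encoder in the RNN Encoder-Decoder. The paper calls it the gated recursive convolutional neural network (grConv).

Figure (a) shows a recursive convolutional NN (what is this?) #question. Figure (b) shows grConv. grConv lets each hidden unit learn, through the trained $\omega$ parameters, to choose among three inputs: the newly computed activation, the left child, and the right child,

where $\omega_c+\omega_l+\omega_r=1$. This gives it the ability to learn grammatical structure on its own, as shown in figures (c) and (d).

A very intuitive figure #paradigm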
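A scalar sketch of one grConv unit, written from my recollection of the paper's formulation: a softmax produces $\omega_c, \omega_l, \omega_r$, and the unit outputs their weighted mix of the new activation, the left child, and the right child. All weight names and values (`w_l`, `w_r`, `g`) are hypothetical placeholders, and the exact parameterization should be checked against the paper.

```python
import math


def grconv_unit(h_left, h_right, w_l=0.5, w_r=0.5,
                g=((0.1, 0.2), (0.3, -0.1), (-0.2, 0.4))):
    """Mix the new activation, left child and right child with softmax weights."""
    # Candidate activation computed from the two children.
    h_new = math.tanh(w_l * h_left + w_r * h_right)
    # One gate logit per choice, in the order (candidate, left, right).
    logits = [g_l * h_left + g_r * h_right for g_l, g_r in g]
    # Softmax (shifted by the max for numerical stability) makes the weights sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    omega_c, omega_l, omega_r = [e / total for e in exps]
    h = omega_c * h_new + omega_l * h_left + omega_r * h_right
    return h, (omega_c, omega_l, omega_r)


h, omegas = grconv_unit(0.9, -0.4)
print(h, omegas, sum(omegas))
```

When one weight dominates, the unit effectively copies the left child, copies the right child, or merges the two, which is how a tree-like structure can emerge.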

Posted 2024-09-27 | Updated 2025-08-15 | Review | a minute read (About 112 words)

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Background: RNN

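The paper first explains how an RNN implements memory through its hidden state.

It then points out that RNN training suffers from vanishing/exploding gradients, and that memory decays exponentially as the sequence gets longer, so the network lacks long-term memory. Gradient clipping and second-order methods are the current remedies for vanishing/exploding gradients, but they have not been very effective.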
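For reference, a minimal sketch of gradient clipping by norm, assuming the gradient is a flat list of floats; `clip_by_norm` is a hypothetical helper, not a library function. Clipping only bounds exploding gradients and does nothing to restore vanished ones, which is one reason it does not solve the problem on its own.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        # Small gradients pass through unchanged.
        return list(grad)
    scale = max_norm / norm
    # The direction is preserved; only the length shrinks.
    return [g * scale for g in grad]


print(clip_by_norm([3.0, 4.0], max_norm=1.0))
print(clip_by_norm([0.3, 0.4], max_norm=1.0))
```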

Gated RNN

[[On the Properties of Neural Machine Translation= Encoder–Decoder Approaches]]

Posted 2023-05-10 | Updated 2025-08-15 | 3 minutes read (About 408 words)

Human_motion_prediction

Materials

compilations

Dataset

H3.6m

Human3.6M

Parse

Paper

Depth camera

Doc

Human motion prediction using recurrent neural networks

Repository
Paper

Input

raw data: for each subject (S1, S2, …), each action (walking, waiting, smoking, …), and each sub-sequence (1/2):
$n \times 99$ (np.ndarray, float32)

From `data_utils.load_data()` used by `translate.read_all_data()`

train data: a dictionary keyed by (subject_id, action, subaction_id, 'even') that holds the raw data restricted to its even rows, with one-hot encoding columns appended for the action type. If a single action is specified (the normal case), this amounts to appending an all-ones column to the raw data. Size of each dictionary value:
$(n/2) \times (99 + \text{actions count})$
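The construction above can be sketched in plain Python on a toy sequence. `build_train_rows` is a hypothetical helper written for illustration; it is not the repository's `data_utils.load_data()`.

```python
def build_train_rows(raw_rows, action_idx, n_actions):
    """Keep the even-indexed frames and append a one-hot action code to each."""
    one_hot = [0.0] * n_actions
    one_hot[action_idx] = 1.0
    return [list(row) + one_hot for row in raw_rows[::2]]


# Toy stand-in for one raw sequence: n = 6 frames with 99 values each.
raw = [[float(i)] * 99 for i in range(6)]

train_data = {
    (1, "walking", 1, "even"): build_train_rows(raw, action_idx=1, n_actions=3),
}
rows = train_data[(1, "walking", 1, "even")]
print(len(rows), len(rows[0]))  # 3 102

# With a single action, the one-hot code degenerates to an all-ones column.
single = build_train_rows(raw, action_idx=0, n_actions=1)
print(len(single[0]), single[0][-1])  # 100 1.0
```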

complete data: all raw data concatenated across subjects, actions, and sub-sequences:
$n \times 99$

From `translate.read_all_data()` used by `translate.train()`

train set: the normalized train data, with every dimension whose standard deviation is below $10^{-4}$ (computed over the complete data) thrown out. Size of each dictionary value:
$(n/2) \times ((99 - \text{unused dimension count}) + \text{actions count})$
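A small sketch of this filtering and normalization step using only the standard library. `column_stats` and `normalize` are hypothetical helpers, not the repository's functions, and the sketch covers only the pose columns, leaving the one-hot action columns aside.

```python
import statistics


def column_stats(complete_rows, eps=1e-4):
    """Per-column mean and std, plus the indices of columns with std >= eps."""
    columns = list(zip(*complete_rows))
    mean = [statistics.fmean(c) for c in columns]
    std = [statistics.pstdev(c) for c in columns]  # population std
    used = [i for i, s in enumerate(std) if s >= eps]
    return mean, std, used


def normalize(rows, mean, std, used):
    """Standardize each row and keep only the used columns."""
    return [[(row[i] - mean[i]) / std[i] for i in used] for row in rows]


# Toy "complete data": the middle column is constant, so it is dropped.
complete = [[1.0, 5.0, 0.0], [3.0, 5.0, 2.0]]
mean, std, used = column_stats(complete)
print(used)
print(normalize(complete, mean, std, used))
```

The statistics come from the complete data, and the same `mean`, `std` and `used` are then applied to every train sequence.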

Human Dimension

After analysis of the complete data, the human dimension is fixed at $54$.

From `Seq2SeqModel.get_batch()` used by `translate.train()`

total_seq: $60$ ($[0,59]$)

source_seq_len: $50$
target_seq_len: $10$

batch_size: $16$

encoder_inputs: $16\times 49\times (54+\text{actions count})$
Interpretation: [batch,frame,dimension]
frame range: $[0,48]$

decoder_inputs: $16\times 10\times (54+\text{actions count})$
frame range: $[49,58]$

decoder_outputs: $16\times 10\times (54+\text{actions count})$
frame range: $[50,59]$
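The frame ranges above follow from slicing one 60-frame window, which the following sketch reproduces on plain frame indices. `split_frames` is a hypothetical helper written for illustration, not the repository's `Seq2SeqModel.get_batch()`.

```python
def split_frames(source_seq_len=50, target_seq_len=10, start=0):
    """Split one window of frame indices into encoder/decoder parts."""
    total = source_seq_len + target_seq_len
    frames = list(range(start, start + total))
    # The encoder sees every source frame except the last one.
    encoder_inputs = frames[:source_seq_len - 1]
    # The decoder starts from the last source frame...
    decoder_inputs = frames[source_seq_len - 1:total - 1]
    # ...and its targets are the same frames shifted forward by one.
    decoder_outputs = frames[source_seq_len:]
    return encoder_inputs, decoder_inputs, decoder_outputs


enc, dec_in, dec_out = split_frames()
print(enc[0], enc[-1], len(enc))              # 0 48 49
print(dec_in[0], dec_in[-1], len(dec_in))     # 49 58 10
print(dec_out[0], dec_out[-1], len(dec_out))  # 50 59 10
```

This is why encoder_inputs has 49 frames rather than 50: frame 49 is consumed as the first decoder input.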

Model prediction input

encoder_inputs: Tensor form of encoder_inputs from Seq2SeqModel.get_batch()

`torch.from_numpy(encoder_inputs).float()`

decoder_inputs: Tensor form of decoder_inputs from Seq2SeqModel.get_batch()

Example

For detailed usage, please see [Adopted] human-motion-prediction-pytorch\src\predict.ipynb

Reminder

The Kinect camera's output is not guaranteed to match the input of this model (some features are cut off), so further research is needed.

Pykinect

Run pyKinectAzure\examples\exampleBodyTrackingTransformationComparison to record the camera output; the recording is saved in pyKinectAzure\saved_data as .npy

Background

Summary