Background
LSTM is mainly used to address the problem of exponentially vanishing or exploding gradients in recurrent networks.
https://www.youtube.com/watch?v=YCzL96nL7j0&t=267s
The main difference between LSTM and a plain RNN is that LSTM has two memory paths: one for short-term memory and one for long-term memory.
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
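The paper compares the translation ability of the RNN Encoder-Decoder and the newly proposed grConv, and finds that grConv has the advantage and is able to pick up grammatical structure.
Because the sentence to be translated is mapped to a fixed-length vector, the memory needed for training is fixed and small: about 500 MB, in contrast to tens of GB.
But there are also open questions: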
As this approach is relatively new, there has not been much work on analyzing the properties and behavior of these models. For instance: What are the properties of sentences on which this approach performs better? How does the choice of source/target vocabulary affect the performance? In which cases does the neural machine translation fail?
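Where it is less fancy:
A recurrent neural network (RNN) works on a variable-length sequence $x = (x_1, x_2, \dots, x_T)$ by maintaining a hidden state $h$ that changes over time.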
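The recurrence can be sketched in a few lines. This is a minimal illustration that assumes a plain tanh cell; the weight names `W_xh`, `W_hh`, `b_h` and the sizes are hypothetical and not taken from the paper.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a plain tanh RNN over a sequence xs of shape (T, input_dim)."""
    h = np.zeros(W_hh.shape[0], dtype=xs.dtype)   # initial hidden state h_0
    for x_t in xs:                                # one step per time index t = 1..T
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)  # h_t depends on x_t and h_{t-1}
    return h

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # hypothetical sizes
W_xh = rng.standard_normal((hidden_dim, input_dim)).astype(np.float32)
W_hh = rng.standard_normal((hidden_dim, hidden_dim)).astype(np.float32)
b_h = np.zeros(hidden_dim, dtype=np.float32)

# Sequences of different lengths end in hidden states of the same size.
h_short = rnn_forward(rng.standard_normal((2, input_dim)).astype(np.float32), W_xh, W_hh, b_h)
h_long = rnn_forward(rng.standard_normal((7, input_dim)).astype(np.float32), W_xh, W_hh, b_h)
```

The final state has a fixed size regardless of T, which is what lets an encoder map a sentence of any length to a fixed-length vector.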
This is a new neural network proposed in the paper to replace the encoder in the RNN Encoder-Decoder; the paper calls it the gated recursive convolutional neural network (grConv).
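One recursion level of such a gated network can be sketched as below. This is only an illustration under assumptions: each parent node mixes a newly computed activation with its left and right children using three weights that sum to 1 (obtained here with a softmax), and the parameter names `W_l`, `W_r`, `G_l`, `G_r` are hypothetical. The exact parameterisation is in the paper.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, so each row of weights sums to 1."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def grconv_level(h, W_l, W_r, G_l, G_r):
    """Merge n node vectors of shape (n, d) into n - 1 parent vectors."""
    left, right = h[:-1], h[1:]                         # adjacent children
    candidate = np.tanh(left @ W_l.T + right @ W_r.T)   # new activation from both children
    w = softmax(left @ G_l.T + right @ G_r.T)           # (n - 1, 3): w_c, w_l, w_r
    parents = w[:, 0:1] * candidate + w[:, 1:2] * left + w[:, 2:3] * right
    return parents, w

def grconv_encode(x, params):
    """Apply levels until a single fixed-length vector remains."""
    h = x
    while h.shape[0] > 1:
        h, _ = grconv_level(h, *params)
    return h[0]

rng = np.random.default_rng(0)
d = 8  # hypothetical hidden size
params = (rng.standard_normal((d, d)), rng.standard_normal((d, d)),
          rng.standard_normal((3, d)), rng.standard_normal((3, d)))
words = rng.standard_normal((5, d))        # stand-in for 5 word vectors
_, gate_weights = grconv_level(words, *params)
sentence_vector = grconv_encode(words, params)
```

When a gate puts nearly all its weight on the left or right child, that child is simply copied upward, so the pattern of gate weights can be read as a soft tree structure over the sentence.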
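In the figure, (a) is a recursive convolutional NN (what is this? #question) and (b) is grConv.
grConv lets each hidden unit learn, through trained weights $\omega$, to choose among three inputs (the new candidate activation and the left and right child activations), where $\omega_c+\omega_l+\omega_r=1$. This gives it the ability to learn grammatical structure on its own, as shown in figures (c) and (d). A very intuitive figure. #paradigm
raw data: for each subject (S1, S2, …), each action (walking, waiting, smoking, …), each sub sequence (1/2):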
$(n) \times 99$ (np.ndarray, float32)
data_utils.load_data()
used by translate.read_all_data()
train data: a dictionary keyed by (subject_id, action, subaction_id, 'even') whose values are the raw data downsampled to the even rows, with one-hot encoding columns appended for the action type. If a single action is specified (the normal case), this amounts to appending one all-ones column to the raw data. Size of each dictionary value:
$(n/2) \times (99 + \text{actions count})$
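The construction of one training entry can be sketched as follows. This is an illustration, not the code in data_utils.load_data(): the helper name `build_train_entry` and the toy sizes are hypothetical, and it assumes "even rows" means rows 0, 2, 4, ...

```python
import numpy as np

def build_train_entry(raw, action, actions):
    """Keep the even rows of raw (n, 99) and append one-hot action columns."""
    even = raw[0::2, :]                                    # about n/2 rows, 99 columns
    one_hot = np.zeros((even.shape[0], len(actions)), dtype=raw.dtype)
    one_hot[:, actions.index(action)] = 1.0                # mark this sequence's action
    return np.hstack([even, one_hot])                      # (n/2, 99 + actions count)

actions = ["walking", "waiting", "smoking"]
raw = np.random.rand(100, 99).astype(np.float32)           # stand-in for one raw sequence
train_data = {(1, "walking", 1, "even"): build_train_entry(raw, "walking", actions)}

# With a single specified action the one-hot block is one all-ones column.
single = build_train_entry(raw, "walking", ["walking"])
```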
complete data: all data concatenated together, across subjects, actions, and sub sequences:
$(n) \times 99$
translate.read_all_data()
used by translate.train()
train set: normalized train data, with dimensions whose $std < 10^{-4}$ (according to the complete data) thrown out. Size of each dictionary value:
$(n/2) \times ((99 - \text{unused dimension count}) + \text{actions count})$
After analyzing the complete data, the human pose dimension is fixed at $54$.
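The normalization step could look roughly like this. It is a sketch under assumptions: statistics are computed over the complete data, dimensions with standard deviation below 1e-4 are dropped, one-hot action columns are left untouched, and the function names are hypothetical rather than the ones in the repository.

```python
import numpy as np

def normalization_stats(complete_data, eps=1e-4):
    """Per-dimension mean/std over the complete data, plus the dimensions to keep."""
    mean = complete_data.mean(axis=0)
    std = complete_data.std(axis=0)
    dims_to_use = np.where(std >= eps)[0]      # near-constant dimensions are dropped
    return mean, std, dims_to_use

def normalize(seq, mean, std, dims_to_use):
    """Normalize the pose columns of seq and keep the one-hot columns as they are."""
    n_pose = mean.shape[0]
    pose, one_hot = seq[:, :n_pose], seq[:, n_pose:]
    pose = (pose[:, dims_to_use] - mean[dims_to_use]) / std[dims_to_use]
    return np.hstack([pose, one_hot])

rng = np.random.default_rng(0)
complete = rng.standard_normal((200, 99))
complete[:, 54:] = 0.5                         # toy data: 45 constant dimensions
mean, std, dims_to_use = normalization_stats(complete)

one_hot = np.zeros((50, 3)); one_hot[:, 0] = 1.0   # 3 is a placeholder actions count
normalized = normalize(np.hstack([complete[:50], one_hot]), mean, std, dims_to_use)
```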
Seq2SeqModel.get_batch()
used by translate.train()
total_seq: $60$ ($[0,59]$)
source_seq_len: $50$
target_seq_len: $10$
batch_size: $16$
encoder_inputs: $16\times 49\times (54+\text{actions count})$
Interpretation: [batch,frame,dimension]
frame range: $[0,48]$
decoder_inputs: $16\times 10\times (54+\text{actions count})$
frame range: $[49,58]$
decoder_outputs: $16\times 10\times (54+\text{actions count})$
frame range: $[50,59]$
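These frame ranges amount to three overlapping slices of a 60-frame window. A minimal sketch for a toy batch, assuming each window has already been cut from a normalized sequence; the function name `split_window` is hypothetical and not part of Seq2SeqModel.

```python
import numpy as np

def split_window(window, source_seq_len=50, target_seq_len=10):
    """Split one (60, dim) window into encoder inputs, decoder inputs, decoder outputs."""
    assert window.shape[0] == source_seq_len + target_seq_len
    encoder_inputs = window[0:source_seq_len - 1]                        # frames 0..48
    decoder_inputs = window[source_seq_len - 1:
                            source_seq_len + target_seq_len - 1]         # frames 49..58
    decoder_outputs = window[source_seq_len:]                            # frames 50..59
    return encoder_inputs, decoder_inputs, decoder_outputs

batch_size, dim = 16, 54 + 3   # 3 is a placeholder actions count
# Toy window whose values equal the frame index, to make the ranges visible.
window = np.repeat(np.arange(60, dtype=np.float32)[:, None], dim, axis=1)
parts = [split_window(window) for _ in range(batch_size)]
encoder_inputs, decoder_inputs, decoder_outputs = (np.stack(p) for p in zip(*parts))
```

The decoder outputs are the decoder inputs shifted by one frame, so at each step the decoder is trained to predict the next frame.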
encoder_inputs: Tensor form of encoder_inputs from Seq2SeqModel.get_batch()
torch.from_numpy(encoder_inputs).float()
decoder_inputs: Tensor form of decoder_inputs from Seq2SeqModel.get_batch()
For detailed usage, please see [Adopted] human-motion-prediction-pytorch\src\predict.ipynb
The Kinect camera's output is not guaranteed to match the input format of this model (some features are cut off), so further research is needed.
Run pyKinectAzure\examples\exampleBodyTrackingTransformationComparison to get the camera output record in pyKinectAzure\saved_data, saved as .npy
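A first step toward checking compatibility is to load a saved record and compare its per-frame feature count with what the model expects. The sketch below is hedged throughout: the layout of the saved array is not documented here, so it assumes the first axis is the frame index, and it uses a stand-in file of shape (frames, 32, 3) rather than a real recording. The function name and file name are hypothetical.

```python
import os
import tempfile
import numpy as np

def inspect_recording(path, expected_dim=54):
    """Load a .npy record, flatten each frame, and compare with the model's pose dimension."""
    data = np.load(path)
    flat = data.reshape(data.shape[0], -1)     # assumption: axis 0 is the frame index
    return flat.shape, flat.shape[1] == expected_dim

# Stand-in file; a real record would come from pyKinectAzure\saved_data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_recording.npy")   # hypothetical file name
    np.save(demo_path, np.zeros((30, 32, 3), dtype=np.float32))
    shape, matches = inspect_recording(demo_path)
```

If the feature counts differ, as they do for this stand-in, a mapping from camera features to the model's 54 dimensions would have to be worked out before feeding the data in.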