To address the difficulty of optimizing weights caused by the vanishing gradient problem, the paper proposes an autoencoder-based framework for structural damage identification.

1. Vanishing gradient problem. When gradient-based methods are used to train deep networks, the gradients can become vanishingly small during the weight-update process, so the early layers learn very slowly.
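A minimal numpy sketch of why this happens with sigmoid activations (the depth, pre-activation values, and random seed are my own illustrative choices, not from the paper): backprop multiplies one derivative factor per layer, and each sigmoid derivative is at most 0.25, so the product shrinks roughly geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # maximum value is 0.25 (at z = 0)

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=20)   # hypothetical per-layer pre-activations
factors = sigmoid_grad(pre_activations) # one derivative factor per layer
grad_magnitude = np.cumprod(factors)    # gradient scale after backprop through k layers

print(grad_magnitude[0], grad_magnitude[9], grad_magnitude[19])
```

The printed values drop by many orders of magnitude, which is exactly the effect that makes plain gradient descent struggle in deep networks.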

SCG: Scaled Conjugate Gradient. SCG is a fast and efficient training algorithm based on a class of optimization algorithms called Conjugate Gradient Methods (CGM). SCG is not applicable to all data sets, but when used within its range of applicability it is quite efficient. Like RPROP, SCG has the advantage that no parameters must be set by hand.

2. There are two components in the proposed framework: dimensionality reduction and relationship learning.

Dimensionality reduction
Relationship learning: learns the relationship between the reduced inputs (the outputs of the dimensionality-reduction component) and the stiffness parameters

As for the autoencoder, input and output are:
Input: Vibration characteristics - frequencies and mode shapes
Output: Structural damage - stiffness reduction

3. Fine-tuning
Generally used to adapt an existing model: freeze the parameters of the first few layers and only update the last layer(s).

The training of autoencoders is usually performed in two stages: pre-training and fine-tuning. The pre-training is usually performed layer by layer, using multiple simple autoencoders to initialize the layer weights close enough to a good solution. The fine-tuning then optimizes all layers of the whole network together with respect to the final objective function.
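A minimal numpy sketch of the greedy layer-wise pre-training stage (the data shapes, layer widths, learning rate, and step count are my own illustrative assumptions, not the paper's): each layer is initialized by training a shallow autoencoder on the previous layer's output, and fine-tuning would then optimize all layers jointly against the final objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, hidden_dim, lr=0.1, steps=200):
    """Train one shallow autoencoder on X; return the encoder weights
    and the hidden representation (which feeds the next layer)."""
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(scale=0.1, size=(hidden_dim, d))
    b2 = np.zeros(d)
    for _ in range(steps):
        H = sigmoid(X @ W1 + b1)        # encode
        Xhat = H @ W2 + b2              # linear decode
        err = Xhat - X                  # reconstruction error
        # gradient descent on the squared reconstruction error
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * H * (1 - H)
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, sigmoid(X @ W1 + b1)

X = rng.normal(size=(100, 10))          # hypothetical vibration features
dims = [8, 4, 2]                        # hypothetical compressing layer widths
weights, H = [], X
for width in dims:
    W, b, H = pretrain_layer(H, width)  # greedy: one layer at a time
    weights.append((W, b))

# Fine-tuning would now update all of `weights` jointly w.r.t. the
# final objective (omitted here; this sketch covers pre-training only).
print(H.shape)   # (100, 2): the reduced representation
```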

4. FRF
Frequency Response Function
I sometimes understand this and sometimes don't; tomorrow I'll write a separate post on the relationship between the frequency response function, the Fourier transform, and the autocorrelation function.

5. PCA
Principal Component Analysis. The idea is to change the coordinate axes and project the data onto them to reduce its dimensionality. See Deep Learning (Goodfellow et al.), p. 30.

6. Support Vector Machine (to be filled in later).

7. How to do the pre-training?

8. How to do the fine-tuning?

9. Traditional autoencoder: both the encoder and the decoder have a single hidden layer.

10. Encoder
$f(\overline x)$ transforms the $d$-dimensional input vector $\overline x \in R^d$ into an $r$-dimensional vector $\overline h \in R^r$

11. $\overline { h } = f ( \overline x ) = \Phi ( W\overline x +\overline b )$
Is the encoder just an activation function? (From the equation, it is an affine map $W\overline x + \overline b$ followed by the activation $\Phi$.)

12. Decoder: transforms $\overline h$ back into a $d$-dimensional vector.
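A minimal numpy sketch of the encoder/decoder pair (the dimensions and random weights are my own illustrative choices; the weights here are untrained): the encoder is an affine map plus activation, and the decoder maps the code back to $d$ dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 2                             # hypothetical input and reduced dims

def sigmoid(z):                         # the activation Phi
    return 1.0 / (1.0 + np.exp(-z))

# Encoder: h = Phi(W x + b) -- an affine map followed by the activation
W, b = rng.normal(size=(r, d)), np.zeros(r)
# Decoder: maps the r-dimensional code back to d dimensions
W2, b2 = rng.normal(size=(d, r)), np.zeros(d)

x = rng.normal(size=d)
h = sigmoid(W @ x + b)                  # r-dimensional code
x_hat = sigmoid(W2 @ h + b2)            # d-dimensional reconstruction

print(h.shape, x_hat.shape)   # (2,) (6,)
```

Training would adjust W, b, W2, b2 so that x_hat reconstructs x; only then is h a useful reduced representation.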

13. How is the reduction done?

14. Is an autoencoder trained the same way as ordinary deep learning, i.e., train on part of the data and validate on the rest?

15. Dimension reduction
(a traditional autoencoder has only one hidden layer)
1st layer: performs feature fusion of the frequencies and mode shapes.
2nd to $k$th layers: compress the features

16. Cost function
Note: the Dimension Reduction and Relationship Learning components each have their own cost function.
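A minimal numpy sketch of the two costs (the array shapes and noise levels are my own dummy choices, not the paper's): the dimensionality-reduction stage is scored by a reconstruction error, and the relationship-learning stage by a prediction error against the stiffness parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensionality-reduction stage: reconstruction cost (input vs. decoded input)
x = rng.normal(size=(5, 10))                      # hypothetical input features
x_hat = x + 0.1 * rng.normal(size=x.shape)        # stand-in reconstruction
cost_reduction = np.mean((x - x_hat) ** 2)

# Relationship-learning stage: prediction cost against the
# stiffness-reduction parameters (the labels)
alpha_true = rng.uniform(size=(5, 3))
alpha_pred = alpha_true + 0.05 * rng.normal(size=alpha_true.shape)
cost_relationship = np.mean((alpha_true - alpha_pred) ** 2)

print(cost_reduction, cost_relationship)
```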

17. Relationship learning
Output: predicted structural stiffness reduction parameters.

18. Fig. 2 shows only the encoder part; the decoder is not included.

19. layer-wise: layer by layer

20. Numerical study

• regularization techniques
• How to keep the model from overfitting?
• first-order algorithm: optimizes using the gradient (i.e., the first derivative)
• MSE
• R-values
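The "first-order algorithm" bullet above can be sketched in a few lines (the objective and learning rate are my own toy choices): plain gradient descent uses only the first derivative to search for a minimum.

```python
# Minimize f(w) = (w - 3)^2 using only its gradient f'(w) = 2(w - 3)
w = 0.0
lr = 0.1                       # step size
for _ in range(100):
    grad = 2.0 * (w - 3.0)     # first derivative at the current point
    w -= lr * grad             # step against the gradient

print(w)   # converges close to the minimizer w = 3
```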
21. Experimental verification

• Training data generation
Using the updated finite element model to generate input and output data for training.
22. Why does the paper never mention the decoder in the main body?