Yet another post on variational inference: a quick review
This one was submitted to ICLR 2016. I spotted it on arXiv, so I grabbed it right away to study.
============================
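Right from the start, the paper says that generative models, graphical models included, model a sequence by assuming that some more fundamental hidden unit is at work underneath, so with that goal in mind we model this latent variable.
The core approach is to assume that latent variables z_1, ..., z_T ∈ R^n, which are correlated across t, underlie correlated observations x_1, ..., x_T ∈ R^m. So the usual tools, such as the linear dynamical system (LDS) and hidden Markov models, all exist to model this latent variable.
First, let's look at what the standard model looks like.
For this model we have to pin down a parameter theta.
But the z above can take infinitely many values, so to stay general we can give z a model of its own, i.e. let z itself be generated by a function.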
The point of this notation is to make clear that g_φ(x, ·) is a deterministic function: all randomness in q comes from the random variable ε. This allows us to approximate the gradient using the simple estimator,
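To make the "all randomness comes from ε" idea concrete, here is a minimal sketch of a reparameterized gradient estimate in one dimension. It is my own toy example, not the paper's code: the function name `reparam_gradient`, the choice g(ε) = mu + sigma * ε, and the test function f(z) = z^2 are all assumptions made for illustration.

```python
import numpy as np

def reparam_gradient(f_grad, mu, sigma, num_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E_{z ~ N(mu, sigma^2)}[f(z)].

    Assumes the deterministic map z = g(eps) = mu + sigma * eps with
    eps ~ N(0, 1), so dz/dmu = 1 and dz/dsigma = eps (chain rule).
    f_grad is the derivative f'(z) of the integrand.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(num_samples)   # the only source of randomness
    z = mu + sigma * eps                     # deterministic given eps
    df = f_grad(z)
    grad_mu = np.mean(df)                    # average of f'(z) * dz/dmu
    grad_sigma = np.mean(df * eps)           # average of f'(z) * dz/dsigma
    return grad_mu, grad_sigma

# Toy check with f(z) = z^2, where E[f] = mu^2 + sigma^2,
# so the exact gradients are 2*mu and 2*sigma.
grad_mu, grad_sigma = reparam_gradient(lambda z: 2.0 * z, mu=1.5, sigma=0.5)
```

In a real model f would be the log joint and the derivatives would come from automatic differentiation rather than a hand-written `f_grad`.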
One choice is called the Gaussian approximate posterior,
so that z is distributed as multivariate normal with mean μ_φ(x) and covariance Σ_φ(x) = R_φ(x) R_φ(x)^T.
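A small sketch of what the factorization Σ = R R^T buys you: sampling becomes z = μ + R ε, and the covariance is positive semi-definite by construction. Here `mu` and `R` are fixed arrays I made up; in the paper they would be outputs of a network that depends on x.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3

# Stand-ins for mu_phi(x) and R_phi(x); hypothetical values, not from the paper.
mu = np.array([1.0, -2.0, 0.5])
R = np.tril(rng.standard_normal((dim, dim)))  # any real R gives a PSD R @ R.T
Sigma = R @ R.T

# Draw samples z_i = mu + R @ eps_i with eps_i ~ N(0, I).
eps = rng.standard_normal((200_000, dim))
z = mu + eps @ R.T

# Empirical moments should match mu and Sigma up to Monte Carlo error.
emp_mean = z.mean(axis=0)
emp_cov = np.cov(z, rowvar=False)
```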
So the equation at the top becomes
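For reference, with a Gaussian q the evidence lower bound takes the following standard form. This is written from the general definition (expected log joint plus entropy), not copied from the paper, so the notation and the treatment of constants may differ from the authors' version; z here is the whole sequence stacked into one nT-dimensional vector.

```latex
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
      \big[\log p_\theta\big(x,\ \mu_\phi(x) + R_\phi(x)\,\epsilon\big)\big]
    + \mathbb{H}\big[q_\phi(z \mid x)\big],
\qquad
\mathbb{H}\big[q_\phi(z \mid x)\big]
  = \tfrac{1}{2}\log\big|\Sigma_\phi(x)\big|
    + \tfrac{nT}{2}\big(1 + \log 2\pi\big).
```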
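Here, though, n is the dimension of each z_t (think of something like a word-vector size) and T is the sequence length, so Σ is an nT × nT matrix and its number of parameters grows quadratically in T.
So this needs some further treatment.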
A simple workaround is to consider Σ with a special structure that reduces the effective number of parameters. Possible examples include using diagonal covariance (fully factorized, or “mean field” approximation)
So the mean-field approximation really just restricts that Gaussian's covariance to be diagonal.
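A quick bit of arithmetic shows how much the diagonal restriction saves. The sizes n = 10 and T = 100 are numbers I picked for illustration, and the two helper functions are hypothetical names.

```python
def full_cov_params(n, T):
    """Free parameters of a symmetric (n*T) x (n*T) covariance."""
    d = n * T
    return d * (d + 1) // 2

def mean_field_params(n, T):
    """Free parameters of a diagonal covariance: one variance per coordinate."""
    return n * T

n, T = 10, 100                      # illustrative sizes only
full = full_cov_params(n, T)        # 1000 * 1001 / 2 = 500500
diag = mean_field_params(n, T)      # 1000
```

The price of those savings is that a diagonal Σ cannot express any correlation between time steps.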
============================================================================
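The focus of this work is the design of this covariance matrix. The downside of MFA is that it ignores the dependencies between the elements of the sequence, so we have to take those dependencies into account. But once we do, the covariance matrix becomes too large, so the authors design it so that the covariance still captures those dependencies while its parameter count grows only linearly with T.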
To do so, we borrow from the toolkit of the standard Kalman filter. In an LDS model with Gaussian observations, the posterior is a multivariate Gaussian with a block tri-diagonal inverse covariance. This block-tridiagonal structure results from (and expresses) the conditional independence properties of the LDS prior.
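To see where the block-tridiagonal structure comes from, here is a sketch that builds the joint precision (inverse covariance) of an LDS prior. The model z_1 ~ N(0, Q0), z_{t+1} = A z_t + w_t with w_t ~ N(0, Q), and the particular values of A, Q, Q0 are my assumptions for the example. Gaussian observations would only add terms to the diagonal blocks, so the band structure stays the same.

```python
import numpy as np

def lds_prior_precision(A, Q, Q0, T):
    """Joint precision of (z_1, ..., z_T) under z_1 ~ N(0, Q0),
    z_{t+1} = A z_t + w_t, w_t ~ N(0, Q). Each transition term
    (z_{t+1} - A z_t)^T Q^{-1} (z_{t+1} - A z_t) touches only blocks t and t+1.
    """
    n = A.shape[0]
    Qinv = np.linalg.inv(Q)
    J = np.zeros((n * T, n * T))
    J[:n, :n] = np.linalg.inv(Q0)
    for t in range(T - 1):
        a = slice(t * n, (t + 1) * n)          # block for z_t
        b = slice((t + 1) * n, (t + 2) * n)    # block for z_{t+1}
        J[a, a] += A.T @ Qinv @ A
        J[b, b] += Qinv
        J[a, b] -= A.T @ Qinv
        J[b, a] -= Qinv @ A
    return J

n, T = 2, 5
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # illustrative dynamics
Q = 0.5 * np.eye(n)
Q0 = np.eye(n)

J = lds_prior_precision(A, Q, Q0, T)
Sigma = np.linalg.inv(J)                 # dense; formed here only to inspect it

# Boolean mask of entries lying outside the block-tridiagonal band.
block_idx = np.arange(n * T) // n
outside_band = np.abs(block_idx[:, None] - block_idx[None, :]) > 1
```

The precision J is zero outside the band while Σ itself is dense, and J has T diagonal blocks plus T - 1 off-diagonal blocks, which is where the linear growth in T comes from.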
This is the core of the paper!!!
In short, for learning φ and θ we need never explicitly represent any part of Σ. For data analysis and model comparison, however, it may be useful to compute the covariance cov(z_t, z_{t+1}). These covariances correspond to the block-diagonal and first block off-diagonals of Σ, and may also be computed efficiently.
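Here is a rough sketch of sampling from N(μ, J^{-1}) using only a Cholesky factor of the precision J, without forming Σ. The scalar-state (n = 1) tridiagonal J is a made-up example, and I use dense NumPy routines for brevity; a banded solver would be what actually gives cost linear in T. This is my illustration of the idea, not the authors' algorithm.

```python
import numpy as np

T = 6
# Hypothetical tridiagonal precision for a scalar latent state (n = 1).
J = 2.0 * np.eye(T) - 0.8 * (np.eye(T, k=1) + np.eye(T, k=-1))
mu = np.zeros(T)

# J = L @ L.T; for a tridiagonal J the factor L is lower bidiagonal.
L = np.linalg.cholesky(J)

# Solving L.T z = eps gives cov(z) = L^{-T} L^{-1} = J^{-1}, so Sigma never
# has to be built in order to sample.
rng = np.random.default_rng(0)
eps = rng.standard_normal((T, 100_000))
z = mu[:, None] + np.linalg.solve(L.T, eps)

# For checking only: compare against the explicit inverse.
Sigma = np.linalg.inv(J)
emp_cov = np.cov(z)
neighbor_cov = np.array([emp_cov[t, t + 1] for t in range(T - 1)])
exact_neighbor_cov = np.array([Sigma[t, t + 1] for t in range(T - 1)])
```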
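So the authors only model the direct dependence between adjacent elements, and they do it through the inverse covariance.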
----------------------------------------------
That's it. Haha, and so in the end it didn't get accepted~