
In the following, we briefly describe deep learning. Our results are explained in more detail here.

Deep learning is an area of machine learning research that concentrates on finding hierarchical representations of data, starting from observations and moving towards more and more abstract representations. We presented some early work in this direction (Valpola et al., 2001; Raiko et al., 2007) using variational Bayesian learning in directed graphical models (see figure).

Nowadays, a typical building block of deep networks is the restricted Boltzmann machine (RBM). It consists of a visible and a hidden layer, each a binary vector, connected by undirected weights. Learning such models has been rather cumbersome, but we have proposed several improvements to the learning algorithm (Cho et al., 2010; Cho et al., 2011a) that make it stable and robust to the choice of learning parameters. The Gaussian-Bernoulli restricted Boltzmann machine (GRBM) is a version of the RBM for continuous-valued data; we have improved its learning algorithm in (Cho et al., 2011b). We have published software packages implementing our new algorithms. The improvements are also applicable to deep models (Cho et al., 2011c).
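To make the basic setting concrete, below is a minimal sketch of training a binary-binary RBM with one step of contrastive divergence (CD-1), the standard baseline algorithm. It does not implement the improved learning rules from the cited papers, and all sizes, the learning rate, and variable names are illustrative assumptions.

```python
import numpy as np

# Minimal CD-1 training sketch for a binary-binary RBM (illustrative only;
# not the improved algorithms from Cho et al. 2010/2011).
rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 784, 128, 0.01  # assumed, illustrative sizes

W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # visible-hidden weights
b = np.zeros(n_visible)                                 # visible biases
c = np.zeros(n_hidden)                                  # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    """One CD-1 parameter update for a mini-batch of binary visible vectors v0."""
    global W, b, c
    # Positive phase: hidden probabilities and a sample given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Gradient estimates: data statistics minus (one-step) model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)

# Example usage with random binary "data" standing in for real observations.
batch = (rng.random((32, n_visible)) < 0.5).astype(float)
cd1_update(batch)
```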

It is also possible to use multilayer perceptron (MLP) networks as auto-encoders for finding hierarchical representations of data. In an auto-encoder MLP, the input and output vectors are the same, and the network includes a middle bottleneck layer of lower dimensionality. Training such networks has been difficult or even impossible, but in (Raiko et al., 2012) we present transformations that make the optimization problem much easier.
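The following is a minimal sketch of such an auto-encoder with a low-dimensional bottleneck, trained by plain gradient descent on the reconstruction error. It illustrates the general idea only and does not include the transformations proposed in Raiko et al. (2012); the layer sizes and learning rate are illustrative assumptions.

```python
import numpy as np

# Minimal auto-encoder MLP sketch: encode to a bottleneck, reconstruct the
# input, and take one gradient step on the mean squared reconstruction error.
rng = np.random.default_rng(0)
n_in, n_bottleneck, lr = 784, 30, 0.1  # assumed, illustrative sizes

W1 = rng.standard_normal((n_in, n_bottleneck)) / np.sqrt(n_in)          # encoder
b1 = np.zeros(n_bottleneck)
W2 = rng.standard_normal((n_bottleneck, n_in)) / np.sqrt(n_bottleneck)  # decoder
b2 = np.zeros(n_in)

def train_step(x):
    """One gradient step minimizing the mean squared reconstruction error."""
    global W1, b1, W2, b2
    h = np.tanh(x @ W1 + b1)      # bottleneck code (hierarchical representation)
    x_hat = h @ W2 + b2           # linear reconstruction of the input
    err = x_hat - x               # reconstruction error
    n = x.shape[0]
    # Backpropagate the squared-error loss through decoder and encoder.
    grad_W2 = h.T @ err / n
    grad_b2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)   # derivative of tanh
    grad_W1 = x.T @ dh / n
    grad_b1 = dh.mean(axis=0)
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    return float((err ** 2).mean())

# Example usage with random inputs standing in for real data.
x = rng.standard_normal((64, n_in))
loss = train_step(x)
```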

There are also two master's theses written on the topic: