What “Dictionary Learning” Actually Is

nipun deelaka
Analytics Vidhya
Published in
4 min read · Jan 14, 2022


Also known as sparse dictionary learning or sparse coding, dictionary learning is one of the misunderstood concepts in the ML world despite its frequent presence in research papers.

What is dictionary learning?

First of all, this is not related to the dictionary data structure whatsoever. The idea of a ‘dictionary’ comes from building bases for a given vector space in linear algebra. The matrix of bases is also called the dictionary matrix, and the algorithms intended to generate a dictionary matrix give the technique its name: “dictionary learning”.

Dictionary learning is a sparse encoding schema. The encoding is lossless from an analytical implementation perspective. And once the dictionary is fixed, the encoding step is guaranteed to reach a global optimum in a numerical implementation, because its analytical re-formulation (with a relaxed sparsity penalty) is a convex optimization problem.

Where is dictionary learning used?

Since dictionary learning is a well-defined analytical solution for vector-space encoding, it is used everywhere from purely mathematical derivations to the training of neural networks that analyze EEG brain imagery. Signal processing, remote sensing, data compression, image processing, video processing, and neural network training are popular areas that use dictionary learning widely.

The theoretical background of dictionary learning

If you are familiar with the concepts of bases, spanning sets, and rank in linear algebra, it will be easier to get a deeper understanding of the concept we are going to discuss. If you are not familiar, or out of touch, review those topics before going on.

Simply put, dictionary learning is a numerical solution for calculating the `bases` of a given set of vectors. So why can’t we use the analytical solution? Simply because it is an NP-hard problem from a computational-complexity perspective. Hence, there is no guarantee we could obtain the solution, especially with a large matrix.

A dictionary (bases matrix) consists of atoms (bases). The atoms do not need to be orthogonal, and they may form an over-complete spanning set (violating the linear-independence property of a basis).

Dictionary learning is trained as a constrained optimization problem: the constraint keeps the sparsity of the coefficients above a given threshold, and the optimization minimizes the difference between the input data matrix ‘X’ and the product of the learned dictionary matrix ‘D’ and its coefficient matrix ‘R’.
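The formulation above can be written compactly; this is the standard ℓ0-constrained form, and the exact constraint varies by method:

```latex
\min_{D,\,R} \; \lVert X - D R \rVert_F^2
\quad \text{subject to} \quad \lVert r_i \rVert_0 \le T \;\; \forall i
```

Here r_i is the i-th column of R (the code for the i-th sample) and T is the sparsity threshold on the number of nonzero coefficients.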

In the example, x_1 = [2, -1, 0, 1], where the dimensionality ‘d’ is 4. The bases ‘D’ of the input vector space ‘X’ and its coefficient matrix ‘R’ were calculated numerically by hand. As illustrated, the encoding matrix R has lower dimensionality than the X input vector space (3 < 4). So, in operation, the R encoding space is used instead of X.
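The original hand-worked figure is not reproduced here, but the same idea can be sketched with NumPy; the dictionary D below is an illustrative choice, not the hand-calculated one from the figure:

```python
import numpy as np

x1 = np.array([2, -1, 0, 1])   # input vector, d = 4

# Illustrative 4x3 dictionary: each column is one atom
D = np.array([[1,  0, 0],
              [0,  1, 0],
              [0,  0, 1],
              [0, -1, 0]])

r = np.array([2, -1, 0])       # 3-dim sparse code: only 2 of 3 entries nonzero

print(D @ r)                   # → [ 2 -1  0  1], reconstructs x1 exactly
```

The 4-dimensional input is represented exactly by a 3-dimensional (and sparse) code, which is the point of the (3 < 4) remark above.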

Analytically, several methods have been built to solve the constrained optimization problem stated above, and each method has its pros and cons. For example, the Method of Optimal Directions (MOD) is efficient in low-dimensional spaces and guarantees an optimal solution when n > the minimal sample size. However, it is computationally heavy on higher-dimensional input vector spaces because it calculates the pseudo-inverse of R. K-SVD is another method, based on the concept of k-means.
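A minimal sketch of the MOD dictionary-update step, assuming the coefficient matrix R is already given (dense random codes are used here purely for illustration; in practice R would be sparse):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 20))   # data matrix: 4-dim features, 20 samples
R = rng.normal(size=(3, 20))   # current coefficient matrix (sparse in practice)

# MOD update: the least-squares-optimal dictionary for the fixed codes R
D = X @ np.linalg.pinv(R)      # this pseudo-inverse is the expensive step in high dimensions
```

Because D solves the least-squares problem exactly, the residual X - DR is orthogonal to the rows of R after each update.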

Practical example of dictionary learning

For implementation purposes, I used scikit-learn’s built-in DictionaryLearning module. The module supports many more configurations of dictionary learning than I implement here, so I encourage you to follow this link to get more insight into the module.
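A minimal sketch with scikit-learn’s DictionaryLearning; the random data, n_components, and alpha below are illustrative assumptions, not the article’s exact setup:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 4))       # 20 samples, 4 features

# Learn a 3-atom dictionary; alpha controls the sparsity penalty
dl = DictionaryLearning(n_components=3, alpha=0.1,
                        transform_algorithm="lasso_lars",
                        random_state=42)
R = dl.fit_transform(X)            # sparse codes, shape (20, 3)
D = dl.components_                 # dictionary atoms, shape (3, 4)

X_hat = R @ D                      # approximate reconstruction of X
```

Raising alpha drives more code entries to exactly zero, trading reconstruction error for sparsity.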

The encoding vector space generated by our implementation is sparser than the R matrix I illustrated above. You can also glimpse that, as sparsity increases, the encoding vectors become much simpler and more precise.

Why is sparsity important at all? Broadly speaking, the higher the sparsity of the encoding, the more resilient the resulting dictionary is to noise. Therefore, an algorithm fed by this vector space could easily increase accuracy while decreasing processing time. However, the “importance of sparsity in ML” is a topic for another day. Meanwhile, I encourage you to follow this article on sparse coding if you seek further knowledge.

Why is it used in machine learning?

Simply, because dictionary learning is a sparse encoding algorithm, and in machine learning the algorithms are almost always backed by a representation learning system or data encoding system. However, dictionary learning comes with some bonuses as well. If `k` > the minimal sample size and the vectors were not affected by noise, it is guaranteed that the resulting dictionary can give the optimal encoding of any point in the trained vector space, even for vectors that were not seen at training time. Also, the encodings being sparser and closer to orthogonal makes them well suited for use in a NN or a general ML algorithm.

I hope you now have a good understanding of the basics of dictionary learning. However, there is a lot to explore in dictionary learning from here, especially in the field of deep learning. And I hope this knowledge will be helpful on that journey!

Finally, thank you for reading this article!
