### Chaining Mutual Information

Using information-theoretic ideas, in this paper (for a quick-to-read summary, see this), we extend the probabilistic technique of *chaining* into the technique of *chaining mutual information* to obtain *multiscale* and *algorithm-dependent* generalization bounds in machine learning. Roughly speaking, this is achieved by replacing the *metric entropy* of chaining with mutual information.

Chaining, which originated in the work of Kolmogorov, is a powerful multiscale technique in high-dimensional probability for bounding the expected suprema of random processes.
Though multiscale and elegant, the technique's applications to the generalization theory of machine learning have mostly been limited to uniform bounds, such as the VC-dimension bound, which is known to be vacuous for neural networks. More recently, advances in generalization theory
such as PAC-Bayes and mutual information bounds have given rise to algorithm-dependent bounds on the generalization error, built on notions and tools from information theory. Combining these probabilistic and information-theoretic ideas,
we show how to extend chaining into the technique of chaining mutual information, obtaining generalization bounds that are both multiscale and algorithm-dependent. We show that the resulting bound can be much tighter than either the chaining bound or the mutual information bound alone.
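To make the comparison concrete, here is a sketch of the two ingredients in standard notation (the notation below is our own for illustration; see the paper for the precise statements and constants). For a learning algorithm that maps a sample $S$ of $n$ i.i.d. data points to a hypothesis $W$, with a $\sigma$-subgaussian loss, the classical mutual information bound controls the expected generalization error by the single quantity $I(S;W)$, whereas the chained version sums contributions over a hierarchy of successively finer quantizations $[W]_k$ of the hypothesis:

```latex
% Classical mutual information generalization bound (single scale):
\left| \mathbb{E}\,\mathrm{gen}(S, W) \right|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)}

% Chained mutual information bound (schematic, multiscale):
% [W]_k denotes a quantization of W at scale 2^{-k};
% each scale contributes a coarser, typically much smaller,
% mutual information term I([W]_k; S).
\left| \mathbb{E}\,\mathrm{gen}(S, W) \right|
  \;\lesssim\; \sum_{k} 2^{-k} \sqrt{\, I\!\left([W]_k;\, S\right)}
```

The point of the multiscale sum is that each term involves the mutual information between the data and only a *coarse* description of the hypothesis, which can remain finite and small even when $I(S;W)$ itself is infinite.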