Chaining mutual information

Chaining Mutual Information

In this paper (for a quick-to-read summary, see this), we use information-theoretic ideas to extend the probabilistic technique of chaining into the technique of chaining mutual information, obtaining multiscale and algorithm-dependent generalization bounds in machine learning. Roughly speaking, this is achieved by replacing the metric entropy in chaining with mutual information.
Chaining, which originated in the work of Kolmogorov, is a powerful multiscale technique in high-dimensional probability used for bounding the expected suprema of random processes. Though multiscale and elegant, its applications to the generalization theory of machine learning have mostly been limited to uniform bounds such as the VC-dimension bound, which is known to be vacuous for neural networks. More recently, advances in generalization theory such as PAC-Bayes and mutual information bounds have given rise to algorithm-dependent bounds on the generalization error based on notions and tools from information theory. Combining these probabilistic and information-theoretic ideas, we show how to extend chaining into the technique of chaining mutual information, obtaining generalization bounds that are both multiscale and algorithm-dependent. We show how the resulting bound can be much tighter than either the chaining bound or the mutual information bound alone.
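To fix notation, here is a shape-only sketch (constants omitted; not the paper's exact statement): Dudley's chaining bound controls a subgaussian process through metric entropy at every scale, the single-scale mutual information bound of Xu and Raginsky is algorithm-dependent but not multiscale, and chaining mutual information replaces the log-covering numbers of the former with mutual informations of successively finer discretizations of the learned hypothesis:

```latex
% Dudley's entropy bound (chaining; metric entropy at every scale):
\mathbb{E}\,\sup_{t \in T} X_t \;\lesssim\; \sum_{k} 2^{-k} \sqrt{\log N(T, d, 2^{-k})}

% Single-scale mutual information bound (algorithm-dependent):
\bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\le\; \sqrt{\tfrac{2\sigma^2}{n}\, I(W; S)}

% Chaining mutual information (schematic): metric entropy replaced by the
% mutual information of successively finer discretizations [W]_k of W:
\bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\lesssim\; \sum_{k} 2^{-k} \sqrt{I([W]_k;\, S)}
```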

Multiscale Gibbs distribution

Multiscale-Entropic Regularization of Neural Networks

By replacing the well-known entropic regularization of machine learning with multiscale-entropic regularization, in this paper we extend the celebrated Gibbs-Boltzmann distribution to a multiscale setting and show how it naturally applies to the multilevel architecture of neural networks.
The key notion in the powerful technique of chaining in high-dimensional probability is successive refinement of the index set. In a seemingly different world, neural networks are machine learning models that operate by successive processing of the input data. Can one harness the intrinsic power of chaining to devise training algorithms for neural networks with performance guarantees? We provide a solution to this problem by using a fundamental tool of information theory, the entropic chain rule, and show how this leads to a generalization of the well-known Gibbs-Boltzmann distribution to a multiscale setting. This approach to training neural networks is based on the entropic chain rule rather than the chain rule of derivatives (as in backpropagation).
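As background for the multiscale extension, the classical single-scale object can be sketched in a few lines of Python: the Gibbs-Boltzmann distribution over a finite hypothesis set is the minimizer of expected loss plus inverse-temperature-weighted negative entropy, i.e. the entropic regularization mentioned above. The losses and temperature below are illustrative, not taken from the paper.

```python
import math

def gibbs(losses, beta):
    # Gibbs-Boltzmann distribution: p(w) proportional to exp(-beta * L(w))
    weights = [math.exp(-beta * l) for l in losses]
    z = sum(weights)
    return [w / z for w in weights]

def free_energy(p, losses, beta):
    # The objective minimized by the Gibbs distribution:
    # E_p[L] - (1/beta) * H(p)  (expected loss minus scaled entropy)
    ent = -sum(pi * math.log(pi) for pi in p if pi > 0)
    exp_loss = sum(pi * li for pi, li in zip(p, losses))
    return exp_loss - ent / beta

losses = [0.9, 0.2, 0.5, 0.7]  # illustrative losses on a 4-point hypothesis set
beta = 2.0                     # illustrative inverse temperature
p = gibbs(losses, beta)
uniform = [1 / len(losses)] * len(losses)
# Gibbs achieves a lower (better) entropic-regularized objective than uniform:
assert free_energy(p, losses, beta) <= free_energy(uniform, losses, beta)
```

The minimum value of this objective is the free energy -(1/beta) * log Z, a standard identity that the multiscale construction in the paper generalizes via the entropic chain rule.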

Maximum multiscale entropy

Maximum Multiscale Entropy

In this paper, we extend the widely known maximum entropy problem to maximum multiscale entropy by incorporating the notion of scale. In particular, we show that for various notions of entropy and scale, given a mean constraint, the distribution that maximizes entropies at multiple scales simultaneously is obtained by a procedure analogous to the renormalization group of theoretical physics, together with a Bayesian expansion thereof. We further show applications to neural networks.
A well-known result across information theory, machine learning, and statistical physics shows that the maximum entropy distribution under a mean constraint has an exponential form called the Gibbs-Boltzmann distribution. This fact, which dates back to the work of Jaynes in 1957, has many applications, such as in statistical inference and density estimation. We investigate a generalization of this result to a multiscale setting. One of the most conspicuous properties of our world is the great variety of length scales within its structure. The data that one obtains from nature, and their accuracy, depend on the scale at which one observes the world. We blend the notions of entropy (uncertainty) and scale by defining multiscale entropies as linear combinations of the entropies of a system at different length scales. Then, for different entropies and arbitrary scale transformations, we show that the distribution maximizing multiscale entropies under a mean constraint is characterized by a procedure analogous to the renormalization group procedure of statistical physics. The renormalization group is a powerful multiscale apparatus used in the study of problems such as critical phenomena in statistical physics, as well as problems in probability theory such as universality. These results extend our previous results on the multiscale Gibbs distribution. For the application to neural networks, we further discuss excess risk bounds for this distribution.
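A minimal numerical illustration of the single-scale (Jaynes) fact that this work generalizes: among distributions on a finite set with a fixed mean energy, the Gibbs-Boltzmann distribution has maximal Shannon entropy. The energies and temperature below are illustrative; the check perturbs the Gibbs distribution along directions that preserve both normalization and the mean, and observes that entropy decreases.

```python
import math

def entropy(p):
    # Shannon entropy in nats
    return -sum(x * math.log(x) for x in p if x > 0)

energies = [0.0, 1.0, 2.0, 3.0]  # illustrative energy levels
beta = 1.0                       # illustrative inverse temperature
weights = [math.exp(-beta * e) for e in energies]
Z = sum(weights)
p = [w / Z for w in weights]     # Gibbs-Boltzmann distribution
mean_p = sum(pi * e for pi, e in zip(p, energies))

# Perturbation directions chosen so that sum(d) = 0 (normalization preserved)
# and sum(d_i * E_i) = 0 (mean energy preserved):
for d in ([1, -2, 1, 0], [0, 1, -2, 1]):
    eps = 0.01
    q = [pi + eps * di for pi, di in zip(p, d)]
    assert all(qi > 0 for qi in q)
    mean_q = sum(qi * e for qi, e in zip(q, energies))
    assert abs(mean_q - mean_p) < 1e-12   # same mean constraint satisfied
    assert entropy(q) < entropy(p)        # entropy strictly drops off the Gibbs point
```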

CH-divergence and Chernoff information

Community Detection and Compression

In this paper, we investigate the fundamental limits for compressing data on graphs while exploiting dependencies due to community structures in the graph. This is derived in part by obtaining the threshold for exact recovery in stochastic block models with strong side-information, a result of independent interest, which extends the CH-divergence threshold using Chernoff information.
We show that the CH-divergence, which has been shown to be fundamental in characterizing the exact recovery threshold of stochastic block models, is equivalent to Chernoff information between multivariate Poisson distributions. We then show how the exact recovery threshold with side-information can be expressed with Chernoff information.
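In the univariate case, this equivalence can be checked numerically: the Chernoff information C(P, Q) = max over lambda in (0, 1) of -log sum_x P(x)^lambda Q(x)^(1-lambda) between two Poisson distributions with means a and b reduces to max over lambda of [lambda*a + (1-lambda)*b - a^lambda * b^(1-lambda)], the same expression that appears inside the CH-divergence. The sketch below uses illustrative means a = 3, b = 1 and truncated Poisson supports.

```python
import math

def chernoff_exponent_numeric(a, b, lam, xmax=200):
    # -log sum_x P(x)^lam * Q(x)^(1-lam) for Poisson(a), Poisson(b),
    # with the support truncated at xmax (tail mass is negligible here)
    s = 0.0
    for x in range(xmax):
        logp = -a + x * math.log(a) - math.lgamma(x + 1)
        logq = -b + x * math.log(b) - math.lgamma(x + 1)
        s += math.exp(lam * logp + (1 - lam) * logq)
    return -math.log(s)

def chernoff_exponent_closed(a, b, lam):
    # Closed form for Poisson distributions:
    # lam*a + (1-lam)*b - a^lam * b^(1-lam)
    return lam * a + (1 - lam) * b - a**lam * b**(1 - lam)

a, b = 3.0, 1.0
lams = [i / 100 for i in range(1, 100)]
# Chernoff information = maximum of the exponent over lambda (grid search)
ci = max(chernoff_exponent_numeric(a, b, l) for l in lams)
ci_closed = max(chernoff_exponent_closed(a, b, l) for l in lams)
assert abs(ci - ci_closed) < 1e-6
```

The multivariate Poisson case treated in the paper sums this same expression coordinate-wise, which is how the CH-divergence threshold arises.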


  • Asadi, A. R., & Abbe, E. (2021). A Self-similarity Approach to Neural Network Learning. (In Preparation)
  • Asadi, A. R., Abbe, E., & Verdú, S. (2021). Information-Theoretic Chaining Techniques. (In Preparation)
  • Asadi, A. R., & Abbe, E. (2020). Maximum Multiscale Entropy and Neural Network Regularization. arXiv preprint arXiv:2006.14614. [Link]
  • Asadi, A. R., & Abbe, E. (2020). Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Networks. Journal of Machine Learning Research, 21(139), 1-32. [Link]
  • Asadi, A. R., Abbe, E., & Verdú, S. (2018). Chaining Mutual Information and Tightening Generalization Bounds. Advances in Neural Information Processing Systems (NeurIPS), pp. 7245-7254. [Link]
  • Asadi, A. R., Abbe, E., & Verdú, S. (2017). Compressing data on graphs with clusters. IEEE International Symposium on Information Theory (ISIT), pp. 1583-1587. [Link]
  • Asadi, M., & Asadi, A. R. (2014). On the Failure Probability of Used Coherent Systems. Communications in Statistics - Theory and Methods, Vol. 43, pp. 2468-2475. [Link]
  • Asadi, A. R. (2013). Problem 96.J with solution. The Mathematical Gazette, Vol. 97, No. 539, pp. 345-346, United Kingdom. [Link]


Sharif University of Technology

B.Sc. in Electrical Engineering (Communications)

B.Sc. in Mathematics

September 2010 - August 2015

Princeton University

M.A. in Electrical Engineering

September 2015 - July 2017

Ph.D. in Electrical and Computer Engineering

August 2017 - Present


  • Department of Electrical Engineering Teaching Assistant Award, Princeton University (2019)
  • Anthony Ephremides Fellowship in Electrical Engineering, Princeton University (2016)
  • Iranian Mathematical Olympiad Bronze Medal (2009)
  • Winner of the Tournament of Towns: International mathematical contest certified by the Russian Academy of Sciences (2009)
  • Membership of the Iranian National Elite Foundation (2009-present)


  • Department of Computer Science, ETH Zürich, Switzerland, Feb. 2021
  • NSF-Simons Collaboration on the Theoretical Foundations of Deep Learning, Dec. 2020
  • Department of EECS, Massachusetts Institute of Technology, Dec. 2020
  • Center for Data Science, New York University, June 2020
  • Laboratoire de Physique, École Normale Supérieure, Paris, May 2020
  • Department of Statistical Sciences, University of Toronto, Canada, Apr. 2020
  • Department of Engineering, University of Cambridge, UK, Mar. 2020
  • Institute for Advanced Study, Princeton, New Jersey, Oct. 2019 [YouTube Link]
  • Microsoft Research AI, Redmond, Washington, Sep. 2019

Teaching Assistantships

  • Transmission and Compression of Information (ELE/APC 486), Spring 2017-2018, Prof. Emmanuel Abbe
  • Probability in High Dimension (ORF/APC 550), Fall 2018-2019, Prof. Ramon van Handel


I like exercising and watching sports, especially football (soccer), Formula 1, and tennis. I play table tennis quite well. I enjoy reading Persian literature, such as poetry by Khayyam, Hafez, and Saadi (after whom Sadi Carnot, the father of thermodynamics, is named).