Department of Electrical and Computer Engineering, Princeton University
My name is Amir Reza Asadi, and I am a final-year Ph.D. candidate in the Department of Electrical and Computer Engineering at Princeton University. I am fortunate to be advised by Emmanuel Abbe. Prior to Princeton, I received B.Sc. degrees in Electrical Engineering and in Mathematics from Sharif University of Technology, both in 2015. My research lies at the interface of machine learning, information theory, and high-dimensional probability.
Starting fall 2021, I will join the Statistical Laboratory in the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge as a Research Associate.
Using information-theoretic ideas, in this paper (for a quick-to-read summary, see this), we extend the probabilistic technique of chaining into the technique of chaining mutual information, obtaining multiscale, algorithm-dependent generalization bounds in machine learning. Roughly speaking, this is achieved by replacing the metric entropy in chaining with mutual information.
Chaining, a technique originating in the work of Kolmogorov, is a powerful multiscale method in high-dimensional probability for bounding the expected suprema of random processes.
Though multiscale and elegant, its applications to the generalization theory of machine learning have, by its nature, been mostly limited to uniform bounds such as the VC-dimension bound, which is known to be vacuous for neural networks. More recently, advances in generalization theory,
such as PAC-Bayes and mutual information bounds, have yielded algorithm-dependent bounds on the generalization error based on notions and tools from information theory. Combining these probabilistic and information-theoretic ideas,
we show how to extend chaining into the technique of chaining mutual information, obtaining generalization bounds that are both multiscale and algorithm-dependent. We show that the resulting bound can be much tighter than either the chaining bound or the mutual information bound alone.
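For context, a sketch of the baseline (single-scale) mutual information bound of Xu and Raginsky, which the chaining construction refines scale by scale. Here $W$ denotes the hypothesis output by the learning algorithm, $S = (Z_1, \dots, Z_n)$ the training sample, and the loss is assumed $\sigma$-sub-Gaussian; $L_\mu$ and $L_S$ are the population and empirical risks:

```latex
% Baseline mutual information generalization bound (Xu--Raginsky):
% if \ell(w, Z) is \sigma-sub-Gaussian for every hypothesis w, then
\left| \mathbb{E}\!\left[ L_\mu(W) - L_S(W) \right] \right|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)}.
```

Chaining mutual information replaces the single term $I(S;W)$ with a sum of mutual information terms between the sample and successively finer quantizations of $W$, which is what makes the bound multiscale.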
Multiscale-Entropic Regularization of Neural Networks
By replacing the well-known entropic regularization of machine learning with a multiscale-entropic regularization, in this paper we extend the celebrated Gibbs-Boltzmann distribution to a multiscale setting
and show how it naturally applies to the multilevel architecture of neural networks.
The key notion in the powerful technique of chaining in high-dimensional probability is the successive refinement of the index set. In a seemingly different world, neural networks are machine learning models that operate
by successive processing of the input data. Can the intrinsic power of chaining be used to devise training algorithms for neural networks with performance guarantees? We provide a solution to this
problem using a fundamental tool of information theory, the entropic chain rule, and show how it leads to a generalization of the well-known Gibbs-Boltzmann distribution to a multiscale setting. This approach to training neural networks is based on the entropic chain rule rather than on the chain rule of derivatives (as in backpropagation).
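For reference, the classical single-scale picture that the multiscale construction generalizes: entropic regularization adds a KL penalty to the expected empirical risk, and the Gibbs variational principle identifies its minimizer as the Gibbs-Boltzmann distribution. In this sketch, $Q$ is a prior over hypotheses and $\beta > 0$ an inverse temperature:

```latex
% Gibbs variational principle: single-scale entropic regularization.
% Minimize over distributions P on the hypothesis space:
\min_{P} \; \mathbb{E}_{W \sim P}\!\left[ L_S(W) \right]
  + \frac{1}{\beta}\, D(P \,\|\, Q)
% The minimizer is the Gibbs-Boltzmann distribution:
\quad\Longrightarrow\quad
dP^{*}(w) \;=\;
  \frac{e^{-\beta L_S(w)}\, dQ(w)}
       {\mathbb{E}_{Q}\!\left[ e^{-\beta L_S(W)} \right]}.
```

The multiscale version decomposes the single KL term via the entropic chain rule across the layered structure of the network, yielding one entropic term per scale.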
Maximum Multiscale Entropy
In this paper, we extend the widely known maximum entropy problem to a maximum multiscale entropy problem by incorporating the notion of scale. In particular, we show that for various notions of entropy and scale, given a mean constraint, the distribution that simultaneously maximizes entropies at multiple scales is obtained precisely by a procedure analogous
to the renormalization group of theoretical physics, together with a Bayesian expansion thereof. We further show applications to neural networks.
A well-known result across information theory, machine learning, and statistical physics states that the maximum entropy distribution under a mean constraint has an exponential form, called the Gibbs-Boltzmann distribution.
This fact, which dates back to the work of Jaynes in 1957, has many applications, such as in statistical inference and density estimation. We investigate a generalization of this result
to a multiscale setting. One of the most conspicuous properties of our world is the great variety of length scales within its structure; the data one obtains from nature, and their accuracy, depend on the scale at which one observes the world. We blend the notions of entropy (uncertainty) and scale by defining multiscale entropies as linear combinations of the entropies of a system at different length scales. Then, for different entropies and arbitrary scale transformations, we show that the distribution maximizing multiscale entropies under a mean constraint is characterized by a procedure analogous to the renormalization group procedure of statistical physics.
The renormalization group is a powerful multiscale apparatus used in the study of problems such as critical phenomena in statistical physics, as well as problems in probability theory such as universality. These results extend our previous results on the multiscale Gibbs distribution. As an application to neural networks, we further discuss excess risk bounds for this distribution.
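For reference, the classical single-scale statement (Jaynes' maximum entropy principle) that the multiscale result generalizes, sketched for a discrete alphabet with constraint function $f$ and mean $m$:

```latex
% Classical maximum entropy under a mean constraint (Jaynes, 1957):
\max_{P} \; H(P)
  \quad \text{subject to} \quad \mathbb{E}_{P}\!\left[ f(X) \right] = m
% The maximizer is an exponential-family (Gibbs-Boltzmann) distribution:
\quad\Longrightarrow\quad
P^{*}(x) \;=\; \frac{e^{\langle \lambda,\, f(x) \rangle}}{Z(\lambda)},
\qquad
Z(\lambda) \;=\; \sum_{x} e^{\langle \lambda,\, f(x) \rangle},
% with the multiplier \lambda chosen so that the mean constraint holds.
```

In the multiscale problem, the single entropy $H(P)$ is replaced by a linear combination of entropies of the system at different length scales, and the maximizer is characterized by the renormalization-group-type procedure described above.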
Community Detection and Compression
In this paper, we investigate the fundamental limits of compressing data on graphs while exploiting dependencies due to community structures in the graph.
This is derived in part by obtaining the threshold for exact recovery in stochastic block models with strong side information, a result of independent interest, which extends the CH-divergence threshold using Chernoff information.
We show that the CH-divergence, previously shown to characterize the exact recovery threshold of stochastic block models, is equivalent to a Chernoff information between multivariate Poisson distributions. We then show how the exact recovery
threshold with side information can be expressed in terms of Chernoff information.
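For reference, the standard definition of the Chernoff information between two distributions $P$ and $Q$ on a common discrete alphabet, which is the quantity appearing in the threshold characterization:

```latex
% Chernoff information between distributions P and Q:
C(P, Q) \;=\; -\min_{0 \le \lambda \le 1}
  \log \sum_{x} P(x)^{\lambda}\, Q(x)^{1-\lambda}.
% It is symmetric in P and Q, and it is the optimal error
% exponent in Bayesian binary hypothesis testing.
```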
Asadi, A. R., & Abbe, E. (2021). A Self-similarity Approach to Neural Network Learning. (In preparation)
Asadi, A. R., Abbe, E., & Verdú, S. (2021). Information-Theoretic Chaining Techniques. (In preparation)
Asadi, A. R., & Abbe, E. (2020). Maximum Multiscale Entropy and Neural Network Regularization. arXiv preprint arXiv:2006.14614. [Link]
Asadi, A. R., & Abbe, E. (2020). Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Networks. Journal of Machine Learning Research, 21(139), 1-32. [Link]
Asadi, A. R., Abbe, E., & Verdú, S. (2018). Chaining Mutual Information and Tightening Generalization Bounds. Advances in Neural Information Processing Systems (NeurIPS), pp. 7245-7254. [Link]
Asadi, A. R., Abbe, E., & Verdú, S. (2017). Compressing Data on Graphs with Clusters. IEEE International Symposium on Information Theory (ISIT), pp. 1583-1587. [Link]
Asadi, M., & Asadi, A. R. (2014). On the Failure Probability of Used Coherent Systems. Communications in Statistics - Theory and Methods, Vol. 43, pp. 2468-2475. [Link]
Asadi, A. R. (2013). Problem 96.J with solution. The Mathematical Gazette, Vol. 97, No. 539, pp. 345-346, United Kingdom. [Link]
Department of Electrical Engineering Teaching Assistant Award, Princeton University (2019)
Anthony Ephremides Fellowship in Electrical Engineering, Princeton University (2016)
Iranian Mathematical Olympiad Bronze Medal (2009)
Winner of the Tournament of Towns: International mathematical contest certified by the Russian Academy of Sciences (2009)
Membership of the Iranian National Elite Foundation (2009-present)
Department of Computer Science, ETH Zürich, Switzerland, Feb. 2021
NSF-Simons Collaboration on the Theoretical Foundations of Deep Learning, Dec. 2020
Department of EECS, Massachusetts Institute of Technology, Dec. 2020
Center for Data Science, New York University, June 2020
Laboratoire de Physique, École Normale Supérieure, Paris, May 2020
Department of Statistical Sciences, University of Toronto, Canada, Apr. 2020
Department of Engineering, University of Cambridge, UK, Mar. 2020
Institute for Advanced Study, Princeton, New Jersey, Oct. 2019 [YouTube Link]
Microsoft Research AI, Redmond, Washington, Sep. 2019
Transmission and Compression of Information
Prof. Emmanuel Abbe
Probability in High Dimension
Prof. Ramon van Handel
I like exercising and watching sports, especially football (soccer), Formula 1, and tennis. I play table tennis quite well. I enjoy reading Persian literature, such as poetry by Khayyam,
Hafez, and Saadi (after whom Sadi Carnot, the father of thermodynamics, is named).