Convex Optimization Over Probability Spaces

Brief Overview

A large body of problems in information theory, estimation theory, finance, and machine learning can be formulated as \[ \max_{P_X \in \mathcal{F}} \mathsf{G}\left( P_X \right) \qquad (*) \] where \(\mathsf{G}(\cdot)\) is a convex operator and \(\mathcal{F}\) is a set of feasible input distributions. Examples of such optimization problems include finding the capacity in information theory, finding least favorable distributions in statistics, and finding best- and worst-case arrival times in queuing theory, to name a few. Surprisingly, for a large class of such problems the optimizing distributions turn out to be discrete. For example, in information theory, in his seminal paper (Smith '71), Smith showed that an optimizing distribution for a Gaussian noise channel with an amplitude constraint on the input \(X\) is discrete with finitely many mass points. A similar result in the context of least favorable distributions was shown by Ghosh in (Ghosh '64). Regrettably, these proof methods rely on an argument by contradiction, which is not constructive and does not lead to a characterization of the number of mass points in the capacity-achieving input distribution. In fact, very little is known about the structure of the least favorable or capacity-achieving distribution, even in the simple setting of an amplitude constraint on the input \(X\). Our recent progress allows us to characterize many new properties of maximizers in \((*)\): (I) Our new methods can identify whether a maximizer is a discrete distribution or not. These methods are inherently different from, and simpler than, the classical and popular approach of (Smith '71) for showing discreteness of optimizing distributions. Instead, our tools are based on oscillation theorems for positive definite functions. (II) Once a maximizer is shown to be discrete, our new methods can provide tight bounds on the number of mass points.
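The discreteness phenomenon above is easy to observe numerically. The following is a minimal illustrative sketch (not the oscillation-theorem machinery described above): it restricts the amplitude-constrained Gaussian-channel input to a finite grid in \([-A, A]\) and runs Blahut-Arimoto iterations to maximize the mutual information. The function name, grid sizes, and iteration count are our own illustrative choices, not from any referenced work.

```python
import numpy as np

def blahut_arimoto_amplitude(A=1.0, sigma=1.0, n_x=21, n_y=400, iters=3000):
    """Maximize I(X; X+N), N ~ N(0, sigma^2), over input distributions
    supported on a grid in [-A, A], via Blahut-Arimoto iterations.
    All parameters are illustrative choices for this sketch."""
    x = np.linspace(-A, A, n_x)
    y = np.linspace(-A - 6 * sigma, A + 6 * sigma, n_y)
    # Discretized channel law: W[i, j] ~ P(Y = y_j | X = x_i), rows normalized.
    W = np.exp(-0.5 * ((y[None, :] - x[:, None]) / sigma) ** 2)
    W /= W.sum(axis=1, keepdims=True)
    p = np.full(n_x, 1.0 / n_x)  # start from the uniform input
    for _ in range(iters):
        q = p @ W                # output distribution induced by p
        # D[i] = KL divergence between W(. | x_i) and q
        D = (W * np.log(W / q[None, :])).sum(axis=1)
        p = p * np.exp(D)        # multiplicative Blahut-Arimoto update
        p /= p.sum()
    # Exact (discretized) mutual information of the final input p.
    q = p @ W
    D = (W * np.log(W / q[None, :])).sum(axis=1)
    return x, p, float(p @ D)
```

For \(A = \sigma = 1\) this falls in the regime where the optimizer is known to be binary at \(\pm A\), and the iteration concentrates essentially all probability mass on the two grid endpoints, illustrating Smith's discreteness result; the returned mutual information (in nats) stays below the power-constrained Gaussian capacity \(\tfrac{1}{2}\log(1 + A^2/\sigma^2)\).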
To the best of our knowledge, this is the only approach available in the theory of optimizing nonlinear functionals over probability spaces that can provide bounds on the number of mass points of extremizing distributions. We have fruitfully applied these methods to study capacity-achieving input distributions (CAIDs) in point-to-point Gaussian and Poisson channels, optimal reconstruction distributions in rate-distortion theory, and least favorable prior distributions (LFPDs) for the MMSE.

Tutorials
Journals
Conference
