Modern data science applications—ranging from graphical model learning to image registration to inference of gene regulatory networks—frequently involve exploratory pipelines that require accurate inference of a property of the distribution governing the data, rather than the distribution itself. Notable examples of such properties include Shannon entropy, mutual information, Kullback-Leibler divergence, and total variation distance.
This talk will focus on recent progress in the performance, structure, and deployment of near-minimax-optimal estimators for a large variety of properties in high-dimensional and nonparametric settings. We present general methods for constructing information-theoretically near-optimal estimators, and identify the corresponding limits in terms of the parameter dimension, the mixing rate (for processes with memory), and the smoothness of the underlying density (in the nonparametric setting). We apply our schemes to the Google 1 Billion Word Dataset to estimate the fundamental limit of perplexity in language modeling, and to improve graphical model and classification tree learning. The estimators are efficiently computable and exhibit a "sample size boosting" phenomenon, i.e., they attain with n samples what prior methods would have needed n log(n) samples to achieve.
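To make the property-estimation problem concrete, here is a minimal sketch of the classical baseline that the near-optimal estimators discussed in the talk improve upon: the naive plug-in (maximum-likelihood) estimate of Shannon entropy, together with the Miller-Madow bias correction. This is *not* the talk's estimator—it is a standard textbook baseline, and the function names are illustrative.

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Naive plug-in estimate of Shannon entropy (in nats):
    compute empirical frequencies, then evaluate H on them.
    Known to be biased downward for small sample sizes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def miller_madow_entropy(samples):
    """Plug-in estimate plus the classical Miller-Madow
    first-order bias correction (k - 1) / (2n), where k is
    the number of distinct symbols observed."""
    n = len(samples)
    k = len(set(samples))
    return plugin_entropy(samples) + (k - 1) / (2 * n)

# Example: a perfectly balanced sample over 4 symbols has
# plug-in entropy exactly log(4).
samples = ["a", "b", "c", "d"] * 25
print(plugin_entropy(samples))       # log(4) ≈ 1.3863
print(miller_madow_entropy(samples))
```

Even with the bias correction, such estimators require on the order of log(n) times more samples than the minimax-optimal schemes in the large-alphabet regime, which is exactly the "sample size boosting" gap the talk addresses.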
Published on March 28th, 2018
Last updated on March 22nd, 2018