Modern data science applications—ranging from graphical model learning to image registration to inference of gene regulatory networks—frequently involve exploratory pipelines that require accurate inference of a property of the distribution governing the data, rather than the distribution itself. Notable examples of such properties include Shannon entropy, mutual information, Kullback-Leibler divergence, and total variation distance.
This talk will focus on recent progress in the performance, structure, and deployment of near-minimax-optimal estimators for a large variety of properties in high-dimensional and nonparametric settings. We present general methods for constructing information-theoretically near-optimal estimators, and identify the corresponding limits in terms of the parameter dimension, the mixing rate (for processes with memory), and the smoothness of the underlying density (in the nonparametric setting). We apply our schemes to the Google 1 Billion Word Dataset to estimate the fundamental limit of perplexity in language modeling, and to improve graphical model and classification tree learning. The estimators are efficiently computable and exhibit a "sample size boosting" phenomenon, i.e., they attain with n samples what prior methods would have needed n log(n) samples to achieve.
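To make the property-estimation problem concrete, here is a minimal sketch of the classical baseline that the near-optimal estimators discussed in the talk improve upon: the naive plug-in (maximum-likelihood) estimate of Shannon entropy, together with the Miller-Madow bias correction. This is *not* the talk's estimator—it is a standard textbook baseline, and the function names are illustrative.

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Naive plug-in estimate of Shannon entropy (in nats):
    compute empirical frequencies, then evaluate H on them.
    Known to be biased downward for small sample sizes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def miller_madow_entropy(samples):
    """Plug-in estimate plus the classical Miller-Madow
    first-order bias correction (k - 1) / (2n), where k is
    the number of distinct symbols observed."""
    n = len(samples)
    k = len(set(samples))
    return plugin_entropy(samples) + (k - 1) / (2 * n)

# Example: a perfectly balanced sample over 4 symbols has
# plug-in entropy exactly log(4).
samples = ["a", "b", "c", "d"] * 25
print(plugin_entropy(samples))       # log(4) ≈ 1.3863
print(miller_madow_entropy(samples))
```

Even with the bias correction, such estimators require on the order of log(n) times more samples than the minimax-optimal schemes in the large-alphabet regime, which is exactly the "sample size boosting" gap the talk addresses.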
Published on March 28th, 2018
Last updated on March 22nd, 2018