Thomas Kailath and Guanghan Xu Professor Stanford University
Genome sequencing is one of the biggest breakthroughs in science in the past two decades. Modern sequencing methods use linking data at multiple scales to reconstruct pertinent information about the genome. Many such reconstruction problems can be formulated as maximum likelihood sequence decoding from noisy linking data. We discuss two in this talk: haplotype phasing, the problem of sequencing genomic variations on each of the maternal and paternal chromosomes, and genome sca olding, the problem of nishing genome assembly using long-range 3D contact data. While maximum likelihood sequence decoding is NP-hard in both of these problems, spectral and linear programming relaxations yield e cient approximation algorithms that can provably achieve the information theoretic limits and perform well on real data. These results parallel the biggest success of information theory: e ciently achieving the fundamental limits of communication.