Stanford Webinar: When Your Big Data Seems Too Small
Author: Stanford Online
Uploaded: 2017-03-17
Views: 24920
A Stanford Webinar presented by Stanford's Databases and Foundations in Computer Science graduate certificate programs
"When Your Big Data Seems Too Small - Accurate Inferences Beyond the Empirical Distribution"
Speaker: Gregory Valiant, Stanford University
Many of the techniques and algorithms used in machine learning and data science assume that the empirical distribution of the available data accurately approximates the primary phenomena being investigated. However, for complex or high-dimensional distributions, even large datasets can fail to represent the underlying distribution accurately. For example, in large genomic datasets many rare genetic variants are never observed, and in a large natural-language corpus many reasonable five-word sequences may never appear.
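To make the previous paragraph concrete, the sketch below computes a sample's frequency spectrum (also called its fingerprint): how many distinct items appear exactly once, exactly twice, and so on. It then uses the classical Good-Turing estimate (number of singletons divided by sample size) to gauge how much probability mass belongs to items never observed, mass that the empirical distribution sets to zero. This is a minimal illustration, assuming samples are plain lists of hashable items and a hypothetical uniform toy population; the function names are made up here, and Good-Turing is a standard estimator, not necessarily the speaker's method.

```python
import random
from collections import Counter

def fingerprint(sample):
    """Frequency spectrum: maps j -> F_j, the number of distinct items
    that appear exactly j times in the sample."""
    return dict(Counter(Counter(sample).values()))

def good_turing_unseen_mass(sample):
    """Good-Turing estimate of the total probability of all items that
    never appear in the sample: F_1 / n (singletons over sample size)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return fingerprint(sample).get(1, 0) / n

if __name__ == "__main__":
    # Hypothetical toy population: 10,000 equally likely items,
    # of which we observe only 1,000 draws.
    rng = random.Random(0)
    support = 10_000
    sample = [rng.randrange(support) for _ in range(1_000)]
    seen = set(sample)
    # Under the uniform assumption each unseen item carries mass 1/support.
    true_unseen = (support - len(seen)) / support
    print("distinct items seen:  ", len(seen))
    print("true unseen mass:     ", round(true_unseen, 3))
    print("Good-Turing estimate: ", round(good_turing_unseen_mass(sample), 3))
```

In this toy setting most of the probability mass sits on items that were never drawn, yet the fraction of singletons alone recovers roughly how large that unseen portion is.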
Join Stanford’s Dr. Gregory Valiant as he discusses the difficulties of and solutions for making accurate inferences in this challenging regime, in which the empirical distribution of the available data is misleading. Learn how to extract accurate information about the underlying distribution, including information about the portion that has not been observed in the given dataset.
You will learn:
An intuitive approach for reasoning about the distribution that underlies a given dataset
Techniques that leverage this intuition and reveal the structure of the underlying distribution, including the structure of its unseen portion, from which no data points have been observed
Practical implications of these techniques for the analysis of genomic datasets, including how to estimate the value of sequencing additional human genomes
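The last item asks how many new variants additional genomes would reveal. One classical way to pose that extrapolation question is the Good-Toulmin estimator, which predicts the number of new distinct items in t·n further draws from the fingerprint alone: U(t) = -Σ_j (-t)^j F_j. The sketch below is a hedged illustration, not the speaker's technique (the description does not specify it). It assumes every observation is an independent draw from one population, a simplification of how per-genome variant data is usually modeled, and its function name and toy population are hypothetical.

```python
import random
from collections import Counter

def good_toulmin_new_items(sample, t):
    """Good-Toulmin estimate of the number of distinct items NOT yet in
    `sample` that would appear among t * len(sample) additional draws.

    U(t) = -sum_{j>=1} (-t)**j * F_j, where F_j is the number of distinct
    items observed exactly j times. The estimate becomes unreliable for t > 1.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    spectrum = Counter(Counter(sample).values())  # j -> F_j
    return -sum(((-t) ** j) * f_j for j, f_j in spectrum.items())

if __name__ == "__main__":
    # Hypothetical held-out check: estimate from the first half of the draws,
    # then compare with how many new items the second half actually reveals.
    rng = random.Random(0)
    draws = [rng.randrange(5_000) for _ in range(2_000)]
    first = draws[:1_000]
    predicted = good_toulmin_new_items(first, t=1.0)
    actual = len(set(draws)) - len(set(first))
    print(f"predicted new items: {predicted:.0f}, actual: {actual}")
```

Keeping t at or below 1 (at most doubling the sample) matters: for larger t the alternating sum's terms grow like t^j and the estimate becomes highly unstable.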
About the Speaker:
Gregory Valiant, PhD, is an Assistant Professor in Stanford's Computer Science Department. Some of his recent projects focus on designing algorithms that accurately infer information about complex distributions from surprisingly little data. More broadly, his research interests are in algorithms, learning, applied probability and statistics, and evolution. Prior to joining Stanford, Dr. Valiant was a postdoc at Microsoft Research New England. He received his PhD in Computer Science from Berkeley and his BA in Math from Harvard.
0:00 Introduction
0:11 Today's Speaker
6:33 Beyond the Empirical Distribution (Part 1)
14:16 Estimation Beyond the Empirical Distribution
16:29 R.A. Fisher's Butterflies
26:22 Reasoning Beyond the Empirical Distribution
29:11 Recovering "frequency spectrum"
30:17 Learning the distribution, up to relabeling
33:28 GWAS inferences, predictions from 60k genomes
33:54 GWAS inferences (validation)
35:45 Estimating Covariance Spectrum
36:40 Empirical Approach
40:05 Main Theorem (informal)
40:09 Summary and Final Thoughts (recap of the three settings discussed)
41:42 Q&A
47:29 Empirical Results