Testing Distributions for Goodness of fit, Homogeneity, and Independence
published: Feb. 25, 2008, recorded: December 2007, views: 4126
Description
In this talk, I will describe algorithms for several fundamental statistical inference tasks. The main focus of this research is the sample complexity of each task as a function of the domain size of the underlying discrete probability distributions. The algorithms are given access only to i.i.d. samples from the distributions and make inferences based on these samples. The inference tasks studied here are: (i) similarity to a fixed distribution (i.e., goodness-of-fit); (ii) similarity between two distributions (i.e., homogeneity); (iii) independence of joint distributions; and (iv) entropy estimation. For each of these tasks, an algorithm with sublinear sample complexity is presented (e.g., a goodness-of-fit test on a discrete domain of size $n$ is shown to require $O(\sqrt{n}\,\mathrm{polylog}(n))$ samples). Accompanying lower-bound arguments show that all but one of these algorithms achieve near-optimal performance. Given some extra information about the distributions (for example, that the distribution is monotone or unimodal with respect to a fixed total order on the domain), the sample complexity of these tasks becomes polylogarithmic in the domain size.
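To give a feel for how a goodness-of-fit test can get by with far fewer than $n$ samples, here is a minimal sketch of a collision-based test against the uniform distribution. It is not claimed to be the algorithm presented in the talk; it is a standard textbook-style idea used here purely for illustration. The function names `collision_rate` and `looks_uniform`, the parameter `eps`, and the acceptance threshold $(1 + \epsilon^2/2)/n$ are all assumptions made for this sketch. The idea is that two independent samples from a distribution $p$ collide with probability $\sum_i p_i^2$, which equals $1/n$ for the uniform distribution and is at least $(1 + \epsilon^2)/n$ when $p$ is $\epsilon$-far from uniform in $L_1$ distance.

```python
from collections import Counter


def collision_rate(samples):
    """Fraction of unordered sample pairs that hit the same domain element.

    This is an unbiased estimate of sum_i p_i^2 for the sampled distribution p.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples to form a pair")
    counts = Counter(samples)
    # An element seen c times contributes c*(c-1)/2 colliding pairs.
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = m * (m - 1) // 2
    return colliding_pairs / total_pairs


def looks_uniform(samples, n, eps):
    """Accept if the collision rate is close to 1/n.

    The threshold (1 + eps^2 / 2) / n is an illustrative choice (an assumption
    of this sketch): it sits halfway between the uniform value 1/n and the
    smallest value (1 + eps^2) / n that an eps-far distribution can have.
    """
    return collision_rate(samples) <= (1 + eps ** 2 / 2) / n
```

With a number of samples on the order of $\sqrt{n}$ (times a factor depending on `eps`), the collision rate concentrates well enough to separate the two cases, which is the intuition behind the sublinear sample complexity mentioned above.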