Sampling With Replacement / Sampling Without Replacement

What do DevOps engineers really do?
September 17, 2017
6 of my favorite Jenkins plugins
September 22, 2017
Show all

Sampling With Replacement / Sampling Without Replacement

Sampling is a popular Spark RDD operation. Sampling With Replacement and sampling without replacement are different ways of doing sampling. This article explains the difference between them.

Sampling with replacement:

Consider a population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or 18 potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly one sack with each number. So the whole population has seven sacks. If I sample two with replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I replace it. Then I pick another. Every one of them still has 1/7 probability of being chosen. And there are exactly 49 different possibilities here (assuming we distinguish between the first and second.) They are: (12,12), (12,13), (12, 14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,13), (13,14), etc.

Sampling without replacement:

Consider the same population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or 18 potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly one sack with each number. So the whole population has seven sacks. If I sample two without replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I pick another. At this point, there are only six possibilities: 12, 13, 15, 16, 17, and 18. So there are only 42 different possibilities here (again assuming that we distinguish between the first and the second.) They are: (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,14), (13,15), etc.

What’s the Difference?

When we sample with replacement, the two sample values are independent. Practically, this means that what we get on the first one doesn’t affect what we get on the second. Mathematically, this means that the covariance between the two is zero.

In sampling without replacement, the two sample values aren’t independent. Practically, this means that what we got on the for the first one affects what we can get for the second one. Mathematically, this means that the covariance between the two isn’t zero. That complicates the computations. In particular, if we have a SRS (simple random sample) without replacement, from a population with variance , then the covariance of two of the different sample values is , where N is the population size. (A brief summary of some formulas is provided here. For a discussion of this in a textbook for a course at the level of M378K, see the chapter on Survey Sampling in Mathematical Statistics and Data Analysis by John A. Rice, published by Wadsworth & Brooks/Cole Publishers. There is an outline of an slick, simple, interesting, but indirect, proof in the problems at the end of the chapter.)

Population size — Leading to a discussion of “infinite” populations.

When we sample without replacement, and get a non-zero covariance, the covariance depends on the population size. If the population is very large, this covariance is very close to zero. In that case, sampling with replacement isn’t much different from sampling without replacement. In some discussions, people describe this difference as sampling from an infinite population (sampling with replacement) versus sampling from a finite population (without replacement).


Reprinted from https://www.ma.utexas.edu

 

James Lee
James Lee
James Lee is a passionate software wizard working at one of the top Silicon Valley-based startups specializing in big data analysis. In the past, he has worked on big companies such as Google and Amazon In his day job, he works with big data technologies such as Cassandra and ElasticSearch, and he is an absolute Docker technology geek and IntelliJ IDEA lover with strong focus on efficiency and simplicity.

6 Comments

  1. This site is absolutely fabulous!

  2. Keep up the great work guyz.

  3. I have been exploring for a bit for any high-quality articles or weblog posts in this kind of area . Exploring in Yahoo I ultimately stumbled upon this website. Reading this information So i am satisfied to exhibit that I’ve a very just right uncanny feeling I found out exactly what I needed. I such a lot unquestionably will make sure to don’t disregard this site and provides it a look regularly.

  4. Hi there! I know this is kinda off topic but I was wondering if you knew where I could locate a captcha plugin for my comment form? I’m using the same blog platform as yours and I’m having trouble finding one? Thanks a lot!

  5. I hope you all are having a great weekend. I have a new list for you. Read the latest update on how I compiled the list. I’m still surprised by the results.

  6. Thanks for sharing your thoughts on meta_keyword. Regards

Leave a Reply

Your email address will not be published.

LEARN HOW TO GET STARTED WITH DEVOPS

get free access to this free guide, downloaded over 200,00 times !

You have Successfully Subscribed!

Level Up Big Data Pdf Book

LEARN HOW TO GET STARTED WITH BIG DATA

get free access to this free guide, downloaded over 200,00 times !

You have Successfully Subscribed!

Jenkins Level Up

Get started with Jenkins!!!

get free access to this free guide, downloaded over 200,00 times !

You have Successfully Subscribed!