Random problems with R

Mathematical Software 2018-11-14 v3 Computation

Abstract

R (Version 3.5.1 patched) has an issue with its random sampling functionality. R generates random integers between $1$ and $m$ by multiplying random floats by $m$, taking the floor, and adding $1$ to the result. Well-known quantization effects in this approach result in a non-uniform distribution on $\{1, \ldots, m\}$. The difference, which depends on $m$, can be substantial. Because the sample function in R relies on generating random integers, random sampling in R is biased. There is an easy fix: construct random integers directly from random bits, rather than multiplying a random float by $m$. That is the strategy taken in Python's numpy.random.randint() function, among others. Example source code in Python is available at https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py (see functions getrandbits() and randbelow_from_randbits()).
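
The quantization effect and the bit-based fix can be illustrated with a minimal Python sketch. It is not the code from the repository linked above, and the names `float_multiply_counts` and `uniform_int_from_bits` are hypothetical. The sketch assumes the random float takes the form $k / 2^w$ for a uniformly random $w$-bit integer $k$; the small value of $w$ in the usage note is chosen only to make the bias visible and is not a claim about R's internals.

```python
import secrets
from collections import Counter


def float_multiply_counts(m, w):
    """Exact outcome counts of floor(m * u) + 1 over every float u = k / 2**w.

    Assumption (illustrative): u is built from a uniformly random w-bit
    integer k, so each k in [0, 2**w) is equally likely.
    """
    # (m * k) >> w equals floor(m * k / 2**w) in exact integer arithmetic.
    return Counter(((m * k) >> w) + 1 for k in range(2 ** w))


def uniform_int_from_bits(m, getrandbits=secrets.randbits):
    """Draw an integer uniformly from {1, ..., m} using random bits only.

    Rejection sampling: draw just enough bits to cover 0..m-1 and retry
    whenever the draw lands outside that range.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    nbits = max(1, (m - 1).bit_length())
    while True:
        x = getrandbits(nbits)
        if x < m:
            return x + 1
```

With `m = 3` and `w = 4`, `float_multiply_counts(3, 4)` gives counts of 6, 5, and 5 for the outcomes 1, 2, and 3: the 16 equally likely floats cannot be split evenly among 3 outcomes, so the result is non-uniform. The rejection approach avoids this because every accepted draw is equally likely, at the cost of an occasional retry.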

Cite

@article{arxiv.1809.06520,
  title  = {Random problems with R},
  author = {Kellie Ottoboni and Philip B. Stark},
  journal= {arXiv preprint arXiv:1809.06520},
  year   = {2018}
}