Prior Belief, Probability, and a simple simulation

Somehow I can’t resist interesting problems. So here’s another one that led me down a simulation rabbit hole.


Given an urn (yeah right! always urns) with balls of several colours N_i such that \sum_{i} N_i  = 50. Now, Imagine that 20 balls are picked at random, and we discover that  all 20 balls are comprised only of three colors. What does that tell us about the number of colors of the different balls in the urn


Right off the bat, we can simply say that the urn most likely has balls of just three different colors. And that would be absolutely correct!

But here’s a wild idea, the urn could’ve been composed of 29 different colours. How?, you might ask. imagine we have 18 red balls, 1 blue ball, 1 pink ball and the other 30 balls share the remainder 27 different colors in any magnitude. Now, if we picked the 18 red balls, 1 blue ball, and 1 pink ball. VOILA! 20 balls, 3 colours!

The upper limit for the colors stops at the comfortable number of 33. How did I arrive at 33 colours as the limit. Pretty simple! Imagine we had 18 balls of say colour red, and the remainder of 32 balls had 32 distinct colours. We could still end up with our beautiful scenario of 20 balls and 3 colours – because 18 red, 1 blue, 1 black or 18 red, 1 pink, and 1 maroon still fits well in our hypothetical scenario.

Now let’s imagine 34 colours. We go down the same route of 17 red balls, and 33 other colours. In this scenario it would be physically impossible to select 20 balls with just three colors. since 17 + 1 + 1 would be a pitiful 19!

Now that we have used common sense to establish our upper limit on the number of colours at 33 and the lower limit at 3, is it still possible to have a more concise range? Obviously because the chances of picking 18 red, 1 blue, 1 black amidst 33 colours seems very very very unlikely.

    \begin{align*} p(18 \:colors\:and\:any\:2\:colors|X) = \dfrac{\mathlarger{\sum_{m_i}\prod_{i}\binom{N_i}{m_i} } }{\mathlarger{\binom{N}{m}}} \\    = \dfrac{\mathlarger{\binom{18}{18} \binom{1}{1} \binom{1}{1} \binom{32}{2} } }{\mathlarger{\binom{50}{20}}}  = 0.0000000000106 \end{align*}

Now, that is one pitiful probability. The probability that I’ll become a billioanire is even more than this. So it’s more obvious that there has to be a smaller bound that would make the selection of 3 colours in 20 draws seem more plausible. Right off the bat, it seems sensible to begin hammering out equations to start solving this problem. But the myriad of scenarios makes this quite intractable.


Say we had 7 colours. We could have 13 balls of colour 1, 1 ball of colour 2, 8 balls of colour 3 etc or perhaps we could have 7 balls of colour 1-6 and 8 balls of colour 7. The possibilities for just a 7 colour scheme look quite endless. And then remember that we have to consider 3,4,5,6,7…,33 colours.


Thankfully, programming and the beautiful art of simulation exist. So instead of amassing permuations the size of the gobi desert we’d just simulate random colour scheme variations with random selection over a very wide range of scenarios.

For the simulation, I have employed the humble service of C++. And the proceedure is quite simple.

  1. For each colour scheme, generate 200 different distributions to fit a 50 ball urn
  2. For each of these 200 distributions, randomly draw 20 balls, 20,000 times.
  3. Take note of the draws that result in 3 colours.

Here’s a snippet from the simulation result, and then the final result

4,4,4 colors : colors and draws => [ 1: 6 2: 5 3: 5 4: 4 ]

4,4,4 colors : colors and draws => [ 1: 6 2: 5 3: 4 4: 5 ]

4,4,4 colors : colors and draws => [ 1: 6 2: 7 3: 4 4: 3 ]

4,4,4 colors : colors and draws => [ 1: 4 2: 8 3: 3 4: 5 ]

The first digit is the number of colours (colour scheme), the next digit is the number of colours from the 20 draws, and the last string is the composition of the draw. e.g in the first line 6 balls of colour 1 was drawn, followed by 5 balls of colour 2 and then 5 balls of colour 3 and finally 4 balls of colour 4.

The results are written to a csv file, and then analyzed with python which gives the final figures at

3  colours =>    200115
4  colours =>    2711
5  colours =>      73
6  colours =>       4

beyond 6 colours, the simulation failed to produce a 3 colours in 20 draws scenario. So with a bit of confidence we can say

    \begin{align*}  \mathlarger{3 \leq k \leq 6} \end{align*}

Seems very easy right! and seems very much in line with common sense.

Complete code – https://github.com/igbokwedaniel/urn

#include<utility>
#include<map>
#include<string>
#include<fstream>
#include<algorithm>
#include<vector>
#include "rand.hpp"


#ifndef _URN_UTIL
#define _URN_UTIL

static const std::size_t URNSIZE = 50;
typedef std::pair<int, std::map<int, int>> DRAWN_URN;
typedef std::pair<int, std::array<int, URNSIZE>> URN;


/**
 * Utility Functions
 * 1. Functions for simulating random draws from the box
 * 2. Getting the basket for all different scenarios accross all color combinations
 * 3. Getting the basket for different scenarios for just one color
 * 4. getting single scenario for one one coloe
 * Fn 2-4 defined to prevent complicated nested loops
 * 5.Output stream for urn
 * 6. Simulate the draws fom the urn 
**/


std::ostream& operator<<(std::ostream &strm, const URN &urn);

std::ostream& operator<<(std::ostream &strm, const DRAWN_URN &drawn_urn);


class urn_sampler{
  
public:
  void get_all_draws(int per_cases, int min_color, int max_color);
  urn_sampler();
  
  
private:
  
  std::string file_name{"test_result.csv"};
  
  std::ofstream test_file_strm;

  int draw_num{20};
  
  rand_gen gen;
  
  void random_draws(URN &urn, int draws);

  void get_all_cases(int per_cases, int min_color, int max_color);

  void get_single_cases(int num_cases, int colors);

  void get_one_cases(int colors);

  void sample_many_draws(URN &urn, int draws);


  
};

#endif
#include "util.hpp"
#include "rand.hpp"
#include <stdexcept>
#include<iostream>


urn_sampler::urn_sampler()
{
    gen = rand_gen{};
    test_file_strm = std::ofstream{file_name};
    
}

void urn_sampler::random_draws(URN &urn, int draws)
{
    if (draws > URNSIZE)
    {
        throw std::invalid_argument("Nunber of Draws Greater than Urn size!");
    }

    std::map<int, int> draw_bag{};
    std::map<int, bool> drawn{};


    while(draws > 0)
    {
        
        int pick{gen.get_next(0,URNSIZE-1)};
        if(drawn[pick])
            continue;

        drawn[pick] = true;
        ++draw_bag[urn.second[pick]];
        --draws;

    }
    
    DRAWN_URN drawn_urn{urn.first, draw_bag};
    
    test_file_strm << urn.first << "," << draw_bag.size() << ","  << drawn_urn << std::endl;
}


void urn_sampler::sample_many_draws(URN &urn, int draws){
    for(int i  = 0; i <= 1000; i++)
        random_draws(urn,draws);
}

void urn_sampler::get_one_cases(int colors)
{
    std::array<int, URNSIZE> urn{};

    for(auto it = urn.begin(); it != urn.end(); it++)
        *it = gen.get_next(1,colors);
    URN urn_container  = URN{colors,urn};

    sample_many_draws(urn_container, draw_num);
}

void urn_sampler::get_single_cases(int num_cases, int colors)
{
    for(int i  = 0; i <num_cases ; i++)
        get_one_cases(colors);
}

void urn_sampler::get_all_cases(int per_cases, int min_color, int max_color)
{


     for(int i  = min_color; i <= max_color ; i++)
     {
         get_single_cases(per_cases,i);
     }

 
        
}

void urn_sampler::get_all_draws(int per_cases, int min_color, int max_color)
{
    get_all_cases(per_cases, min_color, max_color);
    test_file_strm.close();
      
}

std::ostream& operator<<(std::ostream &strm, const URN &urn) {
    std::string res{""};
    res +=  std::to_string(urn.first) + " colors => [ ";
    for(auto& ball : urn.second)
        res += std::to_string(ball) + " ";
    res += "]";
    return strm << res << std::endl ;

}

std::ostream& operator<<(std::ostream &strm, const DRAWN_URN &drawn_urn)
{
     std::string res{""};
    res +=  std::to_string(drawn_urn.first) + " colors : colors and draws => [ ";
    for(auto it = drawn_urn.second.begin(); it != drawn_urn.second.end(); it++)
    {
        res += std::to_string(it->first) + ": " + std::to_string(it->second) + " ";
    }

    res += "]";
    return strm << res << std::endl ;


}
#include<random>
#include "rand.hpp"

rand_gen::rand_gen(){
   
    std::default_random_engine rd;
    std::random_device random_device;

    random_num_engine   = std::mt19937(random_device());
}

int rand_gen::get_next(int min_val, int max_val){
    auto distribution = std::uniform_int_distribution<std::mt19937::result_type>(min_val, max_val);
    return distribution(random_num_engine);
}
#include<cmath>
#include<random>


#ifndef _RANDOM_GENERATOR
#define _RANDOM_GENERATOR

/**
 * Random Numbr generator base class with default
 * Mersene Twister Engine and Uniform Distribution  function
 **/
class rand_gen
{
    public:
 
        rand_gen();
        int get_next(int min_val, int max_val); 
    
    protected:
        std::mt19937 random_num_engine;
        
};

#endif
#include <iostream>
#include <fstream>
#include "rand.hpp"
#include "util.hpp"

void test();

int main(int argc, char *argv[])
{
	
	test();
	return 0;
}

void test(){
	urn_sampler sampler{};
	sampler.get_all_draws(200,3,33);

}

One thought on “Prior Belief, Probability, and a simple simulation

Leave a Reply

Your email address will not be published. Required fields are marked *