Concepts: Data Simulation

Data prediction involves many advanced mathematical concepts; however, much of that math is abstracted away by data analysis software. Whether using one of the many R libraries or clicking through the RapidMiner GUI, a user does not need to understand the math behind a predictive model. Knowing the formal mathematical proofs might not help the business user, but understanding the concepts at a practical level might.

The stochastic (Monte Carlo) methods used in business involve four steps:
1. Specify the domain of possible inputs.
2. Generate random inputs from a specified probability distribution over that domain.
3. Compute an output based on the selected inputs.
4. Repeat this process MANY times, aggregating the outputs.
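The four steps map directly onto a small loop. The following is a minimal Python sketch, not taken from any particular library or tool: the helper names monte_carlo, roll_two_dice and is_seven are hypothetical, and the dice example (exact answer 1/6) is chosen only because its true value is easy to check.

```python
import random

def monte_carlo(sample_input, compute_output, n_trials, seed=None):
    """Generic four-step Monte Carlo loop (hypothetical helper, a sketch only)."""
    rng = random.Random(seed)                 # seeded generator so runs are reproducible
    outputs = []
    for _ in range(n_trials):                 # Step 4: repeat many times
        x = sample_input(rng)                 # Steps 1-2: draw a random input from the domain
        outputs.append(compute_output(x))     # Step 3: compute an output for that input
    return sum(outputs) / len(outputs)        # Step 4: aggregate the outputs (here, the mean)

# Hypothetical example: probability that two fair dice sum to 7 (exact value 1/6).
def roll_two_dice(rng):
    # Steps 1-2: the domain is {1..6} x {1..6}, sampled uniformly
    return rng.randint(1, 6), rng.randint(1, 6)

def is_seven(roll):
    # Step 3: output 1 if the dice sum to 7, otherwise 0
    return 1 if sum(roll) == 7 else 0

estimate = monte_carlo(roll_two_dice, is_seven, 100_000, seed=42)
print(round(estimate, 3))  # should land near 1/6 ≈ 0.167
```

Averaging the 0/1 outputs turns them into an estimated probability; other problems might aggregate with a histogram or percentiles instead of a mean.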

In most curricula, the standard way to demonstrate stochastic simulation is to estimate pi (π) by brute-force computation. The image above shows a circle of radius r inscribed in a square. Let’s establish a few things about it:

1. The area of the circle above is pi times r squared, or πr^2.

2. The area of the square is 2r × 2r = 4r^2. Each side of the square (call it “a”) spans two radii, so a = 2r, and since the area of a square is a^2, you multiply 2r by 2r.

3. Dividing the circle’s area by the square’s area gives πr^2 / 4r^2 = π/4, so the ratio of the two shapes is π/4 regardless of r.
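Putting points 1 to 3 together gives the estimator the simulation relies on. In this sketch N is the total number of random points and N_inside is the number that land inside the circle; both symbols are introduced here for illustration, and the approximation assumes the points are spread uniformly over the square.

```latex
\frac{A_{\text{circle}}}{A_{\text{square}}}
  = \frac{\pi r^2}{(2r)^2}
  = \frac{\pi r^2}{4r^2}
  = \frac{\pi}{4}
\qquad\Longrightarrow\qquad
\pi = 4 \cdot \frac{A_{\text{circle}}}{A_{\text{square}}}
  \approx 4 \cdot \frac{N_{\text{inside}}}{N}
```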

The stochastic simulation randomly generates millions of points within the domain of the square. The fraction of these points that falls inside the circle approximates π/4, so multiplying it by 4 gives an estimate of π, something like 3.134 compared with the true value of 3.14159.
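Applied to this example, the four steps might look like the following minimal Python sketch. It assumes r = 1, so the square spans -1 to 1 on both axes, and uses only Python’s standard random module; estimate_pi is a hypothetical helper name, not part of any library.

```python
import random

def estimate_pi(n_points, seed=None):
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)            # seeded generator so runs are reproducible
    inside = 0
    for _ in range(n_points):            # Step 4: repeat many times
        # Steps 1-2: a random point, uniform over the square (r = 1, side 2r = 2)
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Step 3: count the point if it lies inside the unit circle
        if x * x + y * y <= 1.0:
            inside += 1
    # Step 4: aggregate; the inside fraction approximates pi/4, so scale by 4
    return 4 * inside / n_points

result = estimate_pi(1_000_000, seed=1)
print(result)
```

Because the ratio π/4 does not depend on r, fixing r = 1 loses nothing. The printed digits vary with the seed and the number of points.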

With scarce data, the idea is to progressively narrow the range of possible outcomes: as the steps are repeated and more data is generated, we systematically increase our confidence that outcomes fall within a particular range.
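This narrowing can be made concrete by attaching an interval to the π estimate. The sketch below assumes a normal approximation to the binomial proportion, with z = 1.96 for roughly 95% coverage; the function name pi_with_interval is hypothetical. The interval width shrinks roughly in proportion to 1/√n, so 100 times more points buys about 10 times more precision.

```python
import math
import random

def pi_with_interval(n_points, seed=None, z=1.96):
    """Return (estimate, lower, upper) for pi using a normal-approximation interval."""
    rng = random.Random(seed)
    # Count points from the square [-1, 1] x [-1, 1] that land inside the unit circle
    inside = sum(
        1 for _ in range(n_points)
        if rng.uniform(-1.0, 1.0) ** 2 + rng.uniform(-1.0, 1.0) ** 2 <= 1.0
    )
    p_hat = inside / n_points                        # estimated fraction inside (≈ pi/4)
    se = math.sqrt(p_hat * (1 - p_hat) / n_points)   # binomial standard error of p_hat
    # Scale the proportion and its interval by 4 to express them in terms of pi
    return 4 * p_hat, 4 * (p_hat - z * se), 4 * (p_hat + z * se)

for n in (100, 10_000, 1_000_000):
    est, lo, hi = pi_with_interval(n, seed=0)
    print(f"n={n:>9,}  estimate={est:.4f}  95% interval=({lo:.4f}, {hi:.4f})  width={hi - lo:.4f}")
```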

Below is a great link to Eve Astrid Andersson’s Java app that demonstrates the calculation of Pi Using the Monte Carlo Method: