Monte Carlo Simulations in R

Monte Carlo Simulations

What is Monte Carlo Simulations?

One of the main motivations to switch from spreadsheet-type tools (such
as Microsoft Excel) to a program like R is for simulation modeling. R
allows us to repeat the same (potentially complex and detailed)
calculations with different random values over and over again.

Within the same software, we can then summarize and plot the results of
these replicated calculations. Monte Carlo methods are used to perform
this type of analysis they randomly sample from a set of values in order
to generate and summarize a distribution of some statistic related to
the sampled quantities.

Randomness

Random processes are an important aspect of simulation modeling. A
random process is one that produces a different result each time it is
run according to some rules. They are inextricably tied to the concept
of uncertainty, you have no idea what will happen the next time the
process is run.

There are two basic ways to introduce randomness in R

Random deviattes

Resampling

Random Deviates

Each person alive at the start of the year has the option of living or
dying at the conclusion of the year. There are two possible endings
here, and each person has an 80% probability of surviving. survive is
the outcome of a binomial random process in which there were n
individuals alive at the start of this year and p is the probability
that any one of them would live to the next year.

In R, we can simulate a binomial random process with p=0.8 and n=100.

rbinom(n= 1, size =100,
       prob= 0.8)
## [1] 80

At this time I got 73, but we almost certainly get different number than
this one.

With a little tinkering, we can also plot it

survivors = rbinom(1000,
                   100, 0.8)
hist(survivors,
  col = "skyblue")

We could also used other processes like log normal

The log normal process is another random process. It creates random
numbers using a log of the values that is regularly distributed, with a
mean of log mean and a standard deviation of log sd.

hist(rlnorm(1000,0,0.1),col="skyblue")

Need for sampling

There are several situations in probability, and more broadly in machine
learning, where an analytical solution cannot be calculated immediately.
In Machine Learning, a problem of class imbalance exists. In fact, some
would argue that for most practical probabilistic models, accurate
inference is impossible.

The desired calculation is usually a sum of discrete distributions or an
integral of continuous distributions, and thus is computationally
difficult. For a variety of reasons, such as the huge number of random
variables, the stochastic nature of the domain, noise in the data, a
shortage of observations, and more, the calculation may be intractable.

Resampling

Using random deviates to generate fresh random numbers is excellent, but
what if we already have a set of numbers to which we want to add
randomness? We can utilize resampling techniques to do this. To sample
size elements from the vector x in R, use the sample() function.

Resampling of 1 to 10

sample(x = 1:10, size =5)
## [1]  4  3 10  9  2

Sample with replacement

sample(x = c("a","b","c"), size = 10, replace = T)
##  [1] "a" "a" "a" "b" "a" "b" "a" "c" "b" "b"

Sample with set probalilities

sample(x = c("live","die"),size = 10, replace = T, prob = c(0.8,0.2))
##  [1] "live" "live" "die"  "die"  "live" "live" "live" "die"  "live" "live"

Reproducing Randomness

We may want to receive the same precise random integers each time we run
our script for reproducibility. To do so, we must first set the random
seed, which is the starting point of our computer’s random number
generator.

set.seed(1234)
rnorm(1)
## [1] -1.207066

Let’s try without random seed

rnorm(1)
## [1] 0.2774292

Each time we get different result.

Replication

To use Monte Carlo methods, we need to be able to replicate some random
process many times. There are two main ways this is commonly done,
either with replicate()or with for()loops.

The replicate() functions executes same expression many times and
returns the output from each execution. Say we have a vector x, which
represents 40 observations of an animal length(mm).

x = rnorm(30, 500,40)

We want to create the mean length sampling distribution “by hand.” We
can take a random sample, determine the mean, and then repeat the
process as many times as necessary.

Replication with “for” loop

A loop is a command in programming that repeats itself until it reaches
a specified point. R has a few types of loops, repeat(), while(), and
for(). for() loops are among the most common in simuation modeling.For
each value in a vector, a for() loop performs an operation for the
number of times you specify.

For loop syntax

for(var in seq){ expression(var) }

for( i in 1:5){
print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
nt = 100
N= NULL
N[1] = 1000
for(t in 2:nt){
  N[t] = (N[t-1]*1.1*rlnorm(1,0,0.1))*(1-0.08)
}

Let’s plot it

plot(N, type= "l", pch = 15, xlab = "Year", ylab = "Abundance")

Summarization of simulation

After replicating a calculation many times, we will need to summarize
the results.

Simulating Based Learning

mu = 500
sig = 30
random = rnorm(100,mu,sig)
p = seq(0.01, 0.99, 0.01)
random_q = quantile(random,p)
normal_q = qnorm(p,mu,sig)
plot(normal_q~random_q)
abline(c(0,1))

q = seq(400,600,10)
random_cdf = ecdf(random)
random_p =random_cdf(q)
normal_p = pnorm(q,mu,sig)
plot(normal_p~q, type= "l", col = "blue")
points(random_p~q,col = "red")

Leave a Reply

Scroll to top
Subscribe to our Newsletter

Hello surfer, thank you for being here. We are as excited as you are to share what we know about data. Please subscribe to our newsletter for weekly data blogs and many more. If you’ve already done it, please close this popup.



No, thank you. I do not want.
100% secure.