Stochastic Diffusion
SEE PAGE: Brownian Motion for updated notes on stochastic diffusion
early notes on MSD calculations
In statistical mechanics, the mean squared displacement (MSD or average squared displacement) is the most common measure of the spatial extent of random motion; one can think of MSD as the amount of the system "explored" by a random walker.
More Background
The Probability Density Function (PDF) for a particle in one dimension is found by solving the onedimensional Diffusion equation. (This equation states that the position probability density diffuses out over time  this is the method used by Einstein to describe a Brownian particle.
As a particle travels, the molecule is jostled by collisions with other molecules which prevent it from following a straight line. If the path is examined in close detail, it will be seen to be a good approximation to a random walk. Mathematically, a random walk is a series of steps, one after another, where each step is taken in a completely random direction from the one before. This kind of path was famously analyzed by Albert Einstein in a study of Brownian motion and he showed that the mean square of the distance traveled by particle following a random walk is proportional to the time elapsed. In two dimensions this relationship can be written as:
r²=4•D•t
In 1D, since both forward and backward steps are equally probable, we come to the surprising conclusion that the probable distance travelled sums up to zero! This is clearly a useless property to calculate. If however, instead of adding the distance of each step we added the square of the distance, we realise that we will always be adding positive quantities to the total. In this case the sum will be some positive number, which grows larger with every step. This obviously gives a better idea about the distance (squared in this case) that a particle moves. If we assume each step happens at regular time intervals, we can easily see how the square distance grows with time, and Einstein showed that it grows linearly with time.
In a molecular system a molecule moves in three dimensions, but the same principle applies. Also, since we have many molecules to consider we can calculate a square displacement for all of them. The average square distance, taken over all molecules, gives us the mean square displacement. This is what makes the mean square displacement (or MSD for short) significant in science: through its relation to diffusion it is a measurable quantity, one which relates directly to the underlying motion of the molecules.
In molecular dynamics the MSD is easily calculated by adding the squares of the distance. The linear (i.e. straight line) dependence of the MSD plot is apparent. If the slope of this plot is taken, the diffusion coefficient D may be readily obtained.
At very short times however, the plot is not linear. This is because the the path a molecule takes will be an approximate straight line until it collides with its neighbour. Only when it starts the collision process will its path start to resemble a random walk. Until it makes that first collision, we may say it moves with approximately constant velocity, which means the distance it travels is proportional to time, and its MSD is therefore proportional to the time squared. Thus at very short time, the MSD resembles a parabola. This is of course a simplification  the collision between molecules is not like the collision between two pebbles, it is not instantaneous in space or time, but is `spread out' a little in both. This means that the behaviour of the MSD at short time is sometimes more complicated than this MSD plot shows.
Albert Einstein in a study of Brownian motion and he showed that the mean square of the distance traveled by particle following a random walk is proportional to the time elapsed. This relationship can be written as:
r²=2d•D•t
 r² MSD
 d dimensions
 D diffusion coefficient (diffusion rate)
 t time step
The next important equation is the standard deviation of the diffusion rate distribution. It would be more convenient to call this the step size distribution but the two values are not exactly proportional. That equation is given as:
k = sqrt(d * D * t)
 k σ or standard deviation of the diffusion rate distribution
 d dimensions
 D diffusion coefficent
 t time step
Knowing k we can then generate random steps from a normal distribution with a mean µ = 0 and sd σ = k pulling a value from this distribution for both the X and Y dimensions for each step. In Matlab this would look like:
xyds = k * randn(2,Ndots);
This will generate a random walk with a diffusion rate = D and MSD = 2d•D. We might also be interested in the average step size that is being generated from this distribution to create such a diffusion rate. This can be done by taking the mean absolute value of the generated steps:
mStep ≈ mean(mean(abs(xyds),2))
The analytical solution to this problem however is the mean µ of the half normal distribution, with a standard deviation σ = k :
λ = k * √2 / √π

mS = k*sqrt(2)/sqrt(pi)
However, this value cannot be used as the X and Y step size, if you're expecting to get back to D. What you need is the Pythagorean length of the XY combined step. Einstein solved this too:
λ = (4 * D)^.5;
mSt = sqrt((2*ld^2))/2;
I'm simulating Brownian motion. Each particle (dot) in the simulation take a random size step in a random direction. This is in 2D. Each iteration, a dot gets passed 2 values, a random step size for the X direction and Y direction. Voila! Brownian motion.
Now to characterize this motion. This wasn't an easy task, until this really smart dude came along and provided a few theorems. Albert Einstein showed that the mean square of the distance traveled by particle following a random walk is proportional to the time elapsed. This relationship can be written as (and hopefully the unicode looks like r^2 and not r2.. I won't put integers directly after variables):
 r² = 2d*D*t
 r² = mean squared displacement MSD (µm²)
 d = number of dimensions
 D = diffusion coefficient (µm²/s)
 t = time elapsed between each measurement (1 s)
Why is the mean squared displacement good to know? Well, creating Brownian motion is easy (like I mentioned above), but simulating Brownian motion based on reallife observations is not so easy. This is because we can't directly observe the step size (the distance traveled between particle collisions), but we can observe the MSD. Ultimately I'd like to estimate the average step size, but having a good estimate of the diffusion rate (D) would suffice for making a simulation based on reallife observations.
Say we're observing a particle through a microscope diffusing along a flat surface (d=2), and each second (t = 1) we write down its displacement (for both X and Y), squaring each value (**see footnote). When we're done, we can calculate the mean of these values and find that MSD=8 µm². We can now find D using the equation above.
 8 µm² = (2*2) * D µm²/s * 1 s
 8 µm²/s = 4 * D µm²/s
 2 µm²/s = D µm²/s
Sweet, we have the diffusion rate. We can now use this value to make a simulation of our empirical observations.... But how? The simulation needs random step sizes, not a diffusion rate. Einstein figured this out too. He determined that the standard deviation (σ=k) of the distribution of random steps is:
k = sqrt(d * D * t)
From our observations:
k = sqrt(2*2*1) = 2
Now all we have to do is pull random values from a normal distribution with µ=0 σ=2 for each X and Y step size. If I do that and run a simulation  lets do 100 steps for 100 particles. boom:
Sweet, it worked! D = 2.0
We could stop there. But we're curious... What was the average step size it was creating to get that D? So we make a subroutine to track that information. We find that the average of the absolute values of the step sizes for each iteration are coming out one by one:
1.48, 1.61, 1.56, 1.51, 1.68
The average of all those values is around 1.59
Why? We can take a more analytical approach. Since we are using the absolute value of each step to calculate the mean step size, we are essentially sampling from a half normal distribution. The mean of a half normal distribution is:
µ = σ * sqrt(2) / sqrt(pi)
Our standard deviation was 2.
2*sqrt(2) / sqrt(pi) = 1.5958
Nice, that's pretty much what we got (1.59).
So, to recap, we started with a diffusion rate D=2, which, using Einstein's equation, helped us determine we needed to generate a bunch of steps from a distribution µ=0 σ=2 which resulted in a bunch of steps that were +/ 1.59 on average. Then as a check, we took the mean square of all 100 steps for each particle and calculated D which turned out to be 2.02 so everything worked as planned.
We could stop there, but will we? Of course not. Let's do one last check, to see if we completely understand what's going on. Say we don't know D or MSD. In fact, we don't have any reallife empirical information to go on. Perhaps all we want to do is generate some random walks, and if we're interested, we can use those walks to calculate the diffusion coefficient (D). Let's make 100 particles take 100 steps with a step size of 1.5958 (randomly negative or positive), aka the average step size from our experiment above.
Before we do this, any guess what D is gunna turn out to be?
D = 1.27
Wrong!!
Duh! This value cannot be used as the X and Y step size, if you're expecting to get back to D. What you need is the Pythagorean length of the XY combined step. Einstein solved this too. But first, a recap:

m = 2d*D

k = sqrt(d * D)

λ = k * √2 / √π 
λ = (2d * D)^{0.5}
 m is mean squared displacement
 d is dimensions (2)
 D is diffusion rate
 k is σ of sampling distribution that yields D (µ=0)

λ is mean of half normal distribution with σ = k  λ is the mean length of the XY step
See our new equation there: λ = (2d * D). Now, all we have to do to get the X and Y step length values, is to calculate what two values equals λ when using the Pythagorean theorem:
 a²+b²=λ²
since 'a' and 'b' can be the same value, lets just use:
 2a² = λ²
and solve for 'a'

λᵃᵇ = √(2λ²)/2
so the step size for both X and Y is λᵃᵇ