Stan is a modeling language for Bayesian data analysis (Stan 1.0 was released in 2012). The actual work is done in C++, but the Stan language specifies the necessary aspects of the model. It offers a variety of inference procedures, including standard optimization techniques commonly found elsewhere, but the primary Bayesian-specific approach is Hamiltonian Monte Carlo (originally called *Hybrid* Monte Carlo, a name that is a bit too vague for most). There are a number of ways in which Bayesian estimation/sampling can take place, and this is one that, like the others, has advantages in some areas and disadvantages in others. For many types of problems, HMC will be more efficient than Gibbs and some other samplers, in the sense that it takes fewer iterations to describe the posterior distribution. It is applicable to a wide variety of problems and will often do better than other approaches.

To use Stan directly, one just needs to know their model intimately, which should be the case if you’re going to spend so much time collecting, processing, and analyzing data in the first place. I personally find the *style* of the Stan language clear; it might be seen as a combination of C++ and R programming styles, though mostly the latter. The following will take you through the components of the Stan language.

To get started with Stan (in R), one needs to install the rstan package. While this proceeds much as with any other package, additional steps are required. First, you will need a C++ compiler, and this process differs depending on your operating system. It won’t take much, and there’s a chance you already have one, but the steps are clearly defined on the RStan wiki. After going through that process, you may then install the rstan package and its dependencies. You’ll be ready to go at that point.

The basic workflow you’ll engage in to run a Stan program within R is as follows:

- Write the Stan program
- Create a data list
- Run a debug model to check compilation etc.
- Run the full model
- Summarize the model
- Check diagnostics, including posterior predictive inspection

In this part we’ll consider the elements of the Stan program.

The following shows the primary parts of a Stan program for a standard linear model. We’ll go through each component in turn.

```
data {                      // Data block
  int<lower=1> N;           // Sample size
  int<lower=1> K;           // Dimension of model matrix
  matrix[N, K] X;           // Model matrix
  vector[N] y;              // Target variable
}
transformed data {          // Transformed data block
}
parameters {                // Parameters block
  vector[K] beta;           // Coefficient vector
  real<lower=0> sigma;      // Error scale
}
transformed parameters {    // Transformed parameters block
}
model {                     // Model block
  vector[N] mu;
  mu = X * beta;            // Creation of linear predictor
  // priors
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  // likelihood
  y ~ normal(mu, sigma);
}
generated quantities {      // Generated quantities block
}
```

For reference, the following table from the Stan manual lists the kinds of variables of interest and the blocks where they would be declared:

Variable Kind | Declaration Block
---|---
modeled, unmodeled data | data, transformed data
modeled parameters, missing data | parameters, transformed parameters
unmodeled parameters | data, transformed data
generated quantities | transformed data, transformed parameters, generated quantities
loop indices | loop statement

```
data {                      // Data block
  int<lower=1> N;           // Sample size
  int<lower=1> K;           // Dimension of model matrix
  matrix[N, K] X;           // Model matrix
  vector[N] y;              // Target variable
}
```

The first section is the data block, where we tell Stan the data it should be expecting from the data list. It is useful to put in bounds as a check on the data input, and that is what is being done between the < > (e.g. we should at least have a sample size of 1). The first two variables declared are N and K, both as integers. Next the code declares the model matrix and target vector respectively. As you’ll note here and for the next blocks, we declare the type and dimensions of the variable and then its name. In Stan, everything declared in one block is available to subsequent blocks, but those declared in a block may not be used in earlier blocks. Even within a block, anything declared, such as N and K, can then be used subsequently, as we did to specify dimensions.
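As a small sketch of other constrained declarations you might use (an illustration, not part of the model above), bounds can be placed on both sides, and on data beyond simple sample sizes:

```
data {
  int<lower=1> N;                   // sample size
  int<lower=0> n_events;            // a count, which cannot be negative
  real<lower=0, upper=1> propn;     // a proportion, bounded on both sides
  int<lower=1, upper=4> group[N];   // a grouping variable with four levels
}
```

If the data passed in violates a bound, Stan will reject it with an informative error before sampling begins.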

```
transformed data {          // Transformed data block
  matrix[N, K] logX;        // a transformation of the model matrix
  logX = log(X);            // log() applies elementwise
}
```

The transformed data block is where you could do such things as log or center variables and similar, i.e. create new data based on the input data, or new data in general. If you are using R though, it would almost always be easier to do those things in R first and just include them in the data list. You can also declare any unmodeled parameters here.

```
parameters {                // Parameters block
  vector[K] beta;           // Coefficient vector
  real<lower=0> sigma;      // Error scale
}
```

The primary parameters of interest that are to be estimated go in the parameters block. As with the data block you can only declare these variables, you cannot make any assignments. Here we note the \(\beta\) and \(\sigma\) to be estimated, with a lower bound of zero on the latter. In practice you might prefer to split out the intercept or other coefficients to be modeled separately if they are on notably different scales.
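As a sketch of that last point (an assumption about how you might set things up, not part of the model above), the intercept could be declared separately, with X then containing only the predictors:

```
parameters {
  real alpha;               // intercept, declared and modeled separately
  vector[K] beta;           // coefficients for the K predictors
  real<lower=0> sigma;      // error scale
}
model {
  y ~ normal(alpha + X * beta, sigma);  // alpha added to the linear predictor
}
```

This also makes it easy to give the intercept its own, possibly wider, prior.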

```
transformed parameters {    // Transformed parameters block
  real newpar;              // assumes oldpar is declared in the parameters block
  newpar = exp(oldpar);
}
```

The transformed parameters block is where optional derived parameters of interest might be included. What might go here is fairly open, but for efficiency’s sake you will typically want to include only quantities of specific interest that depend on the parameters block. These are evaluated along with the parameters, so if they are not of special interest you can calculate them in the model or generated quantities block instead to save time and storage.

```
model {                     // Model block
  vector[N] mu;
  mu = X * beta;            // Creation of linear predictor
  // priors
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  // likelihood
  y ~ normal(mu, sigma);
}
```

The model block is where your priors and likelihood are specified, along with the declaration of any variables necessary. As an example, the linear predictor is included here, as it goes towards the likelihood. (The position within the model block isn’t crucial; I tend to do all the variable declarations at the start, but others might prefer to place them under the likelihood heading at the point they are actually used.) Note that we could have instead put the linear predictor in the transformed parameters section, but this would slow down the process, and again, we’re not so interested in those specific values.
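For illustration, moving the linear predictor to transformed parameters would look as follows; the values of mu would then be stored for every posterior draw, which is part of why this is slower:

```
transformed parameters {
  vector[N] mu;             // now saved in the output for every draw
  mu = X * beta;
}
model {
  // priors
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  // likelihood
  y ~ normal(mu, sigma);
}
```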

```
generated quantities {
  vector[N] yhat;           // linear predictor
  real<lower=0> rss;        // residual sum of squares
  real<lower=0> totalss;    // total sum of squares
  real Rsq;                 // R-squared
  yhat = X * beta;
  rss = dot_self(y - yhat);
  totalss = dot_self(y - mean(y));
  Rsq = 1 - rss / totalss;
}
```

The generated quantities block is a fantastical place where anything in your noggin can spring forth to life. *Anything* that you can think of that can be calculated based on the model results can be assessed here. What’s more, it will have a distribution just like everything else. The above calculates the typical \(R^2\) as an example. Because of the priors, it is already ‘adjusted’, but we also have an interval estimate for it.

As another example, to get a sense of how well you’re capturing the tails of the distribution for `y`, you could start by calculating the minimum predicted value and seeing how often it falls below the true minimum. From the Bayesian approach we not only get an interval for the predicted minimum, we get the probability that it falls above or below the true minimum, which is simply the proportion of posterior samples for which this occurs. Ideally we’d like the distribution of the estimated minimum to be symmetric around the true minimum, with no general tendency to fall above or below it. A very high probability of being lower than the true minimum might suggest the model overcompensates for the lower tail of the distribution; a very low probability would suggest we are not capturing the lower extremes very well. We could do the same for the maximum or any other value of interest.
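A sketch of this check in the generated quantities block (the names y_rep and min_rep are assumptions for illustration, not part of the model above): draw replicated data from the posterior predictive distribution and record its minimum for each draw. The proportion of draws with min_rep below min(y) then estimates the probability discussed above.

```
generated quantities {
  vector[N] y_rep;          // posterior predictive draws
  real min_rep;             // minimum of the replicated data for this draw
  for (n in 1:N)
    y_rep[n] = normal_rng(X[n] * beta, sigma);
  min_rep = min(y_rep);     // compare its distribution to min(y)
}
```

After sampling, `mean(min_rep < min(y))` across draws gives the desired proportion.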

Now that you have a Stan program in place, you’re ready to proceed.