This chapter is a grab-bag of techniques whose models have a latent variable interpretation. Only brief descriptions are provided at present, though more may be added in the future. You can also find more techniques in the associated notes that were used for a workshop on factor analysis and related techniques, though the bulk of that material is covered in this document.
Practically everyone has been exposed to recommender systems such as collaborative filtering and related models. That’s how Netflix, Amazon and others make their recommendations to you given the information you’ve provided about likes and dislikes, what other similar people have provided, and how similar the object of interest is to others.
The following image, taken from Wikipedia, conceptually shows how a user-based collaborative filtering method would work, where a recommendation is based on the ratings that similar users have given.
Let’s go with movies as an example. You might only rate a handful, and indeed most people will not rate most movies. But at some point most movies will have been rated. So how can one provide a recommendation for some movie you haven’t seen? If we group similar movies into genres, and group similar people by demographic category and by taste, we can recommend a movie from a genre you like that people in the same group also seem to like.
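To make the user-based idea concrete, here is a minimal sketch in base R. The ratings matrix is made up for illustration, and the helper names `cosine_sim` and `predict_rating` are hypothetical; real systems typically center the ratings and use only the nearest neighbors, so treat this as an outline of the logic rather than a production approach.

```r
# toy ratings (rows = users, columns = movies); NA = not rated; all values are made up
ratings <- matrix(
  c(5, 4, NA, 1,
    4, 5,  2, 1,
    1, 2,  5, 4,
    5, 5,  1, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("user", 1:4), paste0("movie", 1:4))
)

# cosine similarity between two users, using only movies both have rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# predict a missing rating as the similarity-weighted mean of other users' ratings
predict_rating <- function(ratings, user, movie) {
  others <- setdiff(rownames(ratings), user)
  others <- others[!is.na(ratings[others, movie])]
  sims   <- sapply(others, function(o) cosine_sim(ratings[user, ], ratings[o, ]))
  sum(sims * ratings[others, movie]) / sum(sims)
}

# user1 has not rated movie3; borrow strength from the other users
pred <- predict_rating(ratings, "user1", "movie3")
pred
```

Because user1’s ratings look most like those of user2 and user4, who both rated movie3 low, the prediction is pulled toward their ratings rather than toward user3’s.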
If you think of genres as latent variables for movies, you can employ the factor analytic techniques we’ve talked about. Similarly, we can find clusters of people using cluster analytic techniques. In short, collaborative filtering and recommender systems apply latent variable techniques to a specific type of data, e.g. ratings. More modern approaches will incorporate user and item characteristics, recommendations from other systems, and additional information. The following provides some code for you to play with, using a straightforward singular value decomposition on movie ratings, which is the same technique used in base R’s default prcomp function for PCA. You might compare it with method = 'POPULAR'. After running it, try some predictions.
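```r
library(recommenderlab)
data(MovieLense)

# SVD-based recommender; this fitting step is not shown in the text and is assumed here
recom_svd <- Recommender(MovieLense, method = "SVD")

# predicted ratings for two users
recom <- predict(recom_svd, MovieLense[2:3], type = "ratings")
recom
as(recom, "matrix")[, 1:10]

# comparison model
recom_popular <- Recommender(MovieLense, method = "POPULAR")
getModel(recom_popular)$topN
recom <- predict(recom_popular, MovieLense[2:3], type = "topNList")
recom
as(recom, "list")
```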
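If you want to see the link between the singular value decomposition and PCA for yourself, the following sketch uses simulated data (an arbitrary stand-in, not the movie ratings) to check that `svd` on centered data reproduces what `prcomp` reports. It also builds a rank-2 reconstruction, which is roughly the idea behind using a few latent factors to fill in a ratings matrix.

```r
set.seed(42)
X  <- matrix(rnorm(200 * 5), ncol = 5)          # simulated stand-in data
Xc <- scale(X, center = TRUE, scale = FALSE)    # prcomp centers by default

sv <- svd(Xc)
pc <- prcomp(X)

# loadings agree up to sign; standard deviations are d / sqrt(n - 1)
loading_diff <- max(abs(abs(sv$v) - abs(pc$rotation)))
sdev_diff    <- max(abs(sv$d / sqrt(nrow(X) - 1) - pc$sdev))
c(loading_diff = loading_diff, sdev_diff = sdev_diff)

# rank-k reconstruction from the first k latent dimensions
k     <- 2
X_hat <- sv$u[, 1:k] %*% diag(sv$d[1:k]) %*% t(sv$v[, 1:k])
sum((Xc - X_hat)^2)   # equals the sum of the discarded squared singular values
```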
All in all, thinking about your data in terms of a recommendation system might not be too far-fetched, especially if you’re already considering factor analysis of some sort.
Aside from mixture models, when people use the term ‘cluster analysis’ they are typically referring to distance-based methods. Given a distance matrix that informs how dissimilar observations are from one another, the methods try to create clusters of observations that are similar to one another, and clusters that are more distinct from other clusters.
K-means cluster analysis is probably the most commonly used clustering method out there. Conceptually it’s fairly straightforward: find \(k\) clusters that minimize the within-cluster variance, i.e. the squared distance of each observation from the mean of the cluster it belongs to. As such it’s easy to implement in standard data settings.
K-means can actually be seen as a special case of the Gaussian mixture model described in a previous chapter, and it also has connections to PCA and ICA. The general issue is determining just how many clusters one should retain. The following plot shows both a two- and a three-cluster solution using the kmeans function in base R.
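As a rough sketch of what such a comparison could look like, the following fits two- and three-cluster solutions with `kmeans`. The `iris` data are used purely as a convenient stand-in and are an assumption here, not necessarily the data behind the plot.

```r
set.seed(123)
X <- scale(iris[, 1:4])   # stand-in data, standardized so no variable dominates the distance

# multiple random starts guard against poor local solutions
km2 <- kmeans(X, centers = 2, nstart = 25)
km3 <- kmeans(X, centers = 3, nstart = 25)

# total within-cluster sum of squares always drops as k grows,
# which is why it cannot by itself tell you how many clusters to keep
c(k2 = km2$tot.withinss, k3 = km3$tot.withinss)

# cluster membership for the three-cluster solution
table(km3$cluster)
```

Plotting the first two principal components colored by `km2$cluster` and `km3$cluster` is one simple way to view the two solutions side by side.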
Other methods can be thought of as clustering the data in a hierarchical fashion. These can start at the bottom of the hierarchy (agglomerative), placing every observation in its own cluster and successively combining them. For example, first choose a measure of dissimilarity and combine the two observations that are most alike; then either add another observation to that cluster or, if another pair is closer, start a new cluster. Conversely, one can start with every observation in one cluster (divisive), and split off the most dissimilar, continuing until every observation is in its own cluster.
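The agglomerative version can be sketched with base R’s `dist`, `hclust`, and `cutree`. The data, the Euclidean distance, the two linkage methods, and the choice of three clusters are all assumptions made for illustration.

```r
X <- scale(iris[, 1:4])               # stand-in data
d <- dist(X, method = "euclidean")    # pairwise dissimilarities

# agglomerative clustering with two different linkage rules
hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")

# cut each tree to obtain three clusters
cl_complete <- cutree(hc_complete, k = 3)
cl_average  <- cutree(hc_average,  k = 3)

# same data and distance, different linkage: compare the resulting memberships
table(complete = cl_complete, average = cl_average)

# plot(hc_complete, labels = FALSE)   # dendrogram, if you want to see the hierarchy
```

The cross-tabulation gives a quick sense of how much the solution can depend on the linkage method alone.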
At practically every turn you’re faced with multiple options for settings to choose (distance, linkage method, cluster determination, general approach), and most decisions will be arbitrary. While these methods are still commonly used, you always have better alternatives. They are fine to use in a quick visualization to sort the data more meaningfully though, as above.
The latent linear model versions of PCA and factor analysis assume the observed variables are normally distributed (even standard PCA won’t work nearly as well if the data aren’t). This is not required, however, and independent components analysis (ICA) makes no such assumption. This visualization duplicates that seen in Murphy (2012), where we have two (uniform) independent sources. We can see that ICA correctly recovers those components.
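For a feel of what is going on, here is a toy sketch in base R with two uniform sources. It is not the code behind the visualization, and it is not a general ICA algorithm: it whitens the data and then searches over a single rotation angle for the most non-Gaussian components, using excess kurtosis as the contrast, which only works for the two-dimensional case. The mixing matrix `A` and all object names are made up for illustration. In practice you would likely reach for a dedicated package.

```r
set.seed(1)
n <- 2000

# two independent uniform sources with unit variance
S <- cbind(runif(n, -sqrt(3), sqrt(3)), runif(n, -sqrt(3), sqrt(3)))
A <- matrix(c(2, 1, 1, 3), 2, 2)      # hypothetical mixing matrix
X <- S %*% t(A)                       # observed mixtures

# whiten: center, rotate to principal axes, rescale to unit variance
Xc <- scale(X, center = TRUE, scale = FALSE)
e  <- eigen(cov(Xc))
Z  <- Xc %*% e$vectors %*% diag(1 / sqrt(e$values))

# after whitening, the remaining unknown is a rotation;
# pick the angle that makes the components least Gaussian
excess_kurtosis <- function(v) mean(v^4) / mean(v^2)^2 - 3
rotate   <- function(theta) matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
contrast <- function(theta) sum(abs(apply(Z %*% rotate(theta), 2, excess_kurtosis)))

thetas <- seq(0, pi / 2, length.out = 181)
best   <- thetas[which.max(sapply(thetas, contrast))]
S_ica  <- Z %*% rotate(best)

# absolute correlations with the true sources (order and sign are not identified)
ica_match <- abs(cor(S_ica, S))
pca_match <- abs(cor(Z, S))           # whitened principal components for comparison
round(ica_match, 2)
round(pca_match, 2)
```

Each recovered ICA component should line up with one true source, while the principal components remain mixtures of both.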
If you believe that truly independent sources of signal underlie your data, ICA would be an option. It is commonly applied to deal with images or sound.