Text Analysis in R
Introduction
Overview
Goals
Prerequisites
Initial Steps
String Theory
Basic data types
Character strings
Factors
Analysis
Characters vs. Factors
Basic Text Functionality
Base R
Useful packages
Other
Summary of basic text functionality
Regular Expressions
Typical Uses
dplyr helper functions
Text Processing Examples
Example 1
Example 2
Exercises
Sentiment Analysis
Basic idea
Issues
Context, sarcasm, etc.
Lexicons
Sentiment Analysis Examples
The first thing the baby did wrong
Romeo & Juliet
Sentiment Analysis Summary
Exercise
Step 0: Install the packages
Step 1: Initial inspection
Step 2: Data prep
Step 3: Get sentiment
Step 4: Visualize
Part of Speech Tagging
Basic idea
POS Examples
Barthelme & Carver
More taggin’
Tagging summary
POS Exercise
Topic Modeling
Basic idea
Steps
Topic Model Example
Shakespeare
Extensions
Topic Model Exercise
Movie reviews
Associated Press articles
Word Embeddings
Shakespeare example
Wikipedia
Summary
Shakespeare Start to Finish
ACT I. Scrape MIT and Gutenberg Shakespeare
Scene I. Scrape main works
Scene II. Sonnets
Scene III. Save and write out
Scene IV. Read text from files
Scene V. Add additional works
ACT II. Preliminary Cleaning
Scene I. Remove initial text/metadata
Scene II. Miscellaneous removal
Scene III. Classification of works
ACT III. Stop words
Scene I. Character names
Scene II. Old, Middle, & Modern English
Scene III. Remove stopwords
ACT IV. Other fixes
ACT V. Fun stuff
Scene I. Count the terms
Scene II. Stemming
Scene III. Exploration
Scene IV. Topic model
Appendix
Texts
Donald Barthelme
Raymond Carver
Billy Dee Shakespeare
R
Python
A Faster LDA
An Introduction to Text Processing and Analysis with R
In the beginning was the word ...
Michael Clark
m-clark.github.io
2018-09-09