• Text Analysis in R
  • Introduction
    • Overview
      • Goals
      • Prerequisites
    • Initial Steps
  • String Theory
    • Basic data types
      • Character strings
      • Factors
      • Analysis
      • Characters vs. Factors
    • Basic Text Functionality
      • Base R
      • Useful packages
      • Other
      • Summary of basic text functionality
    • Regular Expressions
      • Typical Uses
      • dplyr helper functions
    • Text Processing Examples
      • Example 1
      • Example 2
    • Exercises
  • Sentiment Analysis
    • Basic idea
    • Issues
      • Context, sarcasm, etc.
      • Lexicons
    • Sentiment Analysis Examples
      • The first thing the baby did wrong
      • Romeo & Juliet
    • Sentiment Analysis Summary
    • Exercise
      • Step 0: Install the packages
      • Step 1: Initial inspection
      • Step 2: Data prep
      • Step 3: Get sentiment
      • Step 4: Visualize
  • Part of Speech Tagging
    • Basic idea
    • POS Examples
      • Barthelme & Carver
      • More taggin’
    • Tagging summary
    • POS Exercise
  • Topic modeling
    • Basic idea
    • Steps
    • Topic Model Example
      • Shakespeare
    • Extensions
    • Topic Model Exercise
      • Movie reviews
      • Associated Press articles
  • Word Embeddings
    • Shakespeare example
    • Wikipedia
  • Summary
  • Shakespeare Start to Finish
    • ACT I. Scrape MIT and Gutenberg Shakespeare
      • Scene I. Scrape main works
      • Scene II. Sonnets
      • Scene III. Save and write out
      • Scene IV. Read text from files
      • Scene V. Add additional works
    • ACT II. Preliminary Cleaning
      • Scene I. Remove initial text/metadata
      • Scene II. Miscellaneous removal
      • Scene III. Classification of works
    • ACT III. Stop words
      • Scene I. Character names
      • Scene II. Old, Middle, & Modern English
      • Scene III. Remove stopwords
    • ACT IV. Other fixes
    • ACT V. Fun stuff
      • Scene I. Count the terms
      • Scene II. Stemming
      • Scene III. Exploration
      • Scene IV. Topic model
  • Appendix
    • Texts
      • Donald Barthelme
      • Raymond Carver
      • Billy Dee Shakespeare
    • R
    • Python
    • A Faster LDA

An Introduction to Text Processing and Analysis with R

In the beginning was the word ...

Michael Clark
m-clark.github.io
University of Michigan: CSCAR
University of Michigan: Advanced Research Computing

2018-09-09