Knoxville, TN: R for Text Analysis Workshop

The Knoxville R Users Group is presenting a workshop by Bob Muenchen on text analysis using R. The workshop is free and open to the public, and anyone can join the group. A description of the workshop follows.


R for Text Analysis

When analyzing text using R, it’s hard to know where to begin. There are 37 packages available, and there is quite a lot of overlap in what they can do. This workshop will demonstrate three popular approaches: dictionary-based content analysis, latent semantic analysis, and latent Dirichlet allocation. We will spend much of the time on the data preparation steps that are important to all text analysis methods, including data acquisition, word stemming/lemmatization, removal of punctuation and other special characters, phrase discovery, tokenization, and so on. While the examples will focus on the automated extraction of topics from text files, we will also briefly cover the analysis of sentiment (e.g. how positive is customer feedback?) and style (who wrote this? are they telling the truth?).
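
As a rough illustration of a few of the preparation steps listed above (lowercasing, punctuation removal, tokenization, stop word removal, and building a document-term matrix), here is a minimal sketch in base R. The sample sentences, the tiny stop word list, and all variable names are made up for this example. It is not the workshop's code, which relies on packages such as quanteda and tm, and it skips stemming because base R has no stemmer.

```r
# Toy documents; these strings are invented for illustration only.
docs <- c(d1 = "The service was great, really great!",
          d2 = "Slow shipping; the product arrived damaged.")

# Lowercase, then replace anything that is not a letter or whitespace with a space.
clean <- gsub("[^a-z[:space:]]", " ", tolower(docs))

# Tokenize on runs of whitespace and drop any empty strings.
tokens <- lapply(strsplit(clean, "[[:space:]]+"), function(x) x[nzchar(x)])
names(tokens) <- names(docs)

# Remove a tiny hand-picked stop word list (an assumption, not a standard list).
stop_words <- c("the", "was", "a", "an", "and")
tokens <- lapply(tokens, function(x) x[!x %in% stop_words])

# Build a document-term matrix of raw counts: one row per document,
# one column per distinct word.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(x) table(factor(x, levels = vocab))))
print(dtm)
```

Dedicated text analysis packages wrap these steps in tested functions and add stemming, phrase detection, and sparse matrix storage, which matter once the collection grows beyond a handful of documents.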

Each text analysis approach yields the topics it found and a numerical measure of each topic in each document. We will then merge those measures with numeric data and run analyses that combine both types of data.
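
The merge step described above might look something like the following base R sketch. The data frames, column names, and values are all hypothetical and stand in for real topic model output and real numeric data; the regression at the end is just one example of an analysis that combines the two.

```r
# Hypothetical per-document topic measures, as a topic model might produce.
topic_scores <- data.frame(
  doc_id         = c("d1", "d2", "d3"),
  topic_service  = c(0.8, 0.1, 0.5),
  topic_shipping = c(0.2, 0.9, 0.5)
)

# Hypothetical numeric data about the same documents.
ratings <- data.frame(
  doc_id       = c("d1", "d2", "d3"),
  satisfaction = c(9, 3, 6)
)

# Merge the two sources on the shared document identifier.
combined <- merge(topic_scores, ratings, by = "doc_id")
print(combined)

# One possible combined analysis: relate satisfaction to a topic measure.
# Only one topic is used because the two made-up measures sum to 1.
fit <- lm(satisfaction ~ topic_service, data = combined)
print(coef(fit))
```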

The R packages used include quanteda, lsa, topicmodels, tidytext, and wordcloud, with brief coverage of tm and SnowballC. While the workshop will not be hands-on due to time constraints, the programs and data files will be available afterwards.

Where: University of Tennessee Humanities and Social Sciences Building, room 201. If the group gets too large, the location may move and a notice will be sent to everyone who RSVPs on or who registers at the UT workshop site below. You can also verify the location the day before via email with Bob at

When: Friday, January 27, 2017, 9:05 a.m. to 12:05 p.m.

Prerequisite: R language basics

Members of UT Community register at: under Researcher Focused

Members of other groups please RSVP on your respective sites so I can bring enough handouts.


9 Responses to Knoxville, TN: R for Text Analysis Workshop

  1. Seems like a great workshop, text analytics is such a valuable tool in market research. A little too far for our team, so I’m wondering if anything will be posted online?

    • Bob Muenchen says:

      Hi Drive Research,

      Sorry, there won’t be anything posted online, but I do frequently teach workshops onsite for corporate clients (see the Workshops menu). This one isn’t quite at that level of quality yet. I need to add a set of slides, exercises, other data sets to practice with, etc. Getting that done is a high priority for me, which I expect to have finished in around six weeks. The final version will make fairly heavy use of dplyr and the tidytext package, so if your team isn’t familiar with that, you’d want to have a day of that first (which I also teach). The “final” workshop version (as far as anything is final in R!) will also include some comparison of R packages with commercial text analysis software, a few non-R open source alternatives, and how to best migrate if you’re already using a commercial package.


      • George Kuhn says:

        Thanks Bob, just came across another list of yours which will be helpful. Let me know when this is compiled. I’d love to take a look as we have several companies we work with who utilize these packages.

        • Bob Muenchen says:

          Hi George,

          This reminds me that I need to add the newest text analysis packages to the SAS / SPSS Add-ons list! I did find a good market research example to use in the workshop, but if you have any others, please let me know.


  2. Pingback: Knoxville, TN: R for Text Analysis Workshop - Use-R!Use-R!

  3. Thomas Egan says:

    Any advice on where to park?

  4. Elham says:

    Hi. Do you guys have the workshop in January 2018?
