The Knoxville R Users Group is presenting a workshop on text analysis using R by Bob Muenchen. The workshop is free and open to the public. You can join the group at https://www.meetup.com/Knoxville-R-Users-Group. A description of the workshop follows.
R for Text Analysis
When analyzing text using R, it’s hard to know where to begin. There are 37 packages available, and there is quite a lot of overlap in what they can do. This workshop will demonstrate three popular approaches: dictionary-based content analysis, latent semantic analysis, and latent Dirichlet allocation. We will spend much of the time on the data preparation steps that are important to all text analysis methods, including data acquisition, word stemming/lemmatization, removal of punctuation and other special characters, phrase discovery, tokenization, and so on. While the examples will focus on the automated extraction of topics from the text files, we will also briefly cover the analysis of sentiment (e.g., how positive is customer feedback?) and style (who wrote this? are they telling the truth?).
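To give a rough feel for the data preparation and dictionary-based steps described above, here is a minimal sketch in base R. It is not taken from the workshop materials: the example documents, the `dict` keyword lists, and all variable names are made up for illustration, and the packages covered in the workshop offer far more complete tools for each step.

```r
# Hypothetical example documents (not from the workshop materials)
docs <- c(
  d1 = "The service was great, and the staff were friendly!",
  d2 = "Terrible delivery; the package arrived late and damaged.",
  d3 = "Great price, but delivery was late."
)

# Data preparation: lower-case the text and replace punctuation with spaces
clean <- gsub("[[:punct:]]+", " ", tolower(docs))

# Tokenization: split each document into words on whitespace
tokens <- strsplit(trimws(clean), "[[:space:]]+")
names(tokens) <- names(docs)

# Hypothetical dictionary mapping each topic to its keywords
dict <- list(
  service  = c("service", "staff", "friendly"),
  delivery = c("delivery", "package", "arrived", "late")
)

# Dictionary-based content analysis: count keyword hits per topic
# in each document, giving a documents-by-topics matrix
scores <- t(sapply(tokens, function(tok) {
  sapply(dict, function(words) sum(tok %in% words))
}))

print(scores)
```

Each row of `scores` is a document and each column is a topic. This sketch skips stemming and phrase discovery, so "deliveries" or "customer service" would need extra handling.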
Each text analysis approach yields the topics found and a numerical measure of each topic in each document. We will then merge those measures with numeric data and run analyses that combine both types of data.
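The merge step can be pictured with a small base R sketch like the one below. Everything in it is an assumption for illustration: the `topic_scores` and `ratings` data frames, their column names, and the values are invented, and the regression is just one possible combined analysis, not necessarily the one shown in the workshop.

```r
# Hypothetical topic measures: one row per document, one column per topic
topic_scores <- data.frame(
  doc_id   = c("d1", "d2", "d3"),
  service  = c(3, 0, 0),
  delivery = c(0, 4, 2)
)

# Hypothetical numeric data collected alongside the text
ratings <- data.frame(
  doc_id       = c("d1", "d2", "d3"),
  satisfaction = c(9, 2, 6)
)

# Merge the text-derived measures with the numeric data by document ID
combined <- merge(topic_scores, ratings, by = "doc_id")

# One possible combined analysis: relate a topic measure to a numeric outcome
fit <- lm(satisfaction ~ delivery, data = combined)
print(coef(fit))
```

With real data you would want many more documents than this toy example has, and the choice of model would depend on the question being asked.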
The R packages used include quanteda, lsa, topicmodels, tidytext, and wordcloud, with brief coverage of tm and SnowballC. While the workshop will not be hands-on due to time constraints, the programs and data files will be available afterwards.
Where: University of Tennessee Humanities and Social Sciences Building, room 201. If the group gets too large, the location may move and a notice will be sent to everyone who RSVPs on Meetup.com or who registers at the UT workshop site below. You can also verify the location the day before via email with Bob at muenchen@utk.edu.
When: 9:05-12:05 Friday 1/27/17
Prerequisite: R language basics
Members of UT Community register at: http://workshop.utk.edu under Researcher Focused
Members of other groups please RSVP on your respective sites so I can bring enough handouts.
Seems like a great workshop. Text analytics is such a valuable tool in market research. It’s a little too far for our team, so I’m wondering if anything will be posted online?
Hi Drive Research,
Sorry, there won’t be anything posted online, but I do frequently teach workshops onsite for corporate clients (see the Workshops menu). This one isn’t quite at that level of quality yet. I need to add a set of slides, exercises, other data sets to practice with, etc. Getting that done is a high priority for me, and I expect to have it finished in around six weeks. The final version will make fairly heavy use of dplyr and the tidytext package, so if your team isn’t familiar with those, you’d want to have a day of training on them first (which I also teach). The “final” workshop version (as far as anything is final in R!) will also include some comparison of R packages with commercial text analysis software, a few non-R open source alternatives, and how best to migrate if you’re already using a commercial package.
Cheers,
Bob
Thanks Bob, just came across another list of yours which will be helpful. Let me know when this is compiled. I’d love to take a look as we have several companies we work with who utilize these packages.
Hi George,
This reminds me that I need to add the newest text analysis packages to the SAS / SPSS Add-ons list! I did find a good market research example to use in the workshop, but if you have any others, please let me know.
Cheers,
Bob
Any advice on where to park?
Hi Thomas,
Check out the Volunteer Hall parking garage at location K-2.5 on this map:
http://parking.utk.edu/wp-content/uploads/sites/6/2014/06/UT-Campus-Parking-Map-2016-17_v6-9152016.pdf
See you soon!
Bob
Hi. Will you be offering the workshop in January 2018?
Hi Elham,
I’m afraid not. This semester is much busier than usual.
Cheers,
Bob