Baseball Data Wrangling with Vagrant, R, and Retrosheet

Analytics with the Chadwick tools, dplyr, and ggplot.

4.25 (163 reviews)
Data Science
2 hours
Last update: Jun 2015

What you will learn

Install VirtualBox and Vagrant
Run a virtual Linux machine
Install the Chadwick software tools
Extract game and play-by-play baseball data from Retrosheet files
Produce graphs with ggplot


This course is for anyone interested in doing baseball analytics with the Retrosheet game-by-game and play-by-play data. The main tools for working with this data are the Chadwick software tools. We set up a virtual Linux machine and install the Chadwick software on it. We then learn how to extract baseball data with Chadwick, how to filter it further with dplyr in R, and how to plot the results with ggplot.

The first part of the course, in which we install the virtual Linux machine and learn how to work with the Chadwick software, has no prerequisites. The second part requires knowledge of dplyr, which you can get from my course "Baseball Database Queries with SQL and dplyr".

At a relaxed pace, the course should take two to three weeks to complete.


Setting up Vagrant

Installing VirtualBox
Installing Vagrant
Creating a Project Folder
Vagrant Up
Directory Structure

Installing and Working with the Chadwick Software

Downloading the Chadwick Software
Installing the Chadwick Software
The Retrosheet Files
cwevent and cwgame

Project #1: Mike Schmidt and Greg Luzinski

Data Extraction
Reading our data into R
The Result Column
The Date Column
The Date Column Part II
The Player Data Frames
ggplot Crash Course
Cumulative Home Run Plots
Colors and Legend

Project #2: Dykstra, Murray, and Brett

Project Description
Data Extraction
Reading the data into R
The Date Column
The Result and AB Columns
The Player Data Frames
The Plots
The Four-Hundred Line
The Marchi/Albert Book and Course Wrap-Up




June 29, 2023
It's impossible to follow along because the course is so outdated. The Ubuntu download link is dead, the Chadwick page looks nothing like the instructor's, and the link from which you're supposed to download the data is no longer there. I did about 30% of the course and could not go beyond this point, nor could I find anything on YouTube to help.
January 21, 2023
Good course that gave hands-on experience analyzing data using baseball statistics. I also got value from seeing how to install a Linux virtual machine on a Windows PC.
April 6, 2020
An easy course that introduces the first steps in R and other tools (VM, Vagrant, ...). It's very useful and funny.
March 24, 2019
I learned so much that I can't wait to delve into past statistics as well as present ones. Thank you so much!
January 22, 2019
Easily one of the best courses on all of Udemy. Way, way, way valuable and educational. The instructor is the best. Super high pedagogy, and he can really explain difficult concepts in an easy-going manner.
December 27, 2018
From my perspective, the course shows no best practices for data scientists or programmers who work with code. I can't stand the way the instructor codes: a bit in a script, a bit in the console, and he also erases code. I would expect the install.packages command to be kept in the script and commented out, rather than installing packages by clicking. To check whether the as.Date function worked, I'd use the glimpse function from dplyr. However, I like the way the instructor explains what he is doing.
May 15, 2018
Dr. Charles Redmond did an amazing job developing this course. He has taught me so much. Prior to this course, I had tried to tackle the Retrosheet data in spreadsheets. I used both Excel and Libre Calc to get data, but R and the Chadwick tools make the work amazingly direct. Thanks, Dr. Redmond! I would do things on the Linux side a bit differently, such as joining records with the "join" command, and perhaps using "awk" or some other Linux tool to prepare the data prior to entering RStudio. But
January 24, 2018
Everything works as of January 2018 on Windows 10 1702 without the Creators Update. Very clever interface, with an easy transition between the two environments, Windows and Linux. Pleasantly surprised as a Windows user. Thanks Stephen
November 12, 2017
A lot of useful skills are covered in this course, such as the Linux command line, setting up a virtual machine, and plotting with R. If you're a Linux user or, like me, you have set up virtual machines before, the first part won't prove too useful. Retrosheet and Chadwick are both incredibly useful for baseball fans, and Charles does a great job explaining how to take advantage of these incredible resources.
October 7, 2017
This was a pretty easy tutorial. That said, I have taken two of the instructor's other courses. If nothing else, the end result is just cool. I'm pretty confident that if you like baseball and you like stats, then this is probably a great course to take.
August 24, 2017
Excellent guidance on the installs. Excellent tutorials with applications beyond baseball! I love this guy's teaching style, truly collegial in the best sense.
August 2, 2017
I love this professor's teaching. It is very detailed, with a step-by-step approach that makes a learner more confident of getting a grip on the subject. I look forward to learning more from Mr. Charles Redmond.
February 11, 2017
Good teacher. He explains everything simply and clearly. I always enjoy his courses. We take all of his lectures.
August 4, 2015
Thanks for an awesome course, Charles! It's hard to explain in words unless students go through these courses. All your courses are simply awesome! Much appreciated. Keep them coming.
July 12, 2015
Explanations are simple and to the point. Gives appropriate needed information for the covered material. Instructor is knowledgeable and uses simple language to put the point across.


