Borderline Elementary

A collection of (scientific) thoughts

Data Carpentry Workshop at Iowa State University

It is always a great experience hosting and teaching Data Carpentry (and/or Software Carpentry) workshops at Iowa State University. Selfishly, this is because I am currently at Iowa State University, but also because we have many local instructors with great teaching experience and various teaching styles that I can learn from. Although it is a little bit delayed, I do want to summarize some of the things we learned, and maybe the experience could be useful for me as well as someone else in the future.

The workshop

The recent Data Carpentry workshop at Iowa State University (August 11-12, 2016) was part of the training/orientation for P3 NSF fellows as well as current students in several inter-departmental programs at Iowa State. Maryanne Moore has put a lot of thought into the entire orientation program. Our audience ranged from fresh incoming graduate students (who literally arrived a week beforehand) to existing graduate students facing the data monster. The workshop was led by Dr. Jin Choi and taught by Jin, Dr. Carolyn Lawrence-Dill, Renee Walton from Carolyn’s group, and myself. We were able to steal Julian Trachsel again as a TA. It was a wonderful team from the get-go!

The workshop was divided into a Bash shell session (the first day) and an R session (the second day). The course materials mostly followed those of the 2015 Data Carpentry workshop held at Iowa State University (see here for last year’s materials). The changes include an introduction to find and related exercises. For the visualization part of the R session, we shortened the introduction to ggplot2 and added a presentation on effective data visualization.
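For readers who have not met find before, here is a minimal sketch of the kind of exercise it enables. It is not taken from the workshop materials; the directory layout and file names are made up for illustration, and everything happens in a throwaway temporary directory.

```shell
# Hypothetical sandbox so the example never touches real data.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/2015" "$workdir/data/2016"
touch "$workdir/data/2015/survey.csv" \
      "$workdir/data/2016/survey.csv" \
      "$workdir/data/2016/notes.txt"

# List every regular file ending in .csv under data/, at any depth.
csv_files=$(find "$workdir/data" -type f -name '*.csv' | sort)
echo "$csv_files"

# Count the matches (tr strips the padding some wc versions add).
csv_count=$(find "$workdir/data" -type f -name '*.csv' | wc -l | tr -d ' ')
echo "CSV files found: $csv_count"
```

The quotes around '*.csv' matter: they keep the shell from expanding the pattern before find sees it.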

The feedback and thoughts

Overall, our audience loved it! They loved our enthusiasm and the materials. They also wanted more, despite having had a 10-hour day on August 10th! Because of the broad background of our students, it was no surprise that we received mixed feedback on the pace. We did notice, however, that there were more polar opposite comments on the pace (i.e., some thought the pace was too slow, while others thought it was too fast) for the shell session than for the R session. My hypothesis is that students have been exposed to R more through course work (e.g., statistics) and research experience (e.g., figure making and literature searching), while the Bash shell is frequently a background tool that is rarely mentioned. Because of this level of exposure, our audience is more accustomed to the notation and jargon of R.

I also found it more challenging for the audience to truly appreciate the power of the Bash shell in a day. I frequently think of the Bash shell as a power tool whose parts (e.g., drill bits and screwdriver heads) you can switch out for different purposes, while R is the wrench. This difference makes the use of R apparent and purposeful right away, while to use the Bash shell, one has to learn every single bit and know which project to apply it to.

Because we had some extra time after the introduction to R, we introduced the R functions subset and merge and did a quick live coding session to show how basic data wrangling could be done in base R. This was later contrasted with the more complex functions in the dplyr package. My thought was that it’s always good to introduce some functions that come with R, because while they are extremely useful, the installation of high-end packages like dplyr can be tricky on some remote computers.

As always, our audience truly enjoyed the power of ggplot2. However, because we cut the introduction short, the students thought it would be good to explain the ggplot layers in more detail to help with the steep learning curve. Due to time constraints, we also removed the capstone topic we introduced last year, where we showed the students how to use a shell script to automate R graphing with ggplot2 over 100 files. However, Jin and I still believe this exercise is a great wrap-up that emphasizes the power of the Bash shell.
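To give a flavor of that capstone idea, here is a rough sketch of a shell loop that hands each of 100 files to an R plotting script. This is not the script we used last year: make_plot.R and the sample_*.csv names are hypothetical, and the loop only prints the command it would run (a dry run), so it works even without R installed.

```shell
# Hypothetical sandbox with 100 small CSV files standing in for real data.
workdir=$(mktemp -d)
for i in $(seq 1 100); do
  printf 'x,y\n1,%s\n' "$i" > "$workdir/sample_$i.csv"
done

# Dry-run stand-in. In a real run, drop the leading "echo" so that the
# (hypothetical) ggplot2 script make_plot.R is actually executed.
run_plot() {
  echo Rscript make_plot.R "$1" "$2"
}

processed=0
for infile in "$workdir"/sample_*.csv; do
  outfile="${infile%.csv}.png"   # same name, .png extension
  run_plot "$infile" "$outfile"
  processed=$((processed + 1))
done
echo "Processed $processed files"
```

The point of the exercise is the division of labor: R knows how to draw one figure, and the shell knows how to repeat that for every file.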

As for the new material this year, Carolyn’s presentation on how to present data was a hit! Everyone, including myself, learned so much about how to let the data tell the story effectively. The important details ranged from color selection and line graph angles for effective comparison (banking to 45 degrees) to figure/table captions. We thought it was a great addition.


The whole workshop would not have been so successful without the planning and help of Maryann Moore. We greatly appreciate the support (and delicious lunches) from the College of Agriculture and Life Sciences. We also thank the Department of Agricultural and Biosystems Engineering for generously letting us use the conference room. Last but not least, we thank Dr. Tracy Teal and Maneesha Sane for all of their support and help!