Statistical computing boot camp teaches crucial skills to life sciences undergraduates

Students learn to work with research data using R in a virtual workshop series created and run by a senior statistics major
26 May 2021

At the forefront of any rapidly evolving field, there is always a gap between requisite skills and training: recent graduates enter the workforce with the best available, yet still suboptimal, preparation.

In science, big data has now flooded every corner and subfield, elevating statistical and computational skills to the must-have level for virtually any aspiring researcher.

This was the landscape pre-COVID, and the pandemic has only widened the gap as students now struggle to gain experience in labs where space has consequently been restricted, resources limited, and opportunities curbed.

David Chen, a senior Schreyer Scholar in Penn State’s Integrated Undergraduate/Graduate Degree Program in Statistics

“These students can’t just go into the lab and get that experience, so the discussion started moving towards how we can support them given this remote environment,” said David Chen, a senior Schreyer Scholar in Penn State’s Integrated Undergraduate/Graduate Degree Program in Statistics.

In collaboration with the University Libraries and the Penn State Eberly College of Science’s Office of Science Engagement, Chen developed a boot camp-style series of virtual workshops with input from Eberly college faculty to teach undergraduates statistical and computational skills essential for life sciences research.

Connecting dots

“We hear from our industry connections that they are often looking for life sciences people with quantitative and programming skills straight out of undergrad, and they have a terrible time finding them,” said Tomalei Vess, director of the Office of Science Engagement, which supports undergraduate students’ career and professional development with connections to research, global experience, internships, and full-time employment.

Along with representatives from the college’s Office of Diversity and Inclusion, Academic Advising Center, and Alumni Society, Vess hosts an informal weekly meeting, “Coffee with College Characters,” to build a sense of community with students and hear their insights. It was there that concerns were raised about this skills gap and the compounding effect of pandemic restrictions on students’ research experience.

In response, Chen pitched his idea for the workshops and, with Vess, began gathering feedback from Eberly college faculty on specific skills in statistics and computation that would help undergraduates secure research opportunities in the life sciences.

Chen then took that information back to the Libraries, where he held a graduate assistantship as a research consultant in the Department of Research Informatics and Publishing. There he used it to develop the workshops with his supervisor, Briana Ezray, the Eric N. and Bonnie S. Prystowsky Early Career Science Libraries Professor, research data librarian for STEM, and manager of the Data Learning Center, which supports the University’s faculty, staff, postdoctoral scholars, and students with consultation, instruction, and other resources in research data skills.

“I’d been doing workshops for graduate students, staff, and faculty, but that didn’t reach undergraduates,” she said. “So I was super excited when David said he wanted to do this, because there is that gap — a demand for it.”

Bridging the gap

Chen decided to focus his workshops on the programming language R, which was designed specifically for statistical computing and graphics and is widely used by scientists in academia and industry.

“It encompasses any part of a project where you’re working with data,” he explained. “Whether it’s after your initial data collection and now you need to process it, you want to visualize it to create a report, or you want to run statistical models on it, R is very effective.”

To accommodate participants’ varying skill levels, Chen surveyed the students ahead of the workshops. After each session, he gathered their feedback and tailored the subsequent workshop to their mastery of the concepts presented.

Chen selected the concepts themselves, including data wrangling and visualization, with input from Eberly college faculty across the life sciences.

“If we’re trying to support students and getting them into research positions, we have to understand what the research supervisors actually want,” he explained.

Chen also compiled real data sets to use in all of the exercises: “something the students would actually see,” he said. “I didn’t want this to be a situation where students couldn’t apply what they learned to a real-world scenario.”

To reinforce the material and ensure that the students were actively engaged and following along, Chen structured each session to alternate between presenting concepts and doing related exercises, some of which were designed to teach through coding errors.

“Essentially with programming, people tend to get scared or worried when they try to run the code and it fails,” he explained. “A big point of my teaching was that errors are a good thing, because they really help you understand the language.”

Positive response

In a testament to the workshops’ success, Chen was subsequently invited to present to a more advanced audience at the annual ASA DataFest competition, a data science “hackathon” for undergraduates and master’s students hosted by the statistics department.

“That was interesting because it was an entirely different group,” he said. “One hundred percent of the participants at DataFest had coding experience, and the majority considered themselves to be of intermediate skill, so it was a fun challenge converting all the material to accommodate that change in experience. Luckily, it turned out well.”

Although Chen graduated in May, Ezray said her plan is to continue offering the workshops for undergraduates along with those for graduate students, staff, and faculty.

“There’s so much demand at all levels, and we’re filling that gap,” she said. “These workshops have been some of our most successful.”

According to Vess, the success of Chen’s workshops also points to a positive shift in the educational landscape, brought on by the pandemic but hopefully much longer lasting.

“If there are any silver linings to COVID times, I think this is one of them,” she said. “It’s facilitated change and engagement. All of a sudden, the recommendation that students learn these skills became a necessity. So it created an openness that may not have been there before.”

As for Chen, the response to the workshops took even him by surprise.

“To see everyone jump on this opportunity, really participating and actively being involved,” he said, “that was amazing to me, and a really fun experience.”