2 Austin You
I had a few more electives left in my graduation requirements and was looking for ones that interested me when I came across a posting for the IST program in the spring of 2025. As I read the description of the role, I realized that it offered many of the benefits of a co-op term, but in a field I would not have thought to apply to: data science. My lack of experience had made me hesitant to apply for data science roles in previous co-op hiring terms. Since I would get to learn technical skills over the summer and then use those skills to make an impact on a sports organization within my own school starting in the fall, I readily accepted when I received an offer to join the IST team as an infrastructure member.
Projects
Automated Data Pipeline for Balltime
For this project, I created a data extraction pipeline for the Balltime volleyball video platform, which relies on a proprietary computer vision system to classify actions in footage of volleyball games and practices. The platform serves this data as JSON, but Balltime does not currently offer an API endpoint that makes it available to developers. I therefore learned to build a web scraper with Selenium, a popular browser automation library, that retrieves the data automatically on a daily schedule set by a GitHub Action. The medallion architecture informed my approach throughout the project: once the logic for scraping raw (bronze) JSON data was complete, I wrote several scripts using the pandas library to split the data into CSV tables, which could then be cleaned and processed into silver-level data. Both the bronze and silver data are currently stored in our Balltime GitHub repository. The silver data can in turn be used to form gold-level tables containing only the data needed for a given project. Throughout the project, I coordinated with another infrastructure IST member to create data dictionaries for the bronze and silver data, which help us keep track of changes to either level; she would draft the dictionaries in a pull request, and I would review them and request changes where needed. This process was somewhat difficult because the format of the Balltime data changed every so often and the Balltime platform does not have up-to-date documentation of its data format.
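The bronze-to-silver step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`video_id`, `title`, `actions`, `time_s`) and the helper names `split_bronze` and `to_csv` are hypothetical, since the real Balltime JSON layout is not reproduced here, and the sketch uses Python's standard `json` and `csv` modules rather than the pandas scripts used in the actual project.

```python
import csv
import io
import json

# Hypothetical bronze payload; the real Balltime JSON format differs and changes over time.
BRONZE_JSON = """
{
  "video_id": "match-001",
  "title": "Example match",
  "actions": [
    {"type": "serve", "player": 7, "time_s": 12.5},
    {"type": "attack", "player": 3, "time_s": 15.0}
  ]
}
"""


def split_bronze(record):
    """Split one nested bronze record into flat silver tables (lists of row dicts)."""
    videos = [{"video_id": record["video_id"], "title": record["title"]}]
    # Each action row carries the parent video_id so the tables can be joined later.
    actions = [
        {"video_id": record["video_id"], "action_index": i, **action}
        for i, action in enumerate(record["actions"])
    ]
    return {"videos": videos, "actions": actions}


def to_csv(rows):
    """Serialize a non-empty list of row dicts to CSV text; the first row's keys form the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


silver = split_bronze(json.loads(BRONZE_JSON))
actions_csv = to_csv(silver["actions"])
print(actions_csv)
```

Keeping one table per entity (videos, actions) with a shared key is one way to make later gold-level tables a matter of selecting and joining columns, though the actual silver schema may be organized differently.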
Our goal for the future is to validate the accuracy of the Balltime data more fully. Balltime data is often processed faster than data uploaded to our other video solution, Perfbook, which relies on manual data labeling. We would like to incorporate Balltime data into our data solutions, provided that the data it reports is sufficiently accurate for our needs.
Wellness & Jump Data Collection
During the 2025-26 winter break, data collection for the Wellness & Jump Data dashboard for the women’s volleyball team had to be done manually. Jump metrics and wellness survey data were available via the VALD and Qualtrics platforms, but because we had not yet automated the process of updating the data behind our dashboard, I updated it on a weekly basis over the course of the break. I did so by manually running a script that pulled the latest wellness and jump data and wrote it to a CSV file, which I then pushed to our dashboard’s GitHub repository. I also coordinated with our Sports Science IST student to ensure that each week’s update went smoothly. By the end of the break, the process had been automated and my role on the project was no longer needed.
Reflections
I had to deal with several personal obstacles throughout the 2025-26 season as an IST member. I am grateful to my team lead, David, for being willing to accommodate me while I was dealing with them and for motivating me to continue my IST work despite those challenges. That said, because the IST was the first time I had built a web scraper or used the Balltime platform, I struggled in the fall term to acclimate to the technology we were using. From my point of view, this resulted in stunted progress through the fall, which limited how much I could realistically do before the end of the season in February. I am still proud that, despite all the challenges I faced, I learned basic web scraping and how to automate it, to the point where I can say that I built a functioning pipeline that scrapes and cleans over 100 files of game and practice footage.
If I were to do the IST again, one thing I believe I could improve on personally is my oral presentation skills. I think it would have benefited the data solutions team, the coaches and myself if I had been better able to give concise progress reports on my projects. I tended to ramble when I wasn’t completely confident in a point I was talking about. I might fix this by taking the time beforehand to understand the points I want to talk about but don’t fully understand, or by simply admitting my lack of understanding in the moment and making up that knowledge deficit later.