4  Kyu Min Shim

Author

Graduate Student
PhD Statistics
Men’s Volleyball (MVB)

This was my second year in the IST program as an Analyst for the Men’s Volleyball team. This role combines my love for volleyball with my passion for data science, and I have been able to work on projects that have a direct impact on the team. Over the last two years, I gained many technical skills (e.g., creating data pipelines and dashboards, automating workflows, cloud computing) that I would not otherwise have been able to develop in a traditional academic setting, and I have been able to apply these skills in a real-world context. My favourite part of this program is working with a great group of supportive and talented people who are passionate about their sports and the success of the team. Overall, this has been an incredibly rewarding experience, and I look forward to continuing my journey in the IST next year.

Projects

Volleyball Statistics Dashboard

I developed a dashboard for visualizing key statistics of games available on Perfbook, a platform that provides detailed statistics for volleyball matches. I used the play-by-play data of all Canadian university volleyball games from Perfbook to create a dashboard that lets users explore statistics such as attack efficiency, serve efficiency, and block success rates. The dashboard was designed to be user-friendly and interactive, allowing coaches and players to filter and slice the data by criteria such as season, team, opponent, and player. The coaching staff has used this tool to analyze our own performance as well as that of our opponents, providing valuable insights that have informed our game strategies.
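
To illustrate the kind of filter-and-slice calculation described above, here is a minimal sketch in Python. The record layout, the field names (`season`, `team`, `player`, `skill`, `outcome`), and the sample rows are hypothetical and do not reflect Perfbook's actual schema. The formula used, (kills - errors) / attempts, is the common definition of attack efficiency and is assumed here; the dashboard may define it differently.

```python
from collections import Counter

# Hypothetical play-by-play rows; this is NOT real Perfbook data or its schema.
PLAYS = [
    {"season": 2024, "team": "A", "player": "P1", "skill": "attack", "outcome": "kill"},
    {"season": 2024, "team": "A", "player": "P1", "skill": "attack", "outcome": "error"},
    {"season": 2024, "team": "A", "player": "P1", "skill": "attack", "outcome": "in_play"},
    {"season": 2024, "team": "A", "player": "P2", "skill": "attack", "outcome": "kill"},
    {"season": 2024, "team": "A", "player": "P2", "skill": "serve", "outcome": "ace"},
    {"season": 2024, "team": "B", "player": "P3", "skill": "attack", "outcome": "kill"},
]


def attack_efficiency(plays, **filters):
    """Return (kills - errors) / attempts for attacks matching the filters.

    Filters are keyword arguments such as team="A" or player="P1".
    Returns None when no attack matches, to avoid dividing by zero.
    """
    attacks = [
        play
        for play in plays
        if play["skill"] == "attack"
        and all(play.get(key) == value for key, value in filters.items())
    ]
    if not attacks:
        return None
    counts = Counter(play["outcome"] for play in attacks)
    return (counts["kill"] - counts["error"]) / len(attacks)


if __name__ == "__main__":
    print("Team A:", attack_efficiency(PLAYS, team="A"))
    print("Player P1:", attack_efficiency(PLAYS, team="A", player="P1"))
```

Passing the filters as keyword arguments mirrors how a dashboard control might narrow the data by season, team, or player before the statistic is recomputed.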

Player Contribution Modelling

In this project, I developed a model to estimate the contribution of individual players to overall team performance, based on the work by Powers. Using the play-by-play data from Perfbook, I created a Markov chain model that estimates the probability of winning a point given a sequence of actions. This model is then used to estimate each player’s contribution to the team’s success by analyzing how that player’s actions (e.g., attacks, serves, blocks, digs) influence the probability of winning a point. This approach diverges from simple metrics like kill percentage or serve efficiency, which heavily favour attackers who are responsible for successfully ending a rally. Instead, it provides a more holistic view of a player’s contribution to the team’s performance by considering the context of their actions and how they influence the flow of the game.
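
The general idea can be sketched with a small absorbing Markov chain. This is an illustration only, not the model from Powers or the one built in this project: the state names, the transition probabilities, and the helper functions `win_probabilities` and `action_credit` are all hypothetical. Each rally state has a probability of eventually ending in a won point, and an action is credited with the change in that probability it produces.

```python
# Hypothetical rally states from our team's point of view, starting with our serve.
# All probabilities are made up for illustration; they are not estimated from data.
TRANSITIONS = {
    "serve":         {"opp_reception": 0.88, "win": 0.04, "loss": 0.08},
    "opp_reception": {"opp_attack": 0.94, "win": 0.06},
    "opp_attack":    {"dig": 0.40, "win": 0.15, "loss": 0.45},
    "dig":           {"own_attack": 0.85, "loss": 0.15},
    "own_attack":    {"opp_dig": 0.40, "win": 0.45, "loss": 0.15},
    "opp_dig":       {"opp_attack": 0.85, "win": 0.15},
}


def win_probabilities(transitions, tol=1e-12, max_iter=10_000):
    """Probability of eventually winning the point from each state.

    Solves v[s] = sum(p * v[next]) by repeated sweeps, with the absorbing
    states fixed at v["win"] = 1 and v["loss"] = 0.
    """
    values = {state: 0.0 for state in transitions}
    values.update({"win": 1.0, "loss": 0.0})
    for _ in range(max_iter):
        delta = 0.0
        for state, row in transitions.items():
            new_value = sum(p * values[nxt] for nxt, p in row.items())
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            break
    return values


def action_credit(values, before, after):
    """Credit an action with the change in win probability it caused."""
    return values[after] - values[before]


if __name__ == "__main__":
    values = win_probabilities(TRANSITIONS)
    # A successful dig moves the rally from "opp_attack" to "dig".
    print("Credit for a dig:", round(action_credit(values, "opp_attack", "dig"), 3))
    # A kill moves the rally from "own_attack" straight to "win".
    print("Credit for a kill:", round(action_credit(values, "own_attack", "win"), 3))
```

Under this kind of scheme a dig earns positive credit even though it does not end the rally, which is the sense in which the approach rewards more than the final attack.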

Reflections

  • I am happy with the projects that I completed over the last year. The dashboard has been well received by the coaching staff and has been used to inform game strategies. It is rewarding to see the impact of my work on the team’s performance and to know that the tools I have developed are being used to help the team succeed. The player contribution modelling project has yet to be fully deployed, but I plan on integrating it into the dashboard so that coaches can easily access the insights it provides. I am excited about the potential of this model to provide a more nuanced understanding of player performance and to help the coaching staff make more informed decisions about player development and game strategy.

  • Recently, I have been learning more about DevOps for data science. In school, being able to train a model with a good prediction score is sufficient. However, making a good model useful takes a lot of work. A useful model should be easily accessible to its end users, and it should be able to handle new data and update itself accordingly. This requires setting up data pipelines, automating workflows, and deploying the model in a way that allows for easy access and maintenance. I have been learning about tools like Docker and GitHub Actions, and cloud platforms like AWS, to help me with this process. I plan to lean into this area more in the future and hone my skills as a data scientist who can not only build good models but also make them useful by deploying and maintaining them.