require(ggplot2)
require(tidyr)
require(lubridate)
require(tidyverse)
require(readr)
library(DT)
Hart Memorial Trophy Consideration: Identifying the NHL Players Most Valuable to Their Respective Teams
To identify the top-performing NHL players for the Hart Memorial Trophy consideration, we need to analyze player performance based on key metrics such as goals, assists, points, and plus-minus rating. The following R code snippet demonstrates how to load and analyze NHL player data to identify the top-performing players based on these metrics.
#| message: false
#| warning: false
# Load the dataset
nhl_lines <- read.csv("lines.csv")
To do this, we created key metrics such as net value, net value per 60 minutes, team share, and “most valuable player” (MVP) score to evaluate player performance. We then filtered the dataset to include only players with a sufficient number of games played and calculated the MVP score for each player based on these metrics.
MVP Score measures the player’s overall contribution to the team’s success, considering their performance in key areas such as goals, assists, and time on ice. The formula for MVP Score is a combination of net value per 60 minutes, time on ice, and team contribution percentage. The higher the MVP Score, the more valuable the player is to their team.
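The exact formula is not spelled out above, so the following is only a minimal sketch of how such a score could be assembled. The column names (line, team, net_value, icetime_sec), the toy values, and the equal-weight average of the three components are all assumptions made for illustration; they are not the formula behind the results reported here.

```r
library(dplyr)

# Hypothetical toy data: one row per line or pairing (all values invented)
toy_lines <- data.frame(
  line        = c("A-B", "C-D", "E-F", "G-H"),
  team        = c("X", "X", "Y", "Y"),
  net_value   = c(12, 3, 6, 2),                  # assumed: a positive net contribution
  icetime_sec = c(72000, 36000, 54000, 18000)    # total time on ice in seconds
)

# Rescale a numeric vector to the 0-1 range so components are comparable
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

mvp_sketch <- toy_lines %>%
  group_by(team) %>%
  mutate(team_share = net_value / sum(net_value)) %>%   # share of the team's net value
  ungroup() %>%
  mutate(
    net_value_per60 = net_value / (icetime_sec / 3600), # net value per 60 minutes of ice time
    # Assumed combination: equal-weight mean of the three components
    mvp_score = (rescale01(net_value_per60) +
                 rescale01(icetime_sec) +
                 team_share) / 3
  ) %>%
  arrange(desc(mvp_score))

mvp_sketch
```

In this toy example the line A-B ranks first because it leads on all three components. With real data, the weighting would need to be chosen and justified explicitly.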
The results show that the pair Ekholm-Bouchard is the top performer by MVP score (0.705), followed by the Edmonton Oilers trio Hyman-McDavid-Draisaitl. The pairs Slavin-Burns, Gostisbehere-Walker, and Orlov-Chatfield also played an indispensable role for Carolina (CAR).
Our pick for the Hart Memorial Trophy therefore goes to the pair Ekholm-Bouchard for their outstanding performance in the recent season.
Vezina Trophy Consideration: Identifying the Top-Performing NHL Goalies
To find the best goalies in the NHL, we need to analyze their performance based on key metrics such as save percentage and goals against average (GAA). The following R code snippet demonstrates how to load and analyze NHL goalie data to identify the top-performing goalies based on these metrics for the Vezina Trophy consideration.
#| message: false
#| warning: false
# Load the dataset
shots_data <- read.csv("shots_2024.csv")
We summarize the goalie performance based on the number of shots faced, goals allowed, saves made, save percentage, and games played. We then calculate the goals against average (GAA) for each goalie to further evaluate their performance.
goalie_stats <- shots_data %>%
  filter(shotWasOnGoal == 1) %>%  # Only consider shots on goal
  group_by(goalieNameForShot) %>%
  summarise(
    Shots_On_Goal = n(),
    Goals_Allowed = sum(goal),
    Saves = Shots_On_Goal - Goals_Allowed,
    Save_Percentage = Saves / Shots_On_Goal,
    Games_Played = n_distinct(game_id)
  ) %>%
  arrange(desc(Save_Percentage))

# View top goalies
head(goalie_stats)
Next, we calculate the goals against average (GAA) for each goalie by dividing the total goals allowed by the total games played. We then merge the two datasets to compare the goalies based on save percentage and GAA.
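The code that builds goalie_gaa is not shown in this section. A minimal sketch of the calculation just described might look like the following. It assumes the same column names as shots_data (goalieNameForShot, game_id, shotWasOnGoal, goal) and uses a small invented data frame in place of the real file, so the object names toy_shots and goalie_gaa_sketch are hypothetical.

```r
library(dplyr)

# Invented stand-in for shots_data; only the columns used here are included
toy_shots <- data.frame(
  goalieNameForShot = c("Goalie A", "Goalie A", "Goalie A", "Goalie A",
                        "Goalie B", "Goalie B", "Goalie B"),
  game_id       = c(1, 1, 2, 2, 1, 1, 1),
  shotWasOnGoal = c(1, 1, 1, 0, 1, 1, 1),
  goal          = c(1, 0, 0, 0, 1, 1, 0)
)

# GAA as defined in the text: total goals allowed divided by games played
goalie_gaa_sketch <- toy_shots %>%
  filter(shotWasOnGoal == 1) %>%
  group_by(goalieNameForShot) %>%
  summarise(
    Games_Played = n_distinct(game_id),
    GAA = sum(goal) / Games_Played
  )

goalie_gaa_sketch
```

Note that this follows the per-game definition used here; the official NHL statistic is scaled per 60 minutes of playing time, which would require time-on-ice data for each goalie.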
# Merge the two datasets (columns present in both get .x/.y suffixes)
goalie_performance <- merge(goalie_stats, goalie_gaa, by = "goalieNameForShot")

# Rank by Save Percentage and GAA
goalie_performance <- goalie_performance %>%
  arrange(desc(Save_Percentage), GAA)

# View top-ranked goalies
head(goalie_performance)
To exclude goalies with too small a sample, we set minimum thresholds for shots faced and games played and keep only the goalies who meet both. The merge leaves duplicated columns with .x and .y suffixes; these can be ignored, and the filter below uses Games_Played.x.
# Set thresholds for minimum shots faced and games played
min_shots <- 1000
min_games_played <- 50

# Keep goalies who meet both thresholds
goalie_performance_filtered <- goalie_performance %>%
  filter(Shots_On_Goal >= min_shots & Games_Played.x >= min_games_played) %>%
  arrange(desc(Save_Percentage), GAA)

# View the updated rankings
head(goalie_performance_filtered)
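The Games_Played.x name comes from how merge() treats columns that appear in both inputs but are not listed in by: it keeps both copies and appends .x and .y. One way to avoid the duplicates, sketched here with invented toy data frames (stats_toy and gaa_toy are hypothetical names), is to drop the repeated column from one side before merging.

```r
# Invented toy inputs that share a Games_Played column
stats_toy <- data.frame(
  goalieNameForShot = c("Goalie A", "Goalie B"),
  Save_Percentage   = c(0.92, 0.90),
  Games_Played      = c(60, 55)
)
gaa_toy <- data.frame(
  goalieNameForShot = c("Goalie A", "Goalie B"),
  Games_Played      = c(60, 55),
  GAA               = c(2.1, 2.6)
)

# Default behaviour: shared non-key columns get .x / .y suffixes
merged_dup <- merge(stats_toy, gaa_toy, by = "goalieNameForShot")
names(merged_dup)

# Dropping the repeated column first keeps a single Games_Played
gaa_slim <- gaa_toy[, c("goalieNameForShot", "GAA")]
merged_clean <- merge(stats_toy, gaa_slim, by = "goalieNameForShot")
names(merged_clean)
```

With the second form, a later filter would refer to Games_Played rather than Games_Played.x.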
Finally, we create a scatter plot to visualize the relationship between save percentage and goals against average for the top-performing goalies. We use point size to represent the number of shots faced by each goalie. The plot provides a clear comparison of goalie performance based on these key metrics.
ggplot(goalie_performance_filtered,
       aes(x = Save_Percentage, y = GAA,
           label = goalieNameForShot, size = Shots_On_Goal)) +
  geom_point(alpha = 0.8) +  # Semi-transparent points for better visualization
  geom_text(vjust = -1, hjust = 0.5, size = 3.5, fontface = "bold") +  # Clearer labels
  scale_size(range = c(3, 8)) +  # Adjust point sizes for better distinction
  labs(
    title = "🏒 Goalie Performance: Save Percentage vs Goals Against Average",
    subtitle = paste("Minimum", min_shots, "shots faced required for inclusion"),
    x = "Save Percentage (Higher is Better)",
    y = "Goals Against Average (Lower is Better)",
    size = "Shots Faced"
  ) +
  theme_minimal()
We judge Connor Hellebuyck, Andrei Vasilevskiy, and Ilya Sorokin to be the top three goalies based on save percentage, goals against average, and games played, with Hellebuyck our pick for the Vezina Trophy.
James Norris Memorial Trophy Consideration: Identifying the “Best All-Around” Defenseman
Here, we used the NHL Draft Stats dataset to identify the top defensemen based on key metrics such as points per game, assists per game, goals per game, offensive score, and defensive score. We then filtered the dataset to include only defensemen who played more than 50 games and ranked them based on their offensive and defensive scores.
# Load the dataset
NHLDraftStats <- read.csv("SkaterIndividualstats.csv")
Offensive score is calculated as the sum of points per game, assists per game, and goals per game. Defensive score is calculated per game as shots blocked plus hits plus takeaways, minus giveaways. We then rank the defensemen on the sum of their offensive and defensive scores to identify the top performers.
# Keep only defensemen
defensemen <- NHLDraftStats %>%
  filter(Position == "D")

# Compute key metrics
defensemen <- defensemen %>%
  mutate(
    PPG = Total.Points / GP,             # points per game
    APG = Total.Assists / GP,            # assists per game
    GPG = Goals / GP,                    # goals per game
    Offensive.Score = PPG + APG + GPG,   # offensive score
    Defensive.Score = (Shots.Blocked + Hits + Takeaways - Giveaways) / GP
  )

# Shortlist players who played more than 50 games
defensemen <- defensemen %>%
  filter(GP > 50)

# Rank players by the sum of defensive and offensive scores and keep the top 10
top_defensemen <- defensemen %>%
  arrange(desc(Defensive.Score + Offensive.Score)) %>%
  head(10)
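To make the two scores concrete, here is the same arithmetic applied to a single made-up stat line. The numbers are invented purely for illustration and do not correspond to any real player.

```r
# Invented season totals for one hypothetical defenseman
GP            <- 80
Goals         <- 10
Total.Assists <- 40
Total.Points  <- Goals + Total.Assists   # 50
Shots.Blocked <- 120
Hits          <- 100
Takeaways     <- 40
Giveaways     <- 60

PPG <- Total.Points / GP    # 0.625
APG <- Total.Assists / GP   # 0.5
GPG <- Goals / GP           # 0.125

Offensive.Score <- PPG + APG + GPG                                        # 1.25
Defensive.Score <- (Shots.Blocked + Hits + Takeaways - Giveaways) / GP    # 2.5

c(Offensive = Offensive.Score, Defensive = Defensive.Score)
```

Because points are goals plus assists, the offensive score as defined works out to twice the points per game. In this example the defensive score is the larger of the two, so it carries more weight in the sum.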
Here is a visualization of the top 5 defensemen. The bar chart plots the product of each player's offensive and defensive scores (the shortlist itself is ranked by their sum), giving a quick comparison of overall performance.
# Bar chart of the top 5 defensemen (offensive score times defensive score)
ggplot(head(top_defensemen, 5),
       aes(x = reorder(Player, -(Defensive.Score * Offensive.Score)),
           y = Defensive.Score * Offensive.Score,
           fill = Player)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Top 5 Defensemen - Combined Score',
       x = 'Player',
       y = 'Defensive Score × Offensive Score') +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
We ranked Moritz Seider, MacKenzie Weegar, Cale Makar, Colton Parayko, and Neal Pionk as the top 5 defensemen based on their overall performance. Moritz Seider is our pick for the James Norris Memorial Trophy.
Calder Memorial Trophy Consideration: Identifying the Top Rookie Performer
# Load the dataset
rookie_stats <- read.csv("RookieSkaterIndividual.csv")