4  Tidy and Transform

Author

Ben Koshy

5 Tidy and Transform

5.1 Setup

Required packages:

Code
# Install once if needed (uncomment to run):
#install.packages("rvest")
#install.packages("stringr")
#install.packages("tidyverse")
#install.packages("janitor")
#install.packages("gt")
#install.packages("reactable")
#install.packages("pins")
#install.packages("vetiver")
#install.packages("plumber")
#install.packages("aws.s3")
#install.packages("arrow")

library(rvest)
library(stringr)
library(tidyverse)
library(janitor)
library(gt)
library(reactable)
library(pins)
library(vetiver)
library(plumber)
library(aws.s3)
library(arrow)

5.2 Tidying the Data

Tidying means putting the data into a tidy form and cleaning it, which makes later analysis and processing much more convenient. We already cleaned up the column names during the import step. Looking at the first ten rows of each table, the column headers are well named and the data appears to be correctly specified, which is great!
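To illustrate what that column-name cleaning looks like, here is a minimal sketch. It assumes the import step used `janitor::clean_names()` (the package is loaded above, but the import code is not shown in this section), and the data frame `raw` with camelCase headers is a hypothetical stand-in for the imported file.

```r
library(janitor)

# Hypothetical raw headers in camelCase, standing in for the imported file
raw <- data.frame(gameId = 1L, playId = 2L, yardsToGo = 10L)

# clean_names() converts headers to snake_case by default
cleaned <- clean_names(raw)
names(cleaned)
```

Snake-case names such as `game_id` and `yards_to_go` are what the previews below display.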

Code
gt(plays[1:10,])
game_id play_id play_description quarter down yards_to_go possession_team defensive_team yardline_side yardline_number game_clock pre_snap_home_score pre_snap_visitor_score play_nullified_by_penalty absolute_yardline_number pre_snap_home_team_win_probability pre_snap_visitor_team_win_probability expected_points offense_formation receiver_alignment play_clock_at_snap pass_result pass_length target_x target_y play_action dropback_type dropback_distance pass_location_type time_to_throw time_in_tackle_box time_to_sack pass_tipped_at_line unblocked_pressure qb_spike qb_kneel qb_sneak rush_location_type penalty_yards pre_penalty_yards_gained yards_gained home_team_win_probability_added visitor_team_win_probility_added expected_points_added is_dropback pff_run_concept_primary pff_run_concept_secondary pff_run_pass_option pff_pass_coverage pff_man_zone
2022102302 2655 (1:54) (Shotgun) J.Burrow pass short middle to T.Boyd to CIN 30 for 9 yards (J.Hawkins). 3 1 10 CIN ATL CIN 21 01:54:00 35 17 N 31 0.982017488 0.017982512 0.7193135 EMPTY 3x2 10 C 6 36.69 16.51 FALSE TRADITIONAL 2.40 INSIDE_BOX 2.990 2.990 NA FALSE FALSE FALSE 0 NA NA NA 9 9 0.0046338425 -0.0046338425 0.7027167 TRUE NA NA 0 Cover-3 Zone
2022091809 3698 (2:13) (Shotgun) J.Burrow pass short right to H.Hurst to CIN 12 for 4 yards (L.Vander Esch). 4 1 10 CIN DAL CIN 8 02:13:00 17 17 N 18 0.424356237 0.575643763 0.6077456 EMPTY 3x2 9 C 4 20.83 20.49 FALSE TRADITIONAL 1.14 INSIDE_BOX 1.836 1.836 NA FALSE FALSE FALSE 0 NA NA NA 4 4 0.0028469265 -0.0028469265 -0.2405086 TRUE NA NA 0 Quarters Zone
2022103004 3146 (2:00) (Shotgun) D.Mills pass short right to D.Pierce to HST 26 for 6 yards (D.Walker). 4 3 12 HOU TEN HOU 20 02:00:00 3 17 N 30 0.006291237 0.993708763 -0.2914852 SHOTGUN 2x2 12 C -4 26.02 17.56 FALSE TRADITIONAL 3.20 INSIDE_BOX 2.236 2.236 NA FALSE FALSE FALSE 0 NA NA NA 6 6 0.0002047173 -0.0002047173 -0.2184804 TRUE NA NA 0 Quarters Zone
2022110610 348 (9:28) (Shotgun) P.Mahomes pass short left to I.Pacheco to TEN 19 for 4 yards (Z.Cunningham). 1 2 10 KC TEN TEN 23 09:28:00 0 0 N 33 0.884223158 0.115776842 4.2493820 SHOTGUN 2x2 11 C -6 38.95 14.19 FALSE TRADITIONAL 3.02 INSIDE_BOX 2.202 2.202 NA FALSE FALSE FALSE 0 NA NA NA 4 4 -0.0013082474 0.0013082474 -0.4277486 TRUE NA NA 0 Quarters Zone
2022102700 2799 (2:16) (Shotgun) L.Jackson up the middle to TB 28 for -1 yards (R.Nunez-Roches). 3 2 8 BAL TB TB 27 02:16:00 10 10 N 37 0.410371426 0.589628574 3.9284126 PISTOL 3x1 8 NA NA NA NA TRUE DESIGNED_RUN 2.03 NA NA NA NA NA NA NA 0 FALSE INSIDE_LEFT NA -1 -1 0.0271411501 -0.0271411501 -0.6389118 FALSE MAN READ OPTION 0 Cover-1 Man
2022100205 2314 (14:15) Ja.Williams up the middle to DET 32 for 3 yards (C.Barton; U.Nwosu). 3 2 6 DET SEA DET 29 14:15:00 15 31 N 39 0.138289124 0.861710876 1.0669310 SINGLEBACK 3x1 15 NA NA NA NA FALSE NA NA NA NA NA NA NA NA NA 0 FALSE INSIDE_RIGHT NA 3 3 -0.0242097043 0.0242097043 -0.4425174 FALSE MAN NA 0 Cover 6-Left Zone
2022110605 3861 (:29) (Shotgun) J.Wilkins up the middle to IND 45 for 5 yards (J.Uche). 4 1 10 IND NE IND 40 00:29:00 26 3 N 50 0.997811435 0.002188565 0.9911694 SHOTGUN 2x2 18 NA NA NA NA FALSE NA NA NA NA NA NA NA NA NA 0 FALSE OUTSIDE_LEFT NA 5 5 0.0021885647 -0.0021885647 -0.9911694 FALSE INSIDE ZONE NA 0 Cover-2 Zone
2022100203 3994 (:35) K.Murray kneels to CAR 29 for -1 yards. 4 3 12 ARI CAR CAR 28 00:35:00 16 26 N 82 0.005252870 0.994747130 2.1546908 NA NA 2 NA NA NA NA FALSE NA NA NA NA NA NA NA NA NA 1 FALSE UNKNOWN NA -1 -1 -0.0052528698 0.0052528698 0.0000000 FALSE UNDEFINED NA 0 NA NA
2022091104 3662 (12:51) (Shotgun) J.Hurts pass incomplete short right to D.Smith (A.Bryant) [J.Cominsky]. 4 3 12 PHI DET PHI 35 12:51:00 28 38 N 45 0.078610925 0.921389075 -0.1411298 SHOTGUN 3x1 3 I -6 44.69 10.53 FALSE TRADITIONAL 1.78 INSIDE_BOX 1.568 1.568 NA TRUE TRUE FALSE 0 NA NA NA 0 0 0.0123612862 -0.0123612862 -1.1616206 TRUE NA NA 0 Cover-0 Man
2022100204 1422 (5:22) (Shotgun) C.Rush pass short left to M.Gallup to 50 for 15 yards (K.Fuller). 2 3 8 DAL WAS DAL 35 05:22:00 6 7 N 45 0.539959710 0.460040290 0.4480018 SHOTGUN 3x1 12 C 15 59.63 40.74 FALSE TRADITIONAL 5.16 INSIDE_BOX 3.203 3.203 NA FALSE FALSE FALSE 0 NA NA NA 15 15 0.0584896356 -0.0584896356 2.1947601 TRUE NA NA 0 Quarters Zone
Code
gt(pp[1:10,])
game_id play_id nfl_id team_abbr had_rush_attempt rushing_yards had_dropback passing_yards sack_yards_as_offense had_pass_reception receiving_yards was_targetted_receiver yardage_gained_after_the_catch fumbles fumble_lost fumble_out_of_bounds assisted_tackle forced_fumble_as_defense half_sack_yards_as_defense pass_defensed quarterback_hit sack_yards_as_defense safety_as_defense solo_tackle tackle_assist tackle_for_a_loss tackle_for_a_loss_yardage had_interception interception_yards fumble_recoveries fumble_recovery_yards penalty_yards penalty_names was_initial_pass_rusher caused_pressure time_to_pressure_as_pass_rusher get_off_time_as_pass_rusher in_motion_at_ball_snap shift_since_lineset motion_since_lineset was_running_route route_ran blocked_player_nfl_id1 blocked_player_nfl_id2 blocked_player_nfl_id3 pressure_allowed_as_blocker time_to_pressure_allowed_as_blocker pff_defensive_coverage_assignment pff_primary_defensive_coverage_matchup_nfl_id pff_secondary_defensive_coverage_matchup_nfl_id
2022090800 56 35472 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE NA NA 47917 NA NA 0 NA NA NA NA
2022090800 56 42392 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE NA NA 47917 NA NA 0 NA NA NA NA
2022090800 56 42489 BUF 0 0 0 0 0 1 6 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE TRUE 1 IN NA NA NA NA NA NA NA NA
2022090800 56 44875 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE NA NA 43335 NA NA 0 NA NA NA NA
2022090800 56 44985 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE 1 OUT NA NA NA NA NA NA NA NA
2022090800 56 46076 BUF 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE NA NA NA NA NA NA NA NA NA NA
2022090800 56 47857 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE NA NA NA NA NA NA NA NA NA NA
2022090800 56 47879 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE 1 IN NA NA NA NA NA NA NA NA
2022090800 56 48512 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE NA NA 41239 NA NA 0 NA NA NA NA
2022090800 56 52536 BUF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA FALSE NA NA FALSE FALSE FALSE 1 GO NA NA NA NA NA NA NA NA
Code
gt(players[1:10,])
nfl_id  height  weight  birth_date  college_name    position  display_name
25511   6-4     225     1977-08-03  Michigan        QB        Tom Brady
29550   6-4     328     1982-01-22  Arkansas        T         Jason Peters
29851   6-2     225     1983-12-02  California      QB        Aaron Rodgers
30842   6-6     267     1984-05-19  UCLA            TE        Marcedes Lewis
33084   6-4     217     1985-05-17  Boston College  QB        Matt Ryan
33099   6-6     245     1985-01-16  Delaware        QB        Joe Flacco
33107   6-4     315     1985-08-30  Virginia Tech   T         Duane Brown
33130   5-10    175     1986-12-01  California      WR        DeSean Jackson
33131   6-8     300     1986-09-01  Miami           DE        Calais Campbell
33138   6-3     222     1985-07-02  Michigan        QB        Chad Henne

The data provided by the NFL Big Data Bowl is already quite clean and tidy overall. We do, however, need some additional processing to help answer our question of interest.

5.3 Transforming the Data

We keep only the plays that were runs, then add a column for the required yards and an indicator for a successful run. The logic for a successful run is as follows:

A run is considered “successful” if it gains:

  • At least 40% of the yards needed on 1st down,

  • At least 50% of the yards needed on 2nd down,

  • 100% of the yards needed (i.e., picks up the first down or TD) on 3rd or 4th down.

We also remove plays with an unknown rush location, as they provide no value to us.

Code
# Keep running plays only: drop rows with a missing or "UNKNOWN" rush location
runs <- plays |>
  filter(!is.na(rush_location_type), rush_location_type != "UNKNOWN")

runs <- runs |>
  mutate(
    required_yards = case_when(
      down == 1 ~ 0.4 * yards_to_go,
      down == 2 ~ 0.5 * yards_to_go,
      TRUE      ~ yards_to_go
    ),
    successful_run = yards_gained >= required_yards
  )
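
As a sanity check on the rule above, the same filter and `case_when()` logic can be applied to a small hand-made table. This is only a sketch: `toy_plays` and its values are hypothetical, not rows from the dataset, and were chosen to hit each branch of the rule plus the two kinds of rows the filter should drop.

```r
library(dplyr)

# Hypothetical toy plays (not from the dataset)
toy_plays <- tibble::tibble(
  down               = c(1, 1, 2, 2, 3, 4, 1, 4),
  yards_to_go        = c(10, 10, 8, 8, 2, 1, 10, 12),
  yards_gained       = c(5, 3, 4, 3, 2, 0, 7, -1),
  rush_location_type = c("INSIDE_LEFT", "INSIDE_RIGHT", "OUTSIDE_LEFT",
                         "INSIDE_LEFT", "INSIDE_RIGHT", "OUTSIDE_LEFT",
                         NA, "UNKNOWN")
)

# Same filter as above: the NA row and the "UNKNOWN" row are dropped
toy_runs <- toy_plays |>
  filter(!is.na(rush_location_type), rush_location_type != "UNKNOWN")

# Same success rule as above
toy_runs <- toy_runs |>
  mutate(
    required_yards = case_when(
      down == 1 ~ 0.4 * yards_to_go,
      down == 2 ~ 0.5 * yards_to_go,
      TRUE      ~ yards_to_go
    ),
    successful_run = yards_gained >= required_yards
  )

toy_runs |> select(down, yards_to_go, yards_gained, required_yards, successful_run)
```

Six rows survive the filter. On 1st and 10, a 5-yard gain clears the 4 yards required while a 3-yard gain does not. On 2nd and 8, exactly 4 yards is enough. On 3rd and 4th down, only the full distance counts.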