Draft Kings Optimization

Draft Kings Player Optimization¶

A brief introduction to Linear Programming with Python¶

Kyle Stahl¶

January 2018¶

Introduction¶

This analysis uses the PuLP Python package for linear programming to find the Draft Kings lineup for Week 17 of the 2017 NFL season that maximizes total average points per game while staying within the salary cap. I downloaded the data for the Week 17 Draft Kings contest to the Downloads folder on my local machine. The file is called DKSalaries.csv. Here we change the working directory to the folder that contains the file.

In [1]:
import os
os.chdir("C:\\Users\\kyles\\Downloads") # windows

Import Required Packages¶

In [2]:
import pulp  # the code below refers to everything as pulp.<name>
import numpy as np
import pandas as pd

Read in the data¶

Here we read the data into a pandas data frame. If you are not familiar with pandas data frames, the official documentation has a 10-minute introduction here: https://pandas.pydata.org/pandas-docs/stable/10min.html. Draft Kings limits how many players at each position you can have on your team, so we need a way to represent position in the optimization model. We do this by adding one binary indicator column for each position: QB, RB, WR, TE, and DST. A 1 in a column means the player can be drafted at that position, and a 0 means they cannot. Each player is eligible at exactly one position. The Salary column is also converted from an integer to a float for consistency.

In [3]:
players = pd.read_csv("DKSalaries.csv")
In [4]:
players["RB"] = (players["Position"] == 'RB').astype(float)
players["WR"] = (players["Position"] == 'WR').astype(float)
players["QB"] = (players["Position"] == 'QB').astype(float)
players["TE"] = (players["Position"] == 'TE').astype(float)
players["DST"] = (players["Position"] == 'DST').astype(float)
players["Salary"] = players["Salary"].astype(float)
In [5]:
players.head(10)
Out[5]:
   Position             Name   Salary                       GameInfo  AvgPointsPerGame  teamAbbrev   RB   WR   QB   TE  DST
0        RB     Le'Veon Bell  10000.0  CLE@PIT 12/31/2017 01:00PM ET            23.907         PIT  1.0  0.0  0.0  0.0  0.0
1        RB   Todd Gurley II   9800.0   SF@LAR 12/31/2017 04:25PM ET            27.087         LAR  1.0  0.0  0.0  0.0  0.0
2        WR    Antonio Brown   8900.0  CLE@PIT 12/31/2017 01:00PM ET            23.879         PIT  0.0  1.0  0.0  0.0  0.0
3        RB  Ezekiel Elliott   8700.0  DAL@PHI 12/31/2017 01:00PM ET            22.122         DAL  1.0  0.0  0.0  0.0  0.0
4        WR  DeAndre Hopkins   8400.0  HOU@IND 12/31/2017 01:00PM ET            21.720         HOU  0.0  1.0  0.0  0.0  0.0
5        WR      Julio Jones   8200.0  CAR@ATL 12/31/2017 04:25PM ET            16.206         ATL  0.0  1.0  0.0  0.0  0.0
6        RB     LeSean McCoy   8000.0  BUF@MIA 12/31/2017 04:25PM ET            18.093         BUF  1.0  0.0  0.0  0.0  0.0
7        RB     Alvin Kamara   7900.0    NO@TB 12/31/2017 04:25PM ET            19.840          NO  1.0  0.0  0.0  0.0  0.0
8        WR     Keenan Allen   7800.0  OAK@LAC 12/31/2017 04:25PM ET            17.860         LAC  0.0  1.0  0.0  0.0  0.0
9        WR   Michael Thomas   7700.0    NO@TB 12/31/2017 04:25PM ET            16.607          NO  0.0  1.0  0.0  0.0  0.0
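
As an aside, the five indicator columns can also be built in a single step with pandas' get_dummies. The sketch below uses a small made-up data frame (not the real DKSalaries.csv) and assumes the same Position labels; it is an alternative, not what the cells above run.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Draft Kings data
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Position": ["QB", "RB", "WR"],
})

# One 0/1 column per position label found in the data
indicators = pd.get_dummies(toy["Position"]).astype(float)
toy = pd.concat([toy, indicators], axis=1)
print(toy)
```

Note that get_dummies only creates columns for labels that actually appear, so a position missing from the data (TE and DST in this toy example) gets no column.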

Initialize Model and parameters¶

Now we initialize the optimization model with pulp.LpProblem. We give the problem a name, state whether we are maximizing or minimizing, and store it in a Python variable. Then we create a dictionary for each of our parameters: points, cost, QB, RB, WR, TE, and DST. One additional dictionary is used to build the constraint on the total number of players, described below.

In [6]:
model = pulp.LpProblem("Draft_Kings", pulp.LpMaximize)  # no spaces in the name; recent PuLP versions warn about them
In [7]:
total_points = {}
cost = {}
QBs = {}
RBs = {}
WRs = {}
TEs = {}
DST = {}
number_of_players = {}

PuLP does not work directly with pandas and numpy objects, only with native Python objects (hence the empty dictionaries we made above). Now we need to loop through the data frame and create a variable for each player that takes a binary value: 1 = we draft the player, 0 = we do not draft the player.

This is done by looping through the rows of the data frame with the iterrows() method. The name of each variable will be an x followed by the index of the row that player is in (example: x23, x457). Then we create a PuLP variable object with pulp.LpVariable(); we give it the variable name and specify that it is a binary variable. Other variable options are Continuous (default) and Integer.

Then we fill the dictionaries that we created above. The keys of each dictionary are the 569 binary decision variables, one per player. The values are that player's corresponding values for each parameter (average points per game, salary, etc.). The values in the number_of_players dictionary are all 1; this lets us create a constraint that sums over these 1s and makes sure the optimization problem does not select more than 9 players.

In [8]:
# i = row index, player = player attributes
for i, player in players.iterrows():
    var_name = 'x' + str(i) # Create variable name
    decision_var = pulp.LpVariable(var_name, cat='Binary') # Initialize Variables

    total_points[decision_var] = player["AvgPointsPerGame"] # Create PPG Dictionary
    cost[decision_var] = player["Salary"] # Create Cost Dictionary
    
    # Create Dictionary for Player Types
    QBs[decision_var] = player["QB"]
    RBs[decision_var] = player["RB"]
    WRs[decision_var] = player["WR"]
    TEs[decision_var] = player["TE"]
    DST[decision_var] = player["DST"]
    number_of_players[decision_var] = 1.0

Next we will use pulp.LpAffineExpression to create our optimization objective and constraints. The documentation for affine expressions can be found at this link: https://pythonhosted.org/PuLP/pulp.html#pulp.LpConstraint. Essentially, this creates a PuLP object that multiplies each variable (the keys of our dictionary) by a parameter value (the values of our dictionary), and then sums the results:

$$LpAffineExpression(\{x_1:a_1, x_2:a_2,...., x_I:a_I\} ) = \sum_{i}^{} x_ia_i$$
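
To make the formula concrete, here is a plain-Python sketch of what the expression evaluates to once every variable has a value. The variable names and coefficients are made up for illustration, and this mimics the arithmetic only, not PuLP's internals.

```python
# Hypothetical coefficients a_i (for example salaries) keyed by variable name
coefficients = {"x1": 5000.0, "x2": 7200.0, "x3": 3100.0}

# A candidate assignment of the binary variables: draft x1 and x3 only
assignment = {"x1": 1, "x2": 0, "x3": 1}

# The affine expression is the sum of coefficient * variable value
total = sum(a * assignment[name] for name, a in coefficients.items())
print(total)  # 8100.0
```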

This is a powerful tool for defining linear objective functions and linear constraints. The next step is to create these affine expressions and add them to our model, which is done with the Python += operator. We are trying to create the optimization problem represented by:

  • $I$ = {Set of 569 players available to draft}
  • $x_{i}$ - Binary decision variable: 1 = player i is drafted, 0 = player i NOT drafted
  • $PPG_{i}$ - Average points per game of player i
  • $C_{i}$ - Cost to draft player i
  • $QB_{i}$ - Binary parameter: 1 = player i is a QB, 0 = player i is NOT a QB
  • $RB_{i}$ - Binary parameter: 1 = player i is a RB, 0 = player i is NOT a RB
  • $WR_{i}$ - Binary parameter: 1 = player i is a WR, 0 = player i is NOT a WR
  • $TE_{i}$ - Binary parameter: 1 = player i is a TE, 0 = player i is NOT a TE
  • $DST_{i}$ - Binary parameter: 1 = player i is a DST, 0 = player i is NOT a DST
\begin{align}
\text{maximize} \quad & \sum_{i} PPG_i x_i \\
\text{subject to} \quad & \sum_{i} C_i x_i \le 50{,}000 \\
& \sum_{i} QB_i x_i \le 1 \\
& \sum_{i} RB_i x_i \le 3 \\
& \sum_{i} WR_i x_i \le 4 \\
& \sum_{i} TE_i x_i \le 2 \\
& \sum_{i} DST_i x_i \le 1 \\
& \sum_{i} x_i \le 9 \\
& x_i \in \{0,1\}
\end{align}

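A Draft Kings lineup needs exactly 1 QB and 1 DST, at least 2 RBs, 3 WRs, and 1 TE, plus one flex position that can be filled by an RB, WR, or TE. That adds up to 9 players. The model above replaces these requirements with less-than-or-equal-to constraints: at most 3 RBs, 4 WRs, and 2 TEs (each minimum plus the flex), and at most 9 players overall. Because this is a maximization problem, and as long as average points are non-negative, the solver will try to draft as many players as it can within the 50,000 salary cap. Note that these upper bounds do not by themselves force the position minimums, so the relaxation is a simplification rather than an exact reformulation; the solution found below does turn out to be a valid lineup.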
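
The roster rules described above can be written as a small checker, which is handy for confirming that a lineup returned by the relaxed model is actually legal. This is a sketch based on the rules as stated in this post; is_valid_lineup is a hypothetical helper, not part of PuLP or Draft Kings.

```python
from collections import Counter

ALLOWED = {"QB", "RB", "WR", "TE", "DST"}

def is_valid_lineup(positions):
    """Check a list of position labels against the roster rules above."""
    counts = Counter(positions)
    return (
        len(positions) == 9
        and set(positions) <= ALLOWED
        and counts["QB"] == 1
        and counts["DST"] == 1
        and 2 <= counts["RB"] <= 3   # 2 required, plus possibly the flex
        and 3 <= counts["WR"] <= 4   # 3 required, plus possibly the flex
        and 1 <= counts["TE"] <= 2   # 1 required, plus possibly the flex
    )

# 1 QB, 2 RB, 4 WR (one as flex), 1 TE, 1 DST
print(is_valid_lineup(["QB", "RB", "RB", "WR", "WR", "WR", "WR", "TE", "DST"]))  # True

# Satisfies every "at most" constraint but has no QB or DST
print(is_valid_lineup(["RB"] * 3 + ["WR"] * 4 + ["TE"] * 2))  # False
```

The second call shows why upper bounds alone are not a full description of the rules: nine players within every cap can still be an illegal team.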

In [9]:
# Define objective function and add it to the model
objective_function = pulp.LpAffineExpression(total_points)
model += objective_function

# Define cost constraint and add it to the model
total_cost = pulp.LpAffineExpression(cost)
model += (total_cost <= 50000)
In [10]:
# Add player type constraints
QB_constraint = pulp.LpAffineExpression(QBs)
RB_constraint = pulp.LpAffineExpression(RBs)
WR_constraint = pulp.LpAffineExpression(WRs)
TE_constraint = pulp.LpAffineExpression(TEs)
DST_constraint = pulp.LpAffineExpression(DST)
total_players = pulp.LpAffineExpression(number_of_players)

model += (QB_constraint <= 1)
model += (RB_constraint <= 3)
model += (WR_constraint <= 4)
model += (TE_constraint <= 2)
model += (DST_constraint <= 1)
model += (total_players <= 9)
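
If the position minimums need to be guaranteed rather than left to the objective, one option is to add lower bounds next to the upper bounds. The sketch below does this on a small made-up player pool (the names, salaries, and points are invented) using pulp.lpSum and pulp.LpVariable.dicts. It assumes PuLP and its bundled CBC solver are installed, and it is not the model solved in this post.

```python
import pulp

# Hypothetical pool: (name, position, salary, average points)
pool = [
    ("QB1", "QB", 7000, 22.0), ("QB2", "QB", 5500, 18.0),
    ("RB1", "RB", 9000, 25.0), ("RB2", "RB", 6000, 15.0),
    ("RB3", "RB", 4500, 11.0), ("RB4", "RB", 4000, 9.0),
    ("WR1", "WR", 8500, 23.0), ("WR2", "WR", 6000, 16.0),
    ("WR3", "WR", 5000, 13.0), ("WR4", "WR", 4000, 10.0),
    ("WR5", "WR", 3500, 8.0),
    ("TE1", "TE", 5000, 12.0), ("TE2", "TE", 3000, 7.0),
    ("DST1", "DST", 3000, 9.0), ("DST2", "DST", 2500, 7.0),
]

toy_model = pulp.LpProblem("lineup_with_minimums", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", range(len(pool)), cat="Binary")

def count(position):
    """Linear expression for the number of drafted players at a position."""
    return pulp.lpSum(x[i] for i, p in enumerate(pool) if p[1] == position)

toy_model += pulp.lpSum(p[3] * x[i] for i, p in enumerate(pool))           # objective
toy_model += pulp.lpSum(p[2] * x[i] for i, p in enumerate(pool)) <= 50000  # salary cap
toy_model += pulp.lpSum(x[i] for i in range(len(pool))) == 9               # full roster
toy_model += count("QB") == 1
toy_model += count("DST") == 1
for position, low, high in [("RB", 2, 3), ("WR", 3, 4), ("TE", 1, 2)]:
    toy_model += count(position) >= low    # required starters
    toy_model += count(position) <= high   # starters plus the flex

toy_model.solve(pulp.PULP_CBC_CMD(msg=False))
lineup = [pool[i] for i in x if x[i].value() > 0.5]
print(pulp.LpStatus[toy_model.status], [p[0] for p in lineup])
```

The same lower-bound lines could be added to the real model by writing, for example, model += (RB_constraint >= 2) next to the existing upper bounds.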

Solve the optimization problem¶

PuLP comes with its own solver (CBC), and a variety of other solvers, including proprietary ones, can be integrated. The current status of the model is 0 (not solved); once the model is solved to optimality, the status changes to 1. If the linear program (LP) is infeasible, the status is -1. If the LP is unbounded, the status is -2, and for undefined the status is -3. A more detailed explanation can be found in the PuLP documentation: https://pythonhosted.org/PuLP/constants.html#pulp.constants.LpStatus. The cell below runs PuLP's built-in tests, and its output shows which of the solvers compatible with PuLP are available on this machine:

In [11]:
# Run PuLP's built-in tests; the output reports which solvers are available
pulp.pulpTestAll()
	 Testing zero subtraction
	 Testing inconsistant lp solution
	 Testing continuous LP solution
	 Testing maximize continuous LP solution
	 Testing unbounded continuous LP solution
	 Testing Long Names
	 Testing repeated Names
	 Testing zero constraint
	 Testing zero objective
	 Testing LpVariable (not LpAffineExpression) objective
	 Testing Long lines in LP
	 Testing LpAffineExpression divide
	 Testing MIP solution
	 Testing MIP solution with floats in objective
	 Testing MIP relaxation
	 Testing feasibility problem (no objective)
	 Testing an infeasible problem
	 Testing an integer infeasible problem
	 Testing column based modelling
	 Testing dual variables and slacks reporting
	 Testing fractional constraints
	 Testing elastic constraints (no change)
	 Testing elastic constraints (freebound)
	 Testing elastic constraints (penalty unchanged)
	 Testing elastic constraints (penalty unbounded)
* Solver pulp.solvers.PULP_CBC_CMD passed.
Solver pulp.solvers.CPLEX_DLL unavailable
Solver pulp.solvers.CPLEX_CMD unavailable
Solver pulp.solvers.CPLEX_PY unavailable
Solver pulp.solvers.COIN_CMD unavailable
Solver pulp.solvers.COINMP_DLL unavailable
Solver pulp.solvers.GLPK_CMD unavailable
Solver pulp.solvers.XPRESS unavailable
Solver pulp.solvers.GUROBI unavailable
Solver pulp.solvers.GUROBI_CMD unavailable
Solver pulp.solvers.PYGLPK unavailable
Solver pulp.solvers.YAPOSIB unavailable
In [12]:
model.status
Out[12]:
0
In [13]:
model.solve()
Out[13]:
1
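
The integer status codes are easier to read as text. PuLP ships a lookup table for this, pulp.LpStatus; the plain dictionary below mirrors the codes listed above so the idea can be shown without a solved model. The helper name describe_status is hypothetical.

```python
# Status codes as described in the text above (PuLP exposes an
# equivalent mapping as pulp.LpStatus)
STATUS_LABELS = {
    0: "Not Solved",
    1: "Optimal",
    -1: "Infeasible",
    -2: "Unbounded",
    -3: "Undefined",
}

def describe_status(code):
    """Return a readable label for a PuLP status code."""
    return STATUS_LABELS.get(code, "Unknown code: {}".format(code))

print(describe_status(0))  # Not Solved
print(describe_status(1))  # Optimal
```

With a real model, pulp.LpStatus[model.status] gives the same label directly.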

Check the results¶

Currently, there are 569 individual PuLP variables that hold the results of the integer linear program. We want to view the results alongside the original data frame. Here we loop through the PuLP decision variables and add their values to an is_drafted column in the original data frame.

In [14]:
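players["is_drafted"] = 0.0
is_drafted_col = players.columns.get_loc("is_drafted")  # look up the column position instead of hard-coding 11
for var in model.variables():
    # Variable names are 'x' + row index, so strip the 'x' to recover the row
    players.iloc[int(var.name[1:]), is_drafted_col] = var.varValue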
In [15]:
my_team = players[players["is_drafted"] == 1.0]
my_team = my_team[["Name","Position","teamAbbrev","Salary","AvgPointsPerGame"]]
In [16]:
my_team.head(10)
Out[16]:
                        Name  Position  teamAbbrev  Salary  AvgPointsPerGame
  1           Todd Gurley II        RB         LAR  9800.0            27.087
  2            Antonio Brown        WR         PIT  8900.0            23.879
 22           Russell Wilson        QB         SEA  6900.0            23.365
 52              Tyreek Hill        WR          KC  6000.0            17.213
103              Carlos Hyde        RB          SF  4900.0            14.333
163            Will Fuller V        WR         HOU  4300.0            12.500
230          Jermaine Kearse        WR         NYJ  3800.0            11.407
467  Austin Seferian-Jenkins        TE         NYJ  2700.0             7.900
469                   Eagles       DST         PHI  2700.0            10.667
In [17]:
# print() with parentheses runs on both Python 2 and Python 3
print("Total used amount of salary cap: {}".format(my_team["Salary"].sum()))
print("Projected points for week 17: {}".format(my_team["AvgPointsPerGame"].sum().round(1)))
Total used amount of salary cap: 50000.0
Projected points for week 17: 148.4

Results and future analysis¶

The optimal solution used all 50,000 of the available salary, and on average this team would score approximately 148.4 points. This is not really enough to do well in a Draft Kings contest. This strategy may be better suited to long-term fantasy football leagues. Daily fantasy depends heavily on picking the players who severely overperform in a given week, rather than picking a team that will consistently do well on average.

One possible future analysis would be to incorporate the probability of each player doing abnormally well, using some anomaly detection algorithms. Then, instead of picking the players that maximize average past performance, we could pick the players that have the best chance of doing abnormally well in the coming week. Another possible step would be to scrape the data straight from the web instead of downloading it to a local computer and then importing it into pandas.

Resources:¶

Huge shoutout to Ben Alex Keen for this introduction to PuLP that can be found here: http://benalexkeen.com/linear-programming-with-python-and-pulp-part-6/
Also to Anna Nicanorova for this talk on Linear Programming in Python at PyData in 2015 found on YouTube here: https://www.youtube.com/watch?v=7yZ5xxdkTb8
