Draft Kings Player Optimization

A brief introduction to Linear Programming with Python

Kyle Stahl

January 2018

Introduction

This analysis uses the PuLP Python package for linear programming to find the Draft Kings team for Week 17 of the 2017 NFL season that maximizes total average points per game while staying within the salary limit. I downloaded the data for the Week 17 Draft Kings contest to the Downloads folder on my local machine; the file is called DKSalaries.csv. Here we change the working directory to the folder that contains the file.

import os
os.chdir("C:\\Users\\kyles\\Downloads") # windows

Import Required Packages

Note: You will need to install the Python package PuLP for this analysis. This can get somewhat complicated, but here are some links to the documentation and to helpful Stack Overflow answers.
https://pythonhosted.org/PuLP/main/installing_pulp_at_home.html
https://pypi.python.org/pypi/PuLP
https://stackoverflow.com/questions/39299726/cant-find-package-on-anaconda-navigator-what-to-do-next
https://stackoverflow.com/questions/45156080/installing-modules-to-anaconda-from-tar-gz
https://anaconda.org/primer/pulp

import pulp  # the code below uses the qualified form, e.g. pulp.LpProblem
import numpy as np
import pandas as pd

Read in the data

Here we read the data into a pandas data frame. If you are not familiar with pandas data frames, you can check out the official 10-minute introduction here: https://pandas.pydata.org/pandas-docs/stable/10min.html. Draft Kings limits how many players at each position you can have on your team, so we need a way to represent this in our optimization model. We do this by adding a binary indicator column for each position: QB, RB, WR, TE, and DST. A 1 in a column means the player can be drafted at that position, and a 0 means the player cannot. Each player can be drafted at only one position type. The Salary column is also converted from an integer to a float for consistency.

players = pd.read_csv("DKSalaries.csv")

players["RB"] = (players["Position"] == 'RB').astype(float)
players["WR"] = (players["Position"] == 'WR').astype(float)
players["QB"] = (players["Position"] == 'QB').astype(float)
players["TE"] = (players["Position"] == 'TE').astype(float)
players["DST"] = (players["Position"] == 'DST').astype(float)
players["Salary"] = players["Salary"].astype(float)

players.head(10)

Initialize Model and Parameters

Now we initialize the optimization model with pulp.LpProblem. We give the problem a name, specify whether we are maximizing or minimizing, and store it in a Python variable. Then we create an empty dictionary for each of our parameters: points, cost, QB, RB, WR, TE, and DST. One additional dictionary is used to build the constraint on the total number of players, which is described below.

model = pulp.LpProblem("Draft Kings", pulp.LpMaximize)

total_points = {}
cost = {}
QBs = {}
RBs = {}
WRs = {}
TEs = {}
DST = {}
number_of_players = {}

PuLP does not work directly with pandas and numpy objects, only with native Python objects (hence the empty dictionaries we made above). Now we need to loop through the data frame and create an individual variable for each player that takes a binary value: 1 = we draft the player, 0 = we do not draft the player.

This is done by looping through the rows of the data frame with the iterrows() method. The name of each variable will be an x followed by the index of the row that player is in (example: x23, x457). Then we create a PuLP variable object with pulp.LpVariable(); we give it the variable name and specify that it is a binary variable. Other variable options are Continuous (default) and Integer.

Then we fill the dictionaries that we created above. The keys of each dictionary are the 569 binary variables, one per player. The values are each player's corresponding value for that parameter (AvgPointsPerGame, Salary, etc.). The values in the number_of_players dictionary are all 1; this lets us create a constraint that sums over these 1s and makes sure the optimization problem does not select more than 9 players.

# i = row index, player = player attributes
for i, player in players.iterrows():
    var_name = 'x' + str(i) # Create variable name
    decision_var = pulp.LpVariable(var_name, cat='Binary') # Initialize Variables

    total_points[decision_var] = player["AvgPointsPerGame"] # Create PPG Dictionary
    cost[decision_var] = player["Salary"] # Create Cost Dictionary
    
    # Create Dictionary for Player Types
    QBs[decision_var] = player["QB"]
    RBs[decision_var] = player["RB"]
    WRs[decision_var] = player["WR"]
    TEs[decision_var] = player["TE"]
    DST[decision_var] = player["DST"]
    number_of_players[decision_var] = 1.0

Next we will use LpAffineExpression objects to create our optimization objective and constraints. The documentation can be found at this link: https://pythonhosted.org/PuLP/pulp.html#pulp.LpConstraint. Essentially, this creates a PuLP object that multiplies each variable (the keys of our dictionary) by a parameter value (the values of our dictionary) and then takes the sum of all the results:

$$LpAffineExpression(\{x_1:a_1, x_2:a_2,...., x_I:a_I\} ) = \sum_{i}^{} x_ia_i$$
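As a minimal sketch of the formula above, the snippet below builds an affine expression from a three-entry dictionary. The variables x1 to x3 and their point values are made up for illustration and are not taken from the Draft Kings data; the variable values are set by hand only to show how the weighted sum is evaluated.

```python
import pulp

# Three hypothetical players with made-up point values (illustration only).
x1 = pulp.LpVariable("x1", cat="Binary")
x2 = pulp.LpVariable("x2", cat="Binary")
x3 = pulp.LpVariable("x3", cat="Binary")

# Keys are the decision variables, values are their coefficients.
toy_points = {x1: 20.0, x2: 15.5, x3: 8.0}
expr = pulp.LpAffineExpression(toy_points)
print(expr)  # displays the weighted sum of x1, x2 and x3

# Pretend a solver drafted players 1 and 3, then evaluate the sum.
x1.varValue, x2.varValue, x3.varValue = 1, 0, 1
toy_total = pulp.value(expr)  # 20.0*1 + 15.5*0 + 8.0*1
print(toy_total)
```

In the real model the solver assigns the variable values; here they are assigned manually so the arithmetic is easy to follow.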


This is a powerful tool for defining linear objective functions and linear constraints. The next step is to create these affine expressions and add them to our model, which is done with the Python += operator. We are trying to create the optimization problem represented by:

$I$ = {Set of 569 players available to draft}
$x_{i}$ - Binary decision variable: 1 = player i is drafted, 0 = player i NOT drafted
$PPG_{i}$ - Average points per game of player i
$C_{i}$ - Cost to draft player i
$QB_{i}$ - Binary parameter: 1 = player i is a QB, 0 = player i is NOT a QB
$RB_{i}$ - Binary parameter: 1 = player i is a RB, 0 = player i is NOT a RB
$WR_{i}$ - Binary parameter: 1 = player i is a WR, 0 = player i is NOT a WR
$TE_{i}$ - Binary parameter: 1 = player i is a TE, 0 = player i is NOT a TE
$DST_{i}$ - Binary parameter: 1 = player i is a DST, 0 = player i is NOT a DST

\begin{align}
\text{maximize} \quad & \sum_{i \in I} PPG_i x_i \\
\text{subject to} \quad & \sum_{i \in I} C_i x_i \le 50{,}000 \\
& \sum_{i \in I} QB_i x_i \le 1 \\
& \sum_{i \in I} RB_i x_i \le 3 \\
& \sum_{i \in I} WR_i x_i \le 4 \\
& \sum_{i \in I} TE_i x_i \le 2 \\
& \sum_{i \in I} DST_i x_i \le 1 \\
& \sum_{i \in I} x_i \le 9 \\
& x_i \in \{0,1\} \quad \forall i \in I
\end{align}

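A valid team needs exactly 1 QB and 1 DST, at least 2 RBs, 3 WRs, and 1 TE, plus one flex position that can be filled by an RB, WR, or TE. This adds up to 9 players. In the model, these requirements are relaxed to upper bounds: at most 1 QB, 3 RBs, 4 WRs, 2 TEs, 1 DST, and 9 players in total. Because this is a maximization problem, and as long as every player has a positive point average, the optimization algorithm will try to choose as many players as it can within the 50,000 cost constraint. Note that the upper bounds alone do not guarantee the position minimums (for example, 3 RBs, 4 WRs, and 2 TEs with no QB would satisfy all of them), so the resulting team should be checked against the roster rules.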
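For reference, the roster rules described above can also be written without the relaxation. This is only a sketch of the exact formulation, using the symbols already defined; it is not the model that is solved in this analysis:

\begin{align}
\sum_{i \in I} QB_i x_i &= 1 & \sum_{i \in I} DST_i x_i &= 1 \\
2 \le \sum_{i \in I} RB_i x_i &\le 3 & 3 \le \sum_{i \in I} WR_i x_i &\le 4 \\
1 \le \sum_{i \in I} TE_i x_i &\le 2 & \sum_{i \in I} x_i &= 9
\end{align}

The position minimums add up to 8 players (1 + 1 + 2 + 3 + 1), so requiring 9 players in total leaves exactly one flex spot for an extra RB, WR, or TE.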

# Define objective function and add it to the model
objective_function = pulp.LpAffineExpression(total_points)
model += objective_function

# Define cost constraint and add it to the model
total_cost = pulp.LpAffineExpression(cost)
model += (total_cost <= 50000)

# Add player type constraints
QB_constraint = pulp.LpAffineExpression(QBs)
RB_constraint = pulp.LpAffineExpression(RBs)
WR_constraint = pulp.LpAffineExpression(WRs)
TE_constraint = pulp.LpAffineExpression(TEs)
DST_constraint = pulp.LpAffineExpression(DST)
total_players = pulp.LpAffineExpression(number_of_players)

model += (QB_constraint <= 1)
model += (RB_constraint <= 3)
model += (WR_constraint <= 4)
model += (TE_constraint <= 2)
model += (DST_constraint <= 1)
model += (total_players <= 9)
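To illustrate how equality and lower-bound constraints would look in PuLP, here is a small self-contained sketch that enforces the exact roster rules. The player pool is entirely hypothetical (names, salaries, and points are invented), and pulp.lpSum is used as a shorthand for building the sums; treat it as one possible variation rather than the model used in this analysis.

```python
import pulp

# Hypothetical player pool: name -> (position, salary, average points).
pool = {
    "QB_A": ("QB", 6000, 20.0), "QB_B": ("QB", 5000, 17.0),
    "RB_A": ("RB", 8000, 22.0), "RB_B": ("RB", 6000, 16.0),
    "RB_C": ("RB", 4500, 12.0), "RB_D": ("RB", 4000, 9.0),
    "WR_A": ("WR", 8500, 21.0), "WR_B": ("WR", 6500, 17.0),
    "WR_C": ("WR", 5000, 13.0), "WR_D": ("WR", 4000, 10.0),
    "WR_E": ("WR", 3500, 8.0),
    "TE_A": ("TE", 5000, 12.0), "TE_B": ("TE", 3000, 7.0),
    "DST_A": ("DST", 3000, 9.0), "DST_B": ("DST", 2500, 7.0),
}

toy_model = pulp.LpProblem("toy_lineup", pulp.LpMaximize)
draft = {name: pulp.LpVariable("draft_" + name, cat="Binary") for name in pool}

def position_count(position):
    """Linear expression counting the drafted players at one position."""
    return pulp.lpSum(draft[n] for n, (pos, _, _) in pool.items() if pos == position)

# Objective: total average points of the drafted players.
toy_model += pulp.lpSum(pool[n][2] * draft[n] for n in pool)
# Salary cap.
toy_model += pulp.lpSum(pool[n][1] * draft[n] for n in pool) <= 50000
# Exact roster rules: equalities for QB and DST, ranges for RB, WR and TE.
toy_model += position_count("QB") == 1
toy_model += position_count("DST") == 1
toy_model += position_count("RB") >= 2
toy_model += position_count("RB") <= 3
toy_model += position_count("WR") >= 3
toy_model += position_count("WR") <= 4
toy_model += position_count("TE") >= 1
toy_model += position_count("TE") <= 2
toy_model += pulp.lpSum(draft.values()) == 9

toy_model.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = [n for n in pool if draft[n].varValue > 0.5]
print(pulp.LpStatus[toy_model.status], sorted(chosen))
```

With these constraints the solver cannot drop the QB or DST in favor of an extra skill player, which the upper bounds alone would allow.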

Solve the optimization problem

PuLP comes with its own solver (CBC, listed as PULP_CBC_CMD in the output below), and a variety of proprietary solvers can also be integrated. The status of the model is currently 0 (not solved); once the model is solved to optimality, the status changes to 1. If the linear program (LP) is infeasible, the status is -1; if the LP is unbounded, the status is -2; and if it is undefined, the status is -3. A more detailed explanation can be found in the PuLP documentation: https://pythonhosted.org/PuLP/constants.html#pulp.constants.LpStatus. The call below runs PuLP's tests against each solver it supports and reports which of them are available on this machine:

# All of the possible PuLP solvers
pulp.pulpTestAll()

	 Testing zero subtraction
	 Testing inconsistant lp solution
	 Testing continuous LP solution
	 Testing maximize continuous LP solution
	 Testing unbounded continuous LP solution
	 Testing Long Names
	 Testing repeated Names
	 Testing zero constraint
	 Testing zero objective
	 Testing LpVariable (not LpAffineExpression) objective
	 Testing Long lines in LP
	 Testing LpAffineExpression divide
	 Testing MIP solution
	 Testing MIP solution with floats in objective
	 Testing MIP relaxation
	 Testing feasibility problem (no objective)
	 Testing an infeasible problem
	 Testing an integer infeasible problem
	 Testing column based modelling
	 Testing dual variables and slacks reporting
	 Testing fractional constraints
	 Testing elastic constraints (no change)
	 Testing elastic constraints (freebound)
	 Testing elastic constraints (penalty unchanged)
	 Testing elastic constraints (penalty unbounded)
* Solver pulp.solvers.PULP_CBC_CMD passed.
Solver pulp.solvers.CPLEX_DLL unavailable
Solver pulp.solvers.CPLEX_CMD unavailable
Solver pulp.solvers.CPLEX_PY unavailable
Solver pulp.solvers.COIN_CMD unavailable
Solver pulp.solvers.COINMP_DLL unavailable
Solver pulp.solvers.GLPK_CMD unavailable
Solver pulp.solvers.XPRESS unavailable
Solver pulp.solvers.GUROBI unavailable
Solver pulp.solvers.GUROBI_CMD unavailable
Solver pulp.solvers.PYGLPK unavailable
Solver pulp.solvers.YAPOSIB unavailable

model.status

0

model.solve()

1
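The status codes can be turned into readable names with the pulp.LpStatus dictionary. The following is a minimal sketch on a separate, deliberately tiny problem (the variables a and b are hypothetical), showing the status before and after solving:

```python
import pulp

# A deliberately tiny problem: draft at most one of two hypothetical players.
a = pulp.LpVariable("a", cat="Binary")
b = pulp.LpVariable("b", cat="Binary")
tiny = pulp.LpProblem("tiny_example", pulp.LpMaximize)
tiny += 3 * a + 2 * b   # objective: points of the drafted player
tiny += a + b <= 1      # constraint: at most one player

status_before = tiny.status
tiny.solve(pulp.PULP_CBC_CMD(msg=False))  # bundled CBC solver, output silenced
status_after = tiny.status

# pulp.LpStatus maps the integer codes to readable names.
print(status_before, pulp.LpStatus[status_before])
print(status_after, pulp.LpStatus[status_after])
```

Checking the status name before reading the variable values is a cheap way to catch an infeasible or unsolved model.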

Check the results

Currently, the results of the integer linear program are held in 569 individual PuLP variables. We want to view the results alongside the original data frame, so we loop through the PuLP decision variables and write their values to an is_drafted column in the original data frame.

players["is_drafted"] = 0.0
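for var in model.variables():
    # Variable names are "x" + row index, so drop the "x" to recover the row.
    # This relies on the default integer index created by read_csv.
    row = int(var.name[1:])
    # Set is_drafted to the value determined by the LP
    players.loc[row, "is_drafted"] = var.varValue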

my_team = players[players["is_drafted"] == 1.0]
my_team = my_team[["Name","Position","teamAbbrev","Salary","AvgPointsPerGame"]]

my_team.head(10)

print("Total used amount of salary cap: {}".format(my_team["Salary"].sum()))
print("Projected points for week 17: {}".format(my_team["AvgPointsPerGame"].sum().round(1)))

Total used amount of salary cap: 50000.0
Projected points for week 17: 148.4
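Because the model only uses upper bounds on each position, it is worth confirming that the drafted team satisfies the full roster rules. The sketch below does this in plain Python for the team in the results table; the check_lineup helper and its rule table are written for this illustration and are not part of PuLP or pandas.

```python
from collections import Counter

# Team from the results table: (name, position, salary).
lineup = [
    ("Todd Gurley II", "RB", 9800.0),
    ("Antonio Brown", "WR", 8900.0),
    ("Russell Wilson", "QB", 6900.0),
    ("Tyreek Hill", "WR", 6000.0),
    ("Carlos Hyde", "RB", 4900.0),
    ("Will Fuller V", "WR", 4300.0),
    ("Jermaine Kearse", "WR", 3800.0),
    ("Austin Seferian-Jenkins", "TE", 2700.0),
    ("Eagles", "DST", 2700.0),
]

SALARY_CAP = 50000
ROSTER_SIZE = 9
# (minimum, maximum) players per position; the maximum includes the flex spot.
ROSTER_RULES = {"QB": (1, 1), "RB": (2, 3), "WR": (3, 4), "TE": (1, 2), "DST": (1, 1)}

def check_lineup(team):
    """Return a list of rule violations; an empty list means the team is valid."""
    problems = []
    counts = Counter(position for _, position, _ in team)
    for position, (low, high) in ROSTER_RULES.items():
        if not low <= counts[position] <= high:
            problems.append("{}: {} drafted, allowed {} to {}".format(
                position, counts[position], low, high))
    if len(team) != ROSTER_SIZE:
        problems.append("team has {} players, needs {}".format(len(team), ROSTER_SIZE))
    total_salary = sum(salary for _, _, salary in team)
    if total_salary > SALARY_CAP:
        problems.append("salary {} exceeds cap {}".format(total_salary, SALARY_CAP))
    return problems

print(check_lineup(lineup))
```

For this team the check returns no violations: 1 QB, 2 RBs, 4 WRs (one in the flex spot), 1 TE, and 1 DST, with a total salary of exactly 50,000.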

Results and future analysis

The optimal solution used all 50,000 of the available salary, and on average this team would score approximately 148.4 points. This is not really enough to do well in a Draft Kings contest. This strategy may be better suited to long-term fantasy football leagues. Daily fantasy depends heavily on picking the players who severely overperform in a given week, rather than picking a team that will consistently do well on average.

One possible future analysis would be to incorporate the probability of each player doing abnormally well, estimated with some anomaly detection algorithms. Then, instead of picking the players that maximize average past performance, we could pick the players that have the best chance of doing abnormally well in the coming week. Another possible step would be to scrape the data straight from the web instead of downloading it to a local computer and then importing it into pandas.

Resources:

Huge shoutout to Ben Alex Keen for this introduction to PuLP that can be found here: http://benalexkeen.com/linear-programming-with-python-and-pulp-part-6/
Also to Anna Nicanorova for this talk on Linear Programming in Python at PyData in 2015 found on YouTube here: https://www.youtube.com/watch?v=7yZ5xxdkTb8

Drafted team (output of my_team.head(10)):

                        Name  Position  teamAbbrev   Salary  AvgPointsPerGame
1             Todd Gurley II        RB         LAR   9800.0            27.087
2              Antonio Brown        WR         PIT   8900.0            23.879
22            Russell Wilson        QB         SEA   6900.0            23.365
52               Tyreek Hill        WR          KC   6000.0            17.213
103              Carlos Hyde        RB          SF   4900.0            14.333
163            Will Fuller V        WR         HOU   4300.0            12.500
230          Jermaine Kearse        WR         NYJ   3800.0            11.407
467  Austin Seferian-Jenkins        TE         NYJ   2700.0             7.900
469                   Eagles       DST         PHI   2700.0            10.667
