Introduction
This analysis uses the PuLP Python package for linear programming to find the DraftKings lineup for Week 17 of the 2017 NFL season that maximizes total average points per game while staying within the salary cap. I downloaded the data for the Week 17 DraftKings contest to the Downloads folder on my local machine; the file is called DKSalaries.csv. Here we change the working directory to the folder that contains the file.
import os
os.chdir("C:\\Users\\kyles\\Downloads") # windows
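If you would rather not change the working directory, one alternative is to build the full path to the file and pass it to pandas directly. This is only a sketch: the `downloads_dir` location is an assumption and should be adjusted to wherever the file was saved.

```python
from pathlib import Path

# Hypothetical location: adjust to wherever DKSalaries.csv was saved.
downloads_dir = Path.home() / "Downloads"
salary_file = downloads_dir / "DKSalaries.csv"

# This path can later be passed straight to pandas.read_csv(salary_file).
print(salary_file.name)
```

Because `pathlib` joins path segments for the current operating system, the same lines work on Windows, macOS, and Linux.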
Import Required Packages
Note: You will need to install the Python package PuLP for this analysis. This can get somewhat complicated, but here are some links to the documentation and to helpful Stack Overflow answers.
https://pythonhosted.org/PuLP/main/installing_pulp_at_home.html
https://pypi.python.org/pypi/PuLP
https://stackoverflow.com/questions/39299726/cant-find-package-on-anaconda-navigator-what-to-do-next
https://stackoverflow.com/questions/45156080/installing-modules-to-anaconda-from-tar-gz
https://anaconda.org/primer/pulp
import pulp  # the code below refers to names as pulp.<name>
import numpy as np
import pandas as pd
Read in the data
Here we read the data into a pandas data frame. If you are not familiar with pandas data frames, you can check out the official 10-minute introduction here: https://pandas.pydata.org/pandas-docs/stable/10min.html. DraftKings limits how many players you can have at each position, so we need a way to represent position in our optimization model. We do this by adding one binary column for each position: QB, RB, WR, TE, and DST. A 1 in a column means the player can be drafted at that position, and a 0 means the player cannot. A player can only be drafted for one position type. The Salary variable is also converted from an integer to a float for consistency.
players = pd.read_csv("DKSalaries.csv")
players["RB"] = (players["Position"] == 'RB').astype(float)
players["WR"] = (players["Position"] == 'WR').astype(float)
players["QB"] = (players["Position"] == 'QB').astype(float)
players["TE"] = (players["Position"] == 'TE').astype(float)
players["DST"] = (players["Position"] == 'DST').astype(float)
players["Salary"] = players["Salary"].astype(float)
players.head(10)
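To see what the indicator columns look like, here is a minimal sketch on a made-up three-player table. The names and salaries are invented for illustration; only the column names `Position` and `Salary` come from the code above.

```python
import pandas as pd

# Toy data standing in for DKSalaries.csv (names and numbers are made up).
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Position": ["QB", "RB", "WR"],
    "Salary": [7000, 6500, 5800],
})

positions = ["QB", "RB", "WR", "TE", "DST"]
for pos in positions:
    # 1.0 if the player plays this position, 0.0 otherwise
    toy[pos] = (toy["Position"] == pos).astype(float)

# Each player has exactly one indicator set to 1.0.
row_sums = toy[positions].sum(axis=1)
print(toy)
```

The loop produces the same columns as the five explicit assignments above, just written more compactly.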
Initialize Model and parameters
Now we initialize the optimization model with pulp.LpProblem for a linear programming problem. We give the problem a name, specify whether we are maximizing or minimizing, and store it in a Python variable. Then we create a dictionary for each of our parameters: points, cost, QB, RB, WR, TE, and DST. One additional dictionary supports the constraint on the total number of players, which is described below.
model = pulp.LpProblem("Draft_Kings", pulp.LpMaximize)  # PuLP warns about spaces in problem names
total_points = {}
cost = {}
QBs = {}
RBs = {}
WRs = {}
TEs = {}
DST = {}
number_of_players = {}
PuLP does not work directly with pandas and NumPy objects, only with native Python objects (hence the empty dictionaries we made above). Now we need to loop through the data frame and create an individual variable for each player that takes a binary value: 1 = we draft the player, 0 = we do not draft the player.
This is done by looping through the rows of the data frame with the iterrows() method. The name of each variable is an x followed by the index of the row that player is in (for example, x23 or x457). Then we create a PuLP variable object with pulp.LpVariable(); we give it the variable name and specify that it is a binary variable. The other variable categories are Continuous (the default) and Integer.
Then we fill the dictionaries that we created above. The keys of each dictionary are the 569 binary variables, one for each player. The values are each player's corresponding value for that parameter (Avg PPG, Salary, etc.). The values of the number_of_players dictionary are all 1; this lets us create a constraint that sums over these 1s and ensures the optimization problem does not select more than 9 players.
# i = row index, player = player attributes
for i, player in players.iterrows():
    var_name = 'x' + str(i) # Create variable name
    decision_var = pulp.LpVariable(var_name, cat='Binary') # Initialize Variables
    total_points[decision_var] = player["AvgPointsPerGame"] # Create PPG Dictionary
    cost[decision_var] = player["Salary"] # Create Cost Dictionary
    # Create Dictionary for Player Types
    QBs[decision_var] = player["QB"]
    RBs[decision_var] = player["RB"]
    WRs[decision_var] = player["WR"]
    TEs[decision_var] = player["TE"]
    DST[decision_var] = player["DST"]
    number_of_players[decision_var] = 1.0
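The same pattern can be tried on a tiny, made-up player list without loading the CSV. This sketch assumes only that PuLP is installed; the `toy_` names and the numbers are invented for illustration.

```python
import pulp

# Toy stand-in for the players data frame (values are made up).
toy_players = [
    {"AvgPointsPerGame": 20.5, "Salary": 7000.0},
    {"AvgPointsPerGame": 12.0, "Salary": 4500.0},
]

toy_points = {}
toy_cost = {}
for i, player in enumerate(toy_players):
    # One binary decision variable per player, named x0, x1, ...
    var = pulp.LpVariable("x" + str(i), cat="Binary")
    toy_points[var] = player["AvgPointsPerGame"]
    toy_cost[var] = player["Salary"]

toy_vars = list(toy_points)
print([v.name for v in toy_vars])
# A binary variable is stored with bounds 0 and 1.
print(toy_vars[0].lowBound, toy_vars[0].upBound)
```

Each variable object serves as a dictionary key, which is what allows the dictionaries to be turned into linear expressions later.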
Next we will use pulp.LpAffineExpression objects to create our optimization objective and constraints. The documentation for affine expressions can be found at this link: https://pythonhosted.org/PuLP/pulp.html#pulp.LpConstraint. Essentially, this creates a PuLP object that multiplies each variable (the keys of our dictionary) by a parameter value (the values of our dictionary) and then takes the sum of all the results. For a dictionary that maps each variable $x_{i}$ to a coefficient $a_{i}$, the expression is $\sum_{i} a_{i} x_{i}$.
This is a powerful tool for defining linear objective functions and linear constraints. The next step is to create these affine expressions and add them to our defined model. This is done with the Python += operator. The optimization problem we are trying to create uses the following notation:
- $I$ = {Set of 569 players available to draft}
- $x_{i}$ - Binary decision variable: 1 = player i is drafted, 0 = player i NOT drafted
- $PPG_{i}$ - Average points per game of player i
- $C_{i}$ - Cost to draft player i
- $QB_{i}$ - Binary parameter: 1 = player i is a QB, 0 = player i is NOT a QB
- $RB_{i}$ - Binary parameter: 1 = player i is a RB, 0 = player i is NOT a RB
- $WR_{i}$ - Binary parameter: 1 = player i is a WR, 0 = player i is NOT a WR
- $TE_{i}$ - Binary parameter: 1 = player i is a TE, 0 = player i is NOT a TE
- $DST_{i}$ - Binary parameter: 1 = player i is a DST, 0 = player i is NOT a DST
Under the roster rules, a team needs exactly 1 QB and 1 DST, at least 2 RBs, 3 WRs, and 1 TE, plus a flex position that can be filled by any of those three (RB, WR, or TE). This adds up to 9 players. The code below expresses these rules as upper bounds: at most 1 QB, 3 RBs, 4 WRs, 2 TEs, 1 DST, and 9 players in total. The reasoning is that, in a maximization problem, the solver will try to choose as many players as it can within the 50,000 salary cap. Upper bounds alone do not strictly enforce the minimums (3 RBs, 4 WRs, and 2 TEs already make 9 players), so check that the resulting roster fills every position.
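Written out with the notation above, the model that this post's code builds can be sketched as follows. This is reconstructed from the code, so the right-hand sides are the upper bounds used there rather than an official statement of the contest rules.

$$
\begin{aligned}
\max_{x} \quad & \sum_{i \in I} PPG_{i}\, x_{i} \\
\text{subject to} \quad & \sum_{i \in I} C_{i}\, x_{i} \le 50000 \\
& \sum_{i \in I} QB_{i}\, x_{i} \le 1 \\
& \sum_{i \in I} RB_{i}\, x_{i} \le 3 \\
& \sum_{i \in I} WR_{i}\, x_{i} \le 4 \\
& \sum_{i \in I} TE_{i}\, x_{i} \le 2 \\
& \sum_{i \in I} DST_{i}\, x_{i} \le 1 \\
& \sum_{i \in I} x_{i} \le 9 \\
& x_{i} \in \{0, 1\} \quad \text{for all } i \in I
\end{aligned}
$$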
# Define objective function and add it to the model
objective_function = pulp.LpAffineExpression(total_points)
model += objective_function
#Define cost constraint and add it to the model
total_cost = pulp.LpAffineExpression(cost)
model += (total_cost <= 50000)
# Add player type constraints
QB_constraint = pulp.LpAffineExpression(QBs)
RB_constraint = pulp.LpAffineExpression(RBs)
WR_constraint = pulp.LpAffineExpression(WRs)
TE_constraint = pulp.LpAffineExpression(TEs)
DST_constraint = pulp.LpAffineExpression(DST)
total_players = pulp.LpAffineExpression(number_of_players)
model += (QB_constraint <= 1)
model += (RB_constraint <= 3)
model += (WR_constraint <= 4)
model += (TE_constraint <= 2)
model += (DST_constraint <= 1)
model += (total_players <= 9)
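To illustrate how an explicit lower bound changes the result, here is a small sketch on a made-up five-player pool with a hypothetical mini contest (3 roster spots, 13,000 cap). It is not the model used in this post: it builds the expressions with `pulp.lpSum` instead of dictionaries and assumes the CBC solver bundled with PuLP is available.

```python
import pulp

# Made-up toy pool: (name, position, salary, average points)
pool = [
    ("QB1", "QB", 6000, 18.0),
    ("RB1", "RB", 5000, 20.0),
    ("RB2", "RB", 4000, 16.0),
    ("WR1", "WR", 5000, 19.0),
    ("WR2", "WR", 3000, 10.0),
]

def solve_toy(require_qb):
    """Solve a hypothetical 3-player mini contest with a 13,000 cap."""
    model = pulp.LpProblem("toy_lineup", pulp.LpMaximize)
    x = {name: pulp.LpVariable("x_" + name, cat="Binary")
         for name, _, _, _ in pool}
    # Objective: total average points of the drafted players
    model += pulp.lpSum(pts * x[name] for name, _, _, pts in pool)
    # Salary cap and roster size (upper bounds only)
    model += pulp.lpSum(sal * x[name] for name, _, sal, _ in pool) <= 13000
    model += pulp.lpSum(x.values()) <= 3
    qb_count = pulp.lpSum(x[name] for name, pos, _, _ in pool if pos == "QB")
    model += qb_count <= 1
    if require_qb:
        # Explicit lower bound: the lineup must contain a quarterback
        model += qb_count >= 1
    model.solve(pulp.PULP_CBC_CMD(msg=False))
    return sorted(name for name in x if x[name].varValue > 0.5)

print(solve_toy(require_qb=False))
print(solve_toy(require_qb=True))
```

On this toy pool, the version with only upper bounds leaves the quarterback out because three other players score more within the cap, while the version with the lower bound includes QB1. Adding `>=` constraints in the same way is one option if the full model ever returns a roster with an empty position.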
Solve the optimization problem
PuLP comes with its own solver, and a variety of proprietary solvers can also be integrated. The current status of the model is 0 (not solved); once the model is solved to optimality, the status changes to 1. If the linear program (LP) is infeasible, the status changes to -1. If the LP is unbounded, the status is -2, and if the result is undefined, the status is -3. A more detailed explanation can be found in the PuLP documentation: https://pythonhosted.org/PuLP/constants.html#pulp.constants.LpStatus. The cell below reports the solvers that PuLP can use:
# All of the possible PuLP solvers
pulp.pulpTestAll()
model.status
model.solve()
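The numeric status codes can be turned into readable labels with the `pulp.LpStatus` dictionary. A minimal sketch that prints the whole mapping:

```python
import pulp

# Map each numeric status code to its human-readable label.
for code in sorted(pulp.LpStatus):
    print(code, pulp.LpStatus[code])
```

After solving, `pulp.LpStatus[model.status]` gives the label for the model above; a value of "Optimal" corresponds to status 1.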
Check the results
Currently, there are 569 individual PuLP variables that hold the results of the integer linear program. We want to view the results alongside the original data frame. Here we loop through the PuLP decision variables and add their values to an is_drafted column in the original data frame.
players["is_drafted"] = 0.0
for var in model.variables():
    # Set is_drafted to the value determined by the LP.
    # The variable name is 'x' + row index, so strip the 'x' to recover the row label.
    players.loc[int(var.name[1:]), "is_drafted"] = var.varValue
my_team = players[players["is_drafted"] == 1.0]
my_team = my_team[["Name","Position","teamAbbrev","Salary","AvgPointsPerGame"]]
my_team.head(10)
print("Total used amount of salary cap: {}".format(my_team["Salary"].sum()))
print("Projected points for week 17: {}".format(my_team["AvgPointsPerGame"].sum().round(1)))
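Because the model only uses upper bounds on positions, it can be worth checking the selected roster against the rules described earlier. The helper below is a hypothetical sketch in plain Python; the minimum and maximum counts are taken from the rules as stated in this post, not from an official source.

```python
from collections import Counter

# Roster rules as described in this post (assumed, not official):
# exactly 1 QB and 1 DST, at least 2 RB, 3 WR, 1 TE, plus one flex.
MIN_COUNTS = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "DST": 1}
MAX_COUNTS = {"QB": 1, "RB": 3, "WR": 4, "TE": 2, "DST": 1}

def is_valid_lineup(positions, salaries, cap=50000, roster_size=9):
    """Return True if the lineup meets the size, cap, and position rules."""
    counts = Counter(positions)
    if len(positions) != roster_size or sum(salaries) > cap:
        return False
    return all(MIN_COUNTS[p] <= counts[p] <= MAX_COUNTS[p] for p in MIN_COUNTS)

# Two made-up lineups: one complete, one with no QB or DST.
complete = ["QB", "RB", "RB", "RB", "WR", "WR", "WR", "TE", "DST"]
missing = ["RB"] * 3 + ["WR"] * 4 + ["TE"] * 2
print(is_valid_lineup(complete, [5000] * 9))
print(is_valid_lineup(missing, [5000] * 9))
```

For the team found above, the call would be `is_valid_lineup(my_team["Position"].tolist(), my_team["Salary"].tolist())`.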
Results and future analysis
The optimal solution used up all 50,000 of the available salary, and on average this team would score approximately 148.4 points. This is not really enough to do well in a DraftKings contest. This strategy may be better suited to long-term fantasy football leagues. Daily fantasy depends heavily on picking the players who severely overperform in a given week, rather than picking a team that will consistently do well on average.
One possible future analysis would be to incorporate the probability that each player does abnormally well, using some anomaly detection algorithms. Then, instead of picking the players that maximize average past performance, we could pick the players with the best chance of doing abnormally well in the coming week. Another possible step would be to scrape the data straight from the web instead of downloading it to a local computer and then importing it into pandas.
Resources:
Huge shoutout to Ben Alex Keen for this introduction to PuLP that can be found here: http://benalexkeen.com/linear-programming-with-python-and-pulp-part-6/
Also to Anna Nicanorova for this talk on Linear Programming in Python at PyData in 2015 found on YouTube here: https://www.youtube.com/watch?v=7yZ5xxdkTb8