Mahdi Yusuf recorded a 20-minute screencast. I managed to find the data he uses by its file name, "Kevin Durant 2012-13 Game Log". In the screencast he analyzes the performance of the player Kevin Durant...
I had to edit the CSV file beforehand. The notebook then covers creating a new table, replacing the string "40:00" with a number of seconds (2400), grouping... The post ends with a chart built with the vincent module (Vega).
Importing vincent caused some trouble, because the command "from IPython.display import display, HTML, Javascript" was not documented. I found the solution thanks to http://nbviewer.ipython.org/gist/anonymous/5436794
In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('BM7j6YGOv7U')
Out[1]:
Introduction to Pandas DataFrames¶
In [60]:
# Do not forget this import; display() is needed later to show the vincent chart
from IPython.display import display, HTML, Javascript
# %pylab inline
In [3]:
import pandas as pd
In [7]:
pd.set_option('display.max_columns',None)
In [8]:
# pandas is already imported above as pd; this only adds the short names
from pandas import DataFrame, Series
In [61]:
# https://pypi.python.org/pypi/vincent
import vincent
vincent.core.initialize_notebook()
Importing csv data¶
As you can see, the CSV file has two defects. First, two items are missing in the header row (... 'Tm','','Opp','','GS',...). Second, and this is the bigger problem, the header row is repeated after every 20 rows. The video does not mention either of them...
I removed the repeated header rows and inserted the two missing header labels, 'H/A' and 'W/L'.
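I did this cleanup by hand, but the same two fixes can probably be done in pandas. The following is only a minimal sketch under stated assumptions: the `raw` text is an invented miniature of the export (made-up rows, fewer columns), and the filter assumes that a repeated header row can be recognized by the text 'Rk' in its first cell.

```python
import io
import pandas as pd

# Invented miniature of the raw export (NOT the real game log):
# two unnamed header cells and a header row repeated inside the data.
raw = """Rk,G,Date,Tm,,Opp,,MP,FG,FGA
1,1,2012-11-01,OKC,@,SAS,L,40:00,9,18
2,2,2012-11-02,OKC,,POR,W,38:27,7,15
Rk,G,Date,Tm,,Opp,,MP,FG,FGA
3,3,2012-11-04,OKC,,ATL,L,42:10,10,21
"""

# Full header, with 'H/A' and 'W/L' filling the two empty cells.
columns = ['Rk', 'G', 'Date', 'Tm', 'H/A', 'Opp', 'W/L', 'MP', 'FG', 'FGA']

# header=None treats every line as data; names= supplies our own header.
df = pd.read_csv(io.StringIO(raw), header=None, names=columns)

# Drop every line that is a copy of the header row.
df = df[df['Rk'] != 'Rk'].reset_index(drop=True)

# The header copies forced these columns to be read as strings.
for col in ['Rk', 'G', 'FG', 'FGA']:
    df[col] = pd.to_numeric(df[col])

print(df)
```

With the real file, `io.StringIO(raw)` would be replaced by the path to the untouched CSV and `columns` by the full list from the next cell.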
In [11]:
columns=['Rk','G','Date','Age','Tm','H/A','Opp','W/L','GS','MP','FG','FGA','FG%','3P','3PA','3P%','FT','FTA','FT%','ORB','DRB','TRB','AST','STL','BLK','TOV','PF','PTS','GmSc','+/-']
In [37]:
data=pd.read_csv('../data/KevinDuran2012-13GameLog.csv',names=columns)
data.head()
Out[37]:
Here you can look up what each column label means (see the tip attributes in this HTML table header):
In []:
...<tr class="">
<th data-stat="ranker" align="right" class="ranker sort_default_asc show_partial_when_sorting" tip="Rank">Rk</th>
<th data-stat="game_season" align="right" class="tooltip" tip="Season Game">G</th>
<th data-stat="date_game" align="left" class="tooltip sort_default_asc">Date</th>
<th data-stat="age" align="center" class="tooltip sort_default_asc" tip="Age of Player at the start of February 1st of that season.">Age</th>
<th data-stat="team_id" align="left" class="tooltip sort_default_asc" tip="Team">Tm</th>
<th data-stat="game_location" align="center" class="tooltip"></th>
<th data-stat="opp_id" align="left" class="tooltip sort_default_asc" tip="Opponent">Opp</th>
<th data-stat="game_result" align="center" class="tooltip"></th>
<th data-stat="gs" align="right" class="tooltip" tip="Games Started">GS</th>
<th data-stat="mp" align="right" class="tooltip" tip="Minutes Played">MP</th>
<th data-stat="fg" align="right" class="tooltip" tip="Field Goals">FG</th>
<th data-stat="fga" align="right" class="tooltip" tip="Field Goal Attempts">FGA</th>
<th data-stat="fg_pct" align="right" class="tooltip" tip="Field Goal Percentage">FG%</th>
<th data-stat="fg3" align="right" class="tooltip" tip="3-Point Field Goals">3P</th>
<th data-stat="fg3a" align="right" class="tooltip" tip="3-Point Field Goal Attempts">3PA</th>
<th data-stat="fg3_pct" align="right" class="tooltip" tip="3-Point Field Goal Percentage">3P%</th>
<th data-stat="ft" align="right" class="tooltip" tip="Free Throws">FT</th>
<th data-stat="fta" align="right" class="tooltip" tip="Free Throw Attempts">FTA</th>
<th data-stat="ft_pct" align="right" class="tooltip" tip="Free Throw Percentage">FT%</th>
<th data-stat="orb" align="right" class="tooltip" tip="Offensive Rebounds">ORB</th>
<th data-stat="drb" align="right" class="tooltip" tip="Defensive Rebounds">DRB</th>
<th data-stat="trb" align="right" class="tooltip" tip="Total Rebounds">TRB</th>
<th data-stat="ast" align="right" class="tooltip" tip="Assists">AST</th>
<th data-stat="stl" align="right" class="tooltip" tip="Steals">STL</th>
<th data-stat="blk" align="right" class="tooltip" tip="Blocks">BLK</th>
<th data-stat="tov" align="right" class="tooltip" tip="Turnovers">TOV</th>
<th data-stat="pf" align="right" class="tooltip" tip="Personal Fouls">PF</th>
<th data-stat="pts" align="right" class="tooltip" tip="Points">PTS</th>
<th data-stat="game_score" align="right" class="tooltip" tip="Game Score">GmSc</th>
<th data-stat="plus_minus" align="right" class="tooltip" tip="Plus/Minus">+/-</th>
...</table>
Deleting columns
In [38]:
del data['Rk']
del data['H/A']
del data['Tm']
#del data['Opp']
del data['GS']
del data['W/L']
data.head()
Out[38]:
Field Goals Made / Field Goals Attempted per minute¶
In [20]:
data[['MP','FG','FGA']].dtypes
Out[20]:
In [27]:
temp=data[['MP','FG','FGA']].copy()  # copy, so that adding columns to temp does not trigger SettingWithCopyWarning
MP has dtype object (string), which is not convenient for us. Let us convert it to seconds...
In [22]:
import time
import datetime
In [26]:
def string_to_seconds(minutes):
    # Parse an 'MM:SS' string and return the duration in seconds (float)
    parsed=time.strptime(str(minutes),'%M:%S')
    return datetime.timedelta(minutes=parsed.tm_min, seconds=parsed.tm_sec).total_seconds()
print(string_to_seconds('40:00'))
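One caveat: the '%M' directive of time.strptime only accepts minutes from 00 to 59, so a value such as '65:10' would raise an error. I have not checked whether this file contains such a value. As a hedged alternative, here is a small sketch that splits the string on the colon instead. The helper name `mp_to_seconds` and the sample values are my own, not from the screencast.

```python
import pandas as pd

def mp_to_seconds(mp):
    """Convert a Series of 'MM:SS' strings to seconds (float)."""
    # Split into two columns: 0 holds the minutes, 1 holds the seconds.
    parts = mp.astype(str).str.split(':', expand=True).astype(float)
    return parts[0] * 60 + parts[1]

# Invented sample values, including one with more than 59 minutes.
sample = pd.Series(['40:00', '38:27', '65:10'])
seconds = mp_to_seconds(sample)
print(seconds.tolist())
```

Because it works on a whole column at once, it could be used as `temp['MP']=mp_to_seconds(temp['MP'])` instead of mapping a function over every element.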
Now we can take every element of the 'MP' column and convert it to seconds
In [29]:
temp['MP']=temp['MP'].map(string_to_seconds)
temp.head()
Out[29]:
In [30]:
temp.dtypes
Out[30]:
In [34]:
# Attempts per minute - create new column
temp['FGA/M']=temp['FGA']*60/temp['MP']
temp['FG/M']=temp['FG']*60/temp['MP']
temp.head()
Out[34]:
In [35]:
temp.describe()
Out[35]:
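The per-minute formula is easy to check by hand: MP is in seconds, so FGA/MP is attempts per second, and multiplying by 60 gives attempts per minute. A tiny sketch with invented numbers (not taken from the game log):

```python
import pandas as pd

# Invented rows; 'MP' is already in seconds, as after the conversion step.
check = pd.DataFrame({'MP': [2400.0, 1800.0], 'FG': [9, 6], 'FGA': [18, 12]})

# attempts per second = FGA / MP, so attempts per minute = FGA * 60 / MP
check['FGA/M'] = check['FGA'] * 60 / check['MP']
check['FG/M'] = check['FG'] * 60 / check['MP']

print(check)
```

For the first row, 18 attempts in 2400 seconds (40 minutes) gives 18*60/2400 = 0.45 attempts per minute.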
Let us form a new subset for grouping¶
In [39]:
data.head()
Out[39]:
We deleted the 'Opp' column earlier, so we have to rerun data=pd.read_csv('../data/KevinDuran2012-13GameLog.csv',names=columns) in the cell under 'Importing csv data' above (the del data['Opp'] line in 'Deleting columns' is now commented out)...
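This kind of rerun can be avoided if the columns are not deleted in place. A minimal sketch, assuming a toy frame with invented rows (only the column names come from this post): `drop()` returns a new DataFrame and leaves the original one untouched.

```python
import pandas as pd

# Invented rows for illustration; not the real game log.
data = pd.DataFrame({
    'Rk': [1, 2],
    'Tm': ['OKC', 'OKC'],
    'Opp': ['ATL', 'BOS'],
    'FG': [9, 7],
    'FGA': [18, 15],
})

# drop() returns a new DataFrame; 'data' keeps all of its columns.
trimmed = data.drop(columns=['Rk', 'Tm'])

print(list(trimmed.columns))
print(list(data.columns))
```

If a column turns out to be needed later, it is still available in `data`, and nothing has to be read from disk again.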
In [40]:
group_by_opp=data.groupby('Opp')
In [42]:
group_by_opp.size()
Out[42]:
Now, if you want to figure out how many shots he took against a particular team:
In [43]:
group_by_opp.sum()
Out[43]:
In [44]:
data[data.Opp=='ATL']
Out[44]:
In [62]:
field_goal_per_team=group_by_opp[['FGA','FG']].sum()  # select the two columns first, then sum only them
field_goal_per_team
Out[62]:
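The grouping steps can be tried on a toy frame without the CSV file. This is only a sketch with invented numbers. The extra 'FG%' column is my own addition to show what the grouped sums can be used for; it is not part of the screencast.

```python
import pandas as pd

# Invented numbers for illustration; not taken from the real game log.
games = pd.DataFrame({
    'Opp': ['ATL', 'BOS', 'ATL', 'BOS', 'CHI'],
    'FG':  [9, 7, 10, 8, 6],
    'FGA': [18, 15, 21, 16, 14],
})

grouped = games.groupby('Opp')

# Number of games against each opponent.
games_per_team = grouped.size()

# Total attempts and made field goals against each opponent.
per_team = grouped[['FGA', 'FG']].sum()

# Share of attempts that went in, per opponent.
per_team['FG%'] = per_team['FG'] / per_team['FGA']

print(games_per_team)
print(per_team)
```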
In [63]:
stacked=vincent.StackedBar(field_goal_per_team)
In [64]:
stacked.legend(title='Field Goals')
stacked.scales['x'].padding=0.1
display(stacked)
In [66]:
stacked.display()