We import a CSV file into a DataFrame (baseball players, 1952 rows), tabulate .Year.value_counts(), select the year 2010 (Create a subset of mlb dataset for Year 2010), sort by index (which can be set in several ways), and plot the salaries of the top 10 players in 2010.
Python for Data Analysis Lightning Tutorials Pandas Cookbook Series
Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.
Created by Alfred Essa, Dec 22nd, 2013
Note: IPython Notebook and Data files can be found at my Github Site: http://github/alfredessa
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('rNmn8bLFgdg')
Out[2]:
Chapter 2: Common Operations
2.1 Problem: How can I sort in a DataFrame?
2.11 Review: DataFrame Object
The DataFrame data structure in Pandas is a two-dimensional labeled array.
In [1]:
from IPython.display import Image
Image(filename='C:/Users/kiss/Documents/GitHub/pdacookbook/images/df2.jpg',width=400)
Out[1]:
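To make "two-dimensional labeled array" concrete, here is a minimal sketch of a DataFrame built by hand. The player names, teams, and salary figures are hypothetical values chosen for illustration; they are not taken from mlbsalaries.csv.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from mlbsalaries.csv
df = pd.DataFrame(
    {"Team": ["AAA", "BBB", "CCC"], "Salary": [300, 100, 200]},
    index=["Player C", "Player A", "Player B"],
)

print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column labels
print(df.shape)             # (number of rows, number of columns)

# Look up a single cell by its row label and column label
print(df.loc["Player A", "Salary"])
```

Every value is addressed by a pair of labels (row, column), which is what the sort operations later in this section rearrange.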
2.12 Import Pandas and Read .CSV File of Major League Baseball Salaries
In [4]:
import pandas as pd
In [5]:
mlb = pd.read_csv('C:/Users/kiss/Documents/GitHub/pdacookbook/data/mlbsalaries.csv')
In [7]:
mlb.tail()
Out[7]:
2.13 Review: Use value_counts() method to identify unique values and corresponding counts
In [6]:
mlb.Year.value_counts()
Out[6]:
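As a small sketch of what value_counts() returns, the example below uses a hypothetical Series of years rather than the real Year column. The result is a Series whose index holds the unique values and whose values are the counts, with the most frequent value first.

```python
import pandas as pd

# Hypothetical year values for illustration only
years = pd.Series([2009, 2010, 2010, 2011, 2010, 2009], name="Year")

# Unique values become the index; counts are ordered most frequent first
counts = years.value_counts()
print(counts)

# Sorting by index lists the years chronologically instead
print(counts.sort_index())
```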
2.14 Create a subset of mlb dataset for Year 2010
In [8]:
yr2010 = mlb[mlb.Year==2010]
In [9]:
yr2010 = yr2010.set_index('Player')
In [10]:
yr2010.head()
Out[10]:
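The two steps above (filter rows with a boolean mask, then set the index) can be sketched on a toy table. The column names follow the ones used in this tutorial's code (Year, Player, Salary, Team); the rows are hypothetical. The second half shows one alternative way to set the index, using the index_col argument of read_csv, assuming the file has a Player column.

```python
import io
import pandas as pd

# Hypothetical CSV content for illustration only
csv_text = """Year,Player,Salary,Team
2009,Player A,100,AAA
2010,Player B,200,BBB
2010,Player C,300,AAA
"""

# Way 1: read, filter with a boolean mask, then set the index
toy = pd.read_csv(io.StringIO(csv_text))
mask = toy["Year"] == 2010          # Series of True/False, one per row
subset = toy[mask].set_index("Player")

# Way 2: set the index while reading, then filter
toy_indexed = pd.read_csv(io.StringIO(csv_text), index_col="Player")
subset2 = toy_indexed[toy_indexed["Year"] == 2010]

print(subset)
print(subset2)
```

Both routes give a table labeled by player name, so sort_index() on either one sorts alphabetically by player.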
2.15 Sort Operations
In [11]:
# sort row labels
yr2010.sort_index().head()
Out[11]:
In [12]:
# sort column labels
yr2010.sort_index(axis=1).head()
Out[12]:
In [13]:
# sort column values using sort_values (called order in older pandas); note: on a Series it returns a Series
yr2010.Salary.sort_values(ascending=False).head()
Out[13]:
In [14]:
# sort rows by a column's values using sort_values (older pandas: sort_index with by=)
sorted_yr2010 = yr2010.sort_values(by=['Salary'], ascending=False)
In [15]:
sorted_yr2010.head(20)
Out[15]:
In [16]:
# sort by Salary (descending), breaking ties by Team (ascending)
yr2010.sort_values(by=['Salary', 'Team'], ascending=[False, True]).head(20)
Out[16]:
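To see how a multi-key sort breaks ties, here is a minimal sketch on hypothetical data, using DataFrame.sort_values from current pandas. Two players share the top salary, so the second key (Team, ascending) decides their order.

```python
import pandas as pd

# Hypothetical rows for illustration only
toy = pd.DataFrame(
    {"Salary": [300, 100, 300, 200], "Team": ["BBB", "AAA", "AAA", "CCC"]},
    index=["Player A", "Player B", "Player C", "Player D"],
)

# Salary descending first; among equal salaries, Team ascending
ranked = toy.sort_values(by=["Salary", "Team"], ascending=[False, True])
print(ranked)
```

The ascending list is matched to the by list position by position, so the two lists must have the same length.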
In [17]:
# Top 10 highest paid players (head() alone would return only 5 rows)
top10 = yr2010.Salary.sort_values(ascending=False).head(10)
In [18]:
type(top10)
Out[18]:
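As an aside, pandas also offers Series.nlargest, which picks the n largest values directly. The sketch below, on hypothetical salaries, suggests it is an alternative to sorting and then taking the head; it is not the method used in this tutorial.

```python
import pandas as pd

# Hypothetical salaries for illustration only
salaries = pd.Series(
    [300, 100, 500, 200, 400],
    index=["Player A", "Player B", "Player C", "Player D", "Player E"],
    name="Salary",
)

# Three largest values, already in descending order
top3 = salaries.nlargest(3)

# The sort-then-head approach, for comparison
same = salaries.sort_values(ascending=False).head(3)

print(top3)
print(type(top3))  # still a Series, like top10 above
```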
In [20]:
%pylab inline
In [21]:
#plot highest paid
plt.figure()
top10.plot(label='Salaries')
xticks(rotation='vertical')
plt.legend()
Out[21]: