Thursday, February 13, 2014

DataFrame object in Pandas

A DataFrame object is a table with an index. We build a list of dates (datetime), use a dictionary to define the table's four columns, and pull out one column (a Series)... The next example (Titanic) reads a CSV file into a DataFrame; one more example uses "Olympic Medalists".
Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.

Created by Alfred Essa, Dec 15th, 2013
Note: the IPython Notebook and data files can be found at my GitHub site: http://github.com/alfredessa
In [8]:
from IPython.display import YouTubeVideo
YouTubeVideo('lhkchS9gSYk')
Out[8]:

Chapter 1: Data Structures

1.2 Problem. How can I create a DataFrame object in Pandas?

1.21 What is a DataFrame?

The DataFrame data structure in Pandas is a two-dimensional labeled array.
  • Data in the array can be of any type (integers, strings, floating point numbers, Python objects, etc.).
  • Data within each column is homogeneous (each column has a single dtype).
  • By default, Pandas creates a numerical index for the rows in the sequence 0...n-1, where n is the number of rows.
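To see the default index and the per-column types for yourself, here is a minimal sketch using a small, made-up three-row table (the `city` and `temp` columns and their values are hypothetical, not from the notebook):

```python
import pandas as pd

# Hypothetical three-row table: one string column and one integer column
df = pd.DataFrame({'city': ['Tokyo', 'Paris', 'Mumbai'],
                   'temp': [15, -2, 20]})

# No index was supplied, so pandas numbers the rows 0, 1, 2 (that is, 0 ... n-1)
print(list(df.index))  # [0, 1, 2]

# Each column has one dtype: strings show up as object (str in newer pandas), ints as int64
print(df.dtypes)
```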
In [5]:
from IPython.display import Image
Image(filename='C:/Users/kiss/Documents/GitHub/pdacookbook/images/df1.jpg', width=400)
Out[5]:
Here's an example where the Date column has been set as the index, so the dates label the rows.
In [6]:
Image(filename='C:/Users/kiss/Documents/GitHub/pdacookbook/images/df2.jpg',width=400)
Out[6]:

1.22 Preliminaries: import the pandas and datetime libraries; create data for populating our first DataFrame object

In [9]:
import pandas as pd
import datetime
In [10]:
# create a list containing dates from 12-01 to 12-07
dt = datetime.datetime(2013,12,1)
end = datetime.datetime(2013,12,8)
step = datetime.timedelta(days=1)
dates = []
In [11]:
# populate the list
while dt < end:
    dates.append(dt.strftime('%m-%d'))
    dt += step
In [12]:
dates
Out[12]:
['12-01', '12-02', '12-03', '12-04', '12-05', '12-06', '12-07']
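The while loop works, but pandas can also generate the same date strings directly. A sketch using `pd.date_range` and the `strftime` method of the resulting DatetimeIndex (an alternative to the tutorial's approach, not what the notebook itself does):

```python
import pandas as pd

# One timestamp per day from 2013-12-01 through 2013-12-07 (both ends inclusive)
days = pd.date_range('2013-12-01', '2013-12-07', freq='D')

# Format each timestamp as 'MM-DD', matching the strings built by the while loop
dates = days.strftime('%m-%d').tolist()
print(dates)
# ['12-01', '12-02', '12-03', '12-04', '12-05', '12-06', '12-07']
```

Unlike the loop's `dt < end` test, `date_range` includes its end date, so the end here is 12-07 rather than 12-08.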
In [13]:
d = {'Date': dates, 'Tokyo' : [15,19,15,11,9,8,13], 'Paris': [-2,0,2,5,7,-5,-3], 'Mumbai':[20,18,23,19,25,27,23]}
In [14]:
d
Out[14]:
{'Date': ['12-01', '12-02', '12-03', '12-04', '12-05', '12-06', '12-07'],
 'Mumbai': [20, 18, 23, 19, 25, 27, 23],
 'Paris': [-2, 0, 2, 5, 7, -5, -3],
 'Tokyo': [15, 19, 15, 11, 9, 8, 13]}

1.23 Example 1: Create DataFrame Object from a Python Dictionary of equal-length lists

In [15]:
temps = pd.DataFrame(d)
In [28]:
pd.DataFrame(d)
Out[28]:
    Date  Mumbai  Paris  Tokyo
0  12-01      20     -2     15
1  12-02      18      0     19
2  12-03      23      2     15
3  12-04      19      5     11
4  12-05      25      7      9
5  12-06      27     -5      8
6  12-07      23     -3     13
7 rows × 4 columns
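The columns above appear in alphabetical order (Date, Mumbai, Paris, Tokyo), which seems to be how pandas of that era arranged columns built from a plain dict. Recent pandas versions keep the dictionary's insertion order instead, so the same call would list Date, Tokyo, Paris, Mumbai. A sketch that pins the layout with the `columns` argument, so the result matches the table above regardless of version:

```python
import pandas as pd

d = {'Date': ['12-01', '12-02', '12-03', '12-04', '12-05', '12-06', '12-07'],
     'Tokyo': [15, 19, 15, 11, 9, 8, 13],
     'Paris': [-2, 0, 2, 5, 7, -5, -3],
     'Mumbai': [20, 18, 23, 19, 25, 27, 23]}

# Without columns=, the order depends on the pandas version
print(list(pd.DataFrame(d).columns))

# With columns=, the order is exactly the one requested
temps = pd.DataFrame(d, columns=['Date', 'Mumbai', 'Paris', 'Tokyo'])
print(list(temps.columns))  # ['Date', 'Mumbai', 'Paris', 'Tokyo']
print(temps.shape)          # (7, 4)
```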
In [16]:
ntemp = temps['Mumbai']
In [26]:
type(ntemp)
Out[26]:
pandas.core.series.Series
In [17]:
# this is a Series (object)
ntemp
Out[17]:
0    20
1    18
2    23
3    19
4    25
5    27
6    23
Name: Mumbai, dtype: int64
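Selecting one column with a single label returns a Series, as shown above. A short sketch of two related forms, using a hypothetical three-row slice of the temperature data: attribute access gives the same Series, while a list of labels (double brackets) gives back a DataFrame even when it names only one column.

```python
import pandas as pd

temps = pd.DataFrame({'Date': ['12-01', '12-02', '12-03'],
                      'Mumbai': [20, 18, 23],
                      'Paris': [-2, 0, 2]})

# A single label in square brackets returns a Series
s1 = temps['Mumbai']
# Attribute access also works when the column name is a valid Python identifier
s2 = temps.Mumbai
# A list of labels returns a DataFrame, even with just one column
sub = temps[['Mumbai']]

print(type(s1).__name__, type(s2).__name__, type(sub).__name__)
# Series Series DataFrame
print(s1.equals(s2))  # True
```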
In [18]:
temps = temps.set_index('Date')
In [19]:
temps
Out[19]:
       Mumbai  Paris  Tokyo
Date
12-01      20     -2     15
12-02      18      0     19
12-03      23      2     15
12-04      19      5     11
12-05      25      7      9
12-06      27     -5      8
12-07      23     -3     13
7 rows × 3 columns
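`set_index` returns a new DataFrame rather than modifying `temps` in place, which is why the notebook reassigns the result. Once the dates are the index, rows can be looked up by label with `.loc`. A minimal sketch on a hypothetical three-day subset:

```python
import pandas as pd

temps = pd.DataFrame({'Date': ['12-01', '12-02', '12-03'],
                      'Mumbai': [20, 18, 23],
                      'Paris': [-2, 0, 2],
                      'Tokyo': [15, 19, 15]})

# set_index returns a new DataFrame, so keep the result
temps = temps.set_index('Date')

# Select a whole row by its date label; the result is a Series indexed by column name
row = temps.loc['12-02']
print(row['Paris'])                 # 0

# Select a single cell: row label first, then column label
print(temps.loc['12-03', 'Tokyo'])  # 15
```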

1.24 Example 2: Create DataFrame Object by reading a .csv file (Titanic passengers)

In [20]:
titanic = pd.read_csv('C:/Users/kiss/Documents/GitHub/pdacookbook/data/titanic.csv')
In [32]:
titanic.head()
Out[32]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
0            1         0       3  Braund, Mr. Owen Harris                            male     22      1      0  A/5 21171          7.2500  NaN    S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female   38      1      0  PC 17599          71.2833  C85    C
2            3         1       3  Heikkinen, Miss. Laina                             female   26      0      0  STON/O2. 3101282   7.9250  NaN    S
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female   35      1      0  113803            53.1000  C123   S
4            5         0       3  Allen, Mr. William Henry                           male     35      0      0  373450             8.0500  NaN    S
5 rows × 12 columns
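`read_csv` accepts any file-like object, not only a path on disk, which is handy for trying things out without the data file. A sketch using `io.StringIO` with a hypothetical stand-in for titanic.csv: the first three passengers from the output above, trimmed to a few columns (the Name column is left out because its embedded commas would need quoting).

```python
import io

import pandas as pd

# Hypothetical miniature of titanic.csv, built from the rows shown above
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.2833
3,1,3,female,26,7.925
"""

titanic = pd.read_csv(io.StringIO(csv_text))
print(titanic.shape)  # (3, 6)
print(titanic.head(2))
```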
In [34]:
titanic.Sex.value_counts()
Out[34]:
male      577
female    314
dtype: int64
In [21]:
titanic.Survived.value_counts()
Out[21]:
0    549
1    342
dtype: int64
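`value_counts` also takes `normalize=True`, which turns counts into fractions of the total. A sketch on a hypothetical five-passenger Series coded like the Survived column (1 = survived):

```python
import pandas as pd

# Hypothetical survival flags, same 0/1 coding as titanic.Survived
survived = pd.Series([0, 1, 1, 1, 0], name='Survived')

counts = survived.value_counts()                # 1 -> 3, 0 -> 2 (largest first)
shares = survived.value_counts(normalize=True)  # 1 -> 0.6, 0 -> 0.4
print(counts)
print(shares)
```

Applied to the real column, the counts above (549 and 342 out of 891 passengers) would give roughly 0.616 for 0 and 0.384 for 1.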

1.25 Example 3: Create DataFrame Object by reading a .csv file (Olympic Medalists)

In [22]:
medals = pd.read_csv('C:/Users/kiss/Documents/GitHub/pdacookbook/data/olympicmedals.csv')
In [23]:
medals.tail()
Out[23]:
       City     Edition  Sport      Discipline       Athlete               NOC  Gender  Event       Event_gender  Medal
29211  Beijing     2008  Wrestling  Wrestling Gre-R  ENGLICH, Mirko        GER  Men     84 - 96kg   M             Silver
29212  Beijing     2008  Wrestling  Wrestling Gre-R  MIZGAITIS, Mindaugas  LTU  Men     96 - 120kg  M             Bronze
29213  Beijing     2008  Wrestling  Wrestling Gre-R  PATRIKEEV, Yuri       ARM  Men     96 - 120kg  M             Bronze
29214  Beijing     2008  Wrestling  Wrestling Gre-R  LOPEZ, Mijain         CUB  Men     96 - 120kg  M             Gold
29215  Beijing     2008  Wrestling  Wrestling Gre-R  BAROEV, Khasan        RUS  Men     96 - 120kg  M             Silver
5 rows × 10 columns
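`head()` and `tail()` return 5 rows by default and accept a row count. `tail` keeps the original row labels instead of renumbering from 0, which is why the medal rows above are labeled 29211 to 29215 (with the default 0-based index that read_csv assigns, this suggests the file has 29,216 rows). A small sketch with a hypothetical 10-row frame:

```python
import pandas as pd

# Hypothetical 10-row frame standing in for a larger data set
df = pd.DataFrame({'n': range(10)})

first = df.head()  # first 5 rows, the default
last = df.tail(3)  # last 3 rows

# tail keeps the original row labels rather than renumbering from 0
print(list(last.index))  # [7, 8, 9]
```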
In [24]:
medals.Sport.value_counts()
Out[24]:
Aquatics             3828
Athletics            3448
Rowing               2523
Gymnastics           2214
Fencing              1547
Football             1387
Hockey               1325
Wrestling            1140
Shooting             1105
Sailing              1061
Cycling              1025
Canoe / Kayak        1002
Basketball            940
Volleyball            910
Equestrian            894
Handball              886
Boxing                842
Weightlifting         548
Judo                  435
Baseball              335
Archery               305
Tennis                272
Rugby                 192
Softball              180
Modern Pentathlon     174
Badminton             120
Table Tennis          120
Tug of War             94
Taekwondo              80
Polo                   66
Lacrosse               59
Golf                   30
Skating                27
Ice Hockey             27
Cricket                24
Triathlon              18
Rackets                10
Croquet                 8
Water Motorsports       5
Basque Pelota           4
Roque                   3
Jeu de paume            3
dtype: int64
In [35]:
medals.NOC.value_counts()
Out[35]:
USA    4335
URS    2049
GBR    1594
FRA    1314
ITA    1228
GER    1211
AUS    1075
HUN    1053
SWE    1021
GDR     825
NED     782
JPN     704
CHN     679
RUS     638
ROU     624
...
BDI    1
AHO    1
AFG    1
BER    1
TOG    1
BAR    1
ERI    1
CIV    1
IRQ    1
MKD    1
SEN    1
TGA    1
UAE    1
MRI    1
SUD    1
Length: 138, dtype: int64
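The `...` in the middle and the `Length: 138` line show that pandas truncated a long result. To look at only the most frequent entries, or to count distinct values directly, something like the following works (the country codes are hypothetical sample values, not the real medal data):

```python
import pandas as pd

# Hypothetical country codes standing in for the NOC column
noc = pd.Series(['USA', 'URS', 'USA', 'GBR', 'USA', 'URS', 'AFG'], name='NOC')

counts = noc.value_counts()
print(counts.head(2))  # the two most frequent codes: USA 3, URS 2
print(noc.nunique())   # 4 distinct codes, the same as len(counts)
```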

