A Series object has the properties of a list, an array, and a dictionary. At first glance, this is exactly what is needed for working with CSV files, so here we study a 20-minute video.

Key Learning Objective: Series exhibits both array-like and dict-like properties. Besides the video below, I also liked this post: "The pandas library is used for all the data analysis excluding a small piece of the data presentation section".
The most complete list of learning resources, though, is here: "This is a guide to many pandas tutorials, geared mainly for new users".

In [1]:

from IPython.display import YouTubeVideo
YouTubeVideo('eRpFC2CKvao')

Out[1]:

In [50]:

from IPython.display import Image
Image(filename='C:/Users/kiss/Documents/GitHub/pdacookbook/images/series4.jpg',width=200)

Out[50]:

As I understand the idea of a "Series": when you create one, there is the data (a list) and there is the index (also a list), so the index can be either a column of numbers or a column of strings. Besides the index, there is also a built-in counterpart, the offset (the integer position). As a result, any element of the data can be retrieved in two ways: by its index label or by its offset.
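The two ways of reaching an element can be sketched as follows. This is a minimal illustration with made-up data (the name `s` and the three values are assumptions, not taken from the video); it uses the explicit `.loc` (label) and `.iloc` (position) accessors, which make the intent unambiguous.

```python
import pandas as pd

# Hypothetical example data: three readings labelled by weekday.
s = pd.Series([33, 19, 15], index=['Mon', 'Tue', 'Wed'])

by_label = s.loc['Tue']    # access by index label
by_offset = s.iloc[1]      # access by integer position (offset)

print(by_label, by_offset)
```

Both expressions point at the same element, because 'Tue' sits at position 1.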

Repeating the examples from the video

In []:

# s = pd.Series(data, index=index)
# data can come from different containers, including
# a list, an array, or a dictionary

In [2]:

import pandas as pd
import numpy as np

In [3]:

s1= pd.Series([33,19,15,89,11,-5,9])

In [4]:

# No index was passed to the constructor, so a default integer index (0..n-1) is used
s1

Out[4]:

0    33
1    19
2    15
3    89
4    11
5    -5
6     9
dtype: int64

In [5]:

type(s1)

Out[5]:

pandas.core.series.Series

In [6]:

s1.values

Out[6]:

array([33, 19, 15, 89, 11, -5,  9], dtype=int64)

In [7]:

type(s1.values)

Out[7]:

numpy.ndarray

In [8]:

s1.index

Out[8]:

Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64')

Example 2. A Series object with meaningful labels

In [11]:

#define data and index as separate lists
data1=[33,19,15,89,11,-5,9]
index1=['Mon','Tue','Wed','Thu','Fri','Sat','Sun']

In [12]:

s2= pd.Series(data1,index=index1)

In [13]:

# all elements are int64
s2

Out[13]:

Mon    33
Tue    19
Wed    15
Thu    89
Fri    11
Sat    -5
Sun     9
dtype: int64

In [14]:

s2.index

Out[14]:

Index([u'Mon', u'Tue', u'Wed', u'Thu', u'Fri', u'Sat', u'Sun'], dtype='object')

In [15]:

s2.name='Daily Temperatures'
s2.index.name='Weekday'
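What the two names are good for becomes visible once the Series is printed or converted. The sketch below uses a shortened, hypothetical Series (`temps`) rather than `s2`; the names show up in the printed output and become column headers after `reset_index()`.

```python
import pandas as pd

# Hypothetical two-element Series, named the same way as s2 above.
temps = pd.Series([33, 19], index=['Mon', 'Tue'])
temps.name = 'Daily Temperatures'
temps.index.name = 'Weekday'

print(temps)  # the index name and the Series name appear in the output

# Turning the Series into a DataFrame reuses both names as column headers.
frame = temps.reset_index()
print(list(frame.columns))
```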

Example 3

In [16]:

# The second data element is the float 19.3 instead of the int 19 used in data1
data2=[33,19.3,15,89,11,-5,9]

In [17]:

# create a new Series from data2, which contains the float element
s3= pd.Series(data2,index=index1)

In [19]:

# all elements share one dtype: the ints were upcast to float64
s3

Out[19]:

Mon    33.0
Tue    19.3
Wed    15.0
Thu    89.0
Fri    11.0
Sat    -5.0
Sun     9.0
dtype: float64
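The upcasting seen in this example can be checked directly through the `dtype` attribute. A small sketch with hypothetical names (`ints`, `mixed`):

```python
import pandas as pd

ints = pd.Series([33, 19, 15])      # only integers
mixed = pd.Series([33, 19.3, 15])   # one float among the integers

print(ints.dtype)    # an integer dtype
print(mixed.dtype)   # float64: the integers were converted to floats

# The integer values survive the conversion, just stored as floats.
first_value = mixed.iloc[0]
```

A Series holds a single dtype, so one float is enough to turn the whole column into floats.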

Creating a Series from a Python Dict

In [20]:

dict1={'Mon':33,'Tue':19,'Wed':15,'Thu':89,'Fri':11,'Sat':-5,'Sun':9}

In [21]:

s4= pd.Series(dict1)

In [22]:

s4

Out[22]:

Fri    11
Mon    33
Sat    -5
Sun     9
Thu    89
Tue    19
Wed    15
dtype: int64
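The output above is sorted by key, which is what older pandas versions did with a dict. As far as I know, recent pandas versions on Python 3.7+ keep the dict's insertion order instead, so reproducing the sorted view there takes an explicit `sort_index()`. A sketch under that assumption, with a shortened hypothetical dict `d`:

```python
import pandas as pd

# Shortened, hypothetical version of dict1.
d = {'Mon': 33, 'Tue': 19, 'Wed': 15, 'Thu': 89}
s = pd.Series(d)

# Assumption: a recent pandas on Python 3.7+, where insertion order is kept.
in_dict_order = list(s.index)

# Sorting the index gives the alphabetical order shown in the output above.
sorted_labels = list(s.sort_index().index)

print(in_dict_order)
print(sorted_labels)
```

The positional examples further down rely on the sorted order, so on a recent pandas they would need `s4 = s4.sort_index()` first to give the same results.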

Series is ndarray-like

In [23]:

# vectorized operations
s4*2

Out[23]:

Fri     22
Mon     66
Sat    -10
Sun     18
Thu    178
Tue     38
Wed     30
dtype: int64

In [24]:

np.log(s4)

Out[24]:

Fri    2.397895
Mon    3.496508
Sat         NaN
Sun    2.197225
Thu    4.488636
Tue    2.944439
Wed    2.708050
dtype: float64

Note: NaN (not a number) is the standard missing-data marker in pandas. Here it appears because the logarithm of a negative number (-5) is undefined.
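Once NaN values are present, they can be located with `isna()` and discarded with `dropna()`. A minimal sketch with a hypothetical three-element Series; the `np.errstate` context only silences the warning NumPy emits for the invalid logarithm.

```python
import numpy as np
import pandas as pd

s = pd.Series([11, -5, 9], index=['Fri', 'Sat', 'Sun'])

# np.log of a negative number yields NaN; suppress NumPy's RuntimeWarning.
with np.errstate(invalid='ignore'):
    logs = np.log(s)

missing = logs.isna()    # boolean mask: True where the value is NaN
clean = logs.dropna()    # copy of the Series without the NaN entries

print(missing)
print(clean)
```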

In [26]:

# slice using index labels (both endpoints are included)
s4['Thu':'Wed']

Out[26]:

Thu    89
Tue    19
Wed    15
dtype: int64

In [27]:

# slice using position: the end position is excluded,
# so only two elements are returned, s4[1] and s4[2]
s4[1:3]

Out[27]:

Mon    33
Sat    -5
dtype: int64
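The difference between the two kinds of slices is easy to miss: a label slice includes its end label, while a positional slice excludes its end position. A sketch with a hypothetical four-element Series, using `.loc` and `.iloc` to make the intent explicit:

```python
import pandas as pd

s = pd.Series([11, 33, -5, 9], index=['Fri', 'Mon', 'Sat', 'Sun'])

by_label = s.loc['Mon':'Sat']   # label slice: 'Sat' is included
by_position = s.iloc[1:3]       # positional slice: position 3 is excluded

print(by_label)
print(by_position)
```

Both slices select the same two elements here, 'Mon' and 'Sat'.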

In [29]:

# retrieve values using the offset (.iloc makes the positional lookup explicit)
s4.iloc[1], s4.iloc[2]

Out[29]:

(33, -5)

In [30]:

# set a value using the offset
s4.iloc[1]=199
s4

Out[30]:

Fri     11
Mon    199
Sat     -5
Sun      9
Thu     89
Tue     19
Wed     15
dtype: int64

In [32]:

# Series is array-like: it offers NumPy-style methods and is accepted by most NumPy functions
s4.median(), s4.max()

Out[32]:

(15.0, 199)

In [33]:

s4.cumsum()

Out[33]:

Fri     11
Mon    210
Sat    205
Sun    214
Thu    303
Tue    322
Wed    337
dtype: int64
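Passing a Series straight to a NumPy function can be sketched like this. The data is hypothetical; to my knowledge a Series in current pandas is not an `ndarray` subclass, yet NumPy functions still accept it, and element-wise ones return a Series with the labels intact.

```python
import numpy as np
import pandas as pd

values = pd.Series([11, 199, -5], index=['Fri', 'Mon', 'Sat'])

largest = np.max(values)     # a reduction returns a single value
absolute = np.abs(values)    # an element-wise function returns a Series

print(largest)
print(absolute)
print(isinstance(values, np.ndarray))
```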

In [37]:

# looping over the values together with their positions
for i,v in enumerate(s4):
    print(i, v)

In [39]:

#list comprehension can be used to create a new list
new_list=[x**2 for x in s4]

In [40]:

new_list

Out[40]:

[121, 39601, 25, 81, 7921, 361, 225]

Series is dict-like

In [42]:

'Sun' in s4

Out[42]:

True

In [45]:

# retrieve the same value by index label and by position
s4['Tue'], s4.iloc[5]

Out[45]:

(19, 19)

In [46]:

# looping over index labels (keys) and values
for k,v in s4.items():
    print(v, k)

11 Fri
199 Mon
-5 Sat
9 Sun
89 Thu
19 Tue
15 Wed
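A few more dict-style operations round out the picture. The sketch below uses a hypothetical two-element Series; note that `in` tests the index labels, not the values, just as it tests the keys of a dict.

```python
import pandas as pd

s = pd.Series({'Mon': 33, 'Tue': 19})

has_label = 'Mon' in s        # True: 'Mon' is an index label
has_value = 33 in s           # False: membership looks at labels, not values
fallback = s.get('Sun', 0)    # like dict.get: a default for a missing label
labels = list(s.keys())       # keys() returns the index

print(has_label, has_value, fallback, labels)
```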

In [48]:

from IPython.display import HTML

In [49]:

HTML('<iframe src=http://alfredessa.com/data-analysis-tutorial/2-pandas-library/ width=800 height=350></iframe>')

Out[49]:

Wednesday, February 12, 2014

How to Create a Series Object in Pandas... Key take away