Here is the detailed documentation (the two links below point to the same site). I came across it while looking for ways to work with CSV files, because for some reason I just didn't like PyTables. So the first link points to the CSV object. 10 Minutes to Pandas, pandas: powerful Python data analysis toolkit

The official site, pandas.pydata.org, has a 10-minute video showing how well the library handles time series.
It also links to other interesting libraries: scikit-learn (Machine Learning in Python) and Statsmodels.

Let's start with the Remote Data Access example, which gives direct access to Yahoo, Google, FRED, Fama/French and the World Bank.

Yahoo! Finance

In [1]:

import pandas_datareader.data as web  # formerly pandas.io.data; now the separate pandas-datareader package

In [2]:

import datetime

In [3]:

start = datetime.datetime(2010, 1, 1)

In [4]:

end = datetime.datetime(2013, 1, 27)

In [5]:

f = web.DataReader("F", "yahoo", start, end)  # daily prices for Ford (ticker F)

In [6]:

f.loc['2010-01-04']

Out[6]:

Open               10.17
High               10.28
Low                10.05
Close              10.28
Volume       60855800.00
Adj Close           9.75
Name: 2010-01-04 00:00:00, dtype: float64
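The lookup above works because the downloaded frame is indexed by dates. Here is a minimal offline sketch, assuming a current pandas where `.ix` no longer exists and `.loc` is the label-based replacement. It rebuilds only the single row printed in Out[6] (the index name `Date` is an assumption) and selects it by timestamp, without any network access.

```python
import pandas as pd

# Rebuild the one row printed in Out[6]; values are copied from that output.
f = pd.DataFrame(
    {
        "Open": [10.17],
        "High": [10.28],
        "Low": [10.05],
        "Close": [10.28],
        "Volume": [60855800.0],
        "Adj Close": [9.75],
    },
    index=pd.DatetimeIndex(["2010-01-04"], name="Date"),  # index name is assumed
)

# Label-based lookup of one timestamp returns that row as a Series.
row = f.loc[pd.Timestamp("2010-01-04")]
print(row)
print(row["Close"])
```

Passing a `pd.Timestamp` makes the exact-match intent explicit. A date string also works with `.loc`, but whether it returns a row or a slice depends on how precise the string is compared with the index.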

World Bank

In [8]:

from pandas_datareader import wb  # formerly pandas.io.wb

In [10]:

wb.search('gdp.*capita.*const').iloc[:,:2]

Out[10]:

	id	name
3242	GDPPCKD	GDP per Capita, constant US$, millions
5183	NY.GDP.PCAP.KD	GDP per capita (constant 2005 US$)
5185	NY.GDP.PCAP.KN	GDP per capita (constant LCU)
5187	NY.GDP.PCAP.PP.KD	GDP per capita, PPP (constant 2005 internation...

4 rows × 2 columns
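The searches in this post use regular-expression patterns such as `gdp.*capita.*const`. The sketch below imitates that kind of filtering offline with `Series.str.contains`. It only illustrates the idea and is not the library's own code. The small catalogue reuses three complete id/name pairs from Out[10], and the `SP.POP.TOTL` row is added purely as a non-matching example.

```python
import pandas as pd

# Tiny stand-in for the indicator catalogue.
catalogue = pd.DataFrame(
    {
        "id": ["GDPPCKD", "NY.GDP.PCAP.KD", "NY.GDP.PCAP.KN", "SP.POP.TOTL"],
        "name": [
            "GDP per Capita, constant US$, millions",
            "GDP per capita (constant 2005 US$)",
            "GDP per capita (constant LCU)",
            "Population, total",  # added for illustration; should not match
        ],
    }
)

def search(table, pattern):
    """Return the rows whose 'name' matches the regex `pattern`, ignoring case."""
    mask = table["name"].str.contains(pattern, case=False, regex=True)
    return table[mask]

result = search(catalogue, "gdp.*capita.*const")
print(result)
```

In the later pattern `cell.*%`, the `%` has no special meaning in a regular expression, so it simply matches a literal percent sign.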

In [11]:

dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=2005, end=2008)

In [12]:

print(dat)

                    NY.GDP.PCAP.KD
country       year                
Canada        2008    36005.500498
              2007    36182.913844
              2006    35785.969817
              2005    35087.892593
Mexico        2008     8312.764412
              2007     8301.822341
              2006     8149.882439
              2005     7858.762170
United States 2008    44872.653626
              2007    45431.027016
              2006    45058.649753
              2005    44313.585241

[12 rows x 1 columns]

The resulting dataset is a properly formatted DataFrame with a hierarchical index, so it is easy to apply .groupby transformations to it:

In [13]:

dat['NY.GDP.PCAP.KD'].groupby(level=0).mean()

Out[13]:

country
Canada           35765.569188
Mexico            8155.807840
United States    44918.978909
dtype: float64
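To see what `groupby(level=0)` does without downloading anything, here is an offline sketch. It rebuilds the hierarchical (country, year) index from the values printed by `print(dat)` and averages over the outer level. The result should agree with Out[13] up to rounding.

```python
import pandas as pd

# (country, year, value) triples copied from the print(dat) output.
rows = [
    ("Canada", 2008, 36005.500498), ("Canada", 2007, 36182.913844),
    ("Canada", 2006, 35785.969817), ("Canada", 2005, 35087.892593),
    ("Mexico", 2008, 8312.764412), ("Mexico", 2007, 8301.822341),
    ("Mexico", 2006, 8149.882439), ("Mexico", 2005, 7858.762170),
    ("United States", 2008, 44872.653626), ("United States", 2007, 45431.027016),
    ("United States", 2006, 45058.649753), ("United States", 2005, 44313.585241),
]
dat = pd.DataFrame(rows, columns=["country", "year", "NY.GDP.PCAP.KD"])
dat = dat.set_index(["country", "year"])  # two-level (hierarchical) index

# level=0 groups by the outer level ("country") and averages across the years.
means = dat["NY.GDP.PCAP.KD"].groupby(level=0).mean()
print(means)
```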

Now imagine you want to compare GDP to the share of people with cellphone contracts around the world.

In [14]:

wb.search('cell.*%').iloc[:,:2]

Out[14]:

	id	name
4018	IT.CEL.SETS.FE.ZS	Mobile cellular telephone users, female (% of ...
4019	IT.CEL.SETS.MA.ZS	Mobile cellular telephone users, male (% of po...
4055	IT.MOB.COV.ZS	Population coverage of mobile cellular telepho...

3 rows × 2 columns

Notice that this second search was much faster than the first one because Pandas now has a cached list of available data series.
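The speed-up comes from caching the list of series after the first search. The sketch below shows one way such caching can work and makes no claim about how pandas stores its cache. A hypothetical `load_catalogue()` stands in for the slow download and is wrapped in `functools.lru_cache`, so only the first search pays for loading.

```python
import functools
import re

calls = {"n": 0}  # counts how often the "slow" loader actually runs

@functools.lru_cache(maxsize=None)
def load_catalogue():
    """Hypothetical stand-in for downloading the list of available series."""
    calls["n"] += 1
    # A tuple is returned so the cached value cannot be mutated by callers.
    return (
        ("NY.GDP.PCAP.KD", "GDP per capita (constant 2005 US$)"),
        ("SP.POP.TOTL", "Population, total"),
    )

def search(pattern):
    """Return (id, name) pairs whose name matches `pattern`, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [row for row in load_catalogue() if rx.search(row[1])]

first = search("gdp.*capita")   # triggers the load
second = search("population")   # served from the cache
print(first, second, calls["n"])
```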

In [15]:

ind = ['NY.GDP.PCAP.KD', 'IT.MOB.COV.ZS']

In [16]:

dat = wb.download(indicator=ind, country='all', start=2011, end=2011).dropna()

In [17]:

dat.columns = ['gdp', 'cellphone']

In [18]:

print(dat.tail())

                        gdp  cellphone
country   year                        
Swaziland 2011  2413.952853       94.9
Tunisia   2011  3687.340170      100.0
Uganda    2011   405.332501      100.0
Zambia    2011   767.911290       62.0
Zimbabwe  2011   423.735752       72.4

[5 rows x 2 columns]
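The `.dropna()` call and the column renaming can be tried offline. In this sketch, the Swaziland and Tunisia rows reuse values from the output above. The `Atlantis` row is a made-up country with a missing GDP value and a made-up coverage figure, added only to show what `dropna()` removes. Renaming with a mapping via `rename` is shown as an alternative to assigning `dat.columns`, because a mapping does not depend on column order.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    {
        "NY.GDP.PCAP.KD": [2413.952853, 3687.340170, np.nan],
        "IT.MOB.COV.ZS": [94.9, 100.0, 80.0],  # 80.0 is a hypothetical value
    },
    index=pd.MultiIndex.from_tuples(
        [("Swaziland", 2011), ("Tunisia", 2011), ("Atlantis", 2011)],  # "Atlantis" is hypothetical
        names=["country", "year"],
    ),
)

# Drop every row that has a missing value in any column.
clean = raw.dropna()

# Rename by original column name rather than by position.
clean = clean.rename(columns={"NY.GDP.PCAP.KD": "gdp", "IT.MOB.COV.ZS": "cellphone"})
print(clean)
```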

Finally, we use the statsmodels package to assess the relationship between our two variables with an ordinary least squares regression of mobile coverage on log GDP per capita. Unsurprisingly, richer countries tend to have higher mobile-network coverage:

In [19]:

import numpy as np

In [20]:

import statsmodels.formula.api as smf

In [21]:

mod = smf.ols("cellphone ~ np.log(gdp)", dat).fit()

In [22]:

print(mod.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              cellphone   R-squared:                       0.286
Model:                            OLS   Adj. R-squared:                  0.262
Method:                 Least Squares   F-statistic:                     12.01
Date:                Tue, 11 Feb 2014   Prob (F-statistic):            0.00162
Time:                        14:52:58   Log-Likelihood:                -135.54
No. Observations:                  32   AIC:                             275.1
Df Residuals:                      30   BIC:                             278.0
Df Model:                           1                                         
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept      17.0706     19.645      0.869      0.392       -23.049    57.190
np.log(gdp)     9.7804      2.822      3.466      0.002         4.017    15.544
==============================================================================
Omnibus:                       33.694   Durbin-Watson:                   2.072
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              100.351
Skew:                          -2.211   Prob(JB):                     1.62e-22
Kurtosis:                      10.464   Cond. No.                         45.7
==============================================================================
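The formula `"cellphone ~ np.log(gdp)"` asks for a least-squares fit of cellphone = b0 + b1·log(gdp) + error. The sketch below shows what that means numerically. It computes the two coefficients with the textbook closed form (b1 = cov(x, y) / var(x), b0 = mean(y) − b1·mean(x)) and cross-checks them with `numpy.linalg.lstsq`. It uses only the five rows printed by `dat.tail()`, so its coefficients will not match the 32-observation summary above.

```python
import numpy as np

# The five (gdp, cellphone) rows shown by dat.tail() above.
gdp = np.array([2413.952853, 3687.340170, 405.332501, 767.911290, 423.735752])
cellphone = np.array([94.9, 100.0, 100.0, 62.0, 72.4])

x = np.log(gdp)  # same transformation as np.log(gdp) in the formula
y = cellphone

# Closed-form simple OLS: slope = cov(x, y) / var(x); intercept from the means.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Cross-check: generic least-squares solve on the design matrix [1, x].
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(intercept, slope)
print(coef)
```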

These "new" ways of importing stock quotes impressed me so much that I decided to move the CSV exercises to the next post.

The posts a little further down may also interest you:

iPython, R, Rapid Miner

Wednesday, 12 February 2014

First acquaintance with Pandas: importing Yahoo and World Bank series
