No, this post is not an addition to the "Cookbook" in the pandas documentation. It collects links to the 9 parts of the github/jvns/pandas-cookbook tutorial, for viewing in nbviewer. On my first read I picked up plenty of tricks from it. At first I wanted to go through all of it and add my own comments, but I quickly realized that I would then never finish this post. So here you will find only the brief original descriptions; my own remarks will, I hope, appear in the comments to this post.
How to get a direct link to a file on GitHub
For example, if you click the bikes.csv link in the data folder, the file's page opens. Find the "Raw" button on that page and click it: the CSV file opens in a new browser window at its direct URL (direct_url):
html_page = "https://github.com/jvns/pandas-cookbook/blob/v0.1/data/bikes.csv"
direct_url = "https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/bikes.csv"
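The "Raw" button simply maps a github.com page URL to a raw.githubusercontent.com URL. A minimal sketch of that mapping, assuming the usual github.com/<owner>/<repo>/blob/<ref>/<path> layout (`blob_to_raw` is a hypothetical helper, not part of pandas or any GitHub tool):

```python
from urllib.parse import urlparse


def blob_to_raw(url: str) -> str:
    """Turn a github.com 'blob' page URL into a raw.githubusercontent.com direct link.

    Hypothetical helper: assumes the layout github.com/<owner>/<repo>/blob/<ref>/<path>
    and a ref (branch or tag) that contains no slashes.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _blob, ref, *path = parts
    # The raw host drops the 'blob' segment and keeps owner/repo/ref/path
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])
```

Note that it keeps the ref from the page URL (v0.1 for html_page), whereas direct_url above points at master. The result can be handed straight to pandas, e.g. `pd.read_csv(blob_to_raw(html_page), sep=';', encoding='latin1')`; the `sep` and `encoding` values are an assumption about bikes.csv, so check them against Chapter 1.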
Alternatively, you can simply copy all the folders to your local computer:
git clone https://github.com/jvns/pandas-cookbook
Pandas cookbook
pandas is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly.
The goal of this cookbook is to give you some concrete examples for getting started with pandas. The docs are really comprehensive. However, I've often had people tell me that they have some trouble getting started, so these are examples with real-world data, and all the bugs and weirdness that that entails.
I'm working with 3 datasets right now
- 311 calls in New York
- How many people were on Montréal's bike paths in 2012
- Montreal's weather for 2012, hourly
It comes with batteries (data) included, so you can try out all the examples right away.
Table of Contents
- A quick tour of the IPython Notebook
  Shows off IPython's awesome tab completion and magic functions.
- Chapter 1: Reading from a CSV
  Reading your data into pandas is pretty much the easiest thing. Even when the encoding is wrong!
- Chapter 2: Selecting data & finding the most common complaint type
  It's not totally obvious how to select data from a pandas dataframe. Here I explain the basics (how to take slices and get columns).
- Chapter 3: Which borough has the most noise complaints? (or, more selecting data)
  Here we get into serious slicing and dicing and learn how to filter dataframes in complicated ways, really fast.
- Chapter 4: Find out on which weekday people bike the most with groupby and aggregate
  The groupby/aggregate is seriously my favorite thing about pandas and I use it all the time. You should probably read this.
- Chapter 5: Combining dataframes and scraping Canadian weather data
  Here you get to find out if it's cold in Montreal in the winter (spoiler: yes). Web scraping with pandas is fun!
- Chapter 6: String operations! Which month was the snowiest?
  Strings with pandas are great. It has all these vectorized string operations and they're the best. We will turn a bunch of strings containing "Snow" into vectors of numbers in a trice.
- Chapter 7: Cleaning up messy data
  Cleaning up messy data is never a joy, but with pandas it's easier <3
- Chapter 8: Parsing Unix timestamps
  This is basically a quick trick that took me 2 days to figure out.
- Chapter 9: ???
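The chapter summaries above name several core pandas techniques. The sketch below runs a few of them on tiny made-up data; the column names, values and variable names are illustrative assumptions, not the cookbook's real datasets or code. It shows column selection and row slicing (Chapter 2), boolean filtering (Chapter 3), groupby/aggregate by weekday (Chapter 4), vectorized string operations (Chapters 6 and 7) and parsing Unix timestamps (Chapter 8).

```python
import pandas as pd

# Hypothetical mini-dataset, loosely shaped like the 311 complaints data (not the real file)
complaints = pd.DataFrame({
    "Complaint Type": [
        "Noise - Street/Sidewalk", "HEATING",
        "Noise - Commercial", "Noise - Street/Sidewalk",
    ],
    "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "BROOKLYN"],
})

# Chapter 2: a column is a Series; slicing the DataFrame itself selects rows
types = complaints["Complaint Type"]
first_two_rows = complaints[:2]
most_common = types.value_counts().index[0]  # value_counts sorts by count, descending

# Chapter 3: filter rows with a boolean mask, then count matches per borough
is_street_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
street_noise_by_borough = complaints.loc[is_street_noise, "Borough"].value_counts()

# Chapter 4: groupby + aggregate on made-up daily rider counts
bikes = pd.DataFrame(
    {"riders": [10, 20, 30, 40, 50, 60, 70, 80]},
    index=pd.date_range("2012-01-02", periods=8, freq="D"),  # 2012-01-02 was a Monday
)
bikes["weekday"] = bikes.index.weekday  # Monday=0 ... Sunday=6
riders_by_weekday = bikes.groupby("weekday")["riders"].aggregate("sum")

# Chapter 6: vectorized string ops turn text descriptions into numbers
weather = pd.Series(["Snow", "Clear", "Snow Showers", "Mostly Cloudy"])
is_snowing = weather.str.contains("Snow")
snowy_fraction = is_snowing.astype(float).mean()

# Chapter 7 (in spirit): normalize messy ZIP+4 codes to their first 5 characters
zips = pd.Series(["10001", "11201-4321", "10013"]).str.slice(0, 5)

# Chapter 8: Unix timestamps are seconds since 1970-01-01 UTC
parsed = pd.to_datetime(pd.Series([0, 1356998400]), unit="s")

print(most_common)
print(street_noise_by_borough)
print(riders_by_weekday)
print(snowy_fraction, list(zips))
print(parsed)
```

The two Mondays in the sample (10 and 80 riders) end up in the same group, which is the whole point of grouping by weekday rather than by date.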
Installation
You'll need an up-to-date version of IPython Notebook (>= 1.0) and pandas (>= 0.12) for this to work properly.
You can get these using pip:
pip install ipython pandas numpy
Alternatively, I use and recommend Anaconda, which will give you everything you need. It's free and open source.
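To confirm that the installed pandas meets the minimum stated above (0.12), a quick check such as the following can help. It is only a sketch: `major_minor` and `MIN_PANDAS` are hypothetical names, and the helper looks at just the first two numeric components of the version string.

```python
import re

import pandas as pd

MIN_PANDAS = (0, 12)  # minimum pandas version stated in the README


def major_minor(version: str) -> tuple:
    """Return (major, minor) as ints from a version string like '2.2.3' or '0.12.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


installed = major_minor(pd.__version__)
if installed >= MIN_PANDAS:
    print(f"pandas {pd.__version__} is new enough")
else:
    print(f"pandas {pd.__version__} is older than 0.12; please upgrade")
```

Comparing integer tuples rather than raw strings avoids the trap where the string "0.9" sorts after "0.12".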
Once you have pandas and IPython, you can get going!
git clone https://github.com/jvns/pandas-cookbook.git
cd pandas-cookbook/cookbook
ipython notebook --pylab inline
A tab should open up in your browser at http://localhost:8888
Happy pandas!
Send me email!
Here's how this works: This is a prototype, and I haven't decided if it would be useful to continue with it yet. If you find it useful, send me email! If there's something you'd like to see, send me email!
TODO
- Joining dataframes
- Using stack/unstack
- ???