Поиск по блогу

пятница, 10 апреля 2015 г.

Подгоняем таблицу_1 за январь 2014 к формату шаблона и сохраняем с новым порядком столбцов

В процессе работы выяснилось, что шаблон слишком громоздкий. Здесь я попробовал не копировать предыдущий вариант шаблона, а копипастить отдельный ячейки. Здесь при считывании файла пришлось пропускать первую строку, а при записи менять порядок столбцов. Кроме того, здесь столбцы "нарастающим итогом с начала года" изначально отсутствовали, при вставке нулей, три столбца '2014y', '2013y', '1413y' записались в формате Int, посмотрим в следующих постах, как они "concatent" c float

В процессе копипаста я ошибся и указал путь к файлу в папке 1feb, а надо было 1aug2... Не смог сразу сообразить, почему из файла читается только одна строчка... пока не распечатал файл. Процедура по-прежнему требует слишком большого внимания...

In [2]:
import os
import pandas as pd
In [3]:
# Path to dir with files Must end with traling slash '/'
DIRPATH = '/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv/feb1/'

# Insert my columns names in Dataframe objects
COLNAMES =['Brandt', '2014y', '2013y', '1413y', '2014m', '2013m', '1413m', 'Datem']
COLNAMESjan =['Brandt', '2014m', '2013m', '1413m', 'Datem']

# Filter only filenames with
#filterfilenames = 'eng_car-sales-in'
# При считывании каждого файла во фрейм нужно будет добавлять столбец дат, 
# для этого мы используем список и словарь:
months = ['january', 'february', 'march', 'april', 'may', 'june',
        'july', 'august','september', 'october','november', 'december']
    
dict = {'january':'31-1', 'february':'28-2', 'march':'31-1',
        'april':'30-4', 'may':'31-5', 'june':'30-6',
        'july':'31-7', 'august':'31-8','september':'30-9', 
        'october':'31-10','november':'30-11', 'december':'31-12'}

newpath = '/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv_new/'
In [ ]:
# %load '/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv/1aug2/sales-in-january_2014_eng_final_1.csv'
"Association of European Businesses","Phone.:","+7 (495) 234 27 64","E-mail: info@aebrus.ru",
"Lada","23647","30037","-21%",
"Renault*","12894","14741","-13%",
"KIA","11199","11059","1%",
"Nissan*","11051","8925","24%",
"Hyundai*","11044","11306","-2%",
"Chevrolet","9420","9542","-1%",
"VW","8599","9968","-14%",
"Toyota*","8319","6860","21%",
"Mitsubishi","5007","4020","25%",
"Škoda","4744","4623","3%",
"Opel","4281","5211","-18%",
"Ford*","4250","6364","-33%",
"Daewoo","3718","5693","-35%",
"GAZ LCV*","3260","4108","-21%",
"Mazda","2848","2071","38%",
"Mercedes-Benz","2710","1902","42%",
"BMW","2540","2171","17%",
"UAZ*","2201","2906","-24%",
"Honda","1929","1209","60%",
"Peugeot*","1843","2338","-21%",
"Citroën*","1720","1581","9%",
"Audi","1623","1725","-6%",
"Suzuki","1482","1498","-1%",
"SsangYong","1330","1853","-28%",
"Geely","1158","817","42%",
"Chery","1137","1031","10%",
"Lifan","1118","843","33%",
"VW NFZ*","949","700","36%",
"Subaru","938","864","9%",
"Great Wall","932","1280","-27%",
"Land Rover","922","815","13%",
"Lexus","707","778","-9%",
"Infiniti","471","625","-25%",
"Jeep","420","146","188%",
"FIAT*","407","291","40%",
"Volvo","402","337","19%",
"FAW*","302","68","344%",
"Porsche","126","176","-28%",
"BAW*","113","128","-12%",
"ZAZ","101","203","-50%",
"Jaguar","94","70","34%",
"MINI","82","140","-41%",
"Cadillac","80","88","-9%",
"SEAT","72","201","-64%",
"Bogdan","56","261","-79%",
"Isuzu*","28","4","600%",
"Haima","20","6","233%",
"smart","17","3","467%",
"Luxgen","12","-","-",
"Izh","10","89","-89%",
"Chrysler","7","12","-42%",
"TagAZ*","6","33","-82%",
"Foton*","2","4","-50%",
"Dodge","2","9","-78%",
"Vortex","0","81","-",
"BYD","0","69","-",
In [ ]:
####2.1 Подготовим информационный файл (вот в такой вид), но сохраним его в последнюю очередь. 
In [41]:
%%file /media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv_new/2014_1_info.txt
resources folder: /media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv
filters:   if csvfile.startswith filterfilenames = 'eng_car-sales-in'
concatent sourcefiles: 2
'/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv/1aug2/sales-in-january_2014_eng_final_1.csv',

headers:   COLNAMES =['Brandt', '2014y', '2013y', '1413y', '2014m', '2013m', '1413m', 'Datem'] 
results:   newpath = '/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv_new/2014_1.txt'    
help:      [Первый рабочий пример конкатенции таблиц с преобразованием форматов столбцов object -> float]
           http://pythonr.blogspot.ru/2015/04/object-float.html
weblog:    http://pythonr.blogspot.com/2015/04/2014.html
Writing /media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv_new/2014_1_info.txt

2.2 Проверим опции для сохранения объекта pdall в файл, но выполним код в ячейках в предпоследнюю очередь.

In [39]:
print newpath
print COLNAMES
/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv_new/
['Brandt', '2014y', '2013y', '1413y', '2014m', '2013m', '1413m', 'Datem']
In [40]:
pdall.to_csv(newpath + '2014_1.txt', 
        sep=';', 
        #float_format='%.8f',
        columns=COLNAMES,
        index=False)

Начнем парсить файл, обратите внримание на два "новых" параметра

Один файл с данными 'sales-in-january_2014_eng_final_1.csv', первую строчку убрать

In [ ]:
"Association of European Businesses","Phone.:","+7 (495) 234 27 64","E-mail: info@aebrus.ru",
"Lada","23647","30037","-21%",
"Renault*","12894","14741","-13%",
"KIA","11199","11059","1%",
"Nissan*","11051","8925","24%",
In [11]:
csvfile_path = '/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2014_all_csv/1aug2/sales-in-january_2014_eng_final_1.csv'
pdtemp = pd.read_csv(csvfile_path, error_bad_lines=False, \
                         names=COLNAMESjan, na_values=['-','n/a'], \
                         quotechar='"', sep=',', thousands=' ', \
                    skiprows=1)
pdall=pdtemp
print(pdall)    
           Brandt  2014m  2013m 1413m  Datem
0            Lada  23647  30037  -21%    NaN
1        Renault*  12894  14741  -13%    NaN
2             KIA  11199  11059    1%    NaN
3         Nissan*  11051   8925   24%    NaN
4        Hyundai*  11044  11306   -2%    NaN
5       Chevrolet   9420   9542   -1%    NaN
6              VW   8599   9968  -14%    NaN
7         Toyota*   8319   6860   21%    NaN
8      Mitsubishi   5007   4020   25%    NaN
9           Škoda   4744   4623    3%    NaN
10           Opel   4281   5211  -18%    NaN
11          Ford*   4250   6364  -33%    NaN
12         Daewoo   3718   5693  -35%    NaN
13       GAZ LCV*   3260   4108  -21%    NaN
14          Mazda   2848   2071   38%    NaN
15  Mercedes-Benz   2710   1902   42%    NaN
16            BMW   2540   2171   17%    NaN
17           UAZ*   2201   2906  -24%    NaN
18          Honda   1929   1209   60%    NaN
19       Peugeot*   1843   2338  -21%    NaN
20       Citroën*   1720   1581    9%    NaN
21           Audi   1623   1725   -6%    NaN
22         Suzuki   1482   1498   -1%    NaN
23      SsangYong   1330   1853  -28%    NaN
24          Geely   1158    817   42%    NaN
25          Chery   1137   1031   10%    NaN
26          Lifan   1118    843   33%    NaN
27        VW NFZ*    949    700   36%    NaN
28         Subaru    938    864    9%    NaN
29     Great Wall    932   1280  -27%    NaN
30     Land Rover    922    815   13%    NaN
31          Lexus    707    778   -9%    NaN
32       Infiniti    471    625  -25%    NaN
33           Jeep    420    146  188%    NaN
34          FIAT*    407    291   40%    NaN
35          Volvo    402    337   19%    NaN
36           FAW*    302     68  344%    NaN
37        Porsche    126    176  -28%    NaN
38           BAW*    113    128  -12%    NaN
39            ZAZ    101    203  -50%    NaN
40         Jaguar     94     70   34%    NaN
41           MINI     82    140  -41%    NaN
42       Cadillac     80     88   -9%    NaN
43           SEAT     72    201  -64%    NaN
44         Bogdan     56    261  -79%    NaN
45         Isuzu*     28      4  600%    NaN
46          Haima     20      6  233%    NaN
47          smart     17      3  467%    NaN
48         Luxgen     12    NaN   NaN    NaN
49            Izh     10     89  -89%    NaN
50       Chrysler      7     12  -42%    NaN
51         TagAZ*      6     33  -82%    NaN
52         Foton*      2      4  -50%    NaN
53          Dodge      2      9  -78%    NaN
54         Vortex      0     81   NaN    NaN
55            BYD      0     69   NaN    NaN

Теперь прсто присвоим значения данным в столбце дат

In [12]:
pdall.Datem = '31-1-2014'

Конвертируем столбец с процентами в число

In [36]:
pdall['1413m'] = pdall['1413m'].str.replace('%','').astype(float)

Создадим нулевые столцы, чтобы размер таблицы привести к стандартному

In [33]:
pdall['2014y'] = 0 
pdall['2013y'] = 0 
pdall['1413y'] = 0
In [37]:
pdall
Out[37]:
Brandt 2014m 2013m 1413m Datem 2014y 2013y 1413y
0 Lada 23647 30037 -21 31-1-2014 0 0 0
1 Renault* 12894 14741 -13 31-1-2014 0 0 0
2 KIA 11199 11059 1 31-1-2014 0 0 0
3 Nissan* 11051 8925 24 31-1-2014 0 0 0
4 Hyundai* 11044 11306 -2 31-1-2014 0 0 0
5 Chevrolet 9420 9542 -1 31-1-2014 0 0 0
6 VW 8599 9968 -14 31-1-2014 0 0 0
7 Toyota* 8319 6860 21 31-1-2014 0 0 0
8 Mitsubishi 5007 4020 25 31-1-2014 0 0 0
9 Škoda 4744 4623 3 31-1-2014 0 0 0
10 Opel 4281 5211 -18 31-1-2014 0 0 0
11 Ford* 4250 6364 -33 31-1-2014 0 0 0
12 Daewoo 3718 5693 -35 31-1-2014 0 0 0
13 GAZ LCV* 3260 4108 -21 31-1-2014 0 0 0
14 Mazda 2848 2071 38 31-1-2014 0 0 0
15 Mercedes-Benz 2710 1902 42 31-1-2014 0 0 0
16 BMW 2540 2171 17 31-1-2014 0 0 0
17 UAZ* 2201 2906 -24 31-1-2014 0 0 0
18 Honda 1929 1209 60 31-1-2014 0 0 0
19 Peugeot* 1843 2338 -21 31-1-2014 0 0 0
20 Citroën* 1720 1581 9 31-1-2014 0 0 0
21 Audi 1623 1725 -6 31-1-2014 0 0 0
22 Suzuki 1482 1498 -1 31-1-2014 0 0 0
23 SsangYong 1330 1853 -28 31-1-2014 0 0 0
24 Geely 1158 817 42 31-1-2014 0 0 0
25 Chery 1137 1031 10 31-1-2014 0 0 0
26 Lifan 1118 843 33 31-1-2014 0 0 0
27 VW NFZ* 949 700 36 31-1-2014 0 0 0
28 Subaru 938 864 9 31-1-2014 0 0 0
29 Great Wall 932 1280 -27 31-1-2014 0 0 0
30 Land Rover 922 815 13 31-1-2014 0 0 0
31 Lexus 707 778 -9 31-1-2014 0 0 0
32 Infiniti 471 625 -25 31-1-2014 0 0 0
33 Jeep 420 146 188 31-1-2014 0 0 0
34 FIAT* 407 291 40 31-1-2014 0 0 0
35 Volvo 402 337 19 31-1-2014 0 0 0
36 FAW* 302 68 344 31-1-2014 0 0 0
37 Porsche 126 176 -28 31-1-2014 0 0 0
38 BAW* 113 128 -12 31-1-2014 0 0 0
39 ZAZ 101 203 -50 31-1-2014 0 0 0
40 Jaguar 94 70 34 31-1-2014 0 0 0
41 MINI 82 140 -41 31-1-2014 0 0 0
42 Cadillac 80 88 -9 31-1-2014 0 0 0
43 SEAT 72 201 -64 31-1-2014 0 0 0
44 Bogdan 56 261 -79 31-1-2014 0 0 0
45 Isuzu* 28 4 600 31-1-2014 0 0 0
46 Haima 20 6 233 31-1-2014 0 0 0
47 smart 17 3 467 31-1-2014 0 0 0
48 Luxgen 12 NaN NaN 31-1-2014 0 0 0
49 Izh 10 89 -89 31-1-2014 0 0 0
50 Chrysler 7 12 -42 31-1-2014 0 0 0
51 TagAZ* 6 33 -82 31-1-2014 0 0 0
52 Foton* 2 4 -50 31-1-2014 0 0 0
53 Dodge 2 9 -78 31-1-2014 0 0 0
54 Vortex 0 81 NaN 31-1-2014 0 0 0
55 BYD 0 69 NaN 31-1-2014 0 0 0
In [38]:
pdall.info(verbose=True, buf=None, max_cols=None, memory_usage=True, null_counts=True)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 56 entries, 0 to 55
Data columns (total 8 columns):
Brandt    56 non-null object
2014m     56 non-null int64
2013m     55 non-null float64
1413m     53 non-null float64
Datem     56 non-null object
2014y     56 non-null int64
2013y     56 non-null int64
1413y     56 non-null int64
dtypes: float64(2), int64(4), object(2)
memory usage: 3.5+ KB

Запомним, что формат int в трех столбцах "необычен", но он должен "сочетаться" с float (в других файлах за другие месяцы) И вернемся наверх, чтобы сохранить файлы.



Посты чуть ниже также могут вас заинтересовать

Комментариев нет:

Отправить комментарий