Копипастим в последнюю ячейку одну из трех таблиц из текстового файла, редактируем таблицу вручную. Потом загружаем содержание ячейки в DataFrame, добавляем столбец даты, убираем из столбцов символы '%" , заменяем '(' на '-'сохраняем все в 'eng_car_sales_in_feubrary_2015.csv'.
.isin .str.contains .str.replace('[().]', '')
Пытаемся упростить обработку файлов pdf AEB, но находим новые ошибки
Удалить '%' и сменить формат на числовой (Важно! ))
Sales of cars and light commercial vehicles RSS
Regular Expressions Python ... и мой перечень главных команд
Учимся фильтровать строки DataFrame (pandas) .str.contains()
import pandas as pd
import numpy as np
from cStringIO import StringIO
pd.set_option('display.width', 5000)
pd.set_option('display.max_columns', 60)
colnames_2015_t1 = ['Brandt', \
'2015m1', '2014m1', '1514m1', \
'2015y1', '2014y1', '1514y1']
colnames_2015_t2 = ['Brandt', '2015pm2', '2014pm2', '1514pm2',\
'2015m2', '2014m2', '1514m2', \
'2015py2', '2014py2', '1514py2',
'2015y2', '2014y2', '1514y2']
colnames_2015_t3 = ['n1_m3', 'Model_m3', 'Brandt_m3', \
'2014m3', '2013m3', '1413m3', \
'n1_y3', 'Model_y3', 'Brandt_y3', \
'2014y3', '2013y3', '1413y3', 'Datem']
Редактировать¶
tt = t2
cl = colnames_2015_t2
pdt2 = pd.read_csv(StringIO(tt), names=cl, skiprows=1, \
na_values=['-','n/a'], skip_blank_lines=True, \
sep=' {4,}', thousands=',')
print pdt2[:5]
print pdt2.tail(10)
А что стало со строками Мерседеса (тут сначала была лестница, потом я ее исправил и скопировал назад в конец поста)
MERCEDES-BENZ 3.6 1.9 1.7 4,627 3,999 15.7% 3.3 2.0 1.3 8,126 7,038 15.5%
MERCEDES-BENZ cars 3.2 1.7 1.5 4,151 3,520 17.9% 3.0 1.7 1.3 7,326 6,230 17.6%
MERCEDES-BENZ vans 0.3 0.2 0.1 444 459 -3.3% 0.3 0.2 0.1 749 771 -2.9%
MERCEDES-BENZ smart 0.0 0.0 0.0 32 20 60.0% 0.0 0.0 0.0 51 37 37.8%
Сначала вспомнил .isin
pdt2[pdt2['Brandt'].isin(['MERCEDES-BENZ','cars','vans','smart'])]
А потом проверил .str.contains(r'MERCEDES')
pdt2[pdt1['Brandt'].str.contains(r'MERCEDES')]
Добавим и почистим столбцы¶
pdt2['Datem'] = '28-2-2015'
pdt2.info(verbose=True, buf=None, max_cols=None, memory_usage=True, null_counts=True)
pdt2['1514y2'] = pdt2['1514y2'].str.replace('%','').astype(float)
pdt2['1514m2'] = pdt2['1514m2'].str.replace('%','').astype(float)
pdt2['1514pm2'] = pdt2['1514pm2'].str.replace('(', '-')
pdt2['1514pm2'] = pdt2['1514pm2'].str.replace(')', '').astype(float)
pdt2['1514py2'] = pdt2['1514py2'].str.replace('(', '-')
pdt2['1514py2'] = pdt2['1514py2'].str.replace(')', '').astype(float)
########### Тест
pdt2['Brandt'].str.replace('*','').astype(object)
pdt2['Brandt'].tail()
- LCV sales are included into total brand’s sales if exist in the product line of the brand (marked with*); reported separately for some brands. LCV<3,5t (with several exceptions reaching highest range of 6t). Some updates concerning LCV data may occur.
pdt2[:5], pdt2.tail(10)
pdt2.info(verbose=True, buf=None, max_cols=None, memory_usage=True, null_counts=True)
Редактировать файл и путь для записи¶
ttname = 'eng_car_sales_in_feubrary_2015.csv'
ttfolder = '/media/MYLINUXLIVE/Documents/Xpdf/aerbu_2015_all_csv/2/'
# Проверить имена столбцов
colnames = list(pdt1.columns.values)
colnames
Сохраним файл¶
pdt2.to_csv(ttfolder + ttname,
sep=';',
#float_format='%.8f',
columns=colnames,
index=False)
Приложение (добавить номер, например t3)¶
t2 =""""
AVTOVAZ-RENAULT-NISSAN 35.5 31.8 3.7 45,509 65,608 -30.6% 34.7 31.6 3.1 84,534 113,671 -25.6%
AVTOVAZ 18.4 15.0 3.4 23,639 30,896 -23.5% 16.9 15.2 1.7 41,131 54,543 -24.6%
RENAULT 7.1 8.1 (1.0) 9,100 16,721 -45.6% 7.3 8.2 (0.9) 17,909 29,615 -39.5%
NISSAN 7.4 8.3 (0.9) 9,447 17,158 -44.9% 7.6 7.9 (0.3) 18,548 28,209 -34.2%
INFINITI 0.3 0.4 (0.1) 419 833 -49.7% 0.5 0.4 0.1 1,109 1,304 -15.0%
Datsun 2.3 - - 2,904 - - 2.4 - - 5,837 - -
VW Group 11.1 10.6 0.5 14,188 21,849 -35.1% 11.3 10.5 0.8 27,527 37,836 -27.2%
VOLKSWAGEN cars 4.5 5.5 (1.0) 5,804 11,390 -49.0% 4.9 5.6 (0.7) 11,969 19,989 -40.1%
VOLKSWAGEN vans 0.5 0.5 0.0 672 1,058 -36.5% 0.5 0.6 (0.1) 1,280 2,007 -36.2%
AUDI 1.6 1.4 0.2 2,092 2,905 -28.0% 1.4 1.3 0.1 3,502 4,528 -22.7%
SEAT 0.0 0.1 (0.1) 2 112 -98.2% 0.0 0.1 (0.1) 33 184 -82.1%
ŠKODA 4.4 3.1 1.3 5,618 6,384 -12.0% 4.4 3.1 1.3 10,743 11,128 -3.5%
HYUNDAI 10.3 6.7 3.6 13,233 13,901 -4.8% 10.6 6.9 3.7 25,940 24,945 4.0%
KIA 9.8 6.4 3.4 12,563 13,303 -5.6% 9.8 6.8 3.0 23,909 24,502 -2.4%
TOYOTA Group 6.2 6.1 0.1 7,934 12,606 -37.1% 6.5 6.0 0.5 15,773 21,632 -27.1%
TOYOTA 5.6 5.8 (0.2) 7,143 11,892 -39.9% 5.9 5.6 0.3 14,292 20,211 -29.3%
LEXUS 0.6 0.3 0.3 791 714 10.8% 0.6 0.4 0.2 1,481 1,421 4.2%
MERCEDES-BENZ 3.6 1.9 1.7 4,627 3,999 15.7% 3.3 2.0 1.3 8,126 7,038 15.5%
MERCEDES-BENZ cars 3.2 1.7 1.5 4,151 3,520 17.9% 3.0 1.7 1.3 7,326 6,230 17.6%
MERCEDES-BENZ vans 0.3 0.2 0.1 444 459 -3.3% 0.3 0.2 0.1 749 771 -2.9%
MERCEDES-BENZ smart 0.0 0.0 0.0 32 20 60.0% 0.0 0.0 0.0 51 37 37.8%
GM Group 3.3 9.3 (6.0) 4,201 19,155 -78.1% 3.4 9.2 (5.8) 8,362 32,936 -74.6%
OPEL 0.7 3.2 (2.5) 912 6,610 -86.2% 0.8 3.0 (2.2) 1,985 10,891 -81.8%
CHEVROLET 2.5 6.0 (3.5) 3,255 12,448 -73.9% 2.6 6.1 (3.5) 6,305 21,868 -71.2%
CADILLAC 0.0 0.0 0.0 34 97 -64.9% 0.0 0.0 0.0 72 177 -59.3%
BMW Group 2.9 1.7 1.2 3,666 3,504 4.6% 2.6 1.7 0.9 6,302 6,126 2.9%
BMW 2.8 1.6 1.2 3,550 3,356 5.8% 2.5 1.6 0.9 6,095 5,896 3.4%
MINI 0.1 0.1 0.0 116 148 -21.6% 0.1 0.1 0.0 207 230 -10.0%
MITSUBISHI 2.6 3.3 (0.7) 3,323 6,817 -51.3% 2.7 3.3 (0.6) 6,543 11,824 -44.7%
GAZ LCV 2.3 2.2 0.1 2,970 4,527 -34.4% 2.0 2.2 (0.2) 4,798 7,787 -38.4%
MAZDA 1.9 1.9 0.0 2,381 3,899 -38.9% 2.0 1.9 0.1 4,765 6,747 -29.4%
UAZ 1.8 1.4 0.4 2,260 2,903 -22.1% 1.9 1.4 0.5 4,513 5,104 -11.6%
DAEWOO 1.3 2.2 (0.9) 1,639 4,536 -63.9% 1.5 2.3 (0.8) 3,654 8,254 -55.7%
FORD 1.1 3.1 (2.0) 1,399 6,306 -77.8% 1.3 2.9 (1.6) 3,212 10,556 -69.6%
JAGUAR LAND ROVER 1.0 1.0 0.0 1,258 2,024 -37.8% 0.9 0.8 0.1 2,092 3,040 -31.2%
LAND ROVER 0.9 0.9 0.0 1,183 1,867 -36.6% 0.8 0.8 0.0 1,958 2,789 -29.8%
JAGUAR 0.1 0.1 0.0 75 157 -52.2% 0.1 0.1 0.0 134 251 -46.6%
SUBARU 0.7 0.7 0.0 836 1,374 -39.2% 0.7 0.6 0.1 1,752 2,312 -24.2%
PSA PEUGEOT 0.6 2.2 (1.6) 757 4,546 -83.3% 0.7 2.3 (1.6) 1,655 8,109 -79.6%
CITROЁN
PEUGEOT 0.3 1.1 (0.8) 377 2,354 -84.0% 0.3 1.2 (0.9) 795 4,197 -81.1%
CITROЁN 0.3 1.1 (0.8) 380 2,192 -82.7% 0.4 1.1 (0.7) 860 3,912 -78.0%
VOLVO 0.6 0.6 0.0 757 1,143 -33.8% 0.4 0.4 0.0 1,027 1,545 -33.5%
LIFAN 0.6 0.7 (0.1) 721 1,548 -53.4% 0.6 0.7 (0.1) 1,484 2,666 -44.3%
GEELY 0.5 0.8 (0.3) 681 1,687 -59.6% 0.5 0.8 (0.3) 1,131 2,845 -60.2%
CHRYSLER Group 0.5 0.6 (0.1) 589 1,208 -51.2% 0.4 0.6 (0.2) 1,097 2,044 -46.3%
FIAT 0.3 0.3 0.0 345 581 -40.6% 0.3 0.3 0.0 641 988 -35.1%
CHRYSLER 0.0 0.0 0.0 8 14 -42.9% 0.0 0.0 0.0 12 21 -42.9%
JEEP 0.2 0.3 (0.1) 235 609 -61.4% 0.2 0.3 (0.1) 440 1,029 -57.2%
DODGE - 0.0 - - 4 - - 0.0 - - 6 -
ALFA ROMEO 0.0 - - 1 - - 0.0 - - 4 - -
SSANGYONG 0.4 1.0 (0.6) 561 2,007 -72.0% 0.5 0.9 (0.4) 1,294 3,337 -61.2%
SUZUKI 0.3 0.8 (0.5) 416 1,707 -75.6% 0.4 0.9 (0.5) 871 3,189 -72.7%
CHERY 0.3 0.8 (0.5) 402 1,709 -76.5% 0.3 0.8 (0.5) 770 2,846 -72.9%
GREAT WALL 0.3 0.7 (0.4) 402 1,402 -71.3% 0.3 0.6 (0.3) 807 2,334 -65.4%
PORSCHE 0.3 0.1 0.2 381 174 119.0% 0.2 0.1 0.1 575 300 91.7%
HONDA Group 0.3 1.2 (0.9) 341 2,484 -86.3% 0.3 1.2 (0.9) 798 4,413 -81.9%
HONDA 0.2 1.2 (1.0) 252 2,484 -89.9% 0.3 1.2 (0.9) 636 4,413 -85.6%
ACURA 0.1 - - 89 - - 0.1 - - 162 - -
BRILLIANCE 0.1 - - 144 - - 0.1 - - 241 - -
ISUZU 0.1 0.0 0.1 74 5 1380.0% 0.0 0.0 0.0 103 33 212.1%
JAC 0.0 - - 38 - - 0.0 - - 53 - -
BAW 0.0 0.1 (0.1) 24 118 -79.7% 0.0 0.1 (0.1) 53 231 -77.1%
HAIMA 0.0 0.0 0.0 20 21 -4.8% 0.0 0.0 0.0 61 41 48.8%
FOTON 0.0 0.0 0.0 3 3 0.0% 0.0 0.0 0.0 4 5 -20.0%
CHANGAN - - - - 50 - - - - - 50 -
LUXGEN - 0.0 - - 15 - - - - - 27 -
TAGAZ - 0.0 - - 22 - - 0.0 - - 28 -
FAW - 0.1 - - 234 - - 0.1 - - 536 -
ZAZ - 0.1 - - 113 - - 0.1 - - 214 -
IZH - 0.0 - - 5 - - 0.0 - - 15 -
BOGDAN - 0.0 - - 14 - - 0.0 - - 70 -
"""
В str.replace можно использовать регулярные выражения. Ниже пример со множеством символов¶
sss = pdt2['1514pm2']
sss.str.replace('[().]', '')
sss
sss.str.replace('(', '-')
Посты чуть ниже также могут вас заинтересовать
Комментариев нет:
Отправить комментарий