Поиск по блогу

среда, 11 февраля 2015 г.

Найти все имена файлов в папке при помощи os.listdir(path)

Взять все (однотипные) файлы в папке, да и конвертировать. Как их перебирать? Можно все (вместе с содержимым файлов) загнать в словарь... А молжно проще... Здесь фрагменты кода с os.listdir(path) и итерации по файлам в папке

In []:
!C:\Program Files\Xpdf\bin64\pdftotext.exe
In []:
!cmd.exe
In []:
import os
path = r'C:\abc\def\ghi'  # remove the trailing '\'
data = {}
for dir_entry in os.listdir(path):
    dir_entry_path = os.path.join(path, dir_entry)
    if os.path.isfile(dir_entry_path):
        with open(dir_entry_path, 'r') as my_file:
            data[dir_entry] = my_file.read()
In [2]:
import os
path = r'C:\Users\kiss\Documents\Xpdf\aebru_2014_all'  # remove the trailing '\'
data = {}
for dir_entry in os.listdir(path):
    dir_entry_path = os.path.join(path, dir_entry)
    if os.path.isfile(dir_entry_path):
        with open(dir_entry_path, 'r') as my_file:
            data[dir_entry] = my_file.read()
In [7]:
os.listdir(path)
Out[7]:
['eng_car-sales-in-april-2014.pdf',
 'eng_car-sales-in-august-2014.pdf',
 'eng_car-sales-in-december-2014.pdf',
 'eng_car-sales-in-july-2014.pdf',
 'eng_car-sales-in-june-2014.pdf',
 'eng_car-sales-in-may-2014.pdf',
 'eng_car-sales-in-november-2014.pdf',
 'eng_car-sales-in-october-2014.pdf',
 'eng_car-sales-in-september-2014.pdf',
 'sales-in-december_2013_eng_final.pdf',
 'sales-in-february_2014_eng_final.pdf',
 'sales-in-january_2014_eng_final_1.pdf',
 'sales-in-march_2014_eng_final.pdf']
In [8]:
data['eng_car-sales-in-april-2014.pdf']
Out[8]:
'%PDF-1.5\n%\xb5\xb5\xb5\xb5\n1 0 obj\n<</Type/Catalog/Pages 2 0 R/Lang(ru-RU) /StructTreeRoot 47 0 R/MarkInfo<</Marked true>>>>\nendobj\n2 0 obj\n<</Type/Pages/Count 6/Kids[ 3 0 R 34 0 R 36 0 R 40 0 R 42 0 R 44 0 R] >>\nendobj\n3 0 obj\n<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 7 0 R/F3 9 0 R/F4 15 0 R/F5 17 0 R/F6 19 0 R/F7 21 0 R/F8 25 0 R/F9 27 0 R>>/XObject<</Image14 14 0 R/Image23 23 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/Annots[ 32 0 R 33 0 R] /MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>\nendobj\n4 0 obj\n<</Filter/FlateDecode/Length 6805>>\nstream\nx\x9c\xb5]ms\xdb8\x92\xfe\x9e\xaa\xfc\x07~\xd9+j7\xa2\x89W\x92S)\xd5\xca\x8e\xbd\x9b\xb9d7;\x93\xbd\xab\xdb\xcc~\x90%Y\xd6F\x96|\x92\x9cY\xdf\xaf\xbf\xee\x06H\x02$A\xd1\x8eg\xa6dQ$\xd0\x00'
In [10]:
appGet("c:\windows\system32\cmd.exe")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-df7e798e3851> in <module>()
----> 1 appGet("c:\windows\system32\cmd.exe")

NameError: name 'appGet' is not defined
In []:
 
In [1]:
!tree C:\Users\kiss\Documents\Xpdf /F
‘вагЄвга  Ї Ї®Є
‘ҐаЁ©­л© ­®¬Ґа ⮬ : 00000005 6017:2A0B
C:\USERS\KISS\DOCUMENTS\XPDF
і   AEB_2014_4.txt
і   AEB_2014_4_tab3.csv
і   text1.txt
і   
ГДДДAEBru_2014_4
і       AEBru_2014_4_01.pdf
і       AEBru_2014_4_02.pdf
і       AEBru_2014_4_03.pdf
і       AEBru_2014_4_04.pdf
і       AEBru_2014_4_05.pdf
і       doc_data.txt
і       
ГДДДaebru_2014_all
і       eng_car-sales-in-april-2014.pdf
і       eng_car-sales-in-august-2014.pdf
і       eng_car-sales-in-december-2014.pdf
і       eng_car-sales-in-july-2014.pdf
і       eng_car-sales-in-june-2014.pdf
і       eng_car-sales-in-may-2014.pdf
і       eng_car-sales-in-november-2014.pdf
і       eng_car-sales-in-october-2014.pdf
і       eng_car-sales-in-september-2014.pdf
і       sales-in-december_2013_eng_final.pdf
і       sales-in-february_2014_eng_final.pdf
і       sales-in-january_2014_eng_final_1.pdf
і       sales-in-march_2014_eng_final.pdf
і       
ГДДДAEB_2014_4
і       index.html
і       page1.html
і       page1.png
і       page2.html
і       page2.png
і       page3.html
і       page3.png
і       page4.html
і       page4.png
і       page5.html
і       page5.png
і       
ГДДДaeb_papers
і       aeb-bq_04_2014_12.12.pdf
і       aeb-rem_4_2014_web (1).pdf
і       aeb-rem_4_2014_web (2).pdf
і       aeb-rem_4_2014_web.pdf
і       aeb_pp_lr.pdf
і       Newsletter_February_2015.pdf
і       
АДДДtext1.html
        index.html
        page1.html
        page1.png
        page2.html
        page2.png
        page3.html
        page3.png
        

To iterate through all the files within the specified directory (folder), with ability to use wildcards (*, ?, and [ ]-style ranges), use the following code snippet:

In []:
import os
import glob
 
path = 'sequences/'
for infile in glob.glob( os.path.join(path, '*.fasta') ):
    print "current file is: " + infile
In []:
If you do not need wildcards, then there is a simpler way to list all items in a directory:
In []:
import os
 
path = 'sequences/'
listing = os.listdir(path)
for infile in listing:
    print "current file is: " + infile

print was promoted from a statement to a function in Python 3 (use print(infile) instead of print infile).

One should use ‘os.path.join()’ part to make the script cross-platform-portable (different OS use different path separators, and hard-coding path separator would stop the script from executing under a different OS).

Python docs mention that there is also iglob(), which is an iterator and thus working on directories with way too many files it will save memory by returning only single result per iteration, and not the whole list of files – as glob() does.



Посты чуть ниже также могут вас заинтересовать

Комментариев нет:

Отправить комментарий