Thursday, February 12, 2015

Reading 5 excellent articles about Python's subprocess and watching three videos

I had a routine job to do: converting PDF files and then parsing the resulting text lines into tables. I did not want to litter the disk with temporary files at every conversion step, so I first watched the videos and then, very reluctantly, read the hard-to-follow documentation after all... After that, though, I read everything else with pleasure.

Below are just rough notes made while studying subprocess. It is better to use the page search than to scan for what you need.

In [1]:
import os   
In [12]:
p =os.system(r'dir "C:\Program Files\Xpdf\bin64"')
In [13]:
p
Out[13]:
0
In [8]:
os.listdir(r'C:\Program Files\Xpdf\bin64')
Out[8]:
['AEBru_2014_4.pdf',
 'demo1.pdf',
 'pdfdetach.exe',
 'pdffonts.exe',
 'pdfimages.exe',
 'pdfinfo.exe',
 'pdftohtml.exe',
 'pdftopng.exe',
 'pdftoppm.exe',
 'pdftops.exe',
 'pdftotext.exe',
 'xpdfrc',
 'xpdfrc.txt']
In [10]:
d=!dir "C:\Program Files\Xpdf\bin64"
In [11]:
d
Out[11]:
[' \x92\xae\xac \xa2 \xe3\xe1\xe2\xe0\xae\xa9\xe1\xe2\xa2\xa5 C \xad\xa5 \xa8\xac\xa5\xa5\xe2 \xac\xa5\xe2\xaa\xa8.',
 ' \x91\xa5\xe0\xa8\xa9\xad\xeb\xa9 \xad\xae\xac\xa5\xe0 \xe2\xae\xac\xa0: 6017-2A0B',
 '',
 ' \x91\xae\xa4\xa5\xe0\xa6\xa8\xac\xae\xa5 \xaf\xa0\xaf\xaa\xa8 C:\\Program Files\\Xpdf\\bin64',
 '',
 '04.02.2015  20:57    <DIR>          .',
 '04.02.2015  20:57    <DIR>          ..',
 '04.02.2015  20:52           464\xff648 AEBru_2014_4.pdf',
 '02.02.2015  21:23            98\xff291 demo1.pdf',
 '03.02.2015  17:31         1\xff086\xff464 pdfdetach.exe',
 '03.02.2015  17:31         1\xff103\xff360 pdffonts.exe',
 '03.02.2015  17:31         1\xff111\xff552 pdfimages.exe',
 '03.02.2015  17:31         1\xff101\xff824 pdfinfo.exe',
 '03.02.2015  17:31         2\xff450\xff432 pdftohtml.exe',
 '03.02.2015  17:31         2\xff293\xff760 pdftopng.exe',
 '03.02.2015  17:31         2\xff110\xff464 pdftoppm.exe',
 '03.02.2015  17:31         2\xff248\xff704 pdftops.exe',
 '03.02.2015  17:31         1\xff179\xff648 pdftotext.exe',
 '03.02.2015  20:04             3\xff587 xpdfrc',
 '03.02.2015  20:22             3\xff566 xpdfrc.txt',
 '              13 \xe4\xa0\xa9\xab\xae\xa2     15\xff256\xff300 \xa1\xa0\xa9\xe2',
 '               2 \xaf\xa0\xaf\xae\xaa  387\xff062\xff153\xff216 \xa1\xa0\xa9\xe2 \xe1\xa2\xae\xa1\xae\xa4\xad\xae']
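The escaped bytes in the listing above look like text in the console's OEM code page rather than random garbage. A minimal sketch, assuming the console was using the Russian OEM code page cp866 (an assumption, not something the notebook states), decodes the first captured line back into readable text:

```python
# Bytes copied from the first line of the captured `dir` output above.
raw = (b' \x92\xae\xac \xa2 \xe3\xe1\xe2\xe0\xae\xa9\xe1\xe2\xa2\xa5 C '
       b'\xad\xa5 \xa8\xac\xa5\xa5\xe2 \xac\xa5\xe2\xaa\xa8.')

# Assumption: the bytes are encoded in cp866 (Russian OEM code page).
text = raw.decode('cp866')
print(text)

# In the same code page, the \xff between digit groups in the file sizes
# is a non-breaking space used as a thousands separator.
size_line = b'04.02.2015  20:52           464\xff648 AEBru_2014_4.pdf'
print(size_line.decode('cp866'))
```

Decoding in Python leaves the console settings untouched, which may be preferable to switching the active code page.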
In [1]:
!chcp 65001
Active code page: 65001

In [2]:
import subprocess
import re
In [13]:
import shlex
In []:
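import shlex
import subprocess

# Python 2 notebook: raw_input() returns the typed line as a string.
# Its argument is only the prompt text, not the command to run.
varCommand = raw_input("Command to run (e.g. ping localhost): ")

myProcess = subprocess.Popen(
    shlex.split(varCommand),  # split the line into program + arguments
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

# communicate() waits for the process and returns (stdout, stderr).
out, error = myProcess.communicate()

print(out)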

On Windows with shell=True, the COMSPEC environment variable specifies the default shell. The only time you need to specify shell=True on Windows is when the command you wish to execute is built into the shell (e.g. dir or copy). You do not need shell=True to run a batch file or console-based executable.
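As a small illustration of the last sentence, the sketch below launches a console executable directly from an argument list, without `shell=True`. The running Python interpreter (`sys.executable`) stands in for a tool such as pdftotext.exe; that substitution is only an assumption made so the example runs anywhere.

```python
import subprocess
import sys

# sys.executable is an illustrative stand-in for any console program.
args = [sys.executable, "-c", "print('Hello World!')"]

# No shell=True: the executable is started directly from the list,
# so no shell quoting rules are involved.
output = subprocess.check_output(args)
print(output)
```

Shell built-ins such as `dir` or `echo` have no executable of their own, which is why the cells below need `shell=True`.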

In [32]:
subprocess.call(["dir", r"C:\Program Files\Xpdf\bin64"], shell=True)
Out[32]:
0
In [33]:
subprocess.check_output(["dir", r"C:\Program Files\Xpdf\bin64"], shell=True)
Out[33]:
' Volume in drive C has no label.\r\n Volume Serial Number is 6017-2A0B\r\n\r\n Directory of C:\\Program Files\\Xpdf\\bin64\r\n\r\n04.02.2015  20:57    <DIR>          .\r\n04.02.2015  20:57    <DIR>          ..\r\n04.02.2015  20:52           464\xc2\xa0648 AEBru_2014_4.pdf\r\n02.02.2015  21:23            98\xc2\xa0291 demo1.pdf\r\n03.02.2015  17:31         1\xc2\xa0086\xc2\xa0464 pdfdetach.exe\r\n03.02.2015  17:31         1\xc2\xa0103\xc2\xa0360 pdffonts.exe\r\n03.02.2015  17:31         1\xc2\xa0111\xc2\xa0552 pdfimages.exe\r\n03.02.2015  17:31         1\xc2\xa0101\xc2\xa0824 pdfinfo.exe\r\n03.02.2015  17:31         2\xc2\xa0450\xc2\xa0432 pdftohtml.exe\r\n03.02.2015  17:31         2\xc2\xa0293\xc2\xa0760 pdftopng.exe\r\n03.02.2015  17:31         2\xc2\xa0110\xc2\xa0464 pdftoppm.exe\r\n03.02.2015  17:31         2\xc2\xa0248\xc2\xa0704 pdftops.exe\r\n03.02.2015  17:31         1\xc2\xa0179\xc2\xa0648 pdftotext.exe\r\n03.02.2015  20:04             3\xc2\xa0587 xpdfrc\r\n03.02.2015  20:22             3\xc2\xa0566 xpdfrc.txt\r\n              13 File(s)     15\xc2\xa0256\xc2\xa0300 bytes\r\n               2 Dir(s)  386\xc2\xa0693\xc2\xa0574\xc2\xa0656 bytes free\r\n'
In [31]:
subprocess.check_output(["dir", "/W"], shell=True)
Out[31]:
' Volume in drive E is SL-63-X86_6\r\n Volume Serial Number is 2E3A-7167\r\n\r\n Directory of E:\\w8\\IPython Notebooks\\2015_2\r\n\r\n[.]                     [..]                    [.ipynb_checkpoints]\r\n7_kali_nodejs.ipynb     7_upt-get.ipynb         8_Scripted.ipynb\r\n8_cloud9.ipynb          6_forex.ipynb           6_aeb.ipynb\r\n4_comodo.ipynb          [pandas]                2_split.ipynb\r\n[old]                   8_Scripted.html         8_cloud9.html\r\n7_kali_nodejs.html      10_xpath.ipynb          7_upt-get.html\r\n11_carmailPrice.ipynb   10_xpath.html           11_carmailPrice.html\r\n11_osdir.ipynb          Untitled0.ipynb         \r\n              18 File(s)      5\xc2\xa0652\xc2\xa0117 bytes\r\n               5 Dir(s)     177\xc2\xa0782\xc2\xa0784 bytes free\r\n'
In [23]:
subprocess.check_output(["echo", "Hello World!"], shell=True)
Out[23]:
'"Hello World!"\r\n'
In [38]:
subprocess.check_output('echo "Hello World!"', shell=True)
Out[38]:
'"Hello World!"\r\n'
In [39]:
subprocess.check_output('echo Hello World!', shell=True)
Out[39]:
'Hello World!\r\n'
In [41]:
subprocess.check_output('dir /W ', shell=True)
Out[41]:
' Volume in drive E is SL-63-X86_6\r\n Volume Serial Number is 2E3A-7167\r\n\r\n Directory of E:\\w8\\IPython Notebooks\\2015_2\r\n\r\n[.]                     [..]                    [.ipynb_checkpoints]\r\n7_kali_nodejs.ipynb     7_upt-get.ipynb         8_Scripted.ipynb\r\n8_cloud9.ipynb          6_forex.ipynb           6_aeb.ipynb\r\n4_comodo.ipynb          [pandas]                2_split.ipynb\r\n[old]                   8_Scripted.html         8_cloud9.html\r\n7_kali_nodejs.html      10_xpath.ipynb          7_upt-get.html\r\n11_carmailPrice.ipynb   10_xpath.html           11_carmailPrice.html\r\n11_osdir.ipynb          Untitled0.ipynb         \r\n              18 File(s)      5\xc2\xa0659\xc2\xa0280 bytes\r\n               5 Dir(s)     177\xc2\xa0778\xc2\xa0688 bytes free\r\n'
In [37]:
goga= "wooow"
filename = input("What file would you like to display?\n")
What file would you like to display?
goga

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-37-50e4377e9a64> in <module>()
      1 goga= "wooow"
----> 2 filename = input("What file would you like to display?\n")

C:\Users\kiss\Anaconda\lib\site-packages\IPython\kernel\zmq\ipkernel.pyc in <lambda>(prompt)
    362         if content.get('allow_stdin', False):
    363             raw_input = lambda prompt='': self._raw_input(prompt, ident, parent)
--> 364             input = lambda prompt='': eval(raw_input(prompt))
    365         else:
    366             raw_input = input = lambda prompt='' : self._no_raw_input()

C:\Users\kiss\Anaconda\lib\site-packages\IPython\kernel\zmq\ipkernel.pyc in <module>()

NameError: name 'goga' is not defined
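The traceback shows why this fails: in this Python 2 kernel `input` is defined as `eval(raw_input(prompt))`, so the typed word `goga` is evaluated as an expression in a namespace where that name is not visible. The sketch below imitates that behaviour; `py2_style_input` is a hypothetical helper written only for illustration, not part of IPython or the standard library.

```python
def py2_style_input(typed_text, namespace):
    """Mimic Python 2 input(): evaluate the typed text as an expression."""
    return eval(typed_text, {"__builtins__": {}}, namespace)

typed = "goga"

# When the name is visible, evaluation returns the variable's value.
value = py2_style_input(typed, {"goga": "wooow"})

# When it is not, the same text raises NameError, as in the traceback.
try:
    py2_style_input(typed, {})
    failed = False
except NameError:
    failed = True

# raw_input() in Python 2 (input() in Python 3) just returns the text.
filename = typed
print(value, failed, filename)
```

For reading a file name, `raw_input` is therefore the safer choice in Python 2.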

Note shlex.split() can be useful when determining the correct tokenization for args, especially in complex cases:

In []:
>>> import shlex, subprocess
>>> command_line = raw_input()
/bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'"
>>> args = shlex.split(command_line)
>>> print args
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args) # Success!
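The same tokenization can be reproduced as a plain script, without the interactive prompt. This is a minimal sketch that reuses the command line from the excerpt above as a string literal; nothing is executed, only split.

```python
import shlex

# The command line from the documentation excerpt, as a string literal.
command_line = ('/bin/vikings -input eggs.txt -output "spam spam.txt" '
                '-cmd "echo \'$MONEY\'"')

# shlex.split() follows POSIX shell quoting rules by default.
args = shlex.split(command_line)

for index, arg in enumerate(args):
    print(index, arg)
```

The quoted file name and the quoted `echo` command each come out as a single list item, which is the form `subprocess.Popen` expects.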
In [42]:
subprocess.STD_INPUT_HANDLE
Out[42]:
-10

On Windows, if you pass a list for args, it is converted to a string using the same rules as the MS C runtime; see the docstring of subprocess.list2cmdline for details. On Unix-like systems, by contrast, even if you pass a string, it is turned into a list of one item :).

In [43]:
help(subprocess.list2cmdline)
Help on function list2cmdline in module subprocess:

list2cmdline(seq)
    Translate a sequence of arguments into a command line
    string, using the same rules as the MS C runtime:
    
    1) Arguments are delimited by white space, which is either a
       space or a tab.
    
    2) A string surrounded by double quotation marks is
       interpreted as a single argument, regardless of white space
       contained within.  A quoted string can be embedded in an
       argument.
    
    3) A double quotation mark preceded by a backslash is
       interpreted as a literal double quotation mark.
    
    4) Backslashes are interpreted literally, unless they
       immediately precede a double quotation mark.
    
    5) If backslashes immediately precede a double quotation mark,
       every pair of backslashes is interpreted as a literal
       backslash.  If the number of backslashes is odd, the last
       backslash escapes the next double quotation mark as
       described in rule 3.
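The rules above can be seen in action by calling `subprocess.list2cmdline` directly. A minimal sketch follows; the file names are hypothetical and nothing is executed.

```python
import subprocess

# An argument containing a space is wrapped in double quotes (rule 2).
simple = subprocess.list2cmdline(["pdftotext", "my file.pdf", "out.txt"])
print(simple)

# Embedded double quotes are escaped with a backslash (rule 3).
quoted = subprocess.list2cmdline(["echo", 'say "hi"'])
print(quoted)
```

Printing the resulting string is a convenient way to check what command line Windows would receive from a given argument list.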



