Thursday, February 12, 2015

Reading 5 great articles about Python's subprocess and watching three videos

I needed to do some routine work: convert PDF files and then parse the resulting text lines into tables. I didn't want to create temporary files for every conversion stage, so first I watched a video and then, with great reluctance, worked through the rather obscure documentation... After that, though, I happily read everything else.

Below are just rough notes made while studying subprocess. It is easier to use the browser's find-in-page than to scroll through looking for what you need.

In [1]:
import os   
In [12]:
p = os.system(r'dir "C:\Program Files\Xpdf\bin64"')
In [8]:
os.listdir(r'C:\Program Files\Xpdf\bin64')
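These cells do different things: `os.system` returns only the exit status (the listing itself goes to the console), while `os.listdir` returns the file names. A minimal Python 3 sketch of the difference; the running interpreter (`sys.executable`) stands in for `dir`, an assumption made so the snippet also runs outside Windows.

```python
import os
import subprocess
import sys

# os.system() runs the command through the shell and returns only its
# exit status; whatever the command prints goes to the console.
status = os.system("echo hello")

# subprocess.check_output() returns what the command printed (as bytes).
# The current interpreter is used here as a portable stand-in for "dir".
out = subprocess.check_output([sys.executable, "-c", "print('hello')"])

# os.listdir() lists a directory without starting any external program.
names = os.listdir(os.getcwd())

print(status)       # 0 on success
print(out.strip())  # b'hello'
print(type(names))  # <class 'list'>
```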
In [10]:
d=!dir "C:\Program Files\Xpdf\bin64"
In [11]:
[' \x92\xae\xac \xa2 \xe3\xe1\xe2\xe0\xae\xa9\xe1\xe2\xa2\xa5 C \xad\xa5 \xa8\xac\xa5\xa5\xe2 \xac\xa5\xe2\xaa\xa8.',
 ' \x91\xa5\xe0\xa8\xa9\xad\xeb\xa9 \xad\xae\xac\xa5\xe0 \xe2\xae\xac\xa0: 6017-2A0B',
 ' \x91\xae\xa4\xa5\xe0\xa6\xa8\xac\xae\xa5 \xaf\xa0\xaf\xaa\xa8 C:\\Program Files\\Xpdf\\bin64',
 '04.02.2015  20:57    <DIR>          .',
 '04.02.2015  20:57    <DIR>          ..',
 '04.02.2015  20:52           464\xff648 AEBru_2014_4.pdf',
 '02.02.2015  21:23            98\xff291 demo1.pdf',
 '03.02.2015  17:31         1\xff086\xff464 pdfdetach.exe',
 '03.02.2015  17:31         1\xff103\xff360 pdffonts.exe',
 '03.02.2015  17:31         1\xff111\xff552 pdfimages.exe',
 '03.02.2015  17:31         1\xff101\xff824 pdfinfo.exe',
 '03.02.2015  17:31         2\xff450\xff432 pdftohtml.exe',
 '03.02.2015  17:31         2\xff293\xff760 pdftopng.exe',
 '03.02.2015  17:31         2\xff110\xff464 pdftoppm.exe',
 '03.02.2015  17:31         2\xff248\xff704 pdftops.exe',
 '03.02.2015  17:31         1\xff179\xff648 pdftotext.exe',
 '03.02.2015  20:04             3\xff587 xpdfrc',
 '03.02.2015  20:22             3\xff566 xpdfrc.txt',
 '              13 \xe4\xa0\xa9\xab\xae\xa2     15\xff256\xff300 \xa1\xa0\xa9\xe2',
 '               2 \xaf\xa0\xaf\xae\xaa  387\xff062\xff153\xff216 \xa1\xa0\xa9\xe2 \xe1\xa2\xae\xa1\xae\xa4\xad\xae']
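The escaped bytes above are the Russian-language `dir` output in the console's OEM code page, shown undecoded. A Python 3 sketch of decoding them, assuming the console code page is cp866 (the usual one for Russian Windows); the two byte strings are copied from the output above.

```python
import re

# Two lines copied from the undecoded dir output above (cp866 bytes).
raw_lines = [
    b" \x92\xae\xac \xa2 \xe3\xe1\xe2\xe0\xae\xa9\xe1\xe2\xa2\xa5 C \xad\xa5 \xa8\xac\xa5\xa5\xe2 \xac\xa5\xe2\xaa\xa8.",
    b"04.02.2015  20:52           464\xff648 AEBru_2014_4.pdf",
]

# Decode with the Russian OEM code page used by cmd.exe.
decoded = [line.decode("cp866") for line in raw_lines]

# In cp866, byte 0xFF is a non-breaking space (U+00A0), used as the
# thousands separator in sizes. Split only on ASCII spaces, because
# str.split() would also split on the non-breaking space.
date, time, size_text, name = re.split(r" +", decoded[1], maxsplit=3)
size = int(size_text.replace("\u00a0", ""))

print(decoded[0].strip())  # Том в устройстве C не имеет метки.
print(name, size)          # AEBru_2014_4.pdf 464648
```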
In [1]:
!chcp 65001
Active code page: 65001
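Switching the console to UTF-8 with `chcp 65001` is one option. In Python 3.6+ `check_output` can also decode the output itself through its `encoding` argument. A sketch assuming the child program writes cp866; a small Python child that emits cp866 bytes stands in for `dir`, so the example runs anywhere.

```python
import subprocess
import sys

# Stand-in for a console program that writes cp866-encoded text:
# the child writes the raw cp866 bytes of the Cyrillic word "Том".
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'\\x92\\xae\\xac\\n')"]

# encoding= makes check_output return str instead of bytes (Python 3.6+).
text = subprocess.check_output(child, encoding="cp866")

print(text.strip())  # Том
```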

In [2]:
import subprocess
import re
In [13]:
import shlex
In []:
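import subprocess

# The host to ping is read from the user (Python 2: raw_input returns a string)
varCommand = raw_input("ping ")

# Popen needs the command as its first argument: program name plus arguments
myProcess = subprocess.Popen(
    ["ping", varCommand],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

# communicate() waits for the process to finish and returns (stdout, stderr)
out, error = myProcess.communicate()

print out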
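`Popen` with `PIPE` also lets one process feed another directly, which is how the temporary files between conversion stages can be avoided. A Python 3 sketch of the pipeline pattern from the subprocess documentation; both children are stand-ins (the running interpreter) for real tools such as a PDF-to-text converter and a parser, which is an assumption.

```python
import subprocess
import sys

# Stage 1 (stand-in for a converter): prints two lines of text.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('a 1'); print('b 2')"],
    stdout=subprocess.PIPE)

# Stage 2 (stand-in for a parser): reads stage 1's output from stdin.
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE)

# Close our copy of the pipe so the producer can receive SIGPIPE
# if the consumer exits early (as the subprocess docs recommend).
producer.stdout.close()

out, _ = consumer.communicate()
producer.wait()

# Turn the text lines into table rows, with no temporary files involved.
rows = [line.split() for line in out.decode().splitlines() if line]
print(rows)  # [['A', '1'], ['B', '2']]
```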

On Windows with shell=True, the COMSPEC environment variable specifies the default shell. The only time you need to specify shell=True on Windows is when the command you wish to execute is built into the shell (e.g. dir or copy). You do not need shell=True to run a batch file or console-based executable.
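In practice this means a console executable runs fine with the default `shell=False`, while a cmd.exe built-in such as `dir` needs `shell=True` on Windows. A Python 3 sketch; the current interpreter stands in for a console executable such as `pdftotext.exe`.

```python
import subprocess
import sys

# A console executable: no shell needed, arguments passed as a list.
result = subprocess.run([sys.executable, "--version"],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
print(result.returncode)  # 0

# A shell built-in (Windows only): "dir" is not a separate program,
# so it has to go through cmd.exe (the COMSPEC shell).
if sys.platform == "win32":
    listing = subprocess.check_output("dir", shell=True)
```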

In [32]:
subprocess.call(["dir", r"C:\Program Files\Xpdf\bin64"], shell=True)
In [33]:
subprocess.check_output(["dir", r"C:\Program Files\Xpdf\bin64"], shell=True)
' Volume in drive C has no label.\r\n Volume Serial Number is 6017-2A0B\r\n\r\n Directory of C:\\Program Files\\Xpdf\\bin64\r\n\r\n04.02.2015  20:57    <DIR>          .\r\n04.02.2015  20:57    <DIR>          ..\r\n04.02.2015  20:52           464\xc2\xa0648 AEBru_2014_4.pdf\r\n02.02.2015  21:23            98\xc2\xa0291 demo1.pdf\r\n03.02.2015  17:31         1\xc2\xa0086\xc2\xa0464 pdfdetach.exe\r\n03.02.2015  17:31         1\xc2\xa0103\xc2\xa0360 pdffonts.exe\r\n03.02.2015  17:31         1\xc2\xa0111\xc2\xa0552 pdfimages.exe\r\n03.02.2015  17:31         1\xc2\xa0101\xc2\xa0824 pdfinfo.exe\r\n03.02.2015  17:31         2\xc2\xa0450\xc2\xa0432 pdftohtml.exe\r\n03.02.2015  17:31         2\xc2\xa0293\xc2\xa0760 pdftopng.exe\r\n03.02.2015  17:31         2\xc2\xa0110\xc2\xa0464 pdftoppm.exe\r\n03.02.2015  17:31         2\xc2\xa0248\xc2\xa0704 pdftops.exe\r\n03.02.2015  17:31         1\xc2\xa0179\xc2\xa0648 pdftotext.exe\r\n03.02.2015  20:04             3\xc2\xa0587 xpdfrc\r\n03.02.2015  20:22             3\xc2\xa0566 xpdfrc.txt\r\n              13 File(s)     15\xc2\xa0256\xc2\xa0300 bytes\r\n               2 Dir(s)  386\xc2\xa0693\xc2\xa0574\xc2\xa0656 bytes free\r\n'
In [31]:
subprocess.check_output(["dir", "/W"], shell=True)
' Volume in drive E is SL-63-X86_6\r\n Volume Serial Number is 2E3A-7167\r\n\r\n Directory of E:\\w8\\IPython Notebooks\\2015_2\r\n\r\n[.]                     [..]                    [.ipynb_checkpoints]\r\n7_kali_nodejs.ipynb     7_upt-get.ipynb         8_Scripted.ipynb\r\n8_cloud9.ipynb          6_forex.ipynb           6_aeb.ipynb\r\n4_comodo.ipynb          [pandas]                2_split.ipynb\r\n[old]                   8_Scripted.html         8_cloud9.html\r\n7_kali_nodejs.html      10_xpath.ipynb          7_upt-get.html\r\n11_carmailPrice.ipynb   10_xpath.html           11_carmailPrice.html\r\n11_osdir.ipynb          Untitled0.ipynb         \r\n              18 File(s)      5\xc2\xa0652\xc2\xa0117 bytes\r\n               5 Dir(s)     177\xc2\xa0782\xc2\xa0784 bytes free\r\n'
In [23]:
subprocess.check_output(["echo", "Hello World!"], shell=True)
'"Hello World!"\r\n'
In [38]:
subprocess.check_output('echo "Hello World!"', shell=True)
'"Hello World!"\r\n'
In [39]:
subprocess.check_output('echo Hello World!', shell=True)
'Hello World!\r\n'
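Why In [23] and In [38] return the same quoted string: with a list and `shell=True` on Windows, subprocess joins the arguments with `list2cmdline` (an undocumented helper in the `subprocess` module), which wraps any argument containing a space in double quotes, and cmd's `echo` prints the quotes verbatim. A small Python 3 sketch:

```python
from subprocess import list2cmdline

# The list form is joined into one command line; "Hello World!" contains
# a space, so it gets wrapped in double quotes.
cmdline = list2cmdline(["echo", "Hello World!"])
print(cmdline)  # echo "Hello World!"

# cmd.exe's echo prints its arguments as-is, quotes included, hence
# '"Hello World!"\r\n' in both cases.
```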
In [41]:
subprocess.check_output('dir /W ', shell=True)
' Volume in drive E is SL-63-X86_6\r\n Volume Serial Number is 2E3A-7167\r\n\r\n Directory of E:\\w8\\IPython Notebooks\\2015_2\r\n\r\n[.]                     [..]                    [.ipynb_checkpoints]\r\n7_kali_nodejs.ipynb     7_upt-get.ipynb         8_Scripted.ipynb\r\n8_cloud9.ipynb          6_forex.ipynb           6_aeb.ipynb\r\n4_comodo.ipynb          [pandas]                2_split.ipynb\r\n[old]                   8_Scripted.html         8_cloud9.html\r\n7_kali_nodejs.html      10_xpath.ipynb          7_upt-get.html\r\n11_carmailPrice.ipynb   10_xpath.html           11_carmailPrice.html\r\n11_osdir.ipynb          Untitled0.ipynb         \r\n              18 File(s)      5\xc2\xa0659\xc2\xa0280 bytes\r\n               5 Dir(s)     177\xc2\xa0778\xc2\xa0688 bytes free\r\n'
In [37]:
goga= "wooow"
filename = input("What file would you like to display?\n")
What file would you like to display?

NameError                                 Traceback (most recent call last)
<ipython-input-37-50e4377e9a64> in <module>()
      1 goga= "wooow"
----> 2 filename = input("What file would you like to display?\n")

C:\Users\kiss\Anaconda\lib\site-packages\IPython\kernel\zmq\ipkernel.pyc in <lambda>(prompt)
    362         if content.get('allow_stdin', False):
    363             raw_input = lambda prompt='': self._raw_input(prompt, ident, parent)
--> 364             input = lambda prompt='': eval(raw_input(prompt))
    365         else:
    366             raw_input = input = lambda prompt='' : self._no_raw_input()

C:\Users\kiss\Anaconda\lib\site-packages\IPython\kernel\zmq\ipkernel.pyc in <module>()

NameError: name 'goga' is not defined
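The traceback shows the cause: in Python 2 (and in this IPython kernel) `input()` is `eval(raw_input())`, so the typed text is evaluated as a Python expression, apparently in the kernel's own namespace rather than the notebook's. To get the typed text as a plain string, use `raw_input()` in Python 2 or `input()` in Python 3. A Python 3 sketch that imitates the old behaviour with `eval`; the typed text is simulated.

```python
typed = "goga"  # simulated user input

# Python 2's input() did roughly eval(typed): the text is treated as code.
error = None
try:
    eval(typed, {})  # empty namespace: "goga" is not defined here
except NameError as exc:
    error = str(exc)

# raw_input() in Python 2 / input() in Python 3 just return the string.
filename = typed

print(error)     # name 'goga' is not defined
print(filename)  # goga
```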

Note: shlex.split() can be useful when determining the correct tokenization for args, especially in complex cases:

In []:
>>> import shlex, subprocess
>>> command_line = raw_input()
/bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'"
>>> args = shlex.split(command_line)
>>> print args
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args) # Success!
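The same example in Python 3, with the command line given as a literal instead of being read with `raw_input()`, and compared with a plain `str.split()` to show what shlex adds. `/bin/vikings` is the documentation's placeholder program, so nothing is executed here.

```python
import shlex

command_line = ('/bin/vikings -input eggs.txt -output "spam spam.txt" '
                '-cmd "echo \'$MONEY\'"')

naive = command_line.split()      # breaks "spam spam.txt" into two pieces
args = shlex.split(command_line)  # honours shell-style quoting

print(naive[4:6])  # ['"spam', 'spam.txt"']
print(args)
# ['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
```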

On Windows, if you pass a list for args, it is turned into a string using the same rules as the MS C runtime; see the docstring of subprocess.list2cmdline for details. On Unix-like systems it works the other way round: without shell=True, even a string is treated as a list of one item, so the whole string is taken as the program name :).

In [43]:
help(subprocess.list2cmdline)
Help on function list2cmdline in module subprocess:

    Translate a sequence of arguments into a command line
    string, using the same rules as the MS C runtime:
    1) Arguments are delimited by white space, which is either a
       space or a tab.
    2) A string surrounded by double quotation marks is
       interpreted as a single argument, regardless of white space
       contained within.  A quoted string can be embedded in an
       argument.
    3) A double quotation mark preceded by a backslash is
       interpreted as a literal double quotation mark.
    4) Backslashes are interpreted literally, unless they
       immediately precede a double quotation mark.
    5) If backslashes immediately precede a double quotation mark,
       every pair of backslashes is interpreted as a literal
       backslash.  If the number of backslashes is odd, the last
       backslash escapes the next double quotation mark as
       described in rule 3.
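A few of these rules in action, as a Python 3 sketch; `list2cmdline` is an undocumented helper, but it is importable from `subprocess` on every platform, so the strings can be checked even outside Windows.

```python
from subprocess import list2cmdline

# Rules 2 and 4: an argument with spaces is quoted; backslashes that are
# not followed by a double quote stay as they are.
print(list2cmdline(["dir", r"C:\Program Files\Xpdf\bin64"]))
# dir "C:\Program Files\Xpdf\bin64"

# Rule 3: embedded double quotes are escaped with a backslash.
print(list2cmdline(['say "hi"']))
# "say \"hi\""

# Rule 5: a trailing backslash before the closing quote is doubled,
# so it does not escape that quote.
print(list2cmdline(["C:\\my dir\\"]))
# "C:\my dir\\"

# An empty argument is kept as "".
print(list2cmdline(["a", ""]))
# a ""
```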

