
Friday, April 18, 2014

The first attempt to install Portia failed. I had to put it aside...

I don't want to spend too much time on tasks I don't yet understand well. The Portia package caught my interest... it uses Scrapy, Scrapely, Twisted... All of that needs studying, so I tried to install the package but couldn't work out virtualenv "on the fly"... Fine, lest curiosity become a vice... I shelved the task for better times. Maybe they will publish more decent documentation by then... or write a proper installer.
There are two main components in this repository, slyd and slybot:

slyd
    The visual editor used to create your scraping projects.

slybot
    The Python web crawler that performs the actual site scraping. It's implemented on top of the Scrapy web crawling framework and the Scrapely extraction library. It uses projects created with slyd as input.
In []:
<br/>[portia Visual scraping for Scrapy](https://github.com/scrapinghub/portia)
<br/>[Slybot 0.9 documentation](http://slybot.readthedocs.org/en/latest/)
<br/>[Scrapely](https://github.com/scrapy/scrapely)
<br/>[Twisted Documentation](https://twistedmatrix.com/trac/wiki/Documentation): Twisted is an event-driven networking engine written in Python
In []:
#How to try it:
#The recommended way to install dependencies is to use virtualenv and then do:

pip install -r requirements.txt

#Run the server using:
twistd -n slyd

#and point your browser to: http://localhost:9001/static/main.html
#Chrome and Firefox are supported, but it works better with Chrome.
So what is this requirements file? It turns out the necessary packages can be installed by letting pip read the list of dependencies from requirements.txt:
In []:
twisted
scrapy
loginform
lxml
jsonschema
-e git://github.com/scrapy/scrapely.git#egg=scrapely
-e ../slybot
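Each line of requirements.txt is either a plain package name that pip resolves from PyPI, or, prefixed with `-e`, an "editable" install taken from a version-control URL or a local path. A rough classification of the lines above (my own illustration; pip's real parser is more involved):

```python
def classify_requirement(line):
    """Classify a single requirements.txt line (simplified illustration,
    not pip's actual parsing logic)."""
    line = line.strip()
    if not line or line.startswith("#"):
        return "comment/blank"
    if line.startswith("-e "):
        target = line[3:]
        if "://" in target:
            return "editable VCS checkout"  # pip shells out to git/hg/svn
        return "editable local path"        # e.g. a sibling directory
    return "PyPI package"

for req in ["twisted",
            "-e git://github.com/scrapy/scrapely.git#egg=scrapely",
            "-e ../slybot"]:
    print(req, "->", classify_requirement(req))
```

The `-e ../slybot` line explains why the file has to be run from inside the `slyd` directory: the path is relative to it.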
In []:
C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd>pip install -r requirements.txt
Obtaining file:///C:/Users/kiss/Anaconda/Lib/site-packages/portia-master/slybot (from -r requirements.txt (line 7))
  Running setup.py (path:C:/Users/kiss/Anaconda/Lib/site-packages/portia-master/slybot\setup.py) egg_info for package from file:///C
:/Users/kiss/Anaconda/Lib/site-packages/portia-master/slybot
    C:\Users\kiss\Anaconda\lib\distutils\dist.py:267: UserWarning: Unknown distribution option: 'tests_requires'
      warnings.warn(msg)

    package init file 'slybot\tests\__init__.py' not found (or not a regular file)
Requirement already satisfied (use --upgrade to upgrade): twisted in c:\users\kiss\anaconda\lib\site-packages (from -r requirements.
txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): scrapy in c:\users\kiss\anaconda\lib\site-packages (from -r requirements.t
xt (line 2))
Downloading/unpacking loginform (from -r requirements.txt (line 3))
  Downloading loginform-1.0.tar.gz
  Running setup.py (path:c:\users\kiss\appdata\local\temp\pip_build_kiss\loginform\setup.py) egg_info for package loginform

Requirement already satisfied (use --upgrade to upgrade): lxml in c:\users\kiss\anaconda\lib\site-packages (from -r requirements.txt
 (line 4))
Downloading/unpacking jsonschema (from -r requirements.txt (line 5))
  Downloading jsonschema-2.3.0-py2.py3-none-any.whl
Obtaining scrapely from git+git://github.com/scrapy/scrapely.git#egg=scrapely (from -r requirements.txt (line 6))
  Cloning git://github.com/scrapy/scrapely.git to c:\users\kiss\anaconda\lib\site-packages\portia-master\slyd\src\scrapely
Cleaning up...
Cannot find command 'git'
Storing debug log for failure in C:\Users\kiss\pip\pip.log

C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd>
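The failure happens at the `-e git://github.com/scrapy/scrapely.git#egg=scrapely` line: for editable VCS requirements pip shells out to an external `git` executable, and "Cannot find command 'git'" means none is on PATH. A quick way to check from Python (my own sketch; `shutil.which` needs Python 3.3+):

```python
import shutil

# pip runs the external `git` command to clone "-e git://..." requirements;
# "Cannot find command 'git'" means the executable is not on PATH.
git_path = shutil.which("git")
if git_path is None:
    print("git not found - install Git (e.g. Git for Windows) and reopen the console")
else:
    print("git found at", git_path)
```

Installing Git for Windows and reopening the console should let the clone succeed; alternatively, the `git://` line could be replaced with an https archive URL so that pip downloads a tarball instead of cloning.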
In [9]:
#Check whether the packages got installed:
#    c:\users\kiss\anaconda\lib\site-packages\portia-master\slyd\src\scrapely
# the scrapely folder is not there,
# but we do find pip-delete-this-directory.txt
#
%load C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd\src\pip-delete-this-directory.txt
In []:
This file is placed here by pip to indicate the source was put
here by pip.

Once this package is successfully installed this source code will be
deleted (unless you remove this file).
In [6]:
#It is not quite clear why this happened; perhaps because the scrapely distribution was not installed on my machine
# Install it

C:\Users\kiss\Anaconda>pip install scrapely
Downloading/unpacking scrapely
  Downloading scrapely-0.10.tar.gz
  Running setup.py (path:c:\users\kiss\appdata\local\temp\pip_build_kiss\scrapely\setup.py) egg_info for package scrapely

Requirement already satisfied (use --upgrade to upgrade): numpy in c:\users\kiss\anaconda\lib\site-packages (from scrapely)
Requirement already satisfied (use --upgrade to upgrade): w3lib in c:\users\kiss\anaconda\lib\site-packages (from scrapely)
Requirement already satisfied (use --upgrade to upgrade): six>=1.4.1 in c:\users\kiss\anaconda\lib\site-packages (from w3lib->scrape
ly)
Installing collected packages: scrapely
  Running setup.py install for scrapely

Successfully installed scrapely
Cleaning up...

C:\Users\kiss\Anaconda>
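With scrapely now installed from PyPI, a quick sanity check that the package is importable in the current environment (a generic check, not specific to scrapely):

```python
import importlib.util

# find_spec returns None when a package cannot be imported from this environment
print("scrapely importable:", importlib.util.find_spec("scrapely") is not None)
```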
In []:
# Retry the installation
pip install -r requirements.txt
In []:
C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd>pip install -r requirements.txt
Obtaining file:///C:/Users/kiss/Anaconda/Lib/site-packages/portia-master/slybot (from -r requirements.txt (line 7))
  Running setup.py (path:C:/Users/kiss/Anaconda/Lib/site-packages/portia-master/slybot\setup.py) egg_info for package from file:///C
:/Users/kiss/Anaconda/Lib/site-packages/portia-master/slybot
    C:\Users\kiss\Anaconda\lib\distutils\dist.py:267: UserWarning: Unknown distribution option: 'tests_requires'
      warnings.warn(msg)

    package init file 'slybot\tests\__init__.py' not found (or not a regular file)
Requirement already satisfied (use --upgrade to upgrade): twisted in c:\users\kiss\anaconda\lib\site-packages (from -r requirements.
txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): scrapy in c:\users\kiss\anaconda\lib\site-packages (from -r requirements.t
xt (line 2))
Downloading/unpacking loginform (from -r requirements.txt (line 3))
  Downloading loginform-1.0.tar.gz
  Running setup.py (path:c:\users\kiss\appdata\local\temp\pip_build_kiss\loginform\setup.py) egg_info for package loginform

Requirement already satisfied (use --upgrade to upgrade): lxml in c:\users\kiss\anaconda\lib\site-packages (from -r requirements.txt
 (line 4))
Downloading/unpacking jsonschema (from -r requirements.txt (line 5))
  Downloading jsonschema-2.3.0-py2.py3-none-any.whl
Obtaining scrapely from git+git://github.com/scrapy/scrapely.git#egg=scrapely (from -r requirements.txt (line 6))
  Cloning git://github.com/scrapy/scrapely.git to c:\users\kiss\anaconda\lib\site-packages\portia-master\slyd\src\scrapely
Cleaning up...
Cannot find command 'git'
Storing debug log for failure in C:\Users\kiss\pip\pip.log

C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd>
In []:
#Check C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd\src...
#    nothing has changed
# (this is expected: an editable "-e git://..." requirement always needs the git
# executable, even when scrapely is already installed from PyPI, because pip
# must clone the source itself)

Trying to start the server

In []:
C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd>C:\Users\kiss\Anaconda\Scripts\twistd.py -n slyd
Usage: twistd [options]
Options:
      --savestats      save the Stats object rather than the text output of the
                       profiler.
  -o, --no_save        do not save state on shutdown
  -e, --encrypted      The specified tap/aos file is encrypted.
  -n, --nodaemon       (for backwards compatability).
  -l, --logfile=       log to a specified file, - for stdout
      --logger=        A fully-qualified name to a log observer factory to use
                       for the initial log observer. Takes precedence over
                       --logfile and --syslog (when available).
  -p, --profile=       Run in profile mode, dumping results to specified file
      --profiler=      Name of the profiler to use (profile, cprofile, hotshot).
                       [default: hotshot]
  -f, --file=          read the given .tap file [default: twistd.tap]
  -y, --python=        read an application from within a Python file (implies
                       -o)
  -s, --source=        Read an application from a .tas file (AOT format).
  -d, --rundir=        Change to a supplied directory before running [default:
                       .]
      --help-reactors  Display a list of possibly available reactor names.
      --version        Print version information and exit.
      --spew           Print an insanely verbose log of everything that happens.
                       Useful when debugging freezes or locks in complex code.
  -b, --debug          Run the application in the Python Debugger (implies
                       nodaemon), sending SIGUSR2 will drop into debugger
  -r, --reactor=       Which reactor to use (see --help-reactors for a list of
                       possibilities)
      --help           Display this help and exit.

twistd reads a twisted.application.service.Application out of a file and runs
it.
Commands:
    conch            A Conch SSH service.
    dns              A domain name server.
    ftp              An FTP server.
    inetd            An inetd(8) replacement.
    mail             An email service
    manhole          An interactive remote debugger service accessible via
                     telnet and ssh and providing syntax coloring and basic line
                     editing functionality.
    manhole-old      An interactive remote debugger service.
    news             A news server.
    portforward      A simple port-forwarder.
    procmon          A process watchdog / supervisor
    slyd             A server for creating scrapely spiders
    socks            A SOCKSv4 proxy service.
    telnet           A simple, telnet-based remote debugging service.
    web              A general-purpose web server which can serve from a
                     filesystem or application resource.
    words            A modern words server
    xmpp-router      An XMPP Router server


C:\Users\kiss\Anaconda\Lib\site-packages\portia-master\slyd>
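A likely cause of the bare usage text on Windows: launching `twistd.py` directly relies on the `.py` file association, which on some setups silently drops the command-line arguments, so twistd sees no subcommand and prints its help. Invoking it through the interpreter explicitly avoids that (paths are the ones from this machine):

```shell
REM Run from the directory that contains the slyd project, passing the
REM script to python directly so the "-n slyd" arguments are not lost
python C:\Users\kiss\Anaconda\Scripts\twistd.py -n slyd
```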
Something is wrong: why is the help text printed instead of the server starting? Note that slyd does appear in the Commands list, so the plugin itself was found. Rather than digging deeper, let's reread the installation guide carefully... It relies on virtualenv... Let's try installing it:
In []:
C:\Users\kiss\Anaconda>pip install virtualenv

WARNING: using virtualenv with Anaconda is untested and not recommended.
    We suggest using the conda command to create environments instead.
    For more information about creating conda environments, please see:
         http://docs.continuum.io/conda/examples/create.html

Proceed (y/n)? n

C:\Users\kiss\Anaconda>
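Following the warning's advice, the same isolation can be achieved with conda instead of virtualenv (a sketch; the environment name `portia` and the Python version are my own choices):

```shell
REM create and activate an isolated conda environment, then install into it
conda create -n portia python=2.7 pip
activate portia
pip install -r requirements.txt
```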
Let's follow the recommendation; indeed, I had forgotten about Conda. I wrote about this command in this blog in "Conda update conda".
Next I first updated conda and anaconda, and then could not find virtualenv... After that I tried to get to grips with Twisted and watched the video "Architecting an Event-driven Networking Engine: Twisted Python". It turned out that Jessica McKellar speaks too fast, so I found the transcript on YouTube.
Halfway through the talk I understood that event-driven means asynchronous requests, and that Scrapy uses Twisted... and decided it would be easier to learn Scrapy first. So I am stopping here, expecting to come back to this topic in a month, to this notebook: C:\Users\kiss\Documents\IPython Notebooks\web\proxy\pyProxy

