Поиск по блогу

вторник, 29 апреля 2014 г.

Скрипт для импорта RSSS c delicious.com и первые неудачные попытки вызывать его из модуля

Здесь я дублирую скрипт (patterns), который написал ранее. Почему эти десять десять строчек кода нужно каждый раз копировать и вставлять в ячейку? Проще запускать их из собственного модуля. Здесь я распечатываю манул по команде %run?, но последующая попытка запустить модуль оканчивается ошибкой. Спустя три дня (изучив PYTHONPATH) возвращаюсь к этой работе и пробую прописать переменные окружения... Но сначала надо прочитать %run? Некогда. Пост не дописан..., но публикую.
In [1]:
from IPython.display import HTML, Image
import pattern.web
# We have to determine 'tag1+tag2+tag3' for delicious
# And number of items for parsing from RSS
number=10
tags='scrapy+proxy'
url = 'http://delicious.com/v2/rss/altersego2007/' + tags
results = pattern.web.Newsfeed().search(url, count=number)
str=''
for result in results:
    str = str + '<br/><a href='+ result.url + ' target="_blank">' + 'ref ' + result.title + '</a>  ' + result.text
HTML(str)
Out[1]:

ref Search · scrapy list Идея поиска на GitHub по специальным терминам, например "scrapy list"
ref pondering/scrapy-linkedin Using Scrapy to get Linkedin's person public profile.
ref yanfeiSun/proxycrawler get list of usable proxies based scrapy from websites
ref aivarsk/scrapy-proxies Еще одна библиотека для изучения (прокси-листы)
ref darthbear/scrapy-proxynova The first run will generate the list of proxies from http://proxynova.com and store it in the cache. It will individually check each proxy to see if they work and remove the ones that timed out or cannot connect to.
In [2]:
%run?
In []:
Type:       Magic function
String Form:<bound method ExecutionMagics.run of <IPython.core.magics.execution.ExecutionMagics object at 0x00000000081A74E0>>
Namespace:  IPython internal
File:       c:\users\kiss\anaconda\lib\site-packages\ipython\core\magics\execution.py
Definition: %run(self, parameter_s='', runner=None, file_finder=<function get_py_filename at 0x0000000002A68B38>)
Docstring:
Run the named file inside IPython as a program.

Usage:
  %run [-n -i -e -G]
       [( -t [-N<N>] | -d [-b<N>] | -p [profile options] )]
       ( -m mod | file ) [args]

Parameters after the filename are passed as command-line arguments to
the program (put in sys.argv). Then, control returns to IPython's
prompt.

This is similar to running at a system prompt:
  $ python file args
but with the advantage of giving you IPython's tracebacks, and of
loading all variables into your interactive namespace for further use
(unless -p is used, see below).

The file is executed in a namespace initially consisting only of
__name__=='__main__' and sys.argv constructed as indicated. It thus
sees its environment as if it were being run as a stand-alone program
(except for sharing global objects such as previously imported
modules). But after execution, the IPython interactive namespace gets
updated with all variables defined in the program (except for __name__
and sys.argv). This allows for very convenient loading of code for
interactive work, while giving each program a 'clean sheet' to run in.

Arguments are expanded using shell-like glob match.  Patterns
'*', '?', '[seq]' and '[!seq]' can be used.  Additionally,
tilde '~' will be expanded into user's home directory.  Unlike
real shells, quotation does not suppress expansions.  Use
*two* back slashes (e.g., '\\*') to suppress expansions.
To completely disable these expansions, you can use -G flag.

Options:

-n: __name__ is NOT set to '__main__', but to the running file's name
without extension (as python does under import).  This allows running
scripts and reloading the definitions in them without calling code
protected by an ' if __name__ == "__main__" ' clause.

-i: run the file in IPython's namespace instead of an empty one. This
is useful if you are experimenting with code written in a text editor
which depends on variables defined interactively.

-e: ignore sys.exit() calls or SystemExit exceptions in the script
being run.  This is particularly useful if IPython is being used to
run unittests, which always exit with a sys.exit() call.  In such
cases you are interested in the output of the test results, not in
seeing a traceback of the unittest module.

-t: print timing information at the end of the run.  IPython will give
you an estimated CPU time consumption for your script, which under
Unix uses the resource module to avoid the wraparound problems of
time.clock().  Under Unix, an estimate of time spent on system tasks
is also given (for Windows platforms this is reported as 0.0).

If -t is given, an additional -N<N> option can be given, where <N>
must be an integer indicating how many times you want the script to
run.  The final timing report will include total and per run results.

For example (testing the script uniq_stable.py)::

    In [1]: run -t uniq_stable

    IPython CPU timings (estimated):
      User  :    0.19597 s.
      System:        0.0 s.

    In [2]: run -t -N5 uniq_stable

    IPython CPU timings (estimated):
    Total runs performed: 5
      Times :      Total       Per run
      User  :   0.910862 s,  0.1821724 s.
      System:        0.0 s,        0.0 s.

-d: run your program under the control of pdb, the Python debugger.
This allows you to execute your program step by step, watch variables,
etc.  Internally, what IPython does is similar to calling:

  pdb.run('execfile("YOURFILENAME")')

with a breakpoint set on line 1 of your file.  You can change the line
number for this automatic breakpoint to be <N> by using the -bN option
(where N must be an integer).  For example::

  %run -d -b40 myscript

will set the first breakpoint at line 40 in myscript.py.  Note that
the first breakpoint must be set on a line which actually does
something (not a comment or docstring) for it to stop execution.

Or you can specify a breakpoint in a different file::

  %run -d -b myotherfile.py:20 myscript

When the pdb debugger starts, you will see a (Pdb) prompt.  You must
first enter 'c' (without quotes) to start execution up to the first
breakpoint.

Entering 'help' gives information about the use of the debugger.  You
can easily see pdb's full documentation with "import pdb;pdb.help()"
at a prompt.

-p: run program under the control of the Python profiler module (which
prints a detailed report of execution times, function calls, etc).

You can pass other options after -p which affect the behavior of the
profiler itself. See the docs for %prun for details.

In this mode, the program's variables do NOT propagate back to the
IPython interactive namespace (because they remain in the namespace
where the profiler executes them).

Internally this triggers a call to %prun, see its documentation for
details on the options available specifically for profiling.

There is one special usage for which the text above doesn't apply:
if the filename ends with .ipy, the file is run as ipython script,
just as if the commands were written on IPython prompt.

-m: specify module name to load instead of script path. Similar to
the -m option for the python interpreter. Use this option last if you
want to combine with other %run options. Unlike the python interpreter
only source modules are allowed no .pyc or .pyo files.
For example::

    %run -m example

will run the example module.

-G: disable shell-like glob expansion of arguments.
In [6]:
%load C:\\Users\\kiss\\Documents\\IPython Notebooks\\Scripts\\dlicious.py
In []:
from IPython.display import HTML, Image
import pattern.web
# We have to determine 'tag1+tag2+tag3' for delicious
# And number of items for parsing from RSS
number=10
tags='scrapy+proxy'
url = 'http://delicious.com/v2/rss/altersego2007/' + tags
results = pattern.web.Newsfeed().search(url, count=number)
str=''
for result in results:
    str = str + '<br/><a href='+ result.url + ' target="_blank">' + 'ref ' + result.title + '</a>  ' + result.text
HTML(str)


Посты чуть ниже также могут вас заинтересовать

Комментариев нет:

Отправить комментарий