Here I've pasted the console output produced by running "scrapy crawl dmoz", as an example of what the console shows when everything works. The spider itself is on my computer, on drive W8 (C:). Next I'll start experimenting with it...
Go to the project's root folder (where the .cfg file is) and run the spider from the console
To put our spider to work, go to the project’s top level directory and run:
C:\Users\kiss\Documents\GitHub\dirbot>scrapy crawl dmoz
2014-04-22 21:30:48+0400 [scrapy] INFO: Scrapy 0.20.1 started (bot: scrapybot)
2014-04-22 21:30:48+0400 [scrapy] DEBUG: Optional features available: ssl, http11, boto, django
2014-04-22 21:30:48+0400 [scrapy] DEBUG: Overridden settings: {'DEFAULT_ITEM_CLASS': 'dirbot.items.Website', 'NEWSPID
rbot.spiders', 'SPIDER_MODULES': ['dirbot.spiders']}
2014-04-22 21:30:51+0400 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreSte
2014-04-22 21:30:54+0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddlewar
dleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddlew
dleware, ChunkedTransferMiddleware, DownloaderStats
2014-04-22 21:30:54+0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererM
engthMiddleware, DepthMiddleware
C:\Users\kiss\Anaconda\lib\site-packages\scrapy\contrib\pipeline\__init__.py:21: ScrapyDeprecationWarning: ITEM_PIPEL
a list or a set is deprecated, switch to a dict
category=ScrapyDeprecationWarning, stacklevel=1)
2014-04-22 21:30:54+0400 [scrapy] DEBUG: Enabled item pipelines: FilterWordsPipeline
2014-04-22 21:30:54+0400 [dmoz] INFO: Spider opened
2014-04-22 21:30:54+0400 [dmoz] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-22 21:30:54+0400 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-04-22 21:30:54+0400 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-04-22 21:30:55+0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/
r: None)
C:\Users\kiss\Anaconda\lib\site-packages\scrapy\selector\lxmlsel.py:20: ScrapyDeprecationWarning: HtmlXPathSelector i
nstanciate scrapy.selector.Selector instead
category=ScrapyDeprecationWarning, stacklevel=1)
dirbot\spiders\dmoz.py:24: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
sites = hxs.select('//ul[@class="directory-url"]/li')
dirbot\spiders\dmoz.py:29: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
item['name'] = site.select('a/text()').extract()
dirbot\spiders\dmoz.py:30: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
item['url'] = site.select('a/@href').extract()
dirbot\spiders\dmoz.py:31: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
item['description'] = site.select('text()').re('-\s([^\n]*?)\\n')
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/
ferer: None)
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Wesley J. Chun; Prentice Hall PTR, 2001, ISBN 0130260363. For experienced developers to
skills; professional level examples. Starts by introducing syntax, objects, error handling, functions, classes, buil
e Hall]\r'],
'name': [u'Core Python Programming'],
'url': [u'http://www.pearsonhighered.com/educator/academic/product/0,,0130260363,00%2Ben-USS_01DBC.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'The primary goal of this book is to promote object-oriented design using Python and to ill
of the emerging object-oriented design patterns.\r'],
'name': [u'Data Structures and Algorithms with Object-Oriented Design Patterns in Python'],
'url': [u'http://www.brpreiss.com/books/opus7/html/book.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Mark Pilgrim, Guide to Python 3 and its differences from Python 2. Each chapter starts
de sample and explains it fully. Has a comprehensive appendix of all the syntactic and semantic changes in Python 3\r
'name': [u'Dive Into Python 3'],
'url': [u'http://www.diveintopython.net/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/R
{'description': [u'Contains links to assorted resources from the Python universe, compiled by PythonWare.\r']
'name': [u"eff-bot's Daily Python URL"],
'url': [u'http://www.pythonware.com/daily/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'This book covers a wide range of topics. From raw TCP and UDP to encryption with TSL, and
MTP, POP, IMAP, and ssh. It gives you a good understanding of each field and how to do everything on the network with
'name': [u'Foundations of Python Network Programming'],
'url': [u'http://rhodesmill.org/brandon/2011/foundations-of-python-network-programming/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'Free Python books and tutorials.\r'],
'name': [u'Free Python books'],
'url': [u'http://www.techbooksforfree.com/perlpython.shtml']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/R
{'description': [u'A directory of free Python and Zope hosting providers, with reviews and ratings.\r'],
'name': [u'Free Python and Zope Hosting Directory'],
'url': [u'http://www.oinko.net/freepython/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/R
{'description': [u'Features Python books, resources, news and articles.\r'],
'name': [u"O'Reilly Python Center"],
'url': [u'http://oreilly.com/python/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'Annotated list of free online books on Python scripting language. Topics range from beginn
\r'],
'name': [u'FreeTechBooks: Python Scripting Language'],
'url': [u'http://www.freetechbooks.com/python-f6.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Allen B. Downey, Jeffrey Elkner, Chris Meyers; Green Tea Press, 2002, ISBN 0971677506.
principles of programming, via Python as subject language. Thorough, in-depth approach to many basic and intermediat
opics. Full text online and downloads: HTML, PDF, PS, LaTeX. [Free, Green Tea Press]\r'],
'name': [u'How to Think Like a Computer Scientist: Learning with Python'],
'url': [u'http://greenteapress.com/thinkpython/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/R
{'description': [u'Resources for reporting bugs, accessing the Python source tree with CVS and taking part in
t of Python.\r'],
'name': [u"Python Developer's Guide"],
'url': [u'http://www.python.org/dev/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/R
{'description': [u'Scripts, examples and news about Python programming for the Windows platform.\r'],
'name': [u'Social Bug'],
'url': [u'http://win32com.goermezer.de/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Guido van Rossum, Fred L. Drake, Jr.; Network Theory Ltd., 2003, ISBN 0954161769. Print
fficial tutorial, for v2.x, from Python.org. [Network Theory, online]\r'],
'name': [u'An Introduction to Python'],
'url': [u'http://www.network-theory.co.uk/python/intro/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'Book by Alan Gauld with full text online. Introduction for those learning programming basi
, concepts, methods to write code. Assumes no prior knowledge but basic computer skills.\r'],
'name': [u'Learn to Program Using Python'],
'url': [u'http://www.freenetpages.co.uk/hp/alan.gauld/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Rashi Gupta; John Wiley and Sons, 2002, ISBN 0471219754. Covers language basics, use fo
, GUI development, network programming; shows why it is one of more sophisticated of popular scripting languages. [Wi
'name': [u'Making Use of Python'],
'url': [u'http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471219754.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Magnus Lie Hetland; Apress LP, 2002, ISBN 1590590066. Readable guide to ideas most vita
from basics common to high level languages, to more specific aspects, to a series of 10 ever more complex programs.
'name': [u'Practical Python'],
'url': [u'http://hetland.org/writing/practical-python/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Rytis Sileika, ISBN13: 978-1-4302-2605-5, Uses real-world system administration exampl
devices with SNMP and SOAP, build a distributed monitoring system, manage web applications and parse complex log file
manage MySQL databases.\r'],
'name': [u'Pro Python System Administration'],
'url': [u'http://www.sysadminpy.com/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'A Complete Introduction to the Python 3.\r'],
'name': [u'Programming in Python 3 (Second Edition)'],
'url': [u'http://www.qtrac.eu/py3book.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Dave Brueck, Stephen Tanner; John Wiley and Sons, 2001, ISBN 0764548077. Full coverage,
ions, hands-on examples, full language reference; shows step by step how to use components, assemble them, form full-
ms. [John Wiley and Sons]\r'],
'name': [u'Python 2.1 Bible'],
'url': [u'http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764548077.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'A step-by-step tutorial for OOP in Python 3, including discussion and examples of abstract
ion, information hiding, and raise, handle, define, and manipulate exceptions.\r'],
'name': [u'Python 3 Object Oriented Programming'],
'url': [u'https://www.packtpub.com/python-3-object-oriented-programming/book']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Guido van Rossum, Fred L. Drake, Jr.; Network Theory Ltd., 2003, ISBN 0954161785. Print
fficial language reference, for v2.x, from Python.org, describes syntax, built-in datatypes. [Network Theory, online]
'name': [u'Python Language Reference Manual'],
'url': [u'http://www.network-theory.co.uk/python/language/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Thomas W. Christopher; Prentice Hall PTR, 2002, ISBN 0130409561. Shows how to write lar
troduces powerful design patterns that deliver high levels of robustness, scalability, reuse.\r'],
'name': [u'Python Programming Patterns'],
'url': [u'http://www.pearsonhighered.com/educator/academic/product/0,,0130409561,00%2Ben-USS_01DBC.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u"By Richard Hightower; Addison-Wesley, 2002, 0201616165. Begins with Python basics, many ex
ctive sessions. Shows programming novices concepts and practical methods. Shows programming experts Python's abilitie
nterface with Java APIs. [publisher website]\r"],
'name': [u'Python Programming with the Java Class Libraries: A Tutorial for Building Web and Enterprise Appl
ython'],
'url': [u'http://www.informit.com/store/product.aspx?isbn=0201616165&redir=1']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Chris Fehily; Peachpit Press, 2002, ISBN 0201748843. Task-based, step-by-step visual re
many screen shots, for courses in digital graphics; Web design, scripting, development; multimedia, page layout, offi
ting systems. [Prentice Hall]\r'],
'name': [u'Python: Visual QuickStart Guide'],
'url': [u'http://www.pearsonhighered.com/educator/academic/product/0,,0201748843,00%2Ben-USS_01DBC.html']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Ivan Van Laningham; Sams Publishing, 2000, ISBN 0672317354. Split into 24 hands-on, 1 h
eps needed to learn topic: syntax, language features, OO design and programming, GUIs (Tkinter), system administratio
ublishing]\r'],
'name': [u'Sams Teach Yourself Python in 24 Hours'],
'url': [u'http://www.informit.com/store/product.aspx?isbn=0672317354']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By David Mertz; Addison Wesley. Book in progress, full text, ASCII format. Asks for feedba
site, Gnosis Software, Inc.]\r'],
'name': [u'Text Processing in Python'],
'url': [u'http://gnosis.cx/TPiP/']}
2014-04-22 21:30:56+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/B
{'description': [u'By Sean McGrath; Prentice Hall PTR, 2000, ISBN 0130211192, has CD-ROM. Methods to build XM
fast, Python tutorial, DOM and SAX, new Pyxie open source XML processing library. [Prentice Hall PTR]\r'],
'name': [u'XML Processing with Python'],
'url': [u'http://www.informit.com/store/product.aspx?isbn=0130211192']}
2014-04-22 21:30:56+0400 [dmoz] INFO: Closing spider (finished)
2014-04-22 21:30:56+0400 [dmoz] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 530,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 16434,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 4, 22, 17, 30, 56, 531000),
'item_scraped_count': 27,
'log_count/DEBUG': 35,
'log_count/INFO': 3,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2014, 4, 22, 17, 30, 54, 874000)}
2014-04-22 21:30:56+0400 [dmoz] INFO: Spider closed (finished)
C:\Users\kiss\Documents\GitHub\dirbot>
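Every scraped 'description' in the log above ends in '\r'. A likely cause is the regex on line 31 of dmoz.py, quoted in the deprecation warning: [^\n]*? stops at the newline but still accepts a carriage return, so a page with Windows-style \r\n line endings leaves a trailing \r in each match. The sketch below reproduces this with the standard re module only. The sample text node and the CRLF-aware pattern are hypothetical, and re.findall is used as a rough stand-in for what Selector.re() returns, not Scrapy's actual implementation.

```python
import re

# Raw-string equivalent of the pattern from dirbot\spiders\dmoz.py:31,
# '-\s([^\n]*?)\\n': a dash, one whitespace character, then the shortest
# run of non-newline characters up to the next "\n".
ORIGINAL_PATTERN = r'-\s([^\n]*?)\n'

# Hypothetical variant: exclude "\r" from the capture and allow an optional
# "\r" before the newline, so CRLF line endings are not captured.
CRLF_AWARE_PATTERN = r'-\s([^\r\n]*?)\r?\n'

# Made-up text node shaped like one directory entry, with CRLF endings.
SAMPLE_TEXT = ' - By Wesley J. Chun; Prentice Hall PTR, 2001.\r\n  '


def extract(text, pattern):
    """Return every captured group, roughly what Selector.re() gives back."""
    return re.findall(pattern, text)


if __name__ == '__main__':
    # The original pattern keeps the trailing carriage return.
    print(extract(SAMPLE_TEXT, ORIGINAL_PATTERN))
    # The CRLF-aware pattern drops it.
    print(extract(SAMPLE_TEXT, CRLF_AWARE_PATTERN))
```

Calling .strip() on each result would work just as well; adjusting the pattern simply keeps the cleanup in one place.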
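The log also warns that ITEM_PIPELINES defined as a list or a set is deprecated and should be switched to a dict. In Scrapy 0.20 (the version shown in the log) the dict form maps each pipeline's import path to an integer order value; lower values run first, and values are customarily kept in the 0-1000 range. A minimal sketch of the settings.py change, assuming the pipeline reported as FilterWordsPipeline lives at dirbot.pipelines.FilterWordsPipeline and using 300 as an arbitrary order value:

```python
# settings.py (sketch): dict form of ITEM_PIPELINES.
# The list form that triggers the deprecation warning looks like:
#     ITEM_PIPELINES = ['dirbot.pipelines.FilterWordsPipeline']
# Assumed import path; adjust it to wherever FilterWordsPipeline is defined.
ITEM_PIPELINES = {
    'dirbot.pipelines.FilterWordsPipeline': 300,  # order value: lower runs earlier
}
```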
C:\Users\kiss\Anaconda>cd C:\Users\kiss\Documents\GitHub\dirbot
C:\Users\kiss\Documents\GitHub\dirbot>scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"
2014-07-12 08:31:52+0400 [scrapy] INFO: Scrapy 0.20.1 started (bot: scrapybot)
2014-07-12 08:31:52+0400 [scrapy] DEBUG: Optional features available: ssl, http11, boto, django
2014-07-12 08:31:52+0400 [scrapy] DEBUG: Overridden settings: {'DEFAULT_ITEM_CLASS': 'dirbot.items.Website', 'NEWSPIDER_MODULE': 'dirbot.spiders', 'SPIDER_MODULES': ['dirbot.spiders'], 'LOGSTATS_INTERVAL': 0}
2014-07-12 08:31:56+0400 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-07-12 08:31:59+0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-07-12 08:31:59+0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
C:\Users\kiss\Anaconda\lib\site-packages\scrapy\contrib\pipeline\__init__.py:21: ScrapyDeprecationWarning: ITEM_PIPELINES defined as a list or a set is deprecated, switch to a dict
category=ScrapyDeprecationWarning, stacklevel=1)
2014-07-12 08:31:59+0400 [scrapy] DEBUG: Enabled item pipelines: FilterWordsPipeline
2014-07-12 08:31:59+0400 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-07-12 08:31:59+0400 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-07-12 08:31:59+0400 [dmoz] INFO: Spider opened
2014-07-12 08:32:00+0400 [dmoz] DEBUG: Crawled (200) (referer: None)
[s] Available Scrapy objects:
[s] item {}
[s] request
[s] response <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[s] sel \r\n<**head>\r\n<**meta http-equ'>
[s] settings >
[s] spider
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
C:\Users\kiss\Anaconda\lib\site-packages\IPython\frontend.py:30: UserWarning: The top-level `frontend` package has been deprecated.
All its subpackages have been moved to the top `IPython` level.
warn("The top-level `frontend` package has been deprecated. "
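The shell is a convenient place to try the spider's XPath expressions before editing dmoz.py; in Scrapy 0.20 the sel object listed in the shell banner accepts .xpath() and .re(), the non-deprecated replacements the first log asks for, e.g. sel.xpath('//ul[@class="directory-url"]/li'). For experimenting offline, the sketch below mirrors the loop from dmoz.py lines 24-31 (as quoted in the warnings) using only the standard library. This is a swapped-in technique: xml.etree.ElementTree supports just a small XPath subset (no text() or @href steps) and is not what Scrapy uses (its selectors are lxml-based). The names parse and SAMPLE_HTML are hypothetical, and the HTML fragment is a made-up, well-formed stand-in for the real page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment shaped like one dmoz listing.
SAMPLE_HTML = """<html><body>
<ul class="directory-url">
  <li><a href="http://www.diveintopython.net/">Dive Into Python 3</a>
 - By Mark Pilgrim, Guide to Python 3 and its differences from Python 2.
</li>
</ul>
</body></html>"""

# Same description pattern as dmoz.py:31, in raw-string form.
DESCRIPTION_PATTERN = r'-\s([^\n]*?)\n'


def parse(html):
    """Yield dicts with the keys the spider fills: name, url, description."""
    root = ET.fromstring(html)
    # ElementTree counterpart of '//ul[@class="directory-url"]/li'
    for li in root.iterfind(".//ul[@class='directory-url']/li"):
        link = li.find('a')
        if link is None:
            continue
        # XPath 'text()' on the li: its leading text plus the tail of each child
        texts = [li.text or ''] + [child.tail or '' for child in li]
        description = []
        for text in texts:
            description.extend(re.findall(DESCRIPTION_PATTERN, text))
        yield {
            'name': [link.text],        # like site.xpath('a/text()').extract()
            'url': [link.get('href')],  # like site.xpath('a/@href').extract()
            'description': description,
        }


if __name__ == '__main__':
    for item in parse(SAMPLE_HTML):
        print(item)
```

ElementTree rejects HTML that is not well-formed XML, so this is only for checking the extraction logic on hand-made samples; on live pages the Scrapy selectors are the right tool.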