How do you run all this? Nothing could be simpler, provided you launch it from the console rather than from a notebook. I had a tested spider, Dmoz. I copied it (along with all the other scripts) to Kali and ran it from the "spiders" folder with the command "proxychains scrapy crawl dmoz". First, though, I had to start Tor from the folder with the daemons: "/etc/init.d/tor start".
Links to files with settings for Tor and Polipo (an alternative?) and to the Telnet documentation are also collected here.
Links that contain paths to files and settings
Once you know TOR is working correctly, I recommend quiet mode in the proxychains.conf file.
TOR + Polipo = sort of anonymity
Setting up tor + polipo + vidalia
Running the Scrapy dmoz example spider from "Crawling Scrapy Tutorial": everything works
20.14. telnetlib — Telnet client
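The quiet mode mentioned in the first link is switched on by uncommenting the `quiet_mode` line in proxychains.conf. Below is a minimal Python sketch of that edit applied to the text of a config file. The function name `enable_quiet_mode` and the three-line sample are illustrative assumptions, not the actual file from Kali.

```python
import re


def enable_quiet_mode(conf_text):
    """Uncomment the quiet_mode option in the text of a proxychains.conf file.

    Only a line consisting of '#quiet_mode' (with optional spaces) is touched;
    ordinary comments and all other options are left unchanged.
    """
    return re.sub(r"(?m)^[ \t]*#[ \t]*quiet_mode[ \t]*$", "quiet_mode", conf_text)


# Hypothetical fragment of a config file, used only for illustration.
sample = "strict_chain\n#quiet_mode\nproxy_dns\n"
print(enable_quiet_mode(sample))
```

With quiet mode on, proxychains should stop printing the |S-chain| and |DNS-request| lines, so it is worth enabling only after the chain is known to work.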
The links collected above contain hints on how to work with Tor. It is a server, after all, and on Linux it is easy to control from the command line (the commands are familiar from working with the apache server...)
root@kali:/home/kiss# /etc/init.d/tor status
[FAIL] tor is not running ... failed!
root@kali:/home/kiss# /etc/init.d/tor start
[ ok ] Starting tor daemon...done.
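Before launching the spider, it can be handy to confirm from Python that Tor is really accepting connections. The sketch below only checks that something is listening on 127.0.0.1:9050, the address proxychains uses in the log further down. It does not verify that the listener is actually Tor, and the function name is my own.

```python
import socket


def tor_socks_is_listening(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This is only a reachability check: it does not speak the SOCKS protocol
    and cannot tell Tor apart from any other service on the same port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If `tor_socks_is_listening()` returns False, starting the daemon with "/etc/init.d/tor start" as shown above is the first thing to try.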
Script output (why is the DNS server acting up?)
kiss@kali:~/Desktop/w8/GitHub/dirbot/dirbot/spiders$ proxychains scrapy crawl dmoz
ProxyChains-3.1 (http://proxychains.sf.net)
2014-05-20 18:49:03+0400 [scrapy] INFO: Scrapy 0.14.4 started (bot: scrapybot)
|DNS-response|: kali is not exist
2014-05-20 18:49:04+0400 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
2014-05-20 18:49:04+0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-05-20 18:49:04+0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-05-20 18:49:04+0400 [scrapy] DEBUG: Enabled item pipelines: FilterWordsPipeline
2014-05-20 18:49:04+0400 [dmoz] INFO: Spider opened
2014-05-20 18:49:04+0400 [dmoz] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-05-20 18:49:04+0400 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-05-20 18:49:04+0400 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
|DNS-request| www.dmoz.org
|DNS-request| www.dmoz.org
|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-<><>-OK
|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-<--timeout
|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-<><>-OK
|DNS-response| www.dmoz.org is 205.188.95.207
|S-chain|-<>-127.0.0.1:9050-<><>-205.188.95.207:80-<><>-OK
2014-05-20 18:49:21+0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2014-05-20 18:49:21+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/>
{'description': [u'Contains links to assorted resources from the Python universe, compiled by PythonWare.\r'],
'name': [u"eff-bot's Daily Python URL"],
'url': [u'http://www.pythonware.com/daily/']}
2014-05-20 18:49:21+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/>
{'description': [u'A directory of free Python and Zope hosting providers, with reviews and ratings.\r'],
'name': [u'Free Python and Zope Hosting Directory'],
'url': [u'http://www.oinko.net/freepython/']}
2014-05-20 18:49:21+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/>
{'description': [u'Features Python books, resources, news and articles.\r'],
'name': [u"O'Reilly Python Center"],
'url': [u'http://oreilly.com/python/']}
2014-05-20 18:49:21+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/>
{'description': [u'Resources for reporting bugs, accessing the Python source tree with CVS and taking part in the development of Python.\r'],
'name': [u"Python Developer's Guide"],
'url': [u'https://www.python.org/dev/']}
2014-05-20 18:49:21+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/>
{'description': [u'Scripts, examples and news about Python programming for the Windows platform.\r'],
'name': [u'Social Bug'],
'url': [u'http://win32com.goermezer.de/']}
<--denied
|DNS-response|: www.dmoz.org is not exist
2014-05-20 18:49:22+0400 [dmoz] DEBUG: Retrying <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (failed 1 times): DNS lookup failed: address 'www.dmoz.org' not found: [Errno 1] Unknown error.
|S-chain|-<>-127.0.0.1:9050-<><>-205.188.95.207:80-<><>-OK
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Wesley J. Chun; Prentice Hall PTR, 2001, ISBN 0130260363. For experienced developers to improve extant skills; professional level examples. Starts by introducing syntax, objects, error handling, functions, classes, built-ins. [Prentice Hall]\r'],
'name': [u'Core Python Programming'],
'url': [u'http://www.pearsonhighered.com/educator/academic/product/0,,0130260363,00%2Ben-USS_01DBC.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'The primary goal of this book is to promote object-oriented design using Python and to illustrate the use of the emerging object-oriented design patterns.\r'],
'name': [u'Data Structures and Algorithms with Object-Oriented Design Patterns in Python'],
'url': [u'http://www.brpreiss.com/books/opus7/html/book.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Mark Pilgrim, Guide to Python 3 and its differences from Python 2. Each chapter starts with a real code sample and explains it fully. Has a comprehensive appendix of all the syntactic and semantic changes in Python 3\r'],
'name': [u'Dive Into Python 3'],
'url': [u'http://www.diveintopython.net/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'This book covers a wide range of topics. From raw TCP and UDP to encryption with TSL, and then to HTTP, SMTP, POP, IMAP, and ssh. It gives you a good understanding of each field and how to do everything on the network with Python.\r'],
'name': [u'Foundations of Python Network Programming'],
'url': [u'http://rhodesmill.org/brandon/2011/foundations-of-python-network-programming/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'Free Python books and tutorials.\r'],
'name': [u'Free Python books'],
'url': [u'http://www.techbooksforfree.com/perlpython.shtml']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'Annotated list of free online books on Python scripting language. Topics range from beginner to advanced.\r'],
'name': [u'FreeTechBooks: Python Scripting Language'],
'url': [u'http://www.freetechbooks.com/python-f6.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Allen B. Downey, Jeffrey Elkner, Chris Meyers; Green Tea Press, 2002, ISBN 0971677506. Teaches general principles of programming, via Python as subject language. Thorough, in-depth approach to many basic and intermediate programming topics.
Full text online and downloads: HTML, PDF, PS, LaTeX. [Free, Green Tea Press]\r'],
'name': [u'How to Think Like a Computer Scientist: Learning with Python'],
'url': [u'http://greenteapress.com/thinkpython/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Guido van Rossum, Fred L. Drake, Jr.; Network Theory Ltd., 2003, ISBN 0954161769. Printed edition of official tutorial, for v2.x, from Python.org. [Network Theory, online]\r'],
'name': [u'An Introduction to Python'],
'url': [u'http://www.network-theory.co.uk/python/intro/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'Book by Alan Gauld with full text online. Introduction for those learning programming basics: terminology, concepts, methods to write code. Assumes no prior knowledge but basic computer skills.\r'],
'name': [u'Learn to Program Using Python'],
'url': [u'http://www.freenetpages.co.uk/hp/alan.gauld/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Rashi Gupta; John Wiley and Sons, 2002, ISBN 0471219754. Covers language basics, use for CGI scripting, GUI development, network programming; shows why it is one of more sophisticated of popular scripting languages. [Wiley]\r'],
'name': [u'Making Use of Python'],
'url': [u'http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471219754.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Magnus Lie Hetland; Apress LP, 2002, ISBN 1590590066. Readable guide to ideas most vital to new users, from basics common to high level languages, to more specific aspects, to a series of 10 ever more complex programs. [Apress]\r'],
'name': [u'Practical Python'],
'url': [u'http://hetland.org/writing/practical-python/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Rytis Sileika, ISBN13: 978-1-4302-2605-5, Uses real-world system administration examples like manage devices with SNMP and SOAP, build a distributed monitoring system, manage web applications and parse complex log files, monitor and manage MySQL databases.\r'],
'name': [u'Pro Python System Administration'],
'url': [u'http://sysadminpy.com/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'A Complete Introduction to the Python 3.\r'],
'name': [u'Programming in Python 3 (Second Edition)'],
'url': [u'http://www.qtrac.eu/py3book.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Dave Brueck, Stephen Tanner; John Wiley and Sons, 2001, ISBN 0764548077. Full coverage, clear explanations, hands-on examples, full language reference; shows step by step how to use components, assemble them, form full-featured programs. [John Wiley and Sons]\r'],
'name': [u'Python 2.1 Bible'],
'url': [u'http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764548077.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'A step-by-step tutorial for OOP in Python 3, including discussion and examples of abstraction, encapsulation, information hiding, and raise, handle, define, and manipulate exceptions.\r'],
'name': [u'Python 3 Object Oriented Programming'],
'url': [u'https://www.packtpub.com/python-3-object-oriented-programming/book']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Guido van Rossum, Fred L. Drake, Jr.; Network Theory Ltd., 2003, ISBN 0954161785. Printed edition of official language reference, for v2.x, from Python.org, describes syntax, built-in datatypes. [Network Theory, online]\r'],
'name': [u'Python Language Reference Manual'],
'url': [u'http://www.network-theory.co.uk/python/language/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Thomas W. Christopher; Prentice Hall PTR, 2002, ISBN 0130409561. Shows how to write large programs, introduces powerful design patterns that deliver high levels of robustness, scalability, reuse.\r'],
'name': [u'Python Programming Patterns'],
'url': [u'http://www.pearsonhighered.com/educator/academic/product/0,,0130409561,00%2Ben-USS_01DBC.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u"By Richard Hightower; Addison-Wesley, 2002, 0201616165. Begins with Python basics, many exercises, interactive sessions. Shows programming novices concepts and practical methods. Shows programming experts Python's abilities and ways to interface with Java APIs. [publisher website]\r"],
'name': [u'Python Programming with the Java Class Libraries: A Tutorial for Building Web and Enterprise Applications with Jython'],
'url': [u'http://www.informit.com/store/product.aspx?isbn=0201616165&redir=1']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Chris Fehily; Peachpit Press, 2002, ISBN 0201748843. Task-based, step-by-step visual reference guide, many screen shots, for courses in digital graphics; Web design, scripting, development; multimedia, page layout, office tools, operating systems. [Prentice Hall]\r'],
'name': [u'Python: Visual QuickStart Guide'],
'url': [u'http://www.pearsonhighered.com/educator/academic/product/0,,0201748843,00%2Ben-USS_01DBC.html']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Ivan Van Laningham; Sams Publishing, 2000, ISBN 0672317354. Split into 24 hands-on, 1 hour lessons; steps needed to learn topic: syntax, language features, OO design and programming, GUIs (Tkinter), system administration, CGI. [Sams Publishing]\r'],
'name': [u'Sams Teach Yourself Python in 24 Hours'],
'url': [u'http://www.informit.com/store/product.aspx?isbn=0672317354']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By David Mertz; Addison Wesley. Book in progress, full text, ASCII format. Asks for feedback. [author website, Gnosis Software, Inc.]\r'],
'name': [u'Text Processing in Python'],
'url': [u'http://gnosis.cx/TPiP/']}
2014-05-20 18:49:23+0400 [dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'description': [u'By Sean McGrath; Prentice Hall PTR, 2000, ISBN 0130211192, has CD-ROM. Methods to build XML applications fast, Python tutorial, DOM and SAX, new Pyxie open source XML processing library. [Prentice Hall PTR]\r'],
'name': [u'XML Processing with Python'],
'url': [u'http://www.informit.com/store/product.aspx?isbn=0130211192']}
2014-05-20 18:49:23+0400 [dmoz] INFO: Closing spider (finished)
2014-05-20 18:49:23+0400 [dmoz] INFO: Dumping spider stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/twisted.internet.error.DNSLookupError': 1,
'downloader/request_bytes': 783,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 16597,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 5, 20, 14, 49, 23, 612867),
'item_scraped_count': 27,
'scheduler/memory_enqueued': 3,
'start_time': datetime.datetime(2014, 5, 20, 14, 49, 4, 263127)}
2014-05-20 18:49:23+0400 [dmoz] INFO: Spider closed (finished)
2014-05-20 18:49:23+0400 [scrapy] INFO: Dumping global stats:
{'memusage/max': 43130880, 'memusage/startup': 43130880}
kiss@kali:~/Desktop/w8/GitHub/dirbot/dirbot/spiders$
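To see where the DNS trouble comes from, the proxychains lines can be tallied by target and outcome. The sketch below does this for the chain lines copied from the log above. The regular expression is my own guess at the line format, based only on these samples, not on a documented proxychains output specification.

```python
import re
from collections import Counter

# Tail of a proxychains chain line: the final target (ip:port) and its outcome.
CHAIN_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+:\d+)-(<><>-OK|<--timeout|<--denied)$")


def tally_chain_results(lines):
    """Count (target, outcome) pairs found at the end of |S-chain| lines."""
    counts = Counter()
    for line in lines:
        match = CHAIN_RE.search(line.strip())
        if match:
            target, outcome = match.groups()
            # Strip the arrow decoration, leaving "OK", "timeout" or "denied".
            counts[(target, outcome.lstrip("<>-"))] += 1
    return counts


# Chain lines taken from the log above (including the interleaved one).
sample_lines = [
    "|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-<><>-OK",
    "|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-<--timeout",
    "|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-<><>-OK",
    "|S-chain|-<>-127.0.0.1:9050-<><>-205.188.95.207:80-<><>-OK",
    "|S-chain|-<>-127.0.0.1:9050-<><>-205.188.95.207:80-<><>-OK",
]

counts = tally_chain_results(sample_lines)
for (target, outcome), n in sorted(counts.items()):
    print(target, outcome, n)
```

In this run every failure involves name resolution: one lookup sent to 4.2.2.2:53 through the Tor port timed out, and a later lookup was denied, which Scrapy reported as a DNSLookupError and retried. Both connections to the site itself on port 80 succeeded, which matches the stats: 3 requests, 2 responses, 1 exception.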