Here are about three dozen links for understanding Node.js and how to parse HTML pages with it, plus a copy-pasted list that opens with "I'm trying to put together a list of possible solutions for automated browser test suites and headless browser platforms capable of scraping".
Building our first desktop application with HTML, JS, and Node-WebKit
node-webkit is renamed NW.js
Building an NW.js (node-webkit) application with Web2Executable
NW.js lets you call all Node.js modules directly from DOM and enables a new way of writing applications with all Web technologies. It was previously known as "node-webkit" project.
Building a node-webkit application. Introduction
Building a node-webkit application. Logic
How else can you parse with Node.js
Web scraping with Node.js
Web Scraping in Node.js
Node.js & Socket.io Chat Part One: The Basics - Create a basic chat application using node.js, socket.io, and express by the end of this video. Let me know what additional features you want in future videos
Request - Simplified HTTP client
socket.io-computer A collaborative virtual machine where players take turns in controlling it.
socket.io
cheerio Fast, flexible, and lean implementation of core jQuery designed specifically for the server
The Express philosophy is to provide small, robust tooling for HTTP servers, making it a great solution for single page applications, web sites, hybrids, or public HTTP APIs.
Promise-based HTML/XML parser and web scraper for NodeJS.
tubes.io
1DMP crawling service
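The scraping links above all come down to the same two steps: get the HTML, then pull the interesting bits out of it. Below is a minimal sketch of the second step, assuming only Node.js built-ins. The `extractLinks` helper and the sample markup are made up for illustration, and the regular expression is a naive stand-in for a real parser such as cheerio, not a substitute for one.

```javascript
// Hypothetical helper: pull href/text pairs out of an HTML string.
// A regular expression is only good enough for simple, well-formed markup;
// a real scraper would hand the HTML to a parser such as cheerio.
function extractLinks(html) {
  const links = [];
  const anchor = /<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  let match;
  while ((match = anchor.exec(html)) !== null) {
    // Strip nested tags from the link text and collapse whitespace.
    const text = match[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    links.push({ href: match[1], text });
  }
  return links;
}

// Sample markup standing in for a downloaded page (made up for illustration).
const sampleHtml = `
  <ul>
    <li><a href="/post/1">First <b>post</b></a></li>
    <li><a class="ext" href="https://example.com/">Example</a></li>
  </ul>`;

console.log(extractLinks(sampleHtml));
```

In a real script the HTML string would come from an HTTP client (for example Request from the list above) rather than from a literal.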
I'm trying to put together a list of possible solutions for automated browser test suites and headless browser platforms capable of scraping.
BROWSER TESTING / SCRAPING:
- Selenium - polyglot flagship in browser automation, bindings for Python, Ruby, JavaScript, C#, Haskell and more, IDE for Firefox (as an extension) for faster test deployment. Can act as a Server and has tons of features.
JAVASCRIPT
- PhantomJS - JavaScript, headless testing with screen capture and automation, uses WebKit. As of version 1.8 Selenium's WebDriver API is implemented, so you can use any WebDriver binding and tests will be compatible with Selenium.
- SlimerJS - similar to PhantomJS, uses Gecko (Firefox) instead of WebKit
- CasperJS - JavaScript, built on both PhantomJS and SlimerJS, has extra features
- Ghost Driver - JavaScript implementation of the WebDriver Wire Protocol for PhantomJS.
- new PhantomCSS - CSS regression testing. A CasperJS module for automating visual regression testing with PhantomJS and Resemble.js.
- new WebdriverCSS - plugin for Webdriver.io for automating visual regression testing
- new PhantomFlow - Describe and visualize user flows through tests. An experimental approach to Web user interface testing.
- new trifleJS - ports the PhantomJS API to use the Internet Explorer engine.
- new CasperJS IDE (commercial)
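Several entries above mention the WebDriver Wire Protocol, which is plain JSON over HTTP. As a rough sketch of what a binding sends to a WebDriver server (Selenium, or PhantomJS through Ghost Driver), the snippet below only builds request descriptors and sends nothing. The helper names and the session id are hypothetical; the paths and the `desiredCapabilities` body follow the legacy JSON Wire Protocol.

```javascript
// Build plain request descriptors for a few JSON Wire Protocol commands.
// Nothing is sent over the network here; the helper names are hypothetical.
function newSession(browserName) {
  return {
    method: 'POST',
    path: '/session',
    body: JSON.stringify({ desiredCapabilities: { browserName } }),
  };
}

function navigateTo(sessionId, url) {
  return {
    method: 'POST',
    path: `/session/${sessionId}/url`,
    body: JSON.stringify({ url }),
  };
}

function getTitle(sessionId) {
  return { method: 'GET', path: `/session/${sessionId}/title`, body: null };
}

function deleteSession(sessionId) {
  return { method: 'DELETE', path: `/session/${sessionId}`, body: null };
}

// "abc123" is a made-up session id; a real one comes back from POST /session.
const steps = [
  newSession('phantomjs'),
  navigateTo('abc123', 'https://example.com/'),
  getTitle('abc123'),
  deleteSession('abc123'),
];
for (const step of steps) {
  console.log(step.method, step.path, step.body || '');
}
```

Because every binding ends up producing requests of this shape, a test written against one WebDriver server can usually be pointed at another.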
NODE.JS
- Node-phantom - bridges the gap between PhantomJS and node.js
- WebDriverJs - Selenium WebDriver bindings for node.js by Selenium Team
- WD.js - node module for WebDriver/Selenium 2
- yiewd - WD.js wrapper using latest Harmony generators! Get rid of the callback pyramid with yield
- ZombieJs - Insanely fast, headless full-stack testing using node.js
- NightwatchJs - Node JS based testing solution using Selenium Webdriver
- Chimera - can do everything PhantomJS does, but in a full JS environment
- Dalek.js - Automated cross browser testing with JavaScript through Selenium Webdriver
- Webdriver.io - better implementation of WebDriver bindings with predefined 50+ actions
- new Nightmare - PhantomJS bridge with a high-level API. It uses PhantomJS-Node under the hood.
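The yiewd entry above promises to replace the callback pyramid with `yield`. The sketch below shows the general idea with a tiny generator runner. It is not yiewd's API: `run` and `fakeCommand` are hypothetical names, and the "browser commands" are just timers that resolve with a string.

```javascript
// Hypothetical stand-in for an asynchronous browser command.
function fakeCommand(name, result) {
  return new Promise((resolve) => setTimeout(() => resolve(`${name}: ${result}`), 10));
}

// Drive a generator: each yielded promise is waited on, and its value
// (or its rejection) is sent back into the generator at the yield point.
function run(generatorFn) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn();
    function step(method, value) {
      let next;
      try {
        next = gen[method](value);
      } catch (err) {
        return reject(err);
      }
      if (next.done) return resolve(next.value);
      Promise.resolve(next.value).then(
        (v) => step('next', v),
        (e) => step('throw', e)
      );
    }
    step('next');
  });
}

// The steps read top to bottom, with no nested callbacks.
const done = run(function* () {
  const a = yield fakeCommand('get', 'page loaded');
  const b = yield fakeCommand('title', 'Example');
  return [a, b];
});
done.then((log) => console.log(log));
```

Current Node.js versions get the same flat style from `async`/`await`, which works on the same principle.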
WEB SCRAPING / MINING
- Scrapy - Python, mainly a scraper/miner; fast and well documented. Can be linked with Django Dynamic Scraper for nice mining deployments, or with Scrapy Cloud for PaaS (server-less) deployment. Works in a terminal or as a stand-alone server process, can be used with Celery, and is built on top of Twisted.
- Snailer - node.js module, untested yet.
- Node-Crawler - node.js module, untested yet.
ONLINE TOOLS
- new CasperBox - Run CasperJS scripts online
RELATED LINKS & RESOURCES
- Comparison of Webscraping software
- new Resemble.js : Image analysis and comparison
Questions:
- Any pure Node.js solution or Node.js to PhantomJS/CasperJS module that actually works and is documented?
Answer: Chimera seems to go in that direction, check out Chimera
- Other solutions capable of easier JavaScript injection than Selenium?
- Do you know any pure Ruby solutions?
Answer: Check out the list created by rjk with Ruby-based solutions
- Do you know any related tech or solution?
Feel free to reedit this question and add content as you wish! Thank you for your contributions!
Updates
- added SlimerJS to the list
- added Snailer and Node-Crawler and Node-phantom
- added Yiewd WebDriver wrapper
- added WebDriverJs and WD.js
- added Ghost Driver
- added Comparison of Webscraping software on Screen Scraper Blog
- added ZombieJs
- added Resemble.js and PhantomCSS and PhantomFlow, categorised and reedited content
- 04.01.2014, added Chimera, answered 2 questions
- added NightWatchJs
- added DalekJS
- added WebdriverCSS
- added CasperBox
- added trifleJS
- added CasperJS IDE
- added Nightmare