Scrapoxy

What is Scrapoxy?

http://scrapoxy.io

Scrapoxy hides your scraper behind a cloud.

It starts a pool of proxies and sends your requests through them.

Now, you can crawl without thinking about blacklisting!

It is written in JavaScript (ES6) with Node.js and AngularJS, and it is open source!

How does Scrapoxy work?

When Scrapoxy starts, it creates and manages a pool of proxies.
Your scraper uses Scrapoxy as a normal proxy.
Scrapoxy routes all requests through a pool of proxies.
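
Because Scrapoxy is used as a normal proxy, any HTTP client that accepts a proxy URL can be pointed at it. The following is a minimal sketch using only the Python standard library. The address http://localhost:8888 is an assumption (it is not stated above), and the function names are made up for illustration; use the address from your own Scrapoxy configuration.

```python
import urllib.request

# Assumed address of the local Scrapoxy proxy; adjust to your configuration.
SCRAPOXY_PROXY = "http://localhost:8888"


def build_opener_via_scrapoxy(proxy_url=SCRAPOXY_PROXY):
    """Return a urllib opener that sends HTTP and HTTPS requests to proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, opener=None, timeout=30):
    """Fetch url through the proxy and return (status code, body bytes)."""
    opener = opener or build_opener_via_scrapoxy()
    with opener.open(url, timeout=timeout) as response:
        return response.status, response.read()
```

The scraper only knows the single proxy address; which instance of the pool actually carries each request is decided by Scrapoxy.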

What does Scrapoxy do?

Create your own proxies
Use multiple cloud providers (AWS, DigitalOcean, OVH, Vscale)
Rotate IP addresses
Impersonate known browsers
Exclude blacklisted instances
Monitor the requests
Detect bottlenecks
Optimize the scraping

Why doesn’t Scrapoxy support anti-blacklisting?

Anti-blacklisting is a job for the scraper.

When the scraper detects blacklisting, it asks Scrapoxy (through a REST API) to remove the proxy from the proxy pool.
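
The division of work described here could look roughly like the sketch below: the scraper decides what counts as blacklisting, then tells Scrapoxy which proxy to drop. Everything specific in it is an assumption to check against the Scrapoxy API documentation: the API address, the `/instances/stop` path, the `x-cache-proxyname` response header, the base64 password in the `Authorization` header, and the status codes treated as a ban. The function names are hypothetical.

```python
import base64
import json
import urllib.request

# All values below are assumptions for illustration; check them against
# the Scrapoxy API documentation and your own configuration.
COMMANDER_URL = "http://localhost:8889/api"  # assumed REST API address
PROXY_NAME_HEADER = "x-cache-proxyname"      # assumed header naming the proxy used
BLOCKED_STATUSES = {403, 429, 503}           # hypothetical blacklisting signal


def is_blacklisted(status):
    """Hypothetical detection rule: treat some status codes as a ban."""
    return status in BLOCKED_STATUSES


def build_stop_request(proxy_name, password, commander_url=COMMANDER_URL):
    """Build (without sending) the request asking Scrapoxy to drop one proxy."""
    token = base64.b64encode(password.encode("utf-8")).decode("ascii")
    body = json.dumps({"name": proxy_name}).encode("utf-8")
    return urllib.request.Request(
        f"{commander_url}/instances/stop",
        data=body,
        method="POST",
        headers={"Authorization": token, "Content-Type": "application/json"},
    )


def report_if_blacklisted(status, headers, password):
    """Ask Scrapoxy to remove the proxy when the response looks like a ban.

    Returns True if a removal request was sent, False otherwise.
    """
    proxy_name = headers.get(PROXY_NAME_HEADER)
    if not is_blacklisted(status) or proxy_name is None:
        return False
    with urllib.request.urlopen(build_stop_request(proxy_name, password), timeout=10):
        return True
```

In a real scraper the detection rule is site-specific (a captcha page, an empty result, a redirect to a login form), which is why it belongs in the scraper rather than in Scrapoxy.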

What is the best scraper framework to use with Scrapoxy?

You could use the open source Scrapy framework (Python).

Does Scrapoxy have a SaaS mode or a support plan?

Scrapoxy is an open source tool. The source code is actively maintained. You are very welcome to open an issue for features or bugs.

If you are looking for a commercial product in SaaS mode or with a support plan, we recommend checking the ScrapingHub products (ScrapingHub is the company that maintains the Scrapy framework).

Documentation

You can begin with the Quick Start or look at the Changelog.

Then continue with Standard, and become an expert with Advanced.

Finish with the Tutorials.

Get Started

Standard

Advanced

Tutorials

Prerequisites

Node.js minimum version: 8.0.0

Contribute

You can open an issue on this repository for any feedback (bug, question, request, pull request, etc.).

License

See the License.

And don’t forget to be POLITE when you write your scrapers!