IP blocked even on cloud service?

Web Scraper version: 0.3.7
Chrome version: 67.0.3396.99
OS: macOS High Sierra Version 10.13.5

Sitemap:

{"_id":"geospatial_companies","startUrl":["https://angel.co/geospatial"],"selectors":[{"id":"geo companies","type":"SelectorLink","selector":"div.name a.startup-link","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"company","type":"SelectorText","selector":"h1.u-fontWeight500","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorText","selector":"h2.js-startup_high_concept p","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"city","type":"SelectorText","selector":"span.js-location_tags a.tag","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"business_area","type":"SelectorText","selector":"span.js-market_tags","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"description_long","type":"SelectorText","selector":"div.product_desc div.content","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"investors","type":"SelectorLink","selector":"div.past_financing div.dsr31 div.name a","parentSelectors":["geo companies"],"multiple":true,"delay":0},{"id":"pagination","type":"SelectorElementClick","selector":"div.more","parentSelectors":["_root"],"multiple":true,"delay":0,"clickElementSelector":"div.more","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

Hi,

I am keen to buy one of your packages for future work, but I am already running into problems on my first task and I am not sure how you could solve them.

I used the Chrome add-on to scrape data from www.angel.co/geospatial. My first real attempt returned some useful data, but then the website blocked my IP for bot activity, so after the first few correct entries the results were mostly "null". I was able to access the page again through a VPN, but I don't think this solves the problem, because I've read that the VPN needs to be changed after each request while scraping data.

I also read that your cloud service is able to change the VPN after each request. However, I just ran the same sitemap on your server and got the same results.

Can you explain the problem? Or is my sitemap not correct? My goal is to get company data and then do pagination through the "show more" element.

So I am not sure whether the problem is my sitemap or the IP block. Could you check this?

Error Message:

IP blocked & null data

To access error messages follow these steps:

  1. Open chrome://extensions/ or go to manage extensions
  2. Enable “developer mode” at the top right
  3. Open Web Scraper's “background page”
  4. A new popup window should appear.
  5. Go to “Console” tab. You should see Web Scraper log messages and errors there.

Hi!

You need to set a delay of at least 2000 ms (2 seconds) on Link and Element Click selectors. If the delay is 0 (an empty value also counts as 0), all buttons and links are clicked at the same time, and the website blocks you because the clicks clearly don't look human.
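If you would rather not edit every selector by hand, the exported sitemap JSON can be patched with a small script before importing it again. This is only a rough sketch: the helper name `apply_min_delay` and the tiny example sitemap are made up for illustration, and it assumes the `delay` field is in milliseconds and may be stored as a number, a numeric string, or an empty value, as in the sitemaps posted in this thread.

```python
import json

# Selector types that click or follow links, per the advice above.
CLICK_TYPES = {"SelectorLink", "SelectorElementClick"}


def apply_min_delay(sitemap: dict, min_delay_ms: int = 2000) -> dict:
    """Return a copy of the sitemap where every Link / Element Click
    selector has a delay of at least min_delay_ms."""
    patched = json.loads(json.dumps(sitemap))  # simple deep copy
    for selector in patched.get("selectors", []):
        if selector.get("type") in CLICK_TYPES:
            # delay may be 0, "", missing, or a string such as "4000"
            current = int(selector.get("delay") or 0)
            selector["delay"] = max(current, min_delay_ms)
    return patched


# Hypothetical, shortened sitemap used only to show the call.
example = {
    "_id": "example",
    "startUrl": ["https://example.com/"],
    "selectors": [
        {"id": "links", "type": "SelectorLink", "delay": 0},
        {"id": "title", "type": "SelectorText", "delay": 0},
    ],
}
patched_json = json.dumps(apply_min_delay(example))
```

The resulting `patched_json` string can be pasted into the sitemap import dialog. Text selectors are left alone, since the advice above only concerns selectors that click or open links.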

I had some trouble with AngelList as well. I even tried delays of 9000, and it eventually showed a CAPTCHA and blocked me. There are VPNs that rotate your IP address; that's the only workaround I was able to find.

This sitemap got me a CAPTCHA in 2 minutes flat. Where else can I add a delay?

{"_id":"angellist-people-only-test","startUrl":["https://angel.co/job-collections/50-hot-consumer-fintech-startups?email_uid=657516231&utm_campaign=talent_newsletter-newsletter&utm_content=50-hot-fintech-startups&utm_medium=email&utm_source=talent_newsletter-newsletter&utm_term="],"selectors":[{"id":"Company Link","type":"SelectorLink","selector":"h3.s-h3 a","parentSelectors":["_root"],"multiple":true,"delay":"4000"},{"id":"Name","type":"SelectorText","selector":"h1.u-fontWeight500","parentSelectors":["Company Link"],"multiple":false,"regex":"","delay":0},{"id":"Element Select","type":"SelectorElementClick","selector":"div.group:nth-of-type(1) div.g-lockup","parentSelectors":["Company Link"],"multiple":true,"delay":"4000","clickElementSelector":"div.group:nth-of-type(1) a.view_all","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Employee Name","type":"SelectorText","selector":"div.name a.profile-link","parentSelectors":["Element Select"],"multiple":false,"regex":"","delay":0},{"id":"Employee Title","type":"SelectorText","selector":"div.role_title","parentSelectors":["Element Select"],"multiple":false,"regex":"","delay":0},{"id":"Profile Link","type":"SelectorLink","selector":"div.name a.profile-link","parentSelectors":["Element Select"],"multiple":false,"delay":"4000"},{"id":"Linkedin Profile","type":"SelectorElementAttribute","selector":"a.icon.fontello-linkedin","parentSelectors":["Profile Link"],"multiple":false,"extractAttribute":"href","delay":0},{"id":"Catagory","type":"SelectorText","selector":".tag:first-child","parentSelectors":["Profile Link"],"multiple":false,"regex":"","delay":0},{"id":"Cand-Location","type":"SelectorText","selector":".tag:nth-child(2)","parentSelectors":["Profile Link"],"multiple":false,"regex":"","delay":0},{"id":"AngelList Profile","type":"SelectorElementAttribute","selector":"div.name a.profile-link","parentSelectors":["Element Select"],"multiple":false,"extractAttribute":"href","delay":0}]}

@KristapsWS can tell you more about Cloud Service VPN.

On the other hand, there are paid VPNs that change your IP after some time. I can only recommend searching for one on Google.

You can still collect the links to the job pages in a single run, and then use them as start URLs in a separate sitemap with long delays, such as 5-10 seconds.
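A possible way to build that second sitemap from the first one is sketched below. It is only an illustration: the function name `build_detail_sitemap`, the new sitemap id, and the example URLs are assumptions, and the script simply takes the selectors that sat under the link selector, re-parents them to `_root`, and raises the delay on Link and Element Click selectors. It assumes the collected links are already available as a plain list, for example copied from the export of the first run.

```python
# Selector types that click or follow links.
CLICK_TYPES = {"SelectorLink", "SelectorElementClick"}


def build_detail_sitemap(source, link_id, urls, new_id, delay_ms=5000):
    """Build a sitemap that starts directly on the collected URLs and keeps
    only the selectors that were nested under the selector `link_id`."""
    # Collect every selector nested (directly or indirectly) under link_id.
    keep, frontier = set(), {link_id}
    while frontier:
        children = {
            s["id"] for s in source["selectors"]
            if frontier & set(s["parentSelectors"])
        } - keep
        keep |= children
        frontier = children

    selectors = []
    for s in source["selectors"]:
        if s["id"] not in keep:
            continue
        copy = dict(s)
        # The linked page is now the start page, so its children hang off _root.
        copy["parentSelectors"] = [
            "_root" if p == link_id else p for p in s["parentSelectors"]
        ]
        if copy["type"] in CLICK_TYPES:
            copy["delay"] = max(int(copy.get("delay") or 0), delay_ms)
        selectors.append(copy)
    return {"_id": new_id, "startUrl": list(urls), "selectors": selectors}


# Shortened version of the first sitemap in this thread, for illustration.
source = {
    "_id": "geospatial_companies",
    "startUrl": ["https://angel.co/geospatial"],
    "selectors": [
        {"id": "geo companies", "type": "SelectorLink",
         "parentSelectors": ["_root", "pagination"], "delay": 0},
        {"id": "company", "type": "SelectorText",
         "parentSelectors": ["geo companies"], "delay": 0},
        {"id": "investors", "type": "SelectorLink",
         "parentSelectors": ["geo companies"], "delay": 0},
        {"id": "pagination", "type": "SelectorElementClick",
         "parentSelectors": ["_root"], "delay": 0},
    ],
}
# Placeholder URLs standing in for the links collected by the first run.
links = ["https://example.com/company-a", "https://example.com/company-b"]
detail = build_detail_sitemap(source, "geo companies", links,
                              "geospatial_company_details")
```

The delays written here only affect the Link and Element Click selectors inside the new sitemap; how long to wait between the start URLs themselves would still have to be chosen when the scrape is started.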