Web Scraper version: 0.3.7
Chrome version: 67.0.3396.99
OS: macOS High Sierra Version 10.13.5
Sitemap:
{"_id":"geospatial_companies","startUrl":["https://angel.co/geospatial"],"selectors":[{"id":"geo companies","type":"SelectorLink","selector":"div.name a.startup-link","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"company","type":"SelectorText","selector":"h1.u-fontWeight500","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorText","selector":"h2.js-startup_high_concept p","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"city","type":"SelectorText","selector":"span.js-location_tags a.tag","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"business_area","type":"SelectorText","selector":"span.js-market_tags","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"description_long","type":"SelectorText","selector":"div.product_desc div.content","parentSelectors":["geo companies"],"multiple":false,"regex":"","delay":0},{"id":"investors","type":"SelectorLink","selector":"div.past_financing div.dsr31 div.name a","parentSelectors":["geo companies"],"multiple":true,"delay":0},{"id":"pagination","type":"SelectorElementClick","selector":"div.more","parentSelectors":["_root"],"multiple":true,"delay":0,"clickElementSelector":"div.more","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}
Hi,
I am keen to buy your packages for future work, but I have already run into problems on my first task and I am not sure how they can be solved.
I used the Chrome extension to scrape data from www.angel.co/geospatial. My first real attempt returned some useful data, but then the website blocked my IP for bot activity. The result was a lot of "null" entries after the first few correct ones. I could access the page again through a VPN, but I don't think that solves the problem, because I've read that the VPN needs to be changed after each request while scraping.
I also read that your cloud service can change the VPN after each request. However, I just ran the same sitemap on your server and got the same results.
Can you explain the problem? My goal is to get the company data and then paginate through the "show more" element.
I am not sure whether the cause is my sitemap or the IP block. Could you check this?
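To help separate the two possible causes, here is a minimal sketch of a structural check on the sitemap. It only verifies that every entry in "parentSelectors" points at "_root" or an existing selector id, and that every selector can be reached from "_root". The function name check_sitemap and the trimmed SITEMAP below are illustrative assumptions, not part of Web Scraper, and the check says nothing about whether the CSS selectors match the page or whether the site blocks requests.

```python
import json

# Trimmed copy of the sitemap from this post (three selectors only), for illustration.
SITEMAP = json.loads("""
{"_id": "geospatial_companies",
 "startUrl": ["https://angel.co/geospatial"],
 "selectors": [
   {"id": "geo companies", "type": "SelectorLink",
    "parentSelectors": ["_root", "pagination"]},
   {"id": "company", "type": "SelectorText",
    "parentSelectors": ["geo companies"]},
   {"id": "pagination", "type": "SelectorElementClick",
    "parentSelectors": ["_root"]}
 ]}
""")


def check_sitemap(sitemap):
    """Return a list of structural problems found in the selector tree."""
    selectors = sitemap.get("selectors", [])
    ids = {s["id"] for s in selectors}
    problems = []

    # Every parent must be "_root" or the id of another selector.
    for s in selectors:
        for parent in s.get("parentSelectors", []):
            if parent != "_root" and parent not in ids:
                problems.append(f"{s['id']}: unknown parent selector {parent!r}")

    # Grow the set of reachable selectors, starting from "_root",
    # until a full pass adds nothing new.
    reachable = {"_root"}
    changed = True
    while changed:
        changed = False
        for s in selectors:
            parents = s.get("parentSelectors", [])
            if s["id"] not in reachable and any(p in reachable for p in parents):
                reachable.add(s["id"])
                changed = True

    for s in selectors:
        if s["id"] not in reachable:
            problems.append(f"{s['id']}: not reachable from _root")

    return problems


print(check_sitemap(SITEMAP))
```

An empty list here would only mean the selector tree is wired consistently. The "null" entries could still come from the IP block or from selectors that do not match the page.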
Error Message:
IP blocked & null data
To access error messages follow these steps:
- Open chrome://extensions/ or go to manage extensions
- Enable “developer mode” at the top right
- Open Web Scraper's “background page”
- A new popup window should appear.
- Go to “Console” tab. You should see Web Scraper log messages and errors there.