Scraper doesn't skip the pages it is unable to scrape!

akshay.kotha · December 22, 2017, 6:12am

I noticed some new changes (scraping takes place in a new window, and there is a 'refresh' button in the Web Scraper interface). When I tried to scrape a website, the scraper did not skip the pages whose structure is different; it got stuck. This didn't happen with the previous version of Web Scraper.

Is there a bug?

Regards
Akshay

martins · December 22, 2017, 7:58am

We have partly released a new version of Web Scraper. Can you share the sitemap so we can check what the problem is?

akshay.kotha · March 14, 2018, 11:10am

It got stuck after visiting one page. I don't know which one.

I am attaching a screenshot.

Akshay

KristapsWS · March 14, 2018, 11:19am

Can you please post your sitemap and tell us which version of the Web Scraper extension and which OS version you are using?

akshay.kotha · March 14, 2018, 11:33am

{"_id":"tru_ca_nest","startUrl":["https://www.tru.ca/nursing/faculty.html","https://www.tru.ca/science/programs/physics/faculty.html","https://www.tru.ca/science/programs/nrs/faculty.html","https://www.tru.ca/science/programs/math/contact.html","https://www.tru.ca/science/programs/msces/faculty.html","https://www.tru.ca/science/programs/compsci/people.html","https://www.tru.ca/science/programs/chemistry/faculty.html","https://www.tru.ca/science/programs/aret/contactus.html","https://www.tru.ca/science/programs/aht/diploma/faculty.html","https://www.tru.ca/arts/sociology-anthropology/faculty.html","https://www.tru.ca/arts/journalism/faculty.html","https://www.tru.ca/arts/english-modern-languages/faculty.html","https://www.tru.ca/arts/geography/faculty.html","https://www.tru.ca/arts/php/faculty.html","https://www.tru.ca/arts/modern-languages/faculty.html","https://www.tru.ca/arts/psychology/faculty.html","http://visualarts.inside.tru.ca/faculty/","https://www.tru.ca/business/facultyresearch/faculty/accountingfinance.html","https://www.tru.ca/business/facultyresearch/faculty/economics.html","https://www.tru.ca/business/facultyresearch/faculty/human-enterprise-and-innovation.html","https://www.tru.ca/business/facultyresearch/faculty/management.html","https://www.tru.ca/business/facultyresearch/faculty/marketing.html","https://www.tru.ca/edsw/education/faculty.html","https://www.tru.ca/edsw/social-work/faculty.html","https://www.tru.ca/edsw/esl/faculty.html","https://www.tru.ca/law/faculty-staff/faculty.html","https://www.tru.ca/law/faculty-staff/sessional-faculty.html"],"selectors":[{"id":"namelink","type":"SelectorLink","selector":"main div.medium-4 a,td:nth-of-type(1) a,td a","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"div.large-10 h1","parentSelectors":["namelink"],"multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorText","selector":"div.contentarea > p","parentSelectors":["namelink"],"multiple":false,"regex":"","delay":0},{"id":"department","type":"SelectorText","selector":"div.breadcrumbs a:nth-of-type(3)","parentSelectors":["namelink"],"multiple":false,"regex":"","delay":0}]}
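
An exported sitemap like the one above can be sanity-checked offline before it is imported and run. The following is only a minimal sketch in Python, not part of Web Scraper itself: the function name `check_sitemap` is hypothetical, and the only assumptions about the format are the fields visible in the export above (`startUrl`, `selectors`, `id`, `parentSelectors`, with `_root` as the top-level parent). It checks that the text parses as JSON and that every parent selector refers to a selector that exists. It does not visit any pages, so it cannot tell whether a page's structure differs from what the selectors expect.

```python
import json


def check_sitemap(sitemap_json):
    """Return a list of problems found in an exported sitemap string."""
    try:
        sitemap = json.loads(sitemap_json)
    except json.JSONDecodeError as exc:
        # A stray line break inside a string value (e.g. from copy/paste) lands here.
        return [f"not valid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]

    selectors = sitemap.get("selectors", [])
    # "_root" is the top-level parent used in the export above.
    known_ids = {"_root"} | {sel["id"] for sel in selectors}

    problems = []
    if not sitemap.get("startUrl"):
        problems.append("no startUrl given")
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent not in known_ids:
                problems.append(
                    f"selector '{sel['id']}' has unknown parent '{parent}'"
                )
    return problems


# Shortened copy of the sitemap from this thread (two start URLs, two selectors).
example = json.dumps({
    "_id": "tru_ca_nest",
    "startUrl": [
        "https://www.tru.ca/nursing/faculty.html",
        "https://www.tru.ca/law/faculty-staff/faculty.html",
    ],
    "selectors": [
        {"id": "namelink", "type": "SelectorLink",
         "selector": "main div.medium-4 a,td:nth-of-type(1) a,td a",
         "parentSelectors": ["_root"], "multiple": True, "delay": 0},
        {"id": "name", "type": "SelectorText",
         "selector": "div.large-10 h1",
         "parentSelectors": ["namelink"], "multiple": False,
         "regex": "", "delay": 0},
    ],
})

if __name__ == "__main__":
    print(check_sitemap(example) or "no problems found")
```

An empty list means the sitemap text is at least structurally consistent; any reported problem is worth fixing before looking for errors in the extension's console.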

I am on macOS High Sierra, using Web Scraper extension version 0.3.7 on Chrome.

Akshay

KristapsWS · March 14, 2018, 12:16pm

Please post your error messages.

To access the error messages, follow these steps:

Open chrome://extensions/ or go to “Manage extensions”.
Enable “Developer mode” at the top right.
Open Web Scraper's “background page”.
A new popup window should appear.
Go to the “Console” tab. You should see Web Scraper log messages and errors there.

akshay.kotha · March 14, 2018, 12:34pm

I can't see any errors after clicking the 'background page'. Should I run the scraper again?

Akshay

KristapsWS · March 14, 2018, 12:36pm

Yes, copy the error messages after the scraper gets stuck.

akshay.kotha · March 14, 2018, 1:29pm

Hi KristapsWS,

I think it ran fine this time, so I cannot see any red error messages in the console.

Thanks for the support.

Will be in touch,
Akshay

martins · April 12, 2018, 10:57am