Am I doing something wrong, or is the site blocking me in some way? I am trying to scrape text from the UK legislation records on education. I have checked every step with Data preview and everything comes up fine, but when I start scraping it opens the root page and then immediately finishes, with no data scraped.
When I was making the sitemap I had to enter the CSS selectors manually, because for some reason the selector popup (the one with "Done selecting") would not appear. Could that be relevant to the problem? Thanks in advance for any help.
Url: http://www.legislation.gov.uk/uksi/education
Sitemap:
{"_id":"education1","startUrl":["http://www.legislation.gov.uk/uksi/education"],"selectors":[{"id":"Link","type":"SelectorLink","parentSelectors":["_root"],"selector":"#content > table > tbody > tr:nth-child(n+1) > td:nth-child(1) > a","multiple":true,"delay":0},{"id":"link2","type":"SelectorLink","parentSelectors":["Link"],"selector":"#viewLegSnippet > div > ol > li:nth-child(n+1) > li > p > span > a, #viewLegSnippet > div > ol > li:nth-child(n+1) > p > span.LegDS.LegContentsTitle > a","multiple":true,"delay":0},{"id":"AllText","type":"SelectorText","parentSelectors":["link2"],"selector":"#viewLegSnippet","multiple":false,"regex":"","delay":0}]}
EDIT
OK, I went for a very simple test and it failed on that too. I just tried to get two bits of text from the page:
{"_id":"education2","startUrl":["http://www.legislation.gov.uk/"],"selectors":[{"id":"Text","type":"SelectorText","parentSelectors":["_root"],"selector":"p","multiple":true,"regex":"","delay":"2000"}]}
Looking at the log (see below), I find the message "unknown content type loaded". Does anyone know what is going on? It is driving me mad!
{"url":"http://www.legislation.gov.uk/","timestamp":1555504760,"level_name":"INFO","message":"Job execution started"}
background_script.js:465 {"contentType":"application/xhtml+xml;charset=utf-8","timestamp":1555504760,"level_name":"NOTICE","message":"unknown content type loaded"}
background_script.js:465 {"url":"http://www.legislation.gov.uk/","parentSelector":"_root","sitemapName":"education2","driver":"chrometab","error":"PAGE_UNKNOWN_CONTENT_TYPE_ERROR","timestamp":1555504760,"level_name":"NOTICE","message":"Job execution failed"}
background_script.js:465 {"timestamp":1555504760,"level_name":"PROFILE","message":"157 ms job execution"}
background_script.js:465 {"url":"http://www.legislation.gov.uk/","timestamp":1555504760,"level_name":"INFO","message":"Syncing storage because a job failed"}
background_script.js:465 {"timestamp":1555504762,"level_name":"INFO","message":"Scraper execution is finished"}
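To pull the relevant details out of the log, I put together this small Python sketch. It strips the `background_script.js:465` console prefix, parses the JSON payload of each line, and collects any `contentType` and `error` fields. The two lines in `LOG_LINES` are copied from the log above; `parse_log_line` and `summarize` are my own helper names.

```python
import json

# The two NOTICE lines from the log above, copied as they appear in the console.
LOG_LINES = [
    'background_script.js:465 {"contentType":"application/xhtml+xml;charset=utf-8",'
    '"timestamp":1555504760,"level_name":"NOTICE","message":"unknown content type loaded"}',
    'background_script.js:465 {"url":"http://www.legislation.gov.uk/","parentSelector":"_root",'
    '"sitemapName":"education2","driver":"chrometab","error":"PAGE_UNKNOWN_CONTENT_TYPE_ERROR",'
    '"timestamp":1555504760,"level_name":"NOTICE","message":"Job execution failed"}',
]


def parse_log_line(line):
    """Return the JSON payload of one console line, or None if it has none."""
    start = line.find("{")
    if start == -1:
        return None
    return json.loads(line[start:])


def summarize(lines):
    """Collect the media types and error codes reported in the log lines."""
    records = [r for r in map(parse_log_line, lines) if r is not None]
    content_types = [
        r["contentType"].split(";")[0].strip() for r in records if "contentType" in r
    ]
    errors = [r["error"] for r in records if "error" in r]
    return {"content_types": content_types, "errors": errors}


print(summarize(LOG_LINES))
```

So the page is served as `application/xhtml+xml` and the job fails with `PAGE_UNKNOWN_CONTENT_TYPE_ERROR`. My guess (an assumption on my part, not something I have confirmed) is that the extension does not recognise that content type, rather than the site actively blocking me.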