Hi all! I'm having a bit of trouble creating my scraper. It seems that if multiple link selectors are defined for a page, only the last one gets scraped. I created a test website to show what I mean. All it contains is two links to two different pages that I want scraped. Here is the sitemap graph I created:
And here is the sitemap:
{
"_id": "github-webscraper-test",
"startUrl": ["https://woojoo666.github.io/web-scraper-test-site/"],
"selectors": [
{
"id": "info-link",
"type": "SelectorLink",
"parentSelectors": ["_root"],
"selector": "a[href=\"./info.html\"]",
"multiple": false,
"delay": 0
},
{
"id": "items-link",
"type": "SelectorLink",
"parentSelectors": ["_root"],
"selector": "a[href=\"./items.html\"]",
"multiple": false,
"delay": 0
},
{
"id": "info-heading",
"type": "SelectorText",
"parentSelectors": ["info-link"],
"selector": "h1",
"multiple": false,
"regex": "",
"delay": 0
},
{
"id": "info-text",
"type": "SelectorText",
"parentSelectors": ["info-link"],
"selector": "p",
"multiple": false,
"regex": "",
"delay": 0
},
{
"id": "items-li",
"type": "SelectorText",
"parentSelectors": ["items-link"],
"selector": "li",
"multiple": true,
"regex": "",
"delay": 0
}]
}
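In case the graph is hard to read, here is a quick Python sketch (my own helper, not part of Web Scraper) that prints the selector tree implied by the `parentSelectors` fields. The embedded JSON is a trimmed copy of the sitemap above with only the `id`, `type`, and `parentSelectors` fields kept; the names `build_tree` and `render` are made up for this illustration.

```python
import json
from collections import defaultdict

# Trimmed copy of the sitemap above: only the fields needed to draw the tree.
SITEMAP_JSON = """
{
  "_id": "github-webscraper-test",
  "selectors": [
    {"id": "info-link", "type": "SelectorLink", "parentSelectors": ["_root"]},
    {"id": "items-link", "type": "SelectorLink", "parentSelectors": ["_root"]},
    {"id": "info-heading", "type": "SelectorText", "parentSelectors": ["info-link"]},
    {"id": "info-text", "type": "SelectorText", "parentSelectors": ["info-link"]},
    {"id": "items-li", "type": "SelectorText", "parentSelectors": ["items-link"]}
  ]
}
"""


def build_tree(sitemap):
    """Map each parent selector id to the ids of its child selectors."""
    children = defaultdict(list)
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            children[parent].append(selector["id"])
    return dict(children)


def render(children, node="_root", depth=0):
    """Return the tree as indented lines, one selector per line."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(children, child, depth + 1))
    return lines


tree = build_tree(json.loads(SITEMAP_JSON))
print("\n".join(render(tree)))
```

This shows the two link selectors as sibling branches under `_root`, each with its own text selectors underneath, which is the structure I intended.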
When I try to run the scraper, I can see from the popup window that it only navigates to the `items.html` page, but never to the `info.html` page. The scraped data further proves this, as you can see that `info-heading` and `info-text` are empty.
web-scraper-order | web-scraper-start-url | info-link | info-link-href | items-link | items-link-href | info-heading | info-text | items-li |
---|---|---|---|---|---|---|---|---|
1610329031-14 | https://woojoo666.github.io/web-scraper-test-site/ | site info | https://woojoo666.github.io/web-scraper-test-site/info.html | items | https://woojoo666.github.io/web-scraper-test-site/items.html | | | item 2 |
1610329031-13 | https://woojoo666.github.io/web-scraper-test-site/ | site info | https://woojoo666.github.io/web-scraper-test-site/info.html | items | https://woojoo666.github.io/web-scraper-test-site/items.html | | | item 1 |
1610329031-15 | https://woojoo666.github.io/web-scraper-test-site/ | site info | https://woojoo666.github.io/web-scraper-test-site/info.html | items | https://woojoo666.github.io/web-scraper-test-site/items.html | | | item 3 |
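To double-check which columns never receive a value, here is a small Python sketch, assuming the results are exported as CSV. The CSV text below is hand-copied from the three rows above and reduced to the order and text columns; `empty_columns` is a hypothetical helper name, not something Web Scraper provides.

```python
import csv
import io

# The three exported rows from the table above, reduced to the text columns.
EXPORT_CSV = """web-scraper-order,info-heading,info-text,items-li
1610329031-14,,,item 2
1610329031-13,,,item 1
1610329031-15,,,item 3
"""


def empty_columns(csv_text):
    """Return the column names whose value is blank in every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [
        name
        for name in reader.fieldnames
        if all(not row[name].strip() for row in rows)
    ]


print(empty_columns(EXPORT_CSV))
```

For these rows it reports `info-heading` and `info-text`, the two selectors that live under `info-link`.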
Perhaps I am misunderstanding how the scraper works? I went through all the video tutorials on the website but couldn't find a solution to my problem. Any help would be greatly appreciated!