Pagination+Pet details

rafal · October 28, 2022, 9:23am

Is it possible, and if so, how? I would like to go through every page of a paginated search result and open each result to scrape its data.

Url: Schronisko Na Paluchu

Currently I'm using two sitemaps.
With the first one I go through the pagination to collect all the pet detail URLs:

Sitemap1 (going from last page to first):
{"_id":"scrap_all_PETs-Links_from_SEARCH","startUrl":["https://napaluchu.waw.pl/zwierzeta/znalazly-dom?pet_page=178&pet_species=1&pet_date_from=2018-01-01&pet_date_to=2018-12-31"],"selectors":[{"id":"pagination","parentSelectors":["_root","pagination"],"paginationType":"auto","selector":"a.btn-info:nth-of-type(1)","type":"SelectorPagination"},{"id":"petsonPage","parentSelectors":["pagination"],"type":"SelectorElement","selector":"div.row div.inner-box li a","multiple":true},{"id":"Link","parentSelectors":["petsonPage"],"type":"SelectorLink","selector":"_parent_","multiple":false}]}
Next, I export all the pet detail page URLs to CSV and run the second sitemap with them as start URLs.
Sitemap2:
{"_id":"pet-2018","startUrl":["<<URL#1>>","<<URL#2>>",...,"<<URL#last>>"],"selectors":[{"id":"PET_nazwa","parentSelectors":["_root"],"type":"SelectorText","selector":".pets-container h2","multiple":false,"regex":""},{"id":"PET_details","parentSelectors":["_root"],"type":"SelectorText","selector":".name ul","multiple":false,"regex":""},{"id":"PET_foto","parentSelectors":["_root"],"type":"SelectorElementAttribute","selector":"img.pet-detail-main-image","multiple":false,"extractAttribute":"src"},{"id":"PET-W typie rasy","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('W typie rasy:') strong","multiple":false,"regex":""},{"id":"PET-Wiek","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Wiek') strong","multiple":false,"regex":""},{"id":"PET-Płeć","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Płeć') strong","multiple":false,"regex":""},{"id":"PET-Waga","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Waga') strong","multiple":false,"regex":""},{"id":"PET-Nr","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Nr') strong","multiple":false,"regex":""},{"id":"PET-Status","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Status') strong","multiple":false,"regex":""},{"id":"PET-Przyjęty","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Przyjęty') strong","multiple":false,"regex":""},{"id":"PET-Wydany","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Wydany') strong","multiple":false,"regex":""},{"id":"PET-Znaleziony","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Znaleziony') strong","multiple":false,"regex":""},{"id":"PET-Boks","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Boks') strong","multiple":false,"regex":""},{"id":"PET-Grupa","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Grupa') strong","multiple":false,"regex":""}]}
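The manual step of pasting exported URLs into the second sitemap's startUrl list could be scripted. Below is a minimal Python sketch of that idea, not a tested workflow: the column name "Link-href", the helper names build_start_urls and with_start_urls, and the example.com URLs are all assumptions made for illustration. Check the header row of your own CSV export for the actual column name.

```python
import csv
import io
import json


def build_start_urls(csv_text, url_column="Link-href"):
    """Collect unique, non-empty URLs from one CSV column, in first-seen order.

    The default column name "Link-href" is an assumption; verify it against
    the header row of the CSV exported from the first sitemap.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {}
    for row in reader:
        url = (row.get(url_column) or "").strip()
        if url:
            seen.setdefault(url, None)  # dict keys keep insertion order
    return list(seen)


def with_start_urls(sitemap, urls):
    """Return a shallow copy of the sitemap with its startUrl list replaced."""
    updated = dict(sitemap)
    updated["startUrl"] = list(urls)
    return updated


# Hypothetical export with a duplicate row and an empty link.
# When reading a real file, open it with encoding="utf-8-sig" so that a
# leading byte order mark, if present, does not end up in the first header.
sample_csv = (
    "Link,Link-href\n"
    "Pet A,https://example.com/pet/1\n"
    "Pet B,https://example.com/pet/2\n"
    "Pet A,https://example.com/pet/1\n"
    "Pet C,\n"
)

# Trimmed-down stand-in for Sitemap2 (only one selector kept).
detail_sitemap = {
    "_id": "pet-2018",
    "startUrl": [],
    "selectors": [
        {
            "id": "PET_nazwa",
            "parentSelectors": ["_root"],
            "type": "SelectorText",
            "selector": ".pets-container h2",
            "multiple": False,
            "regex": "",
        }
    ],
}

start_urls = build_start_urls(sample_csv)
ready_sitemap = with_start_urls(detail_sitemap, start_urls)

# ensure_ascii=False keeps Polish selector ids such as "PET-Płeć" readable.
print(json.dumps(ready_sitemap, ensure_ascii=False))
```

The printed JSON can then be pasted into the sitemap import dialog in place of a hand-built Sitemap2.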

1. Is it possible to combine these into a single sitemap and so automate the manual preparation of the second one?
2. Is it possible to set parameters such as "Request interval (ms)" and "Page load delay (ms)" (e.g. to 20000) within the sitemap code?

ViestursWS · October 31, 2022, 2:54pm

@rafal Hello, are you looking to use the extracted URLs as unique start URLs for a new sitemap?

If so, multiple start URLs can be added to a sitemap via the UI on Web Scraper Cloud using the 'Bulk Start URL import' feature (it handles up to 20,000 start URLs). See the attached screenshot for reference.

rafal · November 10, 2022, 5:13pm

Thanks, I didn't know about Web Scraper Cloud. It looks like it could be the solution, though an expensive one.
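
On the first question, an alternative that might avoid the second sitemap altogether: in Web Scraper, selectors nested under a Link selector are, to my understanding, evaluated on the page that link opens. Assuming that holds for this site, the detail selectors from Sitemap2 could be re-parented from "_root" to the "Link" selector of Sitemap1. The Python sketch below performs that merge on trimmed-down copies of the two sitemaps; merge_sitemaps and the new sitemap id are hypothetical names, and the result has not been run against the site.

```python
import copy
import json


def merge_sitemaps(list_sitemap, detail_sitemap, link_selector_id="Link", new_id=None):
    """Attach the detail sitemap's selectors under the list sitemap's link selector.

    Every "_root" parent in the detail selectors is replaced by link_selector_id,
    on the assumption that children of a Link selector run on the linked page.
    """
    merged = copy.deepcopy(list_sitemap)
    existing_ids = {s["id"] for s in merged["selectors"]}
    if link_selector_id not in existing_ids:
        raise ValueError(f"no selector with id {link_selector_id!r} in list sitemap")

    for selector in detail_sitemap["selectors"]:
        child = copy.deepcopy(selector)
        if child["id"] in existing_ids:
            raise ValueError(f"duplicate selector id: {child['id']!r}")
        child["parentSelectors"] = [
            link_selector_id if parent == "_root" else parent
            for parent in child["parentSelectors"]
        ]
        merged["selectors"].append(child)
        existing_ids.add(child["id"])

    if new_id is not None:
        merged["_id"] = new_id
    return merged


# Sitemap1 from the thread, unchanged apart from Python booleans.
sitemap1 = {
    "_id": "scrap_all_PETs-Links_from_SEARCH",
    "startUrl": [
        "https://napaluchu.waw.pl/zwierzeta/znalazly-dom?pet_page=178&pet_species=1&pet_date_from=2018-01-01&pet_date_to=2018-12-31"
    ],
    "selectors": [
        {"id": "pagination", "parentSelectors": ["_root", "pagination"],
         "paginationType": "auto", "selector": "a.btn-info:nth-of-type(1)",
         "type": "SelectorPagination"},
        {"id": "petsonPage", "parentSelectors": ["pagination"],
         "type": "SelectorElement", "selector": "div.row div.inner-box li a",
         "multiple": True},
        {"id": "Link", "parentSelectors": ["petsonPage"], "type": "SelectorLink",
         "selector": "_parent_", "multiple": False},
    ],
}

# Only two of Sitemap2's selectors are kept here to keep the sketch short.
sitemap2 = {
    "_id": "pet-2018",
    "startUrl": [],
    "selectors": [
        {"id": "PET_nazwa", "parentSelectors": ["_root"], "type": "SelectorText",
         "selector": ".pets-container h2", "multiple": False, "regex": ""},
        {"id": "PET_foto", "parentSelectors": ["_root"],
         "type": "SelectorElementAttribute",
         "selector": "img.pet-detail-main-image", "multiple": False,
         "extractAttribute": "src"},
    ],
}

combined = merge_sitemaps(sitemap1, sitemap2, new_id="pets-2018-combined")
print(json.dumps(combined, ensure_ascii=False))
```

If the nesting assumption is right, importing the printed JSON as a new sitemap would give one scrape that paginates, follows each pet link, and extracts the details in a single run.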