Is it possible, and if so, how? I would like to go through the whole pagination of a search URL and open each result to scrape its data.
Currently I'm using two sitemaps:
First, I go through the pagination to collect all the pet detail URLs:
- Sitemap1 (going from last page to first):
{"_id":"scrap_all_PETs-Links_from_SEARCH","startUrl":["https://napaluchu.waw.pl/zwierzeta/znalazly-dom?pet_page=178&pet_species=1&pet_date_from=2018-01-01&pet_date_to=2018-12-31"],"selectors":[{"id":"pagination","parentSelectors":["_root","pagination"],"paginationType":"auto","selector":"a.btn-info:nth-of-type(1)","type":"SelectorPagination"},{"id":"petsonPage","parentSelectors":["pagination"],"type":"SelectorElement","selector":"div.row div.inner-box li a","multiple":true},{"id":"Link","parentSelectors":["petsonPage"],"type":"SelectorLink","selector":"_parent_","multiple":false}]}
- Next, I export all the pet detail page URLs to CSV and run the second sitemap:
- Sitemap2:
{"_id":"pet-2018","startUrl":["<<URL#1>>","<<URL#2>>",...,"<<URL#last>>"],"selectors":[{"id":"PET_nazwa","parentSelectors":["_root"],"type":"SelectorText","selector":".pets-container h2","multiple":false,"regex":""},{"id":"PET_details","parentSelectors":["_root"],"type":"SelectorText","selector":".name ul","multiple":false,"regex":""},{"id":"PET_foto","parentSelectors":["_root"],"type":"SelectorElementAttribute","selector":"img.pet-detail-main-image","multiple":false,"extractAttribute":"src"},{"id":"PET-W typie rasy","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('W typie rasy:') strong","multiple":false,"regex":""},{"id":"PET-Wiek","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Wiek') strong","multiple":false,"regex":""},{"id":"PET-Płeć","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Płeć') strong","multiple":false,"regex":""},{"id":"PET-Waga","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Waga') strong","multiple":false,"regex":""},{"id":"PET-Nr","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Nr') strong","multiple":false,"regex":""},{"id":"PET-Status","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Status') strong","multiple":false,"regex":""},{"id":"PET-Przyjęty","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Przyjęty') strong","multiple":false,"regex":""},{"id":"PET-Wydany","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Wydany') strong","multiple":false,"regex":""},{"id":"PET-Znaleziony","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Znaleziony') strong","multiple":false,"regex":""},{"id":"PET-Boks","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Boks') strong","multiple":false,"regex":""},{"id":"PET-Grupa","parentSelectors":["_root"],"type":"SelectorText","selector":".petdetails li:contains('Grupa') strong","multiple":false,"regex":""}]}
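The manual step between the two sitemaps could probably be scripted outside the extension. Below is a minimal Python sketch of that idea. It assumes the CSV exported from the first sitemap has a `Link-href` column holding the detail URLs; the column name, the sample rows, the `example.org` URLs and the helper `build_detail_sitemap` are all assumptions for illustration, not part of Web Scraper itself.

```python
import csv
import io
import json


def build_detail_sitemap(csv_text, template, url_column="Link-href"):
    """Return a copy of `template` whose startUrl lists every URL found in the CSV.

    `url_column` is an assumed column name; adjust it to match the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        url = (row.get(url_column) or "").strip()
        # Skip empty cells and duplicates while keeping the original order.
        if url and url not in urls:
            urls.append(url)
    sitemap = dict(template)  # shallow copy so the template is left untouched
    sitemap["startUrl"] = urls
    return sitemap


# Hypothetical export from the first sitemap (made-up rows and URLs).
example_csv = (
    "web-scraper-order,Link,Link-href\n"
    "1-1,Burek,https://example.org/pet/1\n"
    "1-2,Azor,https://example.org/pet/2\n"
    "1-3,Burek,https://example.org/pet/1\n"
)

# In practice the template would be the second sitemap with its selectors filled in.
template = {"_id": "pet-2018", "startUrl": [], "selectors": []}

sitemap2 = build_detail_sitemap(example_csv, template)
print(json.dumps(sitemap2, ensure_ascii=False))
```

The printed JSON could then be pasted into the extension's sitemap import dialog in place of the hand-edited second sitemap.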
1. Is it possible to combine these into one sitemap, so that the second sitemap no longer has to be prepared manually?
2. Is it possible to set parameters like "Request interval (ms)" and "Page load delay (ms)" (e.g. to 20000) within the sitemap code?
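For question 1, one possible approach is to re-parent the detail selectors under the `Link` selector, so that they would be evaluated on each linked page. The sketch below does this in Python on trimmed-down copies of the two sitemaps. It assumes that Web Scraper runs child selectors of a Link selector on the page the link leads to; the helper `merge_sitemaps` and the shortened example data are hypothetical, and the result has not been tested against the actual site.

```python
import copy
import json


def merge_sitemaps(listing, details, link_id="Link"):
    """Append the detail selectors to the listing sitemap, re-parented under `link_id`."""
    merged = copy.deepcopy(listing)
    for selector in copy.deepcopy(details["selectors"]):
        # Selectors that hung off the root of the detail page now hang off the link.
        selector["parentSelectors"] = [
            link_id if parent == "_root" else parent
            for parent in selector["parentSelectors"]
        ]
        merged["selectors"].append(selector)
    return merged


# Trimmed-down stand-ins for Sitemap1 and Sitemap2 (only one selector each).
listing = {
    "_id": "scrap_all_PETs-Links_from_SEARCH",
    "startUrl": ["https://example.org/list?page=1"],  # placeholder URL
    "selectors": [
        {"id": "Link", "parentSelectors": ["petsonPage"], "type": "SelectorLink",
         "selector": "_parent_", "multiple": False},
    ],
}
details = {
    "_id": "pet-2018",
    "startUrl": [],
    "selectors": [
        {"id": "PET_nazwa", "parentSelectors": ["_root"], "type": "SelectorText",
         "selector": ".pets-container h2", "multiple": False, "regex": ""},
    ],
}

merged = merge_sitemaps(listing, details)
print(json.dumps(merged, ensure_ascii=False, indent=2))
```

The same function could be applied to the full sitemaps by loading them with `json.loads` instead of using the shortened dictionaries.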