Pagination array

The site I am scraping has multiple pages. The "next" link is in an array, but it is element (4) on the first page and then (5) on subsequent pages. I cannot get the scraper to go past the second page, and I only see results from the first or second page. Sitemap below, if someone can help.

{"_id":"proplist","startUrl":["https://www.proplist.com/"],"selectors":[{"id":"agentlist","type":"SelectorLink","selector":"nav.page-header__nav-main li.nav__item:nth-of-type(3) a.nav__link","parentSelectors":["_root"],"multiple":false,"delay":0},{"id":"wrapper","type":"SelectorElement","selector":"div.property-item","parentSelectors":["agentlist"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"span.name","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"tel","type":"SelectorText","selector":"span.telephone a","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"town","type":"SelectorText","selector":"span.city","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"postcode","type":"SelectorText","selector":"span.postcode","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"page_1","type":"SelectorElementClick","selector":"li.grid__item:nth-of-type(4) a.link span.text","parentSelectors":["_root"],"multiple":false,"delay":"2000","clickElementSelector":"","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"page_more","type":"SelectorElementClick","selector":"li.grid__item:nth-of-type(5) a.link span.text","parentSelectors":["page_1"],"multiple":true,"delay":0,"clickElementSelector":"li.grid__item:nth-of-type(5) a.link span.text","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

Try "li.grid__item:nth-last-of-type(1) a.link span.text"

I believe that should select the last element (the next arrow) regardless of whether there are 4 or 5 items in the pagination.
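To illustrate why counting from the end helps, here is a rough Python sketch that uses plain lists as stand-ins for the pagination `li` items. The item labels and the two helper functions are made up for illustration; they only mimic the counting rule of the CSS pseudo-classes and are not taken from the site or from Web Scraper.

```python
# Stand-ins for the <li> items in the pagination bar (labels are hypothetical).
first_page = ["1", "2", "3", "next"]           # "next" is item 4
later_page = ["prev", "1", "2", "3", "next"]   # "next" is item 5


def nth_of_type(items, n):
    """Mimic :nth-of-type(n): 1-based, counted from the start."""
    return items[n - 1] if 1 <= n <= len(items) else None


def nth_last_of_type(items, n):
    """Mimic :nth-last-of-type(n): 1-based, counted from the end."""
    return items[-n] if 1 <= n <= len(items) else None


for page in (first_page, later_page):
    # A fixed position from the start drifts; position 1 from the end does not.
    print(nth_of_type(page, 4), nth_last_of_type(page, 1))
```

With these made-up lists, the fixed position 4 lands on "next" for the first page but on "3" for the later page, while counting from the end lands on "next" both times.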

It worked on the first 3 pages.

Hi bretfeig,

I still cannot get it to go to the next page more than once! It also only reports elements from the first page. Where should the element selector be in the graph, i.e. parallel to the pagination? Sitemap again below.

{"_id":"proplist","startUrl":["https://www.proplist.com/"],"selectors":[{"id":"agentlist","type":"SelectorLink","selector":"nav.page-header__nav-main li.nav__item:nth-of-type(3) a.nav__link","parentSelectors":["_root"],"multiple":false,"delay":0},{"id":"wrapper","type":"SelectorElement","selector":"div.property-item","parentSelectors":["agentlist"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"span.name","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"tel","type":"SelectorText","selector":"span.telephone a","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"town","type":"SelectorText","selector":"span.city","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"postcode","type":"SelectorText","selector":"span.postcode","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"page_1","type":"SelectorElementClick","selector":"li.grid__item:nth-of-type(4) a.link span.text","parentSelectors":["agentlist"],"multiple":true,"delay":"2000","clickElementSelector":"li.grid__item:nth-of-type(4) a.link span.text","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

An alternative approach:

You can take advantage of the website's own search pages, which use simple HTTP GET URLs, e.g.:

https://www.proplist.com/agent/search/1/list
https://www.proplist.com/agent/search/2/list
...
https://www.proplist.com/agent/search/10/list

(Note: the site has 118 items at 12 per page, and I manually checked that page 10 is indeed the last page.)

You can change, in Metadata, the Start URL to:
https://www.proplist.com/agent/search/[1-10]/list
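
As a rough illustration of what the [1-10] range stands for, here is a small Python sketch that works out the last page from the 118 items at 12 per page and expands the range into concrete URLs. The function `expand_range_url` is my own hypothetical helper, not Web Scraper's actual implementation, and it only handles the simple [start-end] form.

```python
import math
import re


def expand_range_url(url):
    """Expand a single [start-end] numeric range in url into a list of URLs."""
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]  # no range present, nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


# 118 items at 12 per page (figures from the post) need 10 pages.
last_page = math.ceil(118 / 12)

urls = expand_range_url(
    f"https://www.proplist.com/agent/search/[1-{last_page}]/list"
)
print(len(urls), urls[0], urls[-1])
```

If the number of items on the site changes, the upper bound of the range in the Start URL would need to be updated by hand.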

Restructure your Sitemap to skip other Selectors, retaining only "wrapper" directly under root. This way, Webscraper will directly go through the search pages from 1 to 10.

This does not solve the technical issue you were describing, but is one shortcut given this Website structure.

Suggested sitemap:

{"_id":"a_test_pagination_list","startUrl":["https://www.proplist.com/agent/search/[1-10]/list"],"selectors":[{"id":"wrapper","type":"SelectorElement","selector":"div.property-item","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"span.name","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"tel","type":"SelectorText","selector":"span.telephone a","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"town","type":"SelectorText","selector":"span.city","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"postcode","type":"SelectorText","selector":"span.postcode","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0}]}

Note: Webscraper will move in reverse order, from page 10 to 1.

Hi jasond
Thanks for this. As you say, it is not an answer to the question but a solution to the problem.
Cheers

This sitemap works:

{"_id":"a_test_prop_list_pagination","startUrl":["https://www.proplist.com/agent/search"],"selectors":[{"id":"wrapper","type":"SelectorElement","selector":"div.property-item","parentSelectors":["_root","pagination"],"multiple":true,"delay":"2000"},{"id":"name","type":"SelectorText","selector":"span.name","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"tel","type":"SelectorText","selector":"span.telephone a","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"town","type":"SelectorText","selector":"span.city","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"postcode","type":"SelectorText","selector":"span.postcode","parentSelectors":["wrapper"],"multiple":false,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","selector":"li.grid__item:last-child a.link","parentSelectors":["_root","pagination"],"multiple":false,"delay":"2000"}]}

I'm not sure what the key difference from your sitemap is, but these are the differences I can see:

(1) Metadata "Start URL" is directly at https://www.proplist.com/agent/search (hence I skip "agentlist")

(2) I changed "page_1" to "pagination" to reflect it should be a general selector, not just for page 1,

(3) "pagination" parents are set to BOTH "_root" and itself ("pagination"), so that it keeps looping from page to page,

(4) "wrapper" parents are set to BOTH "_root" and "pagination", so that it repeats on every page,

(5) Delay for "wrapper" set to 2000 ms,

(6) "pagination" selector uses the CSS pseudo-class ":last-child", i.e. li.grid__item:last-child a.link.

Make sure to do (3) and (4) first, before you delete "agentlist", so that "wrapper" and "pagination" (formerly "page_1") are moved out of "agentlist" before it is deleted.
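
For what it's worth, here is how I picture points (3) and (4), as a minimal Python sketch. The `SITE` dictionary, its URLs and the agent names are all made up, and the `seen` set is my own guard in this sketch; this is a simplified model of the parent/child loop, not how Web Scraper is actually implemented.

```python
# Hypothetical site: each URL maps to (items on the page, next-page URL or None).
SITE = {
    "/search/1": (["agent A", "agent B"], "/search/2"),
    "/search/2": (["agent C", "agent D"], "/search/3"),
    "/search/3": (["agent E"], None),  # last page: no next link
}


def crawl(start_url):
    """Follow the "pagination" link page by page, collecting "wrapper" items."""
    results, queue, seen = [], [start_url], set()
    while queue:
        url = queue.pop(0)
        if url in seen:  # guard against visiting a page twice
            continue
        seen.add(url)
        items, next_url = SITE[url]
        # (4): "wrapper" has parents _root and pagination, so it runs on
        # the start page and on every page reached through pagination.
        results.extend(items)
        # (3): "pagination" is its own parent, so the next link is looked
        # for again on each page it leads to.
        if next_url is not None:
            queue.append(next_url)
    return results


print(crawl("/search/1"))
```

If "wrapper" only had "_root" as its parent, the `results.extend` step in this model would only ever run for the start page, which would match the symptom of seeing results from the first page only.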