Trouble with pagination

Hey there,

For a few days I've been trying to scrape data from the following website: https://regisonline.de/de/unternehmen/suche/?

The data is scraped and saved up to the second page, but from the third page on, the scraper only clicks through the page numbers without saving any further information.

Although the site is in German, I hope you can help me. I have watched a lot of tutorials and read through a lot of forum posts, so I am really hoping you can help me with this issue.

Thank you very much in advance!

Best,
Elena

Here is the sitemap I am currently working on:

{"_id":"regisonline_v5","startUrl":["https://regisonline.de/de/unternehmen/suche/?"],"selectors":[{"id":"klick liste","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div#resultview-switches","multiple":false,"delay":"2000","clickElementSelector":"button:nth-of-type(2) span.ui-button-text","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"Kontakt elemente","type":"SelectorElement","parentSelectors":["pagination click"],"selector":" parent ","multiple":true,"delay":0},{"id":"unternehmen","type":"SelectorText","parentSelectors":["Kontakt elemente"],"selector":"h3","multiple":false,"regex":"","delay":0},{"id":"beschäftigte","type":"SelectorText","parentSelectors":["Kontakt elemente"],"selector":"span.i-data-bgkl","multiple":false,"regex":"","delay":0},{"id":"adresse","type":"SelectorText","parentSelectors":["Kontakt elemente"],"selector":"span.i-data-adresse","multiple":false,"regex":"","delay":0},{"id":"alle infos link","type":"SelectorLink","parentSelectors":["Kontakt elemente"],"selector":"a","multiple":false,"delay":0},{"id":"mail","type":"SelectorText","parentSelectors":["alle infos link"],"selector":"a.i-link-mailto","multiple":false,"regex":"","delay":0},{"id":"website","type":"SelectorText","parentSelectors":["alle infos link"],"selector":".i-unt-intro-homepage a","multiple":false,"regex":"","delay":0},{"id":"telefon","type":"SelectorText","parentSelectors":["alle infos link"],"selector":"div.i-unt-intro-phone","multiple":false,"regex":"","delay":0},{"id":"branche","type":"SelectorText","parentSelectors":["alle infos link"],"selector":".i-content > ul > li","multiple":false,"regex":"","delay":0},{"id":"ansprechpartner","type":"SelectorText","parentSelectors":["alle infos link"],"selector":".content > span","multiple":false,"regex":"","delay":0},{"id":"pagination click","type":"SelectorElementClick","parentSelectors":["_root","pagination click"],"selector":"div.i-result-item","multiple":true,"delay":"5000","clickElementSelector":"span[title='Nächste Seite']","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueCSSSelector"}]}

Ok - This was a fun one to sort out.

Let me see if I can articulate how I arrived at the solution.

First, this sitemap requires a series of nested element clicks:

  1. First Element Click - [MAP TO LIST]
  • Sets the click selector to switch the view from Map to List. The selector element is the entire page; it needs to fully cover everything. It's a single (not multiple) click-once selector tied to unique HTML, with a slight delay.

(as a child to that)

  2. Second Element Click (Pagination) -

  • Click selector = the "next page" arrow. I used [title='Next page'] font font

  • Selector = each line of the list (div.i-result-item)

  • Click type = Click more

  • Click element uniqueness = Unique HTML

  • Multiple = checked

  • Discard = Discard when click element exists (note: I am not entirely sure why; it was a guess that worked)

(as a child to the pagination)

This is where I piled on the text selectors to grab everything. I also included a link selector to go into each record. Inside it (as its children) I added all the text selectors from inside the record.

{"_id":"aaatest","startUrl":["https://regisonline.de/de/unternehmen/suche/?page=2"],"selectors":[{"id":"Click list","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.container","multiple":false,"delay":"2000","clickElementSelector":"#resultview-switches > button:nth-child(2)","clickType":"clickOnce","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueHTML"},{"id":"Paginate","type":"SelectorElementClick","parentSelectors":["Click list"],"selector":"div.i-result-item","multiple":true,"delay":0,"clickElementSelector":"[title='Next page'] font font","clickType":"clickMore","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueHTML"},{"id":"Name","type":"SelectorText","parentSelectors":["Paginate"],"selector":"h3","multiple":false,"regex":"","delay":0},{"id":"Location","type":"SelectorText","parentSelectors":["Paginate"],"selector":".i-data-georef font font","multiple":false,"regex":"","delay":0},{"id":"Employees","type":"SelectorText","parentSelectors":["Paginate"],"selector":".i-data-bgkl font font","multiple":false,"regex":"","delay":0},{"id":"Address","type":"SelectorText","parentSelectors":["Paginate"],"selector":".i-data-adresse font font","multiple":false,"regex":"","delay":0},{"id":"Click into Profile","type":"SelectorLink","parentSelectors":["Paginate"],"selector":"a","multiple":false,"delay":0},{"id":"Formal Name","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":".i-unt-intro-name font font","multiple":false,"regex":"","delay":0},{"id":"Address 1","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":".i-unt-intro-address font:nth-of-type(1) font","multiple":false,"regex":"","delay":0},{"id":"address 2","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":".i-data font:nth-of-type(2) font","multiple":false,"regex":"","delay":0},{"id":"Phone","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":".i-unt-intro-phone","multiple":false,"regex":"","delay":0},{"id":"Fax","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":"div.i-unt-intro-fax","multiple":false,"regex":"","delay":0},{"id":"Email","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":"div.i-unt-intro-email","multiple":false,"regex":"","delay":0},{"id":"Webpage","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":"div.i-unt-intro-homepage","multiple":false,"regex":"","delay":0},{"id":"Branch","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":".i-content > ul > li > font font","multiple":false,"regex":"","delay":0},{"id":"Sub-Branch","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":"li li > font font","multiple":false,"regex":"","delay":0},{"id":"Management","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":"div.i-content-multiline:nth-of-type(1) .content div > font font","multiple":false,"regex":"","delay":0},{"id":"Management -Email","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":".i-ansprechpartner a","multiple":false,"regex":"","delay":0},{"id":"Last Updated","type":"SelectorText","parentSelectors":["Click into Profile"],"selector":".i-meta-last-update div","multiple":false,"regex":"","delay":0}]}
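
The nesting described above (list click, then pagination, then the text and link selectors underneath) can be easier to check as a tree than as one long JSON line. Here is a minimal Python sketch that prints the parent/child hierarchy. The `sitemap` dict is a trimmed-down copy of the sitemap above with only a few selectors kept, and `selector_tree` is a hypothetical helper of my own, not part of Web Scraper.

```python
# Trimmed-down copy of the sitemap above: only ids, types and parents are kept.
sitemap = {
    "_id": "aaatest",
    "selectors": [
        {"id": "Click list", "type": "SelectorElementClick", "parentSelectors": ["_root"]},
        {"id": "Paginate", "type": "SelectorElementClick", "parentSelectors": ["Click list"]},
        {"id": "Name", "type": "SelectorText", "parentSelectors": ["Paginate"]},
        {"id": "Click into Profile", "type": "SelectorLink", "parentSelectors": ["Paginate"]},
        {"id": "Phone", "type": "SelectorText", "parentSelectors": ["Click into Profile"]},
    ],
}


def selector_tree(sitemap, parent="_root", depth=0, seen=()):
    """Return the selector hierarchy as a list of indented lines."""
    lines = []
    for sel in sitemap["selectors"]:
        if parent in sel["parentSelectors"]:
            lines.append("  " * depth + f'{sel["id"]} ({sel["type"]})')
            # Do not descend again into a selector already on the current path,
            # so a selector that lists itself as a parent cannot recurse forever.
            if sel["id"] not in seen:
                lines.extend(
                    selector_tree(sitemap, sel["id"], depth + 1, seen + (sel["id"],))
                )
    return lines


print("\n".join(selector_tree(sitemap)))
```

To run it on a full sitemap, paste the exported JSON into a file and load it with `json.load` instead of the hand-written dict.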

I'm looking at the preview and it looks like it's getting almost everything. There are a few 'NULL' values popping up. I'd need to check whether the data just isn't available or whether the element selector I chose was relative (i.e. it changes from page to page).
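
One rough way to tell those two cases apart is to count the empty cells per column in the exported data. Below is a small Python sketch of that idea. The sample rows are made up, and I'm assuming empty values show up either as blank cells or as the literal text "null"; check your own export for what it actually contains.

```python
import csv
import io

# Made-up sample standing in for an exported CSV; real exports will differ.
SAMPLE_EXPORT = """Name,Phone,Fax
Firma A,01234 5678,null
Firma B,,null
Firma C,0987 6543,
"""


def null_counts(csv_text):
    """Count, per column, the cells that are empty or the literal 'null'."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            is_null = (value or "").strip().lower() in ("", "null")
            counts[column] = counts.get(column, 0) + int(is_null)
    return counts


print(null_counts(SAMPLE_EXPORT))
```

A column that is empty on every row (Fax in the sample) hints at a selector that never matches, while a column that is empty only here and there (Phone) more likely means the data is simply missing for those records.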

Hey Bret,

First of all: thank you so much for taking the time to look into this!!

Unfortunately, it is still only loading data from the first page and not moving on to the following 600 pages. :thinking:

Ok, I think the issue is that I clicked on English and built the sitemap from there.
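
For what it's worth, the first sitemap shows the symptom: many of its selectors end in "font font", and the click selector uses the English title 'Next page'. As far as I know, in-browser translation wraps text in nested font elements, so those selectors would only match the translated page. Here is a rough Python sketch that flags such selectors. The `FONT_TAG` pattern and the `translated_page_suspects` helper are my own heuristic (not a Web Scraper feature), and the `sitemap` dict is a shortened copy with three selectors.

```python
import re

# Matches "font" used as a tag name, but not as part of a class or id
# such as ".font-bold" or "#fontsize" (heuristic, not a full CSS parser).
FONT_TAG = re.compile(r"(?<![\w.#-])font(?![\w-])")

# Shortened copy of the first sitemap: three selectors only.
sitemap = {
    "selectors": [
        {"id": "Paginate", "selector": "div.i-result-item",
         "clickElementSelector": "[title='Next page'] font font"},
        {"id": "Location", "selector": ".i-data-georef font font"},
        {"id": "Phone", "selector": ".i-unt-intro-phone"},
    ]
}


def translated_page_suspects(sitemap):
    """List (id, field, value) for selectors that reference <font> tags."""
    suspects = []
    for sel in sitemap["selectors"]:
        for key in ("selector", "clickElementSelector"):
            value = sel.get(key, "")
            if FONT_TAG.search(value):
                suspects.append((sel["id"], key, value))
    return suspects


for sel_id, key, value in translated_page_suspects(sitemap):
    print(f"{sel_id}: {key} = {value}")
```

Anything it prints is worth re-selecting on the untranslated German page.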

Try this out. It's paginating on my side. It's going to run through all 600 pages before returning any data.

{"_id":"reg_is_online_mapfix","startUrl":["https://regisonline.de/de/unternehmen/suche/?"],"selectors":[{"id":"Click list","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.container","multiple":false,"delay":"4000","clickElementSelector":"button.ui-state-default:nth-of-type(2)","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueHTML"},{"id":"Paginate","type":"SelectorElementClick","parentSelectors":["Click list"],"selector":"div.i-result-item","multiple":true,"delay":"500","clickElementSelector":"[title=\"Nächste Seite\"]","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"Name","type":"SelectorText","parentSelectors":["Paginate"],"selector":"h3","multiple":false,"regex":"","delay":0},{"id":"Location","type":"SelectorText","parentSelectors":["Paginate"],"selector":"span.i-data-georef","multiple":false,"regex":"","delay":0},{"id":"Employees","type":"SelectorText","parentSelectors":["Paginate"],"selector":".i-data-bgkl font font","multiple":false,"regex":"","delay":0},{"id":"Address","type":"SelectorElementClick","parentSelectors":["Paginate"],"selector":"div.i-result-detail","multiple":false,"delay":"2","clickElementSelector":"h3","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"address","type":"SelectorText","parentSelectors":["Address"],"selector":"span.i-data-adresse","multiple":false,"regex":"","delay":0},{"id":"beschaeftigte","type":"SelectorText","parentSelectors":["Address"],"selector":".i-beschaeftigte div.col-xs-12+","multiple":false,"regex":"","delay":0},{"id":"Click into profile","type":"SelectorLink","parentSelectors":["Address"],"selector":"a","multiple":false,"delay":0},{"id":"Full_Name","type":"SelectorText","parentSelectors":["Click into profile"],"selector":"div.i-unt-intro-name","multiple":false,"regex":"","delay":0},{"id":"full_address","type":"SelectorText","parentSelectors":["Click into profile"],"selector":".i-unt-intro-address span","multiple":false,"regex":"","delay":0},{"id":"full_phone","type":"SelectorText","parentSelectors":["Click into profile"],"selector":"div.i-unt-intro-phone","multiple":false,"regex":"","delay":0},{"id":"full_fax","type":"SelectorText","parentSelectors":["Click into profile"],"selector":"div.i-unt-intro-fax","multiple":false,"regex":"","delay":0},{"id":"full_email","type":"SelectorText","parentSelectors":["Click into profile"],"selector":"a.i-link-mailto","multiple":false,"regex":"","delay":0},{"id":"full_web","type":"SelectorText","parentSelectors":["Click into profile"],"selector":".i-unt-intro-homepage a","multiple":false,"regex":"","delay":0}]}
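
Since nothing is returned until the pagination has run through every page, it may help to know roughly how long to wait. Here is a back-of-the-envelope Python sketch using the delays from the sitemap above (4000 ms for the list click, 500 ms per pagination click). It assumes one "next page" click per additional page and ignores page load and extraction time, so treat the result as a lower bound only. `min_wait_seconds` is just an illustrative helper.

```python
def min_wait_seconds(pages, click_list_delay_ms, paginate_delay_ms):
    """Lower bound on waiting time: one list click plus one click per further page."""
    clicks = max(pages - 1, 0)  # the first page needs no "next page" click
    return (click_list_delay_ms + clicks * paginate_delay_ms) / 1000


# 600 pages with the delays from the sitemap above.
print(min_wait_seconds(600, 4000, 500))  # 303.5
```

So even in the best case the scraper spends about five minutes just clicking through the pages before the first record appears.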