Scrolling not working


I'm trying to extract data (the model and type of machine, green text only) from a site with dynamic scrolling, but the web scraper does not scroll the page.

Can someone help me with this?

Url: https://partscatalog.deere.com/jdrc/search/type/parts/term/JD10436

Sitemap:
{"_id":"katalog_john_deere","startUrl":["https://partscatalog.deere.com/jdrc/search/type/parts/term/JD10436"],"selectors":[{"id":"Maszyna","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":".linkList a","multiple":true,"delay":"2000"},{"id":"kod","type":"SelectorText","parentSelectors":["_root"],"selector":".linkList a","multiple":true,"regex":"","delay":0}]}

This one has got me stumped; I can't figure out which selector will make the scroller work. Perhaps a more experienced user can help you.

Anyway, if this is just a one-off or small scrape, perhaps you can use the "manual scroll" method as a temporary workaround. Basically, you start the scrape with a very long page load delay and manually scroll the page yourself, with the mouse or the PageDn key.

Using this method, I was able to get all 161 lines with your code.

If you want to try this method, here is how to get the best results:

  1. Remove the scroller part from your code. You won't need it for now because YOU will be doing the scrolling.
  2. Zoom out your browser view as far as possible (press Ctrl-minus or use Settings -> Zoom). This forces the page to load more lines into your browser at once.
  3. Maximise the height of your browser window, for the same reason as in step 2.
  4. Set the Page load delay to a sufficiently long period, about 15-20 secs (15000-20000 ms).
  5. Click Start Scraping and wait for the page to load (but not too long).
  6. Start scrolling the page with the mouse or the PageDn key. You will need to finish all scrolling before the Page load delay expires, because Web Scraper will just scrape whatever has been loaded up to that point. (A console snippet that can do this scrolling for you is sketched below.)
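
If you'd rather not hold down PageDn for the whole delay, a rough way to semi-automate step 6 is to paste a scroll loop into the DevTools console of the window where the scrape is running. This is only a sketch: the container selector is an assumption, so replace it with whatever element actually scrolls on your page.

  // Paste into the DevTools console of the scraping window right after
  // clicking "Start scraping". CONTAINER_SELECTOR is an assumption --
  // change it to whatever element actually scrolls on the page
  // (document.scrollingElement covers pages where the whole window scrolls).
  const CONTAINER_SELECTOR = "div.content";
  const el = document.querySelector(CONTAINER_SELECTOR) ?? document.scrollingElement;

  if (el) {
    // Scroll one viewport height per second until the bottom is reached.
    // All of this has to finish before the Page load delay runs out.
    const timer = setInterval(() => {
      el.scrollTop += el.clientHeight;
      if (el.scrollTop + el.clientHeight >= el.scrollHeight - 1) {
        clearInterval(timer);
      }
    }, 1000);
  }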


Hello,

Thanks for the quick reply.
I know that manual scrolling works (I tried it before posting), but I need to scrape data for hundreds of parts, so manual work is not a good solution.

Maybe someone else can help with it?

Good news: there is an undocumented feature which allows the scroller to work. The feature is described here, with scroller code for the John Deere site: Scroller does not work on certain websites

Great news, it works :)
One more thing: I just want to scrape the green text, but because of the markup this may not be possible?
Is it possible to take only that part of the element's text and leave out the rest?

Ya sure, just use this Regex:

.+\n

It means: match everything up to (and including) the first linefeed, i.e. match only the first line.
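
If you want to sanity-check the pattern outside Web Scraper, a quick sketch helps: the extension is a browser add-on, so its regex option presumably behaves like JavaScript's RegExp. The sample row text below is made up for illustration.

  // A made-up two-line row, standing in for the text of one .linkList entry.
  const rowText = "Model and type of machine (the green text)\nsecond line of the entry";

  // The regex from above: everything up to and including the first linefeed.
  const match = rowText.match(/.+\n/);
  const firstLine = match ? match[0].trim() : rowText;

  console.log(firstLine); // -> "Model and type of machine (the green text)"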

Thanks!
This is what I want :)

There seems to be some renewed interest in this, so here's the updated John Deere sitemap for 2021. It makes use of the under-documented scrollElementSelector parameter, plus a couple of regexes to separate the two lines in each row (.+\n keeps Line 1, and its complement \n.+ keeps Line 2).

{"_id":"john-deere-scroll-2021","startUrl":["https://partscatalog.deere.com/jdrc/search/type/parts/term/JD10436"],"selectors":[{"id":"scroll","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":".linkList a","multiple":true,"delay":2000,"scrollElementSelector":"div.content"},{"id":"Link","type":"SelectorLink","parentSelectors":["scroll"],"selector":"_parent_","multiple":false},{"id":"Line 1","type":"SelectorText","parentSelectors":["scroll"],"selector":"_parent_","multiple":false,"regex":".+\\n"},{"id":"Line 2","type":"SelectorText","parentSelectors":["scroll"],"selector":"_parent_","multiple":false,"regex":"\\n.+"}]}