The scroller does not seem to work properly for this site, but there is a manual workaround, used in the example sitemap below, which also helps prevent excessive scrolling. This example scrolls 4 times and clicks Load More 4 times, which should get about 500 results (the exact number depends on your browser zoom setting).
You can remove or add as many scroller/load more pairs as you need.
Notes
- It is better to filter the search results down to a manageable number than to try to scrape everything. In this example, I am only scraping "Toyota Hilux". You can still put multiple start URLs in one sitemap.
- When adding scroller/load more pairs, be sure to increment the new scroller's selector accordingly: add 120 to the nth-of-type offset for each new scroller (each Load More loads 120 new results), so a fifth scroller would use n+482. Also make sure each new selector has a unique name, like "Scroller5".
- You can zoom out the page to force the website to load more results at a time.
- It is probably easier to edit the JSON directly if you want to add scroller/load more pairs. Use a good editor that recognizes JSON, such as Sublime Text, Notepad++ or VS Code. It is also a good idea to validate your sitemap with a checker such as JSONLint.
- As you add more scrollers, you will probably need to increase their delay. In this example, Scroller4 has a delay of 3700 ms, compared to 3500 ms for the three earlier scrollers.
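If you would rather not copy and paste pairs by hand, the pattern described in the notes above can be generated with a few lines of Python. This is only a sketch: the helper name `make_pair` and the 3900 ms delay used for the fifth scroller are my own assumptions, not something the site or Web Scraper requires. The field values are copied from the sitemap in this post.

```python
import json

# Base card selector, copied from Scroller1 in the sitemap in this post.
CARD = ("div.ContainerCardVehicle > div > div[class^='sc'] > "
        "div[data-qa^='vehicle_card']:odd")

def make_pair(n, scroll_delay=3700, click_delay=4500):
    """Return the [scroller, load more] selector pair number n (n >= 1).

    make_pair is a hypothetical helper, not part of Web Scraper.
    """
    # Scroller1 has no offset; Scroller2 uses n+122, Scroller3 n+242, ... (+120 each).
    offset = "" if n == 1 else f":nth-of-type(n+{120 * (n - 1) + 2})"
    scroller = {
        "delay": scroll_delay,
        "elementLimit": 300,
        "id": f"Scroller{n}",
        "multiple": True,
        "parentSelectors": ["_root"],
        "selector": CARD + offset,
        "type": "SelectorElementScroll",
    }
    click = {
        "clickActionType": "real",
        "clickElementSelector": "div.ContainerButtons button.Button--more-items",
        "clickElementUniquenessType": "uniqueText",
        "clickType": "clickOnce",
        "delay": click_delay,
        "discardInitialElements": "do-not-discard",
        "id": f"Click Load More{n}",
        "multiple": False,
        "parentSelectors": ["_root"],
        "selector": "div > div.ContainerCardVehicle",
        "type": "SelectorElementClick",
    }
    return [scroller, click]

# Assumed delay of 3900 ms for the fifth scroller; tune it for your connection.
pair5 = make_pair(5, scroll_delay=3900)
print(json.dumps(pair5, indent=1))
```

Paste the two printed objects into the "selectors" array after "Click Load More4", separated by commas like the existing entries.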
Sitemap:
{
"_id": "webmotors-toyota-hilux",
"startUrl": ["https://www.webmotors.com.br/carros/estoque/toyota/hilux?estadocidade=estoque&marca1=TOYOTA&modelo1=HILUX&idcmpint=t1:c17:m07:webmotors:modelo::toyota%20hilux&autocomplete=toyota%20hilux&autocompleteTerm=TOYOTA%20HILUX"],
"selectors": [{
"delay": 3500,
"elementLimit": 300,
"id": "Scroller1",
"multiple": true,
"parentSelectors": ["_root"],
"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd",
"type": "SelectorElementScroll"
}, {
"clickActionType": "real",
"clickElementSelector": "div.ContainerButtons button.Button--more-items",
"clickElementUniquenessType": "uniqueText",
"clickType": "clickOnce",
"delay": 4500,
"discardInitialElements": "do-not-discard",
"id": "Click Load More1",
"multiple": false,
"parentSelectors": ["_root"],
"selector": "div > div.ContainerCardVehicle",
"type": "SelectorElementClick"
},
{
"delay": 3500,
"elementLimit": 300,
"id": "Scroller2",
"multiple": true,
"parentSelectors": ["_root"],
"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd:nth-of-type(n+122)",
"type": "SelectorElementScroll"
}, {
"clickActionType": "real",
"clickElementSelector": "div.ContainerButtons button.Button--more-items",
"clickElementUniquenessType": "uniqueText",
"clickType": "clickOnce",
"delay": 4500,
"discardInitialElements": "do-not-discard",
"id": "Click Load More2",
"multiple": false,
"parentSelectors": ["_root"],
"selector": "div > div.ContainerCardVehicle",
"type": "SelectorElementClick"
},
{
"delay": 3500,
"elementLimit": 300,
"id": "Scroller3",
"multiple": true,
"parentSelectors": ["_root"],
"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd:nth-of-type(n+242)",
"type": "SelectorElementScroll"
}, {
"clickActionType": "real",
"clickElementSelector": "div.ContainerButtons button.Button--more-items",
"clickElementUniquenessType": "uniqueText",
"clickType": "clickOnce",
"delay": 4500,
"discardInitialElements": "do-not-discard",
"id": "Click Load More3",
"multiple": false,
"parentSelectors": ["_root"],
"selector": "div > div.ContainerCardVehicle",
"type": "SelectorElementClick"
},
{
"delay": 3700,
"elementLimit": 300,
"id": "Scroller4",
"multiple": true,
"parentSelectors": ["_root"],
"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd:nth-of-type(n+362)",
"type": "SelectorElementScroll"
}, {
"clickActionType": "real",
"clickElementSelector": "div.ContainerButtons button.Button--more-items",
"clickElementUniquenessType": "uniqueText",
"clickType": "clickOnce",
"delay": 4500,
"discardInitialElements": "do-not-discard",
"id": "Click Load More4",
"multiple": false,
"parentSelectors": ["_root"],
"selector": "div > div.ContainerCardVehicle",
"type": "SelectorElementClick"
},
{
"id": "Car",
"multiple": false,
"parentSelectors": ["Result elements"],
"regex": "",
"selector": "h2",
"type": "SelectorText"
}, {
"id": "Price",
"multiple": false,
"parentSelectors": ["Result elements"],
"regex": "",
"selector": "div#valorVerParcelas strong",
"type": "SelectorText"
}, {
"id": "mileage",
"multiple": false,
"parentSelectors": ["Result elements"],
"regex": "",
"selector": "div > div[class^='sc'] > a[class^='sc'] > div[class^='sc'] > div:nth-child(2) > span:contains('km')",
"type": "SelectorText"
}, {
"id": "Result elements",
"multiple": true,
"parentSelectors": ["_root"],
"selector": "main div div[data-qa^='vehicle_card']",
"type": "SelectorElement"
}, {
"id": "Link",
"linkType": "linkFromHref",
"multiple": false,
"parentSelectors": ["Result elements"],
"selector": "div#valorVerParcelas a",
"type": "SelectorLink"
}]
}
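
After editing the JSON by hand, a quick local check can catch the two mistakes mentioned in the notes: broken JSON and duplicate selector names. The sketch below is an assumption-laden convenience, not an official tool. `check_sitemap` is a hypothetical helper, and it only checks syntax, unique ids, and that every parent selector exists.

```python
import json

def check_sitemap(text):
    """Return a list of problems found in a sitemap JSON string (empty if none).

    check_sitemap is a hypothetical helper, not part of Web Scraper.
    """
    try:
        sitemap = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(sitemap, dict):
        return ["top level of the sitemap must be a JSON object"]

    problems = []
    selectors = sitemap.get("selectors", [])
    ids = [s.get("id", "") for s in selectors]

    # Every selector needs its own name, e.g. "Scroller5" rather than a second "Scroller4".
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate selector id: {dup}")

    # Every parent must be "_root" or the id of another selector in the sitemap.
    known = set(ids) | {"_root"}
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent not in known:
                problems.append(f"{sel.get('id', '')}: unknown parent {parent}")
    return problems

# Small made-up sample with a deliberately duplicated id.
sample = json.dumps({
    "_id": "demo",
    "selectors": [
        {"id": "Scroller1", "parentSelectors": ["_root"]},
        {"id": "Scroller1", "parentSelectors": ["_root"]},
    ],
})
print(check_sitemap(sample))
```

To check a real sitemap, save it to a file and pass the file contents to `check_sitemap`. An empty list means none of these particular problems were found.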