Scroll Down and Click on Load More

Hi!

I'm trying to scrape a website and need to get all the listings on it. The problem is that the page stops scrolling at some point and a "Load more" button appears. How can I make these two elements (scroll down and "Load more") work together so I get the data I need?

Url: Webmotors | Compre, venda e financie carros usados, novos e motos

Sitemap: {"_id":"webmotors","startUrl":["Webmotors | Compre, venda e financie carros usados, novos e motos span.sc-dNLxif","multiple":false,"regex":""},{"id":"kilometragem","parentSelectors":["box"],"type":"SelectorText","selector":"div.sc-ksYbfQ:nth-of-type(2) span","multiple":false,"regex":""},{"id":"cidade","parentSelectors":["box"],"type":"SelectorText","selector":".sc-ksYbfQ a","multiple":false,"regex":""},{"id":"preco","parentSelectors":["box"],"type":"SelectorText","selector":"strong","multiple":false,"regex":""},{"id":"click","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"button.Button--more-items","clickElementUniquenessType":"uniqueText","clickType":"clickMore","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"div.ContainerButtons"}]}

Hello! I am facing the same situation, on the same website. Did you find a way to combine "scroll down" and "Load more"?

I tried combining "scroll down" and "element click", and it partially worked. It loaded the whole result list in the browser, but it failed to open each link and scrape the data I needed. It only scraped 119 rows, which I think is the maximum number of rows shown before the page asks you to click the "Load more" button.

Would someone be able to help me?

{"_id":"Webmotors3","startUrl":["Webmotors | Compre, venda e financie carros usados, novos e motos, button.Button--more-items","multiple":true,"delay":2000,"elementLimit":500},{"id":"anuncios","parentSelectors":["_root","scrow"],"type":"SelectorLink","selector":"a.CardVehicle__linkPhoto:nth-of-type(1)","multiple":true,"linkType":"linkFromHref"},{"id":"nome","parentSelectors":["anuncios"],"type":"SelectorText","selector":".VehicleDetails__header__title strong","multiple":false,"regex":""},{"id":"modelo","parentSelectors":["anuncios"],"type":"SelectorText","selector":"span.VehicleDetails__header__description","multiple":false,"regex":""},{"id":"ano","parentSelectors":["anuncios"],"type":"SelectorText","selector":"strong#VehiclePrincipalInformationYear","multiple":false,"regex":""},{"id":"quilometragem","parentSelectors":["anuncios"],"type":"SelectorText","selector":"strong#VehiclePrincipalInformatiOnodometer","multiple":false,"regex":""},{"id":"preco","parentSelectors":["anuncios"],"type":"SelectorText","selector":"strong.Forms__vehicleSendProposal__container__price","multiple":false,"regex":""},{"id":"fipe","parentSelectors":["anuncios"],"type":"SelectorText","selector":".VehicleDetailsFipe__price--fipe strong","multiple":false,"regex":""},{"id":"mediaweb","parentSelectors":["anuncios"],"type":"SelectorText","selector":".VehicleDetailsFipe__price--webmotors strong","multiple":false,"regex":""},{"id":"carregarmais","parentSelectors":["_root"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":"button.Button--more-items","clickElementUniquenessType":"uniqueText","clickType":"clickMore","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"div.dPEfoP"}]}

The scroller does not seem to work properly for this site, but there is a "manual" workaround, used in the example sitemap below, which also helps prevent excessive scrolling. This example scrolls 4 times and clicks Load More 4 times, which should get about 500 results (depending on your browser zoom setting).
You can add or remove as many scroller/load more pairs as you need.

Notes

  1. It is better to filter the search results to reduce their number instead of trying to get all results. In this example, I am only scraping "Toyota Hilux". You can still have multiple URLs in one sitemap.
  2. When adding scroller/load more pairs, be sure to increment the scroller's selector accordingly. Add 120 for each new scroller (each Load More loads 120 new results). Also make sure each new addition has a unique name, like "Scroller5".
  3. You can zoom out the page to force the website to load more results at a time.
  4. It is probably easier to edit the JSON directly if you want to add scroller/load more pairs. Use a good editor that recognizes JSON, such as Sublime, Notepad++ or VS Code. It is also good to validate your sitemap with a checker such as JSONLint.
  5. As you add more scrollers, you will probably need to increase the delay for the scrollers. In this example, Scroller4 has a longer delay than the 3 earlier scrollers.
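
If you edit the JSON by hand, notes 2 and 4 can also be checked with a few lines of Python. This is only a minimal sketch: `check_sitemap` is a hypothetical helper (not part of Web Scraper), and it only assumes the sitemap has a top-level "selectors" list whose entries carry "id" and "parentSelectors", as in the sitemaps in this thread.

```python
import json


def check_sitemap(text: str) -> list[str]:
    """Return a list of problems found in a sitemap JSON string (hypothetical helper)."""
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    sitemap = json.loads(text)
    selectors = sitemap.get("selectors", [])
    ids = [s["id"] for s in selectors]
    problems = []

    # Note 2: every scroller / load more selector needs a unique name.
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate selector id: {dup}")

    # Every parent must be "_root" or an id defined somewhere in the sitemap.
    known = set(ids) | {"_root"}
    for s in selectors:
        for parent in s.get("parentSelectors", []):
            if parent not in known:
                problems.append(f"{s['id']}: unknown parent {parent}")
    return problems


# Small made-up sitemap with two deliberate mistakes.
sample = (
    '{"_id": "demo", "selectors": ['
    '{"id": "Scroller1", "parentSelectors": ["_root"]},'
    '{"id": "Scroller1", "parentSelectors": ["Results"]}]}'
)
for problem in check_sitemap(sample):
    print(problem)
```

An empty list means the sitemap parsed and these two checks passed; it does not say anything about whether the CSS selectors match the page.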

Sitemap:

{
	"_id": "webmotors-toyota-hilux",
	"startUrl": ["https://www.webmotors.com.br/carros/estoque/toyota/hilux?estadocidade=estoque&marca1=TOYOTA&modelo1=HILUX&idcmpint=t1:c17:m07:webmotors:modelo::toyota%20hilux&autocomplete=toyota%20hilux&autocompleteTerm=TOYOTA%20HILUX"],
	"selectors": [{
		"delay": 3500,
		"elementLimit": 300,
		"id": "Scroller1",
		"multiple": true,
		"parentSelectors": ["_root"],
		"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd",
		"type": "SelectorElementScroll"
	}, {
		"clickActionType": "real",
		"clickElementSelector": "div.ContainerButtons button.Button--more-items",
		"clickElementUniquenessType": "uniqueText",
		"clickType": "clickOnce",
		"delay": 4500,
		"discardInitialElements": "do-not-discard",
		"id": "Click Load More1",
		"multiple": false,
		"parentSelectors": ["_root"],
		"selector": "div > div.ContainerCardVehicle",
		"type": "SelectorElementClick"
	},

	{
		"delay": 3500,
		"elementLimit": 300,
		"id": "Scroller2",
		"multiple": true,
		"parentSelectors": ["_root"],
		"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd:nth-of-type(n+122)",
		"type": "SelectorElementScroll"
	}, {
		"clickActionType": "real",
		"clickElementSelector": "div.ContainerButtons button.Button--more-items",
		"clickElementUniquenessType": "uniqueText",
		"clickType": "clickOnce",
		"delay": 4500,
		"discardInitialElements": "do-not-discard",
		"id": "Click Load More2",
		"multiple": false,
		"parentSelectors": ["_root"],
		"selector": "div > div.ContainerCardVehicle",
		"type": "SelectorElementClick"
	},

	{
		"delay": 3500,
		"elementLimit": 300,
		"id": "Scroller3",
		"multiple": true,
		"parentSelectors": ["_root"],
		"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd:nth-of-type(n+242)",
		"type": "SelectorElementScroll"
	}, {
		"clickActionType": "real",
		"clickElementSelector": "div.ContainerButtons button.Button--more-items",
		"clickElementUniquenessType": "uniqueText",
		"clickType": "clickOnce",
		"delay": 4500,
		"discardInitialElements": "do-not-discard",
		"id": "Click Load More3",
		"multiple": false,
		"parentSelectors": ["_root"],
		"selector": "div > div.ContainerCardVehicle",
		"type": "SelectorElementClick"
	},

	{
		"delay": 3700,
		"elementLimit": 300,
		"id": "Scroller4",
		"multiple": true,
		"parentSelectors": ["_root"],
		"selector": "div.ContainerCardVehicle > div > div[class^='sc'] > div[data-qa^='vehicle_card']:odd:nth-of-type(n+362)",
		"type": "SelectorElementScroll"
	}, {
		"clickActionType": "real",
		"clickElementSelector": "div.ContainerButtons button.Button--more-items",
		"clickElementUniquenessType": "uniqueText",
		"clickType": "clickOnce",
		"delay": 4500,
		"discardInitialElements": "do-not-discard",
		"id": "Click Load More4",
		"multiple": false,
		"parentSelectors": ["_root"],
		"selector": "div > div.ContainerCardVehicle",
		"type": "SelectorElementClick"
	},

	{
		"id": "Car",
		"multiple": false,
		"parentSelectors": ["Result elements"],
		"regex": "",
		"selector": "h2",
		"type": "SelectorText"
	}, {
		"id": "Price",
		"multiple": false,
		"parentSelectors": ["Result elements"],
		"regex": "",
		"selector": "div#valorVerParcelas strong",
		"type": "SelectorText"
	}, {
		"id": "mileage",
		"multiple": false,
		"parentSelectors": ["Result elements"],
		"regex": "",
		"selector": "div > div[class^='sc'] > a[class^='sc'] > div[class^='sc'] > div:nth-child(2) > span:contains('km')",
		"type": "SelectorText"
	}, {
		"id": "Result elements",
		"multiple": true,
		"parentSelectors": ["_root"],
		"selector": "main div div[data-qa^='vehicle_card']",
		"type": "SelectorElement"
	}, {
		"id": "Link",
		"linkType": "linkFromHref",
		"multiple": false,
		"parentSelectors": ["Result elements"],
		"selector": "div#valorVerParcelas a",
		"type": "SelectorLink"
	}]
}
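
The scroller/load more pairs in the sitemap above follow a regular pattern, so they can be generated instead of copied by hand. Below is a rough sketch of that idea, not an official tool: `make_pairs` and its parameters are hypothetical names, the field values are copied from the sitemap above, and the offsets (n+122, n+242, n+362) are assumed to continue in steps of 120. The sitemap above raises the delay by hand for Scroller4; the `extra_delay` ramp here is my own assumption about how you might want to grow it.

```python
import json

# Card selector copied from Scroller1 in the sitemap above.
CARD = ("div.ContainerCardVehicle > div > div[class^='sc'] > "
        "div[data-qa^='vehicle_card']:odd")


def make_pairs(count, page_size=120, scroll_delay=3500, extra_delay=0):
    """Build `count` scroller / load more selector pairs (hypothetical helper)."""
    selectors = []
    for k in range(1, count + 1):
        # Scroller1 has no offset; later ones start at n+122, n+242, ...
        offset = (k - 1) * page_size + 2
        suffix = "" if k == 1 else f":nth-of-type(n+{offset})"
        selectors.append({
            "delay": scroll_delay + (k - 1) * extra_delay,
            "elementLimit": 300,
            "id": f"Scroller{k}",
            "multiple": True,
            "parentSelectors": ["_root"],
            "selector": CARD + suffix,
            "type": "SelectorElementScroll",
        })
        selectors.append({
            "clickActionType": "real",
            "clickElementSelector": "div.ContainerButtons button.Button--more-items",
            "clickElementUniquenessType": "uniqueText",
            "clickType": "clickOnce",
            "delay": 4500,
            "discardInitialElements": "do-not-discard",
            "id": f"Click Load More{k}",
            "multiple": False,
            "parentSelectors": ["_root"],
            "selector": "div > div.ContainerCardVehicle",
            "type": "SelectorElementClick",
        })
    return selectors


# Five pairs, each scroller waiting 100 ms longer than the previous one.
pairs = make_pairs(5, extra_delay=100)
print(json.dumps(pairs, indent="\t"))
```

Paste the printed entries at the start of the "selectors" list, before the "Car", "Price" and other data selectors, and then import the sitemap as usual.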

@leemeng Thank you! It worked just perfect!
