All data shows in preview, but scraping is incomplete

I've tried increasing the page load delay and request interval to as much as 10 seconds, and adding a scroll-down element, but for some reason the scraper only grabs some of the data for this sitemap.

{"_id":"berkshire-hathaway-ctcities","startUrl":["https://www.bhhsneproperties.com/find-real-estate-agents/ct/sort-na/[1-137]"],"selectors":[{"id":"name","type":"SelectorText","parentSelectors":["_root"],"selector":"[data-current-rt='AGENT'] .mdl-card__supporting-text a[itemprop='url']","multiple":true,"regex":"","delay":0},{"id":"email","type":"SelectorText","parentSelectors":["_root"],"selector":"[data-current-rt='AGENT'] .email-correct b","multiple":true,"regex":"","delay":0},{"id":"office","type":"SelectorText","parentSelectors":["_root"],"selector":"[data-current-rt='AGENT'] [data-material-icon='phone'][tabindex] b","multiple":true,"regex":"","delay":0},{"id":"mobile","type":"SelectorText","parentSelectors":["_root"],"selector":"[data-current-rt='AGENT'] .mdl-cell--hide-tablet [data-material-icon='phone_iphone'] b","multiple":true,"regex":"","delay":0},{"id":"location","type":"SelectorText","parentSelectors":["_root"],"selector":"div:nth-of-type(n+5) li[data-material-icon] b:nth-of-type(1)","multiple":true,"regex":"","delay":0}]}

Any ideas?

Hi. I made a small edit to the sitemap. Hope it helps.

{"_id":"berkshire-hathaway-ctcities","startUrl":["https://www.bhhsneproperties.com/find-real-estate-agents/ct/sort-na/[1-137]"],"selectors":[{"id":"name","type":"SelectorText","parentSelectors":["wrapper"],"selector":"a[itemprop='url']","multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorText","parentSelectors":["wrapper"],"selector":".email-correct b","multiple":false,"regex":"","delay":0},{"id":"office","type":"SelectorText","parentSelectors":["wrapper"],"selector":"a[data-material-icon='phone'] b","multiple":false,"regex":"","delay":0},{"id":"mobile","type":"SelectorText","parentSelectors":["wrapper"],"selector":"a[data-material-icon='phone_iphone'] b","multiple":false,"regex":"","delay":0},{"id":"location","type":"SelectorText","parentSelectors":["wrapper"],"selector":"[data-material-icon=\"domain\"] b","multiple":false,"regex":"","delay":0},{"id":"wrapper","type":"SelectorElement","parentSelectors":["_root","pagination"],"selector":".agent-result-section > div > div","multiple":true,"delay":0},{"id":"pagination","type":"SelectorLink","parentSelectors":["_root","pagination"],"selector":"div.pagination a:contains(\">>\")","multiple":true,"delay":0}]}

That worked great @ViestursWS, THANK YOU! :clap:

Welcome! All the best! :smiley:

Hi ViestursWS,

I’m experiencing a problem similar to the one described in this thread: I’m still getting incomplete data despite adjusting my scraper configuration.

I’ve modified the configuration to scrape individual product pages directly via their URLs, but the issue persists. Some of the data is missing, and I can’t seem to figure out why.

Here’s my updated configuration: {"selectors":[{"id":"Reference |","parentSelectors":["_root"],"type":"SelectorText","selector":"span[itemprop='name']","multiple":false,"regex":""},{"id":"| Name","parentSelectors":["_root"],"type":"SelectorText","selector":"span[itemprop='name']","multiple":false,"regex":""},{"id":"Trace Availability","parentSelectors":["_root"],"type":"SelectorText","selector":".container-warehouses-and-delegation li:nth-of-type(1)","multiple":false,"regex":""},{"id":"Store Availability","parentSelectors":["_root"],"type":"SelectorText","selector":"li.item-warehouse-delegation.available","multiple":false,"regex":""},{"id":"Discounted Price","parentSelectors":["_root"],"type":"SelectorText","selector":"span.price-wrapper","multiple":false,"regex":""},{"id":"Full Price","parentSelectors":["_root"],"type":"SelectorText","selector":"[itemprop='offers'] span.old-price","multiple":false,"regex":""},{"id":"Discount Percentage","parentSelectors":["_root"],"type":"SelectorText","selector":"[itemprop='offers'] span.product__discount-items","multiple":false,"regex":""},{"id":"product-page","parentSelectors":["_root"],"type":"SelectorElement","selector":".product-detail-container","multiple":false,"delay":0}],"websiteStateSetup":{"enabled":true,"performWhenNotFoundSelector":"html:has(a.btn-account)","actions":[{"type":"openUrl","url":"https://pro.site.com/it_it/customer/account/login/"},{"selector":".login-container a.create.primary","type":"click"},{"selector":"input#username","type":"textInput","value":"myemail@address.com"},{"selector":"input#password","type":"passwordInput","value":"msomepassword111"},{"selector":"button:contains(\"Continue\")","type":"click"}]}}

Despite these updates, I still can’t seem to scrape the full product data. Could you take a look and let me know if you have any suggestions for fixing this issue? Your insights would be greatly appreciated!

Thanks in advance,

Mathew.

Hi,

Can you please post the complete sitemap?

Hi, thank you for stepping in. Should I post it with the original URLs and passwords, or can I use dummy values? The sitemap contains about 500 URLs.

Thanks in advance.

All URLs in one sitemap should have the same layout, right? If so, one URL is fine. A dummy account would be good if scraping requires being logged in. Note that everything you post here is public.
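
If it helps, here is a minimal Python sketch for blanking credentials before posting a sitemap publicly. It assumes the login values live under `websiteStateSetup.actions` in actions of type `textInput` or `passwordInput`, as in the configuration posted above. The function name `redact_sitemap`, the `REDACTED` placeholder and the example sitemap are made up for illustration.

```python
import json

# Action types whose "value" field may hold credentials (an assumption based
# on the configuration posted earlier in this thread).
SENSITIVE_ACTION_TYPES = {"textInput", "passwordInput"}


def redact_sitemap(sitemap_text, placeholder="REDACTED"):
    """Return the sitemap as compact JSON with login values replaced.

    json.loads raises json.JSONDecodeError on malformed input, so this also
    catches broken quoting before the sitemap is posted.
    """
    sitemap = json.loads(sitemap_text)
    actions = sitemap.get("websiteStateSetup", {}).get("actions", [])
    for action in actions:
        if action.get("type") in SENSITIVE_ACTION_TYPES and "value" in action:
            action["value"] = placeholder
    return json.dumps(sitemap, separators=(",", ":"))


# Hypothetical, trimmed-down sitemap used only to exercise the function.
example = json.dumps({
    "_id": "example",
    "startUrl": ["https://example.com/product/1"],
    "selectors": [],
    "websiteStateSetup": {
        "enabled": True,
        "actions": [
            {"type": "openUrl", "url": "https://example.com/login"},
            {"selector": "input#username", "type": "textInput",
             "value": "real@address.example"},
            {"selector": "input#password", "type": "passwordInput",
             "value": "real-password"},
            {"selector": "button", "type": "click"},
        ],
    },
})

redacted = redact_sitemap(example)
print(redacted)
```

Selectors and URLs are left untouched, so the posted sitemap can still be imported and tested with a dummy account.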

Thank you very much @JanAp for clarifying.

Yes, all the URLs should have the same layout.

Here is the complete sitemap: {"_id":"Fluidra-substitue","startUrl":["https://pro.fluidra.com/it_it/catalog/product/view/id/63954532/s/para-empotrar-2-unidades-long-1220-mm/category/50011/"],"selectors":[{"id":"Reference |","parentSelectors":["_root"],"type":"SelectorText","selector":"span[itemprop='name']","multiple":false,"regex":""},{"id":"| Name","parentSelectors":["_root"],"type":"SelectorText","selector":"span[itemprop='name']","multiple":false,"regex":""},{"id":"Trace Availability","parentSelectors":["_root"],"type":"SelectorText","selector":".container-warehouses-and-delegation li:nth-of-type(1)","multiple":false,"regex":""},{"id":"Store Availability","parentSelectors":["_root"],"type":"SelectorText","selector":"li.item-warehouse-delegation.available","multiple":false,"regex":""},{"id":"Discounted Price","parentSelectors":["_root"],"type":"SelectorText","selector":"span.price-wrapper","multiple":false,"regex":""},{"id":"Full Price","parentSelectors":["_root"],"type":"SelectorText","selector":"[itemprop='offers'] span.old-price","multiple":false,"regex":""},{"id":"Discount Percentage","parentSelectors":["_root"],"type":"SelectorText","selector":"[itemprop='offers'] span.product__discount-items","multiple":false,"regex":""},{"id":"product-page","parentSelectors":["_root"],"type":"SelectorElement","selector":".product-detail-container","multiple":false,"scroll":false,"elementLimit":0},{"id":"not found","parentSelectors":["_root"],"type":"SelectorHTML","selector":"div.message","multiple":false,"regex":""}],"websiteStateSetup":{"enabled":true,"performWhenNotFoundSelector":"html:has(a.btn-account)","actions":[{"type":"openUrl","url":"https://pro.fluidra.com/it_it/customer/account/login/"},{"selector":".login-container a.create.primary","type":"click"},{"selector":"input#username","type":"textInput","value":"myemail@mydomain.com"},{"selector":"input#password","type":"passwordInput","value":"somepassword"},{"selector":"button:contains(\"Continue\")","type":"click"}]}}

Best regards.