The program does not scrape all data

Hi!

The program scrapes only the first 10 links, but I need all 205. I am using a Link selector with Multiple checked. http://www.simon.com/mall/the-mills-at-jersey-gardens/stores
Can you advise me, please, how to solve the issue?

Best regards,
Maria

Maria,
could you post your sitemap here on the forum?
Thanks

Hi!
Is this the information you need?

{"_id":"simon","startUrl":["http://www.simon.com/mall/the-mills-at-jersey-gardens/stores"],"selectors":[{"id":"list","type":"SelectorLink","selector":"div.col-lg-3 > div.card-secondary a.no-underline.cardImgLink, div.LazyLoad:nth-of-type(n+204) a.no-underline.cardImgLink","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"h1.header-md","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"website","type":"SelectorLink","selector":"div.store-social-desktop a.nav-link","parentSelectors":["list"],"multiple":false,"delay":0}]}

Hello,

It seems that this site can't be scraped with this extension, even if you pick the appropriate selector (normally you have to begin with an "element scroll down" selector).

Hi!

Thank you very much for your help!
I chose "element scroll down" and correctly identified all the links.

Best regards,
Maria

Hello,

Could you post your sitemap here?

Thanks

Hi!

Is this the information you requested?

{"_id":"simon2","startUrl":["http://www.simon.com/mall/the-mills-at-jersey-gardens/stores"],"selectors":[{"id":"list","type":"SelectorLink","selector":"div.col-lg-3 > div.card-secondary a.no-underline.cardImgLink, div.LazyLoad:nth-of-type(n+11) a.no-underline.cardImgLink","parentSelectors":["_root","element"],"multiple":true,"delay":0},{"id":"NAME","type":"SelectorText","selector":"h1.header-md","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"element","type":"SelectorElementScroll","selector":"div.col-lg-3 > div.card-secondary div.card-secondary-text a.no-underline:nth-of-type(1), div.col-lg-3:nth-of-type(2) h2.card-secondary-title, div.col-lg-3:nth-of-type(n+2) > div.card-secondary div.header-xs, div.LazyLoad:nth-of-type(n+11) div.card-secondary-text a.no-underline:nth-of-type(1), div.LazyLoad:nth-of-type(12) h2.card-secondary-title, div.LazyLoad:nth-of-type(12) div.header-xs","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"WEBSITE","type":"SelectorLink","selector":"div.store-social-desktop a.nav-link","parentSelectors":["list"],"multiple":false,"delay":0},{"id":"STORE HOURS","type":"SelectorText","selector":"div.store-hours div.store-hours","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"BEST ENTRANCE","type":"SelectorText","selector":"div.store-entrance p:nth-of-type(1)","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"LOCATION IN MALL","type":"SelectorText","selector":"p.no-margin","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"MORE INFO","type":"SelectorText","selector":"li.nav-item.hidden-sm-down:nth-of-type(2) a.nav-link","parentSelectors":["list"],"multiple":false,"regex":"","delay":0}]}

Set the delay on your "element scroll down" selector to at least 3000 ms and change its selector to div.directory-store. Then make the list selector a child of the "element scroll down" selector only (remove _root from its parent selectors).
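For anyone who prefers to edit the exported sitemap JSON rather than click through the extension, here is a minimal Python sketch of the three changes described above. It is only an illustration: the helper name apply_advice is made up, the sitemap is a trimmed copy of "simon2" with just the two selectors involved, and the old scroll selector is shortened because it gets replaced anyway.

```python
import json

# Trimmed copy of the "simon2" sitemap: only the two selectors the advice touches.
sitemap = {
    "_id": "simon2",
    "startUrl": ["http://www.simon.com/mall/the-mills-at-jersey-gardens/stores"],
    "selectors": [
        {
            "id": "list",
            "type": "SelectorLink",
            "selector": "div.col-lg-3 > div.card-secondary a.no-underline.cardImgLink, "
                        "div.LazyLoad:nth-of-type(n+11) a.no-underline.cardImgLink",
            "parentSelectors": ["_root", "element"],
            "multiple": True,
            "delay": 0,
        },
        {
            "id": "element",
            "type": "SelectorElementScroll",
            # Shortened here; this value is overwritten below.
            "selector": "div.col-lg-3 > div.card-secondary div.card-secondary-text "
                        "a.no-underline:nth-of-type(1)",
            "parentSelectors": ["_root"],
            "multiple": True,
            "delay": 0,
        },
    ],
}


def apply_advice(sitemap, scroll_id="element", list_id="list", delay_ms=3000):
    """Return a modified copy of the sitemap (hypothetical helper, not part of the extension)."""
    result = json.loads(json.dumps(sitemap))  # simple deep copy of JSON-compatible data
    for sel in result["selectors"]:
        if sel["id"] == scroll_id:
            sel["selector"] = "div.directory-store"  # scroll over whole store cards
            sel["delay"] = delay_ms                  # give lazy-loaded cards time to appear
        elif sel["id"] == list_id:
            sel["parentSelectors"] = [scroll_id]     # child of the scroll selector only
    return result


fixed = apply_advice(sitemap)
print(json.dumps(fixed))  # paste the output into "Import Sitemap"
```

The printed JSON can be imported back into the extension. The other selectors of the real sitemap (NAME, WEBSITE and so on) would stay unchanged.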

Hello Tajana,

Yes, that is what I wanted.

Your scrape is interesting because it shows how difficult it can be to get all the results with this extension. I have made your sitemap easier to understand for anyone who wants to learn more about the extension.

This website makes a good example. The page loads more stores as you scroll down with the scrollbar, so it needs an "Element scroll down" selector. The main difficulty is choosing what the "Element scroll down" selector should select: not the logos, not the whole blocks, not only the title and not only the phone. You must select the title plus the opening hours, without taking the phone below them. In addition, you have to select at least 11 rows to be sure of getting a complete scrape of 202 records.

Here is my sitemap:

{"_id":"test2","startUrl":["http://www.simon.com/mall/the-mills-at-jersey-gardens/stores"],"selectors":[{"id":"element","type":"SelectorElementScroll","selector":"div.col-lg-3 > div.card-secondary div.card-secondary-text a.no-underline:nth-of-type(1), div.LazyLoad:nth-of-type(n+11) div.card-secondary-text a.no-underline:nth-of-type(1)","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"link","type":"SelectorLink","selector":"parent","parentSelectors":["element"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"h1.header-md","parentSelectors":["link"],"multiple":false,"regex":"","delay":0},{"id":"hours","type":"SelectorText","selector":"div.store-hours div.store-hours","parentSelectors":["link"],"multiple":false,"regex":"","delay":0},{"id":"location","type":"SelectorText","selector":"p.no-margin","parentSelectors":["link"],"multiple":false,"regex":"","delay":0},{"id":"more-info","type":"SelectorText","selector":"li.nav-item.hidden-sm-down:nth-of-type(2) a.nav-link","parentSelectors":["link"],"multiple":false,"regex":"","delay":0}]}
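To see why this sitemap is easier to follow, it can help to print how its selectors nest. The Python sketch below is only an illustration: selector_tree is a made-up helper, and the sitemap is reduced to the ids and parentSelectors of "test2" above, since those are the only fields that define the hierarchy.

```python
from collections import defaultdict

# Ids and parents copied from the "test2" sitemap; other fields are left out.
sitemap = {
    "_id": "test2",
    "selectors": [
        {"id": "element", "parentSelectors": ["_root"]},
        {"id": "link", "parentSelectors": ["element"]},
        {"id": "name", "parentSelectors": ["link"]},
        {"id": "hours", "parentSelectors": ["link"]},
        {"id": "location", "parentSelectors": ["link"]},
        {"id": "more-info", "parentSelectors": ["link"]},
    ],
}


def selector_tree(sitemap):
    """Return the selector hierarchy as indented lines, starting at _root."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])

    lines = []

    def walk(node, depth, seen):
        lines.append("  " * depth + node)
        for child in children.get(node, []):
            # Skip ids already on this path, so self-referencing
            # selectors (e.g. pagination) cannot loop forever.
            if child not in seen:
                walk(child, depth + 1, seen | {child})

    walk("_root", 0, frozenset(["_root"]))
    return lines


print("\n".join(selector_tree(sitemap)))
```

For this sitemap it prints _root, then element, then link, then the four text selectors, each level indented one step further: the page is scrolled first, every loaded card yields a link, and the store details are read from the page behind each link.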

Hi chefas!

Yes, it was difficult for me to highlight the correct links. Your variant is easier than mine. Thank you so much!
Can you advise me how to extract an e-mail address without the "mailto:" prefix? Example: mailto:info@advancedcryonyc.com

My sitemap:
{"_id":"nybeautysalons","startUrl":["https://www.yellowpages.com/search?search_terms=Beauty%20Salons&geo_location_terms=NoHo%2C%20New%20York%2C%20NY&refinements=headingtext%3AMedical%20Spas"],"selectors":[{"id":"list","type":"SelectorLink","selector":"div.result a.business-name","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"email","type":"SelectorLink","selector":"a.email-business","parentSelectors":["list"],"multiple":false,"delay":0},{"id":"address","type":"SelectorText","selector":"p.address span","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"phone","type":"SelectorText","selector":"p.phone","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","selector":"div.pagination a","parentSelectors":["_root"],"multiple":true,"delay":0}]}

Thank you for your advice!

Hi

Perhaps it's possible to extract the email without "mailto:" directly inside the Web Scraper extension, but I don't know how to do it.

Nevertheless, you can do it more easily in Excel (on the column email-link-href).
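If you would rather not clean the column by hand, the same thing can be done on the exported CSV with a few lines of Python. This is a minimal sketch, not a feature of the extension: the function names are made up, the sample row is invented apart from the address from the question, and the column name "email-href" is an assumption based on the selector id "email", so change it to whatever your export actually contains.

```python
import csv
import io
from urllib.parse import unquote


def strip_mailto(href):
    """Return the bare address from a mailto: link; other values pass through unchanged."""
    value = href.strip()
    if value.lower().startswith("mailto:"):
        value = value[len("mailto:"):]
        value = value.split("?", 1)[0]  # drop "?subject=..." style parameters
        value = unquote(value)          # decode %40 and similar escapes
    return value


def clean_csv(text, column="email-href"):
    """Rewrite one column of CSV text; the column name is an assumption, adjust as needed."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = strip_mailto(row[column] or "")
        writer.writerow(row)
    return out.getvalue()


# Invented sample export with a single row.
sample = "list,email-href\nExample Salon,mailto:info@advancedcryonyc.com\n"
print(clean_csv(sample))
```

For a real export, read the file into a string (or adapt clean_csv to take file objects) and write the result to a new file so the original stays untouched.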