Web Scraper capability: modal boxes

Hi,
I am trying to create a contact list for my Outlook client from a Membership Directory, but I am not sure whether Web Scraper can do it.

When I click on one of the contacts, a modal window appears, but it seems to be disconnected from the rest of the page.
I have managed to get data from the contact card, but not from the modal that opens after clicking on it.

Thank you for any ideas.

{"_id":"testpage","startUrl":["Membership Directory a","type":"SelectorPagination"},{"id":"clickboxes","parentSelectors":["pagelinks"],"type":"SelectorElementClick","clickElementSelector":"div.ant-card","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"div.ant-card"},{"id":"tsest","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"parent","multiple":false,"regex":""},{"id":"sdgfsd","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"p:first-child","multiple":false,"regex":""},{"id":"dsssdf","parentSelectors":["clickboxes"],"type":"SelectorElement","selector":"div.ant-modal-body","multiple":false}]}

Any input on how to do it? Or on whether it is even possible, so I know whether I should keep looking into it? Thank you.

@mika Hi, the sitemap JSON you sent is not valid. When pasting your sitemap, please apply the preformatted text option.
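
A quick way to confirm that a sitemap is valid JSON before pasting it is to run it through any JSON parser. Here is a minimal Python sketch. The helper name `check_sitemap` is made up for illustration, and the list of required keys is an assumption based on the sitemaps in this thread, not an official Web Scraper schema:

```python
import json

def check_sitemap(text):
    """Return a list of problems found in a sitemap string (empty list = looks fine)."""
    try:
        sitemap = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(sitemap, dict):
        return ["top-level value is not an object"]
    # Assumption: these keys appear in every complete sitemap posted in this thread.
    required = ("_id", "startUrl", "selectors")
    return [f"missing key: {key}" for key in required if key not in sitemap]

# The start of the first sitemap in this thread, as it was pasted.
broken = '{"_id":"testpage","startUrl":["Membership Directory a","type":"SelectorPagination"}'
print(check_sitemap(broken))
```

An empty list means the text parsed and has the expected top-level keys; it does not say anything about whether the selectors themselves are correct.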


Hi @ViestursWS, thanks for the tip. I think the preformatted text option below should provide the right thing:

{"_id":"testpage","startUrl":["https://dir.econference.io/main/maculasociety/d917aae0-97c5-4dd4-b773-801389d59771"],"selectors":[{"id":"pagelinks","parentSelectors":["_root","pagelinks"],"paginationType":"clickOnce","selector":"li.ais-Pagination-item--page:nth-of-type(n+3) a","type":"SelectorPagination"},{"id":"clickboxes","parentSelectors":["pagelinks"],"type":"SelectorElementClick","clickElementSelector":"div.ant-card","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"div.ant-card"},{"id":"tsest","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"_parent_","multiple":false,"regex":""},{"id":"sdgfsd","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"p:first-child","multiple":false,"regex":""},{"id":"asda","parentSelectors":["clickboxes"],"type":"SelectorText","selector":":contains(\"gx\")","multiple":false,"regex":""},{"id":"vdfcx","parentSelectors":["clickboxes"],"type":"SelectorElementAttribute","selector":".div gx-d-flex gx-align-items-center gx-mb-3","multiple":false,"extractAttribute":"href"}]}

Thanks for any feedback

@mika Hello, it appears that you have not set a selector that targets the modal window element.

Here's an example:

{"_id":"testpage","startUrl":["https://dir.econference.io/main/maculasociety/d917aae0-97c5-4dd4-b773-801389d59771"],"selectors":[{"clickElementSelector":"div.ant-card","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1900,"discardInitialElements":"discard-when-click-element-exists","id":"clickboxes","multiple":true,"parentSelectors":["_root"],"selector":"div.ant-modal-content","type":"SelectorElementClick"},{"id":"location","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"svg:has([class=\"member-location-1\"]) + p","type":"SelectorText"},{"id":"phone","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"svg:has([class=\"member-phone-1\"]) + p","type":"SelectorText"},{"id":"name","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div.ant-modal-header","type":"SelectorText"},{"clickElementSelector":"span.ant-modal-close-x","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"discard-when-click-element-exists","id":"close-click","multiple":true,"parentSelectors":["clickboxes"],"selector":"_parent_","type":"SelectorElementClick"}]}
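
The key difference from the earlier sitemap is that `clickboxes` now has `selector` set to the modal (`div.ant-modal-content`) while `clickElementSelector` stays on the card (`div.ant-card`). As a rough sketch, a small Python check can flag 'Element click' selectors where both fields point at the same element. The function name is hypothetical, and this is only a heuristic for this modal case: there may be pages where using the same value for both fields is intended.

```python
import json

def click_selectors_missing_target(sitemap):
    """List ids of Element click selectors whose 'selector' equals their 'clickElementSelector'."""
    return [
        s["id"]
        for s in sitemap.get("selectors", [])
        if s.get("type") == "SelectorElementClick"
        and s.get("selector") == s.get("clickElementSelector")
    ]

# Trimmed-down versions of the 'clickboxes' selector before and after the fix.
before = json.loads('{"selectors":[{"id":"clickboxes","type":"SelectorElementClick",'
                    '"clickElementSelector":"div.ant-card","selector":"div.ant-card"}]}')
after = json.loads('{"selectors":[{"id":"clickboxes","type":"SelectorElementClick",'
                   '"clickElementSelector":"div.ant-card","selector":"div.ant-modal-content"}]}')

print(click_selectors_missing_target(before))  # ['clickboxes']
print(click_selectors_missing_target(after))   # []
```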

Thanks @ViestursWS, I thought that selecting elements from the modal window would be enough.

I made some changes to integrate pagination. Without pagination, i.e. with just one page, I can get the info, but when I integrate pagination it doesn't seem to work. Or is it because I stop the scrape after a few pages? In other words, does the data get dumped at the end or in between? Is there a way to test whether it's working before running it all? Thanks

{"_id":"xcombo","startUrl":["https://dir.econference.io/main/maculasociety/d917aae0-97c5-4dd4-b773-801389d59771"],"selectors":[{"id":"pages","parentSelectors":["_root","pages"],"paginationType":"clickOnce","selector":"li.ais-Pagination-item--page:nth-of-type(n+3) a","type":"SelectorPagination"},{"id":"clickboxes","parentSelectors":["pages"],"type":"SelectorElementClick","clickElementSelector":"div.ant-card","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1900,"discardInitialElements":"discard-when-click-element-exists","multiple":true,"selector":"div.ant-modal-content"},{"id":"close-click","parentSelectors":["clickboxes"],"type":"SelectorElementClick","clickElementSelector":"span.ant-modal-close-x","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"discard-when-click-element-exists","multiple":true,"selector":"_parent_"},{"id":"locationx","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"div.gx-d-flex:nth-of-type(1)","multiple":false,"regex":""},{"id":"statusx","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"div.gx-d-flex:nth-of-type(2) p","multiple":false,"regex":""},{"id":"titlex","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"div:nth-of-type(3) p","multiple":false,"regex":""},{"id":"joinedx","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"div:nth-of-type(4) p","multiple":false,"regex":""},{"id":"emailx","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"div:nth-of-type(5)","multiple":false,"regex":""},{"id":"chpaterx","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"p.gx-mb-1","multiple":false,"regex":""},{"id":"biox","parentSelectors":["clickboxes"],"type":"SelectorText","selector":".gx-mb-0 p","multiple":false,"regex":""},{"id":"namex","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"div.ant-modal-title","multiple":false,"regex":""},{"id":"imagex","parentSelectors":["clickboxes"],"type":"SelectorText","selector":"div.ant-image","multiple":false,"regex":""}]}

@mika Hello, to optimize the sitemap you can use lower delay values for the 'Element click' selectors. However, because the pagination is also click-based, results will be returned only after the scraper has clicked through all of the pagination pages and modals.

Learn more: My scraping job is running, although no results are being returned - Web Scraper Knowledge Base

Example:

{"_id":"xcombo","startUrl":["https://dir.econference.io/main/maculasociety/d917aae0-97c5-4dd4-b773-801389d59771"],"selectors":[{"id":"pages","paginationType":"clickOnce","parentSelectors":["_root","pages"],"selector":"li:has(a.ais-Pagination-link--selected) + li a","type":"SelectorPagination"},{"clickElementSelector":"div.ant-card","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":800,"discardInitialElements":"discard-when-click-element-exists","id":"clickboxes","multiple":true,"parentSelectors":["pages"],"selector":"div.ant-modal-content","type":"SelectorElementClick"},{"id":"locationx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div.gx-d-flex:nth-of-type(1)","type":"SelectorText"},{"id":"statusx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div.gx-d-flex:nth-of-type(2) p","type":"SelectorText"},{"id":"titlex","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div:nth-of-type(3) p","type":"SelectorText"},{"id":"phone","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":".gx-align-items-center:has(.member-phone-1)","type":"SelectorText"},{"id":"emailx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"a[href*=\"mailto\"]","type":"SelectorText"},{"id":"chpaterx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"p.gx-mb-1","type":"SelectorText"},{"id":"biox","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":".gx-mb-0 p","type":"SelectorText"},{"id":"namex","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div.ant-modal-title","type":"SelectorText"},{"id":"imagex","multiple":false,"parentSelectors":["clickboxes"],"selector":"img","type":"SelectorImage"},{"extractAttribute":"","id":"all-data","parentSelectors":["clickboxes"],"selector":"div.gx-d-flex","type":"SelectorGroup"},{"clickElementSelector":"span.ant-modal-close-x","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":800,"discardInitialElements":"discard-when-click-element-exists","id":"close-click","multiple":true,"parentSelectors":["clickboxes"],"selector":"_parent_","type":"SelectorElementClick"}]}

Hi @ViestursWS, thanks. As feared, after letting it run for a while it didn't work. It is too bad we can't stop the job partway to check whether it's working.

@mika Hi, you can't pause the selector execution. However, you can limit the pagination selector to test whether the sitemap is functional.

Here's a sitemap example (limited to 2 pagination pages):

{"_id":"xcombo","startUrl":["https://dir.econference.io/main/maculasociety/d917aae0-97c5-4dd4-b773-801389d59771"],"selectors":[{"id":"pages","paginationType":"clickOnce","parentSelectors":["_root","pages"],"selector":"li:has(a.ais-Pagination-link--selected) + li a:not(:contains(\"3\"))","type":"SelectorPagination"},{"clickElementSelector":"div.ant-card","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":950,"discardInitialElements":"discard-when-click-element-exists","id":"clickboxes","multiple":true,"parentSelectors":["pages"],"selector":"div.ant-modal-content","type":"SelectorElementClick"},{"id":"locationx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div.gx-d-flex:nth-of-type(1)","type":"SelectorText"},{"id":"statusx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div.gx-d-flex:nth-of-type(2) p","type":"SelectorText"},{"id":"titlex","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div:nth-of-type(3) p","type":"SelectorText"},{"id":"phone","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":".gx-align-items-center:has(.member-phone-1)","type":"SelectorText"},{"id":"emailx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"a[href*=\"mailto\"]","type":"SelectorText"},{"id":"chpaterx","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"p.gx-mb-1","type":"SelectorText"},{"id":"biox","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":".gx-mb-0 p","type":"SelectorText"},{"id":"namex","multiple":false,"parentSelectors":["clickboxes"],"regex":"","selector":"div.ant-modal-title","type":"SelectorText"},{"id":"imagex","multiple":false,"parentSelectors":["clickboxes"],"selector":"img","type":"SelectorImage"},{"extractAttribute":"","id":"all-data","parentSelectors":["clickboxes"],"selector":"div.gx-d-flex","type":"SelectorGroup"},{"clickElementSelector":"span.ant-modal-close-x","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":900,"discardInitialElements":"discard-when-click-element-exists","id":"close-click","multiple":true,"parentSelectors":["clickboxes"],"selector":"_parent_","type":"SelectorElementClick"}]}
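
The only change that limits the run is the `:not(:contains("3"))` suffix on the pagination selector, so the scraper never follows the link labelled 3. If you want to keep the full sitemap and generate a limited test copy from it, something like the following Python sketch could work. The function name `limit_pagination` is hypothetical, and it assumes the pagination links are labelled with page numbers as on this site:

```python
import copy

def limit_pagination(sitemap, stop_label):
    """Return a copy of the sitemap whose pagination selectors skip the link labelled stop_label."""
    limited = copy.deepcopy(sitemap)  # leave the full sitemap untouched
    for s in limited.get("selectors", []):
        if s.get("type") == "SelectorPagination":
            s["selector"] += f':not(:contains("{stop_label}"))'
    return limited

# Only the pagination selector from the full sitemap is shown here.
sitemap = {
    "_id": "xcombo",
    "selectors": [
        {"id": "pages", "type": "SelectorPagination",
         "selector": "li:has(a.ais-Pagination-link--selected) + li a"}
    ],
}
test_run = limit_pagination(sitemap, "3")
print(test_run["selectors"][0]["selector"])
```

Keep in mind that `:contains` matches substrings, so a label of "3" would also exclude links such as "13" or "30"; for a short test run that stops at page 2 this should not matter.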