How to extract data from Wiley article profile

I need to scrape the Name, Email, ORCID and Institution details from every author profile

Url: https://onlinelibrary.wiley.com/doi/10.1111/acfi.13142

Sitemap:
{"_id":"Wiley_Journal_All_Issue_Article_email_scrape","startUrl":["https://onlinelibrary.wiley.com/loi/1467629x/year/2023"],"selectors":[{"id":"Issues","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.visitable","type":"SelectorLink"},{"id":"Article","linkType":"linkFromHref","multiple":true,"parentSelectors":["Issues"],"selector":"div.issue-items-container:nth-of-type(2) a.issue-item__title","type":"SelectorLink"},{"clickActionType":"real","clickElementSelector":".accordion-tabbed .author-name span","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"discard-when-click-element-exists","id":"Click","multiple":true,"parentSelectors":["Article"],"selector":"body","type":"SelectorElementClick"},{"id":"name","multiple":false,"parentSelectors":["Click"],"regex":"","selector":".author-info .author-name","type":"SelectorText"},{"id":"email","multiple":false,"parentSelectors":["Click"],"regex":"","selector":".author-info .corr-email","type":"SelectorText"},{"id":"orchid","multiple":false,"parentSelectors":["Click"],"regex":"","selector":".author-info .sm-account__link[href*=\"orcid.org\"]","type":"SelectorText"},{"id":"correspodence","multiple":false,"parentSelectors":["Click"],"regex":"","selector":".author-info p:contains(\"Correspondence\")","type":"SelectorText"}]}

@JanAp I need your help

@don2010 Can you check

This sitemap doesn't work properly... I suppose you should divide your task into 2 parts...
First of all - collect the links to your articles...
Secondly - gather the contact data of each author...
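
A minimal sketch of that split in Python, assuming the two link selectors from the sitemap in the question already pick up the right issue and article links. It keeps only the `SelectorLink` entries, so the first run does nothing but collect article URLs. The helper name `links_only_sitemap` and the `_id` value `wiley_article_links` are made up for illustration.

```python
import json

def links_only_sitemap(sitemap, new_id):
    """Copy a Web Scraper sitemap, keeping only its SelectorLink selectors."""
    links = [s for s in sitemap["selectors"] if s["type"] == "SelectorLink"]
    return {"_id": new_id, "startUrl": list(sitemap["startUrl"]), "selectors": links}

# The sitemap from the question, with the text selectors left out for brevity.
original = {
    "_id": "Wiley_Journal_All_Issue_Article_email_scrape",
    "startUrl": ["https://onlinelibrary.wiley.com/loi/1467629x/year/2023"],
    "selectors": [
        {"id": "Issues", "linkType": "linkFromHref", "multiple": True,
         "parentSelectors": ["_root"], "selector": "a.visitable",
         "type": "SelectorLink"},
        {"id": "Article", "linkType": "linkFromHref", "multiple": True,
         "parentSelectors": ["Issues"],
         "selector": "div.issue-items-container:nth-of-type(2) a.issue-item__title",
         "type": "SelectorLink"},
        {"id": "Click", "multiple": True, "parentSelectors": ["Article"],
         "selector": "body", "type": "SelectorElementClick"},
    ],
}

# "wiley_article_links" is a hypothetical sitemap id, not one from the thread.
step_one = links_only_sitemap(original, "wiley_article_links")
print(json.dumps(step_one, indent=2))
```

The printed JSON is what would be imported as the first sitemap; its `Article-href` column should then hold the article links for the second step.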

{"_id":"Wiley_Journal_All_Issue_Article_email_scrape","startUrl":["https://onlinelibrary.wiley.com/loi/1467629x/year/2023"],"selectors":[{"id":"Issues","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.visitable","type":"SelectorLink"},{"id":"Article","linkType":"linkFromHref","multiple":true,"parentSelectors":["Issues"],"selector":"div.issue-items-container:nth-of-type(2) a.issue-item__title","type":"SelectorLink"},{"clickActionType":"real","clickElementSelector":"div.desktop-authors a.author-name","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"Click","multiple":true,"parentSelectors":["Article"],"selector":"body","type":"SelectorElementClick"},{"id":"name","multiple":false,"parentSelectors":["Click"],"regex":"","selector":".author-info .author-name","type":"SelectorText"},{"id":"email","multiple":false,"parentSelectors":["Click"],"regex":"","selector":"a.sm-account__link[href*=\"mailto\"]","type":"SelectorText"},{"id":"orchid","multiple":false,"parentSelectors":["Click"],"regex":"","selector":"a.sm-account__link[href*=\"orcid.org\"]","type":"SelectorText"},{"id":"correspodence","multiple":false,"parentSelectors":["Click"],"regex":"","selector":".author-info p:contains(\"Correspondence\") + p","type":"SelectorText"}]}

It provides only the first author's data; the same data is repeated for the rest of the authors.

I wrote you that the sitemap doesn't work properly... You should divide your task into 2 different parts...

Same result. Can you provide me the sitemap?

Here is mine after break into part 2: {"_id":"waley_pastelink","startUrl":["404 - Pastelink.net link","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":".body-display a","type":"SelectorLink"},{"clickActionType":"real","clickElementSelector":"div.desktop-authors a.author-name","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"click","multiple":true,"parentSelectors":["article link"],"selector":"body","type":"SelectorElementClick"},{"id":"name","multiple":false,"parentSelectors":["click"],"regex":"","selector":".author-info .author-name","type":"SelectorText"}]}

only with one article
link: https://onlinelibrary.wiley.com/doi/10.1111/acfi.13142
sitemap:
{"_id":"wiley_single_article","startUrl":["https://onlinelibrary.wiley.com/doi/10.1111/acfi.13142"],"selectors":[{"clickActionType":"real","clickElementSelector":"div.desktop-authors a.author-name","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"click","multiple":true,"parentSelectors":["_root"],"selector":"body","type":"SelectorElementClick"},{"id":"name","multiple":false,"parentSelectors":["click"],"regex":"","selector":".author-info .author-name","type":"SelectorText"},{"id":"mail","multiple":false,"parentSelectors":["click"],"regex":"","selector":".author-info .corr-email","type":"SelectorText"},{"id":"orcid","multiple":false,"parentSelectors":["click"],"regex":"","selector":".author-info .sm-account__link[href*=\"orcid.org\"]","type":"SelectorText"},{"id":"corre","multiple":false,"parentSelectors":["click"],"regex":"","selector":".author-info p:contains(\"Correspondence\")","type":"SelectorText"}]}

This is the last time I assist you:

{"_id":"wiley_single_article","startUrl":["https://onlinelibrary.wiley.com/doi/10.1111/acfi.13142"],"selectors":[{"id":"element","multiple":true,"parentSelectors":["_root"],"selector":".accordion .comma__list span.accordion-tabbed__tab-mobile","type":"SelectorElement"},{"id":"name","multiple":false,"parentSelectors":["element"],"regex":"","selector":"p.author-name","type":"SelectorText"},{"id":"mail","multiple":false,"parentSelectors":["element"],"regex":"","selector":"a[title=\"Link to email address\"]","type":"SelectorText"},{"id":"orcid","multiple":false,"parentSelectors":["element"],"regex":"","selector":".sm-account__link[href*=\"orcid.org\"]","type":"SelectorText"},{"id":"corre","multiple":false,"parentSelectors":["element"],"regex":"","selector":".author-info p:contains(\"Correspondence\") + p","type":"SelectorText"}]}

Thank you so much. But I need your help on other topics

I told you - collect all necessary links as a first step.... The second step I just sent you...
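
One way to wire the two steps together, sketched in Python under the assumption that the article links from the first run have been exported to a plain list: the single-article sitemap posted above gets every collected link as a start URL. The helper name `with_start_urls` and the second link in `collected` are placeholders, not something taken from the site.

```python
import json

def with_start_urls(sitemap, urls):
    """Copy a sitemap, replacing startUrl with the given URLs (blanks and duplicates dropped)."""
    cleaned = (u.strip() for u in urls)
    unique = list(dict.fromkeys(u for u in cleaned if u))
    return {**sitemap, "startUrl": unique}

# The single-article sitemap from above, shortened to two selectors.
single_article = {
    "_id": "wiley_single_article",
    "startUrl": ["https://onlinelibrary.wiley.com/doi/10.1111/acfi.13142"],
    "selectors": [
        {"id": "element", "multiple": True, "parentSelectors": ["_root"],
         "selector": ".accordion .comma__list span.accordion-tabbed__tab-mobile",
         "type": "SelectorElement"},
        {"id": "name", "multiple": False, "parentSelectors": ["element"],
         "regex": "", "selector": "p.author-name", "type": "SelectorText"},
    ],
}

# Links as they might come out of the first run.
collected = [
    "https://onlinelibrary.wiley.com/doi/10.1111/acfi.13142",
    "https://onlinelibrary.wiley.com/doi/10.1111/acfi.13142 ",  # same link, stray space
    "https://onlinelibrary.wiley.com/doi/PLACEHOLDER",  # placeholder, not a real article
    "",
]

step_two = with_start_urls(single_article, collected)
print(json.dumps(step_two["startUrl"], indent=2))
```

Because `startUrl` is an array, the second sitemap can carry as many article links as the first run found, and each page is scraped with the same author selectors.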

Sorry, I didn't understand earlier. Sorry again.

Hi,

You can try this setup:

{"_id":"Wiley_Journal_All_Issue_Article_email_scrape","startUrl":["https://onlinelibrary.wiley.com/loi/1467629x/year/2023"],"selectors":[{"id":"Issues","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.visitable","type":"SelectorLink"},{"id":"Article","linkType":"linkFromHref","multiple":true,"parentSelectors":["Issues"],"selector":"div.issue-items-container:nth-of-type(2) a.issue-item__title","type":"SelectorLink"},{"clickActionType":"real","clickElementSelector":".accordion-tabbed .author-name span","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":0,"discardInitialElements":"discard-when-click-element-exists","id":"Click","multiple":true,"parentSelectors":["Article"],"selector":"body","type":"SelectorElementClick"},{"id":"name","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".author-info .author-name","type":"SelectorText"},{"id":"email","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".author-info .corr-email","type":"SelectorText"},{"id":"orchid","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".author-info .sm-account__link[href*='orcid.org']","type":"SelectorText"},{"id":"correspodence","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".author-info p:contains('Correspondence')","type":"SelectorText"},{"id":"wrapper","multiple":true,"parentSelectors":["Article"],"selector":".accordion-tabbed > span","type":"SelectorElement"}]}