Scrape NIH Reporter Tool

Hi. I have a problem scraping this page:

https://reporter.nih.gov/search/woeaI6XSFUeNwW-A_NqA4g/project-details/10608989

I want to scrape the name of the PI/Project Leader, the title, and the email address. However, I could not scrape the email because I have to click the 'View Email' button to reveal the actual address.

What can I do to scrape the email?

Thank you!

Hi,

You can try this setup:

{"_id":"nih-gov","startUrl":["https://reporter.nih.gov/search/woeaI6XSFUeNwW-A_NqA4g/project-details/10608989"],"selectors":[{"clickActionType":"real","clickElementSelector":"button:contains('View Email')","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"email-click","multiple":false,"parentSelectors":["project-leader"],"selector":"body","type":"SelectorElementClick"},{"id":"e-mail","multiple":false,"parentSelectors":["project-leader"],"regex":"","selector":"a[href*=\"mail\"]","type":"SelectorText"},{"id":"project-leader","multiple":true,"parentSelectors":["_root"],"selector":".data-section:contains('Project Leader')","type":"SelectorElement"}]}

Hope it helps :+1:

Thank you! It works!

With this setup I get the email address, but after 5-6 addresses the page shows a captcha when the extension clicks the 'View Email' button.

What can I do to scrape all the emails?
Thank you!

Hi,

You can try to increase the delay for the click selector.
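
In the sitemap posted earlier in this thread, that delay is the `"delay":2000` field on the `email-click` selector. As a rough sketch, assuming the value is in milliseconds, the field could be changed before re-importing the sitemap. The helper name `with_click_delay` and the value 10000 are arbitrary choices for illustration, and nothing here guarantees that a longer delay avoids the captcha.

```python
import copy
import json


def with_click_delay(sitemap, selector_id, delay_ms):
    """Return a copy of the sitemap with a new delay on one selector (hypothetical helper)."""
    updated = copy.deepcopy(sitemap)
    for selector in updated["selectors"]:
        if selector["id"] == selector_id:
            selector["delay"] = delay_ms
            return updated
    raise KeyError(f"no selector with id {selector_id!r}")


# Trimmed stand-in for the full sitemap; only the click selector is shown.
sitemap = {
    "_id": "nih-gov",
    "selectors": [
        {"id": "email-click", "type": "SelectorElementClick", "delay": 2000},
    ],
}

# 10000 is an arbitrary example value, assumed to be milliseconds.
slower = with_click_delay(sitemap, "email-click", 10000)

if __name__ == "__main__":
    # Print as a single line, the same shape the sitemap was shared in.
    print(json.dumps(slower))
```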

I have tried that, but it doesn't work.

Have you tried solving the captcha once? Maybe the website will whitelist your IP.

Yes, I tried, but the captcha comes back again and again.

OK, thanks for the help. I think this is not possible :innocent:

Hi,
Yes, please. I need help.

Check your mail, and thanks for helping.