Hello,
I am trying to do some web scraping on PubMed. I want to step through each results page using pagination, collect the link to each article on that page, and then open each article to extract fields such as title, authors, and abstract. The goal is an Excel file containing the title, abstract, authors, etc. of every article PubMed returns for a search such as 'epidemiology and "squamous cell carcinoma" and cutaneous'. The pagination seems to work, but the scraper does not extract any of the article links on a given page.
Here is the sitemap for reference:
{"_id":"pubmed_epidemiology","startUrl":["[Invalid form] - Search Results - PubMed h1","multiple":false,"regex":""},{"id":"abstract_text","parentSelectors":["article_link"],"type":"SelectorText","selector":".abstract-content p","multiple":false,"regex":""},{"id":"firstautor_text","parentSelectors":["article_link"],"type":"SelectorText","selector":".inline-authors span.authors-list-item:nth-of-type(1)","multiple":false,"regex":""},{"id":"PMID_text","parentSelectors":["article_link"],"type":"SelectorText","selector":"#full-view-identifiers strong","multiple":false,"regex":""},{"id":"journal_text","parentSelectors":["article_link"],"type":"SelectorText","selector":"button#full-view-journal-trigger","multiple":false,"regex":""},{"id":"year_text","parentSelectors":["article_link"],"type":"SelectorText","selector":".full-view span.cit","multiple":false,"regex":""},{"id":"doi_text","parentSelectors":["article_link"],"type":"SelectorText","selector":".full-view span.citation-doi","multiple":false,"regex":""},{"id":"openaccess_text","parentSelectors":["article_link"],"type":"SelectorText","selector":"#full-view-identifiers .pmc a","multiple":false,"regex":""}]}.
I think the problem is in how article_el is connected to the pagination selector, but I am not 100% sure. Any suggestions would be appreciated.
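To make the question concrete, here is a stripped-down sketch of the parent/child structure I believe I need. It is not my actual sitemap: the `_id`, the start URL, and both CSS selector strings are placeholders I made up for illustration, and I am assuming that the link selector should list both `_root` and `pagination` as parents and have `"multiple":true` so that links are collected on the first page and on every paginated page.

```json
{
  "_id": "pubmed_sketch_placeholder",
  "startUrl": ["https://example.invalid/placeholder-search-url"],
  "selectors": [
    {
      "id": "pagination",
      "parentSelectors": ["_root", "pagination"],
      "type": "SelectorLink",
      "selector": "PLACEHOLDER-next-page-link-selector",
      "multiple": true
    },
    {
      "id": "article_link",
      "parentSelectors": ["_root", "pagination"],
      "type": "SelectorLink",
      "selector": "PLACEHOLDER-article-title-link-selector",
      "multiple": true
    },
    {
      "id": "abstract_text",
      "parentSelectors": ["article_link"],
      "type": "SelectorText",
      "selector": ".abstract-content p",
      "multiple": false,
      "regex": ""
    }
  ]
}
```

If this is the right shape, then the remaining text selectors (author, PMID, journal, and so on) would all hang off `article_link` in the same way as `abstract_text`. Please correct me if the parent relationships above are wrong.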
Thanks!
