Fidelity fund factsheets

MrB · August 24, 2021, 4:48pm

I am a Fidelity customer and should review my portfolio periodically, so I need to extract some key data from their fund factsheets…

On their website there is a paginated table of these factsheets (3,611 as I write) containing links to further detailed data on each fund.

I have managed to paginate through the whole table and extract the higher-level data, along with the URL to follow for each fund's more detailed data (sitemap1):

{"_id":"fidelity-test1","startUrl":["https://www.fidelity.co.uk/clients/investment-finder/#?investmentType=funds&universeId=FOGBR$$ALL_3521&filtersSelectedValue={}&page=[1-74]&perPage=50&sortField=legalName&sortOrder=asc&subUniverseId=MFAC"],"selectors":[{"id":"tableBody","type":"SelectorElement","parentSelectors":["_root"],"selector":".ec-section__content--table-securities tbody","multiple":false,"delay":0},{"id":"tableRow","type":"SelectorElement","parentSelectors":["tableBody"],"selector":"tr","multiple":true,"delay":0},{"id":"keyStatsLink","type":"SelectorLink","parentSelectors":["tableRow"],"selector":"a.ec-table__investment-link","multiple":false,"delay":0},{"id":"assetClass","type":"SelectorText","parentSelectors":["tableRow"],"selector":"[data-title='Asset class'] div","multiple":false,"regex":"","delay":0}]}
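For reference, the [1-74] in that start URL is Web Scraper's URL range syntax, which produces one start URL per page number. A minimal Python sketch of the expansion (a simplified re-implementation handling only the plain [start-end] form, not the extension's own code), plus the page-count arithmetic for 3,611 funds at 50 per page:

```python
import math
import re

# Matches a plain "[start-end]" range such as "[1-74]". Simplified and
# illustrative only; this is not Web Scraper's actual implementation.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")

TEMPLATE = (
    "https://www.fidelity.co.uk/clients/investment-finder/#?investmentType=funds"
    "&universeId=FOGBR$$ALL_3521&filtersSelectedValue={}&page=[1-74]&perPage=50"
    "&sortField=legalName&sortOrder=asc&subUniverseId=MFAC"
)


def expand_start_url(url: str) -> list[str]:
    """Return one URL per number in the first [start-end] range (or the URL itself)."""
    match = RANGE.search(url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = url[: match.start()], url[match.end():]
    return [f"{head}{n}{tail}" for n in range(start, end + 1)]


def pages_needed(total_rows: int, per_page: int) -> int:
    """Smallest number of pages that covers every row of the table."""
    return math.ceil(total_rows / per_page)


urls = expand_start_url(TEMPLATE)
print(len(urls))               # number of generated start URLs
print(pages_needed(3611, 50))  # pages needed to hold all 3,611 funds
```

So [1-74] covers the table with one page to spare (73 pages are enough), while a [1-7] range only reaches the first 350 funds. Note also that the page number sits after the #, in the URL fragment, so it is read by the page's JavaScript rather than sent to the server.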

However, when I then use the scraped URL (link selector) to navigate to each full factsheet and extract detailed data from there, I lose all but a few results, about 100 or so (sitemap2):

{"_id":"fidelity-test2","startUrl":["https://www.fidelity.co.uk/clients/investment-finder/#?investmentType=funds&universeId=FOGBR$$ALL_3521&filtersSelectedValue={}&page=[1-7]&perPage=50&sortField=legalName&sortOrder=asc&subUniverseId=MFAC"],"selectors":[{"id":"tableBody","type":"SelectorElement","parentSelectors":["_root"],"selector":".ec-section__content--table-securities tbody","multiple":false,"delay":0},{"id":"tableRow","type":"SelectorElement","parentSelectors":["tableBody"],"selector":"tr","multiple":true,"delay":0},{"id":"keyStatsLink","type":"SelectorLink","parentSelectors":["tableRow"],"selector":"a.ec-table__investment-link","multiple":false,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["keyStatsLink"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"iaSector","type":"SelectorText","parentSelectors":["keyStatsLink"],"selector":"tr:contains('Investment Association (IA) sector') td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"ongoingCharge","type":"SelectorText","parentSelectors":["keyStatsLink"],"selector":"tr:contains('Ongoing charge (%)') td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"txnCost","type":"SelectorText","parentSelectors":["keyStatsLink"],"selector":"tr:contains('Transaction cost (%)') td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"perfFee","type":"SelectorText","parentSelectors":["keyStatsLink"],"selector":"tr:contains('Performance fee') td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"performanceLink","type":"SelectorLink","parentSelectors":["keyStatsLink"],"selector":"li:Contains('Performance') a","multiple":false,"delay":0},{"id":"1yr","type":"SelectorText","parentSelectors":["performanceLink"],"selector":"#trailing-returns-table tr:nth-of-type(5) td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"3yr","type":"SelectorText","parentSelectors":["performanceLink"],"selector":"#trailing-returns-table tr:nth-of-type(6) td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"5yr","type":"SelectorText","parentSelectors":["performanceLink"],"selector":"#trailing-returns-table tr:nth-of-type(7) td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"assetClass","type":"SelectorText","parentSelectors":["tableRow"],"selector":"[data-title='Asset class'] div","multiple":false,"regex":"","delay":0},{"id":"","type":"SelectorText","parentSelectors":["_root"],"selector":"","multiple":false,"regex":"","delay":0}]}
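When a sitemap nests link selectors like this, printing its selector graph can make the structure easier to check. A rough Python sketch that reads a sitemap's JSON and lists each selector under its parentSelectors; the sample is sitemap2 cut down to the ids, types and parents the script needs (all other fields omitted):

```python
import json
from collections import defaultdict

# sitemap2 reduced to ids, types and parents; other fields omitted.
SITEMAP2_ABRIDGED = """
{"_id": "fidelity-test2", "selectors": [
 {"id": "tableBody", "type": "SelectorElement", "parentSelectors": ["_root"]},
 {"id": "tableRow", "type": "SelectorElement", "parentSelectors": ["tableBody"]},
 {"id": "keyStatsLink", "type": "SelectorLink", "parentSelectors": ["tableRow"]},
 {"id": "name", "type": "SelectorText", "parentSelectors": ["keyStatsLink"]},
 {"id": "performanceLink", "type": "SelectorLink", "parentSelectors": ["keyStatsLink"]},
 {"id": "1yr", "type": "SelectorText", "parentSelectors": ["performanceLink"]},
 {"id": "assetClass", "type": "SelectorText", "parentSelectors": ["tableRow"]},
 {"id": "", "type": "SelectorText", "parentSelectors": ["_root"]}
]}
"""


def selector_tree(sitemap: dict) -> list[str]:
    """Render a sitemap's selectors as indented lines, starting from "_root"."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel)

    lines: list[str] = []

    def walk(parent_id: str, depth: int, ancestors: frozenset) -> None:
        for sel in children.get(parent_id, []):
            lines.append("  " * depth + f"{sel['id'] or '<empty id>'} [{sel['type']}]")
            # Recurse unless the id is empty or already an ancestor
            # (a selector may list itself as a parent, e.g. for pagination).
            if sel["id"] and sel["id"] not in ancestors:
                walk(sel["id"], depth + 1, ancestors | {sel["id"]})

    walk("_root", 0, frozenset({"_root"}))
    return lines


print("\n".join(selector_tree(json.loads(SITEMAP2_ABRIDGED))))
```

On this sample it shows name and performanceLink under keyStatsLink, 1yr one level further down under performanceLink, assetClass back on tableRow, and a selector with an empty id sitting directly under _root. The row, factsheet, performance-page chain means two extra page loads per fund, and the empty-id selector looks like a leftover that may be worth removing before debugging further.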

I also tried implementing the pagination via linkSelector and elementClick, but with similar results.

Any ideas?

Scraper version: 0.5.4

Chrome version: 92.0.4515.159 (Official Build) (64-bit)

Windows version: Windows 10 Home 19043.1165

As an aside, since I had already scraped the URLs of all 3,611 factsheets, I also considered a two-pass approach: put these URLs into a pseudo sitemap.xml, hoping to trick the Sitemap.xml selector into scraping each factsheet individually. However, the Sitemap.xml selector doesn't appear to support file:/// URLs, and setting up a dummy web server to serve the dummy sitemap.xml seemed a bit OTT, especially since I'm not even sure this would work…
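For what it's worth, generating such a pseudo sitemap.xml takes only a few lines of Python. A minimal sketch following the sitemaps.org format; the example URLs are placeholders, not real factsheet addresses:

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_xml(urls) -> bytes:
    """Return a UTF-8 sitemap.xml listing each URL once, in first-seen order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # de-duplicate while keeping order
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url  # ElementTree escapes "&" in query strings as "&amp;"
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Placeholder URLs standing in for the scraped keyStatsLink hrefs.
factsheet_urls = [
    "https://example.com/factsheet?fund=A&tab=key-statistics",
    "https://example.com/factsheet?fund=B&tab=key-statistics",
    "https://example.com/factsheet?fund=A&tab=key-statistics",
]

print(build_sitemap_xml(factsheet_urls).decode("utf-8"))
```

Serving it need not be a big job: running python -m http.server 8000 in the folder holding the written sitemap.xml should make it available at http://localhost:8000/sitemap.xml, though whether the Sitemap.xml selector accepts a localhost address is untested here. Alternatively, since a sitemap's startUrl is already a JSON array, the scraped URLs could go straight into startUrl, assuming the extension copes with a few thousand start URLs.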

ViestursWS · August 25, 2021, 2:18pm

Hi @MrB

It seems that the best way to access the desired data is an Element click selector whose selector is .ec-section__content--table-securities tbody and whose Click selector is .ec-pagination__item--next a.
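Written out as a sitemap, that suggestion might look roughly like the sketch below: sitemap1 with its URL range replaced by a single start page (page=1, an assumption) and the tbody selector turned into an Element click. The SelectorElementClick field names and values (clickElementSelector, clickType, clickElementUniquenessType, discardInitialElements) are as recalled from exported sitemaps, and the delay and multiple settings are guesses, so compare against an export from your own extension before importing. It is built in Python so the parent links can be checked:

```python
import json

START_URL = (
    "https://www.fidelity.co.uk/clients/investment-finder/#?investmentType=funds"
    "&universeId=FOGBR$$ALL_3521&filtersSelectedValue={}&page=1&perPage=50"
    "&sortField=legalName&sortOrder=asc&subUniverseId=MFAC"
)

sitemap = {
    "_id": "fidelity-click",  # hypothetical sitemap id
    "startUrl": [START_URL],
    "selectors": [
        {   # Click "next" and collect each new version of the table body.
            "id": "tableBody", "type": "SelectorElementClick",
            "parentSelectors": ["_root"],
            "selector": ".ec-section__content--table-securities tbody",
            "clickElementSelector": ".ec-pagination__item--next a",
            "clickType": "clickMore",                    # assumed: keep clicking
            "clickElementUniquenessType": "uniqueText",  # assumed
            "discardInitialElements": "do-not-discard",  # assumed: keep page 1
            "multiple": True,                            # assumed
            "delay": 2000,                               # guessed wait (ms)
        },
        {"id": "tableRow", "type": "SelectorElement", "parentSelectors": ["tableBody"],
         "selector": "tr", "multiple": True, "delay": 0},
        {"id": "keyStatsLink", "type": "SelectorLink", "parentSelectors": ["tableRow"],
         "selector": "a.ec-table__investment-link", "multiple": False, "delay": 0},
        {"id": "assetClass", "type": "SelectorText", "parentSelectors": ["tableRow"],
         "selector": "[data-title='Asset class'] div", "multiple": False,
         "regex": "", "delay": 0},
    ],
}


def missing_parents(sm: dict) -> list[str]:
    """List parentSelectors that name neither "_root" nor an existing selector id."""
    ids = {sel["id"] for sel in sm["selectors"]} | {"_root"}
    return [p for sel in sm["selectors"] for p in sel["parentSelectors"] if p not in ids]


assert missing_parents(sitemap) == []
print(json.dumps(sitemap, separators=(",", ":")))  # compact JSON for import
```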

Test results from Web Scraper Cloud:

MrB · September 5, 2021, 12:28pm

Many thanks, Viesturs, for your help. That traverses the site nicely, but now I think I have additional issues, either with my internet connection or with the site's stability: even though the logic of the scrape appears correct, I can never seem to get a full dataset (I'm playing with timeouts now). It doesn't help that Fidelity seem to keep subtly changing their site's implementation, requiring slight config changes to scrape the required data!
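When runs keep coming back incomplete, checking each export mechanically may help. A hedged Python sketch that reads a Web Scraper CSV export and reports duplicate funds, funds whose detail fields came back empty (candidates for a re-scrape), and the shortfall against an expected count such as 3,611. It assumes the link URL is exported in a keyStatsLink-href column and that unscraped fields appear as blank or "null"; check the header row of your own export. The sample rows are made up:

```python
import csv
import io
from collections import Counter


def is_blank(value) -> bool:
    """Treat missing, empty and literal "null" cells as not scraped."""
    return value is None or value.strip() in ("", "null")


def completeness_report(csv_text: str, key: str = "keyStatsLink-href",
                        required=("ongoingCharge",), expected=None) -> dict:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row.get(key) for row in rows if not is_blank(row.get(key)))
    report = {
        "rows": len(rows),
        "unique": len(counts),
        "duplicates": sorted(url for url, n in counts.items() if n > 1),
        # URLs with at least one row where a required field is blank.
        "incomplete": sorted({row[key] for row in rows
                              if not is_blank(row.get(key))
                              and any(is_blank(row.get(col)) for col in required)}),
    }
    if expected is not None:
        report["missing"] = expected - len(counts)
    return report


SAMPLE = """web-scraper-order,keyStatsLink,keyStatsLink-href,ongoingCharge
1-1,Fund A,https://example.com/a,0.50
1-2,Fund B,https://example.com/b,
1-3,Fund A,https://example.com/a,0.50
"""

print(completeness_report(SAMPLE, expected=3))
```

For a real export, read the file with open(path, encoding="utf-8", newline="") (UTF-8 is an assumption), pass expected=3611, and list every detail field in required. The "incomplete" URLs could then seed a second, smaller scrape.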