Selector tags incremented

Hi all,

Need help scraping a webpage with iterated selectors like the one at the link below. After setting up an element selector for the wrappers (with Multiple selected), I added the attributes I wanted to scrape as children under the wrapper. However, the scraper only yields one result, and everything else comes up as null. This is because the attribute ids are incremented, i.e. each id within a wrapper ends in a suffix _0, _1, _2, ..., _n.

How do I go about scraping something like that? Appreciate any help!

Url: https://mhcpproviderdirectory.dhs.state.mn.us/searchresults?cat=56&sub=85&sta=MN

Sitemap:
{"_id":"pca_list","startUrl":["https://mhcpproviderdirectory.dhs.state.mn.us/searchresults?cat=56&sub=85&sta=MN"],"selectors":[{"delay":0,"id":"provider_wrapper","multiple":true,"parentSelectors":["_root"],"selector":"td div","type":"SelectorElement"},{"delay":0,"id":"provider_name","multiple":false,"parentSelectors":["provider_wrapper"],"regex":"","selector":"span#MainContent_dlProviderList_lblProviderName_0","type":"SelectorText"},{"delay":0,"id":"provider_address","multiple":false,"parentSelectors":["provider_wrapper"],"regex":"","selector":"span#MainContent_dlProviderList_lblAddress_0","type":"SelectorText"},{"delay":0,"id":"provider_city_st_zip","multiple":false,"parentSelectors":["provider_wrapper"],"regex":"","selector":"span#MainContent_dlProviderList_lblCityStateZip_0","type":"SelectorText"},{"delay":0,"id":"provider_phone","multiple":false,"parentSelectors":["provider_wrapper"],"regex":"","selector":"span#MainContent_dlProviderList_lblProviderPhone_0","type":"SelectorText"},{"delay":0,"id":"provider_specialty","multiple":false,"parentSelectors":["provider_wrapper"],"regex":"","selector":"span#MainContent_dlProviderList_lblSpecialityDescription_0","type":"SelectorText"}]}

You're on the right track, and the sitemap structure is OK. For provider_wrapper you can use something like:
tr td div[id^='MainContent_dlProviderList']

which will handle the incrementing digits. Actually it just ignores them, because ^= in an attribute selector means "begins with". The child selectors need the same fix, since an id ending in _0 only ever matches the first provider. E.g. provider_name should use something like:
span[id^='MainContent_dlProviderList_lblProviderName']
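
Putting that together, the sitemap might end up looking roughly like the sketch below. Only the provider_wrapper and provider_name selectors are the ones given above; the other four are my assumption, built by dropping the _0 suffix from the ids in your original sitemap, and I haven't checked them against the live page.

```json
{
  "_id": "pca_list",
  "startUrl": ["https://mhcpproviderdirectory.dhs.state.mn.us/searchresults?cat=56&sub=85&sta=MN"],
  "selectors": [
    {
      "delay": 0,
      "id": "provider_wrapper",
      "multiple": true,
      "parentSelectors": ["_root"],
      "selector": "tr td div[id^='MainContent_dlProviderList']",
      "type": "SelectorElement"
    },
    {
      "delay": 0,
      "id": "provider_name",
      "multiple": false,
      "parentSelectors": ["provider_wrapper"],
      "regex": "",
      "selector": "span[id^='MainContent_dlProviderList_lblProviderName']",
      "type": "SelectorText"
    },
    {
      "delay": 0,
      "id": "provider_address",
      "multiple": false,
      "parentSelectors": ["provider_wrapper"],
      "regex": "",
      "selector": "span[id^='MainContent_dlProviderList_lblAddress']",
      "type": "SelectorText"
    },
    {
      "delay": 0,
      "id": "provider_city_st_zip",
      "multiple": false,
      "parentSelectors": ["provider_wrapper"],
      "regex": "",
      "selector": "span[id^='MainContent_dlProviderList_lblCityStateZip']",
      "type": "SelectorText"
    },
    {
      "delay": 0,
      "id": "provider_phone",
      "multiple": false,
      "parentSelectors": ["provider_wrapper"],
      "regex": "",
      "selector": "span[id^='MainContent_dlProviderList_lblProviderPhone']",
      "type": "SelectorText"
    },
    {
      "delay": 0,
      "id": "provider_specialty",
      "multiple": false,
      "parentSelectors": ["provider_wrapper"],
      "regex": "",
      "selector": "span[id^='MainContent_dlProviderList_lblSpecialityDescription']",
      "type": "SelectorText"
    }
  ]
}
```

Since each child selector is evaluated inside one wrapper element, the prefix match should pick up that provider's own span no matter which number its id ends in.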

Ref: CSS Selectors Reference

Worked like a charm! Thanks, @leemeng!