I'm trying to work out how to scrape items from a page of cafe listings. Each cafe name is in an h2 heading, and the address and postcode (zip code) are in separate text. Unlike the convenient laptop listings in the Web Scraper tutorial videos, each cafe is not wrapped in its own element: the headings and text simply follow one another as HTML inside a single content div.
I can scrape all the cafe names, but when I try to include the postcode, the scraper either scrapes only the first postcode and repeats it for every cafe, or it lists all the cafe names first in the CSV and then all the postcodes below them. I can't get the two pieces of information into the same row of the CSV for each cafe. Help! What am I not understanding? I am using the Text selector type, with "Multiple" checked.
URL: https://amberstudent.com/blog/post/20-best-cafes-in-bristol (20 Best Cafes In Bristol | Amber)
Sitemap 1 – same postcode for all cafes:
{"_id":"bristol-cafes","startUrl":["https://amberstudent.com/blog/post/20-best-cafes-in-bristol"],"selectors":[{"id":"cafe-wrapper","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.div-block-17","type":"SelectorHTML"},{"id":"cafe-name","multiple":true,"parentSelectors":["cafe-wrapper"],"regex":"","selector":"h2 strong","type":"SelectorText"},{"id":"cafe-postcode","multiple":true,"parentSelectors":["cafe-wrapper"],"regex":"([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\\s?[0-9][A-Za-z]{2})","selector":"p:nth-of-type(n+2) a:nth-of-type(1)","type":"SelectorText"}]}
Sitemap 2 – cafe names and postcodes do not line up on the same row:
{"_id":"bristol-cafes","startUrl":["https://amberstudent.com/blog/post/20-best-cafes-in-bristol"],"selectors":[{"id":"cafe-name","multiple":true,"parentSelectors":["_root"],"regex":"","selector":"h2 strong","type":"SelectorText"},{"id":"cafe-postcode","multiple":true,"parentSelectors":["_root"],"regex":"([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\\s?[0-9][A-Za-z]{2})","selector":"p:nth-of-type(n+2) a:nth-of-type(1)","type":"SelectorText"}]}
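To make the goal concrete, here is a minimal Python sketch, written outside Web Scraper, of the row pairing I'm after. Everything in it is made up for illustration: the HTML sample, the cafe names and addresses, the simplified postcode pattern (looser than the regex in my sitemaps), and the `CafeParser` and `pair_cafes` names. The real page may well be structured differently.

```python
import re
from html.parser import HTMLParser

# Simplified UK postcode pattern (an assumption, not the regex from the sitemaps).
POSTCODE = re.compile(r"\b[A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2}\b", re.I)

# Made-up HTML imitating the flat layout: headings and paragraphs are siblings.
SAMPLE = """
<div class="div-block-17">
  <h2><strong>Cafe One</strong></h2>
  <p>Some description.</p>
  <p><a href="#">1 Example Street, Bristol BS1 4DJ</a></p>
  <h2><strong>Cafe Two</strong></h2>
  <p><a href="#">2 Sample Road, Bristol BS8 1AB</a></p>
</div>
"""


class CafeParser(HTMLParser):
    """Collects one [name, following_text] pair per h2 heading."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            # Every heading opens a new row.
            self.in_h2 = True
            self.rows.append(["", ""])

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if not self.rows:
            return  # ignore text before the first heading
        # Text inside the h2 is the name; everything else belongs to that cafe.
        self.rows[-1][0 if self.in_h2 else 1] += data


def pair_cafes(html):
    """Return (name, postcode) tuples, one per heading."""
    parser = CafeParser()
    parser.feed(html)
    result = []
    for name, text in parser.rows:
        match = POSTCODE.search(text)
        result.append((name.strip(), match.group(0) if match else ""))
    return result


print(pair_cafes(SAMPLE))
```

Each heading opens a new row, and the postcode is searched for only in the text between that heading and the next one. That grouping is what I can't reproduce with my selectors.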
