How do I include line breaks or bullets from a list?

I am trying to scrape a bulleted list, but the results don't capture any of the line breaks or bullets. The list itself doesn't contain any periods or semicolons, so I can't use an Excel formula to insert line breaks at those characters afterwards. I am hoping there is a way to capture the line breaks (or bullets) while scraping the data.

https://www.virtuoso.com/hotels/6163838/the-plaza#.XD-Sb1xKhPY

The list of amenities appears in two places, but both result in the same issue.

Sitemap:

{"_id":"plaza-sample","startUrl":["https://www.virtuoso.com/hotels/6163838/the-plaza#.XD-Sb1xKhPY"],"selectors":[{"id":"top-list-without-year","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities ul","multiple":false,"regex":"","delay":0},{"id":"top-list-with-year","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities div.tab-content div","multiple":false,"regex":"","delay":0},{"id":"bottom-list-without-year","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-tabs div.tab-content.virtuoso-amenities ul","multiple":false,"regex":"","delay":0},{"id":"bottom-list-with-year","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-tabs div.tab-content.virtuoso-amenities div","multiple":false,"regex":"","delay":0}]}

I am having a similar problem, so I am also interested in this answer - if anyone has it. Right now when there is a line break or a bullet list, and I extract the text, it does not include any spaces between the lines. So in my CSV file I get a lot of "knowldgeThis" - words mushed together because the scraper is not accounting for line breaks. Thanks in advance for any help!

If there are never more than a handful of bullet points, you can hard-code each one: Amenity1, Amenity2, Amenity3, ...

Not ideal.

Sitemap:

{"_id":"testing","startUrl":["https://www.virtuoso.com/hotels/6163838/the-plaza#.XD-Sb1xKhPY"],"selectors":[{"id":"amenity1","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities li:nth-of-type(1)","multiple":false,"regex":"","delay":0},{"id":"amenity2","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities li:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"amenity3","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities li:nth-of-type(3)","multiple":false,"regex":"","delay":0},{"id":"amenity4","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities li:nth-of-type(4)","multiple":false,"regex":"","delay":0},{"id":"amenity5","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities li:nth-of-type(5)","multiple":false,"regex":"","delay":0},{"id":"amenity6","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities li:nth-of-type(6)","multiple":false,"regex":"","delay":0}]}
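If you go the one-column-per-bullet route, the columns can be merged back into a single cell with real line breaks after export. This is only a rough sketch: it assumes the exported CSV has columns named amenity1, amenity2, ... as in the sitemap above, and that missing bullets come through as an empty cell or the text "null" (both are assumptions about the export, so check your own file). The function name and the sample row are made up for illustration.

```python
import csv
import io


def combine_amenities(row, prefix="amenity", bullet="• "):
    """Join the non-empty amenityN columns of one CSV row into one bulleted cell."""
    # Pick out amenity1, amenity2, ... and sort them by their numeric suffix.
    keys = sorted(
        (k for k in row if k.startswith(prefix) and k[len(prefix):].isdigit()),
        key=lambda k: int(k[len(prefix):]),
    )
    items = []
    for k in keys:
        value = (row[k] or "").strip()
        # Assumption: a missing bullet is exported as "" or "null".
        if value and value.lower() != "null":
            items.append(value)
    return "\n".join(bullet + item for item in items)


# Hypothetical export with one empty amenity column.
sample = io.StringIO(
    "web-scraper-start-url,amenity1,amenity2,amenity3\n"
    "https://example.com/hotel,Breakfast daily,,Early check-in\n"
)
for row in csv.DictReader(sample):
    print(combine_amenities(row))
```

Because empty columns are skipped, you can hard-code more amenity selectors than most pages need without getting blank bullets in the result.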

If you don't mind each bullet point going on a separate row in your output, you can scrape the bullet list with a Text selector that has "Multiple" checked.

(But for some reason this has double output? I'm a newb.)

Sitemap:

{"_id":"testing","startUrl":["https://www.virtuoso.com/hotels/6163838/the-plaza#.XD-Sb1xKhPY"],"selectors":[{"id":"amenities","type":"SelectorText","parentSelectors":["_root"],"selector":"div[id=\"amenities-content\"] > p + ul > li","multiple":true,"regex":"","delay":0}]}
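With one bullet per row, the rows can be folded back into one cell per page after export. A minimal sketch, assuming the CSV has a web-scraper-start-url column and an amenities column (the selector id from the sitemap above); the function name is hypothetical. It also drops exact repeats within a page, which may or may not be what is behind the doubled output.

```python
def group_amenities(rows, key="web-scraper-start-url", field="amenities", bullet="• "):
    """Collapse one-row-per-bullet output into one bulleted string per page."""
    grouped = {}
    for row in rows:
        text = (row.get(field) or "").strip()
        if not text:
            continue
        items = grouped.setdefault(row[key], [])
        # Skip exact repeats so a list scraped twice only appears once.
        if text not in items:
            items.append(text)
    return {
        url: "\n".join(bullet + item for item in items)
        for url, items in grouped.items()
    }


# Hypothetical rows, shaped like csv.DictReader output.
rows = [
    {"web-scraper-start-url": "https://example.com/hotel", "amenities": "Breakfast daily"},
    {"web-scraper-start-url": "https://example.com/hotel", "amenities": "Early check-in"},
    {"web-scraper-start-url": "https://example.com/hotel", "amenities": "Breakfast daily"},
]
for url, cell in group_amenities(rows).items():
    print(url)
    print(cell)
```

To run it on a real export, pass it `csv.DictReader(open("export.csv", newline="", encoding="utf-8"))` in place of the sample list.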

Hi toutopiawego - Nice to know that I'm not alone in struggling to figure this one out.

Hi Bort - Thanks for the great feedback!! I had tried to grab the items on the list one-by-one, however the number of items on the lists can vary slightly. I have found an example that has quite a few bulleted items, and built the scraper around that. It may prove to get the job done.

I am really intrigued by your second post, however I can't replicate it. Are you taking the original sitemap and adding your sitemap to it? When I try creating a sitemap using just the text you provided, I get an error message.

Thanks again!

Let me look into this. I did discover something cool, though. Check out:
https://wapi.engage.co/api/v2/getUsers?apiKey=5e5287ce9c4979cd6acf742850fd21af&categorySlug=Virtuoso-USCAN&syndicationCode=virtuoso&sig=6d3ecae505aad4e5abc7385da7c29e3d1e9c4197

Copy the output and paste it into https://json-csv.com/

Looks like a list of their employees

Change the Text selector to an HTML selector. This will put <li> tags in place of the bullets. You can swap them out in Excel later.

You can also use a Group selector and give it a symbol such as ■ as its name:

    {"_id":"plaza-sample","startUrl":["https://www.virtuoso.com/hotels/6163838/the-plaza#.XD-Sb1xKhPY"],"selectors":[{"id":"■  ","type":"SelectorGroup","parentSelectors":["_root"],"selector":"div.product-header__amenities li","delay":0,"extractAttribute":""},{"id":"top-list-with-year","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-header__amenities div.tab-content div","multiple":false,"regex":"","delay":0},{"id":"bottom-list-without-year","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-tabs div.tab-content.virtuoso-amenities ul","multiple":false,"regex":"","delay":0},{"id":"bottom-list-with-year","type":"SelectorText","parentSelectors":["_root"],"selector":"div.product-tabs div.tab-content.virtuoso-amenities div","multiple":false,"regex":"","delay":0}]}
    
Has anyone found a way to scrape data that keeps bullet points or line breaks?

I can't seem to make it work. I tried all the methods discussed above but still cannot get it working.
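
For the HTML selector approach suggested earlier in the thread, the tags can also be turned into bullets and line breaks with a short script instead of Excel. This is a sketch using only Python's standard html.parser module; the class and function names are made up, and it assumes the scraped cell contains a plain list of <li> items.

```python
from html.parser import HTMLParser


class ListToText(HTMLParser):
    """Collect the text of every <li> element as a separate item."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # text chunks of the <li> being read, if any

    def _flush(self):
        if self._current is not None:
            # Collapse runs of whitespace left over from the page markup.
            text = " ".join("".join(self._current).split())
            if text:
                self.items.append(text)
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()  # also copes with an <li> that was never closed
            self._current = []

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def html_list_to_text(html, bullet="• "):
    """Turn '<ul><li>a</li><li>b</li></ul>' into bulleted lines."""
    parser = ListToText()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n".join(bullet + item for item in parser.items)


print(html_list_to_text("<ul><li>Breakfast daily</li><li>Early check-in</li></ul>"))
```

Text outside the <li> elements is ignored, so headings or wrapper markup inside the scraped HTML will not end up in the bulleted result.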