Love the program! It looks like EXACTLY what I need, and it's SO close to working that I'm hoping you can help me out. I'm trying to scrape data from https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx
I've created a sitemap with selectors for the company name, date of breach, date of notification, and all the other categories. When I click "Data preview," everything looks great: each category is pulled into a nice list. The problem comes when I export to CSV. The spreadsheet ends up with a single value per row, so the data points that belong to the same breach are spread across multiple rows instead of sitting side by side on one row. I followed the tutorial videos you've posted, but my results don't match what I expected. Any idea what I'm doing wrong? Thanks!
EDIT: It looks like this is a FAQ. The advice there is to select the wrapper that contains each record with an "Element" selector and then add the text inside it as child selectors. I'm going to try that and I'll report back!
EDIT2: Okay, I think that was the problem, but I still can't get it sorted. Can anyone with some expertise point out what I'm doing wrong? If I create an "Element" selector and select each main entry with it, then add a child "Text" selector, I'm not given the option to select the text I want, only a portion of the parent element.
Sitemap:
```json
{"_id":"wisdatabreach","startUrl":["https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx#"],"selectors":[
{"id":"coname","type":"SelectorText","selector":"div.ms-rtestate-field > div > table td:nth-of-type(3) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},
{"id":"DateOfBreach","type":"SelectorText","selector":"div.ms-rtestate-field > div > table tr:nth-of-type(2) td:nth-of-type(2) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},
{"id":"DateofNotification","type":"SelectorText","selector":"div.ms-rtestate-field > div > table tr:nth-of-type(2) td:nth-of-type(1) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},
{"id":"DataStolen","type":"SelectorText","selector":"div.ms-rtestate-field > div > table td:nth-of-type(4) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},
{"id":"whoaffected","type":"SelectorText","selector":"div.ms-rtestate-field > div > table tr:nth-of-type(4) td:nth-of-type(1) p:nth-of-type(1)","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0}
]}
```
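For comparison, here is a rough sketch of how this sitemap might look restructured along the lines of the FAQ advice. It assumes (based only on my selectors above, not on checking the page) that each breach is its own `table` inside `div.ms-rtestate-field > div`. The wrapper id `breach` is a name I made up, and the child selectors are just my original ones with the `div.ms-rtestate-field > div > table` prefix removed, so they are untested. JSON doesn't allow comments, so the assumptions are only noted here. The key differences: one Element selector with `"multiple": true` matches each whole record, and every Text selector uses it as its parent with `"multiple": false`, so each record should become one CSV row. If the Element selector only matches part of an entry (say a single `td` or `p`), the child Text selectors can only be picked inside that part, which may be why I was limited to "a portion of the parent element."

```json
{"_id":"wisdatabreach","startUrl":["https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx#"],"selectors":[
{"id":"breach","type":"SelectorElement","selector":"div.ms-rtestate-field > div > table","parentSelectors":["_root"],"multiple":true,"delay":0},
{"id":"coname","type":"SelectorText","selector":"td:nth-of-type(3) p","parentSelectors":["breach"],"multiple":false,"regex":"","delay":0},
{"id":"DateOfBreach","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(2) p","parentSelectors":["breach"],"multiple":false,"regex":"","delay":0},
{"id":"DateofNotification","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(1) p","parentSelectors":["breach"],"multiple":false,"regex":"","delay":0},
{"id":"DataStolen","type":"SelectorText","selector":"td:nth-of-type(4) p","parentSelectors":["breach"],"multiple":false,"regex":"","delay":0},
{"id":"whoaffected","type":"SelectorText","selector":"tr:nth-of-type(4) td:nth-of-type(1) p:nth-of-type(1)","parentSelectors":["breach"],"multiple":false,"regex":"","delay":0}
]}
```

With `"multiple": false`, a child selector such as `td:nth-of-type(3) p` keeps only its first match inside each table, so if a record has more than one matching cell, the selector would need tightening (e.g. adding a `tr:nth-of-type(...)` prefix).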