Scraping multiple data points to a single row?

Love the program. It looks like EXACTLY what I need and it's SO close to working, so I'm hoping you can help me out. I'm trying to scrape data from https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx

I've created a sitemap with selectors for the company name, date of breach, date of notification, and all the other categories. When I click on "Data preview" everything looks great, in that each category is pulled into a nice list. The problems arise when I export as a CSV, though. The exported spreadsheet has a single data entry per row, so the values that belong to one breach are spread across multiple rows instead of sitting side by side on one row. I followed the tutorial videos you've posted, but the results don't line up with my expectations. Any idea what I'm doing wrong? Thanks!

EDIT: It looks like this is a FAQ. I'm going to try to follow that advice, which is to select the wrapper containing the text as "element" and then select the text included within, and I'll report back!

EDIT2: Okay, I think that was the problem, but I still can't get it sorted. Anyone with any expertise want to point out what I'm doing wrong? If I select "Element" and try to select each main entry as an element, when I then try to add a child "Text" selector I'm not given the option to select the text I'd like, only a portion of the parent element.

Sitemap:

{"_id":"wisdatabreach","startUrl":["https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx#"],"selectors":[{"id":"coname","type":"SelectorText","selector":"div.ms-rtestate-field > div > table td:nth-of-type(3) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},{"id":"DateOfBreach","type":"SelectorText","selector":"div.ms-rtestate-field > div > table tr:nth-of-type(2) td:nth-of-type(2) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},{"id":"DateofNotification","type":"SelectorText","selector":"div.ms-rtestate-field > div > table tr:nth-of-type(2) td:nth-of-type(1) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},{"id":"DataStolen","type":"SelectorText","selector":"div.ms-rtestate-field > div > table td:nth-of-type(4) p","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0},{"id":"whoaffected","type":"SelectorText","selector":"div.ms-rtestate-field > div > table tr:nth-of-type(4) td:nth-of-type(1) p:nth-of-type(1)","parentSelectors":["_root"],"multiple":true,"regex":"","delay":0}]}

You have to select the table with an Element selector and then reach the child elements inside it with manually written selectors of the form:
tr:nth-of-type(n) td:nth-of-type(n)
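
To make the parent/child relationship concrete, here is a trimmed sketch with only the Element selector and one Text child, pretty-printed for readability. The values are taken from the thread's own sitemap; the remaining fields would presumably follow the same pattern, each with "parentSelectors" set to ["element"] and "multiple" set to false.

```json
{
  "_id": "wisdatabreach",
  "startUrl": ["https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx#"],
  "selectors": [
    {
      "id": "element",
      "type": "SelectorElement",
      "selector": "div table",
      "parentSelectors": ["_root"],
      "multiple": true,
      "delay": ""
    },
    {
      "id": "coname",
      "type": "SelectorText",
      "selector": "tr:nth-of-type(2) td:nth-of-type(3)",
      "parentSelectors": ["element"],
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
```

The idea is that each table matched by "element" becomes one output row, and each child selector contributes one column for that row, which is why the children are not marked as multiple.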

Here is the fixed sitemap:

{"_id":"wisdatabreach","startUrl":["https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx#"],"selectors":[{"id":"coname","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(3)","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"DateOfBreach","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(2) ","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"DateofNotification","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(1)","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"DataStolen","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(4) ","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"whoaffected","type":"SelectorText","selector":"tr:nth-of-type(4) td:nth-of-type(1) ","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"element","type":"SelectorElement","selector":"div table","parentSelectors":["_root"],"multiple":true,"delay":""}]}

Thank you SO MUCH. It works! Now I just have to make one of these for every other state... I'm sorry, though, I'm not sure I totally understand. How did you use the Element selector to select the table? And did you navigate to the child elements by entering the CSS manually?

If other state pages have similar designs, you can just add multiple start URLs to this sitemap. You can select the table by typing div table into the selector field. Sometimes you have to write selectors manually by looking at the page source code.
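
As a rough sketch of the multiple-start-URL idea: "startUrl" is already an array in the sitemap, so additional pages can be listed alongside the first. The second URL below is a made-up placeholder, not a real state page, and the "selectors" list is left empty here only for brevity; it would hold the same selectors as the fixed sitemap. This only helps if the other pages share the same table layout.

```json
{
  "_id": "wisdatabreach",
  "startUrl": [
    "https://datcp.wi.gov/Pages/Programs_Services/DataBreaches.aspx#",
    "https://example.com/other-state/data-breaches"
  ],
  "selectors": []
}
```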