Table question: scraping selected fields and recreating the table in Excel

oldedb · October 10, 2019, 1:12pm

Describe the problem.
Brand new here, so apologies for the basic question and for what is probably an inadequate description of what I am working with.

I have a table of compensation data from which I want to scrape just the employee name and the salary. I don't believe it is a standard HTML table, and there aren't any headers. When I look at the element inspector I see a lot of this...

I am using a Text selector to scrape the employee name, and that is working.
I am also using a Text selector to scrape the employee salary, and that is working too.

When I export to CSV I get all of the data, but it isn't structured with the employee in one column and the matching salary in the adjacent column.

Is there a better way to scrape this so the data comes out structured in the CSV, or do I need to rearrange the data somehow using the order of the Web Scraper selectors?

Thanks,
oldedb

Url: http://example.com

Sitemap:
{id:"sitemap code"}

leemeng · October 10, 2019, 2:51pm

You've probably set all your text selectors to "multiple", so each one collects its values independently and nothing ties a name to its salary. Instead, create an Element selector for the table rows and place the text selectors under it. The rows selector acts as a "container" for your text selectors, so the values found inside each row are grouped into one record. The rows selector is the one that needs to be set to "multiple"; the text selectors should not be (leave the box unchecked).
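To show why the grouping matters, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It is an analogy, not how Web Scraper works internally; the markup and class names are hypothetical, and one salary is deliberately missing. Collecting names and salaries as two separate lists pairs them only by position, so one missing cell shifts every later salary. Looking each field up inside its own row keeps them together.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the compensation table.
# Bob's row has no salary cell.
HTML = """
<table>
  <tr><td class="name">Alice</td><td class="salary">50000</td></tr>
  <tr><td class="name">Bob</td></tr>
  <tr><td class="name">Carol</td><td class="salary">70000</td></tr>
</table>
"""

root = ET.fromstring(HTML.strip())

# Analogous to two top-level text selectors set to "multiple":
# two independent lists that can only be paired by position.
names = [td.text for td in root.iter("td") if td.get("class") == "name"]
salaries = [td.text for td in root.iter("td") if td.get("class") == "salary"]
flat_pairs = list(zip(names, salaries))

# Analogous to a "multiple" rows selector with single text selectors under it:
# each field is looked up inside its own <tr>.
records = []
for row in root.findall("tr"):
    name = row.find("td[@class='name']")
    salary = row.find("td[@class='salary']")
    records.append({
        "name": name.text if name is not None else None,
        "salary": salary.text if salary is not None else None,
    })

print(flat_pairs)  # Bob is wrongly paired with Carol's salary
print(records)     # Bob keeps an empty salary; Carol keeps hers
```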

Take a look at the example below, where I scrape a table from the W3Schools tables page at
https://www.w3schools.com/html/html_tables.asp

{"_id":"scrape-table-example",
 "startUrl":["https://www.w3schools.com/html/html_tables.asp"],
 "selectors":[
  {"id":"table selector","type":"SelectorElement","parentSelectors":["_root"],"selector":"table#customers","multiple":false,"delay":0},
  {"id":"rows selector","type":"SelectorElement","parentSelectors":["table selector"],"selector":"tr:nth-of-type(n+2)","multiple":true,"delay":0},
  {"id":"Company","type":"SelectorText","parentSelectors":["rows selector"],"selector":"td:nth-of-type(1)","multiple":false,"regex":"","delay":0},
  {"id":"Contact","type":"SelectorText","parentSelectors":["rows selector"],"selector":"td:nth-of-type(2)","multiple":false,"regex":"","delay":0},
  {"id":"Country","type":"SelectorText","parentSelectors":["rows selector"],"selector":"td:nth-of-type(3)","multiple":false,"regex":"","delay":0}
 ]}
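For readers who think in code, here is a rough Python equivalent of the sitemap above, again using `xml.etree.ElementTree` on hypothetical, well-formed markup with placeholder values rather than the live page. ElementTree's `td[1]` predicate counts only siblings with the same tag, much like CSS `td:nth-of-type(1)`, and slicing off the first `tr` plays the role of `tr:nth-of-type(n+2)`, which skips the header row. The records are then written with `csv.DictWriter`, giving one CSV row per table row with the fields in adjacent columns.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like table#customers; values are placeholders.
PAGE = """<html><body>
<table id="customers">
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Company A</td><td>Contact A</td><td>Country A</td></tr>
  <tr><td>Company B</td><td>Contact B</td><td>Country B</td></tr>
</table>
</body></html>"""

FIELDS = ["Company", "Contact", "Country"]


def scrape_customers(page: str) -> list[dict[str, str]]:
    root = ET.fromstring(page)
    # "table selector": table#customers, multiple = false
    table = root.find(".//table[@id='customers']")
    if table is None:
        return []
    records = []
    # "rows selector": tr:nth-of-type(n+2), multiple = true (header row skipped)
    for row in table.findall("tr")[1:]:
        # Text selectors: td:nth-of-type(1..3), each multiple = false
        records.append({
            field: row.find(f"td[{i}]").text
            for i, field in enumerate(FIELDS, start=1)
        })
    return records


def to_csv(records: list[dict[str, str]]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


rows = scrape_customers(PAGE)
print(to_csv(rows))
```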

This is what the scraper's structure (selector graph) looks like. Take note of what needs to be set to multiple, and what should not be:
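_root
  table selector (Element, multiple: no)
    rows selector (Element, multiple: yes)
      Company (Text, multiple: no)
      Contact (Text, multiple: no)
      Country (Text, multiple: no)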

oldedb · October 15, 2019, 10:38am

How do I handle this if there isn't an option to select the table? I followed your example on some sites where a basic HTML or JavaScript table was present, but in many cases I can't set the selector type to Table and actually select the table I want.

Thanks

leemeng · February 2, 2020, 9:45am

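I'm using Type: Element for the table here, not Type: Table; the name "table selector" is just a label for my own reference. Because an Element selector accepts any CSS selector, you can use this method for non-standard tables as well: point the outer Element selector at whatever element wraps the rows, point a second Element selector with "multiple" checked at the repeating row elements, and place your Text selectors under it.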
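The same container pattern carries over to pages with no `<table>` markup at all. Below is a minimal sketch, again with `xml.etree.ElementTree`, written as a small generic helper that takes one container path, one row path, and a dictionary of field paths, mirroring Element (single) → Element (multiple) → Text selectors. The div layout, class names, and the helper `scrape_rows` are hypothetical. Note that ElementTree's `[@class='staff']` compares the whole attribute value, whereas a CSS selector like `div.staff` matches any element carrying that class, so treat these paths as rough stand-ins for the CSS you would enter in Web Scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical div-based "table": no <table>, <tr> or <td> anywhere.
PAGE = """<html><body>
<div class="staff">
  <div class="person"><span class="name">Alice</span><span class="pay">50000</span></div>
  <div class="person"><span class="name">Bob</span><span class="pay">60000</span></div>
</div>
</body></html>"""


def scrape_rows(root, container_path, row_path, fields):
    """Container (single) -> rows (multiple) -> one text value per field."""
    container = root.find(container_path)
    if container is None:
        return []
    records = []
    for row in container.findall(row_path):
        record = {}
        for field, path in fields.items():
            el = row.find(path)
            record[field] = el.text if el is not None else ""
        records.append(record)
    return records


root = ET.fromstring(PAGE)
people = scrape_rows(
    root,
    ".//div[@class='staff']",  # stand-in for an Element selector on div.staff
    "div[@class='person']",    # stand-in for a "multiple" Element selector on div.person
    {"employee": "span[@class='name']", "salary": "span[@class='pay']"},
)
print(people)
```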