Data doesn't match up

When scraping multiple records from a single page, you might end up with each value in a separate row. The scraped data looks like this:

title      price
something
           2.5
pear
           3

This is a common mistake. You are probably using multiple Text selectors, each with the Multiple option checked. The solution is to use an Element selector to select the wrapper element of each item, then add child selectors to that Element selector to extract each item's data. These child selectors must not have the Multiple option checked.

http://webscraper.io/documentation#element-selector
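
To see why the wrapper approach keeps values together, here is a minimal Python sketch. It is a toy model, not Web Scraper's actual code: the `rows` data and both function names are made up for illustration, and the record order simply mirrors the example output above.

```python
# Toy model of the two strategies. The data and function names are
# hypothetical; this is not how Web Scraper is implemented internally.
rows = [
    {"title": "something", "price": "2.5"},
    {"title": "pear", "price": "3"},
]


def scrape_by_column(rows, fields):
    """Model independent 'multiple' selectors: every match is its own record."""
    records = []
    for row in rows:
        for field in fields:
            if field in row:
                # Nothing ties this value to its sibling values.
                records.append({field: row[field]})
    return records


def scrape_by_wrapper(rows, fields):
    """Model an Element selector per item with non-multiple child selectors."""
    # One record per wrapper; a missing child becomes an empty cell
    # instead of shifting the other values.
    return [{field: row.get(field, "") for field in fields} for row in rows]


if __name__ == "__main__":
    print(scrape_by_column(rows, ["title", "price"]))
    print(scrape_by_wrapper(rows, ["title", "price"]))
```

In the first function each value ends up in a record of its own, which is the "separate rows" symptom. In the second, the wrapper defines the record, so title and price stay on the same line.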


So, I tried doing this for my particular dataset, and I'm still having problems with empty cells and data not matching up.

I created Element selectors (set to "multiple") and then child Text selectors (NOT set to "multiple"). The data preview shows that my data looks good, but when I actually scrape the page, the resulting data is not lined up.

Any advice?

{"_id":"test","startUrl":["https://paintref.com/cgi-bin/colorcodedisplay.cgi?make=Smart&con=k&page=1&rows=50"],"selectors":[{"id":"yearelement","type":"SelectorElement","parentSelectors":["_root"],"selector":"center tr:nth-of-type(n+2) td:nth-of-type(3)","multiple":true,"delay":0},{"id":"year","type":"SelectorText","parentSelectors":["yearelement"],"selector":"a","multiple":false,"regex":"","delay":0},{"id":"makeelement","type":"SelectorElement","parentSelectors":["_root"],"selector":"center tr:nth-of-type(n+2) td:nth-of-type(4)","multiple":true,"delay":0},{"id":"make","type":"SelectorText","parentSelectors":["makeelement"],"selector":"a","multiple":false,"regex":"","delay":0},{"id":"colornameelement","type":"SelectorElement","parentSelectors":["_root"],"selector":"center tr:nth-of-type(n+2) td:nth-of-type(5)","multiple":true,"delay":0},{"id":"color name","type":"SelectorText","parentSelectors":["colornameelement"],"selector":"a","multiple":false,"regex":"","delay":0},{"id":"colorcodeelement","type":"SelectorElement","parentSelectors":["_root"],"selector":"center tr:nth-of-type(n+2) td:nth-of-type(6)","multiple":true,"delay":0},{"id":"color code","type":"SelectorText","parentSelectors":["colorcodeelement"],"selector":"a","multiple":false,"regex":"","delay":0},{"id":"sampleelement","type":"SelectorElement","parentSelectors":["_root"],"selector":"center > table[cellpadding] tr:nth-of-type(n+2) td[align]","multiple":true,"delay":0},{"id":"hexcode","type":"SelectorElementAttribute","parentSelectors":["sampleelement"],"selector":"parent","multiple":false,"extractAttribute":"bgcolor","delay":0}]}

This scraper would definitely not work the way you intended. I have restructured it to use rows as the selectors. Please look at the Selector graph and modify the scraper as needed:

{"_id":"paint_test","startUrl":["https://paintref.com/cgi-bin/colorcodedisplay.cgi?make=Smart&con=k&page=1&rows=50"],"selectors":[{"id":"yearelement","type":"SelectorElement","parentSelectors":["Row Selector"],"selector":"td:nth-of-type(3)","multiple":false,"delay":0},{"id":"year","type":"SelectorText","parentSelectors":["yearelement"],"selector":"a","multiple":false,"regex":"","delay":0},{"id":"makeelement","type":"SelectorElement","parentSelectors":["Row Selector"],"selector":"td:nth-of-type(4)","multiple":false,"delay":0},{"id":"make","type":"SelectorText","parentSelectors":["makeelement"],"selector":"a","multiple":false,"regex":"","delay":0},{"id":"colornameelement","type":"SelectorElement","parentSelectors":["Row Selector"],"selector":"td:nth-of-type(5)","multiple":false,"delay":0},{"id":"color name","type":"SelectorText","parentSelectors":["colornameelement"],"selector":"a","multiple":false,"regex":"","delay":0},{"id":"Row Selector","type":"SelectorElement","parentSelectors":["_root"],"selector":"center > table[cellpadding] tr:nth-of-type(n+2)","multiple":true,"delay":0}]}
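
A quick way to spot this pattern in an exported sitemap is to count how many selectors hang directly off `_root` with `multiple` set to true. The Python sketch below is only a rough heuristic under that assumption: the function names are made up, the check is not part of Web Scraper, and the two sitemaps are trimmed-down versions of the ones posted in this thread.

```python
import json

# Trimmed versions of the two sitemaps from this thread (two columns only).
bad = json.loads("""
{"_id": "test", "selectors": [
  {"id": "yearelement", "type": "SelectorElement", "parentSelectors": ["_root"],
   "selector": "center tr:nth-of-type(n+2) td:nth-of-type(3)", "multiple": true},
  {"id": "year", "type": "SelectorText", "parentSelectors": ["yearelement"],
   "selector": "a", "multiple": false},
  {"id": "makeelement", "type": "SelectorElement", "parentSelectors": ["_root"],
   "selector": "center tr:nth-of-type(n+2) td:nth-of-type(4)", "multiple": true}
]}
""")

good = json.loads("""
{"_id": "paint_test", "selectors": [
  {"id": "Row Selector", "type": "SelectorElement", "parentSelectors": ["_root"],
   "selector": "center > table[cellpadding] tr:nth-of-type(n+2)", "multiple": true},
  {"id": "yearelement", "type": "SelectorElement", "parentSelectors": ["Row Selector"],
   "selector": "td:nth-of-type(3)", "multiple": false}
]}
""")


def root_level_multiples(sitemap):
    """Return ids of selectors attached to _root that have multiple=true."""
    return [
        s["id"]
        for s in sitemap["selectors"]
        if "_root" in s.get("parentSelectors", []) and s.get("multiple")
    ]


def looks_misaligned(sitemap):
    """Heuristic: more than one root-level 'multiple' selector means the
    columns are collected independently instead of row by row."""
    return len(root_level_multiples(sitemap)) > 1


if __name__ == "__main__":
    print(root_level_multiples(bad), looks_misaligned(bad))
    print(root_level_multiples(good), looks_misaligned(good))
```

The original sitemap has one root-level "multiple" selector per column, so each column is scraped on its own. The restructured one has a single root-level "multiple" selector for the row, and every cell selector is a non-multiple child of it.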