I'm trying to scrape equip-bid.com. Ultimately I am looking to get 4 fields in my results: 1) the description of the auction, 2) the current high bid, 3) the current high bidder, and 4) a link to the first image posted.
I have been able to get the current bid, high bidder, and image all on one line of the output, but because of the way the site is set up, I can't get the description in the same element as those other three items, which also means I can't get the description on the same line of the output file.
Here is what I have so far...
{"_id":"equip-bid-3","startUrl":["Current Auctions | Equip-Bid a","multiple":true,"delay":0},{"id":"pagination","parentSelectors":["auction-links","pagination"],"paginationType":"auto","selector":".lot-list div:nth-of-type(1) #pagination_wrapper a","type":"SelectorPagination"},{"id":"element","parentSelectors":["pagination"],"type":"SelectorElement","selector":".lot-list div.row:nth-of-type(n+3)","multiple":true,"delay":0},{"id":"description","parentSelectors":["pagination"],"type":"SelectorText","selector":"div.lot-description.description-wrap-fix","multiple":true,"delay":0,"regex":""},{"id":"current-bid","parentSelectors":["element"],"type":"SelectorText","selector":"span.lot-current-bid","multiple":false,"delay":0,"regex":""},{"id":"high-bidder","parentSelectors":["element"],"type":"SelectorText","selector":"small span.pull-right","multiple":false,"delay":0,"regex":""},{"id":"image","parentSelectors":["element"],"type":"SelectorImage","selector":"img","multiple":false,"delay":0}]}
You can also see that this results in 2/3 of the rows being "null", which isn't ideal but would be easy to fix once I have it in Google Sheets.
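If it helps anyone reproduce the cleanup, the null rows could also be dropped in a small post-processing step instead of in Google Sheets. This is only a sketch under assumptions: it assumes the export is a CSV whose columns are named `description`, `current-bid`, `high-bidder`, and `image-src` (the last name is my guess at how the image selector is exported), and that missing values show up as either an empty cell or the literal text `null`. The helper `drop_empty_rows` and the sample data are made up for illustration.

```python
import csv
import io

# Assumed column names; adjust to match the real export header.
DATA_COLUMNS = ["description", "current-bid", "high-bidder", "image-src"]
# Values treated as "no data" (assumption: empty cell or the text "null").
EMPTY = {"", "null"}


def drop_empty_rows(rows):
    """Keep only rows where at least one data column holds a real value."""
    return [
        row
        for row in rows
        if any((row.get(col) or "").strip().lower() not in EMPTY for col in DATA_COLUMNS)
    ]


# Made-up sample standing in for the exported file.
sample_csv = """description,current-bid,high-bidder,image-src
null,null,null,null
,$12.00,bidder42,https://example.com/a.jpg
Old tractor,null,null,null
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kept = drop_empty_rows(rows)
print(len(rows), len(kept))  # prints "3 2"
```

This only removes rows that are completely empty; it does not put the description on the same row as the bid, bidder, and image, which is still the part I'm stuck on.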
I've also tried something like this...
{"_id":"equip-bid-5","startUrl":["Current Auctions | Equip-Bid a","type":"SelectorLink"},{"id":"pagination","paginationType":"auto","parentSelectors":["auction-links","pagination"],"selector":".lot-list div:nth-of-type(1) #pagination_wrapper a","type":"SelectorPagination"},{"delay":0,"id":"full","multiple":true,"parentSelectors":["pagination"],"regex":"","selector":".lot-list > div.row:nth-of-type(n+2)","type":"SelectorText"}]}
But while that technically gets me all of the data, it results in an enormous amount of empty space. Is there a way to remove all that empty space?
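One fallback I can think of is collapsing the whitespace after export rather than inside the scraper. A minimal sketch in Python, assuming each scraped `full` value ends up as a plain string; the function name `collapse_whitespace` and the sample text are hypothetical, not taken from the site.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces, tabs, and newlines with a single space."""
    return re.sub(r"\s+", " ", text).strip()


# Made-up sample imitating a scraped row full of blank lines and indentation.
sample = "  Lot 12\n\n\n      John Deere mower   \n\t $45.00   "
print(collapse_whitespace(sample))  # prints "Lot 12 John Deere mower $45.00"
```

The downside is that everything ends up in one text blob per row, so the bid, bidder, and description would still need to be split apart afterwards.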
Thanks for any help!