I'm building a scraper for a page with the following structure. I've already selected down to this element, and my goal is to get candidate-level information while retaining the name of the precinct each candidate belongs to:
<div id="county">
  <div class="county-name">...</div>
  <div class="precinct-name">...</div>
  <div class="candidate">...</div>
  <div class="candidate">...</div>
  <div class="candidate">...</div>
  <div class="precinct-name">...</div>
  <div class="candidate">...</div>
  <div class="candidate">...</div>
  <div class="candidate">...</div>
</div>
All within one div, there are child elements for the county name (formatted to look like a title), then a precinct name, then that precinct's candidates. Every county has exactly one county name, but the number of precincts varies from county to county, as does the number of candidates. For what it's worth, every precinct within a given county has the same number of candidates.
If I use something like "> div#county div", I'm stuck with precinct titles in line with the candidate information. Ideally, I'd like to select the element with something like "> div#county > div.precinct-name" and then, subordinate to that, pull out the "> div.candidate" information, but that's not an option because the candidates are siblings of the precinct name, not children of it.
If jQuery were available, I'd use something like nextUntil, but again, dreams.
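To make the goal concrete, here is a rough sketch of the grouping I'm after, written in Python with only the standard library's html.parser. The class name PrecinctGrouper, the helper group_candidates, and the sample county, precinct, and candidate names are all made up for illustration, and it assumes each precinct-name and candidate div contains only text (no nested divs). It only illustrates the output I want, not how to get it with selectors.

```python
from html.parser import HTMLParser

# Hypothetical sample data mirroring the structure above; names are invented.
SAMPLE = """
<div id="county">
  <div class="county-name">Example County</div>
  <div class="precinct-name">Precinct 1</div>
  <div class="candidate">Candidate A</div>
  <div class="candidate">Candidate B</div>
  <div class="precinct-name">Precinct 2</div>
  <div class="candidate">Candidate A</div>
  <div class="candidate">Candidate B</div>
</div>
"""


class PrecinctGrouper(HTMLParser):
    """Walk the divs in document order and attach each candidate
    to the most recently seen precinct-name (a nextUntil-style grouping)."""

    def __init__(self):
        super().__init__()
        self.rows = []        # collected (precinct, candidate) pairs
        self.precinct = None  # text of the latest precinct-name div
        self._cls = None      # class of the div currently being read
        self._buf = []        # text fragments of that div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            cls = dict(attrs).get("class")
            if cls in ("precinct-name", "candidate"):
                self._cls = cls
                self._buf = []

    def handle_data(self, data):
        # Only keep text while inside a precinct-name or candidate div.
        if self._cls is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._cls is not None:
            text = "".join(self._buf).strip()
            if self._cls == "precinct-name":
                self.precinct = text
            else:
                self.rows.append((self.precinct, text))
            self._cls = None


def group_candidates(html):
    """Return a list of (precinct, candidate) tuples in document order."""
    parser = PrecinctGrouper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for precinct, candidate in group_candidates(SAMPLE):
        print(precinct, candidate, sep="\t")
```

In other words, each candidate row should carry the name of the precinct that precedes it.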
Is there any way for me to do this?