Hi, I'm trying to scrape specific data about a series of plants. The website appears to use AJAX, and I've tried a few different ways to get around it, but I have not been successful.
The data I need to get requires several different navigational steps:
- Select a letter A-Z to show a popup
- In the popup, open the list of Plant Families
- On each Plant Family page, click the "Compare All" button
- On the Compare All page, scrape the list of plant URLs and open each one
- On each plant page, scrape details about the plant
- Return to step 1 with the next letter.
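Put another way, the steps above are four nested loops: letters, then families, then the Compare All list, then each plant page. Below is a minimal Python sketch of that traversal. It makes no network requests; the three dictionaries are hypothetical in-memory stand-ins for the popup, the Compare All pages, and the plant pages, and the field names are made up for illustration.

```python
from typing import Dict, List


def crawl(
    letters: Dict[str, List[str]],           # hypothetical: letter -> families in its popup
    compare_all: Dict[str, List[str]],       # hypothetical: family -> plant URLs on Compare All
    plants: Dict[str, Dict[str, str]],       # hypothetical: plant URL -> scraped details
) -> List[Dict[str, str]]:
    records = []
    for letter, families in letters.items():                # step 1: open the letter popup
        for family in families:                             # step 2: open each family page
            for plant_url in compare_all.get(family, []):   # steps 3-4: Compare All list
                details = plants.get(plant_url, {})         # step 5: scrape the plant page
                records.append(
                    {"letter": letter, "family": family, "url": plant_url, **details}
                )
    # step 6: the outer loop moves on to the next letter
    return records
```

In the sitemap, each loop level roughly corresponds to one parent selector (`click-letter`, `plant-parent-name`, `click-compare-all`, `name-plant`), and the text selectors would run once per plant page, producing one record per plant.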
I've tried to create the sitemap several different ways, but I usually end up with one of the following scenarios:
- Scraper opens the A popup, then stops - no data is saved
- Scraper opens the A popup, goes to the Plant Family Page, then stops - no data is saved
- Scraper opens every A-Z popup, then stops - no data is saved
This is my current sitemap, and its result is scenario #2 above.
```json
{"_id":"gardenia","startUrl":["https://www.gardenia.net/"],"selectors":[
  {"id":"Hardiness","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Hardiness') td","type":"SelectorText"},
  {"id":"Heat Zones","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Heat Zones') td","type":"SelectorText"},
  {"id":"Climate Zones","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Climate Zones') td","type":"SelectorText"},
  {"id":"Plant Type","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Plant Type') td","type":"SelectorText"},
  {"id":"Plant Family","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Plant Family') td","type":"SelectorText"},
  {"id":"Exposure","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Exposure') td","type":"SelectorText"},
  {"id":"Season of Interest","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Season of Interest') td","type":"SelectorText"},
  {"id":"Height","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Height') td","type":"SelectorText"},
  {"id":"Spread","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Spread') td","type":"SelectorText"},
  {"id":"Water Needs","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Water Needs') td","type":"SelectorText"},
  {"id":"Maintenance","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Maintenance') td","type":"SelectorText"},
  {"id":"Soil Type","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Soil Type') td","type":"SelectorText"},
  {"id":"Soil pH","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Soil pH') td","type":"SelectorText"},
  {"id":"Soil Drainage","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Soil Drainage') td","type":"SelectorText"},
  {"id":"Characteristics","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Characteristics') td","type":"SelectorText"},
  {"id":"Native Plants","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Native Plants') td","type":"SelectorText"},
  {"id":"Tolerance","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Tolerance') td","type":"SelectorText"},
  {"id":"Attracts","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Attracts') td","type":"SelectorText"},
  {"id":"Garden Uses","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Garden Uses') td","type":"SelectorText"},
  {"id":"Garden Styles","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Garden Styles') td","type":"SelectorText"},
  {"id":"Spacing","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".d-none tr:contains('Spacing') td","type":"SelectorText"},
  {"id":"other-names","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":"h2 em","type":"SelectorText"},
  {"id":"plant-name","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".body-heading h1","type":"SelectorText"},
  {"id":"description-text","multiple":false,"parentSelectors":["name-plant"],"regex":"","selector":".detail-text-area div","type":"SelectorText"},
  {"clickElementSelector":"a.alpha-click","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"click-letter","multiple":true,"parentSelectors":["_root"],"selector":"body > div.container","type":"SelectorElementClick"},
  {"id":"click-compare-all","multiple":false,"parentSelectors":["plant-parent-name"],"selector":"a.btn-block","type":"SelectorLink"},
  {"id":"name-plant","multiple":true,"parentSelectors":["click-compare-all"],"selector":"strong a","type":"SelectorLink"},
  {"id":"plant-parent-name","multiple":true,"parentSelectors":["click-letter"],"selector":".list-wrapper a","type":"SelectorPopupLink"}
]}
```
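Since the sitemap is plain JSON, one quick way to double-check the parent/child wiring is to load it and print the selector tree. This is a rough Python sketch that only relies on the `id` and `parentSelectors` fields visible in the sitemap above; it does not run anything in the browser, and the helper names (`load_sitemap`, `selector_tree`, `tree_lines`) are hypothetical.

```python
import json
from typing import Dict, List, Optional, Set


def load_sitemap(path: str) -> dict:
    """Read an exported sitemap from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def selector_tree(sitemap: dict) -> Dict[str, List[str]]:
    """Map each parent selector id to the ids of its children.

    Raises ValueError if a selector names a parent id that does not exist.
    """
    ids = {sel["id"] for sel in sitemap["selectors"]}
    children: Dict[str, List[str]] = {"_root": []}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            if parent != "_root" and parent not in ids:
                raise ValueError(f"{sel['id']!r} has unknown parent {parent!r}")
            children.setdefault(parent, []).append(sel["id"])
    return children


def tree_lines(children: Dict[str, List[str]], node: str = "_root",
               depth: int = 0, seen: Optional[Set[str]] = None) -> List[str]:
    """Render the selector hierarchy as indented lines, depth-first."""
    seen = set() if seen is None else seen
    lines = ["  " * depth + node]
    if node in seen:
        # Already expanded (e.g. a selector listed as its own parent): stop here.
        return lines
    seen.add(node)
    for child in children.get(node, []):
        lines.extend(tree_lines(children, child, depth + 1, seen))
    return lines
```

Assuming the sitemap is saved as `sitemap.json` (a hypothetical file name), `print("\n".join(tree_lines(selector_tree(load_sitemap("sitemap.json")))))` should print `_root`, then `click-letter`, `plant-parent-name`, `click-compare-all` and `name-plant`, each nested one level deeper, with all the text selectors under `name-plant`.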
I've read through several other forum threads, which helped me get past the "Parent does not contain selected element" error, but I'm still not able to scrape correctly.
Thank you for any help!
