Hi there,
I'm enjoying learning how to use the tool for a variety of purposes. Just signed up for a subscription. Very useful!
I am stuck on one use case. I've been trying to tweak things and figure it out myself for a good few hours now, so I've finally given up and am turning to the experts for help!
A mock-up of the scenario is at the link below. There are multiple links to different job listings that are hosted on different domains. I'd like to be able to extract the job name, salary, location and description from each into a spreadsheet. As the code is slightly different on each website, I'm hitting a roadblock, i.e. the title of the job uses an h1 on one site but an h3 on another.
Below is my sitemap, just focussing on extracting the job title for now. As you can see, the job titles are split across two different columns, 'name1' and 'name2'. How can I set up the sitemap so they all appear under a single column? It doesn't appear that you can use the same selector ID more than once.
Url: http://rssbuilder.nfshost.com/bba/joblistingexample.html
Sitemap:
{"_id":"joblinks","startUrl":["http://rssbuilder.nfshost.com/bba/joblistingexample.html"],"selectors":[{"id":"grablink","type":"SelectorLink","parentSelectors":["_root"],"selector":"a","multiple":true,"delay":0},{"id":"indeed","type":"SelectorElement","parentSelectors":["grablink"],"selector":"div.jobsearch-ViewJobLayout-mainContent","multiple":false,"delay":0},{"id":"reed","type":"SelectorElement","parentSelectors":["grablink"],"selector":"article","multiple":false,"delay":0},{"id":"name1","type":"SelectorText","parentSelectors":["indeed"],"selector":"h3","multiple":false,"regex":"","delay":0},{"id":"name2","type":"SelectorText","parentSelectors":["reed"],"selector":"h1","multiple":false,"regex":"","delay":0}]}
I thought one solution might be to export the above, edit it so both selectors were called 'name', then import it again. This results in one column, but it only grabs half the names. This sitemap is below.
{"_id":"joblinks","startUrl":["http://rssbuilder.nfshost.com/bba/joblistingexample.html"],"selectors":[{"id":"grablink","type":"SelectorLink","parentSelectors":["_root"],"selector":"a","multiple":true,"delay":0},{"id":"indeed","type":"SelectorElement","parentSelectors":["grablink"],"selector":"div.jobsearch-ViewJobLayout-mainContent","multiple":false,"delay":0},{"id":"reed","type":"SelectorElement","parentSelectors":["grablink"],"selector":"article","multiple":false,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["indeed"],"selector":"h3","multiple":false,"regex":"","delay":0},{"id":"name","type":"SelectorText","parentSelectors":["reed"],"selector":"h1","multiple":false,"regex":"","delay":0}]}
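As a stopgap, the 'name1' and 'name2' columns from the first sitemap could be merged after exporting. This is only a minimal sketch of that workaround, not a fix inside the tool: it assumes the export is a CSV in which the column that did not match is left empty, and the function name `merge_name_columns` and the sample rows are made up for illustration.

```python
import csv
import io


def merge_name_columns(rows, sources=("name1", "name2"), target="name"):
    """Collapse the per-site columns into one, keeping the first non-empty value."""
    merged = []
    for row in rows:
        # Carry over every column except the ones being merged.
        out = {key: value for key, value in row.items() if key not in sources}
        # Take the first source column that actually holds text.
        out[target] = next(
            (row[s].strip() for s in sources if (row.get(s) or "").strip()),
            "",
        )
        merged.append(out)
    return merged


# Hypothetical rows standing in for the exported CSV (assumption: empty cell = no match).
sample = io.StringIO(
    "grablink,name1,name2\n"
    "Job A,Data Analyst,\n"
    "Job B,,Site Manager\n"
)
rows = merge_name_columns(csv.DictReader(sample))
```

To use it on a real export, the `io.StringIO` sample would be swapped for an opened CSV file. The same idea would extend to salary, location and description by calling the function once per pair of columns.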
Any help appreciated - I'm hoping it is possible and I'm just unable to figure it out!