How do I scrape a site where the search results are in an iframe?

Hi,

This site has me stumped: Search for registered migration agents · OMARA Self-Service Portal

I am guessing the search results are in an iframe, but I can't work out whether it is possible to identify the path to use as the Start URL so the scrape can commence.

I need to scrape the Family Name link and then multiple fields on the linked page.

Any help greatly appreciated.

Cheers

Hi

I entered an asterisk in the "Agents given name" field and ran a search, then used the results to start building the scraper. It did start to scrape.

I did not wait for it to finish, so I can't say whether all pages were scraped, but it did at least start. I was only scraping the email for now.

Here is the starter sitemap.

{"_id":"aateast","startUrl":["https://portal.mara.gov.au/search-the-register-of-migration-agents/"],"selectors":[{"id":"pagaination","paginationType":"auto","parentSelectors":["_root","pagaination"],"selector":".pagination li:nth-of-type(n+2) a","type":"SelectorPagination"},{"delay":0,"id":"Element","multiple":true,"parentSelectors":["pagaination"],"selector":"td[data-attribute='lastname']","type":"SelectorElement"},{"delay":0,"id":"Link","multiple":false,"parentSelectors":["Element"],"selector":"a","type":"SelectorLink"},{"delay":0,"id":"Email","multiple":false,"parentSelectors":["Link"],"regex":"","selector":"tr:contains('Email address') a:nth-of-type(1)","type":"SelectorText"}]}

I hope this helps.

Hi,

Thanks for the suggestion. Unfortunately, it does not progress to the next page.

I am having the same problem with my sitemap. Any suggestions to get this working are most welcome.

Cheers

{"_id":"omara","startUrl":["https://portal.mara.gov.au/search-the-register-of-migration-agents/"],"selectors":[{"id":"next","paginationType":"auto","parentSelectors":["_root","next"],"selector":"a[data-page='2'][data-toggle]","type":"SelectorPagination"},{"delay":0,"id":"agent","multiple":true,"parentSelectors":["_root","next"],"selector":"a.details-link","type":"SelectorLink"},{"delay":0,"id":"Name","multiple":false,"parentSelectors":["agent"],"regex":"","selector":"label.resgister-fullname","type":"SelectorText"},{"delay":0,"id":"MARN","multiple":false,"parentSelectors":["agent"],"regex":"","selector":"label:nth-of-type(3)","type":"SelectorText"},{"delay":0,"id":"Status","multiple":false,"parentSelectors":["agent"],"regex":"","selector":"label:nth-of-type(5)","type":"SelectorText"},{"delay":0,"id":"Start_Date","multiple":false,"parentSelectors":["agent"],"regex":"","selector":"label#regDate","type":"SelectorText"},{"delay":0,"id":"Business_Name","multiple":false,"parentSelectors":["agent"],"regex":"","selector":".col-sm-8 tr:contains('Business name') td:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Business_Address","multiple":false,"parentSelectors":["agent"],"regex":"","selector":"tr:contains('Business address') td:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Phone","multiple":false,"parentSelectors":["agent"],"regex":"","selector":"tr:contains('Phone') td:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Email","multiple":false,"parentSelectors":["agent"],"regex":"","selector":"tr:contains('Email address') a:nth-of-type(1)","type":"SelectorText"}]}

I managed to work it out. Here is my sitemap in case anyone else wants to scrape the OMARA site :grinning:

{"_id":"omara_ele","startUrl":["https://portal.mara.gov.au/search-the-register-of-migration-agents/"],"selectors":[{"clickElementSelector":".pagination li:nth-of-type(n+2) a","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"discard","id":"pagination","multiple":true,"parentSelectors":["_root"],"selector":"div.col-md-9","type":"SelectorElementClick"},{"delay":0,"id":"Agent","multiple":true,"parentSelectors":["pagination"],"selector":"a.details-link","type":"SelectorLink"},{"delay":0,"id":"Name","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"label.resgister-fullname","type":"SelectorText"},{"delay":0,"id":"Registration","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"label:nth-of-type(3)","type":"SelectorText"},{"delay":0,"id":"Status","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"label:nth-of-type(5)","type":"SelectorText"},{"delay":0,"id":"Commenced","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"label#regDate","type":"SelectorText"},{"delay":0,"id":"Business_Name","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":".col-sm-8 tr:contains('Business name') td:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Business_Address","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"tr:contains('Business address') td:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Phone","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"tr:contains('Phone') td:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Email","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"tr:contains('Email address') a:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"Website","multiple":false,"parentSelectors":["Agent"],"regex":"","selector":"a#weburl","type":"SelectorText"}]}
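
For anyone comparing the sitemaps in this thread: the visible change in this one is that paging is handled by an Element Click selector (type SelectorElementClick, with a 2000 ms delay) at the root, rather than a SelectorPagination. Below is a rough Python sketch for printing a sitemap's selector tree so differences like that are easier to spot. It only reads the JSON; it is not part of Web Scraper. The function names `selector_tree` and `format_tree` are made up for illustration, and `SITEMAP` is a trimmed copy of the sitemap above with only three selectors kept.

```python
import json
from collections import defaultdict

# Trimmed copy of the sitemap above (assumption: only three selectors kept for brevity).
SITEMAP = """
{"_id":"omara_ele",
 "startUrl":["https://portal.mara.gov.au/search-the-register-of-migration-agents/"],
 "selectors":[
  {"clickElementSelector":".pagination li:nth-of-type(n+2) a","clickType":"clickOnce",
   "delay":2000,"id":"pagination","multiple":true,"parentSelectors":["_root"],
   "selector":"div.col-md-9","type":"SelectorElementClick"},
  {"delay":0,"id":"Agent","multiple":true,"parentSelectors":["pagination"],
   "selector":"a.details-link","type":"SelectorLink"},
  {"delay":0,"id":"Email","multiple":false,"parentSelectors":["Agent"],
   "selector":"tr:contains('Email address') a:nth-of-type(1)","type":"SelectorText"}
 ]}
"""


def selector_tree(sitemap: dict) -> dict:
    """Map each parent selector id to a list of (child id, child type) pairs."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append((sel["id"], sel["type"]))
    return dict(children)


def format_tree(tree: dict, node: str = "_root", depth: int = 0, path: tuple = ()) -> list:
    """Return indented lines describing the tree below `node`."""
    lines = []
    for child, kind in tree.get(node, []):
        lines.append("  " * depth + f"{child} ({kind})")
        # Pagination selectors may list themselves as a parent, so do not
        # recurse into a selector that is already on the current path.
        if child != node and child not in path:
            lines.extend(format_tree(tree, child, depth + 1, path + (node,)))
    return lines


if __name__ == "__main__":
    print("\n".join(format_tree(selector_tree(json.loads(SITEMAP)))))
```

Pasting one of the earlier sitemaps into `SITEMAP` instead shows its pagination selector listed under both `_root` and itself, which makes the structural difference between the attempts easy to see side by side.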

@woteva Well done! Your answer might help all of us, and especially me, as I happen to have a similar problem. Thank you.

No worries. I'm a total newbie here, so I'm happy to help if I can.
I'm working on another site now that is causing me problems, and I may make another post asking for help if I can't work it out.
