Scrape data from a new tab (opened when the link is clicked)

Hi everyone,

I'm currently trying to scrape tender data from a website, but I ran into an issue when trying to extract information that appears in a new tab (popup page).

Problem:
I have successfully created a sitemap that goes through the pagination and clicks each tender title in the main list. However, each link opens in a new tab, and whenever I add a child selector to extract data such as "Kode Tender" from that page, the scrape fails. My guess is that the new tab is blocked or that the scraper cannot access it.

URL:
https://lpse.kemkes.go.id/eproc4/lelang?kategoriId=&tahun=2024&instansiId=&rekanan=&kontrak_status=&kontrak_tipe=

Example detail page that opens in a new tab:
https://lpse.kemkes.go.id/eproc4/lelang/47576047/pengumumanlelang

Sitemap:
{"_id":"LPSE2","startUrl":["https://lpse.kemkes.go.id/eproc4/lelang?kategoriId=&tahun=2024&instansiId=&rekanan=&kontrak_status=&kontrak_tipe="],"selectors":[{"clickActionType":"real","clickElementSelector":".next a","clickElementUniquenessType":"uniqueText","clickType":"clickMore","delay":2000,"discardInitialElements":"discard-when-click-element-exists","id":"page","multiple":true,"parentSelectors":["_root"],"selector":"td p a","type":"SelectorElementClick"},{"id":"link","linkType":"linkFromHref","multiple":false,"parentSelectors":["page"],"selector":"parent","type":"SelectorLink"},{"clickActionType":"real","clickElementSelector":"parent","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"discard-when-click-element-exists","id":"Buka Link","multiple":true,"parentSelectors":["page"],"selector":"parent","type":"SelectorElementClick"}]}

It is probably easier to scrape this site in two stages. In stage 1, collect the URLs of all the tender detail pages. In stage 2, use a separate sitemap that takes those stage-1 URLs as its start URLs.
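The hand-off between the two stages can be sketched in Python: read the stage-1 CSV export, deduplicate the detail-page URLs, and write them into a stage-2 sitemap as `startUrl`. This is a minimal sketch under several assumptions: the stage-1 sitemap keeps the Link selector with id `link` from the sitemap above, so the export has a `link-href` column; the stage-2 selector layout is modelled on the exported sitemap format and should be checked by importing it into Web Scraper; and `KODE_TENDER_SELECTOR` is a hypothetical CSS selector that you must replace after inspecting the real detail page.

```python
import csv
import io
import json

# Hypothetical CSS selector for the "Kode Tender" value on the detail page.
# Inspect the real page and replace it before importing the sitemap.
KODE_TENDER_SELECTOR = "table tr:nth-of-type(1) td"


def read_stage1_urls(csv_text: str, column: str = "link-href") -> list[str]:
    """Return the unique, non-empty URLs from a stage-1 CSV export, in order.

    Assumes the export has a column named after the Link selector id
    plus "-href" (here: "link-href").
    """
    seen = set()
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def build_stage2_sitemap(urls: list[str], sitemap_id: str = "LPSE2-detail") -> dict:
    """Build a stage-2 sitemap that starts directly on every detail page."""
    return {
        "_id": sitemap_id,
        "startUrl": list(urls),
        "selectors": [
            {
                "id": "kode_tender",
                "type": "SelectorText",
                "parentSelectors": ["_root"],
                "selector": KODE_TENDER_SELECTOR,
                "multiple": False,
                "regex": "",
            }
        ],
    }


if __name__ == "__main__":
    # Tiny made-up export with a duplicated row, just to show the flow.
    stage1_csv = (
        "link,link-href\n"
        "Tender A,https://lpse.kemkes.go.id/eproc4/lelang/47576047/pengumumanlelang\n"
        "Tender A,https://lpse.kemkes.go.id/eproc4/lelang/47576047/pengumumanlelang\n"
    )
    sitemap = build_stage2_sitemap(read_stage1_urls(stage1_csv))
    print(json.dumps(sitemap, indent=2))
```

The printed JSON can be pasted into Web Scraper's "Import Sitemap" dialog. Because every detail page is a start URL, stage 2 never has to click a link that opens a new tab.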

I find that this method solves a lot of pagination- and navigation-related issues.

Got it. But I have another problem: the URLs gathered in stage 1 need a Referer, so if I open them directly in the browser (copy and paste), access is blocked. Do you have any solution? 🥲
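A referer-control extension works by rewriting the `Referer` request header so that the server believes the request came from the tender list. To see what such a request looks like, here is a minimal Python sketch using the standard library's `urllib.request.Request`. It assumes that the server only checks the `Referer` header and that the list URL above is an acceptable referrer; cookies or session checks, if the site uses them, are not handled.

```python
import urllib.request

# Assumption: the detail pages only require a Referer that points back at
# the tender list. Cookie or session checks are NOT handled here.
LIST_URL = (
    "https://lpse.kemkes.go.id/eproc4/lelang?kategoriId=&tahun=2024"
    "&instansiId=&rekanan=&kontrak_status=&kontrak_tipe="
)


def make_request(detail_url: str, referer: str = LIST_URL) -> urllib.request.Request:
    """Build a GET request for a detail page that carries a Referer header."""
    return urllib.request.Request(
        detail_url,
        headers={
            "Referer": referer,
            # Browser-like User-Agent; some servers reject urllib's default one.
            "User-Agent": "Mozilla/5.0",
        },
    )


req = make_request("https://lpse.kemkes.go.id/eproc4/lelang/47576047/pengumumanlelang")
print(req.get_method(), req.full_url)
print("Referer:", req.get_header("Referer"))
# To actually send it: urllib.request.urlopen(req, timeout=30).read()
```

If a request built like this is accepted while the same URL pasted into the address bar is rejected, the block is based on the Referer alone, and any tool that sets that header (an extension or a script) should work.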

I found your comment about the exact same problem here.

But the Referer Control extension is no longer available in Chrome. Do you know any alternatives?

You can test this; I haven't had time to try it myself:

Already tried it and it works, thanks!