How to get text data from inside a clicked row?

surya · June 19, 2018, 11:36am

First, I want to apologize for my poor English. I want to get the text from "Bentuk Sediaan", but first I must click every product row. How can I do that? Thank you so much.
Url: http://cekbpom.pom.go.id/index.php/home/produk/bij2e8vh4iengj0no5qg9vkp64/all/row/1000/page/1/order/6/ASC/search/6/ABBOT

Sitemap:
{id:"sitemap code"}

iconoclast · June 19, 2018, 4:25pm

Hi!

The link you posted doesn't show the search results; it just opens the site.

Since it's an AJAX table, some of the content is generated on the fly when you click a row, so the scraping can be done using an Element Click selector.

I ran a test search and made you an example sitemap:

{"_id":"cekbpom","startUrl":["http://cekbpom.pom.go.id/index.php/home/produk/p5jct1st5aggl19nqrosulqj54/all/row/10/page/0/order/4/DESC/search/4/1"],"selectors":[{"id":"info_inside_div_click","type":"SelectorElementClick","selector":"td#filltd > table.normal > tbody > tr","parentSelectors":["_root"],"multiple":true,"delay":"1000","clickElementSelector":"table.tabelajax > tbody > tr:nth-of-type(n+3) > td:nth-of-type(1)","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Nomor Registrasi","type":"SelectorText","selector":"tr:contains('Nomor Registrasi') > td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Tanggal Terbit","type":"SelectorText","selector":"tr:nth-of-type(2):contains('Tanggal Terbit') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Diterbitkan Oleh","type":"SelectorText","selector":"tr:nth-of-type(3):contains('Diterbitkan Oleh') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Produk","type":"SelectorText","selector":"tr:nth-of-type(5):contains('Produk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Nama Produk","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Nama Produk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Bentuk Sediaan","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Bentuk Sediaan') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Komposisi","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Komposisi') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Merk","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Merk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Kemasan","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Kemasan') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Pendaftar","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Pendaftar') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Diproduksi Oleh","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Diproduksi Oleh') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0}]}

All that's left for you to do is run a correct search, copy the resulting URL, and put it into the sitemap Metadata (replace the start URL with the new one).

Please note that the website's URL contains very useful information: the number of rows to show and the page number. So if you want to scrape all results without using pagination, you can put the total number of rows (for example, 500 if there are 500 rows of data) into the rows-per-page part of the URL (Data Per-Halaman).
The resulting page will then display all 500 results on one page.

P.S. Please note that results are only produced if Web Scraper goes through all the rows and finishes on its own; if you stop it manually, no results will be shown.
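As a rough illustration of the URL trick described above, the following Python sketch rewrites the `row` and `page` parts of a search URL. The function name `set_rows_and_page` is made up for this example, and the `/row/<n>/page/<m>/` layout is assumed purely from the URLs quoted in this thread.

```python
import re


def set_rows_and_page(url: str, rows: int, page: int) -> str:
    """Return `url` with its /row/<n>/page/<m>/ segment replaced.

    The segment layout is an assumption based on the URLs in this thread.
    """
    new_url, count = re.subn(
        r"/row/\d+/page/\d+/", f"/row/{rows}/page/{page}/", url
    )
    if count != 1:
        raise ValueError("expected exactly one /row/<n>/page/<m>/ segment")
    return new_url


url = (
    "http://cekbpom.pom.go.id/index.php/home/produk/"
    "p5jct1st5aggl19nqrosulqj54/all/row/10/page/0/order/4/DESC/search/4/1"
)
# Ask for 500 rows on a single page instead of 10.
print(set_rows_and_page(url, rows=500, page=0))
```

The rewritten URL would then go into the sitemap as the start URL. The session part of the URL is left untouched, so it still has to come from a recent search.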

surya · June 20, 2018, 6:48am

Oh, thank you so much, sir.
It works just the way I wanted. I am so grateful that you answered my question.
If I may ask another question: while scraping, can I open other tabs, or must I keep the scrape in focus? And does my internet connection affect the scraping process? I ask because I have a slow internet connection. Sorry for my bad English, and thanks again.

iconoclast · June 20, 2018, 8:51am

I'm glad I could help.
You can certainly open as many extra tabs and browse the internet as much as you want; your connection should still have enough bandwidth for the website you are scraping.

P.S. I usually run two scrapes at once and also browse the internet without any problems.

surya · June 20, 2018, 5:18pm

OK, thank you so much.
Can I ask you one more question, sir? I found something strange. When I scrape with 1000 or 500 rows per page (Data Per-Halaman), the scrape finishes, but when I export to CSV nothing has been scraped. When I scrape fewer than 100 rows, it's fine. What is happening? I am so confused.

iconoclast · June 22, 2018, 1:27am

@surya

Hi!

It seems I picked the wrong wrapper element for the table; I should have chosen one that works with Element Click when there are more than 100 rows in the first place.

Here's the working one (replace the start URL with your own search):

{"_id":"cekbpom","startUrl":["http://cekbpom.pom.go.id/index.php/home/produk/oeaak4hrli7abbrueijln2j3q1/all/row/200/page/1/order/4/DESC/search/0/1"],"selectors":[{"id":"info_inside_div_click","type":"SelectorElementClick","selector":"td#filltd","parentSelectors":["_root"],"multiple":true,"delay":"300","clickElementSelector":"table.tabelajax > tbody > tr:nth-of-type(n+3) > td:nth-of-type(1)","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Nomor Registrasi","type":"SelectorText","selector":"table.normal table.normal tr:contains('Nomor Registrasi') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Tanggal Terbit","type":"SelectorText","selector":"tr:nth-of-type(2):contains('Tanggal Terbit') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Diterbitkan Oleh","type":"SelectorText","selector":"tr:nth-of-type(3):contains('Diterbitkan Oleh') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Produk","type":"SelectorText","selector":"tr:nth-of-type(5):contains('Produk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Nama Produk","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Nama Produk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Bentuk Sediaan","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Bentuk Sediaan') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Komposisi","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Komposisi') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Merk","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Merk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Kemasan","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Kemasan') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Pendaftar","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Pendaftar') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Diproduksi Oleh","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Diproduksi Oleh') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0}]}

Or just replace the Selector in your Element Click selector with td#filltd instead of td#filltd > table.normal > tbody > tr.

surya · June 23, 2018, 5:18am

OK, now I have another problem. When I test with 5 rows, it scrapes all 5 rows of data. But when I scrape 25 rows, I only get 16. When I scrape 1000 rows, I only get 675. Why is that? Sometimes the table click skips some rows. Thanks in advance.

surya · June 23, 2018, 5:57am

After looking more closely, I see that the scrape skips rows that have the same data. But I still want to scrape them. How can I do that?

iconoclast · June 23, 2018, 10:50am

Hi, please try increasing the click delay so that the data loads fully after each click.
I've expanded the wrapper to cover the table and its navigation.
It will skip data if the data is absolutely identical; perhaps the table contains duplicates? The registration numbers are all different.

Please try this one:

{"_id":"cekbpom","startUrl":["http://cekbpom.pom.go.id/index.php/home/produk/akh0ntijr9hrcfp2jeqa5dn0n5/all/row/200/page/1/order/4/DESC/search/0/1"],"selectors":[{"id":"info_inside_div_click","type":"SelectorElementClick","selector":"table","parentSelectors":["_root"],"multiple":true,"delay":"1000","clickElementSelector":"table.tabelajax > tbody > tr:nth-of-type(n+3) > td:nth-of-type(1)","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Nomor Registrasi","type":"SelectorText","selector":"table.normal table.normal tr:contains('Nomor Registrasi') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Tanggal Terbit","type":"SelectorText","selector":"tr:nth-of-type(2):contains('Tanggal Terbit') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Diterbitkan Oleh","type":"SelectorText","selector":"tr:nth-of-type(3):contains('Diterbitkan Oleh') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Produk","type":"SelectorText","selector":"table.normal td:nth-of-type(1) tr:contains('Produk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Nama Produk","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Nama Produk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Bentuk Sediaan","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Bentuk Sediaan') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Komposisi","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Komposisi') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Merk","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Merk') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Kemasan","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Kemasan') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Pendaftar","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Pendaftar') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0},{"id":"Diproduksi Oleh","type":"SelectorText","selector":"td:nth-of-type(2) tr:contains('Diproduksi Oleh') td:nth-of-type(2)","parentSelectors":["info_inside_div_click"],"multiple":false,"regex":"","delay":0}]}
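The skipping described above (rows with identical data being treated as already clicked) can be pictured as de-duplication by a key. The Python sketch below is only a conceptual model, not Web Scraper's actual implementation; the function `rows_clicked` and the product names are invented for illustration.

```python
def rows_clicked(rows, key):
    """Return the rows that get clicked when each distinct key is clicked once.

    Conceptual model only: `key` decides what makes two rows "the same".
    """
    seen = set()
    clicked = []
    for index, text in enumerate(rows):
        k = key(index, text)
        if k in seen:
            continue  # treated as already clicked, so the row is skipped
        seen.add(k)
        clicked.append(text)
    return clicked


# Hypothetical first-column texts; two products share the same name.
rows = ["Product A", "Product A", "Product B", "Product A"]

by_text = rows_clicked(rows, key=lambda i, t: t)      # uniqueness by text
by_position = rows_clicked(rows, key=lambda i, t: i)  # uniqueness by position

print(len(by_text), len(by_position))
```

Under this model, keying on the cell text drops every repeated name, while keying on something that differs per row (here, its position) keeps all of them.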

surya · June 23, 2018, 5:52pm

Thanks for your answer, sir. But this one scrapes nothing, and when I export to CSV the result is empty. I want to scrape all the data without skipping rows that have the same text or ID.

iconoclast · June 23, 2018, 6:05pm

Did you close the window yourself when the scrape was done, or did it close by itself?

P.S. Please note that if you ran a search and then left it for a long time, the link will no longer open. You have to run the search, copy the link, and use it in your sitemap as quickly as you can; it only lasts for a limited time, until the website's cache clears.

surya · June 23, 2018, 6:47pm

Nope. I tried something and it works fine: I changed the click element uniqueness setting from unique text to unique CSS selector, and now it scrapes every row without skipping rows that have the same ID. Thank you for all your help, sir ;D
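For anyone who prefers to edit the exported sitemap rather than use the UI, the change described above could be applied with a small Python script. This is a sketch under assumptions: the helper `set_click_uniqueness` is made up, the sitemap is a trimmed-down stand-in for the ones in this thread, and `"uniqueCSSSelector"` is assumed to be the exported name of the unique CSS selector option (export a sitemap after changing the setting in the UI to confirm).

```python
import json


def set_click_uniqueness(sitemap_json: str, uniqueness: str) -> str:
    """Set clickElementUniquenessType on every Element Click selector."""
    sitemap = json.loads(sitemap_json)
    for selector in sitemap["selectors"]:
        if selector.get("type") == "SelectorElementClick":
            selector["clickElementUniquenessType"] = uniqueness
    # Compact separators keep the one-line style used for sitemaps here.
    return json.dumps(sitemap, separators=(",", ":"))


# Trimmed-down stand-in for the sitemaps posted in this thread.
example = (
    '{"_id":"cekbpom","selectors":['
    '{"id":"info_inside_div_click","type":"SelectorElementClick",'
    '"clickElementUniquenessType":"uniqueText"},'
    '{"id":"Bentuk Sediaan","type":"SelectorText"}]}'
)

# "uniqueCSSSelector" is an assumed option name; verify it against an export.
patched = set_click_uniqueness(example, "uniqueCSSSelector")
print(patched)
```

Only Element Click selectors are touched; the text selectors pass through unchanged, so the patched JSON can be imported in place of the original.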