Show all / show less

I'm a newbie with minimal HTML knowledge.

I'm trying to scrape what appears to be a table. When the user first clicks into the page containing the table, only the first dozen or so table entries are displayed, and only those are scraped. There is a "link" called "show all", which when clicked shows the entire table. It does not change the URL in the browser URL field.

Web Scraper will access the entire table if the entire table is visible, and the root of the search is the page containing the table. But I haven't been able to figure out how to get Web Scraper to click on "show all", and successfully access the entire table. I get zero results from my scrape.

I've tried the "link" selector and the "Element click selector". I don't think the link selector is the right one to use. I'm not sure about the Element click selector, or whether I've chosen the correct options for it.

https://www.newport.com/f/n-bk7-right-angle-prisms

Sitemap:
{"_id":"newport_rap2","startUrl":["https://www.newport.com/c/right-angle-prisms"],"selectors":[{"id":"link_rap","type":"SelectorLink","parentSelectors":["_root"],"selector":"div.product_item:nth-of-type(n+2) h2.title a.blkclr","multiple":true,"delay":0},{"id":"show_all","type":"SelectorElementClick","parentSelectors":["link_rap"],"selector":"div.show_more a","multiple":false,"delay":0,"clickElementSelector":"div.show_more a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"table_rap","type":"SelectorTable","parentSelectors":["show_all"],"selector":"div.dataTables_wrapper table.table","multiple":true,"columns":[{"header":"Compare","name":"Compare","extract":false},{"header":"Model","name":"Model","extract":true},{"header":"Drawings, CAD & Specs","name":"Drawings, CAD & Specs","extract":false},{"header":"Wavelength Range","name":"Wavelength Range","extract":true},{"header":"Size","name":"Size","extract":true},{"header":"Material","name":"Material","extract":true},{"header":"Coating Type","name":"Coating Type","extract":true},{"header":"Coating Code","name":"Coating Code","extract":true},{"header":"Availability","name":"Availability","extract":false},{"header":"Featured Item","name":"Featured Item","extract":false},{"header":"Price","name":"Price","extract":true},{"header":"Brand","name":"Brand","extract":false}],"delay":0,"tableDataRowSelector":"tr:nth-of-type(n+7)","tableHeaderRowSelector":"thead tr"}]}

You were very close.

You need to use the Element click selector, which needs two selectors filled in:

The click selector gets set to "#show_all_table a".
For the other selector, highlight the entire table (so the red box encompasses all of it), which ends up being div.product_table div.col-lg-12 (though there may be a cleaner element to use).
Set this to "Click once" and uncheck "Multiple".

Everything else you put works fine.

{"_id":"newport_rap2","startUrl":["https://www.newport.com/c/right-angle-prisms"],"selectors":[{"id":"link_rap","type":"SelectorLink","parentSelectors":["_root"],"selector":"h2.title a.blkclr","multiple":true,"delay":0},{"id":"show_all","type":"SelectorElementClick","parentSelectors":["link_rap"],"selector":"div.product_table div.col-lg-12","multiple":false,"delay":0,"clickElementSelector":"#show_all_table a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"table_rap","type":"SelectorTable","parentSelectors":["show_all"],"selector":"div.dataTables_wrapper table.table","multiple":true,"columns":[{"header":"Compare","name":"Compare","extract":false},{"header":"Model","name":"Model","extract":true},{"header":"Drawings, CAD & Specs","name":"Drawings, CAD & Specs","extract":false},{"header":"Wavelength Range","name":"Wavelength Range","extract":true},{"header":"Size","name":"Size","extract":true},{"header":"Material","name":"Material","extract":true},{"header":"Coating Type","name":"Coating Type","extract":true},{"header":"Coating Code","name":"Coating Code","extract":true},{"header":"Availability","name":"Availability","extract":false},{"header":"Featured Item","name":"Featured Item","extract":false},{"header":"Price","name":"Price","extract":true},{"header":"Brand","name":"Brand","extract":false}],"delay":0,"tableDataRowSelector":"tr:nth-of-type(n+7)","tableHeaderRowSelector":"thead tr"}]}
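To make the two fields concrete, here is a minimal Python sketch that rebuilds the corrected "show_all" entry and serialises it in the same compact form the sitemap uses. The key names and values are copied from the sitemap above; the helper make_element_click is hypothetical and not part of Web Scraper, which you would still configure through its UI or by importing sitemap JSON.

```python
import json


def make_element_click(selector_id, parent, selector, click_selector, click_once=True):
    """Build a SelectorElementClick entry (hypothetical helper, not a Web Scraper API).

    selector       -- the element returned after clicking (the whole table area)
    click_selector -- the "button" that gets clicked (the "show all" link)
    """
    return {
        "id": selector_id,
        "type": "SelectorElementClick",
        "parentSelectors": [parent],
        "selector": selector,
        "multiple": False,  # "Multiple" unchecked: one table area per product page
        "delay": 0,
        "clickElementSelector": click_selector,
        "clickType": "clickOnce" if click_once else "clickMore",
        "discardInitialElements": False,
        "clickElementUniquenessType": "uniqueText",
    }


show_all = make_element_click(
    "show_all", "link_rap", "div.product_table div.col-lg-12", "#show_all_table a"
)
# Compact separators reproduce the one-line style of the sitemap above.
print(json.dumps(show_all, separators=(",", ":")))
```

The "selector" field is what the child table selector receives; the "clickElementSelector" field is only the thing that gets clicked, which is why the two point at different elements.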

I feel like an idiot. I struggle with either continuing to the next page or going from page to page, as this example needs to.

Is there a way to select the two separate options for going to the next page?

{"_id":"idaho","startUrl":["https://iacp.wildapricot.org/directory"],"selectors":[{"id":"Click name","type":"SelectorLink","parentSelectors":["_root","next 50"],"selector":"h5 a","multiple":true,"delay":0},{"id":"copy First name","type":"SelectorText","parentSelectors":["Click name"],"selector":"span#FunctionalBlock1_ctl00_ctl00_memberProfile_MemberForm_memberFormRepeater_ctl00_TextBoxLabel8396507","multiple":false,"regex":"","delay":0},{"id":"copy last name","type":"SelectorText","parentSelectors":["Click name"],"selector":"span#FunctionalBlock1_ctl00_ctl00_memberProfile_MemberForm_memberFormRepeater_ctl01_TextBoxLabel8396508","multiple":false,"regex":"","delay":0},{"id":"copy email","type":"SelectorText","parentSelectors":["Click name"],"selector":"#FunctionalBlock1_ctl00_ctl00_memberProfile_MemberForm_memberFormRepeater_ctl02_TextBoxLabel8396506 a","multiple":false,"regex":"","delay":0},{"id":"next 50","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"#idPagingData select","multiple":false,"delay":0,"clickElementSelector":"#idPagingData select","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"}]}

Dropdowns can be tricky. Try the sitemap below. I used Page load delay: 5000 (ms).

{"_id":"forum-wildapricot","startUrl":["https://iacp.wildapricot.org/directory"],"selectors":[{"id":"Row wrappers","type":"SelectorElement","parentSelectors":["_root","Click Show dropdown"],"selector":"tr.normal:nth-of-type(n+1)","multiple":true,"delay":0},{"id":"Name","type":"SelectorLink","parentSelectors":["Row wrappers"],"selector":"div.memberValue a","multiple":false,"delay":0},{"id":"City","type":"SelectorText","parentSelectors":["Row wrappers"],"selector":".memberDirectoryColumn2 div","multiple":false,"regex":"","delay":0},{"id":"Click Show dropdown","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div[id][class='memberDirectory']","multiple":true,"delay":"1500","clickElementSelector":"select[onchange^='MemberDirectoryListRenderer'] > option:nth-of-type(n+2)","clickType":"clickOnce","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueText"}]}

Hi Lee,

By any chance, would you explain how you located and arrived at your click selector 'select[onchange^='MemberDirectoryListRenderer'] > option:nth-of-type(n+2)'?

I'm trying to reverse-engineer the thought process.

Oh hey Bret, it's been months since I put that up. Lemme load it in again.

After looking through some past posts on dropdowns (probably by @iconoclast), I realized most dropdowns can be treated like buttons: the first option is button 1, the second option is button 2, and so on. So it is a matter of finding the correct "button" element for Element click.

Looking at the wildapricot site, it is quite clear that the Show: dropdown controls pagination, so that is a good place to inspect (right-click -> Inspect). When the Elements tab opens, you'll see that the dropdown is a select element with a number of child option elements. The text in each option matches the dropdown text seen on the page, so the option elements can serve as the "button" elements for Element click.
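The "options as buttons" idea can be checked outside the browser. This minimal Python sketch uses the standard-library html.parser to collect the option texts of a select whose onchange attribute starts with "MemberDirectoryListRenderer" (the same test the CSS ^= operator performs) and numbers them as buttons. The HTML snippet, including the onchange value and the option texts, is made up for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the "Show:" dropdown; the onchange value and
# option texts are invented and may not match the real page.
SAMPLE_HTML = """
<select onchange="MemberDirectoryListRenderer.example(this)">
  <option>1 - 50</option>
  <option>51 - 100</option>
  <option>101 - 150</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collect the text of each <option> inside a <select> whose onchange
    attribute starts with a given prefix (like select[onchange^='...'])."""

    def __init__(self, onchange_prefix):
        super().__init__()
        self.prefix = onchange_prefix
        self.in_select = False
        self.in_option = False
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and (attrs.get("onchange") or "").startswith(self.prefix):
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.in_option = True
            self.options.append("")

    def handle_endtag(self, tag):
        if tag == "option":
            self.in_option = False
        elif tag == "select":
            self.in_select = False

    def handle_data(self, data):
        if self.in_option:
            self.options[-1] += data.strip()


collector = OptionCollector("MemberDirectoryListRenderer")
collector.feed(SAMPLE_HTML)
for number, text in enumerate(collector.options, start=1):
    print(f"button {number}: {text}")
```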

I structured the sitemap with the data scrapers at root level, above the paginator. The data scrapers have also been made children of the paginator. This ensures data is scraped even if there is no paginator present. E.g., on the wildapricot site, if the number of members dropped below 50, there would be no need for page 2 or page 3, and hence no need for a paginator.
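One way to see this wiring is to list which selectors run under each parent. The minimal Python sketch below keeps only the id and parentSelectors fields of the forum-wildapricot sitemap and groups selectors by parent; it does not scrape anything. "Row wrappers" appears under both _root and Click Show dropdown, which is what keeps the data scrapers running on the first page even when no paginator exists.

```python
import json
from collections import defaultdict

# Only the "id" and "parentSelectors" fields of the forum-wildapricot sitemap;
# all other fields are omitted for brevity.
SITEMAP = json.loads("""
{"_id": "forum-wildapricot", "selectors": [
  {"id": "Row wrappers", "parentSelectors": ["_root", "Click Show dropdown"]},
  {"id": "Name", "parentSelectors": ["Row wrappers"]},
  {"id": "City", "parentSelectors": ["Row wrappers"]},
  {"id": "Click Show dropdown", "parentSelectors": ["_root"]}
]}
""")


def children_by_parent(sitemap):
    """Map each parent selector id to the ids of the selectors that run under it."""
    children = defaultdict(list)
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            children[parent].append(selector["id"])
    return dict(children)


tree = children_by_parent(SITEMAP)
for parent, kids in tree.items():
    print(f"{parent} -> {', '.join(kids)}")
```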

As there are already scrapers at the root level, and we are already on the first page, there is no need to click option 1 in the dropdown; hence the use of :nth-of-type(n+2), which means "only click option 2 and onwards".
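The n+2 is CSS's an+b notation: a position (counting from 1 among siblings of the same type) matches if it equals a·n + b for some integer n ≥ 0, so n+2 selects positions 2, 3, 4, and so on. A minimal Python sketch of that rule, applied to a list of made-up option texts, shows that option 1 is skipped:

```python
def matches_nth(a, b, index):
    """Return True if the 1-based position `index` equals a*n + b for some
    integer n >= 0, the rule behind :nth-of-type(an+b) and :nth-child(an+b)."""
    if a == 0:
        return index == b
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0


# Hypothetical option texts; only their positions matter here.
options = ["1 - 50", "51 - 100", "101 - 150"]

# :nth-of-type(n+2) means a = 1, b = 2.
clicked = [text for i, text in enumerate(options, start=1) if matches_nth(1, 2, i)]
print(clicked)  # ['51 - 100', '101 - 150']
```

The same rule explains the div.product_item:nth-of-type(n+2) in the first sitemap: it skips the first product_item div.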