How to paginate pages on a site where the URL does not change?

Hi, I am trying to scrape information about lawyers, but I cannot get this scraper to paginate across the pages. The site changes the tables of information without changing the URL.

Any help would be great!

URL: http://members.calbar.ca.gov/fal/MemberSearch/AdvancedSearch?LastNameOption=b&LastName=a&FirstNameOption=b&FirstName=&MiddleNameOption=b&MiddleName=&FirmNameOption=b&FirmName=&CityOption=b&City=los+angeles&State=CA&Zip=&District=&County=&LegalSpecialty=&LanguageSpoken=

Sitemap:
{"_id":"copyaussie","startUrl":["http://members.calbar.ca.gov/fal/MemberSearch/AdvancedSearch?LastNameOption=b&LastName=a&FirstNameOption=b&FirstName=&MiddleNameOption=b&MiddleName=&FirmNameOption=b&FirmName=&CityOption=b&City=Los+angeles&State=CA&Zip=&District=&County=&LegalSpecialty=&LanguageSpoken="],"selectors":[{"id":"links","type":"SelectorLink","selector":"tr.rowASRLodd a","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"pagnation","type":"SelectorElementClick","selector":"div.dataTables_paginate","parentSelectors":["_root","links"],"multiple":true,"delay":"","clickElementSelector":"dataTables_paginate paging_simple_numbers","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"elements","type":"SelectorElement","selector":"div#moduleMemberDetail","parentSelectors":["links"],"multiple":false,"delay":0},{"id":"name/status","type":"SelectorText","selector":"h3","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0},{"id":"address","type":"SelectorText","selector":"tr:nth-of-type(2) span","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0},{"id":"phone","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(4)","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorHTML","selector":"tr:contains('Email:') > td > span:visible > a","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0}]}

For the Element click selector, use the table rows as the selector and the pagination "Next" button as the click selector. Then make the link selector a child of the pagination selector. Here is the fixed sitemap:

{"_id":"copyaussie","startUrl":["http://members.calbar.ca.gov/fal/MemberSearch/AdvancedSearch?LastNameOption=b&LastName=a&FirstNameOption=b&FirstName=&MiddleNameOption=b&MiddleName=&FirmNameOption=b&FirmName=&CityOption=b&City=Los+angeles&State=CA&Zip=&District=&County=&LegalSpecialty=&LanguageSpoken="],"selectors":[{"id":"links","type":"SelectorLink","selector":"a","parentSelectors":["pagnation"],"multiple":false,"delay":0},{"id":"pagnation","type":"SelectorElementClick","selector":"tr.rowASRLodd","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"a.paginate_button.next","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"elements","type":"SelectorElement","selector":"div#moduleMemberDetail","parentSelectors":["links"],"multiple":true,"delay":0},{"id":"name/status","type":"SelectorText","selector":"h3","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0},{"id":"address","type":"SelectorText","selector":"tr:nth-of-type(2) span","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0},{"id":"phone","type":"SelectorText","selector":"tr:nth-of-type(2) td:nth-of-type(4)","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorHTML","selector":"tr:contains('Email:') > td > span:visible > a","parentSelectors":["elements"],"multiple":false,"regex":"","delay":0}]}
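To check that the selectors are nested the way described above (Element click under `_root`, links under the Element click, data selectors under the links), it can help to print the selector tree. This is a minimal Python sketch; `selector_tree` is a hypothetical helper, not part of Web Scraper, and it only reads the `id`, `type`, and `parentSelectors` fields that appear in the sitemaps in this thread. The sample is a trimmed copy of the fixed sitemap.

```python
import json


def selector_tree(sitemap_json: str) -> list[str]:
    """Return the selector hierarchy of a sitemap as indented 'id (type)' lines."""
    sitemap = json.loads(sitemap_json)

    # Map each parent id to the selectors that list it in parentSelectors.
    children: dict[str, list[dict]] = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children.setdefault(parent, []).append(sel)

    lines: list[str] = []

    def walk(parent_id: str, depth: int, seen: frozenset) -> None:
        for sel in children.get(parent_id, []):
            lines.append("  " * depth + f'{sel["id"]} ({sel["type"]})')
            # A selector may list itself as a parent (a pagination loop),
            # so only descend into ids not already on the current path.
            if sel["id"] not in seen:
                walk(sel["id"], depth + 1, seen | {sel["id"]})

    walk("_root", 0, frozenset({"_root"}))
    return lines


# Trimmed copy of the fixed sitemap above (only the fields the helper reads).
trimmed_sitemap = """{"_id": "copyaussie", "selectors": [
  {"id": "links", "type": "SelectorLink", "parentSelectors": ["pagnation"]},
  {"id": "pagnation", "type": "SelectorElementClick", "parentSelectors": ["_root"]},
  {"id": "elements", "type": "SelectorElement", "parentSelectors": ["links"]},
  {"id": "name/status", "type": "SelectorText", "parentSelectors": ["elements"]}
]}"""

print("\n".join(selector_tree(trimmed_sitemap)))
```

For this sample the output shows `pagnation` at the top, `links` under it, then `elements` and `name/status`, which is the structure described in the reply.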


Thank you so much! I've been stuck on this!

Hi, I think I have the same problem as you. I followed the instructions from @KristapsWS, but I have no idea why my sitemap just gets stuck at 'loading page'.
I'm trying to scrape product information from: https://www.tokopedia.com/p/fashion-wanita/perhiasan
This is the sitemap I used:

{"_id":"paginaton","startUrl":["https://www.tokopedia.com/p/fashion-wanita/perhiasan?page=2"],"selectors":[{"id":"pagination","type":"SelectorElementClick","selector":"div.r3-intermediary-box div.product-summary a, div.category-product-box div.product-summary a","parentSelectors":["_root"],"multiple":true,"delay":"2220","clickElementSelector":"li.ng-scope:nth-of-type(1) a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"links","type":"SelectorLink","selector":"_parent_","parentSelectors":["pagination"],"multiple":true,"delay":"2220"},{"id":"title","type":"SelectorText","selector":"a.p_title","parentSelectors":["links"],"multiple":false,"regex":"","delay":0}]}

Any help? Thanks :slight_smile:

Your link selector had a delay set and the "multiple" option checked. You should set a delay only on action selectors. Here is the fixed sitemap:

{"_id":"paginaton","startUrl":["https://www.tokopedia.com/p/fashion-wanita/perhiasan?page=2"],"selectors":[{"id":"pagination","type":"SelectorElementClick","selector":"div.r3-intermediary-box div.product-summary a, div.category-product-box div.product-summary","parentSelectors":["_root"],"multiple":true,"delay":"2220","clickElementSelector":"li.ng-scope:nth-of-type(1) a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"links","type":"SelectorLink","selector":"a","parentSelectors":["pagination"],"multiple":false,"delay":""},{"id":"title","type":"SelectorText","selector":"a.p_title","parentSelectors":["links"],"multiple":false,"regex":"","delay":0}]}
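One way to spot the delay problem described above is to list every selector that is not an action selector but still has a delay. This Python sketch assumes that `SelectorElementClick` is the only action selector in use (true for the sitemaps in this thread); `delays_on_non_action` is a hypothetical helper, not a Web Scraper feature.

```python
import json

# Assumption: only the Element click selector counts as an "action" selector here.
# Add other action-type selectors to this set if your sitemap uses them.
ACTION_TYPES = {"SelectorElementClick"}


def delays_on_non_action(sitemap_json: str) -> list[str]:
    """Return the ids of non-action selectors that still have a non-zero delay."""
    sitemap = json.loads(sitemap_json)
    flagged = []
    for sel in sitemap["selectors"]:
        # Delays appear both as strings ("", "2220") and as numbers (0) in these sitemaps.
        delay = sel.get("delay") or 0
        if sel["type"] not in ACTION_TYPES and int(delay) > 0:
            flagged.append(sel["id"])
    return flagged


# Trimmed versions of the original and the fixed sitemap (only the fields used here).
original = """{"selectors": [
  {"id": "pagination", "type": "SelectorElementClick", "delay": "2220"},
  {"id": "links", "type": "SelectorLink", "delay": "2220"},
  {"id": "title", "type": "SelectorText", "delay": 0}]}"""
fixed = """{"selectors": [
  {"id": "pagination", "type": "SelectorElementClick", "delay": "2220"},
  {"id": "links", "type": "SelectorLink", "delay": ""},
  {"id": "title", "type": "SelectorText", "delay": 0}]}"""

print(delays_on_non_action(original))  # ['links']
print(delays_on_non_action(fixed))     # []
```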

Hi, thank you very much for the help, this works! :slight_smile:

Hi, I could sure use some help with Web Scraper. I am new to scraping, and I am trying to scrape NBA team defense vs. opponent averages, but I cannot get it to scrape properly from this site, which is dynamic and whose tables paginate via tabs. The first table, PG, does not get scraped. It uses a jQuery 2.0 table sorter with sticky headers (table.tablesorter.hasStickyHeaders). So I can only scrape the SG, SF, PF, and C positions. The PG table is stuck as the default first tab, which is shown automatically when the page starts with DVP, DraftKings, and Season selected.

Any help would be appreciated!

URL: https://www.rotowire.com/daily/nba/defense-vspos.php?site=DraftKings&sport=NBA

-Nathan Marek

Hi,
I have the same problem. I tried your solution, @KristapsWS, all day, but I don't understand it. My sitemap:

{"_id":"lavvvvv","startUrl":["https://j2c-com.com/bourget17/catalogueWeb/publication.php"],"selectors":[{"id":"pagination","type":"SelectorElementClick","selector":"div.colorP","parentSelectors":["_root"],"multiple":true,"delay":"2100","clickElementSelector":"span.bouton","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"exposant","type":"SelectorLink","selector":"div.col-xs-12 > a","parentSelectors":["pagination"],"multiple":false,"delay":""},{"id":"nom","type":"SelectorText","selector":"div.col-xs-12 div.colorP","parentSelectors":["exposant"],"multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorText","selector":"a.colorP:nth-of-type(1)","parentSelectors":["exposant"],"multiple":false,"regex":"","delay":0}]}

Can you help me?
Thank you,
Florian

Can no one help me? Going page by page takes a very long time... :frowning:

In my case, the sitemap I built paginates correctly, but it doesn't go further than page 2.
This is my current sitemap:
{"_id":"lubricantesdscomponentes","startUrl":["http://www.dscomponentes.es/web/articulo/familia.php?id=83&selSubFamilia=222&selSubSubFamilia=544"],"selectors":[{"id":"Links","type":"SelectorLink","selector":"div.pr_imagen a","parentSelectors":["_root","Next"],"multiple":true,"delay":0},{"id":"Elemento","type":"SelectorElement","selector":"div.fichaart","parentSelectors":["Links"],"multiple":false,"delay":"3000"},{"id":"Referencia","type":"SelectorText","selector":"div.REFERENCIA","parentSelectors":["Elemento"],"multiple":false,"regex":"","delay":0},{"id":"Precio","type":"SelectorText","selector":"div.precio","parentSelectors":["Elemento"],"multiple":false,"regex":"","delay":0},{"id":"Titulo","type":"SelectorText","selector":"div#descPP","parentSelectors":["Elemento"],"multiple":false,"regex":"","delay":0},{"id":"Next","type":"SelectorElementClick","selector":"div.pr_imagen img","parentSelectors":["_root"],"multiple":true,"delay":"7000","clickElementSelector":"div.numerosProductos a:nth-of-type(3)","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

I'm trying to build a new one based on the example you gave "kmorris0123", but it still doesn't work!

I'd appreciate your help, dude!

@florianrisi, for the Element click selector it is better to select the Next button, if there is one, so the scraper doesn't go back to an already scraped page. Your link selector wasn't working because you had to select it as _parent_. I changed your Element click selector, and everything should work fine now.

Try this sitemap:

{"_id":"lavvvvv","startUrl":["https://j2c-com.com/bourget17/catalogueWeb/publication.php"],"selectors":[{"id":"pagination","type":"SelectorElementClick","selector":"div.col-xs-12 div.row","parentSelectors":["_root"],"multiple":true,"delay":"2100","clickElementSelector":"span.bouton:contains("»")","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"exposant","type":"SelectorLink","selector":"div.col-xs-12 > a","parentSelectors":["pagination"],"multiple":false,"delay":""},{"id":"nom","type":"SelectorText","selector":"div.col-xs-12 div.colorP","parentSelectors":["exposant"],"multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorText","selector":"a.colorP:nth-of-type(1)","parentSelectors":["exposant"],"multiple":false,"regex":"","delay":0}]}

BAM! Thank you, it's crazy.
Just one change for the click selector: span.bouton:contains('»')

Thank you again
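Why that change matters: the click selector sits inside a JSON string, so the double quotes around » end the string early and the sitemap is no longer valid JSON. Single quotes inside the CSS selector, or escaped double quotes (`\"`), avoid this. A minimal Python check using only the standard `json` module (the snippets are trimmed to the one field involved):

```python
import json

# As first posted: the inner double quotes close the JSON string too early.
broken = '{"clickElementSelector": "span.bouton:contains("»")"}'
# The fix from the reply: single quotes inside the CSS selector.
fixed = """{"clickElementSelector": "span.bouton:contains('»')"}"""
# Also valid: keep double quotes but escape them for JSON.
escaped = r'{"clickElementSelector": "span.bouton:contains(\"»\")"}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for name, text in [("broken", broken), ("fixed", fixed), ("escaped", escaped)]:
    print(name, is_valid_json(text))
# broken False
# fixed True
# escaped True
```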

I'm trying to get all the scores to show up. The scraper clicks through to the second page, but it only gets the data from the first. I tried playing with the delays, but it doesn't pick up the data no matter how high I set them.

{
"_id": "osu",
"startUrl": ["https://osu.ppy.sh/u/10198475"],
"selectors": [{
"id": "accuracy",
"type": "SelectorText",
"selector": "td > div.h",
"parentSelectors": ["showMore"],
"multiple": false,
"regex": "[0-9]+\\.[0-9]+",
"delay": "0"
}, {
"id": "song",
"type": "SelectorText",
"selector": "td > div.h a",
"parentSelectors": ["showMore"],
"multiple": false,
"regex": "",
"delay": "0"
}, {
"id": "TopRanks",
"type": "SelectorElementClick",
"selector": "div#leader.expanded",
"parentSelectors": ["_root"],
"multiple": false,
"delay": "1000",
"clickElementSelector": "td#_leader.sectionHeading",
"clickType": "clickOnce",
"discardInitialElements": false,
"clickElementUniquenessType": "uniqueText"
}, {
"id": "diff",
"type": "SelectorText",
"selector": "td > div.h",
"parentSelectors": ["showMore"],
"multiple": false,
"regex": "\\[[a-zA-Z'., ]+\\]",
"delay": "0"
}, {
"id": "mods",
"type": "SelectorText",
"selector": "td > div.h b",
"parentSelectors": ["showMore"],
"multiple": false,
"regex": " \\+[A-Z,]+",
"delay": "0"
}, {
"id": "link",
"type": "SelectorElementAttribute",
"selector": "td > div.h a",
"parentSelectors": ["showMore"],
"multiple": false,
"extractAttribute": "href",
"delay": "0"
}, {
"id": "showMore",
"type": "SelectorElementClick",
"selector": "div.prof-beatmap",
"parentSelectors": ["TopRanks"],
"multiple": true,
"delay": "2000",
"clickElementSelector": "div#more-performance-0 a",
"clickType": "clickMore",
"discardInitialElements": false,
"clickElementUniquenessType": "uniqueText"
}, {
"id": "fullPP",
"type": "SelectorText",
"selector": "div.pp-display",
"parentSelectors": ["showMore"],
"multiple": false,
"regex": "[0-9]+",
"delay": "0"
}, {
"id": "weight",
"type": "SelectorText",
"selector": "div.pp-display-weight",
"parentSelectors": ["showMore"],
"multiple": false,
"regex": "[0-9]+%",
"delay": "0"
}
]
}
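A note on the regex fields in this sitemap: inside a JSON string, a regex backslash has to be written as `\\`. A single backslash before `.`, `[`, or `+` is not a valid JSON escape, so a sitemap containing `"[0-9]+\.[0-9]+"` literally will not parse. This Python sketch shows the difference for the accuracy regex; `load_regex` is a hypothetical helper and `"98.76%"` is made-up cell text for illustration.

```python
import json
import re

# The accuracy field with a single backslash, and with the backslash doubled for JSON.
as_posted = r'{"regex": "[0-9]+\.[0-9]+"}'
doubled = r'{"regex": "[0-9]+\\.[0-9]+"}'


def load_regex(field_json: str) -> str:
    """Parse a one-field JSON snippet and return the regex string it contains."""
    return json.loads(field_json)["regex"]


try:
    load_regex(as_posted)
except json.JSONDecodeError as err:
    # JSON only allows escapes such as \" \\ \/ \n \t \uXXXX, so "\." is rejected.
    print("single backslash, invalid JSON:", err.msg)

pattern = load_regex(doubled)  # the regex [0-9]+\.[0-9]+
sample_cell = "98.76%"         # made-up cell text
print(re.search(pattern, sample_cell).group())  # 98.76
```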

I think I may have a similar issue and would really appreciate some help.
The site I'm scraping has a table showing 25 results. Each result is a link to a separate page with the information I'm scraping. I need to scrape as many of the 804 available pages as I can. I'm able to scrape the first page with 25 results, but the scrape ends after that and doesn't advance to the next page with results 26 through 50. Thanks!

{"_id":"kalkaska_working","startUrl":["http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo"],"selectors":[{"id":"Content Block","type":"SelectorElement","selector":"section#content","parentSelectors":["Parcel Number Links"],"multiple":false,"delay":0},{"id":"Jurisdiction","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(1) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Owner Name","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(2) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Prop Address","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(3) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Owner Address","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(4) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current Taxable Value","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(1) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"School District","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(2) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current Assessment","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(3) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current SEV","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(4) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current PRE","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(5) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Property Class","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(6) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Property Sale Information","type":"SelectorText","selector":"table + div + h3 + div.PDBlegalarea","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Tax Desc","type":"SelectorText","selector":"h3 ~ h3 ~ h3 ~ h3 + div.PDBlegalarea","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Parcel Number Links","type":"SelectorLink","selector":"table tbody a.PDBlistlink","parentSelectors":["_root","Pagination"],"multiple":true,"delay":0},{"id":"Pagination","type":"SelectorElementClick","selector":"tr[class*="PDBListRow"]","parentSelectors":["_root"],"multiple":true,"delay":0,"clickElementSelector":"a.DBVpagelink[href*="next"]","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Acreage","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(7) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0}]}

I can't import your sitemap (the JSON is not valid).

Hmm, I'm not getting an error when I import this one (below). Also, my start URL should be http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo . I think you have to open this page (http://maps.kalkaskacounty.net/propertysearch.asp) first, then type 0 to 9999999999 in the address field and click Search.

{"_id":"kalkaska_working","startUrl":["http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo"],"selectors":[{"id":"Content Block","type":"SelectorElement","selector":"section#content","parentSelectors":["Parcel Number Links"],"multiple":false,"delay":0},{"id":"Jurisdiction","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(1) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Owner Name","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(2) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Prop Address","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(3) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Owner Address","type":"SelectorText","selector":" div.lrcmenutext + h3 + table tr:nth-child(4) td:nth-child(2)","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current Taxable Value","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(1) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"School District","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(2) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current Assessment","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(3) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current SEV","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(4) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Current PRE","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(5) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Property Class","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(6) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Property Sale Information","type":"SelectorText","selector":"table + div + h3 + div.PDBlegalarea","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Tax Desc","type":"SelectorText","selector":"h3 ~ h3 ~ h3 ~ h3 + div.PDBlegalarea","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0},{"id":"Parcel Number Links","type":"SelectorLink","selector":"table tbody a.PDBlistlink","parentSelectors":["_root","Pagination"],"multiple":true,"delay":0},{"id":"Pagination","type":"SelectorElementClick","selector":"tr[class*=\"PDBListRow\"]","parentSelectors":["_root"],"multiple":true,"delay":0,"clickElementSelector":"a.DBVpagelink[href*=\"next\"]","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Acreage","type":"SelectorText","selector":"div.DBVdisclaimer + h3 + table tr:nth-child(7) td:nth-child(2) ","parentSelectors":["Content Block"],"multiple":false,"regex":"","delay":0}]}
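The difference between the two posted versions is how the quotes inside the CSS attribute selectors are written: `tr[class*="PDBListRow"]` has to appear as `tr[class*=\"PDBListRow\"]` inside a JSON string, otherwise the import fails. If you assemble or edit a sitemap in Python, the standard `json.dumps` does this escaping for you. A minimal sketch using the Pagination selector copied from the sitemap above:

```python
import json

# The "Pagination" selector from the sitemap above, written as a Python dict.
# The CSS selectors use plain double quotes; no manual escaping needed here.
pagination = {
    "id": "Pagination",
    "type": "SelectorElementClick",
    "selector": 'tr[class*="PDBListRow"]',
    "parentSelectors": ["_root"],
    "multiple": True,
    "delay": 0,
    "clickElementSelector": 'a.DBVpagelink[href*="next"]',
    "clickType": "clickMore",
    "discardInitialElements": False,
    "clickElementUniquenessType": "uniqueText",
}

# json.dumps escapes the inner quotes as \" so the result is valid JSON.
text = json.dumps(pagination)
print(text)
```

The printed text contains `"selector": "tr[class*=\"PDBListRow\"]"`, matching the version that imported without an error.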

I have a similar issue.

I'm trying to scrape a page with a static URL. When I first click the category I want to scrape, the URL is: http://www.dscomponentes.es/web/articulo/familia.php?id=79&selSubFamilia=214&selSubSubFamilia=531

But when I paginate manually, the URL changes and then stays the same throughout the pagination: all pages that contain the products have the same URL no matter which page I am on. The URL is: http://www.dscomponentes.es/web/articulo/familia.php?id=79

This is my current sitemap. It has a loop for pagination, but when it opens the page it doesn't scrape anything: as soon as it enters, it exits without any data scraped!

{"_id":"dscomponentes","startUrl":["http://www.dscomponentes.es/web/articulo/familia.php?id=79"],"selectors":[{"id":"Links","type":"SelectorLink","selector":"div.pr_imagen a","parentSelectors":["_root","pagination2"],"multiple":true,"delay":"2000"},{"id":"pagination2","type":"SelectorElementClick","selector":"div.numerosProductos:nth-of-type(5), div.numerosProductos a","parentSelectors":["_root","pagination2"],"multiple":true,"delay":"2000","clickElementSelector":"div.numerosProductos a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Descripción","type":"SelectorText","selector":"div#descPP","parentSelectors":["Links"],"multiple":true,"regex":"","delay":0},{"id":"Precio","type":"SelectorText","selector":"div.precio","parentSelectors":["Links"],"multiple":true,"regex":"","delay":0},{"id":"Referencia","type":"SelectorText","selector":"div.REFERENCIA","parentSelectors":["Links"],"multiple":true,"regex":"","delay":0}]}

I would really appreciate the help!

@Jaajh and @Ithalik, unfortunately it is not possible to scrape either of these sites, because the pages reload when you change them, so the Element click selector "loses its state" and can't continue traversing the pagination.

So if the Web Scraper extension doesn't work for this project, what other program should I use to get the job done?

Thanks for the answer!

Issue resolved, thank you.