Below is an improved version of the sitemap I previously made. This version includes a limited paginator; in the example below, it stops at page 8. The paginator is limited by a :not condition at the end of the click selector:
div.pagination ul > li:nth-of-type(4):not([p='9'])
You can change the p= value to the page number you want to stop at, plus 1. For example, if you want to stop at page 42, use p='43'.
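The "plus 1" rule above can be sketched as a small Python helper. This is only an illustration: the function name limited_paginator_selector is hypothetical and not part of Web Scraper, and it assumes the same pagination markup as the selector shown above.

```python
def limited_paginator_selector(last_page: int) -> str:
    """Build the click selector that stops once `last_page` has been scraped.

    Hypothetical helper for illustration; not part of Web Scraper.
    """
    if last_page < 1:
        raise ValueError("last_page must be at least 1")
    # The value inside :not() is the first page that must NOT be clicked,
    # which is one more than the last page you want scraped.
    return f"div.pagination ul > li:nth-of-type(4):not([p='{last_page + 1}'])"


print(limited_paginator_selector(8))   # selector ends in :not([p='9'])
print(limited_paginator_selector(42))  # selector ends in :not([p='43'])
```

Paste the returned string into the paginator's clickElementSelector field.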
To completely remove the limiter, just delete the :not condition, leaving only:
div.pagination ul > li:nth-of-type(4)
Note that scraping may take a very long time without a limiter, or Chrome may run out of RAM and crash.
Sitemap:
{"_id":"forum-elephrame-blm-sept","startUrl":["https://elephrame.com/textbook/BLM/chart"],"selectors":[{"id":"Results wrapper","type":"SelectorElement","parentSelectors":["_root","Limited paginator"],"selector":"div#blm-results","multiple":false,"delay":0},{"id":"Row wrappers","type":"SelectorElement","parentSelectors":["Results wrapper"],"selector":"div.items-list > div.item","multiple":true,"delay":0},{"id":"Location","type":"SelectorText","parentSelectors":["Row wrappers"],"selector":"div.item-protest-location","multiple":false,"regex":"","delay":0},{"id":"Date","type":"SelectorText","parentSelectors":["Row wrappers"],"selector":"div.item-protest-date","multiple":false,"regex":"","delay":0},{"id":"Limited paginator","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div[itemprop='text']","multiple":false,"delay":"3600","clickElementSelector":"div.pagination ul > li:nth-of-type(4):not([p='9'])","clickType":"clickMore","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueHTML"},{"id":"Subjects","type":"SelectorText","parentSelectors":["Row wrappers"],"selector":"li.item-protest-subject","multiple":false,"regex":"(?<=Subject\\(s\\)\\: ).+","delay":0},{"id":"Participants","type":"SelectorText","parentSelectors":["Row wrappers"],"selector":"li.item-protest-participants","multiple":false,"regex":"(?<=Participant\\(s\\)\\: ).+","delay":0},{"id":"Description","type":"SelectorText","parentSelectors":["Row wrappers"],"selector":"li.item-protest-description","multiple":false,"regex":" (?<=Description: ).+","delay":0},{"id":"Sources","type":"SelectorText","parentSelectors":["Row wrappers"],"selector":"li.item-protest-url","multiple":false,"regex":"","delay":0},{"id":"From page","type":"SelectorElementAttribute","parentSelectors":["Results wrapper"],"selector":"ul > li[class*='active-page-choice']","multiple":false,"extractAttribute":"p","delay":0}]}