Hello, I'm trying to scrape this entire profile: the title, size, and magnet link for each torrent. It has over 6000 pages, though, which I think crashes the process and prevents me from scraping it all, so nothing gets saved. Is there a technique or method that can scrape 500 or so pages at a time, save the data, scrape 500 more, append the additional data, and so on? Or is there a way to remove performance limits? I have a pretty decent amount of resources, so that would be unlikely to cause any issues.
I saw this post, but unfortunately it didn't get any responses.
Url: https://torrentgalaxy.to/profile/TGxTV/torrents/6562
I'm doing the pagination backwards (oldest to newest) for simplicity: going from recent to old pages causes the "Next" button to change position, so it was just simpler this way.
Sitemap:
{
  "_id": "torrentgalaxytgxtv",
  "startUrl": ["https://torrentgalaxy.to/profile/TGxTV/torrents/6562"],
  "selectors": [
    {
      "id": "pagination",
      "parentSelectors": ["_root", "pagination"],
      "paginationType": "clickMore",
      "selector": ".tab-pane > nav li:nth-of-type(1) a:contains(\"Previous\")",
      "type": "SelectorPagination"
    },
    {
      "id": "Title",
      "parentSelectors": ["Row"],
      "type": "SelectorText",
      "selector": "a b",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "Magnet",
      "parentSelectors": ["Row"],
      "type": "SelectorElementAttribute",
      "selector": ".tgxtablecell a[role]",
      "multiple": false,
      "extractAttribute": "href"
    },
    {
      "id": "Size",
      "parentSelectors": ["Row"],
      "type": "SelectorText",
      "selector": "span.badge.txlight",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "Row",
      "parentSelectors": ["pagination"],
      "type": "SelectorElement",
      "selector": "div.tgxtablerow:nth-of-type(n+2)",
      "multiple": true
    }
  ]
}
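In case a standalone script is an acceptable fallback, here is a rough sketch of the batch-and-checkpoint approach described above, in Python with `requests` and BeautifulSoup. Since the page number appears directly in the URL, each page can be fetched independently (no "Previous" clicking needed), 500 pages are scraped per batch, rows are appended to a CSV after each batch, and a checkpoint file records progress so an interrupted run can resume. The CSS selectors mirror the sitemap above, but the exact HTML structure of the listing pages (and whether they render without JavaScript) is an assumption; the file names are placeholders.

```python
# Batched scraping sketch: fetch BATCH pages, append rows to a CSV,
# write a checkpoint, repeat. Resumable after a crash or interruption.
import csv
import os
import time

import requests
from bs4 import BeautifulSoup

BASE = "https://torrentgalaxy.to/profile/TGxTV/torrents/{}"  # page number in URL
OUT = "tgxtv.csv"          # accumulated results (placeholder name)
CKPT = "tgxtv.checkpoint"  # last fully saved page (placeholder name)
BATCH = 500


def parse_rows(html):
    """Extract (title, magnet, size) tuples from one listing page.

    Selectors taken from the sitemap; HTML layout is an assumption.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select("div.tgxtablerow"):
        title = row.select_one("a b")
        magnet = row.select_one(".tgxtablecell a[role]")
        size = row.select_one("span.badge.txlight")
        if title and magnet and size:
            rows.append((title.get_text(strip=True),
                         magnet.get("href", ""),
                         size.get_text(strip=True)))
    return rows


def scrape(first=6562, last=1):
    # Resume from the checkpoint if a previous run was interrupted.
    page = int(open(CKPT).read()) if os.path.exists(CKPT) else first
    while page >= last:
        stop = max(last, page - BATCH + 1)
        with open(OUT, "a", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for p in range(page, stop - 1, -1):  # backwards, like the sitemap
                resp = requests.get(BASE.format(p), timeout=30)
                resp.raise_for_status()
                writer.writerows(parse_rows(resp.text))
                time.sleep(1)  # be polite; avoid hammering the server
        page = stop - 1
        with open(CKPT, "w") as f:
            f.write(str(page))  # batch is on disk; safe to restart from here


if __name__ == "__main__":
    scrape()
```

Because each batch is flushed to disk before the checkpoint advances, a crash costs at most one batch of work rather than the whole 6000-page run.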