Scraping a page, but it stops after page 10

mysterdee888 · March 10, 2024, 1:05pm

I'm trying to scrape a games website so I can download all the torrents without having to click each one manually.

I have looked at a few scripts, but I'm very new to Python, so I'm opting out of Python and using webscraper.io again. The main reason is that I also need to be able to put the data into a SQL database, and being so new to Python I'm not sure how I would do that at present. So yes, definitely webscraper.io for this job.

However, when I scrape this website, it stops scraping after about 10 pages: it keeps cycling through the result pages but no longer scrapes data from the actual listings on those pages.

I'm sure this is a well-known issue to regular users of webscraper.io, so I'm hoping somebody knows how I can fix this and scrape beyond 10 pages.

I'm assuming it's a verification popup, or some other code the website's developer added to stop scraping. I don't know, I'm very new to this.

Here is the code and the website I'm trying to scrape.

This isn't really a matter of the JSON code I'm using, though; I think it's a server issue.

Maybe Tampermonkey is needed.

Either way, any help would be greatly appreciated, even if it's just links to some info.

Thanks again, you're all great.

website -> All My Repacks, A-Z - FitGirl Repacks

And the JSON code:

`

{"_id":"FitGirl","startUrl":["https://fitgirl-repacks.site/all-my-repacks-a-Z/?lcp_page[0-85]#lcp_instance_0"],"selectors":[{"id":"linker","linkType":"linkFromHref","multiple":false,"parentSelectors":["wrapper"],"selector":"a","type":"SelectorLink"},{"id":"title","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"h1.entry-title","type":"SelectorText"},{"id":"image","multiple":false,"parentSelectors":["linker"],"selector":"img.alignleft","type":"SelectorImage"},{"id":"file info","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"strong:nth-of-type(4)","type":"SelectorText"},{"id":"info","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"div.su-spoiler-content","type":"SelectorText"},{"id":"catagory","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"x","type":"SelectorText"},{"id":"dateadded","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"time","type":"SelectorText"},{"id":"video","multiple":false,"parentSelectors":["linker"],"regex":"https://www.youtube.com/embed/[^\" ]+","selector":"div.fluid-width-video-wrapper","type":"SelectorHTML"},{"id":"soundcloud","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"x","type":"SelectorText"},{"id":"mp3","multiple":false,"parentSelectors":["linker"],"regex":"https://cdn-prd.sounds.com[^\" ]+","selector":".dleaudioplayer","type":"SelectorHTML"},{"id":"wrapper","multiple":true,"parentSelectors":["_root"],"selector":".lcp_catlist li","type":"SelectorElement"},{"extractAttribute":"href","id":"homepage","multiple":false,"parentSelectors":["linker"],"selector":".entry-content > div > p:nth-of-type(1) a","type":"SelectorElementAttribute"},{"id":"price","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"x","type":"SelectorText"},{"id":"genre","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"p strong:nth-of-type(1)","type":"SelectorText"},{"extractAttribute":"href","id":"torrent","multiple":false,"parentSelectors":["linker"],"selector":"li a:nth-of-type(3)","type":"SelectorElementAttribute"},{"id":"bestseller","multiple":false,"parentSelectors":["linker"],"regex":"","selector":"x","type":"SelectorText"}]}

`

don2010 · March 10, 2024, 8:47pm

A little advice: first scan all the pages to retrieve the URLs you need.
Afterwards, you can visit each page to scrape what you want. Don't make it one huge task that scrapes everything at once.
Here is your URLs file...
Here is your correct pagination setup:

{"_id":"FitGirl","startUrl":["https://fitgirl-repacks.site/all-my-repacks-a-z/?ref=driverlayer.com.&lcp_page0=1#lcp_instance_0"],"selectors":[{"id":"link","linkType":"linkFromHref","multiple":true,"parentSelectors":["next"],"selector":"ul.lcp_catlist a","type":"SelectorLink"},{"id":"next","paginationType":"auto","parentSelectors":["_root","next"],"selector":"a.lcp_nextlink","type":"SelectorPagination"}]}

Now all you need to do is post all your URLs on a pastebin-style website and scrape what you need from there. Your sitemap already has almost everything you need.
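The "collect the URLs first, then scrape in smaller jobs" advice can be prepared with a short script. This is only a minimal Python sketch, assuming the collected links are already available as a plain list of strings. The helper names `dedupe` and `batches`, the batch size, and the `example.com` URLs are illustrative and not part of webscraper.io.

```python
from typing import Iterable, List


def dedupe(urls: Iterable[str]) -> List[str]:
    """Drop blank entries and duplicates while keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def batches(urls: List[str], size: int) -> List[List[str]]:
    """Split the URL list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    # Hypothetical sample data; replace with the links collected by the
    # pagination sitemap.
    collected = [
        "https://example.com/game-a/",
        "https://example.com/game-b/",
        "https://example.com/game-a/",
        "",
        "https://example.com/game-c/",
    ]
    for number, chunk in enumerate(batches(dedupe(collected), 2), start=1):
        print(f"batch {number}: {len(chunk)} URLs")
```

Each batch can then be pasted to its own page or used as its own scraping job, so a single failure does not cost the whole run.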

mysterdee888 · March 10, 2024, 10:32pm

OK, so I use this in webscraper.io?

The xlsx file?

don2010 · March 10, 2024, 11:38pm

You can host these URLs with a service such as https://pastelink.net/
You should then rewrite your sitemap to collect all the data from your URLs.
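One possible way to do that rewrite is to re-parent the existing detail selectors onto `_root`, so they run directly on each game page instead of hanging off the `wrapper` and `linker` selectors. The Python sketch below is an assumption about how this could be scripted, not an official webscraper.io tool. `rewrite_sitemap`, the default id `fitgirl-details`, and the `example.com` URL are hypothetical, and it puts the URLs straight into `startUrl` rather than going through a paste page.

```python
import copy
import json
from typing import Dict, Iterable, List


def rewrite_sitemap(
    original: Dict,
    urls: List[str],
    drop_ids: Iterable[str] = ("wrapper", "linker"),
    sitemap_id: str = "fitgirl-details",  # hypothetical name
) -> Dict:
    """Build a detail-page sitemap from the listing sitemap.

    Selectors named in `drop_ids` (the listing/link navigation) are removed;
    every remaining selector is attached to `_root`.
    """
    dropped = set(drop_ids)
    selectors = []
    for selector in original["selectors"]:
        if selector["id"] in dropped:
            continue
        selector = copy.deepcopy(selector)  # leave the original untouched
        selector["parentSelectors"] = ["_root"]
        selectors.append(selector)
    return {"_id": sitemap_id, "startUrl": list(urls), "selectors": selectors}


if __name__ == "__main__":
    # Trimmed-down version of the sitemap from the first post.
    listing = {
        "_id": "FitGirl",
        "startUrl": ["https://fitgirl-repacks.site/all-my-repacks-a-z/"],
        "selectors": [
            {"id": "wrapper", "multiple": True, "parentSelectors": ["_root"],
             "selector": ".lcp_catlist li", "type": "SelectorElement"},
            {"id": "linker", "linkType": "linkFromHref", "multiple": False,
             "parentSelectors": ["wrapper"], "selector": "a",
             "type": "SelectorLink"},
            {"id": "title", "multiple": False, "parentSelectors": ["linker"],
             "regex": "", "selector": "h1.entry-title",
             "type": "SelectorText"},
        ],
    }
    detail = rewrite_sitemap(listing, ["https://example.com/game-a/"])
    print(json.dumps(detail, indent=2))
```

The printed JSON is meant to be pasted into the sitemap import box. Whether a very long `startUrl` list is practical is something to test with a small batch first.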

mysterdee888 · March 11, 2024, 3:41am

I understand what you're saying, it does make sense. I will look into this.
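For the SQL database step mentioned in the first post, a scraped CSV export can be loaded with Python's standard library alone. This is a rough sketch under several assumptions: SQLite stands in for whatever SQL database is actually intended, the table name `repacks` is made up, and the CSV is assumed to have `web-scraper-start-url`, `title`, and `torrent` columns, which depends on the selector ids in the sitemap.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_csv(conn: sqlite3.Connection, csv_file: TextIO) -> int:
    """Insert rows from a scraped CSV into a `repacks` table.

    Returns the number of rows read. The page URL is used as the primary
    key, so re-importing the same page replaces the earlier row.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repacks ("
        "page_url TEXT PRIMARY KEY, title TEXT, torrent TEXT)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        # Assumed column names; adjust to match the real export.
        (row.get("web-scraper-start-url"), row.get("title"), row.get("torrent"))
        for row in reader
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO repacks (page_url, title, torrent) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample standing in for the exported file.
    sample = io.StringIO(
        "web-scraper-start-url,title,torrent\n"
        "https://example.com/game-a/,Game A,https://example.com/a.torrent\n"
    )
    connection = sqlite3.connect(":memory:")
    print(load_csv(connection, sample), "row(s) loaded")
```

With a real export, `open("export.csv", newline="", encoding="utf-8")` would take the place of the `io.StringIO` sample, and a file path would take the place of `":memory:"`.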