Hi guys,
So I set off my first scraping job of a forum using the interactive selector, and it seems to be grabbing everything I want.
Basically:
Topic Pages 1-n > then within each topic > Replies pages 1-n
It captures around 179,000 of the 220,000 topics/replies, but I don't know where it's going wrong with the remaining missed posts. When I preview the data, it seems to highlight all the right stuff. I've pasted my sitemap and details below. I'm saving to CouchDB because it can't handle the job using CSV, and I can't figure out how to export the data, so I can't really see where it's gone wrong. That will most likely be another post!
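For the export side, this is a rough sketch of what I think could work, not something I have running. It reads every document through CouchDB's `_all_docs?include_docs=true` endpoint and flattens the result to CSV so the rows can be checked by eye. The server address, the lack of authentication, and the database name are all assumptions, and `fetch_docs` and `docs_to_csv` are just names I made up.

```python
import csv
import io
import json
from urllib.request import urlopen

# Assumptions: CouchDB on localhost:5984 with no auth, and a
# hypothetical database name. Check both against your own setup.
COUCH_URL = "http://127.0.0.1:5984"
DB_NAME = "scraper-sitemap-data-living3"  # hypothetical name


def fetch_docs(couch_url, db_name):
    """Fetch every document in the database via the _all_docs endpoint."""
    url = f"{couch_url}/{db_name}/_all_docs?include_docs=true"
    with urlopen(url) as resp:
        payload = json.load(resp)
    return [row["doc"] for row in payload["rows"]]


def docs_to_csv(docs):
    """Flatten documents to CSV text, skipping CouchDB design documents."""
    docs = [d for d in docs if not d.get("_id", "").startswith("_design/")]
    # Use the union of all keys as columns; missing values become empty cells.
    fields = sorted({key for d in docs for key in d})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(docs)
    return buf.getvalue()
```

Calling `docs_to_csv(fetch_docs(COUCH_URL, DB_NAME))` should give CSV text that can be written to a file. With this many posts it may be kinder to the server to page through `_all_docs` using its `limit` and `skip` parameters rather than pulling everything in one request.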
Is there anything below that pops out to you as incorrect?
{
  "_id": "living3",
  "startUrl": ["http://sjogrensworld.org/forums/index.php?PHPSESSID=89f6f7a2480d9312151be0bdc2e3cb3c&board=1.0"],
  "selectors": [
    {
      "id": "paginationSubRootPages",
      "type": "SelectorLink",
      "parentSelectors": ["_root", "paginationSubRootPages"],
      "selector": ".pagelinks a:nth-of-type(n+2)",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "paginationThread",
      "type": "SelectorLink",
      "parentSelectors": ["_root", "paginationSubRootPages"],
      "selector": ".windowbg2 span a",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "paginationInThread",
      "type": "SelectorLink",
      "parentSelectors": ["paginationThread", "paginationInThread"],
      "selector": "a.navPages",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "Replies",
      "type": "SelectorText",
      "parentSelectors": ["paginationThread"],
      "selector": "div.inner",
      "multiple": true,
      "regex": "",
      "delay": 0
    }
  ]
}
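To double-check the nesting, here is a small sketch that lists which selectors run under which parent, so it can be compared against the structure I intended (topic pages, then threads, then reply pages). The selector ids and `parentSelectors` are copied from the sitemap above; `children_by_parent` is a helper name I made up, and the other sitemap fields are left out because they don't affect nesting.

```python
import json

# Only the ids and parentSelectors from the sitemap above are kept here.
SITEMAP = json.loads("""
{"selectors": [
  {"id": "paginationSubRootPages", "parentSelectors": ["_root", "paginationSubRootPages"]},
  {"id": "paginationThread", "parentSelectors": ["_root", "paginationSubRootPages"]},
  {"id": "paginationInThread", "parentSelectors": ["paginationThread", "paginationInThread"]},
  {"id": "Replies", "parentSelectors": ["paginationThread"]}
]}
""")


def children_by_parent(sitemap):
    """Map each parent id to the selector ids that are applied on its pages."""
    tree = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            tree.setdefault(parent, []).append(sel["id"])
    return tree


# Print one line per parent, listing the selectors that run beneath it.
for parent, kids in children_by_parent(SITEMAP).items():
    print(f"{parent} -> {', '.join(kids)}")
```

Each printed line shows a parent and the selectors applied to the pages it opens, which makes it easier to see on which pages `Replies` actually gets extracted.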
Any guidance appreciated.
Cheers,
Kris