Forum Scrape Runs Short

I'm trying to scrape a forum for a research project. While paginating through the list of topics returned by a search, I only seem to capture a small number of 'topics' and their 'replies'. I'm saving to the legacy CouchDB on this (in case that's what's causing the problem).

Essentially, it appears to paginate through and open topics, but then it suddenly stops, and I'm getting nowhere near the number of posts/replies I was expecting. Could this be something to do with the parent/child selectors? Any help appreciated, cheers guys.

Url: Search results for 'School' - Family Lives forum

Sitemap:
{"_id":"familylives","startUrl":["https://familylives.forumcommunity.co.uk/search?keywords=School&action=doSearch&search=true&searchin=Topics&unfiltered_forums=&member=&threadid=&do="],"selectors":[{"delay":0,"id":"TopicSelectorLink","multiple":true,"parentSelectors":["_root"],"selector":".post-body .post-body-author a","type":"SelectorLink"},{"id":"paginationTopic","paginationType":"auto","parentSelectors":["_root","TopicSelectorLink","paginationTopic"],"selector":"a.pagination-next-page","type":"SelectorPagination"},{"delay":0,"id":"TopicTitle","multiple":false,"parentSelectors":["paginationTopic"],"regex":"","selector":"span.editable","type":"SelectorText"},{"delay":0,"id":"topicRepliesText","multiple":true,"parentSelectors":["TopicSelectorLink","paginationTopic"],"regex":"","selector":"span#post_message_1322612102, span .post-body-content > span","type":"SelectorText"}]}

Hi @kmccart,

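It seems like the issue lies in how the sitemap's selectors are set up. In terms of order, the pagination selector should always come first (directly under _root). Also, the reply text selector seems to work for only one of the pages; note that part of it targets a single hard-coded post ID (span#post_message_1322612102).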
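A quick way to spot this kind of problem is to scan a sitemap export for selectors pinned to one element ID. The sketch below is a hypothetical helper, not a Web Scraper feature. It assumes the sitemap JSON layout shown above (a "selectors" list whose entries have "id" and "selector" keys) and treats any "#name" followed by four or more digits as a hard-coded post ID, which is only a heuristic.

```python
import json
import re

# Hypothetical lint helper (not part of Web Scraper). Heuristic: an element id
# that ends in a long number, e.g. one specific post's id, only matches one element.
HARDCODED_ID = re.compile(r"#[A-Za-z_-]*\d{4,}")


def find_hardcoded_ids(sitemap_json: str) -> list[tuple[str, str]]:
    """Return (selector id, CSS selector) pairs whose CSS contains a numeric element id."""
    sitemap = json.loads(sitemap_json)
    hits = []
    for sel in sitemap.get("selectors", []):
        css = sel.get("selector", "")
        if HARDCODED_ID.search(css):
            hits.append((sel["id"], css))
    return hits


# Trimmed copy of two selectors from the original sitemap.
original = json.dumps({
    "_id": "familylives",
    "selectors": [
        {"id": "TopicTitle", "selector": "span.editable", "type": "SelectorText"},
        {"id": "topicRepliesText",
         "selector": "span#post_message_1322612102, span .post-body-content > span",
         "type": "SelectorText"},
    ],
})

for sel_id, css in find_hardcoded_ids(original):
    print(f"{sel_id}: {css}")
```

Run against the updated sitemap, a selector such as "#main_posts_container div.unSelectableRow" is not flagged, because that ID has no numeric suffix and is presumably shared by every topic page.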

Updated sitemap example:

{"_id":"familylives","startUrl":["https://familylives.forumcommunity.co.uk/search?keywords=School&action=doSearch&search=true&searchin=Topics&unfiltered_forums=&member=&threadid=&do="],"selectors":[{"id":"pagination","paginationType":"auto","parentSelectors":["_root","pagination"],"selector":"a.pagination-next-page","type":"SelectorPagination"},{"delay":0,"id":"TopicTitle","multiple":false,"parentSelectors":["reply-page"],"regex":"","selector":"span.editable","type":"SelectorText"},{"delay":0,"id":"topicRepliesText","multiple":false,"parentSelectors":["first-post-wrapper","other-posts"],"regex":"","selector":"div.post-body-content","type":"SelectorText"},{"delay":0,"id":"topic-link","multiple":true,"parentSelectors":["pagination"],"selector":".post-body .post-body-author a","type":"SelectorLink"},{"delay":0,"id":"first-post-wrapper","multiple":true,"parentSelectors":["reply-page"],"selector":".first-post ","type":"SelectorElement"},{"delay":0,"id":"other-posts","multiple":true,"parentSelectors":["reply-page"],"selector":"#main_posts_container div.unSelectableRow","type":"SelectorElement"},{"delay":0,"id":"reply-page","multiple":true,"parentSelectors":["topic-link"],"selector":"body:has(span.category-of-topic )","type":"SelectorElement"}]}
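To see the parent/child structure of a sitemap at a glance, it can help to print its selector graph as a tree. This is a minimal sketch, assuming a sitemap export shaped like the one above; selector_tree is a hypothetical helper, and the embedded copy of the updated sitemap is trimmed to each selector's id and parentSelectors. Self-references (pagination listing itself as a parent) are skipped so the walk terminates.

```python
import json
from collections import defaultdict


def selector_tree(sitemap_json: str) -> list[str]:
    """Return indented lines showing which selectors run inside which parent."""
    sitemap = json.loads(sitemap_json)
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            if parent != sel["id"]:  # skip self-references such as pagination -> pagination
                children[parent].append(sel["id"])

    lines = []

    def walk(node: str, depth: int, seen: frozenset) -> None:
        for child in children[node]:
            if child in seen:  # guard against longer cycles between selectors
                continue
            lines.append("  " * depth + child)
            walk(child, depth + 1, seen | {child})

    walk("_root", 0, frozenset({"_root"}))
    return lines


# Updated sitemap from the reply, trimmed to id and parentSelectors.
updated = json.dumps({
    "_id": "familylives",
    "selectors": [
        {"id": "pagination", "parentSelectors": ["_root", "pagination"]},
        {"id": "TopicTitle", "parentSelectors": ["reply-page"]},
        {"id": "topicRepliesText", "parentSelectors": ["first-post-wrapper", "other-posts"]},
        {"id": "topic-link", "parentSelectors": ["pagination"]},
        {"id": "first-post-wrapper", "parentSelectors": ["reply-page"]},
        {"id": "other-posts", "parentSelectors": ["reply-page"]},
        {"id": "reply-page", "parentSelectors": ["topic-link"]},
    ],
})

print("\n".join(selector_tree(updated)))
```

For the updated sitemap this puts pagination at the top, topic-link under it, then reply-page, with TopicTitle and the two post wrappers nested inside it and topicRepliesText under each wrapper.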

Hi @ViestursWS,

Many, many thanks for taking a look for me, it's much appreciated. I have set this running and all seems to be working well. I will take a closer look at your structure to figure out where I went wrong initially. Thanks for the tip that pagination always comes first.