Pagination works but only the last page's content is saved

The scraper goes through all the pages fine, but somehow only the last page's content shows up in the data (I get one row of output).

Url: http://principlesofaccounting.com/chapter-1/

Sitemap:
{"_id":"principlesofaccounting","startUrl":["https://www.principlesofaccounting.com/chapter-1/"],"selectors":[{"id":"text","type":"SelectorHTML","selector":"article.single-page-content","parentSelectors":["_root","paging"],"multiple":false,"regex":"","delay":0},{"id":"paging","type":"SelectorLink","selector":"li.next a","parentSelectors":["_root","paging"],"multiple":false,"delay":0},{"id":"title","type":"SelectorText","selector":"h1","parentSelectors":["_root","paging"],"multiple":false,"regex":"","delay":0}]}

Alternatively, this startUrl can be used (so it will only go through the last two pages): https://www.principlesofaccounting.com/chapter-24/compound-interest/

Hi!

You have to tick "Multiple" on your pagination selector.

What it looks like on my end: [screenshot]

You have to use an Element selector alongside the recursive Link selector if you are scraping only text. Here is the updated sitemap:

{"_id":"principlesofaccounting2","startUrl":["https://www.principlesofaccounting.com/chapter-1/"],"selectors":[{"id":"text","type":"SelectorHTML","parentSelectors":["element"],"selector":"article.single-page-content","multiple":false,"regex":"","delay":0},{"id":"title","type":"SelectorText","parentSelectors":["element"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"paging","type":"SelectorLink","parentSelectors":["_root","paging"],"selector":"li.next a","multiple":true,"delay":0},{"id":"element","type":"SelectorElement","parentSelectors":["_root","paging"],"selector":"body","multiple":true,"delay":0}]}

Excellent, thank you! Could you explain to me why this is, though, or provide me with a link that explains it further? Does that mean if I were to include an image it would work just fine? :thinking:

Is there a way to scrape elements on a page based on some condition? For example, every time the <h1> on the page includes "Chapter", scrape a link with a specific class/ID from the same page.

Also, is it normal that the order of rows appears to be random after scraping?

Thank you very much. Recursion is powerful.