How to limit a Twitter scrape

DP76 · March 13, 2018, 9:12pm

Hi guys,

I've created a sitemap for a Twitter search, but I'm having trouble limiting the number of results that get scraped. Ideally I'd like to limit the scrape to a couple of hundred tweets, but the scraper keeps loading new tweets.

I've tried using div.tweet:not(:nth-child(n+200)) and div.tweet:nth-child(-n+200) in the selector field, but neither of them works. Anyone got any ideas?

Sitemap:
{"_id":"twitter-new","startUrl":["https://twitter.com/search?l=en&q=trump%20exclude%3Aretweets%20exclude%3Areplies%20since%3A2017-09-05%20until%3A2017-09-06&src=typd"],"selectors":[{"id":"Tweet","type":"SelectorElementScroll","selector":"div.tweet:not(:nth-child(n+20))","parentSelectors":["_root"],"multiple":true,"delay":"3000"},{"id":"Username","type":"SelectorText","selector":"span.username","parentSelectors":["Tweet"],"multiple":false,"regex":"","delay":0},{"id":"Tweet text","type":"SelectorText","selector":"p.TweetTextSize","parentSelectors":["Tweet"],"multiple":false,"regex":"","delay":0},{"id":"Tweet URL","type":"SelectorElementAttribute","selector":"parent","parentSelectors":["Tweet"],"multiple":false,"extractAttribute":"data-permalink-path","delay":0},{"id":"Retweet","type":"SelectorText","selector":"span.js-retweet-text","parentSelectors":["Tweet"],"multiple":false,"regex":"","delay":0},{"id":"Date","type":"SelectorText","selector":"small.time","parentSelectors":["Tweet"],"multiple":false,"regex":"","delay":0}]}
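Why neither attempt limits anything can be reasoned out from how :nth-child works: it tests an element's position among its own siblings, not its position among all matches on the page. The Python sketch below models that An+B position test. It rests on an assumption the thread does not state: that in the 2018 Twitter search markup each div.tweet was the first (or only) child of its own li.stream-item wrapper, so every tweet sits at position 1 and passes any "first 200" test. The page size of 500 and the helper names matches_an_plus_b and count_matches are purely illustrative.

```python
"""Minimal model of the An+B test behind CSS :nth-child / :nth-of-type.

Assumption (not from the thread): each div.tweet is the first child of its
own li.stream-item, and the li elements are siblings in a single list.
"""


def matches_an_plus_b(a: int, b: int, position: int) -> bool:
    """True if the 1-based sibling position equals a*n + b for some integer n >= 0."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0


def count_matches(positions, a: int, b: int) -> int:
    """Count how many sibling positions pass the An+B test."""
    return sum(matches_an_plus_b(a, b, p) for p in positions)


# Hypothetical page with 500 tweets loaded by scrolling.
TWEETS_LOADED = 500

# Each div.tweet is counted among its own siblings inside its li: always position 1.
tweet_div_positions = [1] * TWEETS_LOADED
# Each li.stream-item is counted among the other items of the list: 1..500.
stream_item_positions = list(range(1, TWEETS_LOADED + 1))

# div.tweet:nth-child(-n+200): position 1 is always <= 200, so every tweet matches.
div_limited = count_matches(tweet_div_positions, -1, 200)
# div.tweet:not(:nth-child(n+200)): position 1 is never >= 200, so every tweet matches again.
div_not = sum(not matches_an_plus_b(1, 200, p) for p in tweet_div_positions)
# li.stream-item:nth-of-type(-n+200): only the first 200 list items match.
li_limited = count_matches(stream_item_positions, -1, 200)

print(div_limited, div_not, li_limited)  # 500 500 200
```

Because this model treats every sibling of a li.stream-item as another li, it gives the same result for :nth-child and :nth-of-type; the two would differ only if the list also contained non-li elements, which :nth-of-type skips when counting.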

DP76 · March 14, 2018, 8:49am

Alternatively, is there a way to stop a scrape and export the data found up until then?

KristapsWS · March 14, 2018, 10:49am

Change your "Tweet" selector to li.stream-item:nth-of-type(-n+200).
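Editing the one-line exported sitemap by hand is easy to get wrong, so here is a minimal Python sketch that swaps in the suggested selector programmatically. The function name limit_tweet_selector and its limit parameter are hypothetical, and the sample sitemap is trimmed to two of the original selectors. Only the "Tweet" selector string changes, so its type (SelectorElementScroll), parent and delay stay as they were. Presumably the limit also ends the endless scrolling because, once 200 list items match, further scrolling adds no new matches for the scroll selector to keep waiting for.

```python
import json


def limit_tweet_selector(sitemap_json: str, limit: int = 200) -> str:
    """Return the sitemap with the "Tweet" selector limited to the first `limit` list items.

    Only the CSS selector string is replaced; every other field of the sitemap,
    including the child selectors that hang off "Tweet", is left untouched.
    """
    sitemap = json.loads(sitemap_json)
    for selector in sitemap["selectors"]:
        if selector["id"] == "Tweet":
            selector["selector"] = f"li.stream-item:nth-of-type(-n+{limit})"
            break
    else:
        raise KeyError('no selector with id "Tweet" in this sitemap')
    # Compact separators keep the one-line form of the exported sitemap.
    return json.dumps(sitemap, separators=(",", ":"))


# Trimmed, illustrative sitemap: only two of the original selectors are kept.
example = (
    '{"_id":"twitter-new","startUrl":["https://twitter.com/search?l=en&q=trump'
    '%20exclude%3Aretweets%20exclude%3Areplies%20since%3A2017-09-05%20until'
    '%3A2017-09-06&src=typd"],"selectors":[{"id":"Tweet",'
    '"type":"SelectorElementScroll","selector":"div.tweet:not(:nth-child(n+20))",'
    '"parentSelectors":["_root"],"multiple":true,"delay":"3000"},'
    '{"id":"Username","type":"SelectorText","selector":"span.username",'
    '"parentSelectors":["Tweet"],"multiple":false,"regex":"","delay":0}]}'
)

print(limit_tweet_selector(example))
```

The printed JSON can then be pasted back in through the extension's sitemap import.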

DP76 · March 14, 2018, 11:58am

Sweet! Worked a charm. Thanks a lot!