Okay, I have had some very intermittent success with Tweetdeck.
I've made two versions of the sitemap, both using the undocumented scroll feature described in the thread "Scroller does not work on certain websites" (post #2 by martins).
Version one, using a "general" selector:
{"_id":"showtime8-select-through-chirp","startUrl":["https://tweetdeck.twitter.com/"],"selectors":[{"id":"Scrolley","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.js-chirp-container","multiple":true,"delay":5000,"scrollElementSelector":"div.js-column-scroller"},{"id":"Textey","type":"SelectorText","parentSelectors":["Scrolley"],"selector":"div.js-tweet","multiple":true,"regex":"","delay":0}]}
Version two:
{"_id":"test-after-single-shorter","startUrl":["https://tweetdeck.twitter.com/"],"selectors":[{"id":"Scrolley","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.js-tweet","multiple":true,"delay":2000,"scrollElementSelector":"div.js-column-scroller"},{"id":"Textey","type":"SelectorText","parentSelectors":["Scrolley"],"selector":"parent","multiple":true,"regex":"","delay":0}]}
Neither works quite right, and both exhibit the following problems:
- While scrolling works, some of the tweets don't get "grabbed" by the scraper, and the process seems very hit-or-miss. This seems to be related to the fact that both Twitter web and Tweetdeck unload material as you scroll, so tweets that are "out of sight" are lost.
- Very few tweets are captured overall, which is likely related to the previous problem.
- The scraping process terminates spontaneously 5-15 seconds in. I'm not sure why that happens, but scrolling through all 40,000 tweets seems solidly out of reach.
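To make the first problem concrete, here is a small simulation of what I think is happening. It is plain Python and has nothing to do with how Web Scraper works internally. The `(tweet_id, text)` pairs, the 3-tweet "window" and the function name `collect_across_scrolls` are all made up for illustration. The idea is only that if the page keeps a limited window of tweets loaded, then reading once at the end loses most of them, while reading after every scroll step and merging by id keeps them all.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (tweet_id, text). Not a real Tweetdeck structure.
Tweet = Tuple[str, str]


def collect_across_scrolls(snapshots: Iterable[List[Tweet]]) -> Dict[str, str]:
    """Merge the tweets visible at each scroll step, keyed by tweet id."""
    seen: Dict[str, str] = {}
    for visible in snapshots:
        for tweet_id, text in visible:
            # setdefault keeps the first copy of a tweet and ignores repeats,
            # so overlapping windows do not produce duplicates.
            seen.setdefault(tweet_id, text)
    return seen


# Simulated column of 10 tweets where only a 3-tweet window is loaded at a time
# (assumption: this mimics the page unloading tweets that scroll out of view).
timeline = [(str(i), f"tweet {i}") for i in range(10)]
snapshots = [timeline[start:start + 3] for start in range(0, 10, 2)]

final_only = dict(snapshots[-1])            # reading once, after all scrolling
merged = collect_across_scrolls(snapshots)  # reading after every scroll step

print(len(final_only), len(merged))
```

In this toy setup the read-once approach ends up with only the last window, while the merged approach recovers the whole simulated timeline. Whether the scraper can be made to extract after each scroll step like this is exactly what I don't know.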
After this I am quite lost and would appreciate any help.
EDITED TO ADD:
Tweetdeck is configured with a single column, which is set to show my own timeline (add column -> user -> pick your own account).