Scraping Captions on YouTube is impossible now... right?

tfac · November 23, 2024, 6:35pm

As of August 2024, YouTube changed how its video pages load content, so if you try to scrape captions by fetching a video page from a server, no captions come back. This would be an open-and-shut case IF it weren't also true that scrapers still work from MY LOCAL ENVIRONMENT.

There is a Node package called youtube-caption-scraper that simply fetches the HTML of a video page, pulls out the captions in the language of choice (or the auto-generated captions), and returns them. It works great when I run the code on my own PC, but not when the same code runs from a deployment.
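For context, here is a minimal TypeScript sketch of the extraction step such a package might perform. It is not the package's actual code. It assumes the watch page embeds a JSON array under the key `captionTracks` whose entries carry `baseUrl`, `languageCode`, and an optional `kind` (with `"asr"` marking auto-generated captions). Those field names are assumptions, and `extractCaptionTracks` and `pickTrack` are hypothetical helpers.

```typescript
// Assumed shape of one caption track entry in the page's embedded JSON.
interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
  kind?: string; // assumption: "asr" marks auto-generated captions
}

// Finds the first `"captionTracks":[...]` array in the HTML and parses it.
// Returns [] when the marker is missing, e.g. on a page with no caption data.
function extractCaptionTracks(html: string): CaptionTrack[] {
  const marker = '"captionTracks":';
  const start = html.indexOf(marker);
  if (start === -1) return [];
  let begin = start + marker.length;
  while (html[begin] === " ") begin++; // tolerate optional whitespace
  if (html[begin] !== "[") return [];

  // Walk to the matching closing bracket, ignoring brackets inside strings.
  let depth = 0;
  let inString = false;
  for (let j = begin; j < html.length; j++) {
    const ch = html[j];
    if (inString) {
      if (ch === "\\") j++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "[" || ch === "{") depth++;
    else if (ch === "]" || ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(html.slice(begin, j + 1)) as CaptionTrack[];
        } catch {
          return [];
        }
      }
    }
  }
  return [];
}

// Prefers a manually created track in the requested language,
// then falls back to an auto-generated one in that language.
function pickTrack(tracks: CaptionTrack[], lang: string): CaptionTrack | undefined {
  const matches = tracks.filter((t) => t.languageCode === lang);
  return matches.find((t) => t.kind !== "asr") ?? matches[0];
}
```

Under these assumptions, fetching the chosen track's `baseUrl` would return the caption text itself. On a page fetched from a server where the caption data is missing, `extractCaptionTracks` would simply return an empty array.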

ALSO, I can do a plain fetch from a local script, without any packages, and see the caption text right there in the response. So my question stands: is it really impossible to scrape captions from an automated app or server? I've tried:

Running the script from a Raspberry Pi to emulate a local environment (didn't work)
Modifying the request headers so YouTube thinks the request comes from a PC rather than a server (didn't work)
Using a YouTube video-downloading library (youtube-dl-exec) to extract only the subtitle .vtt file (worked, but I got rate-limited after 5 tries)
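One way to make the local-versus-server comparison concrete is to run the same probe in both places. The sketch below is a minimal TypeScript version assuming Node 18+ (for the global `fetch`). The header values are illustrative browser-like strings, not a known workaround, and `BROWSER_HEADERS`, `watchUrl`, `hasCaptionData`, and `probe` are hypothetical names. Checking for the `"captionTracks"` key is an assumption about where the caption data lives in the page.

```typescript
// Illustrative browser-like headers (assumed values, not a known bypass).
const BROWSER_HEADERS: Record<string, string> = {
  "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
  "Accept-Language": "en-US,en;q=0.9",
  Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
};

// Builds the standard watch-page URL for a video ID.
function watchUrl(videoId: string): string {
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}`;
}

// Cheap diagnostic: does the HTML contain the (assumed) caption-track key at all?
function hasCaptionData(html: string): boolean {
  return html.includes('"captionTracks"');
}

// Fetches the watch page with the headers above and reports the HTTP status
// and whether caption data appears in the returned HTML.
async function probe(videoId: string): Promise<{ status: number; hasCaptions: boolean }> {
  const res = await fetch(watchUrl(videoId), { headers: BROWSER_HEADERS });
  const html = await res.text();
  return { status: res.status, hasCaptions: hasCaptionData(html) };
}
```

Calling `probe("VIDEO_ID").then(console.log)` on both machines turns "doesn't work" into a concrete difference (status code and whether `captionTracks` is present) that is easier to compare.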

Any ideas from a different perspective are appreciated; I've banged my head against this enough.