Hi all, I'm just trying to get this to work one time, for this one thing!

I realize this is probably explained in a guide somewhere, but I've read around and I'm just not understanding something. I also have no foreseeable use for the tool beyond this one task. I'm trying to extract the Chinese sentences from the lessons on www.immersivechinese.com. There's a GitHub project explaining how to do it here, but at the web scraping step it just says "scrape the website." It includes files that seem to be direct instructions for what to put in the sitemap. One looks like

{"_id":"pronounciation","startUrl":["https://console.immersivechinese.com/pronunciation"],"selectors":[{"delay":0,"id":"lesson","multiple":true,"parentSelectors":["_root"],"selector":"a.list-group-item","type":"SelectorLink"},{"delay":0,"id":"exercise","multiple":true,"parentSelectors":["lesson"],"selector":"div.main-swipe-slide","type":"SelectorElement"},{"delay":0,"id":"pinyin","multiple":false,"parentSelectors":["exercise"],"regex":"","selector":"label.show_reveal","type":"SelectorText"},{"delay":0,"id":"description","multiple":false,"parentSelectors":["exercise"],"regex":"","selector":"div.lesson_note_div","type":"SelectorText"},{"delay":0,"extractAttribute":"data-audio-fast","id":"audioFastUrl","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"},{"delay":0,"extractAttribute":"data-audio-slow","id":"audioSlowUrl","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"},{"delay":0,"extractAttribute":"id","id":"number","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"}]}
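It can help to read a sitemap like this as a tree: each selector names its parentSelectors, and Web Scraper evaluates a child selector within whatever its parent matched (for a link selector, on the page the link leads to). Below is a small standalone Python sketch that prints that tree from the JSON above. It is not part of Web Scraper or the GitHub project, and the function name format_tree is made up for illustration.

```python
import json

# The sitemap from the post, pasted verbatim.
SITEMAP_JSON = r'''{"_id":"pronounciation","startUrl":["https://console.immersivechinese.com/pronunciation"],"selectors":[{"delay":0,"id":"lesson","multiple":true,"parentSelectors":["_root"],"selector":"a.list-group-item","type":"SelectorLink"},{"delay":0,"id":"exercise","multiple":true,"parentSelectors":["lesson"],"selector":"div.main-swipe-slide","type":"SelectorElement"},{"delay":0,"id":"pinyin","multiple":false,"parentSelectors":["exercise"],"regex":"","selector":"label.show_reveal","type":"SelectorText"},{"delay":0,"id":"description","multiple":false,"parentSelectors":["exercise"],"regex":"","selector":"div.lesson_note_div","type":"SelectorText"},{"delay":0,"extractAttribute":"data-audio-fast","id":"audioFastUrl","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"},{"delay":0,"extractAttribute":"data-audio-slow","id":"audioSlowUrl","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"},{"delay":0,"extractAttribute":"id","id":"number","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"}]}'''


def format_tree(sitemap: dict) -> list[str]:
    """Return one indented line per selector, nested under its parent selector."""
    by_id = {sel["id"]: sel for sel in sitemap["selectors"]}

    # Map each parent id ("_root" is the start URL) to its child selector ids.
    children: dict[str, list[str]] = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children.setdefault(parent, []).append(sel["id"])

    lines: list[str] = []

    def walk(node: str, depth: int) -> None:
        for child_id in children.get(node, []):
            sel = by_id[child_id]
            # Attribute selectors also say which HTML attribute they read.
            attr = f" [attribute: {sel['extractAttribute']}]" if "extractAttribute" in sel else ""
            lines.append(f"{'  ' * depth}{child_id}: {sel['type']} -> {sel['selector']}{attr}")
            walk(child_id, depth + 1)

    walk("_root", 0)
    return lines


print("\n".join(format_tree(json.loads(SITEMAP_JSON))))
```

For this sitemap the output puts lesson (the links on the start page) at the top, exercise (one element per slide) under it, and pinyin, description, audioFastUrl, audioSlowUrl and number all under exercise, so each of those five is collected once per slide.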

for example. But no matter how I paste it in, I just can't get it working. I just want to get these cards out to help my Chinese study; Chinese is more than enough of a headache as it is :laughing:

Hi,

Could you elaborate on the purpose of these selectors:

[screenshot of the selectors in question]

Hmm.

I only have a clear idea about audioSlowUrl: there are three voice options (female, male, and slowed down). I would be happy without the slowed-down audio, as long as leaving it out doesn't break something later in the automated card-building process. You can see the first lesson here: Immersive Chinese - Simple Chinese Spoken Clearly. The main console, where all the lessons are listed, is here: main console. If you're unable to access something, I'm happy to share my login in a PM.
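In the sitemap, audioSlowUrl has no child selectors, so removing it should only drop that one column from the scraped data. Whether the GitHub card-building code expects that column is not something the post tells us, so it is worth checking before relying on this. A minimal Python sketch for removing selectors from a sitemap before importing it; drop_selectors is a hypothetical helper, and SITEMAP_JSON here is only an excerpt of the full sitemap:

```python
import json

# Excerpt of the sitemap from the post (in practice, paste the full JSON).
SITEMAP_JSON = '''{"_id":"pronounciation","startUrl":["https://console.immersivechinese.com/pronunciation"],"selectors":[
{"delay":0,"id":"lesson","multiple":true,"parentSelectors":["_root"],"selector":"a.list-group-item","type":"SelectorLink"},
{"delay":0,"id":"exercise","multiple":true,"parentSelectors":["lesson"],"selector":"div.main-swipe-slide","type":"SelectorElement"},
{"delay":0,"extractAttribute":"data-audio-fast","id":"audioFastUrl","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"},
{"delay":0,"extractAttribute":"data-audio-slow","id":"audioSlowUrl","multiple":false,"parentSelectors":["exercise"],"selector":"parent","type":"SelectorElementAttribute"}]}'''


def drop_selectors(sitemap_json: str, unwanted: set[str]) -> str:
    """Return the sitemap JSON without the selectors whose id is in `unwanted`.

    Refuses to remove a selector that a kept selector still lists as its parent,
    since that child would be left with nothing to run inside.
    """
    sitemap = json.loads(sitemap_json)
    for sel in sitemap["selectors"]:
        orphaned = unwanted.intersection(sel["parentSelectors"])
        if orphaned and sel["id"] not in unwanted:
            raise ValueError(f"{sel['id']} depends on {sorted(orphaned)}")
    sitemap["selectors"] = [s for s in sitemap["selectors"] if s["id"] not in unwanted]
    # Compact output, like the original sitemap; paste it into the import box.
    return json.dumps(sitemap, ensure_ascii=False, separators=(",", ":"))


print(drop_selectors(SITEMAP_JSON, {"audioSlowUrl"}))
```

Deleting the same object by hand in a text editor works just as well; the sketch only adds a check that nothing else depended on it.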

There is no fast audio per se (that would be absurd), so I assume audioFastUrl refers to the normal-speed audio, i.e. what you hear with the slowed-down option turned back off.
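For what it's worth, the sitemap does not read the audio from anything visible on the page: audioFastUrl, audioSlowUrl and number are all SelectorElementAttribute entries that pull the data-audio-fast, data-audio-slow and id attributes, apparently from the slide element that exercise matches. The attribute names are consistent with "fast" simply meaning the normal-speed file paired with the slowed-down one. This standard-library Python sketch mimics that extraction; the HTML snippet, its values, and the class name SlideAttributeParser are invented for illustration, and the real page's markup may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup shaped like what the sitemap expects (values are placeholders).
SAMPLE_HTML = """
<div class="main-swipe-slide" id="example-id" data-audio-fast="fast.mp3" data-audio-slow="slow.mp3">
  <label class="show_reveal">wǒ</label>
  <div class="lesson_note_div">Me</div>
</div>
"""


class SlideAttributeParser(HTMLParser):
    """Collect id, data-audio-fast and data-audio-slow from each div.main-swipe-slide."""

    def __init__(self) -> None:
        super().__init__()
        self.slides: list[dict] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "main-swipe-slide" in classes:
            # Same three attributes the sitemap's attribute selectors name.
            self.slides.append({
                "number": attr_map.get("id"),
                "audioFastUrl": attr_map.get("data-audio-fast"),
                "audioSlowUrl": attr_map.get("data-audio-slow"),
            })


parser = SlideAttributeParser()
parser.feed(SAMPLE_HTML)
print(parser.slides)
```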

The number selector surely has something to do with labeling the lessons in order, because the GitHub code automates turning the scraped data into flashcards, and the order matters: the lessons progress from basic words to sentences and eventually stories.

When you open that first lesson, you see pinyin: wǒ, character: 我, meaning: Me, and at the top you see "1/25". The Absolute Beginner set you just opened has 25 words (progressing into sentences). You move between them by clicking the > arrow next to the audio play button in the center, and it may be important that the URL doesn't change as you click through them. I don't know of anywhere else the lessons are numbered, so as far as I can tell it must be reading the 1/25.
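One detail in the sitemap bears on this: as written, number is a SelectorElementAttribute with extractAttribute set to id, so it reads an id attribute rather than the visible "1/25" counter. If those ids contain a running number (not confirmed here), the exported rows can be put back in order after scraping. A Python sketch, assuming a CSV export that has a number column; the sample ids such as slide-1 and the helper names are invented for illustration:

```python
import csv
import io
import re

# Hypothetical export: Web Scraper may return rows out of order.
SAMPLE_CSV = """number,pinyin
slide-10,tā
slide-2,nǐ
slide-1,wǒ
"""


def natural_key(value: str) -> tuple:
    """Split 'slide-10' into ('slide-', 10, '') so digit runs compare as numbers."""
    parts = re.split(r"(\d+)", value)
    # With a capturing group, odd indices are always the digit runs.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def sort_rows(csv_text: str, column: str = "number") -> list[dict[str, str]]:
    """Parse the CSV and sort its rows by the given column, numbers-aware."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: natural_key(row[column]))


for row in sort_rows(SAMPLE_CSV):
    print(row["number"], row["pinyin"])
```

A plain string sort would put slide-10 before slide-2, which is why the key splits out the digits. Rows from several lessons would also need the lesson as the first part of the sort key.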