Organizing French Dictionary Scrape

I am trying to scrape a French dictionary site to make flashcards, which would make my life a whole lot easier, but I cannot, for the life of me, figure out how to format the data properly.

Here's what I mean:

Current Progress:

Title | Phonetic | POS | Label | Quote Span | Form | Quote
Title | Phonetic | POS | Label | Quote Span | Form | Another Quote
Title | Phonetic | POS | Label | Quote Span | Another Form | Quote
Title | Phonetic | POS | Label | Quote Span | Another Form | Another Quote

Aim:

Title | Phonetic | POS | Label | Quote Span | Form | Quote | POS | Label | Quote Span | Form | Quote, and so on, with all of the data for one word on one line.
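One way to get from the current rows to this layout is to post-process the exported CSV outside of Web Scraper. The following is only a minimal sketch in Python, not a built-in Web Scraper feature. It assumes the export has one column per selector id from the sitemap (title, phonetic, pos, label, quote span, form, quote); the function name `collapse_rows` and the sample data are hypothetical, and the column lists should be adjusted to match the actual export headers.

```python
import csv
import io

# Assumed column names (taken from the selector ids in the sitemap);
# adjust them to match the headers in your own export.
KEY_COLUMNS = ["title", "phonetic"]
REPEATED_COLUMNS = ["pos", "label", "quote span", "form", "quote"]


def collapse_rows(csv_text):
    """Return one list of cells per word, in first-seen order."""
    lines = {}  # dicts keep insertion order, so words stay in export order
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row.get(c) or "" for c in KEY_COLUMNS)
        # The first time a word appears, start its line with the word-level cells.
        cells = lines.setdefault(key, list(key))
        # Every row for that word then appends its remaining columns.
        cells.extend(row.get(c) or "" for c in REPEATED_COLUMNS)
    return list(lines.values())


# Placeholder data in the same shape as the "Current Progress" rows.
sample = """title,phonetic,pos,label,quote span,form,quote
Title,Phonetic,POS,Label,Quote Span,Form,Quote
Title,Phonetic,POS,Label,Quote Span,Form,Another Quote
"""

for cells in collapse_rows(sample):
    print(" | ".join(cells))
```

Each word ends up on a single pipe-separated line, with the POS through Quote columns repeated once per original row.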

Here's an explanation of the selectors:

URL: https://www.collinsdictionary.com/dictionary/french-english/partir (just an example)

Sitemap:
{"_id":"frenchcollinsdictionarybase","startUrl":["https://pastelink.net/z8mt2ijb"],"selectors":[{"id":"dictionary","parentSelectors":["links"],"type":"SelectorElement","selector":"div.dc","multiple":true,"delay":0},{"id":"title","parentSelectors":["dictionary"],"type":"SelectorText","selector":".h2_entry span","multiple":false,"delay":0,"regex":""},{"id":"phonetic","parentSelectors":["dictionary"],"type":"SelectorText","selector":".form span.pron","multiple":false,"delay":0,"regex":""},{"id":"hom","parentSelectors":["dictionary"],"type":"SelectorElement","selector":"div.hom","multiple":true,"delay":0},{"id":"pos","parentSelectors":["hom"],"type":"SelectorText","selector":"span.pos","multiple":false,"delay":0,"regex":""},{"id":"sense","parentSelectors":["hom"],"type":"SelectorElement","selector":"div.sense","multiple":true,"delay":0},{"id":"label","parentSelectors":["sense"],"type":"SelectorText","selector":"> span.gramGrp, span.lbl","multiple":false,"delay":0,"regex":""},{"id":"quote span","parentSelectors":["sense"],"type":"SelectorText","selector":"> span span.quote","multiple":false,"delay":0,"regex":""},{"id":"divre","parentSelectors":["sense"],"type":"SelectorElement","selector":"div.re","multiple":true,"delay":0},{"id":"form","parentSelectors":["divre"],"type":"SelectorText","selector":"span.form","multiple":false,"delay":0,"regex":""},{"id":"quote","parentSelectors":["divre"],"type":"SelectorText","selector":"span.quote","multiple":false,"delay":0,"regex":""},{"id":"links","parentSelectors":["_root"],"type":"SelectorLink","selector":".body-display a","multiple":true,"delay":0}]}

Hey there, just checking again if anyone knows how to reorganize data this way in any other program?

@rembrandt Hi, if you are looking to extract all of the available titles, labels, quotes, etc. in a single line, you should use the 'Grouped' selector instead. You can also create multiple selector variants by separating them with a comma.

Learn more: Grouped selector | Web Scraper Documentation

Hey there, I am really sorry for all of the trouble, but I don't think I quite understand.

  • Whenever I use the grouped selector, for instance on the senses, I don't receive any word breaks.

  • When I use commas in the selector, then the data output does not come in rows.

Again, I am very sorry for all of this trouble and for the late reply. I appreciate your magnanimous efforts.

@rembrandt Can you send a practical example of what the final data should be like for any of these words?

This is what the data should look like for the entry "partir":

partir | paʀtiʀ | intransitive verb | (aller vers un lieu) | to go | J’aimerais partir quelque part au soleil. | I’d like to go somewhere sunny. | partir en vacances | to go on holiday etc. etc. until Il est parti de Nice à sept heures. | He left Nice at 7 o’clock. then it returns to the next "sense" (quitter un lieu) | to go | to leave | Partez, vous allez être en retard. | You should go, or you’ll be late. ⧫ You should leave, or you’ll be late.

etc. until reaching the next sense, at which point the sequence repeats
etc. until reaching the next part of speech (pos), at which point the sequence repeats

And so everything should be on one line.
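The nesting described above (part of speech, then sense, then form and quote pairs) could be produced from the row-per-example output with a small script. This is only a sketch in Python under stated assumptions: the rows for a single word are already loaded as dictionaries keyed by the sitemap's selector ids, the rows are in page order, and the function name `flatten_entry` is hypothetical. The sample rows are an abridged, hand-typed version of the "partir" example, not real scraper output.

```python
def flatten_entry(rows):
    """Flatten the rows for ONE word into a single list of cells.

    The part of speech and the sense (label + quote span) are written
    only when they change from the previous row; form/quote pairs are
    written for every row.
    """
    cells = []
    prev_pos = prev_sense = None
    for i, row in enumerate(rows):
        if i == 0:
            cells += [row["title"], row["phonetic"]]
        pos = row["pos"]
        sense = (pos, row["label"], row["quote span"])
        if pos != prev_pos:
            cells.append(pos)
        if sense != prev_sense:
            cells += [row["label"], row["quote span"]]
        cells += [row["form"], row["quote"]]
        prev_pos, prev_sense = pos, sense
    return cells


# Abridged sample rows, typed by hand from the "partir" example.
word = {"title": "partir", "phonetic": "paʀtiʀ", "pos": "intransitive verb"}
rows = [
    {**word, "label": "(aller vers un lieu)", "quote span": "to go",
     "form": "J’aimerais partir quelque part au soleil.",
     "quote": "I’d like to go somewhere sunny."},
    {**word, "label": "(aller vers un lieu)", "quote span": "to go",
     "form": "partir en vacances", "quote": "to go on holiday"},
    {**word, "label": "(quitter un lieu)", "quote span": "to leave",
     "form": "Partez, vous allez être en retard.",
     "quote": "You should go, or you’ll be late."},
]

print(" | ".join(flatten_entry(rows)))
```

One limitation of this approach: two consecutive senses with an identical label and quote span would be merged into one, because a change in value is the only signal used to detect a new sense.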

If I explained it badly, please tell me where it was confusing. Otherwise, thank you for replying so quickly!

@rembrandt If you are not looking to separate each of the 'divre' elements, you can use the 'Grouped' selector instead.

{"_id":"frenchcollinsdictionarybase","startUrl":["https://pastelink.net/z8mt2ijb"],"selectors":[{"delay":0,"id":"dictionary","multiple":true,"parentSelectors":["links"],"selector":"div.dc","type":"SelectorElement"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["dictionary"],"regex":"","selector":".h2_entry span","type":"SelectorText"},{"delay":0,"id":"phonetic","multiple":false,"parentSelectors":["dictionary"],"regex":"","selector":".form span.pron","type":"SelectorText"},{"delay":0,"id":"hom","multiple":true,"parentSelectors":["dictionary"],"selector":"div.hom","type":"SelectorElement"},{"delay":0,"id":"pos","multiple":false,"parentSelectors":["hom"],"regex":"","selector":"span.pos","type":"SelectorText"},{"delay":0,"id":"sense","multiple":true,"parentSelectors":["hom"],"selector":"div.sense","type":"SelectorElement"},{"delay":0,"id":"label","multiple":false,"parentSelectors":["sense"],"regex":"","selector":"> span.gramGrp, span.lbl","type":"SelectorText"},{"delay":0,"id":"quote span","multiple":false,"parentSelectors":["sense"],"regex":"","selector":"> span span.quote","type":"SelectorText"},{"delay":0,"extractAttribute":"","id":"divre","parentSelectors":["sense"],"selector":"div[class=\"cit type-example\"], div[class=\"re type-phr\"]","type":"SelectorGroup"},{"delay":0,"id":"links","multiple":true,"parentSelectors":["_root"],"selector":".body-display a","type":"SelectorLink"}]}

@ViestursWS Yes, I am looking to join the 'divre' elements, but I also want all of the other elements to fit on one line. This includes placing the 'pos' values next to each other rather than in descending rows of a table. Basically, for one word, all of the data should be on one line. I am not sure this is possible, but thank you for trying.