I am trying to scrape documents that are published in several languages. The site is structured like this:
- start url -> year -> pagination -> title -> document in each language
What I currently get in the CSV file is one row per language:
Row 1: start url, year, page, title 1, language 1, document 1 in language 1.
Row 2: start url, year, page, title 1, language 2, document 1 in language 2.
Row 3: start url, year, page, title 1, language 3, document 1 in language 3.
Row 4: start url, year, page, title 2, language 1, document 2 in language 1.
Row 5: start url, year, page, title 2, language 2, document 2 in language 2.
...
But I need one row per title, with all of its languages side by side:
Row 1: start url, year, page, title 1, language 1, document 1 in language 1, language 2, document 1 in language 2, language 3, document 1 in language 3...
Row 2: start url, year, page, title 2, language 1, document 2 in language 1, language 2, document 2 in language 2, language 3, document 2 in language 3...
Start URL (the page I am extracting the data from): http://w2.vatican.va/content/benedict-xvi/es/homilies.index.html
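Whatever produces the long format, one way to get the wide layout is to post-process the CSV: group rows by the first four fields (start url, year, page, title) and append each (language, document) pair to that group's row. This is a minimal sketch, assuming the CSV has exactly the six columns shown above, in that order, and an optional header line. The function names and the file names `long.csv` / `wide.csv` are hypothetical.

```python
import csv


def pivot_rows(rows):
    """Merge one-row-per-language records into one row per title.

    Each input row is assumed to be:
        start_url, year, page, title, language, document
    Rows sharing the first four fields are merged, and their
    (language, document) pairs are appended in input order.
    """
    grouped = {}  # dicts preserve insertion order (Python 3.7+)
    for start_url, year, page, title, language, document in rows:
        key = (start_url, year, page, title)
        grouped.setdefault(key, []).extend([language, document])
    return [list(key) + pairs for key, pairs in grouped.items()]


def pivot_csv(in_path, out_path, skip_header=True):
    """Read the long CSV at in_path and write the wide CSV to out_path."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        if skip_header:
            next(reader, None)  # drop the header line, if the exporter wrote one
        wide = pivot_rows(reader)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(wide)
```

Usage would be `pivot_csv("long.csv", "wide.csv")`. Languages appear in the order the scraper emitted them, and titles with fewer translations simply produce shorter rows; pad them first if every row must have the same number of columns. If the scraper is a Scrapy spider (an assumption; the question does not say), the same merge could instead happen inside the spider by carrying a partially filled item through `Request.cb_kwargs` and yielding it only after every language page has been parsed.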