Creating a list of startups; getting null returns while previews work fine

I'm trying to scrape all the startup names, the about info, and the information from the left sidebar of this page: https://lehub.web.bpifrance.fr/search?page=1

When I look at the data preview, everything shows up perfectly. However, the scraping run returns "null" for all the columns.

Moreover, the pagination ends up repeating all the names. I tried using https://lehub.web.bpifrance.fr/search?page=[1-30] as the start URL, but that doesn't seem to work either.

I'm lost here; any help would be much appreciated.

Sitemap:
{"_id":"lehub","startUrl":["https://lehub.web.bpifrance.fr/search"],"selectors":[{"id":"Link","type":"SelectorLink","parentSelectors":["_root","Pagination"],"selector":"a.sc-jbKcbu","multiple":true,"delay":0},{"id":"about","type":"SelectorElement","parentSelectors":["Link"],"selector":"div.sc-iELTvK","multiple":true,"delay":0},{"id":"Address","type":"SelectorText","parentSelectors":["about"],"selector":".sc-kfGgVZ > span","multiple":false,"regex":"","delay":0},{"id":"Tech","type":"SelectorText","parentSelectors":["about"],"selector":"div:nth-of-type(3) div.sc-eXEjpC","multiple":false,"regex":"","delay":0},{"id":"Domain","type":"SelectorText","parentSelectors":["about"],"selector":"div:nth-of-type(4) div.sc-eXEjpC","multiple":false,"regex":"","delay":0},{"id":"Market","type":"SelectorText","parentSelectors":["about"],"selector":"div:nth-of-type(5) div.sc-ibxdXY","multiple":false,"regex":"","delay":0},{"id":"BusinessModel","type":"SelectorText","parentSelectors":["about"],"selector":"div:nth-of-type(6) div.sc-eXEjpC","multiple":false,"regex":"","delay":0},{"id":"Presence","type":"SelectorText","parentSelectors":["about"],"selector":"div:nth-of-type(7) div.sc-eXEjpC","multiple":false,"regex":"","delay":0},{"id":"Info","type":"SelectorText","parentSelectors":["Link"],"selector":"div.sc-eilVRo","multiple":false,"regex":"","delay":0},{"id":"Pagination","type":"SelectorElementClick","parentSelectors":["_root","Pagination"],"selector":"div.sc-dVhcbM","multiple":false,"delay":"2000","clickElementSelector":".sc-dVhcbM button","clickType":"clickMore","discardInitialElements":"discard","clickElementUniquenessType":"uniqueText"}]}

Hi,

Have a look at this thread: Data as "null" though fine in preview.
This should help.

David

Something like this:

{"_id":"lehub","startUrl":["https://lehub.web.bpifrance.fr/search"],"selectors":[{"id":"Link","type":"SelectorLink","parentSelectors":["click-more"],"selector":"a","multiple":true,"delay":0},{"id":"about","type":"SelectorElement","parentSelectors":["Link"],"selector":"body:has(.startup__presentation)","multiple":true,"delay":0},{"id":"Address","type":"SelectorText","parentSelectors":["about"],"selector":".startup__identity i + span","multiple":false,"regex":"","delay":0},{"id":"Tech","type":"SelectorGroup","parentSelectors":["about"],"selector":"h4:contains(\"Technologies\") + div div","delay":0,"extractAttribute":""},{"id":"Domain","type":"SelectorGroup","parentSelectors":["about"],"selector":"h4:contains(\"Métiers\") + div div","delay":0,"extractAttribute":""},{"id":"Market","type":"SelectorGroup","parentSelectors":["about"],"selector":"h4:contains(\"Marchés\") + div div","delay":0,"extractAttribute":""},{"id":"BusinessModel","type":"SelectorGroup","parentSelectors":["about"],"selector":"h4:contains(\"Business Model\") + div div","delay":0,"extractAttribute":""},{"id":"Presence","type":"SelectorGroup","parentSelectors":["about"],"selector":"h4:contains(\"Présence\") + div div","delay":0,"extractAttribute":""},{"id":"Info","type":"SelectorText","parentSelectors":["about"],"selector":".startup__products div:contains(\"Réduire\") > div > div > div + div","multiple":false,"regex":"","delay":0},{"id":"click-more","type":"SelectorElementClick","parentSelectors":["_root"],"selector":".sc-dVhcbM > div > div","multiple":true,"delay":"2000","clickElementSelector":"button[type=\"button\"]:contains(\"Charger plus\")","clickType":"clickMore","discardInitialElements":"discard","clickElementUniquenessType":"uniqueCSSSelector"}]}

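The class names (sc-jbKcbu, sc-eXEjpC and so on) are generated, so they are not the same across all of the listings, which is why no data was scraped. The sitemap above avoids most of them by selecting on heading text with :contains() and on the more stable startup__ classes instead.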
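
If you want to check a sitemap for this kind of selector before running it, here is a rough sketch in Python. It is not part of Web Scraper; the function name find_generated_classes and the "sc-" pattern are my own assumptions, based only on how the class names in the sitemaps in this thread look, so treat it as a heuristic.

```python
import json
import re

# Heuristic (an assumption): generated classes in this thread look like ".sc-" plus letters.
GENERATED_CLASS = re.compile(r"\.sc-[A-Za-z0-9]+")


def find_generated_classes(sitemap_json):
    """Return {selector id: [generated-looking classes]} for an exported sitemap."""
    sitemap = json.loads(sitemap_json)
    flagged = {}
    for sel in sitemap.get("selectors", []):
        # Look at both the element selector and the click selector, if present.
        parts = [sel.get("selector"), sel.get("clickElementSelector")]
        css = " ".join(p for p in parts if p)
        hits = sorted(set(GENERATED_CLASS.findall(css)))
        if hits:
            flagged[sel["id"]] = hits
    return flagged


if __name__ == "__main__":
    # A cut-down sitemap using selectors taken from this thread.
    example = json.dumps({
        "_id": "lehub",
        "selectors": [
            {"id": "Link", "selector": "a.sc-jbKcbu"},
            {"id": "Tech", "selector": 'h4:contains("Technologies") + div div'},
        ],
    })
    for selector_id, classes in find_generated_classes(example).items():
        print(selector_id, classes)
```

Any selector it flags is a candidate for rewriting around visible text or a stable class. Note that it would also flag click-more in the sitemap above, which still relies on .sc-dVhcbM for the list container.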
