Not Scraping Pages Beyond 1st on Google Maps Businesses

Hi! Based on @leemeng's reply in the thread "PAGINATION fail for Google Maps Business DETAILS?", I created the sitemap below with an extended delay and a selector that works well for the next page. The scraper goes through all the pages as it should, but it doesn't record any business name or website on any page except the first one. Also, less importantly, since I can fix it manually: all the names are recorded first and then all the sites. It would be nice to have each business's name and site grouped on the same row. Thank you in advance!

{"_id":"WORKING_1_PAGE","startUrl":["translation companies montreal - Google Search]];tbs:lrf:!1m4!1u3!2m2!3m1!1e1!1m4!1u2!2m2!2m1!1e1!2m1!1e2!2m1!1e3!3sIAE,lf:1,lf_ui:14"],"selectors":[{"id":"name","parentSelectors":["_root","next-page"],"type":"SelectorText","selector":"div.dbg0pd","multiple":true,"regex":""},{"id":"website","parentSelectors":["_root","next-page"],"type":"SelectorLink","selector":"a.L48Cpd","multiple":true,"linkType":"linkFromHref"},{"id":"next-page","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"#pnnext span","clickElementUniquenessType":"uniqueText","clickType":"clickMore","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"#pnnext span"}]}

Hi, the selectors were a bit messed up. See the sitemap below for reference:

{"_id":"WORKING_1_PAGE","startUrl":["https://www.google.com/search?q=translation+companies+montreal&sca_esv=d25cc1ac68398341&biw=1440&bih=426&tbm=lcl&sxsrf=ACQVn09SK04rk9fLLjLtz3iSB85t0WrKwQ%3A1708190893223&ei=rezQZbOKDceo5NoPwPu5sAs&ved=0ahUKEwjzlJj78rKEAxVHFFkFHcB9DrYQ4dUDCAk&uact=5&oq=translation+companies+montreal&gs_lp=Eg1nd3Mtd2l6LWxvY2FsIh50cmFuc2xhdGlvbiBjb21wYW5pZXMgbW9udHJlYWwyBRAAGIAEMgYQABgWGB4yBhAAGBYYHjILEAAYgAQYigUYhgMyCxAAGIAEGIoFGIYDMgsQABiABBiKBRiGA0jxYVDbM1jiYHAEeACQAQCYAaABoAHdE6oBBDI3LjS4AQPIAQD4AQHCAgQQIxgnwgIKEAAYgAQYigUYQ8ICCxAAGIAEGIoFGJECwgIREAAYgAQYigUYkQIYsQMYgwHCAg4QABiABBiKBRiRAhjJA8ICEBAAGIAEGIoFGEMYsQMYgwHCAgsQABiABBiKBRiSA8ICEBAAGIAEGBQYhwIYsQMYgwHCAhQQABiABBiKBRiRAhixAxiDARjJA8ICCxAAGIAEGLEDGIMBwgINEAAYgAQYFBiHAhixA8ICCBAAGIAEGLEDwgIIEAAYgAQYyQPCAggQABgWGB4YD4gGAQ&sclient=gws-wiz-local#rlfi=hd:;si:;mv:%5B%5B45.533272499999995,-73.5381242%5D,%5B45.459381699999994,-73.69291989999999"],"selectors":[{"id":"next-page","paginationType":"auto","parentSelectors":["_root","next-page"],"selector":"a#pnnext","type":"SelectorPagination"},{"id":"name","multiple":false,"parentSelectors":["listing"],"regex":"","selector":"div.dbg0pd","type":"SelectorText"},{"id":"website","linkType":"linkFromHref","multiple":false,"parentSelectors":["listing"],"selector":"a.L48Cpd","type":"SelectorLink"},{"id":"listing","multiple":true,"parentSelectors":["next-page"],"selector":"[id*=\"tsuid_\"]:not(:contains(\"Sponsored\"))","type":"SelectorElement"}]}

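To display data points in the same row, you have to arrange them as child selectors of a wrapper selector with the 'multiple' option checked (in the sitemap above, that wrapper is the "listing" Element selector, and "name" and "website" are its children with 'multiple' unchecked).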
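If it helps, here is a rough sketch of that rule as a small Python check you can run on an exported sitemap. It is not part of Web Scraper; the function name `ungrouped_selectors` and the heuristic are my own assumptions, and the two sitemaps are trimmed copies of the ones in this thread (start URLs and CSS selectors left out). It flags data selectors that are either not nested under a 'multiple' wrapper or are marked 'multiple' themselves, which is what tends to split names and websites into separate rows.

```python
# Selector types that act as row wrappers (assumption: based on the types used in this thread).
WRAPPER_TYPES = {"SelectorElement", "SelectorElementClick"}
# Selector types that only navigate between pages and emit no data columns.
NAVIGATION_TYPES = {"SelectorPagination"}


def ungrouped_selectors(sitemap):
    """Return ids of data selectors that will probably not share a row.

    Hypothetical helper: a data selector is considered grouped when at least
    one parent is a wrapper with multiple=True and the selector itself has
    multiple=False.
    """
    by_id = {sel["id"]: sel for sel in sitemap["selectors"]}
    loose = []
    for sel in sitemap["selectors"]:
        if sel["type"] in WRAPPER_TYPES or sel["type"] in NAVIGATION_TYPES:
            continue  # wrappers and pagination are structure, not data
        # "_root" is not a real selector, so it is skipped here.
        parents = [by_id[p] for p in sel["parentSelectors"] if p in by_id]
        wrapped = any(
            p["type"] in WRAPPER_TYPES and p.get("multiple") for p in parents
        )
        if not wrapped or sel.get("multiple"):
            loose.append(sel["id"])
    return loose


# Trimmed copy of the first sitemap in this thread.
original = {"selectors": [
    {"id": "name", "parentSelectors": ["_root", "next-page"],
     "type": "SelectorText", "multiple": True},
    {"id": "website", "parentSelectors": ["_root", "next-page"],
     "type": "SelectorLink", "multiple": True},
    {"id": "next-page", "parentSelectors": ["_root"],
     "type": "SelectorElementClick", "multiple": True},
]}

# Trimmed copy of the corrected sitemap with the "listing" wrapper.
fixed = {"selectors": [
    {"id": "next-page", "parentSelectors": ["_root", "next-page"],
     "type": "SelectorPagination"},
    {"id": "name", "parentSelectors": ["listing"],
     "type": "SelectorText", "multiple": False},
    {"id": "website", "parentSelectors": ["listing"],
     "type": "SelectorLink", "multiple": False},
    {"id": "listing", "parentSelectors": ["next-page"],
     "type": "SelectorElement", "multiple": True},
]}

print(ungrouped_selectors(original))  # ['name', 'website']
print(ungrouped_selectors(fixed))     # []
```

To check your own sitemap, load the exported JSON with `json.loads` and pass the resulting dict to the function. An empty list only means the nesting looks right; it says nothing about whether the CSS selectors still match the page.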


Your sitemap worked for a few weeks, but it's no longer working! It seems the layout changes after the first page: the listings show pictures, and you have to click on each one to open a side panel with the website link. How could the sitemap be adjusted, or how can I get the first-page Google My Business layout, with the website links directly visible and scrapable, on the following pages? The layout only switches when I run a scrape. When I just browse myself it doesn't change, as if they know I am scraping.

Thank you very much in advance!

Normal, scrapable layout I see when browsing manually:

New layout that appears on the second page when I start the scrape:

Hi,

Please check if this setup works for you:

{"_id":"WORKING_1_PAGE1","startUrl":["https://www.google.com/search?q=optometrist+toronto&sca_esv=d25cc1ac68398341&biw=1968&bih=847&tbm=lcl&ei=K6kXZoi3F-CrwPAP-ragsAw&ved=0ahUKEwjI-MjI6LmFAxXgFRAIHXobCMYQ4dUDCAk&uact=5&oq=optometrist+toronto&gs_lp=Eg1nd3Mtd2l6LWxvY2FsIhNvcHRvbWV0cmlzdCB0b3JvbnRvMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgYQABgWGB5IoSxQAFjxJnAAeACQAQCYAeYDoAGoIKoBCjAuMTEuNC4yLjK4AQPIAQD4AQGYAhOgAt4gwgILEAAYgAQYigUYkQLCAgoQABiABBiKBRhDwgIHEAAYgAQYCpgDAJIHCjAuMTEuNC4yLjKgB5Zm&sclient=gws-wiz-local#rlfi=hd:;si:;mv:[[43.787736699999996,-79.30042440000001],[43.629986699999996,-79.4244966]];tbs:lrf:!1m4!1u3!2m2!3m1!1e1!1m4!1u2!2m2!2m1!1e1!2m1!1e2!2m1!1e3!3sIAE,lf:1,lf_ui:10"],"selectors":[{"id":"next-page","paginationType":"auto","parentSelectors":["_root","next-page"],"selector":"a#pnnext","type":"SelectorPagination"},{"id":"name","multiple":false,"parentSelectors":["listing"],"regex":"","selector":"div.dbg0pd","type":"SelectorText"},{"id":"website","linkType":"linkFromHref","multiple":false,"parentSelectors":["listing"],"selector":"a:contains(\"Website\")","type":"SelectorLink"},{"id":"listing","multiple":true,"parentSelectors":["next-page"],"selector":"[id*=\"tsuid_\"]:not(:contains(\"Sponsored\"))","type":"SelectorElement"}]}

It didn't work. I'm still only getting the names of the companies scraped because the layout is the one with the pictures...

The layout issue seems to be location-specific. If you have a VPN, try opening the site with, let's say, a German proxy.

Do you have the correct layout? I tried logging out and using a VPN with many different locations, and it never showed the original, scrapable layout. It now goes directly to the new layout, so even the first page is no longer scrapable. Is there a way to adjust the sitemap to work with this layout? I appreciate your help very much, @JanAp!

Hi,

You can try this setup:

{"_id":"WORKING_1_PAGE2","startUrl":["https://www.google.com/search?q=optometrist+toronto&sca_esv=d25cc1ac68398341&biw=1968&bih=847&tbm=lcl&ei=K6kXZoi3F-CrwPAP-ragsAw&ved=0ahUKEwjI-MjI6LmFAxXgFRAIHXobCMYQ4dUDCAk&uact=5&oq=optometrist+toronto&gs_lp=Eg1nd3Mtd2l6LWxvY2FsIhNvcHRvbWV0cmlzdCB0b3JvbnRvMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgYQABgWGB5IoSxQAFjxJnAAeACQAQCYAeYDoAGoIKoBCjAuMTEuNC4yLjK4AQPIAQD4AQGYAhOgAt4gwgILEAAYgAQYigUYkQLCAgoQABiABBiKBRhDwgIHEAAYgAQYCpgDAJIHCjAuMTEuNC4yLjKgB5Zm&sclient=gws-wiz-local#rlfi=hd:;si:;mv:[[43.787736699999996,-79.30042440000001],[43.629986699999996,-79.4244966]];tbs:lrf:!1m4!1u3!2m2!3m1!1e1!1m4!1u2!2m2!2m1!1e1!2m1!1e2!2m1!1e3!3sIAE,lf:1,lf_ui:10"],"selectors":[{"id":"next-page","paginationType":"auto","parentSelectors":["_root","next-page"],"selector":"a#pnnext","type":"SelectorPagination"},{"id":"name","multiple":false,"parentSelectors":["listing"],"regex":"","selector":"[data-attrid=\"title\"]","type":"SelectorText"},{"extractAttribute":"href","id":"website","multiple":false,"parentSelectors":["listing"],"selector":"a:contains(\"Website\")","type":"SelectorElementAttribute"},{"clickActionType":"real","clickElementSelector":"#search [id*=\"tsuid_\"]:not(:contains(\"Sponsored\")) [role=\"heading\"]","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":3000,"discardInitialElements":"discard-when-click-element-exists","id":"listing","multiple":true,"parentSelectors":["next-page"],"selector":"body","type":"SelectorElementClick"}]}

Thank you so much, this works for now!