Facebook Pages / Places

Nathaaan · July 13, 2021, 6:19am

Hi all,

I have been trying to scrape data from Facebook pages, but I can't get it to work. I want to retrieve the name, address, email address, website and number of followers.

When I select the fields by their position (index) on the page, the values get mixed up. For example, I get the page category where the email address should be. I think this happens because some pages have not filled in all of this information, so the positions shift.
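To illustrate the position problem described above, here is a minimal Python sketch that assigns each scraped string to a field by its shape rather than by its index. It is a post-processing idea only, not part of any sitemap in this thread; the function name `classify_fields` and the three simplified patterns are hypothetical and would need tuning against real data.

```python
import re

# Hypothetical, simplified patterns (assumptions, not taken from a sitemap).
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")
URL_RE = re.compile(r"^(?:https?://)?(?:www\.)?[\w-]+(?:\.[\w-]+)+(?:/\S*)?$")


def classify_fields(values):
    """Assign each scraped string to a field by its shape, not its position."""
    record = {"email": None, "phone": None, "website": None, "other": []}
    for value in values:
        text = value.strip()
        if EMAIL_RE.match(text):
            key = "email"
        elif PHONE_RE.match(text):
            key = "phone"
        elif URL_RE.match(text):
            key = "website"
        else:
            key = None
        if key is not None and record[key] is None:
            record[key] = text
        else:
            # Unrecognised or duplicate values are kept, never forced into a field.
            record["other"].append(text)
    return record


# A page that lists no email: the category stays out of the email column.
row = classify_fields(["Bar", "https://example.com", "+33 1 23 45 67 89"])
```

With this approach a missing email simply stays empty instead of being filled with whatever value happens to sit in that position.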

I also tried using the Element click selector to open the "About" section, but that doesn't work either.

If anyone has managed to do this and could share their sitemap with me, that would be really cool.

Thanks a lot!

Start URL : https://www.facebook.com/search/places?q=bar&filters=eyJlbmFibGVfcGxhY2VfbG9jYXRpb25faWRzOjAiOiJ7XCJuYW1lXCI6XCJwbGFjZV9sb2NhdGlvblwiLFwiYXJnc1wiOlwiMTA5OTUwNjE5MDI3NzQzXCJ9In0%3D

Sitemap:
{id:"sitemap code"}

world33 · October 3, 2022, 12:08am

I am looking for the same solution. So far I have managed to create the sitemap below. It is not perfect, because it sometimes attaches stray digits to the email and web addresses, but it is better than nothing. If anyone can improve it, I would appreciate it.

{"_id":"facebookpagestemplate","startUrl":["https://www.facebook.com/nsuase.sibstrin","https://www.facebook.com/nsuem.rus","https://www.facebook.com/NSUFlorida","https://www.facebook.com/nsuheadoffice","https://www.facebook.com/nsuniv","https://www.facebook.com/nsuniversity.official","https://www.facebook.com/NSURiverHawks","https://www.facebook.com/nta.isny","https://www.facebook.com/NTCManila","https://www.facebook.com/nthu.tw","https://www.facebook.com/ntnu.no","https://www.facebook.com/NTNU.Taiwan","https://www.facebook.com/NTPU1949","https://www.facebook.com/ntu.edu.iq","https://www.facebook.com/ntu.xpi","https://www.facebook.com/ntuanews1017","https://www.facebook.com/NTUB1917"],"selectors":[{"id":"pagetitle","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span.ircgss63,span.kcqno65y","type":"SelectorText"},{"id":"likes","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"a.rtxb060y.innypi6y:nth-of-type(1),div.jcxyg2ei:nth-of-type(4) span.b6ax4al1","type":"SelectorText"},{"id":"followers","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"a.rse6dlih:nth-of-type(2),div:nth-of-type(5) .t7p7dqev span.k1z55t6l.pbevjfx6","type":"SelectorText"},{"id":"address","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.f36a8esv","type":"SelectorText"},{"id":"phone","multiple":false,"parentSelectors":["_root"],"regex":"\\(?\\+[0-9]{1,3}\\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{4}( ?-?[0-9]{3})? ?(\\w{1,10}\\s?\\d{1,6})?","selector":"div.svm27lag","type":"SelectorText"},{"id":"email","multiple":false,"parentSelectors":["_root"],"regex":"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])","selector":"div.svm27lag","type":"SelectorText"},{"id":"website","multiple":false,"parentSelectors":["_root"],"regex":"https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{1,256}\\.[a-zA-Z0-9()]{1,6}\\b([-a-zA-Z0-9()@:%_\\+.~#?&//=]*)","selector":"div.svm27lag","type":"SelectorText"},{"id":"pagelogourl","multiple":false,"parentSelectors":["_root"],"selector":".lcfup58g a.o9erhkwx","type":"SelectorLink"},{"id":"pagecoverphotourl","multiple":false,"parentSelectors":["_root"],"selector":".nuz1ool1.lq84ybu9 img.bdao358l","type":"SelectorImage"},{"id":"about","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".km253p1d > div.p8bdhjjv:contains(\"About\"),div[data-pagelet='ProfileTilesFeed_0']:contains(\"Intro\")","type":"SelectorText"}]}
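
A possible explanation for the stray digits, offered as an assumption rather than something confirmed in this thread: the phone, email and website selectors all read the same `div.svm27lag` block, and if its text comes back without separators, a phone number sitting directly next to an address can be swallowed by the email or website regex. The Python sketch below shows one way to clean this up after export: cut the phone number out first, then match the rest. The patterns are simplified stand-ins, and `split_contact_text`, the sample text and the phone number are made up for illustration.

```python
import re

# Simplified stand-in patterns (assumptions, not the regexes from the sitemap).
PHONE_RE = re.compile(r"\(?\+?\d[\d\s().-]{6,}\d")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
URL_RE = re.compile(r"https?://\S+")


def split_contact_text(text):
    """Pull phone, email and website out of one run-together text block."""
    phone = PHONE_RE.search(text)
    rest = text
    if phone:
        # Replace the phone number with a space so it cannot fuse with
        # whatever comes directly before or after it.
        rest = text[:phone.start()] + " " + text[phone.end():]
    email = EMAIL_RE.search(rest)
    url = URL_RE.search(rest)
    return {
        "phone": phone.group() if phone else None,
        "email": email.group() if email else None,
        "website": url.group() if url else None,
    }


# Made-up sample where the three values run together without separators.
sample = "https://www.example.edu+1 555-010-0123info@example.edu"
contact = split_contact_text(sample)
```

This only helps when the phone number is what separates the other values, and it would also trim an email address whose local part legitimately starts with a long run of digits, so treat it as a starting point.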