Facebook pages scraping

Hello. Since Facebook switched to the New Page Experience and removed the ability to retrieve some public data through its updated API, the only way to get that information is to scrape it. (In the past I used a nice little free tool called Facepager to retrieve this data through the API.)
I have now managed to create the Facebook page scraping template below. It retrieves the page title, location, some individual About/Intro details (phone, email, website), the page logo, the page cover photo and the entire About/Intro section. For the individual About/Intro details (phone, email, website) I had to use regexes I found online, because Facebook does not use unique identifiers within that section other than the URLs of the gray icon images:
phone
email
website

Unfortunately the regexes are not perfect: in some records they add part of the phone number to the email (e.g. 6634press.ntu.kpi@gmail.comkpi.kharkov.ua) or to the website address (e.g. http://www.ntpu.edu.tw/+886). Moreover, the website regex only extracts addresses that begin with http/https, leaving out those that start with www.
My questions are:

  1. Is there any way to use the unique filenames of the gray icon images next to each data type (phone, email, website etc.) in the About/Intro section to retrieve that information properly, without relying on imperfect regexes?
  2. If that is not possible, how would you modify the regexes so that parts of the phone number are not added to the emails or websites, and so that websites without http/https are included?
  3. Is there any way to add paragraph or manual line breaks in the last selector (about) to separate the different types of information? At the moment it scrapes all the content as one string, due to the lack of unique identifiers and line breaks, and it appears like this:
    Intro Офіційна сторінка Національного технічного університPage · University2, Kyrpychova Str., Kharkiv, Ukraine+380 57 707 6634press.ntu.kpi@gmail.comkpi.kharkov.uaRating · 4.8 (37 reviews)Suggest Edits
    I would like it to appear like this instead:
    Intro
    Офіційна сторінка Національного технічного університ
    Page · University
    2, Kyrpychova Str., Kharkiv, Ukraine
    +380 57 707 6634
    press.ntu.kpi@gmail.com
    kpi.kharkov.ua
    Rating · 4.8 (37 reviews)
    Suggest Edits
{"_id":"facebookpagestemplate","startUrl":["https://www.facebook.com/nsuase.sibstrin","https://www.facebook.com/nsuem.rus","https://www.facebook.com/NSUFlorida","https://www.facebook.com/nsuheadoffice","https://www.facebook.com/nsuniv","https://www.facebook.com/nsuniversity.official","https://www.facebook.com/NSURiverHawks","https://www.facebook.com/nta.isny","https://www.facebook.com/NTCManila","https://www.facebook.com/nthu.tw","https://www.facebook.com/ntnu.no","https://www.facebook.com/NTNU.Taiwan","https://www.facebook.com/NTPU1949","https://www.facebook.com/ntu.edu.iq","https://www.facebook.com/ntu.xpi","https://www.facebook.com/ntuanews1017","https://www.facebook.com/NTUB1917"],"selectors":[{"id":"pagetitle","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span.ircgss63,span.kcqno65y","type":"SelectorText"},{"id":"likes","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"a.rtxb060y.innypi6y:nth-of-type(1),div.jcxyg2ei:nth-of-type(4) span.b6ax4al1","type":"SelectorText"},{"id":"followers","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"a.rse6dlih:nth-of-type(2),div:nth-of-type(5) .t7p7dqev span.k1z55t6l.pbevjfx6","type":"SelectorText"},{"id":"address","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.f36a8esv","type":"SelectorText"},{"id":"phone","multiple":false,"parentSelectors":["_root"],"regex":"\\(?\\+[0-9]{1,3}\\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{4}( ?-?[0-9]{3})? ?(\\w{1,10}\\s?\\d{1,6})?","selector":"div.svm27lag","type":"SelectorText"},{"id":"email","multiple":false,"parentSelectors":["_root"],"regex":"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])","selector":"div.svm27lag","type":"SelectorText"},{"id":"website","multiple":false,"parentSelectors":["_root"],"regex":"https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{1,256}\\.[a-zA-Z0-9()]{1,6}\\b([-a-zA-Z0-9()@:%_\\+.~#?&//=]*)","selector":"div.svm27lag","type":"SelectorText"},{"id":"pagelogourl","multiple":false,"parentSelectors":["_root"],"selector":".lcfup58g a.o9erhkwx","type":"SelectorLink"},{"id":"pagecoverphotourl","multiple":false,"parentSelectors":["_root"],"selector":".nuz1ool1.lq84ybu9 img.bdao358l","type":"SelectorImage"},{"id":"about","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".km253p1d > div.p8bdhjjv:contains(\"About\"),div[data-pagelet='ProfileTilesFeed_0']:contains(\"Intro\")","type":"SelectorText"}]}

Thank you for any help or improvement you might suggest.

Interesting problem. Assuming they always use the same icon images (which seems to be the case), you could narrow down the correct div with the :has selector, no regex needed. So for the email, something like:
div > ul > div:has(img[src*='W4m-1QXtJyK.png'])
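
To make the idea behind that selector concrete, here is a rough Python sketch of the same logic outside the scraper: find the row that contains a given icon image and take the text of its span. The markup in SAMPLE is a hypothetical, heavily simplified stand-in for the real Intro section (the example.com URLs and the flat one-div-per-row layout are assumptions), and the class name IconRowParser is made up for illustration. The icon filenames are the ones used in this thread.

```python
from html.parser import HTMLParser

# Icon filenames as seen in this thread; Facebook may change them at any time.
ICONS = {
    "phone": "7KDVc3hw483.png",
    "website": "DzX7o-tOmJ6.png",
    "email": "W4m-1QXtJyK.png",
}

# Hypothetical, simplified markup: one <div> row per item inside a <ul>.
SAMPLE = """
<ul>
  <div><img src="https://example.com/img/7KDVc3hw483.png"><span>+380 57 707 6634</span></div>
  <div><img src="https://example.com/img/W4m-1QXtJyK.png"><span>press.ntu.kpi@gmail.com</span></div>
  <div><img src="https://example.com/img/DzX7o-tOmJ6.png"><span>kpi.kharkov.ua</span></div>
</ul>
"""


class IconRowParser(HTMLParser):
    """Collect the span text of each row, keyed by the icon found in that row."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.current = None  # field name of the row being read, if any
        self.in_span = False

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # A new row starts: forget the icon of the previous row.
            self.current = None
        elif tag == "img":
            src = dict(attrs).get("src") or ""
            for name, icon in ICONS.items():
                if icon in src:  # same idea as img[src*='...']
                    self.current = name
        elif tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Keep text only when it sits in a span of a row with a known icon.
        if self.in_span and self.current and data.strip():
            self.fields[self.current] = data.strip()


parser = IconRowParser()
parser.feed(SAMPLE)
print(parser.fields)
```

Because each value is read from its own row, the phone number, email and website never run together the way they do when one regex is applied to the text of the whole section.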

Example sitemap and results:

{"_id":"facebook-test-2023","startUrl":["https://www.facebook.com/NSUFlorida","https://www.facebook.com/nsuase.sibstrin","https://www.facebook.com/nta.isny","https://www.facebook.com/NTCManila"],"selectors":[{"id":"Name","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[role='main'] div > div span h1","type":"SelectorText"},{"id":"Phone","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div > div > div > div > div:nth-child(2) > div > ul > div:has(img[src*='7KDVc3hw483.png']) span","type":"SelectorText"},{"id":"Website","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div > div > div > div > div:nth-child(2) > div > ul > div:has(img[src*='DzX7o-tOmJ6.png']) span","type":"SelectorText"},{"id":"Email","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div > div > div > div > div:nth-child(2) > div > ul > div:has(img[src*='W4m-1QXtJyK.png']) span","type":"SelectorText"}]}
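
If you need more fields later, the three icon-based selectors above differ only in their id and icon filename, so they can be generated instead of edited by hand. The sketch below is only one way to do that: the helper names icon_selector and build_sitemap are made up, and the selector prefix and icon filenames are copied from the sitemap above on the assumption that Facebook keeps them unchanged.

```python
import json

# Path to the rows of the Intro list, copied from the sitemap above.
ROW_PREFIX = "div > div > div > div > div:nth-child(2) > div > ul > "

# Field id -> icon filename, as used in the sitemap above.
ICONS = {
    "Phone": "7KDVc3hw483.png",
    "Website": "DzX7o-tOmJ6.png",
    "Email": "W4m-1QXtJyK.png",
}


def icon_selector(field_id, icon_file):
    """Build one text selector that targets the row holding the given icon."""
    return {
        "id": field_id,
        "multiple": False,
        "parentSelectors": ["_root"],
        "regex": "",
        "selector": f"{ROW_PREFIX}div:has(img[src*='{icon_file}']) span",
        "type": "SelectorText",
    }


def build_sitemap(sitemap_id, start_urls):
    """Assemble a sitemap dict with one selector per known icon."""
    return {
        "_id": sitemap_id,
        "startUrl": list(start_urls),
        "selectors": [icon_selector(fid, icon) for fid, icon in ICONS.items()],
    }


sitemap = build_sitemap("facebook-test-2023", ["https://www.facebook.com/NSUFlorida"])
print(json.dumps(sitemap, separators=(",", ":")))
```

The printed JSON can be pasted into the sitemap import dialog; adding another field is then a matter of adding one entry to ICONS once you know its icon filename.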

Note: some of those pages lack an email address or other info, so empty fields for them are not caused by the sitemap.
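
On the regex question: the small Python check below suggests why tuning the patterns is unlikely to help as long as they run on the text of the whole section. The values are glued together without separators, and digits and dots are valid in both emails and host names, so a pattern has no way to tell where one value ends. The EMAIL pattern here is a deliberately simplified assumption, not the long one from the question, and WEBSITE is a sketch of a pattern that also accepts addresses without http/https once the value has been isolated.

```python
import re

# Simplified email pattern (an assumption, much shorter than the original one).
EMAIL = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}", re.IGNORECASE)

# Pattern for an isolated value: optional scheme, optional "www.",
# a dotted host name and an optional path.
WEBSITE = re.compile(
    r"^(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/\S*)?$",
    re.IGNORECASE,
)


def looks_like_website(value):
    """Return True if an isolated field value has the shape of a web address."""
    return WEBSITE.match(value.strip()) is not None


# Text of the section as the regex selectors see it: no separators at all.
merged = "+380 57 707 6634press.ntu.kpi@gmail.comkpi.kharkov.ua"
overmatch = EMAIL.search(merged).group(0)
print(overmatch)  # part of the phone number and the website are swallowed

# Once each value is selected on its own, a simple check is enough.
for value in ["kpi.kharkov.ua", "www.example.com", "press.ntu.kpi@gmail.com"]:
    print(value, looks_like_website(value))
```

So the more robust fix is to isolate each value first, as the icon-based selectors do, and to use a regex only as an optional sanity check on the isolated value.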