Help with Scraping Only Data That Has Links

Hi guys,

Can someone help me get just the data that has a link?

Like in the screenshot.

Here is the page link: BrokerSnapshot - MSP STAR TRANSPORTATION LLC

Here is the sitemap I use. It scrapes by position. How can I edit it so that it gathers only the data that has links, i.e. the entries marked in blue in the screenshots?

Thanks.

Sitemap:

```
{"_id":"Brokersnapshot3","startUrl":["https://brokersnapshot.com/SearchCompanies/Advanced?new=true&new-date=2024-08-01&limit=100","https://brokersnapshot.com/SearchCompanies/Advanced?new=true&new-date=2024-08-01&limit=100&page=2","https://brokersnapshot.com/SearchCompanies/Advanced?new=true&new-date=2024-08-01&limit=100&page=3","https://brokersnapshot.com/SearchCompanies/Advanced?new=true&new-date=2024-08-01&limit=100&page=4"],"selectors":[{"id":"company","parentSelectors":["pagination"],"type":"SelectorLink","selector":"td:nth-of-type(5) div:nth-of-type(1) a","multiple":true,"linkType":"linkFromHref"},{"id":"email","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(1) td:has([title=\"Email\"]) a","multiple":true,"regex":""},{"id":"email2","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(2) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"email3","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(3) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"email4","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(4) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"cellphone","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(1) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone2","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(2) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone3","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(3) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone4","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(4) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"phone1","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(1) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone2","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(2) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone3","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(3) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone4","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(4) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"Contact Name1","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(1) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"Contactname2","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(2) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"contactname3","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(3) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"contactname4","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(4) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"email5","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(5) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"email6","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(6) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"email7","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(7) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"email8","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(8) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"email9","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(9) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"email10","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):nth-of-type(10) td:has([title=\"Email\"]) a","multiple":false,"regex":""},{"id":"cellphone5","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(5) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone6","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(6) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone7","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(7) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone8","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(8) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone9","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(9) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"cellphone10","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Cell Phone\"]):nth-of-type(10) td:has([title=\"Cell Phone\"]) a","multiple":false,"regex":""},{"id":"phone5","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(5) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone6","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(6) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone7","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(7) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone8","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(8) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone9","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(9) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"phone10","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):nth-of-type(10) td:has([title=\"Phone\"]) a","multiple":false,"regex":""},{"id":"contactname5","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(5) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"contactname6","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(6) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"contactname7","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(7) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"contactname8","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(8) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"contactname9","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(9) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"contactname10","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(10) td:has([title=\"Contact Name\"]) a","multiple":false,"regex":""},{"id":"pagination","parentSelectors":["_root","pagination"],"paginationType":"auto","type":"SelectorPagination","selector":".paginator a:contains(\"»\")"},{"id":"Operating Status","parentSelectors":["company"],"type":"SelectorText","selector":"#OperatingStatusHtml span","multiple":false,"regex":""},{"id":"Total Number of Trucks","parentSelectors":["company"],"type":"SelectorText","selector":"span#TOT_TRUCKS","multiple":false,"regex":""},{"id":"DBA Name","parentSelectors":["company"],"type":"SelectorText","selector":".five div:nth-of-type(6) .ten span","multiple":false,"regex":""},{"id":"City","parentSelectors":["company"],"type":"SelectorText","selector":".large div.sub","multiple":false,"regex":""},{"id":"contactname11","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(11) td:has([title=\"Contact Name\"]) a\t","multiple":false,"regex":""},{"id":"caontact name2","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):nth-of-type(12) td:has([title=\"Contact Name\"]) a\t","multiple":false,"regex":""},{"id":"Added","parentSelectors":["company"],"type":"SelectorText","selector":"div:nth-of-type(10) .eleven span","multiple":false,"regex":""},{"id":"Changed","parentSelectors":["company"],"type":"SelectorText","selector":"div:nth-of-type(11) .eleven span","multiple":false,"regex":""}]}
```

Hi, have you tried:

tr:has([href])

Hi, thanks for the reply.

What do I need to change here: tr:has([title="Email"]):nth-of-type(1) td:has([title="Email"]) a

Also, is it possible to make it not depend on position, so that it scrapes an Email only if it has a link, and the same for phone number and name?

Thanks.

You can try this:

tr:has([title="Email"]):has([href]) a

Sorry, I cannot test if it works since the website requires a login.
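
For reference, a sketch of how that suggestion might look in the sitemap, with one entry per field replacing the position-based `email`, `email2`, ... entries. The selector reads: inside any `tr` that contains an element with `title="Email"` and also contains an element with an `href` attribute, take the `a` elements. The ids are placeholders, and the Phone and Contact Name entries simply assume those rows follow the same pattern as Email; none of this is tested against the live page.

```json
[
  {"id":"email","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):has([href]) a","multiple":false,"regex":""},
  {"id":"phone","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Phone\"]):has([href]) a","multiple":false,"regex":""},
  {"id":"contactname","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Contact Name\"]):has([href]) a","multiple":false,"regex":""}
]
```

Because `[title="Phone"]` is an exact attribute match, it does not also match `title="Cell Phone"`. If a matching row has links in other cells too, keeping the original `td:has([title="Email"])` step before `a` may be needed to narrow the result.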

This is working like a charm.

Thank you!!

One more quick question: how can I modify it to gather both values when there are, say, two emails or names with links?

Is it possible?

Thanks once again.

This should work with the Multiple option checked, or by using the Grouped selector.
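
As a sketch of the first option, this is an Email entry with `"multiple": true`, which should correspond to ticking Multiple in the selector editor. The id is a placeholder and the entry is untested against the live page.

```json
{"id":"email","parentSelectors":["company"],"type":"SelectorText","selector":"tr:has([title=\"Email\"]):has([href]) a","multiple":true,"regex":""}
```

For the second option, the entry would use `"type":"SelectorGroup"` instead. As far as I understand, Multiple normally writes each match as its own row, whereas the Grouped selector collects all matches for a company into a single record, so the Grouped selector is likely the better fit when several fields can each have more than one value.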

Thanks for the help, appreciate it.

Hi,

Sorry to bother you with one more question: how can I add one more selector so that it clicks on the Inspections tab on the page, like in the screenshot?

Here is one link for example.

Also my updated sitemap:

```
{"_id":"Brokersnapshot33","startUrl":["https://brokersnapshot.com/SearchCompanies/Advanced?cargo-transported=34&min-units=2&max-units=3&new=true&new-date=2024-04-01","https://brokersnapshot.com/SearchCompanies/Advanced?cargo-transported=34&min-units=2&max-units=3&new=true&new-date=2024-04-01&page=2","https://brokersnapshot.com/SearchCompanies/Advanced?cargo-transported=34&min-units=2&max-units=3&new=true&new-date=2024-04-01&page=3"],"selectors":[{"id":"company","linkType":"linkFromHref","multiple":true,"parentSelectors":["pagination"],"selector":"td:nth-of-type(5) div:nth-of-type(1) a","type":"SelectorLink"},{"id":"pagination","paginationType":"auto","parentSelectors":["_root","pagination"],"selector":".paginator a:contains(\"»\")","type":"SelectorPagination"},{"extractAttribute":"","id":"Contact Name","parentSelectors":["company"],"selector":"tr:has([title=\"Contact Name\"]):has([href]) a","type":"SelectorGroup"},{"extractAttribute":"","id":"email","parentSelectors":["company"],"selector":"tr:has([title=\"Email\"]):has([href]) a","type":"SelectorGroup"},{"extractAttribute":"","id":"Cell Phone","parentSelectors":["company"],"selector":"tr:has([title=\"Cell Phone\"]):has([href]) a","type":"SelectorGroup"},{"extractAttribute":"","id":"Phone Number","parentSelectors":["company"],"selector":"tr:has([title=\"Phone\"]):has([href]) a","type":"SelectorGroup"},{"id":"Operating Status","multiple":false,"parentSelectors":["company"],"regex":"","selector":"#OperatingStatusHtml span","type":"SelectorText"},{"id":"City","multiple":false,"parentSelectors":["company"],"regex":"","selector":".large div.sub","type":"SelectorText"},{"id":"DBA Name","multiple":false,"parentSelectors":["company"],"regex":"","selector":".five div:nth-of-type(6) .ten span","type":"SelectorText"},{"id":"Added","multiple":false,"parentSelectors":["company"],"regex":"","selector":"div:nth-of-type(10) .eleven span","type":"SelectorText"},{"id":"Changed","multiple":false,"parentSelectors":["company"],"regex":"","selector":"div:nth-of-type(11) .eleven span","type":"SelectorText"},{"id":"Truck","multiple":false,"parentSelectors":["company"],"regex":"","selector":"#details-row > div.six.wide.computer.sixteen.wide.column > div > div > div:nth-child(2) > div:nth-child(2)","type":"SelectorText"},{"id":"Tractor","multiple":false,"parentSelectors":["company"],"regex":"","selector":"#details-row > div.six.wide.computer.sixteen.wide.column > div > div > div:nth-child(3) > div:nth-child(2) > span","type":"SelectorText"},{"id":"Trailers","multiple":false,"parentSelectors":["company"],"regex":"","selector":"#details-row > div.six.wide.computer.sixteen.wide.column > div > div > div:nth-child(4) > div:nth-child(2) > span","type":"SelectorText"}]}
```

Hi, a simple Link selector with the value a[href*="/Inspections"] should do.

Hi,

Sorry for my late reply, but it won't click the Inspections tab. I need it to click so I can gather some data from there as well.

Clicking the Inspections tab opens a new, unique URL, so it is recommended to use the Link selector instead of a click.
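
A minimal sketch of that Link selector setup, assuming the Inspections tab is an ordinary `a` element on the company page. The `inspections` entry follows the link, and any selector that should read data from the Inspections page then lists `inspections` as its parent instead of `company`. The ids and the `table tr` selector in the child entry are hypothetical placeholders, since the structure of the Inspections page is not shown in this thread.

```json
[
  {"id":"inspections","parentSelectors":["company"],"type":"SelectorLink","selector":"a[href*=\"/Inspections\"]","multiple":false,"linkType":"linkFromHref"},
  {"id":"inspection_rows_placeholder","parentSelectors":["inspections"],"type":"SelectorText","selector":"table tr","multiple":true,"regex":""}
]
```

These two entries would be added to the `selectors` array of the existing sitemap. The child selector needs to be replaced with one picked on the actual Inspections page.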