Help with sortlist.de

I am desperately trying to scrape sortlist.de.

What I would like to get: all agencies with all their details.
The parts I am stuck on:
The start page: https://www.sortlist.de/search?query=%5B%7B%22name%22%3A%22Off+Page+SEO%22%2C%22uuid%22%3A%22c33c3553-9d1f-4cfa-a76f-3decc407ec13%22%7D%2C%7B%22uuid%22%3A%2226ad9e00-8143-40b9-ad0f-cae7c28be38e%22%2C%22name%22%3A%22Linkbuilding%22%7D%2C%7B%22uuid%22%3A%2226ad9e00-8143-40b9-ad0f-cae7c28be38e%22%2C%22name%22%3A%22SEO+Linkbuilding%22%7D%5D

The agency description, for example here: https://www.sortlist.de/agency/driza

The details of the references in the Portfolio section (they open in popups, and the URL always stays the same): https://www.sortlist.de/agency/driza

Url: http://www.sortlist.de

Sitemap:
{"_id":"sortlist","startUrl":["https://www.sortlist.de/search?query=%5B%7B%22name%22%3A%22Off+Page+SEO%22%2C%22uuid%22%3A%22c33c3553-9d1f-4cfa-a76f-3decc407ec13%22%7D%2C%7B%22uuid%22%3A%2226ad9e00-8143-40b9-ad0f-cae7c28be38e%22%2C%22name%22%3A%22Linkbuilding%22%7D%2C%7B%22uuid%22%3A%2226ad9e00-8143-40b9-ad0f-cae7c28be38e%22%2C%22name%22%3A%22SEO+Linkbuilding%22%7D%5D"],"selectors":[{"id":"loadmore","parentSelectors":["_root","loadmore"],"paginationType":"clickMore","type":"SelectorPagination","selector":".btn-primary.btn-sm span.layout-align-start-center"},{"id":"agency","parentSelectors":["loadmore"],"type":"SelectorLink","selector":"a.m-8","multiple":true,"linkType":"linkFromHref"},{"id":"mehranzeigen","parentSelectors":["agency"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":".p-16 a.underline","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":".p-16 a.underline"}]}
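
A side note on the start URL: the search filters are a JSON array inside the `query` parameter, so any double quotes must be percent-encoded before the URL is placed in the sitemap's `startUrl` string, otherwise the sitemap is not valid JSON. A minimal Python sketch of building that URL, assuming the filter names and UUIDs from the search URL in this post; `build_start_url` is a made-up helper name, not part of Web Scraper.

```python
import json
from urllib.parse import quote_plus

# Filter entries copied from the search URL in the post (key order preserved).
filters = [
    {"name": "Off Page SEO", "uuid": "c33c3553-9d1f-4cfa-a76f-3decc407ec13"},
    {"uuid": "26ad9e00-8143-40b9-ad0f-cae7c28be38e", "name": "Linkbuilding"},
    {"uuid": "26ad9e00-8143-40b9-ad0f-cae7c28be38e", "name": "SEO Linkbuilding"},
]


def build_start_url(filters):
    """Hypothetical helper: serialize the filters compactly and percent-encode them."""
    raw = json.dumps(filters, separators=(",", ":"))
    # quote_plus turns spaces into '+' and quotes/brackets/braces into %XX escapes.
    return "https://www.sortlist.de/search?query=" + quote_plus(raw)


start_url = build_start_url(filters)

# Embedding the encoded URL in the sitemap keeps the sitemap itself valid JSON.
sitemap_fragment = json.dumps({"_id": "sortlist", "startUrl": [start_url]})
print(sitemap_fragment)
```

Because the encoded URL no longer contains literal double quotes, it can be pasted into `startUrl` without any further escaping.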

Hi,

it seems that opening the agency links via the listing view will not be viable, since the actual URL is generated only after clicking on the listing.

As an alternative, you can scrape all agencies using the sitemap.xml selector:

{"_id":"sortlist","startUrl":["https://www.sortlist.de/"],"selectors":[{"clickActionType":"real","clickElementSelector":".p-16 a.underline","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":0,"discardInitialElements":"do-not-discard","id":"mehranzeigen","multiple":false,"parentSelectors":["sitemap"],"selector":".p-16 a.underline","type":"SelectorElementClick"},{"id":"description","multiple":false,"parentSelectors":["sitemap"],"regex":"","selector":"span.display-block.lh-2","type":"SelectorText"},{"id":"sitemap","parentSelectors":["_root"],"sitemapXmlMinimumPriority":"0.1","sitemapXmlUrlRegex":"","sitemapXmlUrls":["https://www.sortlist.de/sitemaps/3/agencies.xml.gz"],"type":"SelectorSitemapXmlLink"}]}
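
For anyone who wants to check which agency URLs such a sitemap file yields before running the scraper, here is a minimal Python sketch. It assumes the file follows the standard sitemaps.org layout (`<urlset>` with `<url>`, `<loc>` and optional `<priority>`), which has not been verified for sortlist.de. It runs on an in-memory sample instead of downloading anything; `extract_urls` and the `example-low` URL are made up for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample standing in for the contents of agencies.xml.gz.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.sortlist.de/agency/driza</loc><priority>0.8</priority></url>
  <url><loc>https://www.sortlist.de/agency/example-low</loc><priority>0.0</priority></url>
</urlset>"""


def extract_urls(gz_bytes, min_priority=0.1):
    """Return all <loc> values whose priority is at least min_priority."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    urls = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        # Assumption: a missing <priority> counts as 0.5, the protocol default.
        priority = float(entry.findtext("sm:priority", default="0.5", namespaces=NS))
        if loc and priority >= min_priority:
            urls.append(loc)
    return urls


agency_urls = extract_urls(gzip.compress(SAMPLE_XML))
print(agency_urls)
```

The `min_priority` argument mirrors the `sitemapXmlMinimumPriority` value of 0.1 in the sitemap above; how Web Scraper itself treats entries without a priority is not covered here.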

Thanks! Looks great.
As I mentioned before, I am struggling to scrape the website link (“Visit website”) and the details of the references in the “Portfolio”. The problem is, on the one hand, that these are popups and, on the other hand, that I can't select the elements in them...
Sitemap:
{"_id":"sortlist2","startUrl":["https://www.sortlist.de/"],"selectors":[{"clickActionType":"real","clickElementSelector":".p-16 a.underline","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":0,"discardInitialElements":"do-not-discard","id":"mehranzeigen","multiple":false,"parentSelectors":["sitemap"],"selector":".p-16 a.underline","type":"SelectorElementClick"},{"id":"description","multiple":false,"parentSelectors":["sitemap"],"regex":"","selector":"span.display-block.lh-2","type":"SelectorText"},{"id":"sitemap","parentSelectors":["_root"],"sitemapXmlMinimumPriority":"0.1","sitemapXmlUrlRegex":"","sitemapXmlUrls":["https://www.sortlist.de/sitemaps/3/agencies.xml.gz"],"type":"SelectorSitemapXmlLink"},{"id":"services","multiple":true,"parentSelectors":["sitemap"],"regex":"","selector":"div.px-gt-xs-32","type":"SelectorText"},{"id":"languages","multiple":false,"parentSelectors":["sitemap"],"regex":"","selector":"b.text-capitalize","type":"SelectorText"},{"clickActionType":"real","clickElementSelector":"a.small.underline","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"references","multiple":true,"parentSelectors":["sitemap"],"selector":"a.small.underline","type":"SelectorElementClick"}]}

Please provide an example URL where the data is present.

This one: https://www.sortlist.de/agency/driza
The URL does not change when the popup appears...

I changed the selector of the 'references' Element Click to body; otherwise the child elements in the popup would be in a different scope.

I also added a pagination selector to click through the references.

It appears that the website can be scraped easily from the Kontakt section.

{"_id":"sortlist2","startUrl":["https://www.sortlist.de/"],"selectors":[{"clickActionType":"real","clickElementSelector":".p-16 a.underline","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":0,"discardInitialElements":"do-not-discard","id":"mehranzeigen","multiple":false,"parentSelectors":["sitemap"],"selector":".p-16 a.underline","type":"SelectorElementClick"},{"id":"description","multiple":false,"parentSelectors":["sitemap"],"regex":"","selector":"span.display-block.lh-2, [class=\"p-16 text-break-word\"]","type":"SelectorText"},{"id":"sitemap","parentSelectors":["_root"],"sitemapXmlMinimumPriority":"0.1","sitemapXmlUrlRegex":"","sitemapXmlUrls":["https://www.sortlist.de/sitemaps/3/agencies.xml.gz"],"type":"SelectorSitemapXmlLink"},{"id":"services","multiple":true,"parentSelectors":["sitemap"],"regex":"","selector":"div.px-gt-xs-32","type":"SelectorText"},{"id":"languages","multiple":false,"parentSelectors":["sitemap"],"regex":"","selector":"b.text-capitalize","type":"SelectorText"},{"clickActionType":"real","clickElementSelector":"div.p-8:nth-of-type(1) a.small.underline","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"discard-when-click-element-exists","id":"references","multiple":false,"parentSelectors":["sitemap"],"selector":"body","type":"SelectorElementClick"},{"id":"Beschreibung","multiple":false,"parentSelectors":["pagination"],"regex":"","selector":"div.pt-64","type":"SelectorText"},{"id":"pagination","paginationType":"clickMore","parentSelectors":["references","pagination"],"selector":"[id=\"next-work-btn\"]","type":"SelectorPagination"},{"id":"website","multiple":false,"parentSelectors":["sitemap"],"regex":"","selector":"a.text-truncate, span.btn.small","type":"SelectorText"}]}
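
To see how the selectors in this sitemap nest, the parent/child relations can be printed as a tree. A minimal Python sketch, using only the `id` and `parentSelectors` values from the sitemap above (all other fields are left out); the helper names `children_by_parent` and `render` are made up for illustration.

```python
# ids and parentSelectors copied from the sitemap above, in the same order.
selectors = [
    {"id": "mehranzeigen", "parentSelectors": ["sitemap"]},
    {"id": "description", "parentSelectors": ["sitemap"]},
    {"id": "sitemap", "parentSelectors": ["_root"]},
    {"id": "services", "parentSelectors": ["sitemap"]},
    {"id": "languages", "parentSelectors": ["sitemap"]},
    {"id": "references", "parentSelectors": ["sitemap"]},
    {"id": "Beschreibung", "parentSelectors": ["pagination"]},
    {"id": "pagination", "parentSelectors": ["references", "pagination"]},
    {"id": "website", "parentSelectors": ["sitemap"]},
]


def children_by_parent(selectors):
    """Map each parent id to the ids of its direct children."""
    tree = {}
    for sel in selectors:
        for parent in sel["parentSelectors"]:
            # A selector that lists itself as parent (as 'pagination' does)
            # is skipped here so the tree stays finite.
            if parent != sel["id"]:
                tree.setdefault(parent, []).append(sel["id"])
    return tree


def render(tree, node="_root", depth=0, seen=()):
    """Return the tree as indented lines, guarding against cycles."""
    lines = ["  " * depth + node]
    for child in tree.get(node, []):
        if child not in seen:
            lines.extend(render(tree, child, depth + 1, seen + (node,)))
    return lines


tree = children_by_parent(selectors)
print("\n".join(render(tree)))
```

In the printed tree, 'Beschreibung' sits under 'pagination', which sits under 'references'. Any further popup field would go at the same level as 'Beschreibung'.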

This is awesome, thanks. I'm still struggling with this:

I am struggling to scrape the website link (“Visit website”) and the details of the references in the “Portfolio”. The problem is, on the one hand, that these are popups and, on the other hand, that I can't select the elements in them...

Any recommendation on how I can fix this?
Thanks!

Hi,

Could you please elaborate on what data is still missing? As far as I can see the data is scraped as you have described.

Sure. As you can see in my sitemap, I'm trying to click the case studies and scrape the popups.

To add more data selectors, open one of the popups and add the selectors with point-and-click, nesting them under 'pagination'.
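
In sitemap terms, nesting under 'pagination' means that the new selector lists `pagination` in its `parentSelectors`. A minimal Python sketch of adding such a Text selector to an exported sitemap, assuming the field layout of the Text selectors already shown in this thread. The id `popup_title` and the CSS selector `h2` are placeholders and not taken from the site, and `add_text_selector` is a made-up helper name.

```python
import json

# Trimmed-down sitemap: only the two selectors that open and page through the popups.
sitemap = {
    "_id": "sortlist2",
    "startUrl": ["https://www.sortlist.de/"],
    "selectors": [
        {"id": "references", "parentSelectors": ["sitemap"],
         "selector": "body", "type": "SelectorElementClick"},
        {"id": "pagination", "paginationType": "clickMore",
         "parentSelectors": ["references", "pagination"],
         "selector": "[id=\"next-work-btn\"]", "type": "SelectorPagination"},
    ],
}


def add_text_selector(sitemap, selector_id, css, parent="pagination"):
    """Return a copy of the sitemap with one extra Text selector under `parent`."""
    known_ids = {sel["id"] for sel in sitemap["selectors"]}
    if parent not in known_ids:
        raise ValueError(f"unknown parent selector: {parent}")
    if selector_id in known_ids:
        raise ValueError(f"selector id already used: {selector_id}")
    new_selector = {
        "id": selector_id,
        "multiple": False,
        "parentSelectors": [parent],  # this is what nests it under the popup
        "regex": "",
        "selector": css,
        "type": "SelectorText",
    }
    return {**sitemap, "selectors": sitemap["selectors"] + [new_selector]}


# 'popup_title' and 'h2' are placeholders; pick the real element in the popup.
updated = add_text_selector(sitemap, "popup_title", "h2")
print(json.dumps(updated["selectors"][-1]))
```

The output of `json.dumps(updated)` should be importable in the same way as the other sitemaps in this thread once the trimmed selectors are replaced with the full ones.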

Thanks, but I'm not sure how this works.
I'm trying to open the popups with "Element Click". Once a popup is open, I'm not able to select any elements in it.
Sitemap:
{"_id":"sortlist3","startUrl":["https://www.sortlist.de/"],"selectors":[{"id":"mehranzeigen","parentSelectors":["sitemap"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":".p-16 a.underline","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":0,"discardInitialElements":"do-not-discard","multiple":false,"selector":".p-16 a.underline"},{"id":"description","parentSelectors":["sitemap"],"type":"SelectorText","selector":"span.display-block.lh-2, [class=\"p-16 text-break-word\"]","multiple":false,"regex":""},{"id":"sitemap","parentSelectors":["_root"],"type":"SelectorSitemapXmlLink","sitemapXmlMinimumPriority":"0.1","sitemapXmlUrlRegex":"","sitemapXmlUrls":["https://www.sortlist.de/sitemaps/3/agencies.xml.gz"]},{"id":"services","parentSelectors":["sitemap"],"type":"SelectorText","selector":"div.px-gt-xs-32","multiple":true,"regex":""},{"id":"languages","parentSelectors":["sitemap"],"type":"SelectorText","selector":"b.text-capitalize","multiple":false,"regex":""},{"id":"website","parentSelectors":["sitemap"],"type":"SelectorText","selector":"a.text-truncate, span.btn.small","multiple":false,"regex":""},{"id":"referenz","parentSelectors":["sitemap"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":"div.p-8:nth-of-type(1) a.small.underline","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"div.p-8:nth-of-type(1) a.small.underline"}]}

Any idea how I could scrape this? I can't select any element in the popup...

Is it possible to scrape the content of the popups? Thanks!

Hi, yes, with the sitemap I have provided in the previous messages.

Please provide a screen recording of the steps you are struggling with, otherwise it is not clear what the issue is.

Sure! I don't know whether the selector types are correct for the popup, and I can't select any elements in the popup:
Screen

Were you able to see the screen recording?