New & Need Help Scraping a site

Hi All,

I am new to Web Scraper and I've spent the last 6 hours trying to work out how to scrape this site. I only need the Art & Graphic categories.

Issues I'm having:
• only scraping two categories (Art & Graphic)
• including the subcategories to the main two
• including the different variables for each product

I have tried making three sitemaps; each one has failed or pulls the wrong data. Please help.

Url: https://alwan.com.eg (categories Art and Graphic only)

The easiest way to include only certain categories is to set the category links as start URLs.

Below is a reference sitemap that opens the categories and follows the product links:

{"_id":"alwan-com","startUrl":["https://alwan.com.eg/products?c=4","https://alwan.com.eg/products?c=5"],"selectors":[{"id":"pagination","paginationType":"auto","parentSelectors":["_root","pagination"],"selector":".paging a:contains('»')","type":"SelectorPagination"},{"id":"product-link","linkType":"linkFromHref","multiple":true,"parentSelectors":["pagination"],"selector":"a.photo","type":"SelectorLink"},{"id":"title","multiple":false,"parentSelectors":["product-link"],"regex":"","selector":"h1","type":"SelectorText"},{"id":"price","multiple":false,"parentSelectors":["product-link"],"regex":"","selector":"span#price","type":"SelectorText"},{"id":"image","multiple":false,"parentSelectors":["product-link"],"selector":"img.main","type":"SelectorImage"}]}
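To see how the selectors in the sitemap above nest, here is a minimal Python sketch that prints the parent/child tree from the `parentSelectors` fields. It is not part of Web Scraper; the `selector_tree` helper is a hypothetical name, and the embedded sitemap is a trimmed copy that keeps only the `id`, `type` and `parentSelectors` fields.

```python
import json

# Trimmed copy of the sitemap above: only the fields needed to show nesting.
SITEMAP = json.loads("""
{"_id":"alwan-com",
 "startUrl":["https://alwan.com.eg/products?c=4","https://alwan.com.eg/products?c=5"],
 "selectors":[
  {"id":"pagination","parentSelectors":["_root","pagination"],"type":"SelectorPagination"},
  {"id":"product-link","parentSelectors":["pagination"],"type":"SelectorLink"},
  {"id":"title","parentSelectors":["product-link"],"type":"SelectorText"},
  {"id":"price","parentSelectors":["product-link"],"type":"SelectorText"},
  {"id":"image","parentSelectors":["product-link"],"type":"SelectorImage"}
 ]}
""")


def selector_tree(sitemap, parent="_root", depth=0, seen=None):
    """Return indented lines listing the selectors that run under `parent`."""
    seen = set() if seen is None else seen
    lines = []
    for sel in sitemap["selectors"]:
        if parent not in sel["parentSelectors"]:
            continue
        indent = "  " * depth
        if sel["id"] in seen:
            # A selector that is its own parent (pagination) is noted once,
            # not followed again, so the walk always terminates.
            lines.append(f"{indent}{sel['id']} (recursive)")
            continue
        lines.append(f"{indent}{sel['id']} [{sel['type']}]")
        lines.extend(selector_tree(sitemap, sel["id"], depth + 1, seen | {sel["id"]}))
    return lines


if __name__ == "__main__":
    print("\n".join(selector_tree(SITEMAP)))
```

The tree shows that each category start URL is paginated, every page yields product links, and the title, price and image are read from the opened product page.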

Thank you. How can I add the product variations like size and color?

Do you just need to fetch all visible variations or click through the variations and scrape data that appears after the click?

I have tried this, but it's not working:

{"_id":"alwan-com","startUrl":["https://alwan.com.eg/products?c=4","https://alwan.com.eg/products?c=5"],"selectors":[{"id":"pagination","parentSelectors":["_root","pagination"],"paginationType":"auto","type":"SelectorPagination","selector":".paging a:contains('»')"},{"id":"product-link","parentSelectors":["pagination"],"type":"SelectorLink","selector":"a.photo","multiple":true,"linkType":"linkFromHref"},{"id":"title","parentSelectors":["product-link"],"type":"SelectorText","selector":"h1","multiple":false,"regex":""},{"id":"price","parentSelectors":["product-link"],"type":"SelectorText","selector":"span#price","multiple":false,"regex":""},{"id":"image","parentSelectors":["product-link"],"type":"SelectorImage","selector":"img.main","multiple":false},{"id":"Short Description","parentSelectors":["product-link"],"type":"SelectorText","selector":"div.brief","multiple":false,"regex":""},{"id":"Product Details","parentSelectors":["product-link"],"type":"SelectorText","selector":"div.product-details","multiple":false,"regex":""},{"id":"Product Photos","parentSelectors":["product-link"],"type":"SelectorImage","selector":".product-photos img","multiple":true},{"id":"Variations","parentSelectors":["product-link"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":"div.form select#color option:not(:contains(\"Select Color\"))","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickMore","delay":500,"discardInitialElements":"discard-when-click-element-exists","multiple":true,"selector":"body"},{"id":"Wrapper","parentSelectors":["Variations"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":"select#order_option1_value","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":500,"discardInitialElements":"discard-when-click-element-exists","multiple":true,"selector":"_parent_"},{"id":"Color","parentSelectors":["Wrapper"],"type":"SelectorText","selector":"div.form select#color [selected]","multiple":true,"regex":""},{"id":"Size","parentSelectors":["Wrapper"],"type":"SelectorText","selector":"select#order_option1_value [selected]","multiple":true,"regex":""}]}

The website is not correctly updating the HTML after the variant click, thus I would recommend scraping the variants like this:

{"_id":"alwan-com2","startUrl":["https://alwan.com.eg/products?c=4","https://alwan.com.eg/products?c=5"],"selectors":[{"id":"pagination","paginationType":"auto","parentSelectors":["_root","pagination"],"selector":".paging a:contains('»')","type":"SelectorPagination"},{"id":"product-link","linkType":"linkFromHref","multiple":true,"parentSelectors":["pagination"],"selector":"a.photo","type":"SelectorLink"},{"id":"title","multiple":false,"parentSelectors":["product-link"],"regex":"","selector":"h1","type":"SelectorText"},{"id":"price","multiple":false,"parentSelectors":["product-link"],"regex":"","selector":"span#price","type":"SelectorText"},{"id":"image","multiple":false,"parentSelectors":["product-link"],"selector":"img.main","type":"SelectorImage"},{"id":"Short Description","multiple":false,"parentSelectors":["product-link"],"regex":"","selector":"div.brief","type":"SelectorText"},{"id":"Product Details","multiple":false,"parentSelectors":["product-link"],"regex":"","selector":"div.product-details","type":"SelectorText"},{"id":"Product Photos","multiple":true,"parentSelectors":["product-link"],"selector":".product-photos img","type":"SelectorImage"},{"id":"Variations-color","multiple":true,"parentSelectors":["product-link"],"regex":"","selector":"div.form select#color option:not(:contains(\"Select color\"), :contains('Choose'))","type":"SelectorText"},{"id":"Variant-size","multiple":true,"parentSelectors":["product-link"],"regex":"","selector":"select#order_option1_value option:not(:contains('Choose'))","type":"SelectorText"}]}

Thank you so much, you're a star. This looks to be working.


I just ran the scrape, and it is skipping a lot of products. Do you have any ideas why?

Could it be skipping products because there are no variations?

I don't think so. Can you elaborate on why you think products are skipped? And provide an example URL of a skipped product.

Hi,

I ran the full scrape and saw that things are missing, for example "Standardgraph stencils 8355 Nato forces".

I think it's because they don't have both variations. How do we add in a zero value if there is no variation?

I just ran a test and it worked correctly. Maybe there is some kind of occasional loading lag. Try setting a longer page load delay.

I ran it again and it picked up more products, but still no "Standardgraph stencils 8355 Nato forces". I will try again with a longer page load delay.

"Standardgraph stencils 8355 Nato forces" is still missing from the data, and I'm sure other things are missing if this is. How can we work out what's happening?

Are you logged in to the website or performing any other additional steps?

I'm not logged in and I'm using the Firefox extension.

Just tried it on Firefox and it worked fine. You can try running it on Chrome. Unfortunately, there is not much I can do if I cannot reproduce the issue.
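
One way to narrow down which products are missing is to compare the exported data against a list of titles you expect to see. The Python sketch below is only an illustration under assumptions: it assumes the export is a CSV with a `title` column (named after the `title` selector id in the sitemap), the `missing_titles` helper is a hypothetical name, and the sample rows are made up for the example.

```python
import csv
import io

# Made-up sample standing in for an exported CSV; a real export would be read from a file.
SAMPLE_EXPORT = """title,price
Example product A,10
Example product B,25
"""


def missing_titles(csv_text, expected_titles, title_column="title"):
    """Return the expected titles that do not appear in the export's title column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Compare case-insensitively and ignore surrounding whitespace.
    scraped = {(row.get(title_column) or "").strip().casefold() for row in rows}
    return [t for t in expected_titles if t.strip().casefold() not in scraped]


if __name__ == "__main__":
    expected = ["Example product A", "Standardgraph stencils 8355 Nato forces"]
    for title in missing_titles(SAMPLE_EXPORT, expected):
        print("missing:", title)
```

For a real export, read the file contents into a string first and pass that in. Opening the file with `encoding="utf-8-sig"` is a safe choice in case it starts with a byte order mark. Knowing exactly which titles are absent makes it easier to share example URLs that someone else can try to reproduce.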