Pull Product ID From HTML

I'm trying to pull the product ID for each product to go along with the rest of the product information. The ID is only available in the HTML, as part of the cart information.

Sample product listing HTML from a category page:

<div class="price-box price-final_price" data-role="priceBox" data-product-id="513983">
     <span class="price">$225.99</span>
</div>

I need to pull the ID above (the data-product-id value) and add it to my data output. How can this be done? I can't post the URL for competitive reasons.

Hi,

Yes, you can achieve that with the Element attribute selector. See "Element attribute | Web Scraper How To".

Hello JanAp,

I don't want to select the product using this attribute. I need to capture the ID for each product as part of my data.

I watched the tutorial, but it does not seem to accomplish this. Did I miss something?

Can you share your sitemap, please?

https://www.....com/batteries/batteries-by-application

Here is a reference on how to use the Element attribute selector:

{"_id":"continentalbattery","startUrl":["https://www.continentalbattery.com/batteries/batteries-by-application"],"selectors":[{"elementLimit":0,"id":"product_wrapper","multiple":true,"parentSelectors":["_root"],"scroll":false,"selector":"li.product-item","type":"SelectorElement"},{"id":"title","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"regex":"","selector":"a.product-item-link","type":"SelectorText","version":2},{"id":"price","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"regex":"","selector":"span.price","type":"SelectorText","version":2},{"extractAttribute":"data-product-id","id":"product-id","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"selector":"div.price-box","type":"SelectorElementAttribute","version":2}]}
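For anyone who wants to see what the product-id selector above is doing outside of Web Scraper, here is a minimal Python sketch using only the standard library. It is not how Web Scraper works internally. It simply mirrors the "div.price-box" selector and the "extractAttribute": "data-product-id" setting against the sample HTML from the question. The names ProductIdParser and extract_product_ids are made up for this illustration, and unlike the sitemap it does not group results per li.product-item wrapper.

```python
from html.parser import HTMLParser


class ProductIdParser(HTMLParser):
    """Collect data-product-id values from <div class="price-box ..."> tags."""

    def __init__(self):
        super().__init__()
        self.product_ids = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # An attribute without a value is reported as None, so fall back to "".
        classes = (attr_map.get("class") or "").split()
        # Same idea as the sitemap's "div.price-box" selector combined with
        # "extractAttribute": "data-product-id".
        if tag == "div" and "price-box" in classes:
            product_id = attr_map.get("data-product-id")
            if product_id is not None:
                self.product_ids.append(product_id)


def extract_product_ids(html):
    """Return every data-product-id found on a price-box div, in page order."""
    parser = ProductIdParser()
    parser.feed(html)
    parser.close()
    return parser.product_ids


# Sample markup taken from the question above.
sample_html = """
<div class="price-box price-final_price" data-role="priceBox" data-product-id="513983">
     <span class="price">$225.99</span>
</div>
"""

print(extract_product_ids(sample_html))
```

The point is the same as in the sitemap: the selector only locates the element, and the value that ends up in the output is the attribute, not the element's text.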

Works perfectly, I appreciate it!

One last thing. Will you edit your solution to remove the main URL and ID and replace them with something generic?

Not sure what you mean by that. Can you elaborate?

I don't want the URL indexed in Google.

{"_id":"xxxxx","startUrl":["https://www.xxxxx.com/batteries/batteries-by-application"],"selectors":[{"elementLimit":0,"id":"product_wrapper","multiple":true,"parentSelectors":["_root"],"scroll":false,"selector":"li.product-item","type":"SelectorElement"},{"id":"title","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"regex":"","selector":"a.product-item-link","type":"SelectorText","version":2},{"id":"price","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"regex":"","selector":"span.price","type":"SelectorText","version":2},{"extractAttribute":"data-product-id","id":"product-id","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"selector":"div.price-box","type":"SelectorElementAttribute","version":2}]}

You mean remove the URL from my message?