Help Me Solve Walmart Server-Side Redirect

Trying to scrape info from Walmart orders.

Problem 1) Product URLs all redirect from

to

with the only change being "seort/" added in. Each product has a different number (not all are "30418541").

See screenshots

I want the scraped data from the final "seort/" URL page, but it doesn't work because of the redirect. I'm told it's a server-side redirect.

Problem 2) I'm having trouble keeping all scraped data for Product A on row A, Product B on row B, etc. Data for Product A is ending up on both row A and row X. I'm currently using a wrapper selector and a separate link selector, which may be the problem. I don't know how to use just one selector to get data from the current page, then click the link and get data from that page, while keeping it all on one row.

Screenshots show all the wanted data:
Start URL: link to product page, item name, order qty, total cost
Product Detail Page: item name, Walmart item no, dimensions, delivery, & rest shown in screenshot

Start URL: https://www.walmart.com/account/wmpurchasehistory/?limit=5&startingAt=1626109910&endingAt=1626103911&nav=backward

Sitemap:
{"_id":"small-step-wm-orders-v2","startUrl":["https://www.walmart.com/account/wmpurchasehistory/?limit=5&startingAt=1626109910&endingAt=1626103911&nav=backward"],"selectors":[{"id":"product-wrappers","type":"SelectorElement","parentSelectors":["_root","next-page"],"selector":"div.product-block","multiple":true,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["product-wrappers"],"selector":"div.LinesEllipsis","multiple":false,"regex":"","delay":0},{"id":"quantity","type":"SelectorText","parentSelectors":["product-wrappers"],"selector":"p","multiple":false,"regex":"","delay":0},{"id":"total price","type":"SelectorText","parentSelectors":["product-wrappers"],"selector":"div.order-info-price-v2","multiple":false,"regex":"","delay":0},{"id":"next-page","type":"SelectorLink","parentSelectors":["_root","next-page"],"selector":".s-margin-left a","multiple":false,"delay":0},{"id":"product-page","type":"SelectorLink","parentSelectors":["_root","next-page"],"selector":"a.product-name","multiple":true,"delay":0},{"id":"detail-page-name","type":"SelectorText","parentSelectors":["product-page"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"wm-item-no","type":"SelectorText","parentSelectors":["product-page"],"selector":"div.copy-mini.display-inline-block","multiple":false,"regex":"","delay":0},{"id":"detail-page-cost","type":"SelectorText","parentSelectors":["product-page"],"selector":"[itemprop='offers'] > div:nth-of-type(1)","multiple":false,"regex":"","delay":0},{"id":"delivery","type":"SelectorText","parentSelectors":["product-page"],"selector":"div.prod-fulfillment:nth-of-type(2) div.prod-fulfillment-messaging-text","multiple":false,"regex":"","delay":0},{"id":"image","type":"SelectorImage","parentSelectors":["product-page"],"selector":"img.prod-hero-image-image","multiple":false,"delay":0},{"id":"qty-available","type":"SelectorElementClick","parentSelectors":["product-page"],"selector":"section.prod-ProductCTA","multiple":false,"delay":2000,"clickElementSelector":".display-inline-block select","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"quantity-available-text-selector","type":"SelectorText","parentSelectors":["qty-available"],"selector":"select","multiple":false,"regex":"","delay":0},{"id":"sold-by","type":"SelectorText","parentSelectors":["product-page"],"selector":"a.seller-name","multiple":false,"regex":"","delay":0},{"id":"dimensions","type":"SelectorText","parentSelectors":["product-page"],"selector":"tr:contains('Assembled Product Dimensions (L x W x H)') div","multiple":false,"regex":"","delay":0},{"id":"weight","type":"SelectorText","parentSelectors":["product-page"],"selector":"tr:contains('Assembled Product Weight') div","multiple":false,"regex":"","delay":0}]}

Willing to pay $20 for someone to get me something that does this, or at least point me in the right direction.

Thanks

One way would be to scrape the purchase history and the detail pages separately (2 sitemaps), then merge them afterwards using Excel, SQL or Python. The merge would be keyed on the item title, which appears in both data sets.
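
If you go the Python route, the merge could look roughly like the sketch below. It assumes each sitemap was exported to CSV and that the columns are named after the selector IDs in your sitemap ("name" on the orders side, "detail-page-name" on the detail side). The sample rows are made up for illustration. It also assumes the two titles match once case and spacing are normalized, which may not hold if the order page shortens long names.

```python
import csv
import io

# Made-up sample exports; real data would come from the two sitemaps' CSV files.
ORDERS_CSV = """name,quantity,total price
Roland Quinoa Black Bean 5.46 oz,Qty 2,$5.00
Example Item B,Qty 1,$3.50
"""

DETAILS_CSV = """detail-page-name,wm-item-no,sold-by
roland quinoa  black bean 5.46 oz,30418541,Example Seller
"""

def normalize(title):
    """Lower-case and collapse whitespace so small differences still match."""
    return " ".join(title.lower().split())

def merge_on_title(orders, details):
    """Left join: keep every order row, add detail columns when the title matches."""
    by_title = {normalize(d["detail-page-name"]): d for d in details}
    merged = []
    for order in orders:
        detail = by_title.get(normalize(order["name"]), {})
        merged.append({**order, **detail})
    return merged

orders = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
details = list(csv.DictReader(io.StringIO(DETAILS_CSV)))
rows = merge_on_title(orders, details)
```

Order rows with no matching detail page are kept with only their order columns, so it is easy to see which products still need attention.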

For the URLs, you can generate your own list of redirected URLs, assuming the pattern holds, e.g.
https://www.walmart.com/ip/Roland-Quinoa-Black-Bean-5-46-oz/30418541
needs to be changed to
https://www.walmart.com/ip/seort/30418541

This can be done with a text editor that supports regex, like Notepad++.
For the example above, the regex would be:
Find: ip/[^/]+
Replace: ip/seort
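
If the list of links is long, the same find/replace could also be scripted. A minimal Python sketch, assuming every product URL follows the ip/<name>/<number> pattern shown above (the function name is just for illustration):

```python
import re

# Same pattern as the Find box above: the part between "ip/" and the next "/".
SLUG = re.compile(r"ip/[^/]+")

def to_redirect_url(url):
    """Swap the product-name part for "seort", keeping the item number."""
    return SLUG.sub("ip/seort", url, count=1)

urls = ["https://www.walmart.com/ip/Roland-Quinoa-Black-Bean-5-46-oz/30418541"]
rewritten = [to_redirect_url(u) for u in urls]
```

The rewritten list can then be pasted in as the start URLs of the detail-page sitemap. Note that a URL with no name part between "ip/" and the number would not fit this pattern and should be checked by hand.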
