Describe the problem.
Hi all, I am very new here and new to data scraping as well. I am working on a project to collect and create a database of books by different publishers in the Marathi language.
Can someone please help?
I am trying to scrape data from book sites. Each site is very different, but they all hold a similar set of data: books.
I want to extract the following:
a. Book Title
b. Book Author
c. Book Image
d. Book MRP
e. Book Discounted Price
f. Book Description
I have tried multiple times; however, it only selects data from page 1 and skips all the other pages. Also, for each record it outputs two separate rows, capturing the MRP in one row and the discounted price in the other.
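As a stopgap for the split rows, the exported data could be merged afterwards. This is only a minimal sketch, assuming the export gives one row with the MRP filled in and another with the discounted price filled in, both sharing the same Title. The function name `merge_split_rows` and the sample values are hypothetical placeholders, not real scraped data.

```python
def merge_split_rows(rows, key="Title"):
    """Merge rows that share the same key, keeping the first non-empty value per column."""
    merged = {}
    for row in rows:
        # One output record per distinct key value (dicts keep insertion order).
        target = merged.setdefault(row[key], {})
        for column, value in row.items():
            if value and not target.get(column):
                target[column] = value
    return list(merged.values())


# Placeholder rows shaped like the problem described above (values are made up).
rows = [
    {"Title": "Book A", "MRP": "200", "DMRP": ""},
    {"Title": "Book A", "MRP": "", "DMRP": "150"},
    {"Title": "Book B", "MRP": "100", "DMRP": ""},
    {"Title": "Book B", "MRP": "", "DMRP": "80"},
]

for record in merge_split_rows(rows):
    print(record)
```

This assumes titles are unique per book; if two different books share a title, a different key (for example the book link) would be needed.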
Url:
example 1 - Types of Books
example 2 - https://akshardhara.com/health.html
Sitemap:
{"_id":"Manovikas","startUrl":["https://manovikasprakashan.com/index.php?route=product/category&path=81"],"selectors":[{"id":"pagination","paginationType":"auto","parentSelectors":["_root","pagination"],"selector":"a.ias-trigger","type":"SelectorPagination"},{"id":"element_wrapper","multiple":true,"parentSelectors":["pagination"],"selector":"div.product-thumb","type":"SelectorElement"},{"id":"Book_Link","multiple":true,"parentSelectors":["pagination"],"selector":"div.product-thumb","type":"SelectorLink"},{"id":"Title","multiple":false,"parentSelectors":["element_wrapper"],"regex":"","selector":".name a","type":"SelectorText"},{"id":"MRP","multiple":true,"parentSelectors":["element_wrapper"],"regex":"","selector":"span.price-old","type":"SelectorText"},{"id":"DMRP","multiple":true,"parentSelectors":["element_wrapper"],"regex":"","selector":"span.price-new","type":"SelectorText"},{"id":"Author","multiple":true,"parentSelectors":["Book_Link"],"regex":"","selector":".tags a","type":"SelectorText"}]}
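One thing that may be worth checking in the sitemap above is which child selectors of element_wrapper have "multiple" set to true, since that could be related to the MRP and discounted price landing in separate rows. The sketch below just lists them; it uses a trimmed copy of the sitemap (only the wrapper and its children), and the helper name `multiple_children` is made up for illustration.

```python
import json

# Trimmed copy of the sitemap above: only element_wrapper and its child selectors.
SITEMAP = """{"_id":"Manovikas","selectors":[
{"id":"element_wrapper","multiple":true,"parentSelectors":["pagination"],"selector":"div.product-thumb","type":"SelectorElement"},
{"id":"Title","multiple":false,"parentSelectors":["element_wrapper"],"selector":".name a","type":"SelectorText"},
{"id":"MRP","multiple":true,"parentSelectors":["element_wrapper"],"selector":"span.price-old","type":"SelectorText"},
{"id":"DMRP","multiple":true,"parentSelectors":["element_wrapper"],"selector":"span.price-new","type":"SelectorText"}
]}"""


def multiple_children(sitemap_json, parent_id):
    """Return ids of selectors under parent_id that have "multiple" set to true."""
    sitemap = json.loads(sitemap_json)
    return [
        selector["id"]
        for selector in sitemap["selectors"]
        if parent_id in selector["parentSelectors"] and selector.get("multiple")
    ]


print(multiple_children(SITEMAP, "element_wrapper"))
```

If the listed selectors are the ones producing the extra rows, setting "multiple" to false on them (as Title already has) would be the first thing to try. Whether that fully resolves it is an assumption, not something confirmed here.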