An irregular webshop

The site has categories of varying depth (2 to 4 levels). Furthermore, pagination is shown in some places and not in others (sometimes with an ellipsis, sometimes without). On top of that, when I enter a subcategory all products are listed, but some of those products are in fact product groups containing 2 or more products.
At the moment the scraper just browses through the categories and some of the pagination pages and then stops, without returning a single product. When I go through the project manually with "data preview", it displays the data nicely.

URL: http://www.msb-srbija.com/index.php?cat=5616 (this is the test case; once it works I will simply expand it one level up)

Sitemap:
{"_id":"msb_test_4","startUrl":["http://www.msb-srbija.com/index.php?cat=5616"],"selectors":[{"id":"group_simple","type":"SelectorLink","selector":"div.cat_box:nth-of-type(n+2) a.link","parentSelectors":["_root","group_simple","paginator"],"multiple":true,"delay":0},{"id":"product_simple","type":"SelectorLink","selector":"div.product_box > table > tbody > tr:nth-of-type(1) td","parentSelectors":["group_simple","paginator"],"multiple":true,"delay":0},{"id":"paginator","type":"SelectorLink","selector":"a.shadow","parentSelectors":["group_simple","paginator"],"multiple":true,"delay":0},{"id":"element","type":"SelectorElement","selector":"table.shadow:has(table.shadow):not(:has(td h3))","parentSelectors":["product_simple","group_flat"],"multiple":false,"delay":0},{"id":"title","type":"SelectorText","selector":"h2","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"art_no","type":"SelectorText","selector":"td > table > tbody > tr > td > span","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorHTML","selector":"tr:nth-of-type(5) td","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"image","type":"SelectorHTML","selector":"a.lightbox","parentSelectors":["element"],"multiple":true,"regex":"","delay":0},{"id":"group_flat","type":"SelectorLink","selector":"tr.row, tr.rowodd","parentSelectors":["product_simple"],"multiple":true,"delay":0}]}

Thank you in advance

Hi there!

First of all, due to a bug present in version 0.3.7, you can set a selector as its own parent by mistake, which causes recursion. Your sitemap has recursive selectors (both group_simple and paginator list themselves in parentSelectors), so you have to fix that first.
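
If it helps, here is a rough way to spot such self-parents outside the extension. This is only a sketch in Python, not part of WebScraper itself: the helper name find_self_parents is made up for illustration, and the JSON below is a trimmed copy of your sitemap with only the id and parentSelectors fields kept.

```python
import json


def find_self_parents(sitemap: dict) -> list[str]:
    """Return the ids of selectors that list themselves in parentSelectors."""
    return [
        sel["id"]
        for sel in sitemap.get("selectors", [])
        if sel["id"] in sel.get("parentSelectors", [])
    ]


# Trimmed copy of the sitemap from the question (hypothetical reduction:
# only the fields this check needs are kept).
sitemap_json = """
{"_id": "msb_test_4",
 "selectors": [
   {"id": "group_simple", "parentSelectors": ["_root", "group_simple", "paginator"]},
   {"id": "product_simple", "parentSelectors": ["group_simple", "paginator"]},
   {"id": "paginator", "parentSelectors": ["group_simple", "paginator"]}
 ]}
"""

recursive_ids = find_self_parents(json.loads(sitemap_json))
print(recursive_ids)
```

Pasting the full exported sitemap in place of the trimmed string should work the same way, since the check only reads those two fields.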

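Apart from the recursion, you've also set several different parents. A second parent is what makes a selector follow pagination: the items selector gets the pagination selector as a parent in addition to the category selector. If there are more categories with their own pages, you add another pagination selector and set it as the second parent of the items selector so that those pages are followed too.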
I'm just trying to explain the logic.

Now for the 'right' selectors and picking things correctly. Sometimes you will have to use the browser's element select tool (Ctrl + Shift + C), because if you try to pick a link on this particular website, the link is triggered while you are selecting it, so you end up with the wrong selector or with nothing picked at all. I guess this will be fixed in a future release.

Another workaround is to select one of the items and check whether the generated CSS selector contains :nth-of-type(number); you can then remove or edit that part to match all the items or just one.
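
To illustrate what removing that part does to a selector, here is a small Python sketch. The function name drop_nth_of_type is hypothetical, and in practice you would simply edit the selector field by hand in the extension. It also strips every :nth-of-type(...) at once, which may be broader than you want; I have not checked the broadened selector against the site.

```python
import re

# Matches any :nth-of-type(...) pseudo-class, e.g. :nth-of-type(1) or :nth-of-type(n+2).
NTH_OF_TYPE = re.compile(r":nth-of-type\([^)]*\)")


def drop_nth_of_type(selector: str) -> str:
    """Remove all :nth-of-type(...) parts so the selector matches every sibling."""
    return NTH_OF_TYPE.sub("", selector)


# Selector taken from the "items" entry of the example sitemap in this thread.
narrow = "div.shadow:nth-of-type(1) tr:nth-of-type(2) a.link"
broad = drop_nth_of_type(narrow)
print(broad)
```

Editing the number instead of removing it (for example changing 1 to 2) is the way to target one specific item.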

As for the rest, everything can be set up just as shown in this tutorial (it's on the main WebScraper website):

I've built and tested an example sitemap (based on the above-mentioned tutorial):

{"_id":"msb-srbija","startUrl":["http://www.msb-srbija.com/index.php?cat=5616"],"selectors":[{"id":"categories","type":"SelectorLink","selector":"div.cat_box:nth-of-type(2) a.link","parentSelectors":["_root"],"multiple":true,"delay":"0"},{"id":"items","type":"SelectorLink","selector":"div.shadow:nth-of-type(1) tr:nth-of-type(2) a.link","parentSelectors":["categories","pagination"],"multiple":true,"delay":"0"},{"id":"txt","type":"SelectorText","selector":"h2","parentSelectors":["items"],"multiple":false,"regex":"","delay":0},{"id":"prc","type":"SelectorText","selector":"span.price","parentSelectors":["items"],"multiple":false,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","selector":"a.shadow","parentSelectors":["categories"],"multiple":true,"delay":0}]}

It enters the 'round saw' category and picks only the first item in the list on each page, just to show that everything works fine.
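
To make the parent logic of the sitemap above easier to see, the parentSelectors entries can be turned around into a "which selectors run under this parent" listing. This is just an illustrative Python sketch: children_by_parent is a made-up helper, and the list below keeps only the id and parentSelectors fields of the example sitemap.

```python
from collections import defaultdict


def children_by_parent(selectors: list[dict]) -> dict[str, list[str]]:
    """Map each parent id to the ids of the selectors that run under it."""
    tree = defaultdict(list)
    for sel in selectors:
        for parent in sel["parentSelectors"]:
            tree[parent].append(sel["id"])
    return dict(tree)


# Reduced copy of the example sitemap (only id and parentSelectors kept).
selectors = [
    {"id": "categories", "parentSelectors": ["_root"]},
    {"id": "items", "parentSelectors": ["categories", "pagination"]},
    {"id": "txt", "parentSelectors": ["items"]},
    {"id": "prc", "parentSelectors": ["items"]},
    {"id": "pagination", "parentSelectors": ["categories"]},
]

tree = children_by_parent(selectors)
for parent, kids in tree.items():
    print(parent, "->", kids)
```

The listing shows items under both categories and pagination, which is the "second parent" idea: items are picked up from the first page of a category and from every page the pagination links lead to.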

I hope I've helped.

Hi Iconoclast,
thank you for the help, your input was useful. It worked as you described (it showed the 1st product of the 1st category). When I expanded it to show all the products of the 1st category, it still worked. However, when I expanded the category selector, it started giving bad results.
For some categories (the ones with 5 pages of products) it would not return the products of the second page, even though it opened that page, and in the case of the "round saws" it would retrieve only the products of the first page. I've tried changing the selectors, and all of them worked in the testing phase (data and element preview), but none of them made any improvement.
I've also increased the delay between requests, and it changed nothing.
For some reason I still can't retrieve the products of all the categories.

Regarding the tutorial video: I've watched it several times, and that was precisely where I learned to use the pagination element as a parent of itself for cases where there is an ellipsis and not all page links are present when the first page is loaded. On the previous project I worked on, it proved to be the only way of getting all of the items in a category. Still, while testing your proposed solution I no longer used a "self-parent".