Issues with Shifting Tables During Web Scraping

Hi everyone,

1. I'm facing a problem while scraping a website. The tables keep shifting because they're in different locations each time. Please take a look at the screenshot I've attached. Is there any way to fix this issue?
2. Is it possible to embed the Webscraper.io tool in an HTML file? I need the program to navigate through different links, such as "AIR CONDITIONER COMPRESSOR," "TURBOCHARGER," "GEARBOX," and collect data. If this isn’t possible, is there a way to have it automatically go through these titles on an HTTP page and gather the data? If you have any questions, feel free to ask, and I’ll explain further!

Thanks in advance!

Url: bmw e90 sprężarka klimatyzacji w Twojej okolicy? Sprawdź kategorię Osobowe

Sitemap:
{"_id":"czesci3","startUrl":["bmw e90 sprężarka klimatyzacji w Twojej okolicy? Sprawdź kategorię Osobowe)","multiple":false,"extractAttribute":"href"},{"id":"URL","parentSelectors":["element-card"],"type":"SelectorLink","selector":"div[data-cy="ad-card-title"] a","multiple":false,"linkType":"linkFromHref"},{"id":"tytul","parentSelectors":["URL"],"type":"SelectorText","selector":"h4.css-1juynto","multiple":false,"regex":""},{"id":"opis","parentSelectors":["URL"],"type":"SelectorText","selector":"div.css-1t507yq","multiple":false,"regex":""},{"id":"info1","parentSelectors":["URL"],"type":"SelectorText","selector":".css-1r0si1e span","multiple":false,"regex":""},{"id":"info2","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(2) p","multiple":false,"regex":""},{"id":"info3","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(3) p","multiple":false,"regex":""},{"id":"info4","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(4) p","multiple":false,"regex":""},{"id":"info5","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(5) p","multiple":false,"regex":""},{"id":"info6","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(6) p","multiple":false,"regex":""},{"id":"info7","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(7) p","multiple":false,"regex":""},{"id":"info8","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(8) p","multiple":false,"regex":""},{"id":"info9","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(9) p","multiple":false,"regex":""},{"id":"info10","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(10) p","multiple":false,"regex":""},{"id":"cena","parentSelectors":["URL"],"type":"SelectorText","selector":".css-1w5u3ie 
h3","multiple":false,"regex":""},{"id":"ID_ogloszenia","parentSelectors":["URL"],"type":"SelectorText","selector":"span.css-12hdxwj","multiple":false,"regex":""},{"id":"imie","parentSelectors":["URL"],"type":"SelectorText","selector":".css-rnqkz0 h4","multiple":false,"regex":""},{"id":"negocjacja","parentSelectors":["URL"],"type":"SelectorText","selector":"p.css-1uidktz","multiple":false,"regex":""},{"id":"lokalizacja","parentSelectors":["URL"],"type":"SelectorText","selector":"p.css-1cju8pu","multiple":false,"regex":""},{"id":"img","parentSelectors":["URL"],"type":"SelectorGroup","selector":"div[data-cy="adPhotos-swiperSlide"] img","extractAttribute":"src"},{"id":"dodane","parentSelectors":["URL"],"type":"SelectorText","selector":"span.css-19yf5ek","multiple":false,"regex":""}]}

Screenshots:


I don't understand what's wrong with your data. I asked for a screenshot of the web page with an exact illustration of what is supposed to be scraped.

@don2010 I need a web scraper to fetch all information about a car part. The problem is that the website OLX.pl sometimes redirects to Otomoto (they are affiliated). When I try to scrape information like the part number, the scraper sometimes captures the generation info instead (as shown in the screenshot). This issue also happens on the OLX website itself. It happens because the table shifts when some information is missing, so a position-based selector ends up pointing at a different row. I want the scraper to leave the part number field empty rather than inserting the generation information. I hope you understand what I mean. Thank you for your response.
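To illustrate the shifting described above, here is a minimal Python sketch. It is not part of Web Scraper; the row labels and values are made-up placeholders, and `by_position` and `by_label` are hypothetical helper names. It only shows why picking the "second row" breaks when a row is missing, while looking a value up by its label returns an empty string instead.

```python
# Hypothetical detail rows from two listings (labels and values are placeholders).
# The second listing has no "Part number" row, so every later row moves up by one.
listing_full = [("Condition", "Used"), ("Part number", "ABC-123"), ("Generation", "E90")]
listing_missing = [("Condition", "Used"), ("Generation", "E90")]


def by_position(rows, index):
    """Mimics a positional selector such as li:nth-of-type(2)."""
    return rows[index][1] if index < len(rows) else ""


def by_label(rows, label):
    """Mimics a label-based selector: find the row by its label text."""
    return dict(rows).get(label, "")


print(by_position(listing_full, 1))            # ABC-123
print(by_position(listing_missing, 1))         # E90 (wrong field captured)
print(repr(by_label(listing_missing, "Part number")))  # '' (field left empty)
```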




I renamed the fields to make it easier for you to understand.

{"_id":"czesci6","startUrl":["bmw e90 sprężarka klimatyzacji w Twojej okolicy? Sprawdź kategorię Osobowe)","multiple":false,"extractAttribute":"href"},{"id":"URL","parentSelectors":["element-card"],"type":"SelectorLink","selector":"div[data-cy="ad-card-title"] a","multiple":false,"linkType":"linkFromHref"},{"id":"seller_olx","parentSelectors":["URL"],"type":"SelectorText","selector":"li.css-1r0si1e:nth-of-type(1)","multiple":false,"regex":""},{"id":"price_olx","parentSelectors":["URL"],"type":"SelectorText","selector":"h3.css-90xrc0","multiple":false,"regex":""},{"id":"description_olx","parentSelectors":["URL"],"type":"SelectorText","selector":"div.css-1o924a9","multiple":false,"regex":""},{"id":"localization_olx","parentSelectors":["URL"],"type":"SelectorText","selector":".css-13l8eec > div","multiple":false,"regex":""},{"id":"ID_olx","parentSelectors":["URL"],"type":"SelectorText","selector":"span.css-12hdxwj","multiple":false,"regex":""},{"id":"condition_olx","parentSelectors":["URL"],"type":"SelectorText","selector":"li:nth-of-type(2) p","multiple":false,"regex":""},{"id":"price_otomoto","parentSelectors":["URL"],"type":"SelectorText","selector":"h3.offer-price__number","multiple":false,"regex":""},{"id":"description_otomoto","parentSelectors":["URL"],"type":"SelectorText","selector":"div.ooa-unlmzs","multiple":false,"regex":""},{"id":"localization_otomoto","parentSelectors":["URL"],"type":"SelectorText","selector":".ooa-1botqcq a.e1m6rqv1:nth-of-type(1)","multiple":false,"regex":""},{"id":"condition_otomoto","parentSelectors":["URL"],"type":"SelectorText","selector":"div.ooa-162vy3d:nth-of-type(6)","multiple":false,"regex":""},{"id":"part_number_otomoto","parentSelectors":["URL"],"type":"SelectorText","selector":"div.ooa-162vy3d:nth-of-type(4)","multiple":false,"regex":""}]}

First of all, it is better to share your sitemap using the dedicated "Preformatted text" button.

Second, here is some advice: to scrape the desired field, you should use the following selector:

div[data-testid="advert-details-item"]:contains("Numer referencyjny producenta") p + p

The selectors you are using, such as div.ooa-162vy3d or .ooa-1botqcq a.e1m6rqv1, are not reliable, because those fields may not be matched on other pages. Think about it...
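If several fields need the same label-based pattern, the sitemap entries could be generated instead of typed by hand. The following is only a sketch in Python, assuming the entry keys seen in the sitemaps pasted above (`id`, `parentSelectors`, `type`, `selector`, `multiple`, `regex`); `label_selector` is a hypothetical helper name, not something Web Scraper provides.

```python
import json


def label_selector(field_id, label, parent="URL"):
    """Build a Text selector entry that finds a value by its visible label.

    The CSS pattern is the one suggested in this thread; only the label changes.
    """
    css = f'div[data-testid="advert-details-item"]:contains("{label}") p + p'
    return {
        "id": field_id,
        "parentSelectors": [parent],
        "type": "SelectorText",
        "selector": css,
        "multiple": False,
        "regex": "",
    }


# The label below is the one quoted in this thread.
part_number = label_selector("part_number_otomoto", "Numer referencyjny producenta")

# ensure_ascii=False keeps Polish characters readable in the output.
print(json.dumps(part_number, ensure_ascii=False, indent=2))
```

The printed entry can then be pasted into the selectors list of an exported sitemap. Other labels would have to be copied exactly as they appear on the page.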

Thank you for the help, it works perfectly. Do you have any idea how I can scrape several titles at once? I'm thinking of something like this:
https://forum.webscraper.io/uploads/default/original/2X/a/a7443aadecbf75c6cd3dd60407ad100f1b7f466f.png
Thank you for your response. If you need me to explain this more clearly, let me know.

Hi, I think you should use the "Grouped" type to scrape similar selectors at once...

I mean that I want to scrape, for example, turbochargers (5 pages), then air conditioning compressors (10 pages), engines (20 pages), etc., so that I don't have to manually start the scraper separately for turbochargers, air conditioning compressors, engines, and so on. I want to scrape multiple titles automatically at once.

Just copy the proper URLs for these categories and add them to the sitemap's start URLs. That's it.
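As a rough sketch of that suggestion, the snippet below puts several category URLs into the `startUrl` list of a sitemap before importing it. It is written in Python only for illustration. The URLs are hypothetical placeholders (the real ones have to be copied from the browser), `with_start_urls` is a made-up helper name, and the `selectors` key is assumed to hold the selector list of the exported sitemap, which is left empty here.

```python
import json

# Hypothetical placeholders: replace with the real category listing URLs.
category_urls = [
    "https://example.com/turbochargers",
    "https://example.com/air-conditioning-compressors",
    "https://example.com/engines",
]


def with_start_urls(sitemap, urls):
    """Return a copy of the sitemap whose startUrl lists every given URL."""
    updated = dict(sitemap)
    updated["startUrl"] = list(urls)
    return updated


# Skeleton of an exported sitemap; the selector list is omitted for brevity.
sitemap = {"_id": "czesci6", "startUrl": [], "selectors": []}

multi_category = with_start_urls(sitemap, category_urls)
print(json.dumps(multi_category, indent=2))
```

With all categories in one sitemap, a single scraping job should visit each start URL in turn, so the scraper does not need to be started separately for each category.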