How to select a link if div class names are dynamic

Some websites change their div class names dynamically on every reload. Does anyone have a solution for this?

There are a number of useful attribute selectors to handle this: ^= (starts with), $= (ends with), and *= (contains).

Ref: https://www.w3schools.com/cssref/css_selectors.asp

For example, if you have class values such as key1234, key3355, key6667, etc., you can just use:

div[class^="key"]
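
If you want to check the same idea outside the scraper, here is a rough Python sketch of what a "starts with" match on the class attribute does. It uses only the standard library; the names PrefixClassFinder and find_divs_with_class_prefix and the sample HTML are made up for illustration. Note that ^= compares against the whole attribute string, so a div with class="other key6667" would not match.

```python
from html.parser import HTMLParser


class PrefixClassFinder(HTMLParser):
    """Collect <div> class values that start with a prefix,
    roughly mirroring the CSS selector div[class^="prefix"]."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        class_value = dict(attrs).get("class") or ""
        # ^= tests the whole attribute string, not each class token.
        if class_value.startswith(self.prefix):
            self.matches.append(class_value)


def find_divs_with_class_prefix(html, prefix):
    """Hypothetical helper: return class values of matching divs."""
    finder = PrefixClassFinder(prefix)
    finder.feed(html)
    finder.close()
    return finder.matches


# Made-up sample using the class names from the example above.
sample = (
    '<div class="key1234">a</div>'
    '<div class="key3355">b</div>'
    '<div class="other key6667">c</div>'
)
print(find_divs_with_class_prefix(sample, "key"))
```

If the dynamic part comes first and the stable part last, the same sketch works with endswith in place of startswith, which corresponds to $=.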

Leemeng's answer works if there is some consistent pattern to use for partial string matching (e.g. all elements/attributes begin with "https" or end with "pdf"), but what if there are no consistent aspects to use for partial string matching?

For example, on eBay's search results page for completed listings, each listing shows the date the auction ended. I am attempting to select the completion date, but I'm being thwarted by the underlying HTML containing a bunch of invisible text.

Here is the page in question: https://www.ebay.com/sch/i.html?_from=R40&_nkw=ryzen+5+3600&_sacat=0&LH_ItemCondition=3000&LH_PrefLoc=1&rt=nc&LH_Sold=1&LH_Complete=1.

If I could depend on the class name remaining consistent, then I would just select span.s-uore92. The problem is, the next time I load the results page, the class names change to some other random string.

Here is a snippet of HTML from that page:

<style type="text/css">
span.s-uore92 {
    display: inline;
}
span.s-lxh0hqr {
    display: none;
}
</style>
...
<span class="POSITIVE" role="text">
   <span class="s-lxh0hqr">X</span>
   <span class="s-lxh0hqr">F</span>
   <span class="s-uore92">S</span>
   <span class="s-lxh0hqr">S</span>
   <span class="s-lxh0hqr">5</span>
   <span class="s-lxh0hqr">0</span>
   <span class="s-lxh0hqr">H</span>
   <span class="s-lxh0hqr">0</span>
   <span class="s-uore92">o</span>
   <span class="s-uore92">l</span>
   <span class="s-lxh0hqr">9</span>
   <span class="s-lxh0hqr">O</span>
   <span class="s-uore92">d</span>
   <span class="s-uore92"> </span>
   <span class="s-uore92"> </span>
   <span class="s-uore92">M</span>
   <span class="s-lxh0hqr">A</span>
   <span class="s-lxh0hqr">O</span>
   <span class="s-lxh0hqr"></span>
   <span class="s-lxh0hqr"></span>
   <span class="s-uore92">a</span>
   <span class="s-uore92">r</span>
   <span class="s-lxh0hqr"></span>
   <span class="s-uore92"> </span>
   <span class="s-uore92">1</span>
   <span class="s-uore92">6</span>
   <span class="s-uore92">,</span>
   <span class="s-uore92"> </span>
   <span class="s-uore92">2021</span>
</span>

I've tried div.s-item__title--tagblock > span.POSITIVE > span[display~='inline'] as my selector (and I've tried with several different strings within the brackets in case my syntax is off), but since the display property is contained in a separate style tag elsewhere in the HTML document, I haven't figured out how to only capture the spans that have display:inline.

I've checked the W3 Schools page that Lee linked to, as well as https://devhints.io/xpath, but I'm stumped.

There is definitely an aggressive anti-scraping measure in place here. It's intentionally structured that way to trip up scrapers. I would just grab the whole HTML block for the Sold date, then post-process with Python.

Type: HTML
Selector: div > span.POSITIVE

There are only two span classes active at any time, and only one of them contains the valid date. In the example above, the spans with class="s-uore92" are the valid ones.
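
Here is a minimal sketch of what that post-processing could look like, assuming the page's style block is part of the HTML you have in hand and that the visible class is the one declared with display: inline, as in the snippet above. It uses only the Python standard library. The function and class names (visible_classes, VisibleSpanText, extract_visible_text) are made up for this example, and the sample is a shortened version of the posted snippet, not real eBay output.

```python
import re
from html.parser import HTMLParser


def visible_classes(html):
    """Return span class names whose CSS rule sets display: inline."""
    rule = re.compile(r"span\.([\w-]+)\s*\{([^}]*)\}")
    inline = re.compile(r"display\s*:\s*inline\s*(?:;|$)")
    return {name for name, body in rule.findall(html) if inline.search(body)}


class VisibleSpanText(HTMLParser):
    """Collect text only from spans whose class is in `keep`."""

    def __init__(self, keep):
        super().__init__()
        self.keep = keep
        self.stack = []  # one bool per open <span>: is it a visible one?
        self.chars = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self.stack.append(any(c in self.keep for c in classes))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text only when the innermost open span is a visible one.
        if self.stack and self.stack[-1]:
            self.chars.append(data)


def extract_visible_text(html):
    """Join the visible characters and collapse repeated whitespace."""
    parser = VisibleSpanText(visible_classes(html))
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chars).split())


# Shortened, hand-written version of the snippet posted above.
sample = """
<style type="text/css">
span.s-uore92 { display: inline; }
span.s-lxh0hqr { display: none; }
</style>
<span class="POSITIVE" role="text">
   <span class="s-lxh0hqr">X</span>
   <span class="s-uore92">S</span>
   <span class="s-lxh0hqr">5</span>
   <span class="s-uore92">old</span>
   <span class="s-uore92"> </span>
   <span class="s-lxh0hqr">A</span>
   <span class="s-uore92">Mar 16, 2021</span>
</span>
"""
print(extract_visible_text(sample))
```

Because the class names are read from the style block on each run, it should not matter that they change between page loads. If the scraper only gives you the span.POSITIVE block without the style block, you would need to capture the style block as a second field and pass both pieces in together.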
