How can I get the indentation applied to a text element?

I'd like to keep my peers informed about changes to Microsoft's public "Azure products available by region" page.

No problem extracting the whole table.

However, they use indentation to denote subgroupings.

How can I extract the indentation value?

I've attached two screen captures showing how two example elements appear. The first image (001) shows how "Azure Search", which is NOT indented, is coded.

The second image, for "Cognitive Search", IS indented. You can see there is a difference in the HTML, as shown on the right panel.

.data-table tr th.indented-header, .fixed-row tr th.indented-header {
    padding-left: 26px;
    padding-right: 5px;
}

Let me know if you need more information in order to help.

Thanks!

Hugo

Hi there!

If I understand you correctly, you want the table you scrape to show the indentation as it appears on the website.
Unfortunately, the indentation on the website is just a CSS property (the padding-left on the indented-header class). It doesn't change the text inside the cell; it only shifts it visually.

As far as I know, the only way to get the indentation into the scrape itself is to use a userscript extension, for example Tampermonkey, with jQuery, so you can rewrite the cells of a particular class (indented-header) with a whitespace character at the beginning of the line.
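If post-processing outside the browser is acceptable, the same idea (treat the indented-header class as the indentation marker and prepend whitespace) can be sketched in Python with only the standard library. This is a rough sketch, not something Web Scraper does for you: the sample markup, the extract_products helper and the four-space indent are my own assumptions, and I'm assuming the indented cells are th elements carrying the indented-header class, as the CSS rule in the question suggests.

```python
from html.parser import HTMLParser


class IndentedHeaderParser(HTMLParser):
    """Collect the text of every <th>, prefixing indented ones with spaces."""

    def __init__(self, indent="    "):
        super().__init__()
        self.indent = indent
        self.rows = []          # finished header texts, in document order
        self._in_th = False     # True while we are inside a <th> element
        self._indented = False  # True if the current <th> has the marker class
        self._chunks = []       # text fragments of the current <th>

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            classes = (dict(attrs).get("class") or "").split()
            self._in_th = True
            self._indented = "indented-header" in classes
            self._chunks = []

    def handle_data(self, data):
        if self._in_th:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._in_th:
            # Collapse runs of whitespace, then add the prefix if needed.
            text = " ".join("".join(self._chunks).split())
            prefix = self.indent if self._indented else ""
            self.rows.append(prefix + text)
            self._in_th = False


def extract_products(html, indent="    "):
    """Return the <th> texts, with indented-header cells prefixed by indent."""
    parser = IndentedHeaderParser(indent)
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup modelled on the two rows described in the question.
SAMPLE = """
<table class="main-table">
  <tr><th class="row-header">Azure Search</th><td>...</td></tr>
  <tr><th class="row-header indented-header">Cognitive Search</th><td>...</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_products(SAMPLE):
        print(repr(row))
```

You would feed it the saved page HTML instead of SAMPLE. Checking the class rather than the computed padding keeps it independent of the actual pixel values in the stylesheet.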

@KristapsWS please correct me if I'm wrong.

You can scrape the table using an Element selector together with an Element attribute selector. The table that actually contains the product fields and their attributes is layered beneath another table.

You can even select it with a Table selector if you use table.main-table.

Here is an example sitemap with both methods:

{"_id":"azure_table","startUrl":["https://azure.microsoft.com/en-us/global-infrastructure/services/"],"selectors":[{"id":"element","type":"SelectorElement","selector":"table.main-table tr.toggled","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"Product","type":"SelectorText","selector":"th.row-header","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"Non-Regional","type":"SelectorText","selector":"td.col-1 span.hide-text","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"East_US","type":"SelectorText","selector":"td.col-2 span.hide-text","parentSelectors":["element"],"multiple":false,"regex":"","delay":0},{"id":"table","type":"SelectorTable","selector":"table.main-table","parentSelectors":["_root"],"multiple":true,"columns":[{"header":"Products","name":"Products","extract":true},{"header":"Non-regional More information Non-regional services are ones where there is no dependency on a specific Azure region","name":"Non-regional More information Non-regional services are ones where there is no dependency on a specific Azure region","extract":true},{"header":"East US","name":"East US","extract":true},{"header":"East US 2","name":"East US 2","extract":true},{"header":"Central US","name":"Central US","extract":true},{"header":"North Central US","name":"North Central US","extract":true},{"header":"South Central US","name":"South Central US","extract":true},{"header":"West Central US","name":"West Central US","extract":true},{"header":"West US","name":"West US","extract":true},{"header":"West US 2","name":"West US 2","extract":true},{"header":"Canada East","name":"Canada East","extract":true},{"header":"Canada Central","name":"Canada Central","extract":true},{"header":"Brazil South","name":"Brazil South","extract":true},{"header":"North Europe","name":"North Europe","extract":true},{"header":"West Europe","name":"West Europe","extract":true},{"header":"France Central","name":"France Central","extract":true},{"header":"France South","name":"France South","extract":true},{"header":"Germany Non-Regional","name":"Germany Non-Regional","extract":true},{"header":"Germany Central","name":"Germany Central","extract":true},{"header":"Germany Northeast","name":"Germany Northeast","extract":true},{"header":"UK West","name":"UK West","extract":true},{"header":"UK South","name":"UK South","extract":true},{"header":"Southeast Asia","name":"Southeast Asia","extract":true},{"header":"East Asia","name":"East Asia","extract":true},{"header":"Australia Central","name":"Australia Central","extract":true},{"header":"Australia Central 2","name":"Australia Central 2","extract":true},{"header":"Australia East","name":"Australia East","extract":true},{"header":"Australia Southeast","name":"Australia Southeast","extract":true},{"header":"Central India","name":"Central India","extract":true},{"header":"West India","name":"West India","extract":true},{"header":"South India","name":"South India","extract":true},{"header":"Japan East","name":"Japan East","extract":true},{"header":"Japan West","name":"Japan West","extract":true},{"header":"Korea Central","name":"Korea Central","extract":true},{"header":"Korea South","name":"Korea South","extract":true},{"header":"US Gov Non-Regional","name":"US Gov Non-Regional","extract":true},{"header":"US Gov Virginia","name":"US Gov Virginia","extract":true},{"header":"US Gov Iowa","name":"US Gov Iowa","extract":true},{"header":"US Gov Arizona","name":"US Gov Arizona","extract":true},{"header":"US Gov Texas","name":"US Gov Texas","extract":true},{"header":"US DoD East","name":"US DoD East","extract":true},{"header":"US DoD Central","name":"US DoD Central","extract":true}],"delay":0,"tableDataRowSelector":"tr.toggled.open:nth-of-type(n+4)","tableHeaderRowSelector":"tr.data-table-headers:nth-of-type(2)"}]}