- [Fix] Elements can now be selected within an element selector that has a CSS selector
- [Feature] Allow IP addresses and host names as start URLs
- [Change] Refactored CSV export. The line break is now \r\n instead of \n, so the CSV format matches RFC 4180 (https://tools.ietf.org/html/rfc4180). Tested edge cases where the text contains quotes and backslash-escaped quotes; opening the CSV with MS Office 365 and LibreOffice worked as expected. Note that some software libraries use the backslash \ as an escape character, which is incorrect per the RFC.
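To illustrate the RFC 4180 rules mentioned above, here is a minimal sketch of a CSV field/row writer (illustrative only, not the extension's actual export code): quotes are escaped by doubling them, and records end with \r\n.

```javascript
// Minimal RFC 4180-style CSV writer sketch (not Web Scraper's actual code).
function csvField(value) {
    const text = String(value);
    // Quote the field if it contains a delimiter, a quote, or a line break.
    if (/[",\r\n]/.test(text)) {
        // RFC 4180 escapes a double quote by doubling it, not with a backslash.
        return '"' + text.replace(/"/g, '""') + '"';
    }
    return text;
}

function csvRow(fields) {
    // Records are terminated with CRLF (\r\n) per RFC 4180.
    return fields.map(csvField).join(",") + "\r\n";
}

// A field containing quotes:
// csvRow(["id", 'say "hi"']) → 'id,"say ""hi"""\r\n'
```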
- [Change] The data extraction module has been rewritten completely. The only difference should be that an element selector with "multiple" unchecked will return a record with null values when its child selectors don't extract anything
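The null-filling behavior described above could be sketched like this (hypothetical helper and selector names, not the actual module):

```javascript
// Sketch: child selectors that extracted nothing are filled in as null,
// so the record is still returned instead of being dropped.
function fillMissing(childSelectorIds, extracted) {
    const record = {};
    for (const id of childSelectorIds) {
        record[id] = id in extracted ? extracted[id] : null;
    }
    return record;
}

// fillMissing(["name", "price"], { name: "Widget" })
// → { name: "Widget", price: null }
```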
- [Feature] Element click selector can have child element click selectors. Useful when clicking through multiple variation selections on a product page
- [Feature] With some minor changes we managed to port Web Scraper to Firefox
- [Fix] Refactored page status code detection code to fix a race condition on Firefox
- [Feature] Element click selector now also triggers touch events, for buttons that respond to touch instead of click.
- [Fix] Disallow "&" char in selector ids
- [Fix] Sitemaps in the sitemap list are now loaded one by one. This should resolve a problem when very large sitemaps are stored in Chrome
- [Feature] Web Scraper got a new logo
- [Feature] In a recent release Chrome added lookbehind to its regex engine. Now you can write a regex like (?<=sku: ).+, which will extract the text that follows "sku: "
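For instance, applying that lookbehind pattern to a sample string (the string is illustrative) matches only the text after the prefix:

```javascript
// Positive lookbehind: match text preceded by "sku: " without including
// the "sku: " prefix in the match itself.
const re = /(?<=sku: ).+/;

const text = "Product details. sku: AB-1234";
const match = text.match(re);
// match[0] → "AB-1234"
```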
- [Feature] Now you can detach devtools and run web scraper in a separate window.
- [Fix] When a parent element selector selected an HTML element, data preview didn't work
- [Fix] When a class name contained the % char, a CSS selector couldn't be generated
- [Fix] Escape special characters in a CSS selector ($, (, etc.)
- [Fix] Whitespace in links extracted by the link selector is now removed or escaped
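A sketch of the kind of link cleanup described (hypothetical helper, not the extension's actual implementation): surrounding whitespace is removed, and whitespace inside the URL is percent-encoded.

```javascript
// Trim whitespace around a URL and percent-encode any whitespace inside it.
function cleanLink(href) {
    // Leading/trailing whitespace is removed entirely...
    const trimmed = href.trim();
    // ...and remaining inner whitespace is escaped (" " → "%20", etc.).
    return trimmed.replace(/\s/g, encodeURIComponent);
}

// cleanLink("  https://example.com/a b ") → "https://example.com/a%20b"
```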
- [Fix] The table selector's header and data row selector generators have been rewritten. Fixed an issue when a table contained nested tables
- [Fix] Added error handlers for errors that occurred in the Chrome API
- [Fix] Web Scraper won't stop if a URL with an invalid domain name is being scraped. It will continue
- [Change] If a page loads with a 4xx or 5xx status code, data won't be extracted from it.
- [Feature] CSS selector can now generate CSS selectors that start with
- [Change] The delay option is marked as deprecated in some selectors
- [Fix] Image selector might have failed when a sitemap had image download enabled. Image download was disabled in a previous release.
- [Fix] If the loaded URL isn't an HTML document (for example, an image URL), data won't be extracted from it.
- [Change/Fix] The start URL count is now limited to 10000 per sitemap. Chrome has internal storage limitations, and a very large sitemap could make all sitemaps inaccessible.
- [Change] The scraper window now opens the first URL that needs to be scraped instead of the "waiting" page
- [Feature] When the element click selector is used to click through a <select> tag, the selected <option> tag will have
- [Feature] We added a survey system to better understand what we should focus on.
- Fixed an issue where large CSV files couldn't be downloaded.
- Fixed scrolling with the mouse middle button in some UI elements.
- Fixed data preview, which sometimes showed extra data because of multiple linked link selectors
- Updated test sites on webscraper.io. There are now product pages, more pagination pages, and a test site for the popup link selector.
- Refactored the AJAX wait functionality. It will wait only for XHR and script requests made to the domain or a subdomain of the currently open page.
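The domain/subdomain check described above could be sketched as follows (hypothetical helper, not the extension's actual code):

```javascript
// Returns true when requestHost is the page's host itself
// or one of its subdomains.
function isSameSiteRequest(pageHost, requestHost) {
    return requestHost === pageHost || requestHost.endsWith("." + pageHost);
}

// isSameSiteRequest("example.com", "api.example.com") → true
// isSameSiteRequest("example.com", "evil.com")        → false
```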
- Element click selector can now click on <option> tags. Instead of clicking, it triggers a value change event.
- Fixed an issue in the URLSearchParams library, which behaved incorrectly in Chrome 49
- Disabled the AJAX wait functionality on Chrome versions that don't support privilege requests from devtools (Chrome 49 didn't)
- Refactored the page load delay waiter to use 60s+30s delays. During page load, the scraper will also try multiple times to connect to the scraper window if the content script isn't reachable. Previously, this check could fail with an error on a slower computer.
- Fixed element preview for element click selector
- Added AJAX wait to the element click selector delay. The element click selector should now wait for requests the page makes after an element is clicked. This won't work on Windows XP/Vista, though.
- Increased page load timeouts, which had caused problems when a page has a lot of content or when the scraper is running on a slower computer
- Reordered scraped data preview columns to match the exported CSV columns.
- Fixed an issue where configuring the request interval would make the scraper load only one page
- Added a tab refresh timeout. In case of a timeout, the scraper window will be recreated
- Added a page load checker. If a page gets stuck in the loading process, the scraper window will be recreated
- Added tab refresh
- Added page load detection using network listeners. This detection feature waits for dynamic data to load before starting data extraction
- Page load detection feature should also increase page load speed.
- Added a primitive ad blocker. Right now it blocks a few analytics trackers in the scraper window.
- Removed Image download. (Use image download script instead)