Hi,
(1) I currently find that under Edit metadata > Start URL, I can only input a "proper" URL, one that meets 2 conditions:
(a) a "http://" or "https://" beginning, and
(b) it ends in a valid-looking top-level domain (TLD), such as .com or .net (and even .local).
(2) Is it possible to allow:
(a) the "file://" protocol beginning, and
(b) any-format URLs, such as localhost, 127.0.0.1, or 192.168.1.100?
(3) The reason for this request is that it would open up a lot of possibilities:
(a) It would go a long way toward simplifying multiple-URL scraping. Currently we need to add Start URLs one by one with the + button, or use the serial-number range notation, such as [1-2000].
With such a new feature, we could build a larger HTML file containing a long list of varied URLs (generated in Excel or by other scripts), then point the Start URL at it with "file://..." or "http://localhost/urls.html".
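To illustrate the idea, here is a minimal sketch of generating such a listing page. The file name "urls.html" and the example URLs are my own illustrative assumptions, not anything Webscraper.io prescribes:

```python
# Hypothetical sketch: build a local "urls.html" page containing one link
# per target URL, which a scraper could then use as its single Start URL.
import html

def build_listing(urls):
    """Return an HTML page with one <a> link per URL in `urls`."""
    links = "\n".join(
        f'<a href="{html.escape(u, quote=True)}">{html.escape(u)}</a>'
        for u in urls
    )
    return f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>"

if __name__ == "__main__":
    # Example URLs are assumptions for illustration only.
    urls = [
        "http://localhost/page1",
        "http://127.0.0.1:8080/item?id=2",
    ]
    with open("urls.html", "w", encoding="utf-8") as f:
        f.write(build_listing(urls))
```

The same list could just as easily be exported from an Excel column.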
Currently I can still do something like this, but I have to serve the listing file from a local (Apache) server and edit the "hosts" file to simulate a "http://myproject.local" URL, which contains the valid TLD ".local". This works pretty well, but I wonder whether it could become simpler.
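For completeness, the serving half of this workaround does not even need Apache; a sketch using Python's built-in static file server (port and hostname are my assumptions), paired with a hosts-file line like "127.0.0.1  myproject.local", would look like:

```python
# Hypothetical sketch: serve the directory containing urls.html on
# loopback, so http://myproject.local:8000/urls.html (via a hosts-file
# entry) passes the Start URL's TLD check.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve(directory, port=8000):
    """Serve `directory` as static files on 127.0.0.1:`port` (blocks)."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    HTTPServer(("127.0.0.1", port), handler).serve_forever()

# Usage (blocks until interrupted):
# serve(".")  # "." = directory containing urls.html
```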
(b) It would open the possibility of wrapping a JSON data file in our own parser script that generates an HTML-looking file locally, then asking Webscraper.io to scrape that local HTML file. That way, we might be able to scrape JSON-only data files "directly" (after predicting and organizing the JSON file API/URLs).
The above can still be done with our own local server, but the current restrictions make it more difficult by disallowing locally served JS- or jQuery-scripted files.
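As a sketch of the JSON-wrapping idea: read a JSON payload and emit an HTML table whose cells carry class names that CSS selectors can target. The field names ("name", "price") and the overall shape of the JSON are assumptions for illustration:

```python
# Hypothetical sketch: render a JSON array of objects as a flat HTML
# table, giving each cell a class equal to its field name so a
# selector-based scraper can pick the values out.
import html
import json

def json_to_html(json_text, fields):
    """Return an HTML table built from a JSON array of objects."""
    records = json.loads(json_text)
    head = "".join(f"<th>{html.escape(f)}</th>" for f in fields)
    rows = "".join(
        "<tr>"
        + "".join(
            f'<td class="{html.escape(f)}">{html.escape(str(r.get(f, "")))}</td>'
            for f in fields
        )
        + "</tr>"
        for r in records
    )
    return f"<table><tr>{head}</tr>{rows}</table>"

if __name__ == "__main__":
    # Illustrative JSON payload, not a real API response.
    data = '[{"name": "Widget", "price": 9.99}]'
    print(json_to_html(data, ["name", "price"]))
```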
Are there reasons for currently restricting or excluding the "file://" protocol and invalid TLDs? URL validation is a good point; could the validation result appear as a warning text only?
Thanks for the great software!