A start URL with a numeric range, such as http://example.com/page/[1-3], scrapes fine.
My question is: how do I scrape pages whose URLs end in letters instead? For example:
If I use http://example.com/page/[a-c], the program reports an error.
Thank you!
Hi!
In order to keep pages arranged the way you want, you need to use CouchDB.
Information about CouchDB server can be found here:
http://webscraper.io/documentation#storage-backends
Please keep in mind that a sitemap with multiple start URLs is processed bottom-up.
P.S. If you meant how to add multiple URLs: open your sitemap's Metadata, then add URLs by pressing the [ + ] button on the right side.
Thank you for the reply. Can you give me a simple example, or some URLs where I can learn tips on using Web Scraper + CouchDB?
You can download the CouchDB installer directly from here: https://dl.bintray.com/apache/couchdb/win/2.1.2/couchdb-2.1.2.msi
Then install it. I would recommend installing it (if possible) on your second drive, in the root of the drive (like D:\CouchDB).
Next, right-click the WebScraper icon in your browser, click Options, then select CouchDB from the list.
Then enter these two URLs in the corresponding fields:
(sitemap) http://127.0.0.1:5984/scraper-sitemaps
(data) http://127.0.0.1:5984/
And there you go.
You can access your CouchDB server instance using this url: http://127.0.0.1:5984/_utils/
I tried to create the following sitemap and viewed it at http://127.0.0.1:5984/_utils/#database/scraper-sitemaps/example
{
  "_id": "example",
  "_rev": "2-32bd47d5bbddc2eb23bc9e3ec7014772",
  "startUrl": [
    "http://example.com/page/a"
  ],
  "selectors": [
    {
      "id": "example",
      "type": "SelectorText",
      "selector": "h1",
      "parentSelectors": [
        "_root"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
Can you teach me how to scrape a set of URLs with the last letter from a to z?
http://example.com/page/a
http://example.com/page/b
http://example.com/page/c
...
http://example.com/page/z
Thank you very much!
Hi!
Please add the URLs within WebScraper itself, using Menu -> Your Sitemap Name -> (dropdown) -> Metadata.
Please add the URLs bottom-up (starting from Z and ending at A). You can add a new URL by pressing the [ + ] button on the right side of the URL list.
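If clicking [ + ] 26 times sounds tedious, the URL list can also be generated outside the browser. The following is only a minimal Python sketch, not an official Web Scraper feature. The helper names letter_urls and build_sitemap are made up for illustration, the selector is copied from the example sitemap posted above, and it is an assumption that the printed JSON can then be brought into Web Scraper (for instance by importing it as a sitemap). The URLs are listed from z down to a to follow the bottom-up advice.

```python
import json
import string

BASE = "http://example.com/page/"  # placeholder base URL from this thread


def letter_urls(base=BASE, reverse=True):
    """Return one URL per lowercase letter; reversed means 'z' comes first."""
    letters = string.ascii_lowercase
    if reverse:
        letters = letters[::-1]
    return [base + ch for ch in letters]


def build_sitemap(urls, sitemap_id="example"):
    """Build a sitemap dict shaped like the example posted earlier."""
    return {
        "_id": sitemap_id,
        "startUrl": urls,
        "selectors": [
            {
                "id": "example",
                "type": "SelectorText",
                "selector": "h1",
                "parentSelectors": ["_root"],
                "multiple": False,
                "regex": "",
                "delay": 0,
            }
        ],
    }


if __name__ == "__main__":
    # Print the sitemap as JSON so it can be copied elsewhere.
    print(json.dumps(build_sitemap(letter_urls()), indent=2))
```

The "_rev" field is left out on purpose, since CouchDB assigns revisions itself.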
In that case I would have to add 26 URLs manually. However, my original goal is to scrape pages whose URLs end in a 3-character string made of digits (0-9) and letters (a-z). Is there a way to add the URLs automatically?
I add URLs automatically using a macro in UltraEdit (paid, more functionality) or Notepad++ (free, 'nuff functionality).
The array [#-#] method works only for numbers though.
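As an alternative to an editor macro, a short script can write out every combination. This is a rough sketch under assumptions: the base URL is the placeholder from this thread, the function name code_urls is hypothetical, and "3-letter string with numbers and alphabet" is read as every 3-character string over 0-9 and a-z. That gives 36 x 36 x 36 = 46656 URLs, so it may be worth scraping them in batches.

```python
from itertools import product
import string

# Characters allowed in each position: digits first, then lowercase letters.
ALPHABET = string.digits + string.ascii_lowercase


def code_urls(base="http://example.com/page/", length=3, alphabet=ALPHABET):
    """Yield one URL for every string of the given length over the alphabet."""
    for combo in product(alphabet, repeat=length):
        yield base + "".join(combo)


if __name__ == "__main__":
    # One URL per line, ready to paste or to post-process in an editor.
    for url in code_urls():
        print(url)
```

Running it with the output redirected to a file gives a plain list that can be pasted wherever the start URLs are edited.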
Hi, there is another way - just use URL encoding (percent encoding) to turn your letters into numbers.
For instance, http://example.com/page/%61 is the same as http://example.com/page/a
%61 = a
%62 = b
%63 = c and so on (refer to chart)
Then you can use the WS number ranges again. However, these are hexadecimal numbers, so a range like this only works for the letters a - i (%61 - %69). The letter j is %6A.
Fun test - where do you think https://forum.webscraper.%69%6F points to?
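To see which letters have a percent code made of digits only, the codes can be listed with a few lines of Python. This is just an illustrative sketch using the standard library; the function names percent_code and decimal_looking_letters are invented here, and nothing in it has been tested against Web Scraper's range syntax.

```python
from urllib.parse import unquote
import string


def percent_code(ch):
    """Return the percent-encoded form of one ASCII character, e.g. 'a' -> '%61'."""
    return "%{:02X}".format(ord(ch))


def decimal_looking_letters():
    """Lowercase letters whose hexadecimal code contains digits only (no A-F)."""
    return [ch for ch in string.ascii_lowercase if format(ord(ch), "X").isdigit()]


if __name__ == "__main__":
    for ch in string.ascii_lowercase:
        code = percent_code(ch)
        # unquote() decodes the percent code back to the original letter.
        print(ch, code, unquote(code))
    print("digits-only codes:", "".join(decimal_looking_letters()))
```

Going by the ASCII table, p through y (%70 - %79) also have digits-only codes, so in principle a second numeric range could cover them, while j - o and z would still have to be added by hand.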