Hey foxes!
I need to scrape the source code of a page. I tried with this -> view-source:https://www.mysite.com but I cannot scrape the page. Web Scraper doesn't let me save the URL.
Any help?
THX a looot
The scraper has access to the page source regardless, so you do not need to add the "view-source:" part in the metadata. The only catch is that you will not be able to use the point-and-click interface for it, so you will have to build the scraper by creating the selectors manually and typing them out.
Ah OK, thanks! I tried a few times, but I don't see how to scrape it with Web Scraper and extract the "logging_page_id":"profilePage_6887304"
from this page: view-source:https://www.instagram.com/tf1/
Could you help me a bit?
Thanks a lot
This is an interesting one, I know what you're trying to extract - the Instagram User ID. You can get that info by using this link:
https://www.instagram.com/web/search/topsearch/?query=[exact username]
So for your example it would be:
https://www.instagram.com/web/search/topsearch/?query=tf1
The results page is a plain text file, so there is only one selector, pre. Grab the text from that, and you can then pick out the user ID with a regex. This is assuming the first line contains the correct user ID. The regex I'm using is \b\d{6,10}\b , which picks out the first 6-10 digit number it finds. That's why you'll need the exact username. Sample sitemap:
{"_id":"instagram_get_user_id","startUrl":["https://www.instagram.com/web/search/topsearch/?query=tf1"],"selectors":[{"id":"user_id","type":"SelectorText","parentSelectors":["_root"],"selector":"pre","multiple":false,"regex":"\\b\\d{6,10}\\b","delay":0}]}
you're the best!! Thanks a lot!