Can anyone please help me create a sitemap?

https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx?webcode=ccfsearch

Can anyone please help me create a sitemap?

The link above opens the coach finder. On the left-hand side, set the demographic location to United States; the search then returns more than 6,000 results.

My problem is that I cannot find a link on any of the result elements. Clicking an element loads a different URL within the same page, so I am not able to scrape the data from this page-within-a-page. Example: https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx?webcode=ccfcoachprofileview&coachcstkey=2131C545-4A51-4173-8DCD-4609F67414F7
From that profile page I then want some data, such as name and address.

Looking forward to your support.

Thank you.

Hey there,

I'm not an expert but you need to find something called "iframe id" (see pic below)

I checked the source code of the website you provided but couldn't identify the iframe id. I'm sure other members could be of help.

Can't seem to get it to work. I played around with a few CSS selectors but none worked. @iconoclast?

@hossain007 and @bretfeig Thank you very much for your response.

I am still waiting for a solution to my query. Can anyone please assist me?

Looking forward to your reply.

Thank you.

Hi!

For some reason the scraper just stops after the second coach. There is a workaround: since every coach has a coach key stored on the Compare button, you can collect all the keys and then create a sitemap that scrapes the individual profiles using the scraped keys.

Here's a sitemap that will paginate through all pages and pick coach keys and names accordingly:

{"_id":"coach_fed","startUrl":["https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx?webcode=ccfsearch"],"selectors":[{"id":"test","type":"SelectorElementClick","selector":"div.ui.content div div.ui > div.content","parentSelectors":["_root"],"multiple":true,"delay":"4000","clickElementSelector":"a.icon.borderless","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"coach_key","type":"SelectorElementAttribute","selector":"div.ui.custom.right.floated.checkbox > input","parentSelectors":["test"],"multiple":false,"extractAttribute":"value","delay":0},{"id":"name","type":"SelectorText","selector":"div.header","parentSelectors":["test"],"multiple":false,"regex":"","delay":0}]}

Then you will have to create a second sitemap whose start URLs are the URL below, used as a prefix, followed by each collected key.

https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx?webcode=ccfcoachprofileview&coachcstkey= 

I strongly recommend narrowing the results, for example to a particular state or any other criteria you would like to sort coaches by.
I can help you with a macro to create the URL list for Web Scraper. I believe you don't want to add a thousand or more URLs one by one with the [ + ] button in Metadata.
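For illustration only, here is a minimal sketch of what such a macro could look like in Python. It assumes the coach keys have already been exported from the first sitemap into a plain list. The function name `build_profile_sitemap` and the sitemap id `coach_profiles` are made up for this example, and the `selectors` list is left empty because the selectors for the profile page still have to be added by hand. The output only mirrors the `_id` / `startUrl` / `selectors` layout of the sitemap posted above.

```python
import json

# URL prefix quoted in this thread; the coach key is appended to it.
PROFILE_PREFIX = (
    "https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx"
    "?webcode=ccfcoachprofileview&coachcstkey="
)


def build_profile_sitemap(coach_keys, sitemap_id="coach_profiles"):
    """Return a sitemap dict with one start URL per unique coach key.

    `sitemap_id` is a hypothetical name; the selectors are intentionally
    left empty and must be filled in for the profile page.
    """
    seen = set()
    start_urls = []
    for key in coach_keys:
        key = key.strip()
        if not key or key in seen:
            continue  # skip blank rows and duplicate keys
        seen.add(key)
        start_urls.append(PROFILE_PREFIX + key)
    return {"_id": sitemap_id, "startUrl": start_urls, "selectors": []}


if __name__ == "__main__":
    # Two keys that appear in this thread, used purely as sample input.
    keys = [
        "2131C545-4A51-4173-8DCD-4609F67414F7",
        "FFF3FEE1-00FA-4FA6-A2F8-048C5D75AE7D",
    ]
    print(json.dumps(build_profile_sitemap(keys), indent=2))
```

The printed JSON can then be pasted into the import dialog, after which the profile selectors are added in the usual way.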

P.S. If you want to stop scraping manually, open Developer Tools inside the scrape window and delete the 'next page' button (shown as a right arrow on the page). The scrape will then finish successfully and the results will be shown properly.

Let me see if I can assemble the sitemap based on the one Iconoclast recommended. It might take me a bit to scrape 6500 records and build the sitemap.

Here is the full JSON file to scrape it all. It's too large to paste here:

https://jsoneditoronline.org/?id=7746526196844818bc3bb23cd6804dd1

Awesome work there, Bret!

@sss, all that's left for you to modify is the actual selectors for the coach pages: name, address, e-mail, etc.
Please keep in mind that this sitemap consumes a lot of memory. I would recommend installing CouchDB beforehand and using it instead of Chrome's internal storage, which might crash.

Please refer to this topic on installing CouchDB:

@iconoclast, @bretfeig Thank you very much. It's working!! It's amazing, I am so excited.

You people are geniuses. Hats off to you.

Here you go! With Iconoclast's help, I got you the data.

Thank you so much for supporting me, @bretfeig. I am grateful to you. You made my day. :grinning: :+1: :+1:

@bretfeig @iconoclast @hossain007 I am glad to meet you all.

Hi Everyone,

@iconoclast, @bretfeig, I tried to scrape some data myself from the same website, following your directions, for a limited search result. I am facing a challenge again. @bretfeig, you have helped me a lot by scraping the whole data set. I would now like to ask for guidance so that I can do it myself.

As part of learning, below are the steps I followed, which did not work. Can you please help me scrape the data?

Step 1: I scraped the coach keys for five selected states (the search result showed 193 coaches).
Step 2: I prepared a JSON file and imported it as a sitemap into Web Scraper.
Step 3: I tried to scrape the data, but the page asks me to log in. When I refresh the Browse data view, the column shows NULL.

Please have a look at the images below:

[Image: the five selected US states]

[Image: search result with 193 coaches in total, and the sitemap start URLs created by importing the JSON file]

[Image: no data is scraped; the page asks to sign in]

[Image: after refreshing, the data shows NULL; nothing was scraped]

@iconoclast, @bretfeig, can you please help me find where I went wrong?

Looking forward to your reply.

Thank you for your continuous support.

Post your sitemap and I'll have a look.

This was my sitemap:
https://goo.gl/v9Y7pS

Hi @bretfeig,

Here is my sitemap:
https://drive.google.com/file/d/1Io6eikE36rtP1KLYVwbB8GhD8bmtvwUC/view?usp=sharing

Can you please give me access to your sitemap? The link https://goo.gl/v9Y7pS says that I need permission to view it. I have also sent an access request.

Thank you.

Done! You can now see my sitemap. Apologies.

I'll have a look at yours.

It seems your links are incorrect. The only difference is the webcode value: my links use ccfcoachprofileview, while yours use ccfsearchprofileview.

My link: https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx?webcode=ccfcoachprofileview&coachcstkey=FFF3FEE1-00FA-4FA6-A2F8-048C5D75AE7D

Your link: https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx?webcode=ccfsearchprofileview&coachcstkey=7F824E68-5215-4334-A226-3A48C2806DD5
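A mistake like this is easy to miss by eye in a list of a few hundred start URLs, so it may be worth checking them before importing. The snippet below is only a sketch of one way to do that in Python. The helper name `find_bad_urls` is invented for this example, and the expected webcode value is simply taken from the working link in this post.

```python
from urllib.parse import parse_qs, urlsplit

# Webcode used by the working profile links in this thread.
EXPECTED_WEBCODE = "ccfcoachprofileview"


def find_bad_urls(urls):
    """Return the URLs whose `webcode` query parameter is not the expected one."""
    bad = []
    for url in urls:
        query = parse_qs(urlsplit(url).query)
        # parse_qs maps each parameter to a list of values; take the first.
        webcode = query.get("webcode", [""])[0]
        if webcode != EXPECTED_WEBCODE:
            bad.append(url)
    return bad


if __name__ == "__main__":
    base = "https://apps.coachfederation.org/eweb/CCFDynamicPage.aspx"
    urls = [
        base + "?webcode=ccfcoachprofileview&coachcstkey=FFF3FEE1-00FA-4FA6-A2F8-048C5D75AE7D",
        base + "?webcode=ccfsearchprofileview&coachcstkey=7F824E68-5215-4334-A226-3A48C2806DD5",
    ]
    for url in find_bad_urls(urls):
        print("Unexpected webcode:", url)
```

Run against the two links quoted here, it flags only the second one.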

Thank you very much. It has started working now. :+1: :grinning: