I am trying to extract the hyperlinks from websites, but only the ones that appear on the main page. But when I pass the URL, the scraper collects all the href links within the domain. Is there any way to scrape only the main page?
Here is the function I use for each website:
import httplib2
import bs4 as bs
from bs4 import SoupStrainer

def scrap_(website):  # website address is given to the function as a string
    try:
        http = httplib2.Http()
        status, response = http.request('https://' + website)
        # parse only the <a> tags instead of building the whole tree
        for link in bs.BeautifulSoup(response, 'html.parser',
                                     parse_only=SoupStrainer('a')):
            if link.has_attr('href'):
                print(link['href'])
    except ConnectionRefusedError:
        pass  # skip sites that refuse the connection
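
To make the goal concrete, this is roughly the filtering I have in mind (a minimal sketch; the main_page_links name and the rule that a "main page" link is one resolving to an empty or "/" path on the same host are my own assumptions, not something the function above already does):

from urllib.parse import urljoin, urlparse

def main_page_links(website, hrefs):
    # Resolve each href against the site root and keep only those
    # that point back at the main page of the same host.
    base = 'https://' + website
    host = urlparse(base).netloc
    kept = []
    for href in hrefs:
        absolute = urljoin(base, href)  # make relative links absolute
        parts = urlparse(absolute)
        if parts.netloc == host and parts.path in ('', '/'):
            kept.append(absolute)
    return kept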