I am trying to extract the hyperlinks from websites, but only the ones that appear on the main page. But when I pass the URL, the scraper collects all the href links within the domain. Is there any way to scrape only the main page?
Here is the function I use for each website:
import httplib2
import bs4 as bs
from bs4 import SoupStrainer

def scrap_(website):  # website address is given to the function as a string
    try:
        http = httplib2.Http()
        status, response = http.request('https://' + website)
        # parse only the <a> tags instead of building the whole tree
        for link in bs.BeautifulSoup(response, 'html.parser',
                                     parse_only=SoupStrainer('a')):
            if link.has_attr('href'):
                print(link['href'])
    except ConnectionRefusedError:
        pass  # skip sites that refuse the connection
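
To make the goal concrete, this is roughly the filtering I have in mind (a minimal sketch; the main_page_links name and the rule that a "main page" link is one resolving to an empty or "/" path on the same host are my own assumptions, not something the function above already does):

from urllib.parse import urljoin, urlparse

def main_page_links(website, hrefs):
    # Resolve each href against the site root and keep only those
    # that point back at the main page of the same host.
    base = 'https://' + website
    host = urlparse(base).netloc
    kept = []
    for href in hrefs:
        absolute = urljoin(base, href)  # make relative links absolute
        parts = urlparse(absolute)
        if parts.netloc == host and parts.path in ('', '/'):
            kept.append(absolute)
    return kept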