Scraping Links from a Webpage in Python

In this tutorial, we will learn how to scrape links from a webpage in Python by implementing a Python program that extracts all the links from a given webpage.
Submitted by Aditi Ankush Patil, on May 17, 2020


  1. urllib3: A powerful, user-friendly HTTP client for Python with many features, such as thread safety, client-side SSL/TLS verification, connection pooling, and file uploads with multipart encoding.
    Installing urllib3:
        $ pip install urllib3
  2. BeautifulSoup: A Python library for pulling data out of HTML and XML files, i.e., for scraping information from webpages.
    Installing BeautifulSoup:
        $ pip install beautifulsoup4
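Once installed, BeautifulSoup can be tried out on a small in-memory HTML snippet before fetching a real page. This is a minimal sketch (the HTML string and URLs here are made up for illustration):

```python
from bs4 import BeautifulSoup

# A small, self-contained HTML snippet (hypothetical example page)
html = """
<html><body>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
</body></html>
"""

# Parse the snippet and collect the href attribute of every anchor tag
soup = BeautifulSoup(html, 'html.parser')
links = [tag.get('href') for tag in soup('a')]
print(links)
```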

Commands Used:

html = urllib.request.urlopen(url).read(): Opens the URL and reads the entire page, newlines and all, into one value (a bytes object).

soup = BeautifulSoup(html, 'html.parser'): Parses the downloaded document with Python's built-in HTML parser and returns a BeautifulSoup object we can query.

tags = soup('a'): Returns a list of all the anchor (<a>) tags in the document; calling the soup object is shorthand for soup.find_all('a').

tag.get('href', None): Extracts the value of the tag's href attribute, returning None if the tag has no href.
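Note that href values are often relative paths rather than full URLs. The program below imports urllib.parse, which provides urljoin for resolving a relative href against the page's URL. A small sketch (the base URL and href value here are hypothetical):

```python
from urllib.parse import urljoin

# Hypothetical page URL and a relative href pulled from one of its anchor tags
base = "https://example.com/articles/"
href = "page2.html"

# urljoin resolves the relative href against the base URL
absolute = urljoin(base, href)
print(absolute)
```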

Python program to scrape links from a webpage

# import statements
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

# Get links
# URL of a WebPage
url = input("Enter URL: ") 

# Open the URL and read the whole page
html = urllib.request.urlopen(url).read()
# Parse the string
soup = BeautifulSoup(html, 'html.parser')
# Retrieve all of the anchor tags
# Returns a list of all the links
tags = soup('a')

# Print all the links in the list tags
for tag in tags:
  # Get the data from the href key
  print(tag.get('href', None))
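The program above uses urllib.request even though urllib3 was installed earlier. For completeness, here is a sketch of the same idea using urllib3's PoolManager instead; the function names (extract_links, get_links) are our own, not part of either library:

```python
import urllib3
from bs4 import BeautifulSoup

def extract_links(html):
    # Parse the raw HTML and collect every href value (None when absent)
    soup = BeautifulSoup(html, 'html.parser')
    return [tag.get('href', None) for tag in soup('a')]

def get_links(url):
    # PoolManager handles connection pooling; request() returns an HTTPResponse
    # whose .data attribute holds the response body as bytes
    http = urllib3.PoolManager()
    response = http.request('GET', url)
    return extract_links(response.data)
```

The parsing step is kept in its own function so it can be exercised without network access, e.g. extract_links('<a href="/x">x</a>') returns ['/x'].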


Output:

Enter URL:
