Python: Scraping a Local Web Page with Beautiful Soup

In order to get related data from other web pages, some websites use web scraping to parse HTML files and extract information. Learn how to use the Beautiful Soup library to parse a web page.

Learn the HTML structure

To read the content of a web page, we first need to understand some basic HTML tags.

Anatomy of an HTML Tag

Tag Name   Description
<html>     The root-level tag of an HTML document. It encapsulates all other HTML tags.
<head>     The head section of an HTML document, which contains metadata about the page.
<title>    The title of the web page, displayed on the browser tab.
<body>     The body of an HTML document, with all displayed content.
<h1>       A level 1 heading, for example the title of a news article.
<p>        A paragraph of displayed content.
<div>      A container for page elements that divides the HTML document into sections.
<a>        A hyperlink that links one page to another.
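To see how these tags nest inside one another, here is a minimal sketch that prints an indented outline of a small HTML snippet. It uses Python's standard-library `html.parser` rather than Beautiful Soup, and the `TagOutline` class and the `sample` markup are made up for illustration only.

```python
from html.parser import HTMLParser


class TagOutline(HTMLParser):
    """Collect opening tags, indented by their nesting depth (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Record the tag at the current depth, then go one level deeper.
        self.lines.append("  " * self.depth + f"<{tag}>")
        self.depth += 1

    def handle_endtag(self, tag):
        # A closing tag brings us back up one level.
        self.depth -= 1


# Hypothetical sample markup using the tags from the table above.
sample = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Heading</h1>"
    "<div><p>Text <a href='#'>link</a></p></div>"
    "</body></html>"
)

outline = TagOutline()
outline.feed(sample)
print("\n".join(outline.lines))
```

The outline shows `<html>` at the top level, `<head>` and `<body>` one level below it, and the hyperlink nested inside the paragraph inside the `<div>`.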

Create a sample local web page

  1. Create a sample HTML file for the demo in this article.
  2. Create CSS styles named redtext and leftmargin to use on the hyperlinks.
  3. Add hyperlinks to the libraries and the directory of the University of Kentucky.
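The steps above could be sketched as the short script below, which writes a sample page to disk. This is only an approximation of the page described here: the file name `uky_example.html`, the exact CSS values, the page text, and the two URLs are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed sample page: the redtext and leftmargin class names come from the
# steps above; the CSS values, text, and URLs are illustrative guesses.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
  <title>UKY example</title>
  <style>
    .redtext { color: red; }
    .leftmargin { margin-left: 40px; }
  </style>
</head>
<body>
  <h1>University of Kentucky</h1>
  <p>Useful links:</p>
  <div>
    <a class="redtext leftmargin" href="https://libraries.uky.edu">Libraries</a>
    <a class="redtext leftmargin" href="https://directory.uky.edu">Directory</a>
  </div>
</body>
</html>
"""


def write_sample_page(path="uky_example.html"):
    """Write the sample HTML to a local file and return its path."""
    target = Path(path)
    target.write_text(SAMPLE_HTML, encoding="utf-8")
    return target


page = write_sample_page()
print(f"Wrote {page}")
```

Opening the resulting file in a browser should show the heading followed by two red, indented hyperlinks.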
UKY example html
html display

Extract Information with Beautiful Soup

local_parse code
  1. Import the Beautiful Soup library from bs4 (Beautiful Soup 4).
  2. Open the local web page that we have created.
  3. Load it with the Beautiful Soup HTML parser to generate a BeautifulSoup object, which represents the document as a nested data structure.
  4. Use findAll (named find_all in current versions of Beautiful Soup 4) to get the related tags.
  5. Print the <a> hyperlink tags and their text.
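The five steps above can be sketched roughly as follows. This is not necessarily the exact script used in this article: the `extract_links` helper and the `uky_example.html` file name are assumptions, and the sketch uses `find_all`, the current spelling of `findAll`. It requires the `beautifulsoup4` package to be installed.

```python
from pathlib import Path

from bs4 import BeautifulSoup  # step 1: import Beautiful Soup 4


def extract_links(html_path):
    """Return (text, href) pairs for every <a> tag in a local HTML file."""
    # Step 2: open the local web page.
    with open(html_path, encoding="utf-8") as fh:
        # Step 3: build the BeautifulSoup object with the built-in HTML parser.
        soup = BeautifulSoup(fh, "html.parser")
    # Step 4: collect all hyperlink tags.
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]


# Hypothetical file name; nothing is printed if the file does not exist.
page = Path("uky_example.html")
if page.exists():
    # Step 5: print each hyperlink's text and target.
    for text, href in extract_links(page):
        print(text, "->", href)
```

Returning the links as a list of pairs, instead of printing inside the function, makes it easy to reuse the same helper later for filtering or saving the results.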
local_parse output

You can now scrape data from an offline web page. Next, we will discuss scraping a live web page.


Take care and see you again.
