Scrape emails from URL
Tags: #beautifulsoup #python #scraping #emails #url #webscraping #html
Last update: 2023-04-12 (Created: 2023-02-16)
Description: This notebook shows how to scrape email addresses from an HTML webpage using BeautifulSoup.
import re
import requests
from urllib.parse import urlsplit
from collections import deque
from bs4 import BeautifulSoup
import pandas as pd
url: URL of the webpage where scraping starts
limit: number of emails to collect before scraping stops
url = "https://www.naas.ai/"
limit = 3
We will use the requests library to get the HTML content of the webpage and the BeautifulSoup library to parse the HTML content. We will use a regular expression to extract the emails from the HTML content.
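The regular-expression step can be sketched on its own, without any network access. This is a minimal illustration, not necessarily the pattern this notebook uses: the function name extract_emails, the EMAIL_PATTERN constant, and the sample HTML are all hypothetical, and the pattern is deliberately simple, so it will miss some valid addresses and accept some invalid ones.

```python
import re

# Assumption: a deliberately simple pattern for illustration only.
# It matches "local@domain.tld" and does not implement full RFC 5322 validation.
EMAIL_PATTERN = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", re.IGNORECASE)


def extract_emails(html: str) -> set:
    """Return the unique email-like strings found in raw HTML text."""
    # findall returns whole matches because the pattern has no capture groups.
    return set(EMAIL_PATTERN.findall(html))


# Hypothetical sample page content, standing in for a fetched response body.
sample_html = (
    '<a href="mailto:hello@example.com">Contact</a>'
    "<p>Sales: sales@example.org</p>"
)

print(sorted(extract_emails(sample_html)))
```

Returning a set keeps duplicates out when the same address appears on several pages, which fits a crawl that stops once a fixed number of distinct emails has been found.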
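unscraped = deque([url])  # queue of URLs still to visit, seeded with the start URL
scraped = set()  # URLs already visited
emails = set()  # unique emails found so far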