Scrape emails from URL
Tags: #beautifulsoup #python #scraping #emails #url #webscraping #html
Last update: 2023-04-12 (Created: 2023-02-16)
Description: This notebook shows how to scrape email addresses from an HTML webpage using BeautifulSoup.
import re
import requests
from urllib.parse import urlsplit
from collections import deque
from bs4 import BeautifulSoup
import pandas as pd
url: URL of the webpage to scrape
limit: number of emails to find before scraping stops
url = "https://www.naas.ai/"
limit = 3
We will use the requests library to get the HTML content of the webpage and the BeautifulSoup library to parse it. We will then use a regular expression to extract the emails from the HTML content.
unscraped = deque([url])  # queue of URLs still to visit
scraped = set()  # URLs already visited
emails = set()  # unique email addresses found so far
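The regular-expression step described above can be sketched in isolation. This is a minimal illustration, not the notebook's own implementation: `sample_html`, `EMAIL_PATTERN`, and `extract_emails` are hypothetical names, the HTML is a hard-coded stand-in for the text that requests would return, and the pattern is a deliberately simple approximation rather than a full email-address validator. For brevity it applies the regex to the raw HTML string and skips the BeautifulSoup parsing step.

```python
import re

# Hypothetical sample HTML, standing in for the body of a fetched page.
sample_html = """
<html><body>
  <p>Contact: <a href="mailto:hello@example.com">hello@example.com</a></p>
  <p>Support: support@example.org, sales@example.co.uk</p>
</body></html>
"""

# Simple pattern (an assumption): local part, "@", then a dotted domain.
EMAIL_PATTERN = re.compile(
    r"[a-z0-9._+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE
)


def extract_emails(html, found=None, limit=None):
    """Add email addresses matched in `html` to the set `found`.

    Stops adding once `found` holds `limit` addresses (if a limit is given).
    """
    found = set() if found is None else found
    for match in EMAIL_PATTERN.findall(html):
        if limit is not None and len(found) >= limit:
            break
        found.add(match.lower())  # a set keeps each address only once
    return found


emails_found = extract_emails(sample_html, limit=3)
print(sorted(emails_found))
```

Collecting matches in a set means an address that appears both in a `mailto:` link and in the visible text is counted once, which is why `limit` is checked against the size of the set rather than the number of matches.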