Python Web Scraping + Data Processing Project (1): Scraping Book Titles from an E-book Website


In this article, we will learn how to use Python's requests, BeautifulSoup, and pandas to build a simple web-scraping project.

1. Install the modules

pip install requests
pip install beautifulsoup4
pip install pandas

2. Start scraping

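The site we want to scrape is All IT eBooks (Free IT eBooks Download), which currently has 831 pages in total. Every page lists e-book titles, and I want to download the titles from all 831 pages. Grabbing them one page at a time would be tedious, so how can we automate it?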

import requests                # sends the HTTP requests
from bs4 import BeautifulSoup  # parses the returned HTML
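The article does not show the scraping loop itself, so here is a minimal sketch of one way to write it. This is not the author's original code. It assumes the listing pages live at `http://www.allitebooks.org/page/N/` and that each title is a link inside `<h2 class="entry-title">`. Both the URL pattern and the CSS selector are hypothetical, so check them against the real page source before relying on them. The idea is simply to loop over page numbers 1 to 831, download each page with requests, and pull out the titles with BeautifulSoup.

```python
import time

import requests
from bs4 import BeautifulSoup

# ASSUMPTION (hypothetical): listing pages follow this URL pattern.
BASE_URL = "http://www.allitebooks.org/page/{}/"
# ASSUMPTION (hypothetical): each book title is a link inside <h2 class="entry-title">.
TITLE_SELECTOR = "h2.entry-title a"


def page_url(page: int) -> str:
    """Build the URL of one listing page (page numbers start at 1)."""
    return BASE_URL.format(page)


def parse_titles(html: str) -> list:
    """Extract every book title from the HTML of one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(TITLE_SELECTOR)]


def scrape_titles(last_page: int = 831, delay: float = 1.0) -> list:
    """Fetch pages 1..last_page in order and collect all titles."""
    titles = []
    with requests.Session() as session:  # reuse one connection for all pages
        for page in range(1, last_page + 1):
            resp = session.get(page_url(page), timeout=10)
            resp.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
            titles.extend(parse_titles(resp.text))
            time.sleep(delay)  # pause between requests to go easy on the server
    return titles
```

Calling `scrape_titles()` walks all 831 pages. A short trial run such as `scrape_titles(last_page=2)` is a cheaper way to confirm the selector first. The one-second `delay` is an arbitrary, polite default, not a limit published by the site.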

3. Analyze the data

After running the program, we copied the book titles and pasted them into Excel to create an xlsx file, as shown below:

[Figure: the scraped book titles saved in an Excel xlsx file]
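Copying the titles into Excel by hand works, but pandas can also write the file directly. Here is a minimal sketch, assuming the titles are already in a Python list of strings (called `titles` here). The column name `Title` is hypothetical, because the original spreadsheet appears only as a screenshot. Writing .xlsx with `DataFrame.to_excel` also requires the openpyxl package.

```python
import pandas as pd

# Hypothetical column name; the article's spreadsheet layout is not given in text.
COLUMN = "Title"


def titles_to_frame(titles: list) -> pd.DataFrame:
    """Put the scraped titles into a one-column DataFrame."""
    return pd.DataFrame({COLUMN: titles})


def save_titles(titles: list, path: str = "All IT eBooks.xlsx") -> None:
    """Write the titles to an xlsx file (needs openpyxl installed)."""
    # index=False: do not write the row index as an extra column
    titles_to_frame(titles).to_excel(path, index=False)
```

Passing `index=False` keeps pandas from writing the row index as an extra column, so reading the file back with `pd.read_excel` returns only the title column.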

Next, we use pandas to analyze the data:

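import pandas as pd                       # import pandas
df = pd.read_excel("All IT eBooks.xlsx")  # read the xlsx file (reading .xlsx needs openpyxl: pip install openpyxl)
print(df.head())                          # show the first 5 rows
print(df.describe())                      # generate descriptive statistics for the data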
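If the spreadsheet holds only the title column, `describe()` will not report means or quartiles. For text columns it reports `count`, `unique`, `top` (the most frequent value), and `freq` (how often `top` occurs). The sketch below shows this on a few made-up titles (toy data, not scraped results). It also tries a few follow-up questions that one might ask of the real file. The column name `Title` is hypothetical.

```python
import pandas as pd

# Toy data for illustration only; these are NOT results scraped from the site.
# "Title" is a hypothetical column name.
df = pd.DataFrame({"Title": [
    "Learning Python",
    "Python Data Science Handbook",
    "Go in Action",
    "Learning Python",
]})

# On an all-text DataFrame, describe() returns the rows count, unique, top, freq.
summary = df.describe()
print(summary)

# Titles that appear more than once (keep=False marks every copy).
duplicates = df[df["Title"].duplicated(keep=False)]
print(duplicates)

# How many titles mention Python, ignoring case.
python_count = int(df["Title"].str.contains("python", case=False).sum())
print(python_count)

# Numeric summary of title lengths in characters.
print(df["Title"].str.len().describe())
```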

Written by

Machine Learning / Deep Learning / Python / Flutter cakeresume.com/yanwei-liu
