Python Web Scraping Study Notes (5): Scraping News Articles with newspaper3k
Jul 17, 2019
Installation
pip install newspaper3k
pip install nltk
Then open CMD, start Python, and download the NLTK punkt data:
python
import nltk
nltk.download('punkt')
Basic workflow
>>> from newspaper import Article
>>> article = Article('https://www.npr.org/2019/07/10/740387601/university-of-texas-austin-promises-free-tuition-for-low-income-students-in-2020')
>>> article.download()
>>> article.parse()
>>> article.nlp()
-------------------------------------------------------------------
>>> article.authors  # authors
['Vanessa Romo', 'Claire Mcinerny']
>>> article.publish_date  # publication time
datetime.datetime(2019, 7, 10, 0, 0)
>>> article.keywords  # article keywords
['free', 'program', '2020', 'muñoz', 'offering', 'loans', 'university', 'texas', 'texasaustin', 'promises', 'families', 'lowincome', 'students', 'endowment', 'tuition']
>>> print(article.text)  # full text
University of Texas-Austin Promises Free Tuition For Low-Income Students In 2020
toggle caption Jon Herskovitz/Reuters
Four year colleges and universities have difficulty recruiting...
>>> print(article.summary)  # summary
University of Texas-Austin Promises Free Tuition For Low-Income Students In 2020
toggle caption Jon Herskovitz/Reuters
Four year colleges and universities have difficulty recruiting talented students from the lower end of the economic spectrum who can't afford to attend such institutions without taking on massive debt.
To remedy that — at least in part — the University of Texas-Austin announced it is offering full tuition scholarships to in-state undergraduates whose families make $65,000 or less per year.
The endowment — which includes money from oil and gas royalties earned on state-owned land in West Texas — more than doubles an existing program offering free tuition to students whose families make less than $30,000.
It also expands financial assistance to middle class students whose families earn up to $125,000 a year, compared to the current $100,000.
In 2008, Texas A&M began offering free tuition to students whose families' income was under $60,000.
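The fields read in the session above can be gathered into a plain dictionary, which is handy when saving results as JSON. The following is a minimal sketch, not part of newspaper3k: the helper name article_to_record is hypothetical, and the sample object is a stand-in built from the values printed above so that the snippet runs without network access. A parsed Article could be passed in its place.

```python
import json
from datetime import datetime
from types import SimpleNamespace


def article_to_record(article):
    """Collect the fields read above into a JSON-friendly dict.

    `article` is any object exposing authors, publish_date, keywords,
    text and summary (a parsed newspaper Article, or a stand-in).
    This helper is a hypothetical illustration, not a library function.
    """
    published = article.publish_date
    return {
        "authors": list(article.authors),
        # publish_date can be empty when no date is found on the page
        "publish_date": published.isoformat() if published else None,
        "keywords": list(article.keywords),
        "text": article.text,
        "summary": article.summary,
    }


# Stand-in object (assumption) so the sketch runs offline
sample = SimpleNamespace(
    authors=["Vanessa Romo", "Claire Mcinerny"],
    publish_date=datetime(2019, 7, 10, 0, 0),
    keywords=["free", "tuition"],
    text="...",
    summary="...",
)
record = article_to_record(sample)
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Converting publish_date with isoformat() is needed because json.dumps cannot serialize datetime objects directly.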
Complete program: using The News Lens as an example
from newspaper import Article
URL='https://www.thenewslens.com/article/121889'
article = Article(URL)
article.download()
article.parse()
article.nlp()
print("Authors:", article.authors)  # authors
print("Published:", article.publish_date)  # publication time
print("Full text:", article.text)  # full text
print("Keywords:", article.keywords)  # keywords
print("Summary:", article.summary)  # summary
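The download, parse, nlp sequence used above can be wrapped in a small function so it is easy to reuse for several URLs. The sketch below is an illustration under assumptions: summarize_url and its article_cls parameter are hypothetical names, and passing language="zh" is a guess that may suit a Chinese-language site such as The News Lens (Article accepts a language keyword, but the original program relies on the default and the effect on keywords and summary was not checked here).

```python
def summarize_url(url, language="zh", article_cls=None):
    """Download, parse and run NLP on one URL, then return the article.

    `article_cls` exists only so the flow can be exercised with a
    stand-in class; by default newspaper's Article is used.
    """
    if article_cls is None:
        # Imported lazily so the function can be defined without newspaper3k
        from newspaper import Article
        article_cls = Article

    article = article_cls(url, language=language)
    article.download()  # fetch the HTML
    article.parse()     # extract authors, date and text
    article.nlp()       # compute keywords and summary
    return article


# Example call (requires newspaper3k and network access):
# article = summarize_url('https://www.thenewslens.com/article/121889')
# print(article.keywords)
```

The order of the three calls matters: parse() needs the downloaded HTML, and nlp() needs the parsed text.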