Python Web Development - Django Notes (2): Building a 104 Job Listing Crawler App

Yanwei Liu

Jan 13, 2020

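Django series:

Python Web Development - Django Notes (1): Basic Environment

Python Web Development - Django Notes (3): Mezzanine CMS

Python Web Development - Django Notes (4): LINE Chatbot (Deploying to Heroku)

Python Web Development - Django Notes (5): Database (CRUD)

Python Web Development - Django Notes (6): Adding a Custom Domain to Your Site

Python Web Development - Django Notes (7): Applying a Bootstrap Template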

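Before reading this post, please make sure you have read the article below and understand the basics of how Django works.

Python Web Development - Django Notes (1): Basic Environment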

How To Use Django in 10 minutes

medium.com

Repo:

e96031413/104-Django-APP

github.com

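Prerequisites:

After a long round of testing, I finally turned this crawler into a Django version. Once the Django project is created, these are the files you will mainly be editing: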

settings.py

Add the app to INSTALLED_APPS, set the templates directory, change LANGUAGE_CODE and TIME_ZONE, and set the static directory.

views.py

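Almost all of the application logic runs in views.py, which usually contains several functions.

The logic is written as functions because the second argument of path() in urls.py must be a callable view. Django calls that function whenever a request matches the route, and that is how our code gets executed.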

urls.py

Maps the URLs of the site to views with path().

from django.contrib import admin
from django.urls import path
from JobSearch.views import main, POST_crawl

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', main),
    path('POST_crawl/', POST_crawl),
]
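To see why the second argument has to be a function, here is a toy model of routing in plain Python. It is not Django's implementation, only a sketch of the idea: each route is paired with a callable, and the dispatcher calls whichever one matches. The dispatch helper and the string return values are my own illustration.

```python
# Simplified model of URL routing; this is NOT how Django is implemented.

def main(request):
    """Stand-in view for the home page."""
    return "index.html"


def POST_crawl(request):
    """Stand-in view for the crawl results page."""
    return "result.html"


# Each entry pairs a route string with a callable, like path(route, view).
urlpatterns = [
    ("", main),
    ("POST_crawl/", POST_crawl),
]


def dispatch(url_path, request=None):
    """Find the view registered for url_path and call it with the request."""
    route = url_path.lstrip("/")
    for pattern, view in urlpatterns:
        if pattern == route:
            return view(request)
    raise LookupError(f"no route matches {url_path!r}")
```

Because the list stores the functions themselves rather than their results, nothing runs until a matching request arrives.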

index.html

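Located under templates. This is the home page shown when the app starts. It contains a form; the submitted value is read into a variable in views.py and used there.

result.html

Located under templates. This page displays the results once the crawler has finished.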

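About the {{ }} syntax

In a Django template, a variable is read with {{ variable_name }}.

So, to display values produced by the Python code in an HTML file inside the templates folder, use {{ }} for variables and {% %} for tags such as loops:

{{ greeting }}          {# outputs the value of the greeting variable #}
{% for job in text %}   {# starts a for loop over text #}
    {{ job }}
{% endfor %}            {# closes the loop #}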
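If the {{ }} idea is still fuzzy, this toy function in plain Python mimics the substitution step. It is only a sketch: Django's real template engine also handles tags, filters and HTML escaping. The render_variables name is made up for this example.

```python
import re


def render_variables(template, context):
    """Replace each {{ name }} with str(context[name]); unknown names become ''."""
    pattern = re.compile(r"\{\{\s*(\w+)\s*\}\}")
    return pattern.sub(lambda m: str(context.get(m.group(1), "")), template)


page = render_variables("<p>{{ greeting }}</p>", {"greeting": "Hello"})
```

The context dictionary plays the role of the values the view passes to render().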

That covers the prerequisites. Next comes the implementation.

Implementation

Create the project

django-admin startproject Job_104

Create the JobSearch app

python manage.py startapp JobSearch

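Create the templates folder

Run the following command in the directory that contains manage.py to create the templates folder (md is the Windows command; on macOS or Linux use mkdir templates):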

md templates

Start the server

python manage.py runserver

Configure settings.py

Add the application to settings.py

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'JobSearch',  # the newly added JobSearch app
]

Set the template path in settings.py

Set the templates path so that Django can find the template (.html) files.

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates'),],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
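If Django later complains that a template does not exist, it helps to check what the DIRS entry actually resolves to. Below is a small check in plain Python, assuming the two template names used in this post; missing_templates is a hypothetical helper of mine, not part of Django.

```python
import os
from pathlib import Path


def templates_dir(base_dir):
    """Build the same path that the DIRS entry in settings.py points to."""
    return os.path.join(base_dir, "templates")


def missing_templates(base_dir, names=("index.html", "result.html")):
    """Return the template file names that do not exist under base_dir/templates."""
    folder = Path(templates_dir(base_dir))
    return [name for name in names if not (folder / name).is_file()]
```

Calling missing_templates with the folder that contains manage.py should return an empty list once both HTML files are in place.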

(Optional) Set the language and time zone in settings.py

LANGUAGE_CODE = 'zh-hant'  # Traditional Chinese
TIME_ZONE = 'Asia/Taipei'  # Taipei time zone
USE_I18N = True
USE_L10N = True
USE_TZ = True

Configure urls.py

from django.urls import path
# Import main and POST_crawl from views.py in the JobSearch folder
from JobSearch.views import main, POST_crawl

urlpatterns = [
    path('', main),                   # the home page runs main
    path('POST_crawl/', POST_crawl),  # the POST_crawl page runs POST_crawl
]

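We will get to what main and POST_crawl actually do in a moment.

Configure views.py

As the code below shows, main simply renders index.html, while POST_crawl uses selenium to search for job listings, stores the results, and renders them in result.html.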

from django.shortcuts import render
from django.http import HttpResponse
from django.views.decorators import csrf

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def main(request):
    # Render index.html
    return render(request, 'index.html')


def POST_crawl(request):
    # Read what the user typed into the form in index.html.
    # The input's name is "title", so we read "title" here and store it
    # in keywords for the crawling steps that follow.
    keywords = request.POST["title"]
.
.
(middle omitted)
.
.
.
    # Render result.html; locals() passes every local variable to the template
    return render(request, 'result.html', locals())
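A note on locals(): it hands every local variable of the view to the template, including request and keywords. Here is a minimal sketch in plain Python (no Django needed) of what that dictionary contains, next to an explicit alternative. The function names and the placeholder job strings are my own illustration, not code from the repo.

```python
def context_from_locals(request, keywords):
    """Return what locals() would hand to render() at the end of a view."""
    text = ["job A", "job B"]  # placeholder data, not real crawl output
    return locals()


def explicit_context(request, keywords):
    """Pass only the name the template actually uses."""
    text = ["job A", "job B"]  # placeholder data, not real crawl output
    return {"text": text}
```

locals() is convenient for a small demo like this one. An explicit dictionary makes it easier to see what the template depends on as the view grows.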

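The complete views.py is available in the repo linked above.

Create two files under the templates folder

index.html

This is just a basic HTML form. The part to watch is the form's action: we use /POST_crawl/, which matches path('POST_crawl/', POST_crawl) in urls.py above.

The method is post.

{% csrf_token %} is the template tag Django requires inside forms submitted with POST; it adds a hidden CSRF token field to the form.

The name="title" attribute on the job title input is the key that views.py uses to read the submitted value.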

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
</head>
<body>
    <h1>104 Job Crawler:</h1>
    <p>The program takes a while to run, so please be patient~</p>
    <form action="/POST_crawl/" method="post">
        {% csrf_token %}
        Job title: <input type="text" name="title"><br>
        <input type="submit" value="Submit">
    </form>
</body>
</html>
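To make the link between name="title" and request.POST["title"] concrete, here is a sketch using only the standard library of what the browser sends and how the field names come back out on the server side. The token value is made up; csrfmiddlewaretoken is the name of the hidden field that {% csrf_token %} renders.

```python
from urllib.parse import parse_qs, urlencode

# What the browser sends when the form is submitted (token value is made up).
body = urlencode({"csrfmiddlewaretoken": "example-token", "title": "Python 工程師"})

# What the server recovers; parse_qs maps each field name to a list of values.
fields = parse_qs(body)

# Falling back to an empty string avoids an error when the field is missing.
keywords = fields.get("title", [""])[0]
```

Django does this parsing for you and exposes the result as request.POST. The same defensive idea applies there: request.POST.get("title", "") returns a default instead of raising an error when the page is opened without submitting the form.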

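result.html

{% for job in text %} loops over text, a variable in views.py that holds the job listings, and assigns each item to job in turn.

{{ job }} followed by <br> prints each item on the page; the <br> moves the text to the next line so the output does not run together.

{% endfor %} marks the end of the loop.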

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>104 Job Crawler Results</title>
</head>
<body>
    <h2>The jobs you crawled are listed below:</h2>
    {% for job in text %}
        {{ job }} <br>
    {% endfor %}
</body>
</html>
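For readers more comfortable with Python than with template tags, the loop in result.html does roughly what this plain-Python function does. It is a sketch only: render_jobs is a name I made up, and html.escape stands in for the automatic HTML escaping that Django applies to {{ job }}.

```python
from html import escape


def render_jobs(text):
    """Mimic the loop in result.html: one escaped job per line, each followed by <br>."""
    return "\n".join(f"{escape(job)} <br>" for job in text)
```

Escaping matters here because job titles come from an external site and may contain characters such as & or <.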
