Python Web Scraping Study Notes (7): Handling Image CAPTCHAs

Yanwei Liu
12 min read · Jul 18, 2019

To solve image CAPTCHAs, you may have a better approach:

Text CAPTCHAs

Installation

pip install pytesseract
pip install tesseract-ocr
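
pytesseract is only a wrapper that calls the Tesseract OCR executable, so the engine itself also has to be installed and reachable on PATH. A minimal sketch of a pre-flight check, assuming the executable is named tesseract (the helper name has_tesseract is hypothetical):

```python
import shutil


def has_tesseract(executable="tesseract"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if not has_tesseract():
    print("Tesseract executable not found on PATH; pytesseract calls will fail.")
```

If the engine is installed somewhere that is not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.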

Importing the modules

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

(1) Handling black-and-white digit images

captcha = Image.open("captcha1.png")  # black-and-white image
result = pytesseract.image_to_string(captcha)
print(result)
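
OCR output usually carries trailing whitespace and occasionally a misread character, so for a digits-only CAPTCHA it can help to normalize the string before submitting it. A small sketch, assuming the CAPTCHA contains only digits; clean_digits and the substitution table are hypothetical and not part of pytesseract:

```python
import re

# Assumed look-alike substitutions for digit-only CAPTCHAs (illustrative, not exhaustive).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def clean_digits(raw):
    """Map common letter look-alikes to digits, then drop everything that is not 0-9."""
    return re.sub(r"[^0-9]", "", raw.translate(LOOKALIKES))
```

With the snippet above, this would be called as clean_digits(result) before using the recognized value.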

(2) Handling color digit images

def convert_img(img, threshold):
    img = img.convert("L")  # convert to grayscale
    pixels = img.load()
    for x in range(img.width):
        for y in range(img.height):
            # binarize: brighter than the threshold becomes white, the rest black
            if pixels[x, y] > threshold:
                pixels[x, y] = 255
            else:
                pixels[x, y] = 0
    return img

captcha = convert_img(captcha, 150)  # keep the converted image
result = pytesseract.image_to_string(captcha)
print(result)
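
The per-pixel loop can also be expressed with Pillow's Image.point, which applies a 256-entry lookup table to every pixel of an 8-bit image. A sketch using the same threshold rule; threshold_table and binarize are hypothetical names:

```python
def threshold_table(threshold):
    """Build a 256-entry lookup table: values above the threshold map to 255, the rest to 0."""
    return [255 if value > threshold else 0 for value in range(256)]


def binarize(img, threshold=150):
    """Return a black-and-white copy of a Pillow image using a lookup table."""
    gray = img.convert("L")  # grayscale, a single 8-bit band
    return gray.point(threshold_table(threshold))
```

Unlike the loop, this leaves the input image untouched and returns a new one, so the result has to be assigned, for example captcha = binarize(captcha, 150).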

(3) Handling noisy digit images

data = img.getdata()
w, h = img.size
count = 0
for x in range(1, w - 1):  # x runs over columns, so it is bounded by the width
    for y in range(1, h - 1):
        # look up the neighboring pixels in each direction
        mid_pixel = data[w * y + x]
        if mid_pixel == 0:
            top_pixel = data[w * (y - 1) + x]
            left_pixel = data[w * y + (x - 1)]…
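
The listing above is cut off in the source, so the rest of the author's routine is not shown. As a separate illustration of the general idea, and not a reconstruction of the missing code, here is one common approach: treat a black pixel as noise when too few of its four direct neighbors are also black. The function name, the min_black_neighbors parameter, and the 4-neighbor rule are all assumptions:

```python
def remove_isolated_pixels(data, w, h, min_black_neighbors=1):
    """Return a copy of a flat, row-major pixel list with isolated black pixels turned white.

    Assumes a binarized image: 0 is black (ink), 255 is white (background).
    Border pixels are left untouched, since they lack a full set of neighbors.
    """
    src = list(data)
    out = list(src)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if src[w * y + x] != 0:
                continue  # only black pixels are candidates for removal
            neighbors = (
                src[w * (y - 1) + x],  # top
                src[w * (y + 1) + x],  # bottom
                src[w * y + (x - 1)],  # left
                src[w * y + (x + 1)],  # right
            )
            black = sum(1 for p in neighbors if p == 0)
            if black < min_black_neighbors:
                out[w * y + x] = 255  # too few black neighbors: treat as noise
    return out
```

With Pillow, the cleaned list could be written back with img.putdata(remove_isolated_pixels(img.getdata(), *img.size)). Raising min_black_neighbors removes more aggressively, at the risk of eroding thin strokes in the digits.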
