Python Web Scraping Study Notes (7): Handling Image CAPTCHAs

Yanwei Liu

12 min read · Jul 18, 2019

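If you need to solve image CAPTCHAs, there may be a better option:

How to Crack and Bypass Common Web CAPTCHAs, Using the 2Captcha API as an Example

2Captcha is a very powerful CAPTCHA recognition service. In everyday use, logging in to a website (for example, the AWS account sign-in page) may bring up a prompt that asks you to type in a CAPTCHA by hand. Some are just a simple combination of letters and digits, but others are extremely complex, with distorted fonts and colors that often make…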

yanwei-liu.medium.com

Text CAPTCHAs

Installation

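pip install pytesseract pillow
# pytesseract is only a wrapper: the Tesseract OCR engine itself must be
# installed through the system package manager, e.g. on Debian/Ubuntu:
sudo apt install tesseract-ocr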

Import the modules

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

(1) Handling black-and-white digit images

captcha = Image.open("captcha1.png")           # black-and-white image
result = pytesseract.image_to_string(captcha)
print(result)
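
When the CAPTCHA is known to contain only digits on a single line, it can help to tell Tesseract so. The sketch below passes extra options through the `config` argument of `pytesseract.image_to_string`. The helper names `build_tesseract_config` and `read_digits` are made up for this illustration, and whether the character whitelist is honored depends on the Tesseract version and engine in use, so treat it as a starting point rather than a guaranteed fix.

```python
def build_tesseract_config(psm=7, whitelist="0123456789"):
    """Build a Tesseract config string (hypothetical helper).

    --psm 7 asks Tesseract to treat the image as a single line of text;
    tessedit_char_whitelist restricts the characters it may output.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_digits(image):
    """OCR a PIL image as a single line of digits (hypothetical helper)."""
    # Imported here so build_tesseract_config() can be used without pytesseract.
    import pytesseract

    text = pytesseract.image_to_string(image, config=build_tesseract_config())
    return text.strip()
```

With the `captcha` image opened above, this would be called as `read_digits(captcha)`.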

(2) Handling color digit images

def convert_img(img, threshold):
    img = img.convert("L")  # convert to grayscale
    pixels = img.load()
    for x in range(img.width):
        for y in range(img.height):
            # binarize: pixels brighter than the threshold become white, the rest black
            if pixels[x, y] > threshold:
                pixels[x, y] = 255
            else:
                pixels[x, y] = 0
    return img

binarized = convert_img(captcha, 150)
result = pytesseract.image_to_string(binarized)
print(result)
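
The per-pixel loop above is easy to follow but slow on larger images. As an alternative sketch, Pillow's `Image.point` can apply the same threshold through a lookup table. The function name `binarize` and the tiny synthetic image are assumptions for this example, not part of the original notes.

```python
from PIL import Image


def binarize(img, threshold):
    """Return a black-and-white copy: values above threshold become 255, others 0."""
    gray = img.convert("L")
    # point() evaluates the function once per possible value (0-255)
    # and maps every pixel through the resulting lookup table.
    return gray.point(lambda value: 255 if value > threshold else 0)


# Tiny synthetic 2x2 grayscale image standing in for a real CAPTCHA
sample = Image.frombytes("L", (2, 2), bytes([10, 149, 150, 200]))
print(list(binarize(sample, 150).tobytes()))  # [0, 0, 0, 255]
```

Unlike `convert_img`, this version leaves the input image untouched and returns a new one, which makes it easier to try several thresholds on the same CAPTCHA.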

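(3) Handling noisy digit images

# img is assumed to be an already binarized image (e.g. the output of convert_img)
data = img.getdata()
w, h = img.size
count = 0
for x in range(1, w - 1):
    for y in range(1, h - 1):
        # look up the pixel and its neighbors in each direction
        mid_pixel = data[w * y + x]
        if mid_pixel == 0:
            top_pixel = data[w * (y - 1) + x]
            left_pixel = data[w * y + (x - 1)]…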
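
The listing above is cut off, so the author's exact rule is not shown here. As a stand-in, here is a minimal sketch of one common neighbor-count filter, assuming a binarized image: a black pixel is whitened when too few of its four direct neighbors are black. The function name `remove_isolated_pixels` and the `min_neighbors` parameter are hypothetical.

```python
from PIL import Image


def remove_isolated_pixels(img, min_neighbors=1):
    """Whiten black pixels with fewer than min_neighbors black 4-neighbors.

    Assumes img is binarized (mode "L", values 0 or 255). Border pixels
    are left unchanged. Returns a new image; the input is not modified.
    """
    w, h = img.size
    src = img.load()
    out = img.copy()
    dst = out.load()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if src[x, y] != 0:
                continue  # only black pixels can be noise
            neighbors = (src[x, y - 1], src[x, y + 1], src[x - 1, y], src[x + 1, y])
            black = sum(1 for p in neighbors if p == 0)
            if black < min_neighbors:
                dst[x, y] = 255
    return out
```

Reading from the original pixels and writing to a copy keeps the result independent of scan order. Raising `min_neighbors` removes more noise but also starts to eat into thin strokes, so it is worth checking the cleaned image by eye before running OCR on it.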
