Python Image Recognition Notes (2): OCR for Chinese and English Text (Images, CAPTCHAs, Long Articles)

#2020-01-07 update: If tesseract fails to recognize the text, try adjusting the image's contrast or brightness with Pillow first.
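A minimal sketch of that preprocessing. The helper name `enhance_for_ocr`, the grayscale conversion, and the default factors are my own assumptions, not part of the original note. Pillow's `ImageEnhance.Contrast` and `ImageEnhance.Brightness` take a factor where 1.0 leaves the image unchanged, values above 1.0 strengthen the effect, and values below 1.0 weaken it.

```python
from PIL import Image, ImageEnhance


def enhance_for_ocr(image: Image.Image, contrast: float = 2.0, brightness: float = 1.0) -> Image.Image:
    """Return a grayscale copy of `image` with adjusted contrast and brightness.

    Hypothetical helper: the defaults are only a starting point and
    usually need tuning per image.
    """
    # Assumption: grayscale input tends to be easier for Tesseract
    gray = image.convert("L")
    # Factor 1.0 = unchanged; >1.0 increases contrast
    gray = ImageEnhance.Contrast(gray).enhance(contrast)
    # Factor 1.0 = unchanged; <1.0 darkens, >1.0 brightens
    gray = ImageEnhance.Brightness(gray).enhance(brightness)
    return gray
```

Usage would look like `pytesseract.image_to_string(enhance_for_ocr(Image.open('a.png'), contrast=2.5))`; if the result is still empty, try a few different factors.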

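pytesseract can read most image formats supported by Pillow, such as tiff, jpg, and png. PDF is not an input format, although pytesseract can produce a searchable PDF as output.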

Install the required packages

pip install pillow
pip install pytesseract
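Tesseract-OCR: tesseract-ocr-w64-setup-v4.1.0.20190314 (rc1) #Remember the install path: C:\Program Files\Tesseract-OCR
#Select the Chinese language pack during installation, otherwise Chinese text cannot be recognized
Alternatively, download the language data from https://github.com/tesseract-ocr/tessdata and save it to
C:\Program Files\Tesseract-OCR\tessdata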
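To check which language packs are actually in that folder, here is a small sketch that lists the `.traineddata` files. The function name `installed_languages` is hypothetical, and the default folder assumes the install path above.

```python
from pathlib import Path
from typing import List

# Assumption: tessdata folder for the Windows install path used in this note
DEFAULT_TESSDATA = r"C:\Program Files\Tesseract-OCR\tessdata"


def installed_languages(tessdata_dir: str = DEFAULT_TESSDATA) -> List[str]:
    """Return the language codes (e.g. 'eng', 'chi_sim') found in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []  # folder missing: Tesseract is probably installed elsewhere
    # Each language pack is a file named <code>.traineddata
    return sorted(p.stem for p in folder.glob("*.traineddata"))


print("chi_sim installed:", "chi_sim" in installed_languages())
```

Recent pytesseract versions also provide `pytesseract.get_languages()`, which asks Tesseract itself instead of scanning the folder.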
2019-12-02 update
### Fixing the error pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path ###
Open the pytesseract.py file inside the pytesseract package and find this line: tesseract_cmd = 'tesseract'

Change it to: tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
The value assigned to tesseract_cmd is simply the full path to the tesseract executable.
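Editing pytesseract.py works, but the change is lost whenever the package is upgraded. pytesseract's README instead suggests setting `pytesseract.pytesseract.tesseract_cmd` in your own script. A sketch of that approach: the helper names `find_tesseract` and `configure_pytesseract` are hypothetical, and checking PATH before the default Windows path is my own choice.

```python
import os
import shutil
from typing import Optional

# Assumption: default Windows install location used in this note
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(default: str = DEFAULT_TESSERACT) -> Optional[str]:
    """Return a tesseract executable path, or None if none is found.

    Checks PATH first, then falls back to `default`.
    """
    found = shutil.which("tesseract")
    if found:
        return found
    if os.path.isfile(default):
        return default
    return None


def configure_pytesseract(default: str = DEFAULT_TESSERACT) -> str:
    """Point pytesseract at tesseract without editing pytesseract.py."""
    import pytesseract  # imported lazily so find_tesseract() works without it

    path = find_tesseract(default)
    if path is None:
        raise FileNotFoundError("tesseract executable not found; check the install path")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at the top of a script replaces the manual edit; pass a different path if Tesseract was installed elsewhere.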

Image with English text

Hello word !

Recognizing English

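import pytesseract
from PIL import Image

# If tesseract is not on PATH, point pytesseract at the executable (see the fix above)
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

image = Image.open(r'C:\Users\Yanwei\Desktop\新增資料夾\a.png')  # image containing English text
text = pytesseract.image_to_string(image)  # lang defaults to 'eng'
print(text)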
Result
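For CAPTCHAs or other single-line images, Tesseract's page segmentation mode (`--psm`) can make a difference: `--psm 7` treats the image as a single text line and `--psm 6` as one uniform block of text, while long articles usually do fine with the default (3). These flags are passed through pytesseract's `config` argument. A small sketch that builds that string; the helper `tesseract_config` is hypothetical.

```python
from typing import Optional


def tesseract_config(psm: Optional[int] = None, oem: Optional[int] = None) -> str:
    """Build a Tesseract flag string for pytesseract's `config` argument."""
    parts = []
    if oem is not None:
        # OCR engine modes in Tesseract 4 are 0-3
        if not 0 <= oem <= 3:
            raise ValueError("Tesseract OCR engine modes range from 0 to 3")
        parts.append(f"--oem {oem}")
    if psm is not None:
        # Page segmentation modes in Tesseract 4 are 0-13
        if not 0 <= psm <= 13:
            raise ValueError("Tesseract page segmentation modes range from 0 to 13")
        parts.append(f"--psm {psm}")
    return " ".join(parts)
```

For a one-line CAPTCHA this becomes `pytesseract.image_to_string(image, config=tesseract_config(psm=7))`.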

Image with Chinese text


Recognizing Chinese

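import pytesseract
from PIL import Image

image = Image.open(r'C:\Users\Yanwei\Desktop\新增資料夾\a.png')  # image containing Chinese text
# 'chi_sim' is Simplified Chinese; it requires chi_sim.traineddata in the tessdata folder
text = pytesseract.image_to_string(image, lang='chi_sim')
print(text)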
Result
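`chi_sim` is the Simplified Chinese model; Traditional Chinese text has its own model, `chi_tra`, and several models can be combined with `+` (for example `chi_tra+eng` for mixed Chinese and English). A sketch of a helper that builds that `lang` string and can optionally check it against the installed packs; the name `build_lang` is hypothetical.

```python
from typing import Iterable, Optional


def build_lang(codes: Iterable[str], installed: Optional[Iterable[str]] = None) -> str:
    """Join Tesseract language codes with '+', e.g. ['chi_tra', 'eng'] -> 'chi_tra+eng'."""
    codes = list(codes)
    if not codes:
        raise ValueError("at least one language code is required")
    if installed is not None:
        available = set(installed)
        missing = [c for c in codes if c not in available]
        if missing:
            # Tesseract would fail on these, so report the missing .traineddata early
            raise ValueError(f"missing traineddata for: {', '.join(missing)}")
    return "+".join(codes)
```

Usage: `pytesseract.image_to_string(image, lang=build_lang(['chi_tra', 'eng']))`, which needs both chi_tra.traineddata and eng.traineddata in tessdata.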

Written by

Machine Learning / Deep Learning / Python / Flutter cakeresume.com/yanwei-liu
