diy:projets:paperscanner
Differences
Below are the differences between two revisions of the page.
diy:projets:paperscanner [2018/04/25 21:05] – gbouyjou | diy:projets:paperscanner [2018/04/25 21:30] – [Crop and rotate target paper] gbouyjou | ||
---|---|---|---|
Line 25: | Line 25: | ||
---- | ---- | ||
===== Explanation of source program ===== | ===== Explanation of source program ===== | ||
- | First, we have a picture of a page of text like this: | |
- | {{: | + | ---- |
+ | ==== Crop and rotate target paper ==== | ||
+ | |||
+ | First, we have a picture of a page of text like this:\\ | ||
+ | {{ : | ||
+ | \\ | ||
and we have to retrieve every sentence of this text.\\ | and we have to retrieve every sentence of this text.\\ | ||
+ | \\ | ||
So, before applying any image-processing operations, I want to crop the image to the body text and straighten it.\\ | So, before applying any image-processing operations, I want to crop the image to the body text and straighten it.\\ | ||
- | I need to find the four corners of the page before using the warpPerspective function; to eliminate colors other than the white page, I use a histogram to exclude them | + | I need to find the four corners of the page before using the warpPerspective function,\\ |
- | The histogram has two peaks: the left one is the yellow chair, and the other is the page, which looks like a whiter yellow than the chair because the page is not perfectly white. | + | and, to eliminate colors other than the white page, I use a histogram to exclude them.\\ |
- | {{: | + | {{ : |
+ | The histogram has two peaks: the left one is the yellow chair,\\ | ||
+ | and the other is the page, which looks like a whiter yellow than the chair\\ | ||
+ | because the page is not perfectly white.\\ | ||
+ | \\ | ||
I find the two boundaries with this code: | I find the two boundaries with this code: | ||
- | |||
<code python> | <code python> | ||
img = cv2.cvtColor(img, | img = cv2.cvtColor(img, | ||
Line 78: | Line 86: | ||
</ | </ | ||
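The author's boundary code is truncated in this diff, so the following is only a minimal sketch of the idea described above, not the original implementation. It assumes a 256-bin intensity histogram held in a plain list and that the page is the brighter of the two peaks. The helpers `two_peaks` and `peak_bounds`, and the `min_gap` and `frac` parameters, are hypothetical names introduced for illustration.

```python
def two_peaks(hist, min_gap=20):
    """Return the indices of the two tallest bins that are at least min_gap apart."""
    first = max(range(len(hist)), key=lambda i: hist[i])
    far = [i for i in range(len(hist)) if abs(i - first) >= min_gap]
    second = max(far, key=lambda i: hist[i])
    return tuple(sorted((first, second)))


def peak_bounds(hist, peak, frac=0.1):
    """Walk outwards from peak until the count drops below frac * hist[peak]."""
    floor = hist[peak] * frac
    lo = hi = peak
    while lo > 0 and hist[lo - 1] >= floor:
        lo -= 1
    while hi < len(hist) - 1 and hist[hi + 1] >= floor:
        hi += 1
    return lo, hi


# Synthetic histogram (assumption): a "chair" peak at 90 and a "page" peak at 200.
hist = [0] * 256
for d in range(-10, 11):
    hist[90 + d] = 100 - 10 * abs(d)
    hist[200 + d] = 80 - 8 * abs(d)

chair_peak, page_peak = two_peaks(hist)
lower, upper = peak_bounds(hist, page_peak, frac=0.5)
print(chair_peak, page_peak, lower, upper)
```

With real data, the two returned bounds could then be used to keep only the pixels inside the page peak, for example with `cv2.inRange`, before looking for contours.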
After that, I compute the contours and can clearly see the quadrilateral, | After that, I compute the contours and can clearly see the quadrilateral, | ||
- | {{: | + | {{ : |
<code python> | <code python> | ||
Line 103: | Line 111: | ||
</ | </ | ||
- | The result: | + | The result:\\ |
- | {{: | + | {{ : |
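Before a perspective warp, the four corners of the quadrilateral usually need to be in a known order. The sketch below shows one common heuristic for that step. It is an illustration under stated assumptions, not the author's code: `order_corners` is a hypothetical helper, the points are `(x, y)` tuples in image coordinates (y grows downwards), and the page is assumed to be only mildly rotated.

```python
def order_corners(points):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    if len(points) != 4:
        raise ValueError("expected exactly four points")
    # The top-left corner has the smallest x + y, the bottom-right the largest.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # The top-right corner has the smallest y - x, the bottom-left the largest.
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


# Example quadrilateral (made-up coordinates), given in arbitrary order.
corners = order_corners([(10, 200), (210, 190), (20, 10), (200, 20)])
print(corners)
```

The ordered corners can then be paired with the corners of the target rectangle and passed to `cv2.getPerspectiveTransform`, whose result is the matrix that `cv2.warpPerspective` expects. The sum and difference heuristic can pick the wrong corner when the page is rotated by close to 45 degrees.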
+ | |||
+ | |||
+ | ---- | ||
+ | ==== Treatment (Thresholding, | ||
Now we must clean up the text characters before running Tesseract recognition | Now we must clean up the text characters before running Tesseract recognition | ||
Line 120: | Line 132: | ||
img = cv2.threshold(img, | img = cv2.threshold(img, | ||
</ | </ | ||
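The arguments of the `cv2.threshold` call are cut off in this diff, so the thresholding mode the author used is unknown. As an illustration of what a global threshold does, here is a small pure-Python sketch of Otsu's method, which is a swapped-in technique and not confirmed by the source. `otsu_threshold` and `binarize` are hypothetical helpers, and the tiny image is made-up data.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximises the between-class variance."""
    total = sum(hist)
    weighted_total = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    count_low = 0   # number of pixels with value <= t
    sum_low = 0     # sum of those pixel values
    for t, h in enumerate(hist):
        count_low += h
        sum_low += t * h
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so the variance is undefined
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(rows, t):
    """Map pixels above t to 255 and all others to 0."""
    return [[255 if p > t else 0 for p in row] for row in rows]


# Tiny grayscale image (assumption): dark ink at 50, bright paper at 200.
image = [[50, 50, 200], [200, 50, 200]]
hist = [0] * 256
for row in image:
    for p in row:
        hist[p] += 1

t = otsu_threshold(hist)
binary = binarize(image, t)
print(t, binary)
```

In OpenCV the same idea is available by combining the `cv2.THRESH_BINARY` and `cv2.THRESH_OTSU` flags in `cv2.threshold`, in which case the threshold value passed in is ignored and the computed one is returned.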
+ | |||
+ | |||
+ | ---- | ||
+ | ==== Character Recognition ==== | ||
Finally, I run Tesseract and check whether each word exists in a dictionary for the language | Finally, I run Tesseract and check whether each word exists in a dictionary for the language | ||
Line 139: | Line 155: | ||
print(" | print(" | ||
</ | </ | ||
+ | After ten seconds, 34.4% of the words in the text are recognized as correct French words. | ||
+ | {{: | ||
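The percentage above comes from checking each recognized word against a dictionary. A minimal sketch of such a check follows. It is an assumption about how the score could be computed, since the original code is truncated here. `dictionary_hit_rate` is a hypothetical helper, and the word set and sample text are made-up data standing in for a real French word list and real OCR output.

```python
import re


def dictionary_hit_rate(text, dictionary):
    """Return the percentage of words in text that appear in dictionary."""
    # Split on anything that is not a letter, so punctuation and digits are ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in dictionary)
    return 100.0 * hits / len(words)


# Stand-ins (assumptions) for a real word list and real OCR output.
french_words = {"bonjour", "le", "monde"}
rate = dictionary_hit_rate("Bonjour le mnde, bonjour!", french_words)
print(f"{rate:.1f}% of words found in the dictionary")
```

Storing the dictionary as a `set` keeps each lookup constant time, which matters when the word list holds several hundred thousand entries.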
diy/projets/paperscanner.txt · Last modified: 2018/04/25 21:33 by gbouyjou