diy:projets:paperscanner
Differences
Below are the differences between two revisions of the page.
| diy:projets:paperscanner [2018/04/25 21:05] – gbouyjou | diy:projets:paperscanner [2018/04/25 21:33] (current version) – [Crop and rotate target paper] gbouyjou | ||
|---|---|---|---|
| Line 25: | Line 25: | ||
| ---- | ---- | ||
| ===== Explanation of source program ===== | ===== Explanation of source program ===== | ||
| - | First, we have a picture of a page of text like this: | ||
| - | {{: | + | ---- |
| + | ==== Crop and rotate target paper ==== | ||
| + | |||
| + | First, we have a picture of a page of text like this:\\ | ||
| + | {{ : | ||
| + | \\ | ||
| and we have to retrieve every sentence of this text.\\ | and we have to retrieve every sentence of this text.\\ | ||
| + | \\ | ||
| So, before applying any image-processing operations, I want to crop the picture to the body text and align it.\\ | So, before applying any image-processing operations, I want to crop the picture to the body text and align it.\\ | ||
| - | I need to find the four corners of the page before using the warpPerspective function. To eliminate colors other than the white page, I use a histogram to exclude them. | + | I need to find the four corners of the page before using the warpPerspective function.\\ |
| - | On the histogram there are two peaks: the left one is the yellow chair, and the other is the page, which looks like a whiter yellow than the chair because it is not a perfectly white page. | + | To eliminate colors other than the white page, I use a histogram to exclude them.\\ |
| - | {{: | + | {{ : |
| + | On the histogram there are two peaks: the left one is the yellow chair,\\ | ||
| + | and the other is the page, which looks like a whiter yellow than the chair\\ | ||
| + | because it is not a perfectly white page.\\ | ||
| + | \\ | ||
| I find the two boundaries with this code: | I find the two boundaries with this code: | ||
| - | |||
| <code python> | <code python> | ||
| img = cv2.cvtColor(img, | img = cv2.cvtColor(img, | ||
| Line 78: | Line 86: | ||
| </ | </ | ||
| After that, I compute the contours; the quadrilateral is clearly visible, | After that, I compute the contours; the quadrilateral is clearly visible, | ||
| - | {{: | + | {{ : |
| <code python> | <code python> | ||
| Line 103: | Line 111: | ||
| </ | </ | ||
| - | The result: | + | The result:\\ |
| - | {{: | + | {{ : |
| + | |||
| + | |||
| + | ---- | ||
| + | ==== Treatment (Thresholding, | ||
| So now we must clean up the text characters before launching Tesseract recognition | So now we must clean up the text characters before launching Tesseract recognition | ||
| Line 120: | Line 132: | ||
| img = cv2.threshold(img, | img = cv2.threshold(img, | ||
| </ | </ | ||
| + | |||
| + | |||
| + | ---- | ||
| + | ==== Character Recognition ==== | ||
| Finally, I use Tesseract and check whether each word exists in a dictionary word list for the language | Finally, I use Tesseract and check whether each word exists in a dictionary word list for the language | ||
| Line 139: | Line 155: | ||
| print(" | print(" | ||
| </ | </ | ||
| + | After ten seconds, I find that 34.4% of the words in the text are correct French words. | ||
| + | {{: | ||
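
The dictionary check described in the Character Recognition section can be sketched roughly as follows. This is only an illustration, not the page author's code: the helper name `correct_word_ratio`, the tiny `french_words` set and the sample `ocr_output` string are all hypothetical, and a real run would load a full French word list and pass in the text returned by Tesseract.

```python
import string

# Punctuation stripped from both ends of each token; the French
# guillemets are added by hand since string.punctuation lacks them.
STRIP_CHARS = string.punctuation + "«»"


def correct_word_ratio(text, dictionary):
    """Return the fraction of words in `text` that appear in `dictionary`.

    `dictionary` is assumed to be a set of lowercase words.
    """
    # Split on whitespace, lowercase, and trim surrounding punctuation.
    words = [w.strip(STRIP_CHARS) for w in text.lower().split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if not words:
        return 0.0
    known = sum(1 for w in words if w in dictionary)
    return known / len(words)


# Hypothetical stand-ins for a real word list and a real OCR result.
french_words = {"le", "chat", "est", "sur", "la", "table"}
ocr_output = "Le chat ezt sur la tab1e."

ratio = correct_word_ratio(ocr_output, french_words)
print(f"{ratio:.1%} of the words were found in the dictionary")
```

Using a set for the dictionary keeps each lookup at constant time, which matters once the word list holds several hundred thousand entries.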
diy/projets/paperscanner.1524690320.txt.gz · Last modified: by gbouyjou
