Issue
I was trying to read the text from an image using pytesseract.
Using the code below I was able to read the text, but it fails when a city name is printed across two rows. For example, in the image "Grand Junction" and "Monterey Bay National Marine Sanctuary" are each expected to be identified as a single label, but their rows are read out separately.
Code:
import cv2
import numpy as np
import pytesseract

act_image = cv2.imread('C:/Users/a463129/Downloads/chromedriver_win32/images/capture.png')

# Crop off the leftmost 500 pixels
dimension = act_image.shape
image = act_image[0:dimension[0], 500:dimension[1]]

# Invert the colours and preview the result
image = cv2.bitwise_not(image)
cv2.imshow("invert", image)
cv2.waitKey()

# Grayscale, then dilate/erode (a 1x1 kernel makes these effectively no-ops) and blur
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kernel = np.ones((1, 1), np.uint8)
image = cv2.dilate(image, kernel, iterations=1)
image = cv2.erode(image, kernel, iterations=1)
image = cv2.GaussianBlur(image, (5, 5), 0)

# Upscale 3x, median-blur, and binarise (Otsu ignores the 200 threshold value)
img = cv2.resize(image, (0, 0), fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
img = cv2.medianBlur(img, 5)
img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

cv2.imshow('asd', cv2.resize(img, (0, 0), fx=0.3, fy=0.3))
cv2.waitKey(0)
cv2.destroyAllWindows()

txt = pytesseract.image_to_string(img)
Output: Twin Falls, Medford m, Logan e, Sait Lake City a, Redding verna, NEVADA, Chico Reno, UTAH Grand, JUNCTION, Sacramento, San Francisco, San Jos▒ NEVADA TEST Ou, MONTEREY AND TRAINING, CALIFORNIA MANGE (MTT RI St George, BAY NATIONAL, MARINE Fresno, SANCTUARY, Las Vegas, Gallup, Kingman, Santa Barbara Lancaster, ARIZONA, Los Angeles paim Springs
Solution
I am new to Stack Overflow and this is the first time I am answering a question, so please forgive me if any part of this answer is misleading or incorrect.
Assuming your image is noise-free, my idea is to extract each full city name by passing only that part of the image (a cropped region) to tesseract. To do that, I applied morphological operations to the image for text-block segmentation and obtained the bounding-box coordinates of each contour. I then cropped the Otsu-thresholded image at those coordinates and passed each crop to tesseract.
Here is the full code in Python:
import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

image = cv2.imread("act.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu-threshold, inverted so the text is white on black
otsu = ~(cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1])

# Erode, negate, then dilate so each multi-line label merges into one solid blob
erode_otsu = cv2.erode(otsu, np.ones((7, 7), np.uint8), iterations=1)
negated_erode = ~erode_otsu
dilated = cv2.dilate(negated_erode, np.ones((3, 3), np.uint8), iterations=4)

# One contour per text block (OpenCV 4.x: findContours returns (contours, hierarchy))
contours_otsu, _ = cv2.findContours(dilated, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

texts = []
for cnt in contours_otsu:
    # Crop the thresholded image to the block's bounding box and OCR just that crop
    x, y, w, h = cv2.boundingRect(cnt)
    mask = otsu[y:y+h, x:x+w]
    custom_oem_psm_config = r'--oem 3 --psm 3'
    text = pytesseract.image_to_string(mask, lang='eng', config=custom_oem_psm_config)
    print(text)
    texts.append(text)

print(texts)
cv2.imwrite("dilated.jpg", dilated)
Output: ['', 'Palm Springs', 'Los Angeles', 'ARIZONA', 'Santa Barbara', 'Lancaster', 'Flagstaff', 'Kingman', 'Gallup', 'Las Vegas', 'Fresno', 'CALIFORNIA', 'St. George', 'MONTEREY\nBAY NATIONAL\nMARINE\nSANCTUARY', '', '', 'NEVADA TEST\nAND TRAINING\nRANGE (NTTR)', 'San José', 'San Francisco', 'Sacramento', 'Grand\nJunction', 'UTAH', 'Reno', 'Chico', 'NEVADA', 'Vernal', 'Redding', 'Salt Lake City', 'Eureka', '', '', 'Logan', '', 'Medford', 'Twin Falls']
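Note that findContours does not return the blocks in reading order (the list above runs roughly bottom-to-top). If the order matters to you, one option is to sort the bounding boxes before running OCR. This is a minimal sketch, not part of the original answer, reusing the otsu and contours_otsu variables from the code above:

# Sort the text blocks top-to-bottom (then left-to-right) so the OCR
# results come out in rough reading order instead of contour order
boxes = [cv2.boundingRect(cnt) for cnt in contours_otsu]
boxes.sort(key=lambda b: (b[1], b[0]))  # b is (x, y, w, h)

ordered_texts = []
for x, y, w, h in boxes:
    crop = otsu[y:y+h, x:x+w]
    ordered_texts.append(pytesseract.image_to_string(
        crop, lang='eng', config=r'--oem 3 --psm 3'))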
There you go: the text, grouped according to block segmentation. I assume you do not have a tight time constraint, because the code takes a considerable amount of time since image_to_string is called once per contour inside the loop. You can also check out the image_to_data function, clean up the output text, or make use of the per-word confidences instead; a sketch of that follows below. Thank you.
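As a sketch of the image_to_data route mentioned above (assuming the same otsu image; the 60% confidence cutoff is an arbitrary choice):

# image_to_data returns per-word text, boxes and confidences in a single
# tesseract pass, instead of calling image_to_string once per contour
data = pytesseract.image_to_data(otsu, lang='eng',
                                 config=r'--oem 3 --psm 3',
                                 output_type=pytesseract.Output.DICT)
for i in range(len(data['text'])):
    conf = int(float(data['conf'][i]))  # conf is -1 for non-word rows
    word = data['text'][i].strip()
    if word and conf > 60:  # keep only reasonably confident words
        print(word, conf, (data['left'][i], data['top'][i],
                           data['width'][i], data['height'][i]))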
Answered By - Tarun Chakitha