Reading text with OpenCV only at a specified position (rectangle) in an image

mmllmm · April 6, 2025

Hello,

For my first tests with text recognition I used the code below.

In general it works by recognizing text across the whole area of the image and writes the result to a txt file.

In my case I would like to be able to define a small region (x, y) within the whole image in which text recognition should run.
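
A minimal sketch of restricting OCR to one region, assuming the region is given as its top-left corner (x, y) plus a width and height in pixels. The helper names crop_roi and read_roi and the "--psm 7" setting (Tesseract's mode for treating the image as a single text line) are illustrative assumptions, not part of the code from the linked article.

```python
def crop_roi(img, x, y, w, h):
    """Return the part of img inside the rectangle, clipped to the image bounds."""
    img_h, img_w = img.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(img_w, x + w), min(img_h, y + h)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("region lies outside the image")
    # OpenCV images are NumPy arrays indexed as [row (y), column (x)]
    return img[y0:y1, x0:x1]


def read_roi(img, x, y, w, h, config="--psm 7"):
    """Run Tesseract on one rectangle only and return the recognized text."""
    import pytesseract  # imported here so crop_roi also works without Tesseract
    crop = crop_roi(img, x, y, w, h)
    return pytesseract.image_to_string(crop, config=config).strip()
```

With an image loaded through cv2.imread, a call such as read_roi(img, 100, 50, 200, 40) would pass only that 200x40 pixel rectangle to Tesseract; the coordinates here are placeholders.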

There may be as many as 50 such regions in one image, but what matters to me is which reading comes from which specific region.
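
One possible way to keep track of which reading came from which place is to give every rectangle a name and collect the results in a dictionary keyed by that name. The sketch below is only an illustration: read_regions, tesseract_ocr, REGIONS and all names and coordinates in it are hypothetical, and the ocr parameter exists only so that a different recognizer can be swapped in.

```python
def tesseract_ocr(crop):
    """Default OCR backend: Tesseract via pytesseract, single-line mode."""
    import pytesseract  # imported lazily so the rest works without Tesseract
    return pytesseract.image_to_string(crop, config="--psm 7")


def read_regions(img, regions, ocr=tesseract_ocr):
    """OCR each named rectangle separately.

    regions maps a name to (x, y, w, h) in pixels; returns {name: text}.
    """
    results = {}
    for name, (x, y, w, h) in regions.items():
        crop = img[y:y + h, x:x + w]  # rows are y, columns are x
        results[name] = ocr(crop).strip()
    return results


# Hypothetical layout: names and coordinates are placeholders
REGIONS = {
    "counter_1": (40, 30, 120, 35),
    "counter_2": (40, 90, 120, 35),
}
```

Because the camera resolution is fixed, such a table of rectangles would only need to be measured once and could then be reused for every photo.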

All the solutions I have found analyze the whole image in a single pass.

It would also be good to have a preview showing how the region to be read is outlined on the image (this would make it easier to set the parameters).

The image to be analyzed would be taken with a camera dedicated to the RPi, and the resolution would be fixed.

I wonder whether the mode in which the image is taken can matter a lot, and whether it can already be set on the camera to make reading the text easier.

If you have come across the solution I need, or a similar one that can be easily modified, please let me know.

Modifying the code below is also an option, if that would not be too complicated.

LINK to source: geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/

# Import required packages
import cv2
import pytesseract

# Mention the installed location of Tesseract-OCR in your system
pytesseract.pytesseract.tesseract_cmd = '/bin/tesseract'  # In case using colab after installing above modules

# Read image from which text needs to be extracted
img = cv2.imread("sample.jpg")

# Preprocessing the image starts

# Convert the image to gray scale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Performing OTSU threshold
ret, thresh1 = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)

# Specify structure shape and kernel size.
# Kernel size increases or decreases the area
# of the rectangle to be detected.
# A smaller value like (10, 10) will detect
# each word instead of a sentence.
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))

# Applying dilation on the threshold image
dilation = cv2.dilate(thresh1, rect_kernel, iterations=1)

# Finding contours
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)

# Creating a copy of image
im2 = img.copy()

# Open the output file once; mode "w" creates it or empties an existing one
with open("recognized.txt", "w") as file:
    # Loop through the identified contours: outline each bounding box on
    # the copy, crop the same area from the original and pass it to OCR
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)

        # Draw the rectangle on the copied image only
        cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)

        # Crop the text block from the untouched original, so the green
        # outline drawn above does not end up in the OCR input
        cropped = img[y:y + h, x:x + w]

        # Apply OCR on the cropped image
        text = pytesseract.image_to_string(cropped)

        # Write the extracted text followed by a newline
        file.write(text)
        file.write("\n")