from img2table.document import Image as Img
from img2table.ocr import TesseractOCR
import pytesseract
from PIL import Image

from IPython.display import display, Image as IPImage

# Preview the two input images inline in the notebook
display(IPImage(filename="ProductSalesTable.png"))
display(IPImage(filename="text.png"))

# Set the Tesseract binary path for pytesseract (Windows only; elsewhere it is usually on PATH)
import sys
if sys.platform == "win32":
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
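If the notebook has to run on more than one operating system, a hard-coded Windows path is brittle. A minimal sketch, assuming the default Windows install location used above: look the binary up on PATH with `shutil.which` first and fall back to that path only if the file exists. `resolve_tesseract_cmd` is a hypothetical helper, not part of pytesseract; its lookup functions are parameters so it can be exercised without Tesseract installed.

```python
import os
import shutil
import sys
from typing import Callable, Optional

# Default install location used in the cell above (assumed; adjust if Tesseract lives elsewhere)
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.isfile,
    platform: str = sys.platform,
) -> Optional[str]:
    """Hypothetical helper: return a path to the tesseract binary, or None if not found."""
    # 1. Prefer whatever is already on PATH (typical on Linux and macOS)
    found = which("tesseract")
    if found:
        return found
    # 2. On Windows, fall back to the default installer location if it exists
    if platform == "win32" and exists(WINDOWS_DEFAULT):
        return WINDOWS_DEFAULT
    # 3. Nothing found; the caller decides how to report it
    return None
```

Usage: call `cmd = resolve_tesseract_cmd()` and, if `cmd` is not `None`, assign it to `pytesseract.pytesseract.tesseract_cmd`. Returning `None` instead of raising leaves the error message up to the notebook.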

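# ---------- TABLE EXTRACTION ----------
ocr = TesseractOCR()                  # img2table's wrapper around the Tesseract engine
doc = Img("ProductSalesTable.png")    # load the table image as an img2table document

# Detect tables and OCR each cell; returns a list of extracted tables
tables = doc.extract_tables(ocr=ocr)

if tables:
    df = tables[0].df  # first detected table as a pandas DataFrame
    print(df)
else:
    print("No tables found")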

                   0            1      2           3
0            Product     Category  Units  Revenue($)
1             Laptop  Electronics   1273   19,09,500
2     Wireless Mouse  Peripherlas   4652    1,39,560
3          USB-C Hub  Peripherlas   3121    1,87,260
4          HD WebCam  Accessories   1638    1,31,040
5  Monitor 27 inches  Electronics    870    6,09,000
6     Mechanical KBD  Peripherlas   2176    5,44,000
7      Desk USB Lamp  Accessories   2959      73,975
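The printed DataFrame keeps the header labels as row 0, and the cells are OCR'd strings. The revenue figures also use Indian digit grouping, so `19,09,500` means 1,909,500. A minimal cleanup sketch, assuming the extracted table always has this four-column layout and these header labels: promote row 0 to column names, then convert `Units` and `Revenue($)` to integers. `clean_sales_table` is a hypothetical helper, and the sample rows are copied from the output above.

```python
import pandas as pd


def clean_sales_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy the table returned by tables[0].df."""
    # Use the first row (Product, Category, Units, Revenue($)) as column names
    df = raw.iloc[1:].copy()
    df.columns = [str(c).strip() for c in raw.iloc[0]]
    df = df.reset_index(drop=True)

    # Strip stray whitespace that OCR can leave around cell text
    for col in df.columns:
        df[col] = df[col].astype(str).str.strip()

    # Units are plain integers
    df["Units"] = df["Units"].astype(int)
    # Revenue uses Indian grouping ("19,09,500"); dropping the commas gives 1909500
    df["Revenue($)"] = df["Revenue($)"].str.replace(",", "", regex=False).astype(int)
    return df


# Usage with the first rows of the output above
raw = pd.DataFrame([
    ["Product", "Category", "Units", "Revenue($)"],
    ["Laptop", "Electronics", "1273", "19,09,500"],
    ["Wireless Mouse", "Peripherlas", "4652", "1,39,560"],
])
sales = clean_sales_table(raw)
print(sales.dtypes)
```

Dropping every comma works here because all of them are digit-group separators; a table that used commas as decimal marks would need different parsing.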

# ---------- TEXT EXTRACTION ----------
img = Image.open("text.png")              # load the image with Pillow
text = pytesseract.image_to_string(img)   # run Tesseract OCR and return plain text

print(text)

The Future of Artificial Intelligence

Artificial Intelligence (AI) is rapidly transforming the way we live and
work. From virtual assistants to self-driving cars, AI systems are
becoming more capable and widely adopted.
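The string from `image_to_string` follows the line layout of the image, so a single sentence can be split across lines (`...the way we live and` / `work.`). A minimal sketch, assuming paragraphs in the image are separated by blank lines: split on blank lines and join the lines inside each paragraph with single spaces. `reflow_paragraphs` is a hypothetical helper, and it does not rejoin words hyphenated across a line break.

```python
import re
from typing import List


def reflow_paragraphs(text: str) -> List[str]:
    """Hypothetical helper: turn raw OCR output into one string per paragraph."""
    paragraphs = []
    # A blank line (possibly containing spaces) separates paragraphs
    for block in re.split(r"\n\s*\n", text):
        # strip() also drops stray whitespace such as a trailing form feed
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            # Join the wrapped lines of one paragraph with single spaces
            paragraphs.append(" ".join(lines))
    return paragraphs


# Usage with the OCR output shown above
ocr_text = (
    "The Future of Artificial Intelligence\n"
    "\n"
    "Artificial Intelligence (AI) is rapidly transforming the way we live and\n"
    "work. From virtual assistants to self-driving cars, AI systems are\n"
    "becoming more capable and widely adopted.\n"
)
for para in reflow_paragraphs(ocr_text):
    print(para)
```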

| Task | Tool | Output |
|---|---|---|
| Extract table from image | `img2table` + `TesseractOCR` | pandas DataFrame |
| Extract text from image | `pytesseract` | Python string |
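When a plain string is not enough, pytesseract also provides `image_to_data`; with `output_type=pytesseract.Output.DICT` it returns parallel lists such as `text`, `conf`, `left`, `top`, `width` and `height`, one entry per detected element. A minimal sketch of filtering that output by confidence. `confident_words` and the threshold of 60 are illustrative assumptions, and the sample dict is hand-made in the same shape (not real Tesseract output) so the snippet runs without an image.

```python
from typing import Dict, List, Tuple


def confident_words(data: Dict[str, list], min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep (word, confidence) pairs at or above min_conf.

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may be an int, float or str depending on the pytesseract version
        conf = float(conf)
        # Block/line rows carry conf == -1 and empty text; skip them
        if text.strip() and conf >= min_conf:
            words.append((text, conf))
    return words


# Hand-made sample in the shape of image_to_data's DICT output (not real OCR results)
sample = {
    "text": ["", "Sample", "text", "~"],
    "conf": [-1, 95.0, 91.5, 12.0],
}
print(confident_words(sample))
```

Usage on the notebook's image: `data = pytesseract.image_to_data(Image.open("text.png"), output_type=pytesseract.Output.DICT)`, then `confident_words(data)`.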

OCR: Extracting Data from Images using Python

Workflow

Import Libraries

Image 1: Table Image

Image 2: Text Image

Table Extraction → DataFrame

Text Extraction → Python String

Conclusion