gmft.pdf_bindings
These classes are aliased through this module: BasePage, BasePDFDocument, ImageOnlyPage, PyPDFium2Page, PyPDFium2Document.
- class gmft.pdf_bindings.BasePage(page_number: int)
Bases: ABC
- abstract get_filename() → str
- abstract get_image(dpi: int = None, rect: Rect = None) → Image
Get an image of the page, cropped to the given rect, whose bounds are (x0, y0, x1, y1).
- abstract get_positions_and_text() → Generator[tuple[float, float, float, float, str], None, None]
A generator of text fragments and their positions. Each item is a tuple (x0, y0, x1, y1, text).
- height: float
- property page_no
- width: float
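Because get_positions_and_text() yields plain (x0, y0, x1, y1, text) tuples, downstream code can filter them with ordinary Python. The sketch below keeps only the fragments that fall entirely inside a region. The helper words_in_rect and the sample data are hypothetical stand-ins for the output of a concrete page, and the region is assumed to be a plain (x0, y0, x1, y1) tuple rather than gmft's own Rect type.

```python
from typing import Iterable, Iterator

# (x0, y0, x1, y1, text), the shape yielded by get_positions_and_text()
Word = tuple[float, float, float, float, str]
# Assumption: a plain tuple, not gmft's Rect class
Box = tuple[float, float, float, float]


def words_in_rect(words: Iterable[Word], rect: Box) -> Iterator[Word]:
    """Yield the words whose bounding box lies entirely inside rect."""
    rx0, ry0, rx1, ry1 = rect
    for x0, y0, x1, y1, text in words:
        if x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1:
            yield (x0, y0, x1, y1, text)


# Hypothetical stand-in for page.get_positions_and_text() on a real page.
sample_words = [
    (72.0, 100.0, 110.0, 112.0, "Revenue"),
    (300.0, 100.0, 340.0, 112.0, "2023"),
    (72.0, 700.0, 120.0, 712.0, "Footer"),
]

table_area = (50.0, 90.0, 400.0, 200.0)
print([w[4] for w in words_in_rect(sample_words, table_area)])
# prints ['Revenue', '2023']
```

Since words_in_rect is itself a generator, it can be chained directly onto a page's generator without materializing every word on the page.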
- class gmft.pdf_bindings.ImageOnlyPage(img: Image, *, words: list[tuple[float, float, float, float, str]] = None, dpi: int = None)
Bases: BasePage
A dummy page that contains only an image.
- Parameters:
words – Words as (x0, y0, x1, y1, text) tuples, assumed to be in PDF units (72 dpi), not image units.
dpi – If provided, the image is assumed to be an upscaled rendering of the PDF page at this dpi.
- close()
- classmethod from_page(page: BasePage, dpi: int) → ImageOnlyPage
Create an ImageOnlyPage from an existing page. The dpi is required for upscaling.
- get_filename() → str
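Since words are expected in PDF units (72 per inch) while an image rendered at a higher dpi is proportionally larger, positions have to be scaled by dpi / 72 to line up with pixels. The sketch below shows that arithmetic in both directions. The helpers pdf_to_image_coords and image_to_pdf_coords are hypothetical, and whether gmft applies exactly this conversion internally is an assumption.

```python
PDF_DPI = 72  # PDF user space: 72 units per inch

Word = tuple[float, float, float, float, str]


def pdf_to_image_coords(word: Word, dpi: int) -> Word:
    """Scale a word box from PDF units to pixels of an image rendered at dpi."""
    scale = dpi / PDF_DPI
    x0, y0, x1, y1, text = word
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale, text)


def image_to_pdf_coords(word: Word, dpi: int) -> Word:
    """Inverse of pdf_to_image_coords: pixels back to PDF units."""
    scale = PDF_DPI / dpi
    x0, y0, x1, y1, text = word
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale, text)


word = (72.0, 144.0, 108.0, 156.0, "Total")
print(pdf_to_image_coords(word, dpi=144))
# prints (144.0, 288.0, 216.0, 312.0, 'Total')
```

Keeping the words in PDF units and scaling only when comparing against pixels means the same word list stays valid regardless of the dpi chosen for rendering.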
- class gmft.pdf_bindings.PyPDFium2Document(filename: str)
Bases: BasePDFDocument
Wraps a pdfium.PdfDocument object. You (the user) are responsible for calling doc.close() once you are done; otherwise the document remains open and continues to consume resources.
- close()
Close the document
- get_filename() → str
- class gmft.pdf_bindings.PyPDFium2Page(page: pypdfium2.PdfPage, filename: str, page_no: int, *, parent: PyPDFium2Document = None)
Bases: BasePage
Note: this follows PIL's convention of (0, 0) being the top-left corner. Beware: y0 and y1 are therefore flipped relative to PyPDFium2's convention.
- close()
Not recommended; use close_document() instead.
- close_document()
- get_filename() → str
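The y0/y1 flip in the note can be made concrete with a small conversion. The sketch below assumes the PDF-native box uses a bottom-left origin with y increasing upward (the PDF user-space convention that raw pypdfium2 coordinates follow) and that the page height in PDF units is known. The helper pdf_box_to_top_left is hypothetical and not part of gmft.

```python
Box = tuple[float, float, float, float]


def pdf_box_to_top_left(box: Box, page_height: float) -> Box:
    """Convert (x0, y0, x1, y1) from a bottom-left origin to a top-left origin."""
    x0, y0, x1, y1 = box
    # Flipping the y-axis swaps the edges: the box's top edge (larger y in
    # PDF space) becomes the smaller y, so it becomes the new y0.
    return (x0, page_height - y1, x1, page_height - y0)


# A US Letter page is 792 PDF units tall.
print(pdf_box_to_top_left((72.0, 700.0, 200.0, 720.0), page_height=792.0))
# prints (72.0, 72.0, 200.0, 92.0)
```

Applying the function twice returns the original box, which is a quick sanity check that the flip is consistent.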