Welcome to gmft’s documentation!
gmft is a lightweight, performant, configurable, deep-learning-based library for converting PDF tables to many formats, including cropped images, text + positions, plaintext, CSV, and pandas dataframes.
To get a sense of gmft’s output quality, see the eval notebook (colab) (github), which shows the output of gmft on a variety of PDFs.
To see how gmft stacks up against the many alternatives, see this comparison, which may help you decide which library is best for your use case.
Check out the Usage section, including Installation instructions.
Check out the Config Guide section for a description of gmft’s settings.
Note
This project is under active development.
Table of Contents
Pages
- Usage
- Config Guide
- Passing into LLMs
- FAQ
  - Why is my table not detected?
  - How to parse my table with merged cells?
  - Is gmft thread-safe?
  - I need to tweak something (location/rotation) about a table. How do I do this?
  - ValueError: The identified boxes have significant overlap
  - What format is best for LLMs?
  - How to get tables formatted inline with text?
  - Cannot close object, library is destroyed.
- Advanced
- gmft package