gmft package
Top Level Aliases
The top-level module currently contains aliases for key classes and functions.
Importing from the top-level module previously resulted in long load times. However, v0.5 introduces lazy loading, which greatly reduces this cost.
Classes may now be imported from their original locations, from gmft.auto, or from here, where they are lazily loaded.
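As an illustration of the general technique, the following is a minimal sketch of lazy attribute loading using a module-level `__getattr__` (PEP 562). It is not gmft's actual implementation, which may differ. The names `make_lazy_module` and `_LAZY_ALIASES` are hypothetical, and standard-library classes stand in for heavy submodules.

```python
import importlib
import types

# Hypothetical alias table: exported name -> (module path, attribute name).
# Standard-library targets stand in for submodules that are slow to import.
_LAZY_ALIASES = {
    "OrderedDict": ("collections", "OrderedDict"),
    "Fraction": ("fractions", "Fraction"),
}


def make_lazy_module(name, aliases):
    """Build a module whose attributes are resolved on first access (PEP 562)."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            target_module, target_attr = aliases[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_aliases", _LAZY_ALIASES)
half = lazy.Fraction(1, 2)  # the target module is imported here, on first use
```

In a real package, the same `__getattr__` function would typically be defined directly in `__init__.py`, so that `import package` stays cheap and each alias is paid for only when it is first used.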
- class gmft.TATRFormatConfig(*args, **kwargs)
Bases: object
This import is deprecated.
Please use:
  - Reformat API (v0.5)
  - gmft.formatters.tatr.TATRFormatConfig
- classmethod get_mirrored_class()
- class gmft.TATRFormattedTable(*args, **kwargs)
Bases: object
This import is deprecated.
Please use:
  - Reformat API (v0.5)
  - gmft.formatters.tatr.TATRFormattedTable
- classmethod get_mirrored_class()
- class gmft.TATRTableDetector(*args, **kwargs)
Bases: object
This import is deprecated.
Please use:
  - gmft.AutoTableDetector
  - gmft.detectors.tatr.TATRDetector
- classmethod get_mirrored_class()
- class gmft.TATRTableFormatter(*args, **kwargs)
Bases: object
This import is deprecated.
Please use:
  - gmft.auto.AutoTableFormatter
  - gmft.formatters.tatr.TATRFormatter
- classmethod get_mirrored_class()
PDF providers
In gmft, multiple documents and PDF providers are supported through a common interface. PyPDFium2 is the default PDF reader. PyMuPDF is more accurate but is licensed under the more restrictive AGPL.
Detectors
In gmft, detectors locate the positions and bounds (bbox) of tables on a page.
Formatters
In gmft, formatters take a located table (CroppedTable) and produce machine-readable output (e.g., a pandas DataFrame). This task is known in the literature as table structure recognition and table functional analysis.
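The three pieces described above (PDF provider, detector, formatter) can be combined roughly as follows. This is a sketch based on gmft's commonly documented usage, not a definitive reference: it assumes that `gmft.pdf_bindings.PyPDFium2Document`, `AutoTableDetector.extract(page)`, `AutoTableFormatter.extract(table)`, and `FormattedTable.df()` are available in the installed version. The helper name `extract_tables` is hypothetical.

```python
def extract_tables(pdf_path):
    """Return one pandas DataFrame per table detected in the PDF at pdf_path."""
    # Imported inside the function so that defining it stays cheap;
    # constructing the detector and formatter may load model weights.
    from gmft.auto import AutoTableDetector, AutoTableFormatter
    from gmft.pdf_bindings import PyPDFium2Document

    detector = AutoTableDetector()    # locates table bounding boxes on a page
    formatter = AutoTableFormatter()  # recovers the table's structure

    doc = PyPDFium2Document(pdf_path)  # default PDF provider
    try:
        frames = []
        for page in doc:
            for table in detector.extract(page):  # each item is a CroppedTable
                # Build the DataFrame while the document is still open.
                frames.append(formatter.extract(table).df())
        return frames
    finally:
        doc.close()
```

Calling `extract_tables("path/to/document.pdf")` (a placeholder path) would then return a list of DataFrames, one per detected table. The DataFrames are built before the document is closed, on the assumption that table text is read from the open document.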