gmft.detectors.tatr module

class gmft.detectors.tatr.TATRDetector(config: TATRDetectorConfig = None, default_implementation=True)

Bases: BaseDetector[TATRDetectorConfig]

Uses TableTransformerForObjectDetection for small/medium tables, and a custom algorithm for large tables.

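The extract() method returns a list of CroppedTable objects, which can then be passed to a table formatter to produce a FormattedTable; a FormattedTable can be exported to CSV, a DataFrame, etc.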

Detects tables in a PDF page. The default implementation uses TableTransformerForObjectDetection.

Initialize the TableDetector.

Parameters:
  • config – TATRDetectorConfig

  • default_implementation – Should be True, unless you are writing a custom subclass of TableDetector.
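The intent of the default_implementation flag can be sketched as follows. This sketch assumes, without confirmation from this page, that passing default_implementation=False skips the default TableTransformerForObjectDetection setup so that a subclass can supply its own detection logic. DetectorSketch, CustomDetector and _load_default_model are hypothetical names, and the "model" is only a placeholder string.

```python
from typing import Optional


class DetectorSketch:
    """Hypothetical stand-in for a detector base class; not gmft's code."""

    def __init__(self, config: Optional[dict] = None, default_implementation: bool = True):
        self.config = config or {}
        self.model = None
        if not default_implementation:
            # Assumption: a custom subclass opts out of the default model setup.
            return
        self.model = self._load_default_model()

    def _load_default_model(self) -> str:
        # Placeholder for loading TableTransformerForObjectDetection weights.
        return "default-tatr-model"

    def extract(self, page) -> list:
        # Default inference with self.model is outside the scope of this sketch.
        raise NotImplementedError


class CustomDetector(DetectorSketch):
    """Hypothetical subclass that supplies its own detection logic."""

    def __init__(self, config: Optional[dict] = None):
        super().__init__(config, default_implementation=False)

    def extract(self, page) -> list:
        # Custom detection logic would go here; this toy version finds nothing.
        return []


print(DetectorSketch().model)  # default-tatr-model
print(CustomDetector().model)  # None
```

Under this assumption, a subclass pays no model-loading cost for machinery it never uses.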

extract(page: BasePage, config_overrides: TATRDetectorConfig = None, rect: Rect = None) → list[gmft.detectors.base.CroppedTable]

Detect tables in a page.

Parameters:
  • page – BasePage

  • config_overrides – Optional config overrides for this extraction

  • rect – Optional Rect that constrains detection to the given region of the page

Returns:

list of CroppedTable objects
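The sketch below drives the documented extract(page, config_overrides, rect) signature over several pages. It does not import gmft. Rect, DetectorConfig, CroppedTable, FakeDetector and extract_all are hypothetical stand-ins, pages are plain dicts of precomputed candidate boxes and scores, and confidence_threshold is an invented field. It also assumes that config_overrides replaces the instance config for that single call, and that rect keeps only detections lying fully inside it. The real implementation may behave differently, for example by cropping the page image before detection.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins; gmft's real Rect, CroppedTable and
# TATRDetectorConfig have richer interfaces than these.
@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, other: "Rect") -> bool:
        # True if `other` lies entirely inside this rectangle.
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and other.xmax <= self.xmax and other.ymax <= self.ymax)


@dataclass(frozen=True)
class DetectorConfig:
    confidence_threshold: float = 0.9  # invented field, for illustration only


@dataclass(frozen=True)
class CroppedTable:
    page_number: int
    bbox: Rect
    confidence: float


class FakeDetector:
    """Mimics the documented extract() signature; no model is involved."""

    def __init__(self, config: Optional[DetectorConfig] = None):
        self.config = config or DetectorConfig()

    def extract(self, page: dict, config_overrides: Optional[DetectorConfig] = None,
                rect: Optional[Rect] = None) -> list[CroppedTable]:
        # Assumption: the override applies to this call only.
        config = config_overrides if config_overrides is not None else self.config
        tables = []
        for bbox, score in page["candidates"]:
            if score < config.confidence_threshold:
                continue
            if rect is not None and not rect.contains(bbox):
                continue  # assumption: rect keeps only detections inside it
            tables.append(CroppedTable(page["number"], bbox, score))
        return tables


def extract_all(detector, pages, config_overrides=None, rect=None) -> list[CroppedTable]:
    # Flatten the per-page results into one list.
    results: list[CroppedTable] = []
    for page in pages:
        results.extend(detector.extract(page, config_overrides=config_overrides, rect=rect))
    return results


pages = [
    {"number": 0, "candidates": [(Rect(10, 10, 200, 100), 0.95),
                                 (Rect(10, 300, 200, 400), 0.60)]},
    {"number": 1, "candidates": [(Rect(50, 50, 500, 700), 0.99)]},
]
detector = FakeDetector()
print(len(extract_all(detector, pages)))                                        # 2
print(len(extract_all(detector, pages, config_overrides=DetectorConfig(0.5))))  # 3
print(len(extract_all(detector, pages, rect=Rect(0, 0, 300, 300))))             # 1
```

With the default threshold, two of the three candidates survive. A lower override admits all three for that call only, leaving detector.config unchanged. The rect drops the page-1 detection because it extends past the region.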

gmft.detectors.tatr.TATRTableDetector

alias of TATRDetector

gmft.detectors.tatr.TableDetector

alias of TATRDetector