gmft.formatters.base module

A collection of common objects used by formatters.

Type Hierarchy:

  • CroppedTable
    • RotatedCroppedTable
      • FormattedTable

  • BaseFormatter
    • TATRFormatter (alias: TableFormatter)

BaseFormatter is the base class for all formatters.

class gmft.formatters.base.BaseFormatter

Bases: ABC

Abstract class for converting a CroppedTable to a FormattedTable. Allows export to csv, df, etc.

abstract extract(table: CroppedTable) → FormattedTable

Extract the data from the table. Produces a FormattedTable instance, from which data can be exported in csv, html, etc.

format(table: CroppedTable, **kwargs) → FormattedTable

Alias for extract().
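
The relationship between extract() and format() can be illustrated with a minimal, self-contained sketch. The classes below (SketchFormatter, UppercaseFormatter) are hypothetical stand-ins and not gmft's implementation; only the method names extract and format come from this page, and a list of rows stands in for the table objects.

```python
from abc import ABC, abstractmethod


class SketchFormatter(ABC):
    """Hypothetical stand-in for BaseFormatter (not the gmft class)."""

    @abstractmethod
    def extract(self, table, **kwargs):
        """Subclasses turn a cropped table into a formatted one."""

    def format(self, table, **kwargs):
        # Alias: delegate to extract(), forwarding any keyword arguments.
        return self.extract(table, **kwargs)


class UppercaseFormatter(SketchFormatter):
    """Toy subclass: 'formats' a table given as a list of rows of strings."""

    def extract(self, table, **kwargs):
        return [[cell.upper() for cell in row] for row in table]


formatter = UppercaseFormatter()
rows = [["name", "qty"], ["apple", "3"]]
result = formatter.format(rows)  # same result as formatter.extract(rows)
```

Because extract() is abstract, the base class itself cannot be instantiated; only a subclass that implements extract() can be used.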

class gmft.formatters.base.FormattedTable(cropped_table: CroppedTable, df: pandas.DataFrame = None)

Bases: RotatedCroppedTable

This is a table that is “formatted”, which is to say it has been annotated with header and data information through structural analysis. It can therefore be converted into a df, csv, etc.

Warning: This class is not meant to be instantiated directly. Use a TableFormatter to convert a CroppedTable to a FormattedTable.

Construct a CroppedTable object.

Parameters:
  • page – BasePage
  • bbox – tuple of (xmin, ymin, xmax, ymax) or Rect object
  • confidence_score – confidence score of the table detection
  • label – label of the table detection: 0 means table, 1 means rotated table

df(recalculate=False, config_overrides=None) → pandas.DataFrame

Return the table as a pandas dataframe.

Parameters:
  • recalculate – by default, a cached dataframe is returned.

Note that it is preferred to explicitly call recompute().

abstract static from_dict(d: dict, page: BasePage)

Deserialize from dict

predictions: TablePredictions

recompute(config=None) → pandas.DataFrame

Recompute the internal dataframe.
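
The interplay between the cached df() and recompute() can be sketched as follows. This is an illustration under stated assumptions, not gmft's code: SketchTable is a hypothetical class, a plain list of rows stands in for the pandas DataFrame, and the choice to build the cache lazily on the first df() call is an assumption.

```python
class SketchTable:
    """Hypothetical stand-in for FormattedTable; a list of rows replaces the DataFrame."""

    def __init__(self, raw_rows):
        self._raw_rows = raw_rows
        self._df = None          # cached result, built on demand
        self.compute_count = 0   # how many times the result was rebuilt

    def recompute(self, config=None):
        # Rebuild the cached result from the raw data.
        # `config` is accepted but ignored in this sketch.
        self.compute_count += 1
        self._df = [list(row) for row in self._raw_rows]
        return self._df

    def df(self, recalculate=False):
        # Return the cache unless it is missing or a rebuild was requested.
        if recalculate or self._df is None:
            return self.recompute()
        return self._df


table = SketchTable([("a", 1), ("b", 2)])
first = table.df()                   # builds the cache
second = table.df()                  # served from the cache
third = table.df(recalculate=True)   # forces a rebuild
```

Calling recompute() directly makes the rebuild explicit at the call site, which is one reading of why the note above prefers it over df(recalculate=True).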

abstract to_dict()

Serialize self into dict
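
A round trip through to_dict() and from_dict() might look like the sketch below. The class and its fields are hypothetical and chosen only for illustration; the real classes store different data. The sketch assumes the page is not written into the dict, since from_dict() takes the page as a separate argument.

```python
import json


class SketchTable:
    """Hypothetical stand-in; the real gmft classes store different fields."""

    def __init__(self, page, bbox, label):
        self.page = page
        self.bbox = bbox
        self.label = label

    def to_dict(self):
        # The page is left out (an assumption): it is supplied again on load.
        return {"bbox": list(self.bbox), "label": self.label}

    @staticmethod
    def from_dict(d, page):
        return SketchTable(page, tuple(d["bbox"]), d["label"])


page = "page-0"  # placeholder for a BasePage object
original = SketchTable(page, (10.0, 20.0, 110.0, 220.0), 0)
payload = json.dumps(original.to_dict())           # dict -> JSON text
restored = SketchTable.from_dict(json.loads(payload), page)
```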

abstract visualize()

Visualize the table.

class gmft.formatters.base.TableFormatter

Bases: BaseFormatter