Config Guide
============

The AutoTableDetector and AutoTableFormatter have separate configurations. This guide focuses on the **formatter** side.

Basics
-------

The :class:`~gmft.auto.AutoFormatConfig` object can be passed into either the :class:`~gmft.auto.AutoTableFormatter` constructor or the :meth:`~gmft.formatters.tatr.TATRFormatter.df` method.

For example:

.. code-block:: python

    from gmft.auto import AutoFormatConfig, AutoTableFormatter

    # ... code here

    config = AutoFormatConfig(verbosity=3)
    formatter = AutoTableFormatter(config=config)
    ft = formatter.format(table)
    # the formatter's tables automatically use the settings in config
    df = ft.df()

    # if provided, config_overrides replaces config, so verbosity is reverted
    config_overrides = AutoFormatConfig(enable_multi_header=True)
    df = ft.df(config_overrides=config_overrides)

    # pass a dict instead to keep the verbosity setting from config
    df = ft.df(config_overrides={"enable_multi_header": True})

New behavior in v0.3: If ``config_overrides`` is provided, it completely replaces everything in ``config``. For instance, if a value is set in ``config`` but left unassigned in ``config_overrides``, the resulting object will **revert** to the default value.

In versions <0.3, values assigned in ``config_overrides`` were merged into ``config``. In the example above, the resulting object would previously have kept the value from ``config``. To retain this old behavior, pass a dict.

.. _semantic_spanning_cells:

Semantic Spanning
------------------

The **semantic spanning cells** setting supports headers with multiple rows or columns. Supported spanning cells can be in either the top or the left header of the table.

.. figure:: /images/spanning_hier_left.png
    :alt: spanning hierarchical left

    Fig 1. Spanning Hierarchical Left Header

.. figure:: /images/spanning_hier_top.png
    :alt: spanning hierarchical top

    Fig 2. Spanning Hierarchical Top Header

.. raw:: html

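
    <table>
      <caption>Table 1. semantic_spanning_cells=True</caption>
      <thead>
        <tr><th></th><th>Dataset</th><th>Total Tables \nInvestigated†</th><th>Total Tables \nwith a PRH∗</th><th>Tables with an oversegmented PRH \nTotal</th><th>Tables with an oversegmented PRH \n% (of total with a PRH)</th><th>Tables with an oversegmented PRH \n% (of total investigated)</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>SciTSR</td><td>10,431</td><td>342</td><td>54</td><td>15.79%</td><td>0.52%</td></tr>
        <tr><th>1</th><td>PubTabNet</td><td>422,491</td><td>100,159</td><td>58,747</td><td>58.65%</td><td>13.90%</td></tr>
        <tr><th>2</th><td>FinTabNet</td><td>70,028</td><td>25,637</td><td>25,348</td><td>98.87%</td><td>36.20%</td></tr>
        <tr><th>3</th><td>PubTables-1M (ours)</td><td>761,262</td><td>153,705</td><td>0</td><td>0%</td><td>0%</td></tr>
      </tbody>
    </table>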

Enable Multi Header
--------------------

The name **enable multi header** is a slight **misnomer**: the setting only controls whether the pandas DataFrame is given multiple header rows. It does not need to be enabled for semantic spanning cells (i.e. hierarchical top or left headers) to be processed.

If this setting is false, all of the headers are condensed into a single header. Multi-line (and hence hierarchical) information is preserved through ``\n`` characters.

.. raw:: html

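
    <table>
      <caption>Table 2. semantic_spanning_cells=True, enable_multi_header=True</caption>
      <thead>
        <tr><th>Header 2</th><th>NaN</th><th>NaN</th><th>NaN</th><th>Tables with an oversegmented PRH</th><th>Tables with an oversegmented PRH</th><th>Tables with an oversegmented PRH</th></tr>
        <tr><th>Header 1</th><th>Dataset</th><th>Total Tables \nInvestigated†</th><th>Total Tables \nwith a PRH∗</th><th>Total</th><th>% (of total with a PRH)</th><th>% (of total investigated)</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>SciTSR</td><td>10,431</td><td>342</td><td>54</td><td>15.79%</td><td>0.52%</td></tr>
        <tr><th>1</th><td>PubTabNet</td><td>422,491</td><td>100,159</td><td>58,747</td><td>58.65%</td><td>13.90%</td></tr>
        <tr><th>2</th><td>FinTabNet</td><td>70,028</td><td>25,637</td><td>25,348</td><td>98.87%</td><td>36.20%</td></tr>
        <tr><th>3</th><td>PubTables-1M (ours)</td><td>761,262</td><td>153,705</td><td>0</td><td>0%</td><td>0%</td></tr>
      </tbody>
    </table>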

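
The way Table 2's stacked headers collapse into Table 1's single header can be sketched in plain Python. This is only an illustration, not gmft's implementation: the helper name ``condense_headers`` is hypothetical, missing levels (shown as ``NaN`` in Table 2) are represented by ``None``, and the `` \n`` separator is an assumption taken from the header text visible in Table 1.

.. code-block:: python

    def condense_headers(header_rows):
        """Join stacked header levels column by column, skipping empty levels.

        Hypothetical helper for illustration; gmft may do this differently.
        """
        condensed = []
        for levels in zip(*header_rows):
            # Keep only the levels that actually carry text for this column.
            parts = [level for level in levels if level is not None]
            # Assumed separator: a space followed by a newline, as seen in Table 1.
            condensed.append(" \n".join(parts))
        return condensed


    # Header rows, top level first, for four of the columns in Table 2.
    header_2 = [None, None, "Tables with an oversegmented PRH", "Tables with an oversegmented PRH"]
    header_1 = ["Dataset", "Total Tables \nInvestigated†", "Total", "% (of total with a PRH)"]

    flat = condense_headers([header_2, header_1])
    print(flat)

Columns without a spanning parent keep their original text, while columns under a spanning cell gain the parent's text as a prefix, which matches the difference between the two tables.
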
.. _large_table_assumption:

Large Table Assumption
-----------------------

The **large table assumption** is a mechanic that improves performance on large tables. Here, algorithmically generated rows are used instead of deep learning.

By default, the large table assumption activates when at least one of these conditions holds:

1. More than ``large_table_if_n_rows_removed`` rows are removed (default: >= 8)
2. OR all of the following are true:

   * Measured overlap of rows exceeds ``large_table_row_overlap_threshold`` (default: 20%)
   * AND the number of rows is greater than ``large_table_threshold`` (default: >= 10)

The large table assumption can be turned on or off directly with ``config.large_table_assumption = True/False``.

.. list-table::

   * - .. figure:: /images/lta_off.png

          Fig 3. Deep bboxes

     - .. figure:: /images/lta_on.png

          Fig 4. Large Table Assumption on

.. raw:: html

    <p>Fig. 3 and 4 Credits: © C. Dougherty 2001, 2002 (c.dougherty@lse.ac.uk). These tables have been computed to accompany the text C. Dougherty Introduction to Econometrics (second edition 2002, Oxford University Press, Oxford). They may be reproduced freely provided that this attribution is retained.</p>

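
The default activation rule for the large table assumption can be written as a small predicate. This is a sketch under stated assumptions rather than gmft's actual code: the function name and the arguments ``n_rows_removed``, ``row_overlap`` and ``n_rows`` are hypothetical, and because the description does not settle whether each comparison is inclusive, the choice of ``>=`` for removed rows and strict ``>`` for the other two is an assumption. The three keyword defaults are the values quoted in this section.

.. code-block:: python

    def use_large_table_assumption(
        n_rows_removed,
        row_overlap,
        n_rows,
        large_table_if_n_rows_removed=8,
        large_table_row_overlap_threshold=0.2,
        large_table_threshold=10,
    ):
        """Return True if the default rule would switch to generated rows."""
        # Condition 1: many rows were removed (inclusive comparison is an assumption).
        many_removed = n_rows_removed >= large_table_if_n_rows_removed
        # Condition 2: rows overlap heavily AND the table has many rows
        # (strict comparisons are an assumption).
        crowded = (
            row_overlap > large_table_row_overlap_threshold
            and n_rows > large_table_threshold
        )
        return many_removed or crowded


    print(use_large_table_assumption(n_rows_removed=9, row_overlap=0.0, n_rows=3))
    print(use_large_table_assumption(n_rows_removed=0, row_overlap=0.5, n_rows=30))
    print(use_large_table_assumption(n_rows_removed=0, row_overlap=0.5, n_rows=5))

Either condition alone is enough to trigger the assumption, but the overlap and row-count checks only count when both are satisfied together.
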