Advanced
Specify locations of headers/rows
With known locations of headers, rows, or cells, you can configure a TATRFormattedTable of your choice.
One option is through the TATRFormattedTable.from_dict() method.
To see the necessary format, check out github.
This example is taken from the internal test for serialization.
tiny_info = {
"filename": "data/pdfs/tiny.pdf",
"page_no": 0,
"bbox": [76.66205596923828, 162.82687377929688, 440.9659729003906, 248.67056274414062],
"confidence_score": 0.9996763467788696,
"label": 0,
"config": {},
"outliers": {},
"fctn_results": {
"scores": [
0.9999045133590698,
0.9998310804367065,
0.9999147653579712,
0.9998205304145813,
0.9999688863754272,
0.9998650550842285,
0.9998096823692322,
0.9897574186325073,
0.9998759031295776
],
"labels": [2, 2, 1, 2, 1, 1, 2, 3, 0],
"boxes": [
[-0.3175201416015625, 43.53631591796875, 362.50933837890625, 67.26876831054688],
[-0.5251426696777344, 19.269771575927734, 362.5640869140625, 43.460350036621094],
[-0.41268157958984375, 0.794677734375, 128.8265838623047, 86.2611312866211],
[-0.4305534362792969, 0.80535888671875, 362.67877197265625, 18.99618148803711],
[129.67820739746094, 0.8213462829589844, 252.4720458984375, 86.1773452758789],
[251.82122802734375, 0.8133773803710938, 362.7557678222656, 86.11017608642578],
[-0.3641777038574219, 67.27252197265625, 362.414794921875, 86.34217834472656],
[-0.4329795837402344, 0.8099098205566406, 362.6827087402344, 18.966079711914062],
[-0.43839263916015625, 0.771270751953125, 362.543212890625, 86.21470642089844]
]
}
}
table = TATRFormattedTable.from_dict(tiny_info, page)
These are the labels for bboxes: (the source location is TATRFormattedTable.id2label)
id2label = {
0: 'table',
1: 'table column',
2: 'table row',
3: 'table column header',
4: 'table projected row header',
5: 'table spanning cell',
6: 'no object',
}
The fctn_results field of a FormattedTable can also be specified with the appropriate structure.
Changing the bboxes in this way should affect subsequent calls to FormattedTable.df().