Releases: conjuncts/gmft

v0.2.2

04 Sep 05:10

Changes

  • is_projecting_row is removed, with the information now available under FormattedTable._projecting_indices
  • Formally removed timm as a dependency
  • Slightly tweaked caption detection to better reflect paragraph word height; this is still a work in progress. See #8 and be93159
  • Fix: return result so image can be used outside of notebook by @brycedrennan in #15

Full Changelog: v0.2.1...v0.2.2

v0.2.1

04 Sep 04:55

Full Changelog: v0.2.0...v0.2.1

v0.2.0

04 Sep 04:51

Features:

  • Multiple headers; multi-index tables (6225043)
  • Spanning cells on both the top and left (bbbbd7c)
  • Captions for tables (ca18bcc)
  • "Margin" parameter allows text outside of table bbox to be included (ab81f22)
  • Return visualized images as PIL images; allow padding or margin around the visualized image (ab81f22)

Several tweaks to the formatting algorithm may produce different outputs than prior versions:

  • Automatically drop rows whose only non-null value is in the "is_projecting_row" column
  • Fill in gaps between table rows, to reduce skipped text
  • Non-maxima suppression, as seen in inference.py (ab81f22)
    • the "total overlap" metric is now less useful; "rows removed by NMS" is the more informative measure
  • Widen rows to the same length
  • Several tweaks to conditions, parameters, and heuristics
    • superscripts/subscripts are now more likely to be merged into their parent rows
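
For readers unfamiliar with the non-maxima suppression step mentioned above, here is a minimal sketch of the general technique applied to row boxes. It is not gmft's implementation: the names `nms_rows` and `intersection_over_box`, the 0.5 default, and the choice to measure overlap as the fraction of the lower-scoring box covered by a kept box are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
ScoredBox = Tuple[Box, float]            # (box, confidence score)


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection_over_box(candidate: Box, kept: Box) -> float:
    """Fraction of `candidate`'s area that is covered by `kept`."""
    overlap = (
        max(candidate[0], kept[0]),
        max(candidate[1], kept[1]),
        min(candidate[2], kept[2]),
        min(candidate[3], kept[3]),
    )
    cand_area = area(candidate)
    return area(overlap) / cand_area if cand_area > 0 else 0.0


def nms_rows(
    rows: List[ScoredBox], iob_threshold: float = 0.5
) -> Tuple[List[ScoredBox], int]:
    """Keep the highest-scoring rows; drop rows mostly covered by a kept row.

    Returns the kept rows (in descending score order) and the number removed.
    The threshold value is a hypothetical default, not gmft's.
    """
    kept: List[ScoredBox] = []
    removed = 0
    for box, score in sorted(rows, key=lambda r: r[1], reverse=True):
        if any(intersection_over_box(box, k[0]) > iob_threshold for k in kept):
            removed += 1
        else:
            kept.append((box, score))
    return kept, removed


rows = [
    ((0, 0, 100, 10), 0.9),   # strong row
    ((0, 2, 100, 12), 0.6),   # near-duplicate of the first row
    ((0, 20, 100, 30), 0.8),  # separate row
]
kept_rows, rows_removed = nms_rows(rows)
```

In this sketch, the count of removed rows plays the role of a "rows removed by NMS" figure: a high count suggests the model predicted many near-duplicate rows.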

Many potentially breaking changes to config:

  • TableDetectorConfig.confidence_score_threshold has been renamed to TableDetectorConfig.detector_base_threshold
  • TableFormatter.deduplication_iob_threshold has been removed in favor of nms_iob_threshold
  • spanning_cell_minimum_width, corner_clip_outlier_threshold, and aggregate_spanning_cells have been removed
  • Tweaks to default settings may yield different results
  • no_timm is now the default, which fixes #1.
    • this might cause slightly different bboxes
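
If you keep your settings in a plain dictionary, the renames and removals above could be applied with a small helper like the one below. This is a hypothetical sketch, not part of gmft: `migrate_settings`, `RENAMED`, and `REMOVED` are illustrative names, and it assumes the settings are stored as key/value pairs keyed by the option names listed above. Since `deduplication_iob_threshold` was removed in favor of `nms_iob_threshold` rather than renamed, the sketch drops it and leaves the new value for you to choose.

```python
from typing import Any, Dict, List, Tuple

# Option renamed in v0.2.0 (old name -> new name), per the notes above.
RENAMED = {
    "confidence_score_threshold": "detector_base_threshold",
}

# Options removed in v0.2.0, with a hint about what to do instead.
REMOVED = {
    "deduplication_iob_threshold": "set nms_iob_threshold instead",
    "spanning_cell_minimum_width": "no replacement listed",
    "corner_clip_outlier_threshold": "no replacement listed",
    "aggregate_spanning_cells": "no replacement listed",
}


def migrate_settings(old: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return settings with v0.2.0 names, plus notes on what changed."""
    new: Dict[str, Any] = {}
    notes: List[str] = []
    for key, value in old.items():
        if key in RENAMED:
            new[RENAMED[key]] = value
            notes.append(f"renamed {key} -> {RENAMED[key]}")
        elif key in REMOVED:
            notes.append(f"dropped {key}: {REMOVED[key]}")
        else:
            # Unknown or unchanged options pass through untouched.
            new[key] = value
    return new, notes


old_settings = {
    "confidence_score_threshold": 0.9,
    "aggregate_spanning_cells": True,
    "some_other_setting": 1,  # placeholder for any unchanged option
}
new_settings, migration_notes = migrate_settings(old_settings)
```

Review the notes before reusing old values, since the changed defaults mentioned above may also affect results.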

v0.1.1

04 Sep 04:39
Pre-release
  • Created AutoTableFormatter and AutoTableDetector for future flexibility (v0.1.1, a840488)
  • Renamed is_spanning_row to is_projecting_row (v0.1.1, a840488)

Older:

  • Even better accuracy for large tables (v0.1.0, 8c537ed)

Full Changelog: v0.1.0...v0.1.1

v0.0.4

04 Sep 04:38
Pre-release
  • Added support for rotated tables (5aeb80d)