Releases · UW-COSMOS/Cosmos

Table context enrichment during ingestion. When enabled via the --use-table-context-enrichment option on the ingest CLI, detected tables are matched to mentions of them in the body text, and a context_from_text field is added to the output parquet.
The retrieval API has been updated to search any of the following:
- local_content field (default): the text content of the table and its associated caption, if any
- full_content field: local_content plus context_from_text
- Any of the three underlying fields separately (content, caption_content, context_from_text)
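As a rough illustration of how the searchable fields relate to one another, the sketch below composes local_content and full_content from the three underlying fields. The function name build_search_fields and the newline separator are assumptions made for this example; they are not taken from the Cosmos codebase, which may combine the fields differently.

```python
from typing import Dict, Optional


def build_search_fields(
    content: str,
    caption_content: Optional[str] = None,
    context_from_text: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: compose the two combined search fields."""
    # local_content: table text plus its caption, if one was found.
    local_parts = [part for part in (content, caption_content) if part]
    local_content = "\n".join(local_parts)

    # full_content: local_content plus any context matched from the body text.
    full_parts = [part for part in (local_content, context_from_text) if part]
    full_content = "\n".join(full_parts)

    return {"local_content": local_content, "full_content": full_content}
```

When a table has no caption or no matched body-text mention, the corresponding part is simply skipped, so full_content falls back to local_content.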
Text normalization. When enabled via the --use-text-normalization option on the ingest CLI, basic Unicode normalization is applied to the text layer to regularize ligatures and repair mojibake.
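To make the idea concrete, here is a minimal standard-library sketch of this kind of normalization. It is not the Cosmos implementation: the function name normalize_text is hypothetical, NFKC is assumed as the normalization form, and the mojibake repair only handles the common case of UTF-8 bytes that were misread as Windows-1252.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical sketch: repair simple mojibake, then fold ligatures."""
    # Assumption: mojibake came from UTF-8 bytes decoded as cp1252.
    # If the round trip fails, the text was probably fine, so keep it.
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        repaired = text

    # NFKC replaces compatibility characters such as the "fi" ligature
    # (U+FB01) with their plain equivalents.
    return unicodedata.normalize("NFKC", repaired)
```

The round-trip check is deliberately conservative: text that is already correct usually fails to re-decode as UTF-8 and is left unchanged. A dedicated library would cover many more mojibake patterns than this sketch does.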
ASKE-ID lookup within the retrieval API.
