Commit
Spec: add a note about mapping of Parquet BYTE_ARRAY type to Arrow types (#190)

Fixes #187
rouault committed Nov 17, 2023
1 parent e52ace1 commit 9f96beb
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions format-specs/geoparquet.md
@@ -13,6 +13,9 @@ This is version 1.1.0-dev of the GeoParquet specification. See the [JSON Schema
## Geometry columns

Geometry columns MUST be stored using the `BYTE_ARRAY` parquet type. They MUST be encoded as [WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary).

Implementation note: when using the ecosystem of Arrow libraries, Parquet types such as `BYTE_ARRAY` might not be directly accessible. Instead, the corresponding Arrow data type can be `Arrow::Type::BINARY` (for arrays whose elements can be indexed through a 32-bit index) or `Arrow::Type::LARGE_BINARY` (64-bit index). It is recommended that GeoParquet readers be compatible with both data types, and that writers preferably use `Arrow::Type::BINARY` (thus limiting row groups to content smaller than 2 GB) for wider compatibility.

See the [encoding](#encoding) section below for more details.
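
As a rough illustration of the implementation note above, the following standard-library Python sketch shows where the 2 GB limit comes from and how a reader and writer might treat the two Arrow types. The names `choose_writer_type`, `reader_accepts`, and the string labels are hypothetical and not part of the specification or of any Arrow API. Falling back to the 64-bit type is only one assumed strategy; a writer could instead split the data into smaller row groups.

```python
import struct

# Arrow's BINARY layout stores int32 offsets, so the concatenated bytes of
# one array cannot exceed 2**31 - 1; LARGE_BINARY uses int64 offsets.
MAX_BINARY_BYTES = 2**31 - 1

# Hypothetical labels for this sketch; in pyarrow they would correspond to
# pa.binary() and pa.large_binary().
BINARY = "binary"
LARGE_BINARY = "large_binary"


def choose_writer_type(total_wkb_bytes: int) -> str:
    """Prefer BINARY; use LARGE_BINARY only when the data cannot fit.

    Assumption: the caller has not chosen to split into smaller row groups.
    """
    return BINARY if total_wkb_bytes <= MAX_BINARY_BYTES else LARGE_BINARY


def reader_accepts(arrow_type: str) -> bool:
    """A compatible reader accepts either binary type for geometry columns."""
    return arrow_type in (BINARY, LARGE_BINARY)


# WKB for POINT(1 2): byte-order flag (1 = little-endian), geometry type 1,
# then the x and y coordinates as doubles.
point_wkb = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
geometries = [point_wkb] * 3
total = sum(len(g) for g in geometries)

print(choose_writer_type(total))
```

For typical row groups the total WKB size is far below the limit, so the sketch selects the 32-bit type, which matches the recommendation for writers.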

### Nesting