Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Integer conversion in BigQueryIO's Storage Write API method. #24366

Merged
merged 9 commits into from
Dec 13, 2022

Conversation

slilichenko
Copy link
Contributor

Improve error handling when converting TableRows to a schema containing BigQuery's INTEGER type:

  • Allow values to be of BigDecimal and BigInteger types
  • Capture the failed value and the field name and type to help with troubleshooting

@slilichenko slilichenko marked this pull request as draft November 27, 2022 17:10
@slilichenko slilichenko marked this pull request as ready for review November 28, 2022 16:38
@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @apilloud for label java.
R: @ahmedabu98 for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

try {
return Long.valueOf((String) value);
} catch (NumberFormatException e) {
// Expected. Element will be added to the failed element PCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are going to result in a generic SchemaDoesntMatchException("Unexpected value :"...) being thrown which will make this potentially difficult to debug when it happens. If we can't just let the exception bubble up here, you should wrap it with a new exception rather than just dropping it. (Same thing for the exceptions below.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit. Only implemented for the INTEGER type at the moment.

slilichenko added 3 commits November 28, 2022 19:59
…Introduced a dedicated Exception. Improved error messages by capturing the source failure and reducing message sizes.
…Introduced a dedicated Exception; improved error messages by capturing the source failure.
Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

slilichenko added 2 commits November 30, 2022 10:53
# Conflicts:
#	sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java
#	sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java
@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2022

Reminder, please take a look at this pr: @apilloud @ahmedabu98

@github-actions
Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @lukecwik for label java.
R: @pabloem for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

…am/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java
@lukecwik
Copy link
Member

remind me after tests pass

@github-actions
Copy link
Contributor

Ok - I'll remind @lukecwik after tests pass

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply spotless

@lukecwik
Copy link
Member

retest this please

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix weird merge

…am/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java
@lukecwik lukecwik merged commit b896cd5 into apache:master Dec 13, 2022
lostluck pushed a commit to lostluck/beam that referenced this pull request Dec 22, 2022
…pache#24366)

* Improve Integer conversion in BigQueryIO's Storage Write API method.

* Improve Integer conversion in BigQueryIO's Storage Write API method. Introduced a dedicated Exception. Improved error messages by capturing the source failure and reducing message sizes.

* Improve Integer conversion in BigQueryIO's Storage Write API method. Introduced a dedicated Exception; improved error messages by capturing the source failure.

* Fixed a minor typo.

* Merged PR with new commits.

* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java

* Apply suggestions from code review

Apply spotless

* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java

Co-authored-by: slilichenko <Sergei Lilichenko>
Co-authored-by: Lukasz Cwik <lcwik@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants