Don't load entire bigquery query results in memory #1638

Merged: 1 commit merged into feast-dev:master on Jun 11, 2021


tsotnet
Collaborator

tsotnet commented Jun 11, 2021

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

What this PR does / why we need it: In BigQueryOfflineStore.get_historical_features, Feast calls entity_df_job.result(), which loads the entire result of the entity dataframe query into memory. We only need to wait for the job to finish and obtain the schema of its result, without changing the existing query (its results are joined with the offline feature data). This PR changes the call to entity_df_job.result(max_results=0), which:

  1. Executes the same query in BigQuery, so the table that gets joined does not change
  2. Does not load any of the query rows in memory
  3. Still allows the schema to be acquired from the resulting object
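A minimal sketch of the change, assuming the google-cloud-bigquery client, where entity_df_job is the QueryJob returned by client.query(...). The helper name get_entity_schema and the Fake* classes are hypothetical: the stand-ins model only the "result(max_results=...) returns an object with .schema" shape, so the snippet runs offline and can count how many rows were fetched.

```python
from typing import Any, List, Optional, Tuple


def get_entity_schema(entity_df_job: Any) -> List[Tuple[str, str]]:
    """Wait for the entity query to finish and return its (name, type) columns.

    Hypothetical helper. entity_df_job is assumed to be a
    google.cloud.bigquery.QueryJob, or any object with the same
    result(max_results=...) -> iterator-with-.schema shape.
    """
    # max_results=0 still blocks until the job completes, but requests no rows,
    # so none of the entity query's results are pulled into local memory.
    rows = entity_df_job.result(max_results=0)
    # The returned iterator still carries the schema of the query result.
    return [(field.name, field.field_type) for field in rows.schema]


# --- Hypothetical offline stand-ins, not part of google-cloud-bigquery ---
class FakeField:
    def __init__(self, name: str, field_type: str) -> None:
        self.name = name
        self.field_type = field_type


class FakeRowIterator:
    def __init__(self, schema: List[FakeField], rows: List[tuple]) -> None:
        self.schema = schema
        self.rows = rows


class FakeQueryJob:
    """Mimics only result(max_results=...) and records how many rows were fetched."""

    def __init__(self, schema: List[FakeField], rows: List[tuple]) -> None:
        self._schema = schema
        self._rows = rows
        self.rows_fetched = 0

    def result(self, max_results: Optional[int] = None) -> FakeRowIterator:
        # None means "fetch everything", mirroring the old entity_df_job.result() call.
        fetched = self._rows if max_results is None else self._rows[:max_results]
        self.rows_fetched = len(fetched)
        return FakeRowIterator(self._schema, fetched)


if __name__ == "__main__":
    job = FakeQueryJob(
        schema=[FakeField("driver_id", "INTEGER"), FakeField("event_timestamp", "TIMESTAMP")],
        rows=[(i, "2021-06-11 00:00:00") for i in range(1_000)],
    )
    print(get_entity_schema(job))  # schema is available...
    print(job.rows_fetched)        # ...but no rows were fetched
```

With a real job the call pattern is the same, entity_df_job.result(max_results=0).schema; the stand-ins exist only to make the "schema without rows" behavior observable in a test.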

I checked all other places in the codebase where we call .result(), but none of them needed fixing, for a couple of different reasons (happy to elaborate if anyone is curious).

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Fix BigQuery entity dataframe SQL query results being completely loaded in memory

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
@codecov-commenter

codecov-commenter commented Jun 11, 2021

Codecov Report

Merging #1638 (2e66974) into master (0d7e858) will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1638      +/-   ##
==========================================
+ Coverage   83.59%   83.60%   +0.01%     
==========================================
  Files          67       67              
  Lines        6017     6027      +10     
==========================================
+ Hits         5030     5039       +9     
- Misses        987      988       +1     
Flag              Coverage Δ
integrationtests  83.52% <100.00%> (+0.01%) ⬆️
unittests         76.43% <0.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                      Coverage Δ
sdk/python/feast/infra/offline_stores/bigquery.py   92.48% <100.00%> (ø)
sdk/python/tests/test_e2e_local.py                  100.00% <0.00%> (ø)
sdk/python/feast/infra/offline_stores/file.py       96.70% <0.00%> (+0.03%) ⬆️
sdk/python/feast/errors.py                          66.66% <0.00%> (+0.87%) ⬆️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d7e858...2e66974.

Member

achals left a comment


/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, tsotnet

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

feast-ci-bot merged commit 731bca7 into feast-dev:master on Jun 11, 2021
tsotnet pushed a commit that referenced this pull request Jun 17, 2021
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>