[HUDI-8212] Added extra config for Billing big query project id #11956

Open
wants to merge 2 commits into
base: master
from

Conversation

ad1happy2go
Collaborator

Change Logs

When the BigQuery dataset project ID differs from the billing project ID, the job has to be submitted in the billing project rather than in the dataset project.

Impact

none

Risk level (write none, low, medium or high below)

low

Documentation Update

The BigQuery sync documentation needs to be updated for the extra config. I will raise a PR for that too.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Sep 18, 2024

public static final ConfigProperty<String> BIGQUERY_SYNC_BILLING_PROJECT_ID = ConfigProperty
.key("hoodie.gcp.bigquery.sync.project_id")
.noDefaultValue()
.markAdvanced()
Contributor

Add .sinceVersion("1.0.0")

@@ -58,6 +58,12 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
.markAdvanced()
.withDocumentation("Name of the target project in BigQuery");

public static final ConfigProperty<String> BIGQUERY_SYNC_BILLING_PROJECT_ID = ConfigProperty
.key("hoodie.gcp.bigquery.sync.project_id")
Contributor

Suggested change
- .key("hoodie.gcp.bigquery.sync.project_id")
+ .key("hoodie.gcp.bigquery.sync.billing_project_id")

@@ -47,6 +48,7 @@ public class TestBigQuerySyncConfig {
public void testGetConfigs() {
Properties props = new Properties();
props.setProperty(BIGQUERY_SYNC_PROJECT_ID.key(), "fooproject");
props.setProperty(BIGQUERY_SYNC_BILLING_PROJECT_ID.key(), "fooproject");
Contributor

Could you use a different project ID for billing in the test?
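
A minimal sketch of why distinct values matter in such a test, using only `java.util.Properties`. Everything here is hypothetical and is not code from this PR: `DistinctIdsDemo`, `buggyBillingProjectId` and `checkPasses` are illustrative names, and the billing key is the one proposed in the review suggestion above. The getter is deliberately wrong (it reads the dataset project key), and the check only catches that when the two IDs differ.

```java
import java.util.Properties;

// Hypothetical demo class, not part of Hudi.
final class DistinctIdsDemo {
  // Key taken from the diff above.
  static final String PROJECT_ID_KEY = "hoodie.gcp.bigquery.sync.project_id";
  // Assumption: the key name proposed in the review suggestion.
  static final String BILLING_PROJECT_ID_KEY = "hoodie.gcp.bigquery.sync.billing_project_id";

  private DistinctIdsDemo() {}

  // Deliberately buggy getter: reads the dataset project key by mistake.
  static String buggyBillingProjectId(Properties props) {
    return props.getProperty(PROJECT_ID_KEY);
  }

  // Mimics a test assertion: does the getter return the billing ID that was configured?
  static boolean checkPasses(String projectId, String billingProjectId) {
    Properties props = new Properties();
    props.setProperty(PROJECT_ID_KEY, projectId);
    props.setProperty(BILLING_PROJECT_ID_KEY, billingProjectId);
    return billingProjectId.equals(buggyBillingProjectId(props));
  }

  public static void main(String[] args) {
    // Same value for both IDs: the bug goes unnoticed.
    System.out.println(checkPasses("fooproject", "fooproject"));
    // Different values: the check fails and exposes the bug.
    System.out.println(checkPasses("fooproject", "foobilling"));
  }
}
```

With identical IDs the check passes even though the wrong key is read, so a test written that way would not detect a mix-up between the two configs.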

@@ -62,6 +62,7 @@

public class TestHoodieBigQuerySyncClient {
private static final String PROJECT_ID = "test_project";
private static final String BILLING_PROJECT_ID = "test_project";
Contributor

Similar here.

@@ -124,7 +128,7 @@ public void createOrUpdateTableUsingBqManifestFile(String tableName, String bqMa
QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query)
.setUseLegacySql(false)
.build();
- JobId jobId = JobId.newBuilder().setProject(projectId).setRandomJob().build();
+ JobId jobId = JobId.newBuilder().setProject(billingProjectId).setRandomJob().build();
Contributor

Why is the billing project ID used only here?

Collaborator Author

The billing project ID is used only for submitting the jobs. The dataset still lives in another project.

Contributor

@yihua yihua Sep 19, 2024

Got it, so it should only be used by createOrUpdateTableUsingBqManifestFile. Could you update the config docs to reflect this so that users know how the new billing project ID is used?

Collaborator Author

@yihua I will raise a separate docs PR today.

Contributor

You can directly update

.withDocumentation("Name of the billing project id in BigQuery. By default it will use the BIGQUERY_SYNC_PROJECT_ID");

When we cut the release docs, the config docs are updated automatically.
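
For illustration, the fallback described in that documentation string could look roughly like the sketch below. This is not Hudi's implementation: `BillingProjectResolver` and `resolveBillingProjectId` are hypothetical names, the billing key is the one proposed in the review suggestion, and plain `java.util.Properties` stands in for the actual sync config class.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: sketches "use the billing project ID
// if set, otherwise fall back to the dataset project ID".
final class BillingProjectResolver {
  // Key taken from the diff above.
  static final String PROJECT_ID_KEY = "hoodie.gcp.bigquery.sync.project_id";
  // Assumption: the key name proposed in the review suggestion.
  static final String BILLING_PROJECT_ID_KEY = "hoodie.gcp.bigquery.sync.billing_project_id";

  private BillingProjectResolver() {}

  static String resolveBillingProjectId(Properties props) {
    String billing = props.getProperty(BILLING_PROJECT_ID_KEY);
    if (billing == null || billing.isEmpty()) {
      // No billing project configured: jobs are submitted in the dataset project.
      return props.getProperty(PROJECT_ID_KEY);
    }
    return billing;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(PROJECT_ID_KEY, "dataset-project");
    // Only the dataset project is set, so it is also used for billing.
    System.out.println(resolveBillingProjectId(props));

    props.setProperty(BILLING_PROJECT_ID_KEY, "billing-project");
    // An explicit billing project takes precedence.
    System.out.println(resolveBillingProjectId(props));
  }
}
```

The resolved value would then be the one passed to `JobId.newBuilder().setProject(...)` in the hunk above, while the dataset itself stays addressed by the dataset project ID.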

3 participants