Google Cloud Storage Setup

Configure Google Cloud Storage (GCS) for Kafka backups.

Prerequisites

  • Google Cloud project
  • gcloud CLI installed and configured
  • Permissions to create buckets and service accounts
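
If the Cloud Storage API is not already enabled in the project, enable it before creating any buckets:

# Enable the Cloud Storage API (safe to re-run if already enabled)
gcloud services enable storage.googleapis.com --project=$PROJECT_ID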

Create GCS Bucket

Using gcloud CLI

# Set variables
PROJECT_ID="my-project"
BUCKET_NAME="my-kafka-backups"
REGION="us-west1"

# Create bucket
gcloud storage buckets create gs://$BUCKET_NAME \
--project=$PROJECT_ID \
--location=$REGION \
--uniform-bucket-level-access

# Enable versioning
gcloud storage buckets update gs://$BUCKET_NAME --versioning

# Set lifecycle policy
cat > lifecycle.json << 'EOF'
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 90}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"age": 365}
      }
    ]
  }
}
EOF

gcloud storage buckets update gs://$BUCKET_NAME --lifecycle-file=lifecycle.json

Using Terraform

gcs.tf
resource "google_storage_bucket" "kafka_backups" {
name = "my-kafka-backups"
location = "US-WEST1"
project = var.project_id

uniform_bucket_level_access = true

versioning {
enabled = true
}

lifecycle_rule {
condition {
age = 30
}
action {
type = "SetStorageClass"
storage_class = "NEARLINE"
}
}

lifecycle_rule {
condition {
age = 90
}
action {
type = "SetStorageClass"
storage_class = "COLDLINE"
}
}

lifecycle_rule {
condition {
age = 365
}
action {
type = "Delete"
}
}
}
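
The configuration can then be applied with the standard Terraform workflow:

# Initialize providers, review the plan, then apply
terraform init
terraform plan
terraform apply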

Authentication

Service Account

# Create service account
gcloud iam service-accounts create kafka-backup \
--display-name="Kafka Backup Service Account" \
--project=$PROJECT_ID

# Get service account email
SA_EMAIL="kafka-backup@${PROJECT_ID}.iam.gserviceaccount.com"

# Grant bucket access
gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/storage.objectAdmin"

# Create key file
gcloud iam service-accounts keys create kafka-backup-key.json \
--iam-account=$SA_EMAIL
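
If the backup job runs on Kubernetes without workload identity, the key file can be mounted from a secret. The secret name gcp-sa-key and file name key.json below match the commented-out volume in the CronJob manifest later on this page:

# Store the key as a Kubernetes secret
kubectl create secret generic gcp-sa-key \
  --from-file=key.json=kafka-backup-key.json \
  -n kafka-backup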

Workload Identity (GKE)

# Enable workload identity on cluster
gcloud container clusters update my-cluster \
--zone=us-west1-a \
--workload-pool=${PROJECT_ID}.svc.id.goog

# Create namespace and Kubernetes service account
kubectl create namespace kafka-backup
kubectl create serviceaccount kafka-backup -n kafka-backup

# Bind Kubernetes SA to Google SA
gcloud iam service-accounts add-iam-policy-binding $SA_EMAIL \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:${PROJECT_ID}.svc.id.goog[kafka-backup/kafka-backup]"

# Annotate Kubernetes SA
kubectl annotate serviceaccount kafka-backup \
-n kafka-backup \
iam.gke.io/gcp-service-account=$SA_EMAIL
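
To sanity-check the binding, you can run a one-off pod under the annotated Kubernetes service account and confirm it authenticates as the Google service account (the pod name wi-test is arbitrary):

# Launch a temporary pod that uses the annotated Kubernetes SA
kubectl run wi-test -n kafka-backup --rm -it \
  --image=google/cloud-sdk:slim \
  --overrides='{"spec": {"serviceAccountName": "kafka-backup"}}' \
  -- gcloud auth list
# The active account should be kafka-backup@PROJECT_ID.iam.gserviceaccount.com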

Application Default Credentials

On GCE VMs or Cloud Run:

# No explicit configuration needed
# Uses instance metadata service automatically
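
To confirm the metadata service is reachable and see which identity ADC will resolve to, query it directly from the VM or container:

# Ask the metadata server for the default service account's email
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"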

Configuration

With Service Account Key

backup.yaml
storage:
  backend: gcs
  bucket: my-kafka-backups
  prefix: production/daily
  service_account_json: /path/to/kafka-backup-key.json

With Environment Variable

backup.yaml
storage:
  backend: gcs
  bucket: my-kafka-backups
  prefix: production/daily
  # Uses GOOGLE_APPLICATION_CREDENTIALS environment variable

With Workload Identity (GKE)

backup.yaml
storage:
  backend: gcs
  bucket: my-kafka-backups
  prefix: production/daily
  # No credentials needed - uses workload identity

Environment Variables

Variable                          Description
GOOGLE_APPLICATION_CREDENTIALS    Path to service account JSON key
GOOGLE_CLOUD_PROJECT              Default project ID
CLOUDSDK_CORE_PROJECT             Alternative project variable

IAM Roles

Role                           Description
roles/storage.objectViewer     Read backups
roles/storage.objectCreator    Create backups
roles/storage.objectAdmin      Full access (recommended)
roles/storage.admin            Bucket management

Minimum required permissions:

- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
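
If the predefined roles grant more than you want, these permissions can be bundled into a custom role instead (the role ID kafkaBackupMinimal is just an example name):

# Create a custom role containing only the required permissions
gcloud iam roles create kafkaBackupMinimal \
  --project=$PROJECT_ID \
  --title="Kafka Backup Minimal" \
  --permissions=storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.list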

Storage Classes

Class       Use Case           Minimum Storage Duration    Retrieval Cost
STANDARD    Frequent access    None                        Free
NEARLINE    Monthly access     30 days                     $
COLDLINE    Quarterly access   90 days                     $$
ARCHIVE     Yearly access      365 days                    $$$

Set Default Storage Class

gcloud storage buckets update gs://$BUCKET_NAME \
--default-storage-class=NEARLINE

Dual-Region / Multi-Region

For high availability:

# Create multi-region bucket
gcloud storage buckets create gs://$BUCKET_NAME \
--location=US \
--uniform-bucket-level-access

# Create dual-region bucket
gcloud storage buckets create gs://$BUCKET_NAME \
--location=NAM4 \
--uniform-bucket-level-access

Location options:

Type           Examples                        Use Case
Region         us-west1                        Single region
Dual-region    NAM4 (Iowa + South Carolina)    HA within continent
Multi-region   US, EU, ASIA                    Global access

Object Lifecycle

# View current lifecycle
gcloud storage buckets describe gs://$BUCKET_NAME --format="json(lifecycle)"

# Update lifecycle
cat > lifecycle.json << 'EOF'
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30, "matchesPrefix": ["production/"]}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"age": 365}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"numNewerVersions": 3}
      }
    ]
  }
}
EOF

gcloud storage buckets update gs://$BUCKET_NAME --lifecycle-file=lifecycle.json

Security Best Practices

Enable Uniform Bucket-Level Access

gcloud storage buckets update gs://$BUCKET_NAME \
--uniform-bucket-level-access

Enable Object Versioning

gcloud storage buckets update gs://$BUCKET_NAME --versioning

Configure CMEK (Customer-Managed Encryption Keys)

# Create key ring
gcloud kms keyrings create kafka-backup-ring \
--location=us-west1 \
--project=$PROJECT_ID

# Create key
gcloud kms keys create kafka-backup-key \
--keyring=kafka-backup-ring \
--location=us-west1 \
--purpose=encryption \
--project=$PROJECT_ID
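
# Grant the Cloud Storage service agent permission to use the key
# (required before the bucket can encrypt objects with it;
# "gcloud storage service-agent" prints the agent's email and
# assumes a reasonably recent gcloud release)
SERVICE_AGENT=$(gcloud storage service-agent --project=$PROJECT_ID)
gcloud kms keys add-iam-policy-binding kafka-backup-key \
  --keyring=kafka-backup-ring \
  --location=us-west1 \
  --member="serviceAccount:${SERVICE_AGENT}" \
  --role="roles/cloudkms.cryptoKeyEncrypterDecrypter" \
  --project=$PROJECT_ID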

# Set bucket encryption
gcloud storage buckets update gs://$BUCKET_NAME \
--default-encryption-key=projects/$PROJECT_ID/locations/us-west1/keyRings/kafka-backup-ring/cryptoKeys/kafka-backup-key

Enable Access Logging

# Create logging bucket
gcloud storage buckets create gs://${BUCKET_NAME}-logs \
--location=$REGION

# Enable logging
gcloud storage buckets update gs://$BUCKET_NAME \
--log-bucket=gs://${BUCKET_NAME}-logs

Testing

Verify Access

# Test with gcloud
gcloud storage ls gs://$BUCKET_NAME/

# Test write
echo "test" | gcloud storage cp - gs://$BUCKET_NAME/test.txt
gcloud storage rm gs://$BUCKET_NAME/test.txt

# Test with gsutil
gsutil ls gs://$BUCKET_NAME/

Test Authentication

# Check active account
gcloud auth list

# Test with service account
gcloud auth activate-service-account --key-file=kafka-backup-key.json
gcloud storage ls gs://$BUCKET_NAME/

Test from Application

# Set credentials
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/kafka-backup-key.json"

# Run backup
kafka-backup backup --config backup.yaml

# List backups
kafka-backup list --path gs://$BUCKET_NAME/production/daily

Kubernetes Deployment

deployment.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-backup
  namespace: kafka-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: kafka-backup  # For workload identity
          restartPolicy: OnFailure  # Required for Job pods
          containers:
            - name: kafka-backup
              image: ghcr.io/osodevops/kafka-backup:latest
              args: ["backup", "--config", "/config/backup.yaml"]
              volumeMounts:
                - name: config
                  mountPath: /config
                # Only needed if not using workload identity:
                # - name: gcp-sa
                #   mountPath: /var/secrets/google
              # env:
              #   - name: GOOGLE_APPLICATION_CREDENTIALS
              #     value: /var/secrets/google/key.json
          volumes:
            - name: config
              configMap:
                name: kafka-backup-config
            # - name: gcp-sa
            #   secret:
            #     secretName: gcp-sa-key
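
The manifest mounts a ConfigMap named kafka-backup-config; one way to create it from the backup.yaml shown earlier:

# Create the ConfigMap the CronJob mounts at /config
kubectl create configmap kafka-backup-config \
  --from-file=backup.yaml \
  -n kafka-backup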

Troubleshooting

Permission Denied

# Check IAM policy
gcloud storage buckets get-iam-policy gs://$BUCKET_NAME

# Check service account permissions
gcloud projects get-iam-policy $PROJECT_ID \
--filter="bindings.members:$SA_EMAIL" \
--format="table(bindings.role)"

Workload Identity Issues

# Verify workload identity binding
gcloud iam service-accounts get-iam-policy $SA_EMAIL

# Check pod service account annotation
kubectl get sa kafka-backup -n kafka-backup -o yaml

Slow Transfers

  • Use regional buckets close to your Kafka cluster
  • Enable parallel composite uploads (see the example after this list)
  • Check network bandwidth
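
With gsutil, parallel composite uploads can be enabled per command; the 150M threshold and the file name large-backup.tar are illustrative values:

# Split uploads above the threshold into parallel composite parts
# Note: composed objects need a CRC32C-capable client (e.g. crcmod)
# to validate downloads
gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
  cp large-backup.tar gs://$BUCKET_NAME/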

Next Steps