Feature/aliyun oss adapter#180
FYI I think #110 will make the route replace fixes in this PR unnecessary. |
Is there not S3 compatibility? GCS was used for the initial POC but we already support S3 API. |
Good point! OSS does support an S3-compatible API (endpoint: s3.oss-{region}.aliyuncs.com), and our use case (GetObject/PutObject for snapshots) is fully covered. The current S3 adapter just needs an endpoint override to work with OSS directly. I'll update this PR to:
…tomization
- Rename internal/ategcs to internal/objectstorage with clear file separation (storage.go, gcs.go, s3.go)
- Add AWS_ENDPOINT_URL_S3 support to S3 adapter, enabling connection to any S3-compatible storage (Alibaba Cloud OSS, MinIO, R2, etc.)
- Remove dedicated OSS adapter in favor of S3-compatible mode
- Update atelet.yaml with S3+OSS configuration example
- Retain ParseObjectURL support for oss:// URL scheme

This allows Alibaba Cloud OSS users to use the existing S3 adapter by simply setting AWS_ENDPOINT_URL_S3 to the OSS S3-compatible endpoint (e.g. https://s3.oss-cn-hangzhou-internal.aliyuncs.com).
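To illustrate what retaining the oss:// scheme could look like, here is a minimal sketch of a URL parser that accepts several object-storage schemes. The function name parseObjectURL, the objectURL type, and the accepted scheme list (gs, s3, oss) are assumptions for illustration; the real ParseObjectURL in this PR may have a different signature and behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// objectURL is a hypothetical parsed form of a snapshot location.
type objectURL struct {
	Scheme string // storage scheme, e.g. "gs", "s3", or "oss"
	Bucket string // bucket name (the part before the first "/")
	Key    string // object key (everything after the first "/")
}

// parseObjectURL is an illustrative stand-in for the PR's ParseObjectURL.
// It splits "scheme://bucket/key" and rejects unknown schemes.
func parseObjectURL(raw string) (objectURL, error) {
	scheme, rest, ok := strings.Cut(raw, "://")
	if !ok {
		return objectURL{}, fmt.Errorf("missing scheme in %q", raw)
	}
	switch scheme {
	case "gs", "s3", "oss": // assumed set of supported schemes
	default:
		return objectURL{}, fmt.Errorf("unsupported scheme %q", scheme)
	}
	bucket, key, _ := strings.Cut(rest, "/")
	if bucket == "" {
		return objectURL{}, fmt.Errorf("missing bucket in %q", raw)
	}
	return objectURL{Scheme: scheme, Bucket: bucket, Key: key}, nil
}

func main() {
	// The bucket and key here are made-up example values.
	u, err := parseObjectURL("oss://my-bucket/snapshots/actor-1.tar")
	fmt.Println(u, err)
}
```

With the dedicated OSS adapter removed, an oss:// URL parsed this way would presumably be routed to the S3 adapter, with the endpoint override deciding which service is actually contacted.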
Force-pushed from ec45241 to c35ab30.
        serverboot.Fatal(ctx, "Failed to load S3 config", err)
    }
    s3Client = s3.NewFromConfig(cfg, func(o *s3.Options) {
        if endpoint := os.Getenv("AWS_ENDPOINT_URL_S3"); endpoint != "" {
Should we be replacing AWS_ENDPOINT_URL references then?
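One possible way to handle both variables, sketched here under assumptions rather than taken from this PR: read the service-specific AWS_ENDPOINT_URL_S3 first and fall back to the global AWS_ENDPOINT_URL, which mirrors the precedence the AWS SDKs document for these variables. The helper name resolveS3Endpoint and the injected getenv parameter are hypothetical, added only to keep the sketch testable.

```go
package main

import (
	"fmt"
	"os"
)

// resolveS3Endpoint returns the custom S3 endpoint to use, or "" when
// neither variable is set and the SDK default should apply.
// The service-specific variable takes precedence over the global one.
func resolveS3Endpoint(getenv func(string) string) string {
	if v := getenv("AWS_ENDPOINT_URL_S3"); v != "" {
		return v
	}
	return getenv("AWS_ENDPOINT_URL")
}

func main() {
	// In the adapter, a non-empty result would be assigned to the
	// client's base endpoint option.
	fmt.Println(resolveS3Endpoint(os.Getenv))
}
```

Recent versions of the AWS SDK for Go v2 may already resolve these variables on their own when loading the default config, in which case an explicit override like this would be redundant; that is worth checking against the SDK version in use.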
          value: http://opentelemetry-collector.gke-managed-otel.svc.cluster.local:4317
        - name: ATE_STORAGE_BACKEND
          value: "gcs"
        # For S3-compatible storage (e.g. Alibaba Cloud OSS, MinIO, Cloudflare R2):
See also the kind overlay (local conformant cluster, no cloud ties), which should probably be updated as well.
feat: Add Alibaba Cloud OSS storage backend & fix gVisor network restore on ACK clusters
Problem
OSS Storage Backend
Agent Substrate currently only supports GCS as the snapshot storage backend, preventing deployment in Alibaba Cloud environments. OSS support is needed for multi-cloud compatibility.
gVisor Network Restore Failure
On ACK (Alibaba Cloud Container Service for Kubernetes) clusters, ResumeActor fails 100% of the time with:
Root cause: Terway CNI uses 169.254.1.1 (link-local) as the pod gateway. This address is unreachable from gVisor's interior network namespace since it exists on the host side of the veth pair.
Solution
OSS Adapter
- oss://bucket/path URL scheme in ParseObjectURL
- ATE_STORAGE_BACKEND=oss, OSS_ENDPOINT, OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET
Link-Local Route Graceful Fallback
- In restoreLink(), when netlink.RouteReplace() fails, check if the gateway is a link-local address using net.IP.IsLinkLocalUnicast()
Testing
Unit Tests
Benchmark Results
Environment: ACK cluster, OSS internal endpoint (oss-ap-northeast-1-internal.aliyuncs.com), single worker serial loop.
Acceptance Results