docs: S3 SDK users might believe that OpenDAL does not support multipart upload pre-sign URL #4627
Hi, OpenDAL has its own abstractions that hide implementation details. By design, we do not expose these APIs to users. When users create a writer, OpenDAL handles multipart upload internally. This design allows us to build a unified API layer across azblob, gcs, and many other storage services. No matter which storage service users choose, they only need to:

```rust
let mut w = op.writer(path).await?;
w.write(bs1).await?;
w.write(bs2).await?;
w.close().await?;
```
I thought we discussed pre-sign support for multipart in #4282. It seems there's some misunderstanding here. So:
My idea was to find a way to generate pre-signed URLs for multipart operations and let the client app run them, for example in a browser or on mobile. This will inevitably involve calling the underlying multipart APIs. So I'm just dumping my thoughts from experience with other SDKs.
That's what OpenDAL wants to avoid. The problem is finding a good API design that can generate such an upload URL for users without leaking the storage details.
I'm guessing the most important part is allowing the client to upload data, right? Initiating and completing the multipart upload can be done directly from the server. Is supporting URL generation for part uploads sufficient for you?
Yes, the goal is to let the client upload data, but through multipart operations when the object is large (say, over 100 MB). The point is that the upload can resume from any part if the network connection suddenly breaks. In addition, the client may need ListParts to learn the current progress and continue from there.
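To make the resume flow concrete, here is a minimal sketch (plain Python; the `uploaded` set is a hypothetical stand-in for the part numbers a real `ListParts` call would return) of how a client could work out which parts still need uploading:

```python
def remaining_parts(total_size: int, part_size: int, uploaded: set[int]) -> list[int]:
    """Given the object size, the chosen part size, and the part numbers
    already reported by ListParts, return the part numbers left to upload.
    S3 part numbers start at 1."""
    total_parts = -(-total_size // part_size)  # ceiling division
    return [n for n in range(1, total_parts + 1) if n not in uploaded]

# Example: a 250 MB object split into 100 MB parts -> parts 1..3.
# If ListParts reports parts 1 and 3, only part 2 remains.
print(remaining_parts(250 * 1024**2, 100 * 1024**2, {1, 3}))  # [2]
```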
No, my reason is above. But I know your point of view. I'm thinking about it.
I'll dump my rough idea here:

```python
# Trigger the underlying CreateMultipartUpload API and create a
# MultiPartUploadFile that carries the UploadId
multipart_file = op.multipart("filename")

# Calling part(...) returns a PartUploadFile;
# calling presign on a PartUploadFile exposes a URL to serve to the client
multipart_file.part(number).presign(expire_second)
# or
multipart_file.presign_part(part_number: int, expire_second: int)

# Call the underlying CompleteMultipartUpload API to tell the S3 server
multipart_file.complete([{'ETag': etag, 'PartNumber': part}, ...])
```

I'm happy to hear your feedback.
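To illustrate how that proposed surface could hang together, here is a toy, runnable sketch. The class names (`MultipartFile`, `PartUploadFile`) and methods follow the proposal above but are otherwise hypothetical, and the "signed" URL is fake: a real implementation would perform actual request signing (e.g. SigV4) against the backing service.

```python
import uuid


class PartUploadFile:
    def __init__(self, path: str, upload_id: str, part_number: int):
        self.path = path
        self.upload_id = upload_id
        self.part_number = part_number

    def presign(self, expire_second: int) -> str:
        # Placeholder URL; a real backend would return a signed UploadPart URL.
        return (f"https://example-bucket.s3.amazonaws.com/{self.path}"
                f"?partNumber={self.part_number}&uploadId={self.upload_id}"
                f"&X-Amz-Expires={expire_second}")


class MultipartFile:
    def __init__(self, path: str):
        self.path = path
        # Stands in for the UploadId returned by CreateMultipartUpload.
        self.upload_id = uuid.uuid4().hex
        self.completed = False

    def part(self, number: int) -> PartUploadFile:
        return PartUploadFile(self.path, self.upload_id, number)

    def presign_part(self, part_number: int, expire_second: int) -> str:
        return self.part(part_number).presign(expire_second)

    def complete(self, parts: list[dict]) -> None:
        # Stands in for the server-side CompleteMultipartUpload call.
        self.completed = True


f = MultipartFile("filename")
url = f.presign_part(1, 3600)
print("partNumber=1" in url and f.upload_id in url)  # True
f.complete([{"ETag": "etag-1", "PartNumber": 1}])
print(f.completed)  # True
```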
Hi, as I mentioned in previous comments, OpenDAL's vision is to provide free data access. Any design that violates this vision is unlikely to be accepted. In practice, OpenDAL users should be able to write code without knowing which underlying storage service is being used. In our current design, the following places don't work:
I think it's hard to create such an abstraction for multipart presign, since it needs the data returned by the call that initiates the upload. Other storage services don't offer the same ability.
Oops, I thought that gcs and azblob had the same ability, but they actually don't.
A Python code snippet for a multipart upload demo
In the OpenDAL SDK, I believe the following three APIs are missing to fully support multipart uploads, which are available in the AWS S3 Python SDK (boto3):
Note: Copied from boto/boto3#2305 (comment)
The provided Python code snippet demonstrates how these APIs are used in boto3 for a multipart upload. This could serve as a reference for implementing similar functionality in OpenDAL.
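For context, the classic three-call flow in boto3 can be sketched as follows. The method names (`create_multipart_upload`, `upload_part`, `complete_multipart_upload`) are the real boto3 S3 client calls; a tiny in-memory stub replaces the real client here so the example runs offline, and with boto3 you would make the same calls on `boto3.client("s3")`:

```python
class StubS3Client:
    """Offline stand-in for boto3.client('s3'), exposing the same three calls."""

    def __init__(self):
        self.parts = {}

    def create_multipart_upload(self, Bucket, Key):
        return {"UploadId": "demo-upload-id"}

    def upload_part(self, Bucket, Key, UploadId, PartNumber, Body):
        self.parts[PartNumber] = Body
        return {"ETag": f'"etag-{PartNumber}"'}

    def complete_multipart_upload(self, Bucket, Key, UploadId, MultipartUpload):
        ordered = sorted(p["PartNumber"] for p in MultipartUpload["Parts"])
        assert ordered == sorted(self.parts), "part list mismatch"
        return {"Key": Key, "ETag": '"final-etag"'}


client = StubS3Client()

# 1. Initiate the upload and remember the UploadId.
resp = client.create_multipart_upload(Bucket="bucket", Key="big-object")
upload_id = resp["UploadId"]

# 2. Upload each part, collecting the ETags for the final call.
parts = []
for number, chunk in enumerate([b"a" * 8, b"b" * 8], start=1):
    part = client.upload_part(Bucket="bucket", Key="big-object",
                              UploadId=upload_id, PartNumber=number, Body=chunk)
    parts.append({"ETag": part["ETag"], "PartNumber": number})

# 3. Complete the upload with the ordered ETag/PartNumber list.
result = client.complete_multipart_upload(
    Bucket="bucket", Key="big-object", UploadId=upload_id,
    MultipartUpload={"Parts": parts})
print(result["Key"])  # big-object
```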
Additional Reference
It's worth noting what is required for full compatibility with the AWS S3 Multipart Uploads API. If the OpenDAL team needs an S3 API compatibility document for reference, see the following.
(Note: copied from minio's s3-api-compatibility for the Multipart Uploads part)