Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-s3-assets: Object keys don't preserve the full extension name for Python project #30257

Open
marcelokscunha opened this issue May 17, 2024 · 3 comments
Labels
@aws-cdk/aws-s3-assets bug This issue is a bug. effort/small Small work item – less than a day of effort p2

Comments

@marcelokscunha
Copy link

Describe the bug

Same as #12699 but for Python project.

Neither simple upload nor bundling asset with command in Docker container maintain full extension (HASH.tar.gz -> HASH.gz).

"tar.gz" file is expected from SageMaker endpoint.

Expected Behavior

I expect the s3 object key is ended with tar.gz

Current Behavior

When using aws-s3-assets to bundle the asset HASH.tar.gz by executing a command in a Docker container and upload to S3 it is renamed to HASH.gz

Same occurs when passing HASH.tar.gz directly and is uploaded to S3 but renamed to HASH.gz

Reproduction Steps

Reproduction Steps for Docker (same behavior when omitting bundling and passing path="path/my.tar.gz"

Code:

import pathlib

import aws_cdk as cdk
from aws_cdk import (
    Stack,
    aws_s3_assets,
)
from constructs import Construct

HERE = pathlib.Path(__file__).parent

class CdkErrorStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        self.models_s3_artifact = aws_s3_assets.Asset(
            self,
            "Models",
            path=str(HERE),
            bundling=cdk.BundlingOptions(
                image=cdk.DockerImage.from_registry("debian"),
                command=[
                    "bash", "-c", "tar -czvf /asset-output/dummy.tar.gz --files-from /dev/null"
                ],
                output_type=cdk.BundlingOutput.ARCHIVED
            )
        )
        cdk.CfnOutput(self, "TarURI", value=self.models_s3_artifact.s3_object_url)

It is possible to see that locally in cdk.out the tarball has correct extensions:
error

But in S3 and CFN output is wrong:
errors3

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.142.1

Framework Version

No response

Node.js Version

v18.17.1

OS

MacOSX

Language

Python

Language Version

Python 3.9.12

Other information

No response

@marcelokscunha marcelokscunha added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 17, 2024
@ashishdhingra ashishdhingra self-assigned this May 17, 2024
@ashishdhingra ashishdhingra added needs-reproduction This issue needs reproduction. and removed needs-triage This issue or PR still needs to be triaged. labels May 17, 2024
@ashishdhingra
Copy link
Contributor

ashishdhingra commented May 17, 2024

Reproducible using below code:

import pathlib

import aws_cdk as cdk
from aws_cdk import (
    Stack,
    aws_s3_assets,
    aws_lambda
)
from constructs import Construct

HERE = pathlib.Path(__file__).parent

class Issue30257Stack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        self.models_s3_artifact = aws_s3_assets.Asset(
            self,
            "Models",
            path=str(HERE),
            bundling=cdk.BundlingOptions(
                image=cdk.DockerImage.from_registry("debian"),
                command=[
                    "bash", "-c", "tar -czvf /asset-output/dummy.tar.gz --files-from /dev/null"
                ],
                output_type=cdk.BundlingOutput.ARCHIVED
            )
        )
        lambdaFunction = aws_lambda.Function(
           self, "TestLambdaFunction",
           runtime=aws_lambda.Runtime.NODEJS_18_X,
           code=aws_lambda.Code.from_asset("lambda"),
           handler="hello.handler",
           environment=dict(
                S3_BUCKET_NAME=self.models_s3_artifact.s3_bucket_name,
                S3_OBJECT_KEY=self.models_s3_artifact.s3_object_key,
                S3_OBJECT_URL=self.models_s3_artifact.s3_object_url)

        )
        cdk.CfnOutput(self, "TarURI", value=self.models_s3_artifact.s3_object_url)

Output of cdk synth below (notice the value in Outputs):

Resources:
  TestLambdaFunctionServiceRole0C9E0634:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - Fn::Join:
            - ""
            - - "arn:"
              - Ref: AWS::Partition
              - :iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
    Metadata:
      aws:cdk:path: Issue30257Stack/TestLambdaFunction/ServiceRole/Resource
  TestLambdaFunctionC089708A:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        S3Bucket:
          Fn::Sub: cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}
        S3Key: 3e4845b15c28e9e3882e7f9016a42054e94d051bd8662ff76e8e9f9d0ae50c48.zip
      Environment:
        Variables:
          S3_BUCKET_NAME:
            Fn::Sub: cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}
          S3_OBJECT_KEY: 09fdccd2f59ef7a5487e651683b13996cb2ce8e4d258c878b1325cbacea20082.gz
          S3_OBJECT_URL:
            Fn::Sub: s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/09fdccd2f59ef7a5487e651683b13996cb2ce8e4d258c878b1325cbacea20082.gz
      Handler: hello.handler
      Role:
        Fn::GetAtt:
          - TestLambdaFunctionServiceRole0C9E0634
          - Arn
      Runtime: nodejs18.x
    DependsOn:
      - TestLambdaFunctionServiceRole0C9E0634
    Metadata:
      aws:cdk:path: Issue30257Stack/TestLambdaFunction/Resource
      aws:asset:path: asset.3e4845b15c28e9e3882e7f9016a42054e94d051bd8662ff76e8e9f9d0ae50c48
      aws:asset:is-bundled: false
      aws:asset:property: Code
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAA/02MUQqDMBBEz+J/3BLTC1Shvy16AFljardqUtwNUsS7F5VCv94wM7wM9DkDneDMqW37dKAGlkrQ9gpnrhc2NTI7YbhsUGwgj7Z3kiM7NeDYtAjLNXorFLwqHv6XV0U4wlKGwW31xnVVu6US7Mh3qnQc4mT3/RblHWV//rVF8C0dtvtHnsGfDGgNJnkxUTpFLzQ6KA9+AZvg6wnMAAAA
    Metadata:
      aws:cdk:path: Issue30257Stack/CDKMetadata/Default
    Condition: CDKMetadataAvailable
Outputs:
  TarURI:
    Value:
      Fn::Sub: s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/09fdccd2f59ef7a5487e651683b13996cb2ce8e4d258c878b1325cbacea20082.gz
...
...

Also reproducible using TypeScript code below (create assets/dummy.tar.gz via command tar -czvf dummy.tar.gz --files-from /dev/null):

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3assets from 'aws-cdk-lib/aws-s3-assets';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as path from 'path';

export class TypescriptStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const lambdaAsset = new s3assets.Asset(this, "LambdaAsset", {
      path: path.join(path.resolve(__dirname, '..'), 'assets/dummy.tar.gz')
    });

    new lambda.Function(this, "myLambdaFunction", {
      code: lambda.Code.fromAsset('lambda'),
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "index.lambda_handler",
      environment: {
        'S3_BUCKET_NAME': lambdaAsset.s3BucketName,
        'S3_OBJECT_KEY': lambdaAsset.s3ObjectKey,
        'S3_OBJECT_URL': lambdaAsset.s3ObjectUrl
      }
    });
    new cdk.CfnOutput(this, "AssetURI", {
      value: lambdaAsset.s3ObjectUrl
    });
  }
}

The related issue #12699 was fixed for CDK v1 (which is depreciated now). Perhaps the same fix needs to be ported for CDK v2.

@ashishdhingra ashishdhingra added p2 effort/small Small work item – less than a day of effort and removed needs-reproduction This issue needs reproduction. labels May 17, 2024
@ashishdhingra ashishdhingra removed their assignment May 17, 2024
@ashishdhingra ashishdhingra added p1 and removed p2 labels May 17, 2024
@pahud
Copy link
Contributor

pahud commented May 18, 2024

OK I read this from SageMaker doc:

SageMaker Neo requires models to be stored as a compressed TAR file. Repackage it as a compressed TAR file (*.tar.gz):

And that model has to be uploaded to an Amazon S3 customer-managed bucket. Is this your use case?

If that is the case, I don't think you should use S3 Assets like that. You should use s3 bucket deployment this way:

import {
aws_s3 as s3,
aws_s3_deployment as s3d,
} from 'aws-cdk-lib';

export class DummyStack extends Stack {
	constructor(scope: Construct, id: string, props: StackProps) {
		super(scope, id, props);
    const sageMakerModelBucket = new s3.Bucket(this, 'SageMakerModelBucket', {
      removalPolicy: RemovalPolicy.DESTROY,
    })
    const deployed = new s3d.BucketDeployment(this, 'SageMakerModelDeployment', {
      destinationBucket: sageMakerModelBucket,
      sources: [s3d.Source.asset(path.join(__dirname, '../assets'))],
      extract: true,
    })

   new CfnOutput(this, 'SageMakerModelBucketOutput', { value: `s3://${sageMakerModelBucket.bucketName}`});
  };
}
  1. prepare your dummy.tar.gz in local assets dir:
% ls -al assets/dummy.tar.gz 
  1. npx cdk deploy this stack

You should see the bucketName in the output. Try aws s3 ls it:

Outputs:
dummy-stack.SageMakerModelBucketOutput = s3://dummy-stack-sagemakermodelbucket6aea4c01-xxx
% aws s3 ls s3://dummy-stack-sagemakermodelbucket6aea4c01-xxx
2024-05-18 08:31:30       1610 dummy.tar.gz

That's it!

Generally, aws-s3-assets is to create CDK assets needed by CDK App, for example, lambda functions or anything that requires staging assets to be stored in the cdk file assets bucket created in cdk bootstrap in the format of cdk-hnb659fds-assets-<YOUR_ACCOUNT_ID>-us-east-1. Assets in that bucket is fully managed by CDK and we should not expect the assets to be stored in a customized way. In your use case, you should deploy the local tar.gz into a user-managed bucket rather than the CDK-managed assets bucket. You literally can specify ANY buckets you manage and deploy that local tar.gz into that bucket exactly as the local filename.

I hope it clarifies. Let me know if it works for you.

@pahud pahud added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed p1 labels May 18, 2024
@marcelokscunha
Copy link
Author

Hi @pahud,

I'm deploying a custom PyTorch inference logic as part of a SageMaker endpoint. It's a standard flow as described here, where I've created a custom inference logic.

I have something similar to the described in the doc. Note that I didn't compressed all into model.tar.gz yet, and I was hoping I could use aws_s3_assets.Asset with that end (as this is only for packing and deploying the endpoint I don't want to keep running multiple tar -czvf ... commands and the Asset's functionality fits perfectly doing that for us):

# Current local structure
model_dir
|- model.pth
|- code/
  |- inference.py
  |- requirements.txt  # only for versions 1.3.1 and higher

I've developed the model locally and now inference is working locally (have the pre-trained weights model.pth and the working inference.py for custom inference logic). To deploy to the endpoint one must create the tarball, upload to S3 and create the endpoint.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-s3-assets bug This issue is a bug. effort/small Small work item – less than a day of effort p2
Projects
None yet
Development

No branches or pull requests

3 participants