Copy S3 Bucket Objects Across Separate AWS Accounts Programmatically

Mark Gituma
Feb 27, 2019 · 4 min read


There are quite a few tutorials that focus on how to transfer objects across S3 buckets, but they largely rely on the terminal by making use of the AWS CLI. However, I had a requirement to automate the transfer of a fairly large object from one bucket to a bucket in a separate account, and this transfer had to happen daily. One option was to set up a server and create a cron job to do the transfer, but because of the periodicity of the task, the server would spend most of its time doing nothing. AWS Lambda proved to be a better candidate, as my usage fell below the free tier limits and was therefore free.

In this tutorial I will show how to transfer objects across 2 separate AWS accounts using AWS Lambda, subject to the following constraints:

  • AWS Lambda provides no access to a terminal, so the AWS CLI cannot be used.
  • AWS Lambda has a maximum memory limit of 3008 MB (at the time of writing), so large files cannot be stored on the file system during the transfer.
  • The two AWS accounts have separate AWS credentials, which becomes problematic when creating a boto3 session.

Prerequisites

  • Python boto3 library installed.
  • IAM users in the source and destination AWS accounts, which implies 2 sets of aws_access_key_id and aws_secret_access_key credentials (see this doc to create an IAM user for an AWS account).
  • Two AWS accounts with S3 buckets configured (one as the source S3 bucket and another as the destination S3 bucket). It is assumed the buckets are not publicly accessible and therefore an IAM user is needed to perform actions on them.
  • An AWS Lambda function with appropriate credentials (optional, as the Python code can run from any location).

I am not going to focus on how to install boto3, set up the AWS IAM users or configure AWS Lambda, as there are quite a lot of tutorials on how to do that. My main focus is the specific setup that allows for programmatic transfer.

Step 1: Setting up the AWS S3 source bucket policy

Attach the following policy to the source bucket (instructions can be found in the following doc).

{
  "Version": "2012-10-17",
  "Id": "Policy1546558291129",
  "Statement": [
    {
      "Sid": "Stmt1546558287955",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<ACCOUNT_ID>:user/<AWS_IAM_USER>"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<SOURCE_BUCKET>",
        "arn:aws:s3:::<SOURCE_BUCKET>/*"
      ]
    }
  ]
}

The above policy should be fairly intuitive if you have configured an AWS bucket before: we define the Principal as the user that will perform the operations listed in Action on the given Resource. Note that s3:ListBucket applies to the bucket itself while s3:GetObject applies to the objects inside it, which is why both the bucket ARN and the <SOURCE_BUCKET>/* ARN appear in the Resource field.

Sometimes setting the correct Action permissions can be a challenging task, so I suggest you first use the insecure "Action": "s3:*" configuration to run your experiments and make sure everything works before tightening the permissions.
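If you prefer to keep the setup fully scripted, the bucket policy can also be attached with boto3 instead of through the console. The sketch below is only illustrative: the admin credential names (SOURCE_ADMIN_ACCESS_KEY_ID, SOURCE_ADMIN_SECRET_ACCESS_KEY) and the policy placeholders are assumptions you would replace with your own values.

import json
import boto3

# Placeholder credentials for a user allowed to administer the source bucket.
SOURCE_ADMIN_ACCESS_KEY_ID = "<SOURCE_ADMIN_ACCESS_KEY_ID>"
SOURCE_ADMIN_SECRET_ACCESS_KEY = "<SOURCE_ADMIN_SECRET_ACCESS_KEY>"
SOURCE_BUCKET = "<SOURCE_BUCKET>"

source_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCopyUserRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::<ACCOUNT_ID>:user/<AWS_IAM_USER>"},
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                f"arn:aws:s3:::{SOURCE_BUCKET}",
                f"arn:aws:s3:::{SOURCE_BUCKET}/*",
            ],
        }
    ],
}

admin_client = boto3.client(
    "s3",
    aws_access_key_id=SOURCE_ADMIN_ACCESS_KEY_ID,
    aws_secret_access_key=SOURCE_ADMIN_SECRET_ACCESS_KEY,
)
# put_bucket_policy expects the policy document as a JSON string.
admin_client.put_bucket_policy(
    Bucket=SOURCE_BUCKET,
    Policy=json.dumps(source_bucket_policy),
)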

Step 2: Setting up the AWS S3 destination bucket policy

The destination bucket policy is similar to the source policy as can be seen below.

{
  "Version": "2012-10-17",
  "Id": "Policy22222222222",
  "Statement": [
    {
      "Sid": "Stmt22222222222",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<ACCOUNT_ID>:user/<AWS_IAM_DESTINATION_USER>",
          "arn:aws:iam::<ACCOUNT_ID>:role/<AWS_IAM_LAMBDA_ROLE>"
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::<DESTINATION_BUCKET>",
        "arn:aws:s3:::<DESTINATION_BUCKET>/*"
      ]
    }
  ]
}

The only exceptions are that in the Principal field we define 2 principals: the first is the IAM user for the destination account, and the second (which is optional) is the AWS Lambda execution role. The Action field is also different in that it allows the principals to store objects in the bucket. (Again, if you run into issues, set "Action" to "s3:*" to make sure the flow works, and then tighten the permissions as necessary.)
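Before wiring everything into Lambda, it can be worth confirming that the destination credentials can actually write to the destination bucket. The following is a small sanity-check sketch; the credential variable names and the test key are assumptions for illustration, not part of the setup above.

import boto3

# Placeholder credentials for the destination account IAM user.
DESTINATION_AWS_ACCESS_KEY_ID = "<DESTINATION_AWS_ACCESS_KEY_ID>"
DESTINATION_AWS_SECRET_ACCESS_KEY = "<DESTINATION_AWS_SECRET_ACCESS_KEY>"
DESTINATION_BUCKET = "<DESTINATION_BUCKET>"

destination_client = boto3.client(
    's3',
    aws_access_key_id=DESTINATION_AWS_ACCESS_KEY_ID,
    aws_secret_access_key=DESTINATION_AWS_SECRET_ACCESS_KEY,
)

# Write a tiny test object; an AccessDenied error here means the bucket
# policy or the IAM permissions still need work. Remember to remove the
# test object afterwards (deleting it needs s3:DeleteObject, which the
# policy above does not grant).
destination_client.put_object(
    Bucket=DESTINATION_BUCKET,
    Key='permission-check.txt',
    Body=b'hello',
)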

Step 3: The code

The following bit of code is what allows for programmatic uploading which can be automated.

import boto3

# The uppercase names below are placeholders for your own credentials,
# bucket names and object keys.
source_client = boto3.client(
    's3',
    aws_access_key_id=SOURCE_AWS_ACCESS_KEY_ID,
    aws_secret_access_key=SOURCE_AWS_SECRET_ACCESS_KEY,
)
# The object body is returned as a StreamingBody rather than being
# loaded into memory.
source_response = source_client.get_object(
    Bucket=SOURCE_BUCKET,
    Key=OBJECT_KEY,
)

destination_client = boto3.client(
    's3',
    aws_access_key_id=DESTINATION_AWS_ACCESS_KEY_ID,
    aws_secret_access_key=DESTINATION_AWS_SECRET_ACCESS_KEY,
)
# upload_fileobj streams the body into the destination bucket, so the
# object never has to be written to the file system.
destination_client.upload_fileobj(
    source_response['Body'],
    DESTINATION_BUCKET,
    FOLDER_LOCATION_IN_DESTINATION_BUCKET,  # the destination object key (may include a folder-style prefix)
)

In the above code we create 2 boto3 client sessions, one with the source and one with the destination IAM user credentials. This is because a boto3 client can only hold a single user's credentials (as far as I know). In this case we have a source_client and a destination_client session.

From the source_client session, we get the required object by passing the OBJECT_KEY and the SOURCE_BUCKET to the get_object method. The content of the object is returned as a StreamingBody under the 'Body' key of the response (see the doc if you want to delve further). A stream is a collection of data that doesn't have to be available all at once, meaning the data doesn't have to fit in memory and can be read in chunks.
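To illustrate the streaming behaviour, a StreamingBody can be consumed in fixed-size chunks without ever holding the whole object in memory. The sketch below only counts the bytes as they flow past; in the actual transfer we hand the stream straight to upload_fileobj instead, since a StreamingBody can only be read once.

# A minimal sketch: read the StreamingBody in 1 MB chunks and count the bytes.
body = source_response['Body']

total_bytes = 0
for chunk in body.iter_chunks(chunk_size=1024 * 1024):
    total_bytes += len(chunk)

print(f"Streamed {total_bytes} bytes without buffering the whole object")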

The second part is to upload the source object to our destination bucket. This is done using the upload_fileobj method on our destination_client. The beauty of this is that, for large objects, it performs a multipart upload, which transfers the data in chunks across multiple threads and therefore bypasses the need to write to the file system.
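The multipart behaviour can be tuned through boto3's TransferConfig, which upload_fileobj accepts via its Config argument. The threshold, chunk size and concurrency values below are illustrative assumptions rather than values from the original setup.

from boto3.s3.transfer import TransferConfig

# Illustrative values: switch to multipart above 8 MB, 8 MB parts, 4 threads.
transfer_config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    multipart_chunksize=8 * 1024 * 1024,
    max_concurrency=4,
    use_threads=True,
)

destination_client.upload_fileobj(
    source_response['Body'],
    DESTINATION_BUCKET,
    FOLDER_LOCATION_IN_DESTINATION_BUCKET,
    Config=transfer_config,
)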

The streaming download and the multipart upload work well together and allow the transfer of very large files with only a small amount of memory available for storage.
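Putting it all together, here is a hedged sketch of how the transfer could be wrapped in a Lambda handler. The environment variable names and the event shape (e.g. triggered by a daily CloudWatch Events rule) are assumptions for illustration; the core get_object / upload_fileobj flow is exactly the one described above.

import os
import boto3

def lambda_handler(event, context):
    # Assumed environment variables holding the two sets of credentials.
    source_client = boto3.client(
        's3',
        aws_access_key_id=os.environ['SOURCE_AWS_ACCESS_KEY_ID'],
        aws_secret_access_key=os.environ['SOURCE_AWS_SECRET_ACCESS_KEY'],
    )
    destination_client = boto3.client(
        's3',
        aws_access_key_id=os.environ['DESTINATION_AWS_ACCESS_KEY_ID'],
        aws_secret_access_key=os.environ['DESTINATION_AWS_SECRET_ACCESS_KEY'],
    )

    # Assumed event shape: bucket names and object key passed in by the trigger.
    source_bucket = event['source_bucket']
    destination_bucket = event['destination_bucket']
    object_key = event['object_key']

    # Stream the object from the source bucket straight into the destination bucket.
    source_response = source_client.get_object(Bucket=source_bucket, Key=object_key)
    destination_client.upload_fileobj(
        source_response['Body'],
        destination_bucket,
        object_key,
    )

    return {'status': 'copied', 'key': object_key}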

Conclusion

This problem took me a couple of days to figure out, but I was quite happy with the solution. Hopefully it will save you time to do more important stuff.

If you like this post, don’t forget to like and/or recommend it. You can find me on Twitter as @MarkGituma.
