A quick guide to S3 Batch Operations

Thrupthi
6 min read · Jun 11, 2021


S3 Batch Operations is a simple solution from AWS for performing large-scale storage management actions such as copying objects, tagging objects, changing access controls, and more. It makes working with a large number of S3 objects easier and faster.

S3 Batch Operations can be used to perform the below tasks:

  1. Copy objects to the required destination
  2. Replace all object tags
  3. Delete all object tags
  5. Replace Access Control List (ACL)
  5. Restore archived objects
  7. Set Object Lock retention
  8. Set Object Lock legal hold
  8. Invoke AWS Lambda function to perform complex data processing

In this article, we will look at how to create object tags using S3 Batch Operations. We will generate an inventory report for a test S3 bucket, create and run the S3 Batch Job to create tags, and use the newly tagged object in the lifecycle policy.

Let’s get started!

Setup:

To begin with, create a test bucket and upload a few objects.

S3 bucket created with objects
Objects without tags

Note that none of the uploaded objects have any tags attached to them.
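
If you prefer to script this setup, here is a minimal boto3 sketch. The bucket name matches the one used in the IAM policy later in this article; the region and object keys are just placeholders.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")  # assumed region

bucket = "s3-batch-trial-test"  # bucket name used throughout this article
keys = ["file1.txt", "file2.txt", "file3.txt"]  # hypothetical object keys

# Create the test bucket and upload a few objects without tags
s3.create_bucket(Bucket=bucket)
for key in keys:
    s3.put_object(Bucket=bucket, Key=key, Body=b"sample data")

# Confirm the objects have no tags yet
for key in keys:
    tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    print(key, tags)  # prints an empty TagSet for each object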

Inventory Setup:

Amazon S3 provides an automated inventory feature that gives visibility into S3 objects, something that would otherwise be very tedious when dealing with millions of objects.

Let’s set up inventory on the S3 bucket to pull the required info about the S3 objects. Go to the bucket’s ‘Management’ tab, scroll to ‘Inventory configurations’, and click on ‘Create inventory configuration’.

create new inventory

Enter the inventory name and choose the scope of inventory creation.

inventory name and scope

Under ‘Report details’, enter the destination bucket to which the generated inventory reports will be delivered. An S3 bucket policy will automatically be created and applied to the destination bucket.

inventory report details

Choose the frequency, output format, and encryption for the generated inventory reports.

frequency, format, and encryption of report

Choose any additional fields as required and create the inventory.

additional fields in the report
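
The same inventory configuration can also be created programmatically. Below is a rough boto3 sketch of the console steps above; the inventory ID and destination bucket name are assumptions, and the optional fields shown are just examples.

import boto3

s3 = boto3.client("s3")

# Daily CSV inventory for the whole bucket (mirrors the console configuration above)
s3.put_bucket_inventory_configuration(
    Bucket="s3-batch-trial-test",
    Id="daily-inventory",  # hypothetical inventory configuration name
    InventoryConfiguration={
        "Id": "daily-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::s3-batch-inventory-dest",  # assumed destination bucket
                "Format": "CSV",
                "Prefix": "inventory",
            }
        },
        "Schedule": {"Frequency": "Daily"},
        # Additional fields to include in the report
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
    },
)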

The first inventory report can take up to 48 hours to generate and will be published to the destination provided. Date-stamped folders contain the manifest files, and the resultant inventory lists are stored under the data folder.

Inventory
Manifest files folder

manifest.json contains the details of the current inventory report. ‘fileSchema’ lists all the object properties that are collected in the inventory report, and ‘files’ provides the path of the resultant inventory list.

manifest.json file example

The manifest.checksum file contains the MD5 checksum of the manifest.json file and is used to verify its integrity.
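
As a small sketch, you can perform that integrity check yourself with boto3. The destination bucket name and the date-stamped prefix below are placeholders for whatever your inventory delivery path looks like.

import hashlib
import boto3

s3 = boto3.client("s3")

dest_bucket = "s3-batch-inventory-dest"  # assumed inventory destination bucket
prefix = "inventory/s3-batch-trial-test/daily-inventory/2021-06-10T00-00Z"  # hypothetical date folder

# Download manifest.json and manifest.checksum
manifest = s3.get_object(Bucket=dest_bucket, Key=f"{prefix}/manifest.json")["Body"].read()
checksum = s3.get_object(Bucket=dest_bucket, Key=f"{prefix}/manifest.checksum")["Body"].read().decode().strip()

# manifest.checksum holds the MD5 of manifest.json; recompute and compare
assert hashlib.md5(manifest).hexdigest() == checksum, "manifest.json failed integrity check"
print("manifest.json verified")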

The data folder contains the CSV inventory files, which are generated based on the frequency set in the inventory configuration.

CSV inventory files

The inventory is now ready to be used with S3 Batch Operations.

IAM Role Creation:

Next up, an IAM role is required that grants S3 Batch Operations access to perform the required actions on the S3 bucket.

Create an IAM policy with the below JSON after updating the name of your S3 bucket. The AWS documentation provides additional info on the permissions required for different operations.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectTagging"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::s3-batch-trial-test/*"
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectTagging",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::s3-batch-trial-test",
        "arn:aws:s3:::s3-batch-trial-test/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::s3-batch-trial-test/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::s3-batch-trial-test/*"
      ]
    }
  ]
}

Create an IAM role (any AWS service can be chosen as the trusted entity to start with) and attach the IAM policy created in the previous step. Then update the trust relationship of the role so that S3 Batch Operations can assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "batchoperations.s3.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
IAM policy attached
Trust relationships updated
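
If you would rather script the role creation, a rough boto3 sketch could look like the following. The role name, policy name, and account ID are placeholders; the trust policy is the same JSON shown above.

import json
import boto3

iam = boto3.client("iam")

# Trust policy allowing S3 Batch Operations to assume the role (same JSON as above)
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

role = iam.create_role(
    RoleName="s3-batch-ops-role",  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the customer-managed permissions policy created earlier (ARN is a placeholder)
iam.attach_role_policy(
    RoleName="s3-batch-ops-role",
    PolicyArn="arn:aws:iam::123456789012:policy/s3-batch-ops-policy",
)
print(role["Role"]["Arn"])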

We now have everything in place to set up our first S3 Batch Operations job.

Setting up batch job:

S3 Batch Operations can be accessed from the left-hand pane of the S3 console. Click on ‘Create job’ to start configuring.

Choose the region in which the job will run and select the path of the inventory manifest.json.

The other option is to directly import a CSV file that lists the objects on which you want to perform the batch operation. Each row of the CSV must contain the bucket name and object key (and the object version ID in case of versioned objects).

manifest.csv file
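
If you go the CSV route, a short sketch like this could generate and upload such a manifest from the bucket listing. The manifest file name and upload prefix are assumptions.

import csv
import boto3

s3 = boto3.client("s3")
bucket = "s3-batch-trial-test"

# Write one "bucket,key" row per object (add a third version-ID column for versioned buckets)
with open("manifest.csv", "w", newline="") as f:
    writer = csv.writer(f)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            writer.writerow([bucket, obj["Key"]])

# Upload the manifest so the batch job can reference it
s3.upload_file("manifest.csv", bucket, "manifests/manifest.csv")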

Next, choose the operation you want to perform. For this article, let’s try replacing the object tags. Enter the tag key and value that must be added or updated.

Choose the operation
Replace tag operation

Next, proceed to configure additional properties. Enter a description and set a job priority. Choose an S3 path where the completion report of the batch job will be stored.

Additional properties

Choose the IAM role created in the previous section from the dropdown.

Choose IAM role

Review the configuration and proceed to create the job.
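
For completeness, the same job can also be created through the S3 Control API. Here is a hedged boto3 sketch that uses the CSV manifest uploaded earlier; the account ID, role ARN, report prefix, and tag key/value are all placeholders.

import boto3

s3 = boto3.client("s3")
s3control = boto3.client("s3control")

account_id = "123456789012"              # placeholder AWS account ID
manifest_key = "manifests/manifest.csv"  # CSV manifest uploaded earlier
manifest_arn = f"arn:aws:s3:::s3-batch-trial-test/{manifest_key}"

# The manifest's ETag is required when creating the job
etag = s3.head_object(Bucket="s3-batch-trial-test", Key=manifest_key)["ETag"].strip('"')

response = s3control.create_job(
    AccountId=account_id,
    ConfirmationRequired=True,  # job waits for 'Run job' confirmation, as in the console
    Description="Replace object tags",
    Priority=10,
    RoleArn="arn:aws:iam::123456789012:role/s3-batch-ops-role",  # role from the previous section
    Operation={
        "S3PutObjectTagging": {"TagSet": [{"Key": "archive", "Value": "true"}]}  # example tag
    },
    Manifest={
        "Spec": {"Format": "S3BatchOperations_CSV_20180820", "Fields": ["Bucket", "Key"]},
        "Location": {"ObjectArn": manifest_arn, "ETag": etag},
    },
    Report={
        "Bucket": "arn:aws:s3:::s3-batch-trial-test",
        "Prefix": "batch-reports",  # assumed report prefix
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "AllTasks",
    },
)
print(response["JobId"])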

Dashboard of jobs

Once the job is successfully created, its status is set to ‘Awaiting your confirmation to run’. If the job fails to create, check the job report file stored in the path provided earlier, fix the error, and clone the job to rerun it with the previous configuration.

Select the job and click on ‘Run job’. Review the settings and run it.

Run the job

The status of the job changes to Ready > Active > Completed.
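
Programmatically, confirming the job corresponds to updating its status to Ready. A minimal sketch, with the account ID and job ID as placeholders:

import boto3

s3control = boto3.client("s3control")

# Confirm the job so it moves from 'Awaiting your confirmation to run' to Ready/Active
s3control.update_job_status(
    AccountId="123456789012",  # placeholder account ID
    JobId="example-job-id",    # JobId returned by create_job / shown in the console
    RequestedJobStatus="Ready",
)

# Check the job status until it completes
status = s3control.describe_job(AccountId="123456789012", JobId="example-job-id")["Job"]["Status"]
print(status)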

Let’s check the properties of an object to confirm that the tags have been added, and here we go!

Tags are added
Another view!

We can now use the newly added tags as filters in a lifecycle policy.

Tag based lifecycle policy
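
As a sketch, the same tag-based rule can be defined with boto3. The rule ID and tag key/value below are the hypothetical ones used in the earlier job example, and the 90-day Glacier transition matches the use case described next.

import boto3

s3 = boto3.client("s3")

# Transition objects carrying the tag added by the batch job to Glacier after 90 days
s3.put_bucket_lifecycle_configuration(
    Bucket="s3-batch-trial-test",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tag-based-glacier-transition",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "archive", "Value": "true"}},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)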

Let me give you a real-world example of S3 Batch Operations in use. We had to set lifecycle policies across all buckets to transition S3 objects to Glacier 90 days after their creation. In one case, we had to copy S3 objects from one bucket to another, which caused the objects to lose their original ‘last modified date’. As a result, the lifecycle policies that were supposed to transition objects to Glacier did not act on the destination bucket, even though the objects were older than 90 days, which led to increased S3 costs. S3 Batch Operations was then used to re-tag the objects, and tag-based lifecycle policies transitioned them to the correct storage class.

In summary, S3 Batch Operations can be used to perform otherwise tedious S3 operational tasks in a few simple steps, saving a lot of time and effort!
