S3 Incomplete Multipart Uploads Are Dangerous
Dhaval Nagar / CEO
The Hidden S3 Trap
Every experienced AWS engineer knows that Amazon S3 is simple, until it isn’t. Recently, a developer shared a jaw-dropping story on Reddit: they discovered over 1 TB of invisible data consuming space (and money) in an S3 bucket. The culprit? Incomplete multipart uploads.
These uploads were never completed and never visible in the bucket listing, yet they were silently inflating the S3 bill.
Let’s break down how that happens and how you can avoid being the next headline in AWS Bill Horror Stories.
Multipart Uploads 101 (and Why They Can Go Wrong)
Multipart uploads are a powerful S3 feature that lets you upload large objects in parallel parts. AWS recommends them for objects larger than 100 MB, and they are required for objects larger than 5 GB (the single-PUT limit).
How it works:
1. You initiate a multipart upload with CreateMultipartUpload.
2. You upload one or more parts (up to 10,000 per object) with UploadPart.
3. You complete the upload with CompleteMultipartUpload.
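To make this concrete, here is a minimal sketch of that flow in Python with boto3 (the bucket name, key, and file path are placeholders):

import boto3

s3 = boto3.client("s3")
bucket, key = "your-bucket-name", "backups/large-file.bin"  # placeholders

def read_chunks(path, size=8 * 1024 * 1024):
    # Yield 8 MB chunks; every part except the last must be at least 5 MB.
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

# Step 1: initiate the upload and get an upload ID.
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

# Step 2: upload the parts, recording each part's ETag for the final call.
parts = []
for number, chunk in enumerate(read_chunks("large-file.bin"), start=1):
    etag = s3.upload_part(
        Bucket=bucket, Key=key, UploadId=upload_id,
        PartNumber=number, Body=chunk,
    )["ETag"]
    parts.append({"PartNumber": number, "ETag": etag})

# Step 3: complete the upload; only now does the object appear in the bucket.
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)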
If something fails between steps 1 and 3, the uploaded parts remain in S3, but:
- They don’t appear in the console or aws s3 ls.
- They still count toward your storage costs.
- They persist indefinitely, unless manually or automatically cleaned up.
In serverless workflows, such as Lambda functions performing multipart uploads of large files or ETL pipelines, transient failures, timeouts, and code errors can easily leave hundreds or thousands of incomplete uploads behind.
The Cost Shock: Over 1 TB of Orphaned Data
In this real-world case:
- The Lambda function uploaded large objects to S3 using multipart upload.
- Some invocations timed out or retried before calling CompleteMultipartUpload.
- Each failed attempt left orphaned parts in the bucket.
- Over time, the bucket accumulated over 1 TB of incomplete upload data: unseen, but billable.
Because incomplete multipart uploads don’t show up in the S3 Console or CLI listing, they went unnoticed until the monthly cost anomaly report flagged the spike.
How to Detect Incomplete Multipart Uploads
Here’s how you can check your buckets for such orphaned data.
Option 1: AWS CLI
Run this command to list incomplete uploads:
aws s3api list-multipart-uploads --bucket your-bucket-name
You’ll get a JSON response with ongoing uploads, their keys, and upload IDs. If you see dozens (or hundreds) of entries — that’s your red flag.
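The listing tells you that orphaned uploads exist, but not how much they weigh. To total the hidden bytes, you can pair list-multipart-uploads with list-parts; here is a boto3 sketch (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")
bucket = "your-bucket-name"  # placeholder

# Walk every in-progress upload and sum the sizes of its uploaded parts.
total_bytes = 0
for page in s3.get_paginator("list_multipart_uploads").paginate(Bucket=bucket):
    for upload in page.get("Uploads", []):
        for part_page in s3.get_paginator("list_parts").paginate(
            Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
        ):
            total_bytes += sum(p["Size"] for p in part_page.get("Parts", []))

print(f"Orphaned multipart data: {total_bytes / 1024**3:.2f} GiB")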
You can even script a quick cleanup:
aws s3api list-multipart-uploads --bucket your-bucket-name \
  --query 'Uploads[].[Key,UploadId]' --output text |
while IFS=$'\t' read -r key upload_id; do
  [ -z "$upload_id" ] && continue  # skips the "None" line printed when there are no uploads
  aws s3api abort-multipart-upload --bucket your-bucket-name --key "$key" --upload-id "$upload_id"
done
(Note that abort-multipart-upload needs both the key and the upload ID, so the script reads them as tab-separated pairs. Be aware it aborts every in-progress upload in the bucket, including any that are still legitimately running.)
Option 2: S3 Storage Lens or AWS Cost Explorer
You can correlate S3 storage usage anomalies using:
- S3 Storage Lens (under Metrics and insights → Storage metrics), which includes an incomplete multipart upload storage bytes metric.
- AWS Cost Explorer → S3 usage by storage type (look for “S3 Standard” usage spikes in unexpected buckets).
If the data doesn’t align with your known object count, incomplete uploads may be lurking.
The Real Fix: Lifecycle Policies
Fortunately, AWS offers a built-in safeguard for this — you just need to enable it.
Add a Lifecycle rule to automatically clean up incomplete multipart uploads after a few days:
{
  "Rules": [
    {
      "ID": "AbortIncompleteMultipartUpload",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
You can configure this in the S3 console or via Terraform/CloudFormation. This ensures any incomplete multipart uploads older than 7 days are automatically aborted — no surprises, no orphaned data.
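If you manage buckets from code, the same rule can be applied with boto3 (a sketch; the bucket name is a placeholder). Note that put_bucket_lifecycle_configuration replaces the bucket's entire lifecycle configuration, so merge this rule with any existing ones first:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="your-bucket-name",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "AbortIncompleteMultipartUpload",
                "Status": "Enabled",
                "Filter": {},  # empty filter = apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)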
Best Practices to Avoid the Trap
Always Abort on Failure:
In Lambda or any backend logic using multipart uploads, explicitly call AbortMultipartUpload when errors occur or retries are triggered.
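Here is a minimal sketch of that pattern in boto3 (the function and its arguments are illustrative placeholders):

import boto3

s3 = boto3.client("s3")

def upload_with_cleanup(bucket, key, chunks):
    # Multipart upload that aborts its own parts on failure instead of orphaning them.
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        parts = []
        for number, chunk in enumerate(chunks, start=1):
            etag = s3.upload_part(
                Bucket=bucket, Key=key, UploadId=upload_id,
                PartNumber=number, Body=chunk,
            )["ETag"]
            parts.append({"PartNumber": number, "ETag": etag})
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so the parts are deleted immediately rather than billed indefinitely.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise

A Lambda timeout kills the process before the except block can run, which is why the lifecycle rule remains the backstop.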
Use Retry Logic with Caution:
Ensure retries don’t start new multipart sessions unless the previous one is cleaned up.
Automate Lifecycle Rules:
Every bucket handling large or frequent uploads should have an AbortIncompleteMultipartUpload rule — by default.
Set Up Cost Anomaly Detection:
Use AWS Cost Anomaly Detection or CloudWatch metrics to alert you when S3 storage usage spikes unexpectedly.
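For the CloudWatch route, one option is an alarm on the daily BucketSizeBytes metric that S3 publishes; here is a sketch (the alarm name, bucket, and threshold are placeholders to tune to your baseline):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="s3-storage-spike-your-bucket-name",  # placeholder
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "your-bucket-name"},  # placeholder
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    Statistic="Average",
    Period=86400,  # the metric is published once per day, so use a one-day period
    EvaluationPeriods=1,
    Threshold=500 * 1024**3,  # e.g. alert above ~500 GiB; set to fit your baseline
    ComparisonOperator="GreaterThanThreshold",
)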
Use Simple Uploads for Smaller Objects:
For small to medium objects, a plain PutObject avoids the problem entirely. Keep in mind that the high-level aws s3 commands (cp, sync) switch to multipart automatically above a configurable threshold (8 MB by default), so lifecycle rules still matter for those workflows.
Hidden Costs, Hidden Risks
Incomplete multipart uploads are a silent cost drain — and a reliability risk. They don’t just waste storage; they create uncertainty in usage metrics, impact cost forecasts, and complicate troubleshooting.
For serverless engineers and S3-heavy workloads, the takeaway is clear:
Always assume uploads can fail — and automate the cleanup.
Summary
AWS gives us immense flexibility, but it also expects us to handle edge cases responsibly. So the next time your S3 bill looks suspiciously high, remember: the data you can’t see might still be costing you money.