Having a look inside Amazon’s public buckets

Hi there,

Today I want to share the results of my analysis of public Amazon S3 buckets.

As part of my research on data security in the cloud and the security of Amazon EC2, I was inspired by the recent analysis performed by Rapid7. After reading their study, I started wondering what kind of data can be found in public Amazon S3 buckets and how easy it would be for anyone to access it. Besides the security considerations, it is also interesting to get an idea of what data people (or companies) tend to store in these buckets and to identify a few popular use cases.

So I started searching on the subject and found Bucket Finder, a simple and interesting tool developed by DigiNinja. Here is how the script works: given a list of candidate bucket names, it sends one request per name to check whether the bucket exists and, if so, whether it is public (its files can be listed). For each public bucket, it then sends a request for every listed file to find out which files are public. If a file is public, anyone can download it!
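To make the check more concrete, here is a minimal sketch of the decision logic in Python. It is not Bucket Finder's actual code: the function names are mine, and it assumes the usual S3 behaviour where a request for a missing bucket returns 404, a private bucket returns 403, and a public bucket returns 200 with an XML listing. The HTTP request itself is left out, so the functions only interpret a status code and a response body.

```python
import xml.etree.ElementTree as ET


def classify_bucket(status_code):
    """Map the HTTP status of a bucket request to a coarse state (assumed mapping)."""
    if status_code == 404:
        return "missing"   # no bucket with that name
    if status_code == 403:
        return "private"   # bucket exists, listing is denied
    if status_code == 200:
        return "public"    # bucket exists and its files can be listed
    return "unknown"       # redirects and other codes need extra handling


def listed_keys(listing_xml):
    """Return the file names (keys) found in an XML bucket listing."""
    root = ET.fromstring(listing_xml)
    keys = []
    for element in root.iter():
        # Drop the XML namespace prefix, if any, before comparing tag names.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "Key" and element.text:
            keys.append(element.text)
    return keys


# Hypothetical listing, shaped like the XML a public bucket returns.
SAMPLE_LISTING = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>photos/holiday.jpg</Key></Contents>
  <Contents><Key>notes.txt</Key></Contents>
</ListBucketResult>"""

if __name__ == "__main__":
    print(classify_bucket(200), listed_keys(SAMPLE_LISTING))
```

Each listed key would then be requested individually, and the same status codes tell whether that particular file is public.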

To run my experiment, I used this script with a list of English names and surnames, which can be found on Packet Storm, together with a personal list of websites that I can’t disclose. Of course, if I were an attacker, I would have used a more sophisticated technique to generate more realistic bucket names.

The result of this simple scan was really unexpected. To my great surprise, I managed to find 13937 public files!
Let’s have a closer look at the results. Starting from the names and surnames list, I made 74357 attempts and found 1717 buckets, of which 92 were public. In these public buckets I found 18779 files, of which 13937 were public. These figures confirm the expected trend: more than 5% of the buckets found are public, and public buckets are very likely to contain public files.
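As a quick sanity check on these numbers, the ratios at each step of the scan can be computed directly from the counts above. The variable names in this small Python sketch are mine; the figures are the ones reported in this post.

```python
# Counts reported in this post.
attempts = 74357
buckets_found = 1717
public_buckets = 92
files_listed = 18779
public_files = 13937


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


rates = {
    "names that match an existing bucket": percent(buckets_found, attempts),
    "existing buckets that are public": percent(public_buckets, buckets_found),
    "listed files that are public": percent(public_files, files_listed),
}

if __name__ == "__main__":
    for label, value in rates.items():
        print(f"{label}: {value}%")
```

So roughly 2.3% of the guessed names hit an existing bucket, 5.4% of those buckets were public, and 74.2% of the files listed in them were public.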

File type                    Public files
Images                       9262
Videos and Music             1032
Web                          1030
Archives                     122
Documents                    804
Executables/Applications     4
Database Backup              2
Other                        1681
All files                    13937

[Figure: public files found on Amazon S3, by file type]

Most importantly, let’s have a look at what kind of data can be found in public buckets. Of the 13937 public files I found, most were images (about 66%), in particular personal photos. This suggests that many people use Amazon S3 as a reliable backup system for personal files, but it is also somewhat surprising: why do people store their personal (and sensitive) data in public buckets? I think there are two possible answers to this question: the first is negligence and the second is ease of sharing.

Besides images, a significant part of the data was made up of videos, music, documents (.pdf|.doc|.docx|.xls|.xlsx|.ppt|.pptx) and web files (.html|.css|.js|.swf|.php). I have to admit that I was tempted to look at each of these files to learn more about their content, but I felt it was not the right thing to do, so I only looked at a small sample, and fortunately none of the files in it contained highly confidential data. However, I did find a prototype version of a website (what if this happened to a startup working on some innovative idea?) and a number of official documents that may contain private information.
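For reference, grouping files by type can be done with a simple extension lookup. The Python sketch below is only an illustration and not the script I used: it covers the two extension lists given above and puts everything else under "Other", since the exact extensions behind the remaining categories are not listed in this post. The function names and sample keys are made up.

```python
import posixpath
from collections import Counter

# Extension lists taken from the text above; other categories are omitted.
CATEGORIES = {
    "Documents": {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"},
    "Web": {".html", ".css", ".js", ".swf", ".php"},
}


def categorize(key):
    """Return the category of a file name based on its (case-insensitive) extension."""
    extension = posixpath.splitext(key)[1].lower()
    for category, extensions in CATEGORIES.items():
        if extension in extensions:
            return category
    return "Other"


def tally(keys):
    """Count how many file names fall into each category."""
    return Counter(categorize(key) for key in keys)


if __name__ == "__main__":
    sample = ["cv/resume.PDF", "site/index.html", "site/app.js", "backup.tar.gz"]
    print(tally(sample))
```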

I would definitely like to do a deeper study to get a more general view of the data stored on Amazon S3 and find out what an attacker could do with it. A more representative dataset could also be obtained by using a smarter technique for generating bucket names.

The takeaway from this post is straightforward: never make your buckets public unless you really need to! If you just want to share your data with some friends or colleagues, there is no need to expose it to the entire world. Use Amazon S3 policies to grant access only to the people who need it!
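As an illustration of what such a policy can look like, the Python sketch below builds a bucket policy document that allows only specific AWS accounts to download objects, instead of everyone. The bucket name and account ID are placeholders, the helper function is mine, and you should check the result against the current AWS documentation before applying it to a real bucket.

```python
import json


def read_only_policy(bucket, account_ids):
    """Build a bucket policy that lets only the given AWS accounts read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadForNamedAccounts",
                "Effect": "Allow",
                # Named accounts only: no "*" principal, so the bucket stays non-public.
                "Principal": {
                    "AWS": [f"arn:aws:iam::{account}:root" for account in account_ids]
                },
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID, for illustration only.
    policy = read_only_policy("example-bucket", ["123456789012"])
    print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document that goes into the bucket's policy settings; the accounts it names can read the files, while anonymous requests are still refused.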

Disclaimer: The only objective of this study is to warn users about the potential risks of storing sensitive data in public Amazon S3 buckets. No copy of any of the public files found has been stored.

2 thoughts on this post

  1. Hi Pasquale! Great post, as usual!

    I don’t think that finding many images, videos and music on Amazon S3 is that surprising though.
    It is a service for developers: the home page for S3 says “Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.” So I think it is mainly used to serve website content such as images and videos faster and more easily.
    For instance, http://fr.openclassrooms.com/ hosts all of its images on Amazon S3.

    That is an interesting study in any case! Have a good day!

    • Hi Francois, that’s a good point! Indeed, Amazon S3 is meant for serving static content such as images on websites. That said, looking at the data stored in these public buckets, I have the impression that, as usual, some users are misusing the service.

      Anyway, I think it would be very interesting to go deeper and see what this data is actually used for and what an attacker might be able to do with it!
