Today I want to share with you the results of my analysis of public Amazon S3 buckets.
As part of my research on data security in the cloud and the security of Amazon EC2, I was inspired by the recent analysis performed by Rapid7. After reading their study, I started wondering what kind of data can be found in public Amazon S3 buckets and how easy it would be for anyone to access it. Security considerations aside, it is also interesting to get an idea of what people (or companies) tend to store in S3 buckets and to identify a few popular use cases.
So I started googling the subject and found Bucket Finder, an interesting and pretty simple tool developed by DigiNinja. Here is how the script works: given a list of possible bucket names, it sends a request for each name to check whether the bucket exists and, if it does, whether it is public (its files can be listed) or not. For each public bucket, it then sends one request per listed file to identify which files are public. If a file is public, anyone can download it!
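To make the procedure above more concrete, here is a minimal Python sketch of the same idea. It is not Bucket Finder's actual code (that tool is a separate script by DigiNinja); the function names `fetch`, `classify_bucket`, `list_keys` and `scan` are my own, and the sketch assumes the usual S3 behaviour over plain HTTP: a 404 for a bucket that does not exist, a 403 for a private one, and a 200 with an XML listing for a public one. It ignores listing pagination and network errors.

```python
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket listings (ListBucketResult documents).
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def fetch(url, method="GET"):
    """Return (status_code, body) without raising on 4xx/5xx responses."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def classify_bucket(status_code):
    """Map the HTTP status of a bucket request to a coarse state."""
    if status_code == 404:
        return "missing"   # no bucket with this name
    if status_code == 403:
        return "private"   # bucket exists, listing is denied
    if status_code == 200:
        return "public"    # bucket exists and its files can be listed
    return "unknown"       # anything else is left unclassified here


def list_keys(listing_xml):
    """Extract the file names (keys) from a bucket listing document."""
    root = ET.fromstring(listing_xml)
    return [element.text for element in root.iter(S3_NS + "Key")]


def scan(name):
    """Hypothetical driver: probe one candidate name and its listed files."""
    base = "http://%s.s3.amazonaws.com/" % name
    status, body = fetch(base)
    state = classify_bucket(status)
    public_files = []
    if state == "public":
        # Only the first page of the listing is inspected in this sketch.
        for key in list_keys(body):
            # HEAD is enough to learn whether the file is readable.
            file_status, _ = fetch(base + urllib.parse.quote(key), "HEAD")
            if file_status == 200:
                public_files.append(key)
    return state, public_files
```

Calling `scan` on each entry of a word list and counting the results per state gives the kind of numbers reported in this post. Only run something like this against names you are entitled to probe.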
To run my experiment, I used this script with a list of English names and surnames, which can be found on Packet Storm, together with a personal list of websites that I can’t disclose. Of course, if I were an attacker I would have used a more sophisticated technique to generate more realistic bucket names.
The result of this pretty simple scan was really unexpected. To my great surprise, I managed to find 13937 public files!
Let’s take a closer look at the results. Starting from the list of names and surnames, I made 74357 attempts and found 1717 buckets, of which 92 were public. In these public buckets I found 18779 files, of which 13937 were public. These numbers confirm the expected trend: more than 5% of the buckets found are public, and files in public buckets are very likely to be public too (about 74% in my scan).
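For readers who want to check the ratios, the figures above can be recomputed in a few lines of Python. The variable names and the `pct` helper are mine; the numbers are the ones reported in this post.

```python
# Counts reported for the scan based on the names and surnames list.
attempts = 74357        # candidate bucket names tried
buckets_found = 1717    # names that correspond to an existing bucket
public_buckets = 92     # existing buckets whose files can be listed
files_listed = 18779    # files listed in the public buckets
public_files = 13937    # listed files that anyone can download


def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


rates = {
    "attempts that hit an existing bucket": pct(buckets_found, attempts),
    "existing buckets that are public": pct(public_buckets, buckets_found),
    "listed files that are public": pct(public_files, files_listed),
}

for label, value in rates.items():
    print("%s: %.1f%%" % (label, value))
```

So roughly one guess in forty hit an existing bucket, a bit more than one existing bucket in twenty was public, and about three out of four files in those buckets were downloadable.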
Videos and music: 1032
Most importantly, let’s have a look at what kind of data can be found in public buckets. Of the 13937 public files at my disposal, most were images (67%), in particular personal photos. This confirms that many people use Amazon S3 as a reliable backup system for personal files, but it is also somewhat surprising: why do people store their personal (and sensitive) data in public buckets? I think there are two possible answers to this question: the first is negligence and the second is ease of sharing.
Besides images, a significant part of the data was made up of videos, music, documents (.pdf|.doc|.docx|.xls|.xlsx|.ppt|.pptx) and web files (.html|.css|.js|.swf|.php). I have to admit that I was tempted to look at each of these files to learn more about their content, but I felt it was not the right thing to do, so I only grabbed a small set of them, and fortunately none contained highly confidential data. However, I did find a prototype version of a website (what if this happened to a startup working on some innovative idea?) and a bunch of official documents that may contain private information.
I would definitely like to do a deeper study to get a more general view of the data stored in Amazon S3 and find out what an attacker could do with it. A more representative dataset could also be obtained by using a smarter technique for generating bucket names.
The takeaway from this post is straightforward: never make your buckets public unless you really need to! If you just want to share your data with a few friends or colleagues, there is no need to expose it to the entire world. Use Amazon S3 policies instead!
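As an illustration of what such a policy can look like, here is a small Python sketch that builds a bucket policy granting read-only access to specific AWS accounts instead of to everyone. The bucket name `my-holiday-photos`, the account ID `111122223333` and the helper `read_only_policy` are made-up placeholders; treat this as a starting point under those assumptions rather than a complete access-control setup.

```python
import json


def read_only_policy(bucket, account_ids):
    """Build a bucket policy that lets only the given AWS accounts read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ShareWithSelectedAccounts",
                "Effect": "Allow",
                # Only the listed accounts are granted access, not "*".
                "Principal": {
                    "AWS": ["arn:aws:iam::%s:root" % a for a in account_ids]
                },
                "Action": ["s3:GetObject"],
                # Applies to every object in the bucket.
                "Resource": "arn:aws:s3:::%s/*" % bucket,
            }
        ],
    }


# Placeholder values, for illustration only.
policy = read_only_policy("my-holiday-photos", ["111122223333"])
print(json.dumps(policy, indent=2))
```

The resulting JSON can be pasted into the bucket policy editor of the S3 console or, if you use boto3, passed as the `Policy` argument of the S3 client's `put_bucket_policy` call. The important detail is that the `Principal` names specific accounts rather than `"*"`.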
Disclaimer: The only objective of this study is to warn users about the potential risks of storing sensitive data in public Amazon S3 buckets. No copy of any of the public files found has been kept.