Use S3 instead of Google

This post is a little different from my usual ones, as it is not directly related to my work. However, I believe it is a good example of how I used my technical skills in my personal life. The case was relatively simple: I wanted to save money and still keep my data safe in the cloud. Let me explain the situation first.

Situation

For a long time, I kept separate online archives for my photos and data. I used Google Photos for photos from my old life, a time I was not happy to be reminded of on a daily or even monthly basis. Then I decided to use Amazon Photos for my new life: the content I was happy to look at every day and share with my friends. Since I take a ton of photos, I soon had to buy the premium plan at $2/month for 100GB.

Photos are not the only data I keep online. I also used Google Drive for old documents, such as those from my university, and OneDrive for some other documents from college. I hadn't needed these files recently, but they were not useless enough to delete. For documents I used more often, I had switched to iCloud. It was easy to use, since I mainly used my iPhone and iPad and could access the files with the least effort. And since my photos lived elsewhere, I never had to worry about iCloud's 5GB storage limit.

I think you now have a better idea of the problem: it was such a mess. Even though I was not eager to revisit some of these files, I couldn't afford to lose them. Then came another obstacle: last winter I had to switch from my old iPhone to an Android device, so you can imagine the headaches of syncing and accessing my day-to-day documents from the new device. Sync problems, data scattered all over the place, and the rising cost of Amazon Photos made me think about a solution: how about archiving all my data in one place?

Solution

If you have read my other posts (or even checked the homepage of this blog), you know that I am a DevOps engineer who deals with the cloud on a daily basis. It is my job to find solutions for cases like this, and I am paid to do so. So I asked myself: why not treat myself as my own employer? How could I help myself with this problem? And so the journey began.

I started by reading about the S3 storage class options on AWS. I knew S3 and liked how it works, but keeping everything in the default S3 Standard class would be too expensive for me: I had around 200GB of data, and I rarely use most of it. So I came up with the following plan:

1. Gather the data

I started by gathering all of my data. It wasn't easy, because as much as these online services encourage you to bring your data in, they make it as hard as possible to take it out. Heh! This was another reason for my journey!

So I slowly started downloading my data: my Google Photos, my Amazon Photos, and my documents on iCloud, Google Drive and OneDrive. The process took several days, but I finally managed to organize everything in separate folders.
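Before deduplicating or archiving, it can help to know how many files and how many bytes each exported folder holds, because both numbers drive S3 costs. Here is a minimal sketch, not a script from this post. It assumes each service's export was unpacked into its own top-level folder under one directory, and `inventory` and `print_inventory` are hypothetical helper names.

```python
import os
from pathlib import Path


def inventory(root):
    """Map each top-level folder under root to (file_count, total_bytes).

    Symlinks are skipped so nothing is counted twice; loose files directly
    under root are ignored.
    """
    summary = {}
    for source in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        count = total = 0
        for dirpath, _dirnames, filenames in os.walk(source):
            for name in filenames:
                path = Path(dirpath) / name
                if path.is_file() and not path.is_symlink():
                    count += 1
                    total += path.stat().st_size
        summary[source.name] = (count, total)
    return summary


def print_inventory(root):
    """Print one line per source folder: name, file count and size in GB."""
    for name, (count, total) in inventory(root).items():
        print(f"{name:<20} {count:>8} files {total / 1e9:>8.2f} GB")
```

For example, `print_inventory("/path/to/export")` would list one line per service folder. The file count gives a rough idea of how many requests an upload would make if every file went up as its own object.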

2. Find and remove duplicates

If you have ever worked with the cloud, you know that providers mostly charge you based on either the number of items or the size of your data. Besides, who wants to see the same photo duplicated in their gallery? So the next step was to find and remove duplicates. For this, I wrote a simple Python script that iterates over the folders, finds duplicate files, and moves them into a recycle bin for later review. Given the number and size of the files, it took almost a day to finish on my M1 Mac Mini, but I was quite happy with the results.
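The script itself isn't shown here, so what follows is only a sketch of one common approach, not the original. It groups files by size first, because only files of equal size can be identical. It then compares SHA-256 hashes within each size group, keeps the first path in sorted order, and moves the other copies into a recycle bin that mirrors the original folder layout. The function names and the recycle-bin layout are assumptions.

```python
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def move_duplicates(root, recycle_bin):
    """Keep one copy of each identical file under root; move the rest to recycle_bin.

    Returns a list of (kept_path, moved_to) pairs for later review.
    """
    root, recycle_bin = Path(root).resolve(), Path(recycle_bin).resolve()
    # Cheap first pass: group by size; the list is built before anything moves.
    by_size = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink() and recycle_bin not in path.parents:
            by_size[path.stat().st_size].append(path)

    moved = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Second pass: hash only the candidates that share a size.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        for original, *copies in by_hash.values():
            for dup in copies:
                target = recycle_bin / dup.relative_to(root)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(dup), str(target))
                moved.append((original, target))
    return moved
```

Calling `move_duplicates("/path/to/export", "/path/to/recycle-bin")` returns the moved files for review. Because only same-size candidates are hashed, the expensive reads stay limited to likely duplicates. This approach only catches byte-identical files. If a service re-encoded or resized a photo on export, the copies will not match.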

3. Archive the data

This was another important step. If you have ever used the S3 Glacier classes, you know that you are charged for two things:

  1. The size of the data being stored or retrieved, and
  2. The number of items (requests) being transferred.

In practice, this means you need to use “bulk uploads”, bundling many files into one archive, to avoid making millions of requests (well, unless you want to go bankrupt). The downside is that getting a single file back means retrieving and extracting a whole archive. That matters in some cases, but in mine it was totally fine: if I ever decide to review those old files, I don't mind downloading the whole archive and extracting it. This would not be a good plan for a web service, though.
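Here is a minimal sketch of the bundling idea using Python's standard `tarfile` module. It assumes a hypothetical size cap per archive. Files are grouped in sorted order into bundles of at most `max_bytes` (uncompressed), and each bundle becomes one plain `.tar` file. The number of objects to upload then drops from the number of files to the number of bundles. The function names and the archive naming scheme are illustrative assumptions, not the setup used in this post.

```python
import tarfile
from pathlib import Path


def plan_bundles(root, max_bytes):
    """Group files under root, in sorted order, into bundles of at most max_bytes.

    Sizes are uncompressed file sizes. A single file larger than max_bytes
    still gets a bundle of its own.
    """
    bundles, current, current_size = [], [], 0
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        size = path.stat().st_size
        # Close the current bundle if this file would push it over the cap.
        if current and current_size + size > max_bytes:
            bundles.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        bundles.append(current)
    return bundles


def write_bundles(root, out_dir, max_bytes):
    """Write each bundle as an uncompressed .tar, storing paths relative to root.

    Photos and videos are usually compressed already, so gzip would mostly
    cost CPU time without saving much space.
    """
    root, out_dir = Path(root).resolve(), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for i, files in enumerate(plan_bundles(root, max_bytes)):
        archive = out_dir / f"{root.name}-{i:04d}.tar"
        with tarfile.open(archive, "w") as tar:
            for path in files:
                tar.add(str(path), arcname=str(path.relative_to(root)))
        archives.append(archive)
    return archives
```

Each archive can then be uploaded as a single object, for example with boto3's `upload_file` and `ExtraArgs={"StorageClass": "DEEP_ARCHIVE"}`. The bucket and the class in that example are illustrative. The cap controls the trade-off described above: smaller bundles mean more requests, and larger ones mean restoring more data to get a single file back.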

4. Find the class

I had to choose which S3 storage classes to use. They may all seem the same, but they are priced differently for different use cases. To start, I decided to organize my data into these categories:

Based on the analysis above, I decided to use the following classes:

Results

I used to pay $2 a month for 100GB on Amazon Photos, which was almost 95% full. The next available plan would have cost me $10 a month for 1TB. With all the steps above, I not only got all of my data into one place, but also reduced the monthly fee from $10 to just over $1.10 for 200GB of data. My data is also better organized now, which is a huge plus for me.
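The comparison above is simple arithmetic. This small sketch just makes it explicit, using the figures from this post: $10/month for the 1TB plan, about $1.10/month on S3, and 200GB stored. The helper name is hypothetical.

```python
def monthly_saving(old_fee, new_fee, stored_gb):
    """Compare two monthly fees for the same amount of stored data."""
    return {
        "old_per_gb": old_fee / stored_gb,
        "new_per_gb": new_fee / stored_gb,
        "saving_pct": 100 * (1 - new_fee / old_fee),
    }


# Figures from this post: the $10/month 1TB plan vs. roughly $1.10/month on S3.
result = monthly_saving(old_fee=10.0, new_fee=1.10, stored_gb=200)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

With these numbers, the effective price drops from $0.05 to about $0.0055 per stored GB per month, a saving of roughly 89%.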