This might be a slightly different post of mine as it is not directly related to my work. However, I believe it is a good example of how I used my technical skills in my personal life. The case was relatively simple: I wanted to save money and yet keep my data safe in the cloud. Let me explain the situation first.
For a long time I had chosen to keep separate online archives for my photos and data. I used Google Photos for my old photos from the old life, a time I was not happy to be reminded of on a daily or even monthly basis. Then I decided to use Amazon Photos for the new life and the content I felt happy to look at every day and share with my friends. Since I take a ton of photos, I had to buy the premium account quite soon, at $2/month for 100GB.
Photos are not the only data I keep online. I also used Google Drive for keeping old docs, such as documents about my university, and OneDrive for some other docs about the college. I didn’t have any recent use for these files, yet they were not useless enough to be removed. On the other hand, for more frequently used docs I had to switch to iCloud. It was easy to use, as I was mainly using my iPhone and iPad and could access the files with the least effort. Since I had already separated out my photos, I never had to worry about the 5GB storage limit on iCloud.
I think you now have a better idea of the problem: it was such a mess. Even though I was not eager to review some of these files, I couldn’t afford to lose them. Then came another obstacle: I had to switch from my old iPhone to an Android device this past winter, so you can imagine the headaches of syncing and accessing my day-to-day docs from the new device. Synchronization problems, data being all over the place, and the increasing cost of Amazon Photos made me think about a solution: how about archiving all my data in one place?
If you have read my other posts (or even checked the homepage of this blog), you know that I am a DevOps engineer who deals with the cloud on a daily basis. It is my job to find solutions for cases like the one above, and I am paid for doing so. So I started thinking to myself: why can’t I see myself as my own employer? How can I help myself with this problem? So the journey began.
I started by reading about the S3 storage class options on AWS. I knew S3 and I liked how it works, but keeping everything in a general purpose bucket at the standard rate would be too expensive for me, as I had around 200GB of data, most of which I don’t use frequently. So I started with the following plan:
I started by gathering all of my data. It wasn’t easy, because as much as these online services encourage you to bring your data in, they make it as hard as possible to take it away. Heh! This is another reason for my journey!
So I slowly started to download my data: my Google photos, my Amazon photos, and my documents on iCloud, Google Drive and OneDrive. The process took several days, but I finally managed to organize everything in separate folders.
If you have ever worked with the cloud, you know that providers mostly charge you based on either the number of items or the size of your data. Apart from that, who wants to see the same photo duplicated in their gallery? So the next step was to find and remove duplicates. For this, I wrote a simple Python script to iterate over the folders, find duplicated files and move them into a recycle bin for later review. Considering the number of items and their size, this took almost a day to finish on my M1 Mac Mini, but I was quite happy with the results.
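For illustration, here is a minimal sketch of how such a script could look. It is not the exact script I ran: the function names are made up for this example, and it assumes that "duplicate" means byte-for-byte identical content, detected by comparing SHA-256 hashes.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_duplicates(root: Path, recycle_bin: Path) -> list[Path]:
    """Move every file whose content was already seen into recycle_bin.

    The first file found with a given content (in sorted path order) is
    kept in place. Returns the original paths of the files that were moved.
    """
    recycle_bin.mkdir(parents=True, exist_ok=True)
    seen: dict[str, Path] = {}
    moved: list[Path] = []
    # Collect the file list up front so that moving files does not
    # interfere with the directory walk.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if recycle_bin in path.parents:
            continue  # skip files that already sit in the recycle bin
        digest = file_digest(path)
        if digest in seen:
            # Prefix with a counter so equal file names do not collide.
            target = recycle_bin / f"{len(moved):06d}_{path.name}"
            shutil.move(str(path), str(target))
            moved.append(path)
        else:
            seen[digest] = path
    return moved
```

Nothing is deleted here; duplicates only end up in the recycle bin folder, so a wrong match can still be restored by hand. Hashing every file is also why a run over a couple of hundred gigabytes takes a long time.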
This was another important step. If you have ever tried to use the S3 Glacier classes, you know that you are charged for two things:
In practice, this means you need to use “bulk uploads”, that is, a few large archives instead of many small files, to avoid making millions of requests (well, if you don’t want to go bankrupt). Even though this has a downside for specific cases, in my case it was totally fine. If in the future I decide to review those old files, I don’t mind downloading the whole archive and extracting it. This, however, doesn’t look like a good plan for a web service.
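A minimal sketch of the bundling idea, assuming a plain uncompressed tar file per folder. The function name and the file names in the usage note are hypothetical, and the post does not depend on this particular archive format.

```python
import tarfile
from pathlib import Path


def bundle_folder(folder: Path, archive_path: Path) -> Path:
    """Pack a whole folder into a single uncompressed tar archive.

    One archive means one upload request instead of one per file.
    Compression is skipped on the assumption that photos (JPEG and
    similar) are already compressed and would barely shrink.
    """
    with tarfile.open(archive_path, "w") as tar:
        # arcname keeps paths inside the archive relative to the folder name
        tar.add(folder, arcname=folder.name)
    return archive_path
```

Calling `bundle_folder(Path("google-photos"), Path("google-photos.tar"))` would then produce one file to upload. The trade-off is the one described above: getting a single photo back means retrieving and extracting the whole archive.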
I had to choose which S3 classes to use. They may all seem the same, but in fact they offer different price ranges for different use cases. To start, I decided to organize my data into these categories:
Long-term archives of Google Photos: I really don’t expect to touch these files in the near future. They are there, they are mine, they are my history, but I am not eager to see them anytime soon, at least not in the foreseeable future.
Mid-term files from Amazon Photos and Google/OneDrive Docs: These files are not frequently used, but if I need them, I need to access them fast. I might need them in a year or two, so I need to have them in a class where I can access them in a reasonable time while keeping the costs low.
Short-term daily docs: These are the files that I might need to access every day. I don’t want to wait for hours to download them, but I don’t want to pay a lot for them either.
Based on the analysis above, I decided to use the following classes:
S3 Glacier Deep Archive: Reliable, cheap archival storage for my old life records. I don’t mind waiting hours or even days to download them, even though I have no such plans in mind. So I can afford to keep them there for at least the minimum required time (180 days, roughly 6 months) and thus save a lot of money.
S3 Glacier Instant Retrieval: For anything that can be archived but might be needed more often than the deep archive. It is more expensive than Deep Archive, but there is no waiting time for retrieval: the files can be accessed within milliseconds.
Google Drive: For the few remaining docs and a few recent photos that I enjoy seeing every day, I decided to stick with the free plan of Google Drive. The old albums will be reviewed on a yearly basis and moved to S3 if I find them less enjoyable.
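The mapping from my categories to S3 classes can be sketched in a few lines. This is only an illustration and assumes the upload is done with boto3, which is one of several ways to do it. The function name, the key layout and the bucket name in the usage note are hypothetical. `DEEP_ARCHIVE` and `GLACIER_IR` are the storage class values S3 uses for Glacier Deep Archive and Glacier Instant Retrieval.

```python
from pathlib import Path

# Category -> S3 storage class, following the plan described above.
STORAGE_CLASS_BY_CATEGORY = {
    "long-term": "DEEP_ARCHIVE",  # S3 Glacier Deep Archive
    "mid-term": "GLACIER_IR",     # S3 Glacier Instant Retrieval
}


def upload_archive(s3_client, archive_path: Path, bucket: str, category: str) -> str:
    """Upload one archive file into the storage class of its category.

    s3_client is expected to behave like boto3.client("s3"), i.e. to
    provide upload_file(Filename, Bucket, Key, ExtraArgs=...).
    Returns the object key that was used (a hypothetical layout).
    """
    storage_class = STORAGE_CLASS_BY_CATEGORY[category]
    key = f"{category}/{Path(archive_path).name}"
    s3_client.upload_file(
        str(archive_path),
        bucket,
        key,
        ExtraArgs={"StorageClass": storage_class},
    )
    return key
```

With boto3 installed and credentials configured, a call would look like `upload_archive(boto3.client("s3"), Path("google-photos.tar"), "my-archive-bucket", "long-term")`. Setting the storage class at upload time places the object directly in the archive class, with no separate transition step.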
I used to pay $2 for 100GB of data on Amazon Photos, which was almost 95% full. The next available option would cost me $10 for 1TB. So with all the above steps, I managed not just to get all of my data in one place, but also to reduce the monthly fee from $10 to just above $1.10 for 200GB of data. My data is also better organized now, which is a huge plus for me.
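To put the numbers above side by side, here is a tiny back-of-the-envelope calculation. It uses only the figures from this post and rounds "just above $1.10" down to 1.10, so the results are approximate. The function name is just for this example.

```python
def monthly_savings(old_fee: float, new_fee: float) -> dict[str, float]:
    """Compare two monthly fees and return the savings, rounded to cents."""
    saved = old_fee - new_fee
    return {
        "saved_per_month": round(saved, 2),
        "saved_per_year": round(saved * 12, 2),
        "reduction_percent": round(100 * saved / old_fee, 1),
    }


# Figures from the post: the $10 plan (1TB) versus roughly $1.10 on S3.
summary = monthly_savings(old_fee=10.0, new_fee=1.10)
```

That works out to roughly $8.90 saved per month, or about an 89% reduction compared with the 1TB plan.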