Aloha, friends, welcome back! The world of computers has changed a lot. In 1956, IBM needed a truck to transport a 5 MB hard disk drive. Now you can carry terabytes of storage in your pocket. Isn’t it amazing? In this blog, let’s take a deep dive into AWS EC2 instance storage, a topic that is rarely discussed from a beginner’s or layman’s point of view.
Among the many excellent AWS services, EC2 plays an essential role by providing highly scalable, flexible, on-demand computing. However, choosing the right storage is just as crucial to making the best use of EC2 instances. Let’s demystify the storage options AWS provides for them.
What is an EC2 instance?
Whatever website or application you plan to deploy or publish, you will need a server. Traditionally, getting a server and configuring it was tedious and could take days or even months. In an era where competition in the market is so high, you can’t wait that long just for infrastructure. This is where cloud computing comes in: Amazon Elastic Compute Cloud (EC2) makes it easy to create virtual machines, or servers, of your own choice in minutes.
You simply choose the type of instance you need, select an Amazon Machine Image (AMI) such as Linux or Windows, choose the most appropriate size from the large collection of instance families, and with a few clicks in the AWS Management Console, you are done.
In a nutshell, Amazon Elastic Compute Cloud (EC2) is a service provided by AWS that allows users to rent virtual servers (also called Instances) on which they can run their own applications.
Storage services in EC2
Before talking about storage services in EC2, let’s cover a few fundamentals.
What is Data?
Data is any useful information, such as the selfies you take, your resume, any PDF file, any ebook, or your notes. All these things can be considered to be data.
In the context of cloud computing or the computer world, data is generally stored, processed, and transmitted in digital form by a computer system. It comes in many types and formats, such as text, images (JPEG, PNG), video, audio, documents (PDF), databases, and many more.
In a nutshell, data can be any digital representation of information that is complete and useful to humans.
What is Storage?
To store digital data, we definitely need storage services. Storage, in the context of computing, refers to the process of saving data in a persistent form so that it can be accessed and retrieved whenever needed. Storage plays an important role in saving, storing, and processing various types of data, whether it be a text document, images, videos, or databases. It involves various devices like HDDs, SSDs, cloud storage, NAS, and SANs for data retention.
Types of storage
Based on the fundamental units of data storage and the access methods, we can categorize storage into three types: file-level, block-level, and object-level storage. Let’s understand each of these:
File Storage:
- File storage, or file-level storage, stores data in a tree-like hierarchy. Files are organized into folders and subfolders, also called directories and subdirectories.
- It allows you to access data through protocols like NFS (Network File System) and SMB (Server Message Block).
- Users can access files through their names and locations in the directory structure.
- File storage services are often used for shared file systems, network-attached storage (NAS), servers, and applications that need file-based access and sharing.
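To make the hierarchy idea concrete, here is a minimal Python sketch that builds a tiny directory tree in a temporary folder and then finds files by their location and name. The folder and file names are made up for illustration; a real shared file system would be reached over NFS or SMB rather than a local temp directory.

```python
import tempfile
from pathlib import Path


def build_demo_tree(root: Path) -> None:
    """Create a small directory hierarchy with two files (names are hypothetical)."""
    (root / "projects" / "blog").mkdir(parents=True)
    (root / "projects" / "blog" / "draft.txt").write_text("EC2 storage notes")
    (root / "projects" / "resume.pdf").write_bytes(b"%PDF-demo")


def list_tree(root: Path) -> list[str]:
    """Return every file path relative to root, sorted, with '/' separators."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    files = list_tree(root)
    print(files)
    # A file is addressed by its place in the hierarchy plus its name.
    draft_text = (root / "projects" / "blog" / "draft.txt").read_text()
    print(draft_text)
```

Notice that you never deal with blocks or identifiers here: the path itself is the address.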
Block Storage:
- In block storage, data is divided into fixed-size blocks, and each block is stored as an individual unit. Block sizes typically range from a few KB to several MB.
- Each block is given a unique address, or block number, which the operating system tracks in a lookup table.
- It allows direct access to individual blocks using protocols such as SCSI (Small Computer System Interface) or Fibre Channel.
- Best suited for structured data storage, databases, virtual machines (VMs), and operating systems that require low-level access and high performance.
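The block idea can be sketched in a few lines of Python. This is a toy model, not how EBS or any real disk is implemented: the 4-byte block size is deliberately tiny so the output stays readable, and a plain dictionary plays the role of the lookup table.

```python
BLOCK_SIZE = 4  # bytes; unrealistically small, chosen only for readability


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Split data into fixed-size blocks, keyed by block number."""
    return {
        number: data[offset:offset + block_size]
        for number, offset in enumerate(range(0, len(data), block_size))
    }


def read_block(table: dict[int, bytes], number: int) -> bytes:
    """Fetch one block directly by its number, without touching the others."""
    return table[number]


def read_all(table: dict[int, bytes]) -> bytes:
    """Reassemble the original data by reading the blocks in order."""
    return b"".join(table[number] for number in sorted(table))


blocks = write_blocks(b"HELLO EC2!")
print(blocks)                 # {0: b'HELL', 1: b'O EC', 2: b'2!'}
print(read_block(blocks, 1))  # b'O EC'
print(read_all(blocks))       # b'HELLO EC2!'
```

The last block is shorter than the others because the data does not fill it completely; a file system on top of the blocks is what keeps track of where the real data ends.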
Object Storage:
- Object-level storage organizes data as discrete objects, each containing data, metadata, and a unique identifier.
- Objects are stored in a flat namespace rather than in a hierarchical structure.
- We can access the data via HTTP-based RESTful APIs, which give users the power to manipulate objects as individual entities.
- It is ideal for unstructured data, large-scale data storage, cloud storage, and applications requiring scalability and cost-effectiveness.
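Here is a small Python sketch of the object model described above: each object bundles data, metadata, and a unique identifier, and everything lives in one flat key space. The class and method names are hypothetical and exist only for this illustration; real services such as S3 expose the same idea through an HTTP API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str]
    # Unique identifier generated for every object.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class FlatObjectStore:
    """Toy object store: one flat dictionary, no real folders."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> StoredObject:
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[key] = obj
        return obj

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def keys(self) -> list[str]:
        return sorted(self._objects)


store = FlatObjectStore()
selfie = store.put("photos/2024/selfie.jpg", b"\xff\xd8", content_type="image/jpeg")
store.put("notes.txt", b"EC2 storage notes", content_type="text/plain")

print(store.keys())
print(store.get("photos/2024/selfie.jpg").metadata)
```

The slashes in `photos/2024/selfie.jpg` look like folders, but in this model they are just characters in the key. That is what a flat namespace means.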
In the context of cloud computing or AWS, the fundamental definitions of block, object, and file-level storage stay the same. However, AWS and other cloud providers add more advanced features on top of them, which makes each type more powerful and better suited to particular use cases.
Let’s talk about the storage services for AWS EC2 instances:
Overview of EC2 Storage Options
There are two types of storage that can hold the root volume of an EC2 instance: Elastic Block Store (EBS) and Instance Store. So, why are we talking about these two? Why can’t we use S3 storage? Basically, EBS and Instance Store volumes attach to an EC2 instance as disks and can serve as its root volume, whereas S3 is object storage that you access through an API.
What is a Root Volume or Boot Volume?
A root volume is the storage that holds the operating system your EC2 instance boots from. This boot volume, or root volume, always lives on either EBS or Instance Store.
Most of the time, you will find instances are EBS-backed, which means the OS is stored in EBS. If you find the EC2 instance is Instance Store-backed, that means the OS is in Instance Store.
Introduction to Elastic Block Store
Elastic Block Store (EBS) is a block-level storage service provided by AWS that gives you persistent, network-attached storage. It is high-performance storage that can be attached to EC2 instances as a disk volume.
An EBS volume feels like a regular local disk even though it is not. Because it is network storage, its performance depends on the speed of the network.
Did you know?
If your EC2 instance is NOT EBS-backed, you will LOSE the data on its root volume when you terminate the instance!
Yes, this is the reason EBS is used so widely with EC2. You can attach an EBS volume to an EC2 instance and detach it again, and as long as the volume is not set to be deleted on termination, your data stays safe in the volume even after the instance is terminated. You can then access that data by attaching and mounting the volume on another EC2 instance. Keep in mind that, by default, the EBS root volume is deleted on termination, while additional data volumes are kept.
EBS volumes work with both Windows and Linux machines: you create a volume, attach it to the EC2 instance, mount it, and start using it.
One thing to know is that an EBS volume can be attached to only one EC2 instance at a time (Multi-Attach on Provisioned IOPS volumes is the exception), although one instance can have several volumes attached. The EC2 instance and the EBS volume must also be in the same Availability Zone.
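The attachment rules can be modelled in a few lines of Python. This is a toy model with made-up IDs, not the AWS API: it only checks the two rules discussed here (one instance at a time, same Availability Zone) and ignores Multi-Attach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    instance_id: str
    availability_zone: str


@dataclass
class Volume:
    volume_id: str
    availability_zone: str
    attached_to: Optional[str] = None  # instance ID, or None when detached


def attach(volume: Volume, instance: Instance) -> None:
    """Attach a volume, enforcing the two rules from the text."""
    if volume.attached_to is not None:
        raise ValueError(f"{volume.volume_id} is already attached to {volume.attached_to}")
    if volume.availability_zone != instance.availability_zone:
        raise ValueError("volume and instance must be in the same Availability Zone")
    volume.attached_to = instance.instance_id


def detach(volume: Volume) -> None:
    volume.attached_to = None


# Hypothetical IDs and zones, for illustration only.
web = Instance("i-web-demo", "us-east-1a")
spare = Instance("i-spare-demo", "us-east-1a")
far_away = Instance("i-far-demo", "us-east-1b")
volume = Volume("vol-demo", "us-east-1a")

errors = []
attach(volume, web)
try:
    attach(volume, spare)      # fails: still attached to the first instance
except ValueError as exc:
    errors.append(str(exc))

detach(volume)
attach(volume, spare)          # works: same zone, volume is free again
detach(volume)

try:
    attach(volume, far_away)   # fails: different Availability Zone
except ValueError as exc:
    errors.append(str(exc))

print(errors)
```

Detaching and re-attaching is exactly how the data on a volume moves from one instance to another within the same zone.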
EBS volume Types
EBS storage performance is measured mainly in two ways, IOPS and throughput, and latency is worth knowing as well.
IOPS: The number of input and output operations a storage device can complete in one second.
Throughput: The speed of data transfer, measured in megabytes per second. It depends on the number of I/O operations and the block size of each input/output operation. (Throughput = Number of IOPS * Block size)
Latency: The time between sending a read or write request to the storage and receiving the response.
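The throughput formula is easy to try out with a few numbers. The sketch below is plain arithmetic; the IOPS and I/O sizes are illustrative values, not quotas of any particular volume type, and real volumes also have their own throughput caps.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput = IOPS * I/O size, converted from KiB/s to MiB/s."""
    return iops * io_size_kib / 1024


small_io = throughput_mib_per_s(iops=3000, io_size_kib=16)
large_io = throughput_mib_per_s(iops=3000, io_size_kib=256)

print(small_io)  # 46.875
print(large_io)  # 750.0
```

The same number of operations per second moves sixteen times more data when each operation is sixteen times larger, which is why both numbers matter when you compare volumes.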
Broadly, EBS volume types fall into two families: SSD-backed and HDD-backed. Other than these, there is also the older Magnetic (standard) type. Over the years, AWS has introduced several newer volume types, which are listed below. Let’s understand SSDs and HDDs first.
SSD vs HDD
SSD stands for solid-state drive, and HDD stands for hard disk drive. AWS provides these two types of storage devices for users to store their data.
SSDs are faster than HDDs, and they cost more for the same capacity.
An SSD is a type of storage device that uses memory chips to store data. It is much faster than an HDD because it has no moving parts, so it can access and transfer data much faster. It is used for applications that require fast data transfer speeds, like video streaming, gaming, or high-performance computing.
Types:
- General Purpose SSD (gp2 and gp3)
- Provisioned IOPS SSD (io1, io2, and io2 Block Express)
An HDD is a storage device that uses a spinning disk to store data. It is slower than an SSD because it has moving parts. Still, it can store more data at a lower cost, which makes it ideal for applications that require large amounts of storage, such as backups or archives.
Types:
- Throughput Optimized HDD (st1)
- Cold HDD (sc1)
[Table: comparison of the EBS volume types]
What is an Instance Store / Ephemeral storage?
- Instance Store for EC2 instances is a type of storage that is physically attached to the host computer of an EC2 instance.
- Unlike EBS volumes, which are network-attached and can outlive the EC2 instance, Instance Store volumes are DELETED when the EC2 instance is stopped or terminated. Instance Store volumes are ephemeral in nature, which means that they are temporary and exist only for the lifetime of the EC2 instance.
- The size of the instance store depends on the instance type. When the instance store holds the root volume (an Instance Store-backed instance), that root volume is limited to 10 GiB.
- The instance store is available only on certain EC2 instance types. It suits applications that need exceptionally high-speed local storage, such as caches, buffers, and other temporary data.
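The difference between ephemeral and persistent storage shows up at termination time. Below is a toy Python simulation, not AWS code: the class names are invented, and the `delete_on_termination` field is only meant to mirror the idea of EC2’s DeleteOnTermination setting for EBS volumes.

```python
from dataclasses import dataclass, field


@dataclass
class EbsVolume:
    name: str
    files: dict[str, bytes] = field(default_factory=dict)
    delete_on_termination: bool = False


@dataclass
class ToyInstance:
    instance_store: dict[str, bytes] = field(default_factory=dict)
    ebs_volumes: list[EbsVolume] = field(default_factory=list)

    def terminate(self) -> list[EbsVolume]:
        """Wipe the local instance store and return the EBS volumes that survive."""
        self.instance_store.clear()
        survivors = [v for v in self.ebs_volumes if not v.delete_on_termination]
        self.ebs_volumes = []
        return survivors


root_volume = EbsVolume("root", {"os.img": b"linux"}, delete_on_termination=True)
data_volume = EbsVolume("data", {"db.sqlite": b"rows"})
server = ToyInstance(
    instance_store={"cache.tmp": b"scratch"},
    ebs_volumes=[root_volume, data_volume],
)

survivors = server.terminate()
print([v.name for v in survivors])  # ['data']
print(server.instance_store)        # {}
```

Only the data volume comes back from `terminate()`; the scratch file on the instance store is gone for good, so never keep anything there that you cannot recreate.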
What is Elastic File System (EFS)?
Let’s say you’re working on a small project with multiple team members collaborating. However, managing the files becomes challenging as they are scattered across different computers. Team members struggle to find the latest versions of files, which leads to confusion and often forces people to redo the same work. The risk of file loss due to hardware failures is also a big problem.
So what could be a solution? Perhaps a central location, correct?
AWS gives you the Elastic File System (EFS), which you can attach to many EC2 instances simultaneously. The storage is accessible from any of those EC2 instances, allowing you to centrally store, manage, and collaborate on files in the cloud. EFS is a fully managed, serverless service, meaning AWS takes care of provisioning and maintenance.
With Amazon EFS, you can create a centralized file storage solution that lets all team members work on the same files. Each team member can mount the EFS file system on their EC2 instance, so everyone always sees the latest versions of files.
With Amazon EFS, you do not have to transfer files manually or depend on individual computers. The risk of file loss due to hardware failures is reduced since files are securely stored in the cloud rather than scattered across multiple devices.
Moreover, because AWS handles provisioning and maintenance, your team can focus on collaboration and project work without the overhead of managing file servers.
How to choose the best storage solution for an EC2 instance?
Choosing the most suitable storage solution for your EC2 instance in a real project can be tricky because it depends on various factors. I generally ask myself or the other stakeholders a few questions and then make a decision. Here are five common questions we should ask:
1. Performance:
- What are my performance requirements?
- Understand the required speed and latency for optimal performance.
2. Data Persistence:
- Is data loss acceptable, or do we need persistent storage?
- Assess the need for data persistence and durability, and think about redundancy and backup options as well.
3. Budget:
- Does the cost of storage services fit into my budget?
- Evaluate the cost implications and balance performance with the budget allowed.
4. Type of Data:
- What type of data do I have: structured or unstructured?
- Determine whether we need file-based or block-level access for storage.
5. Management:
- How do I want to manage the storage?
- Decide on the level of management based on resources and expertise available.
When selecting a storage solution for an EC2 instance, consider the type of workload you have, the performance requirements, and whether you need data persistence. Choose instance store volumes for temporary data or Amazon EBS volumes for persistent storage, and pick the EBS volume type based on its performance characteristics and cost. Prioritize encryption and backup strategies for data security and resilience, and monitor storage usage so you can optimize it.
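If you like, you can turn these questions into a tiny helper. The Python sketch below is only a rough first filter based on the ideas in this blog: the function name, its parameters, and the order of the checks are my own assumptions, and it ignores budget, encryption, and management effort, which you still have to weigh yourself.

```python
def suggest_storage(needs_persistence: bool,
                    shared_across_instances: bool,
                    large_sequential_data: bool = False) -> str:
    """Return a first-guess storage option; a starting point, not a rule."""
    if shared_across_instances:
        # Many instances need the same files at the same time.
        return "EFS"
    if not needs_persistence:
        # Temporary data that can be lost when the instance goes away.
        return "Instance Store"
    if large_sequential_data:
        # Big, low-cost data sets such as backups or archives.
        return "EBS (HDD-backed)"
    return "EBS (SSD-backed)"


print(suggest_storage(needs_persistence=True, shared_across_instances=False))
print(suggest_storage(needs_persistence=False, shared_across_instances=False))
print(suggest_storage(needs_persistence=True, shared_across_instances=True))
```

Treat the answer as the start of the conversation with your stakeholders, then check it against your performance numbers and your budget.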
That’s all from me for today! Keep learning, and see you in the next blog.