Storage Scenarios and Architectures

Learn how to use Equinix Metal resources to fit your Storage needs.

Storage Use Cases on Equinix Metal

How do you find the right storage offerings and architectures for your needs on Equinix Metal?

In this series, we will describe various use cases, and the storage offerings and supporting architectures that you are likely to want.

This series consists of solution guides, not product documentation. We want to help you put together the right solution for your needs, using Equinix Metal products and services.

Once you have the architecture you want for your needs, you can use the Equinix Metal product documentation for the different products, as well as our Learning Center storage guides, to build out your desired architecture.

Categories

In this series, we will cover the following categories of storage usage:

  • Cloud-Adjacent Storage
  • Backup/Disaster Recovery
  • Databases
  • AI and Machine Learning
  • Virtualization
  • File shares

Cloud-Adjacent Storage

Cloud-adjacent storage is storage located outside your primary cloud provider, in a location that is accessible to your processes in that cloud provider, and suitable for your purposes.

The suitability of the location depends on your use case:

  • If you are looking to protect your data in case of a data center or availability zone failure, then you can have the data in the same metro or region as the cloud provider.
  • If you are looking to protect your data in the case of a regional failure, then you want to have the data in a different region.
  • If you are looking to reduce latency, then you need the data in the same metro as the cloud provider.
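The three rules above can be written down as a small lookup. This is only an illustrative sketch: the `Goal` names and the `placement_for` function are hypothetical and not part of any Equinix Metal API. The handling of the combined case (low latency plus regional protection) is an assumption that follows from the rules, since a single location cannot be both in the same metro and in a different region.

```python
from enum import Enum


class Goal(Enum):
    ZONE_FAILURE = "protect against a data center or availability zone failure"
    REGION_FAILURE = "protect against a regional failure"
    LOW_LATENCY = "reduce latency"


def placement_for(goals):
    """Return where to place storage relative to the cloud provider."""
    goals = set(goals)
    if not goals:
        raise ValueError("at least one goal is required")
    if Goal.REGION_FAILURE in goals and Goal.LOW_LATENCY in goals:
        # Assumption: conflicting goals are met with two copies of the data.
        return "two copies: same metro for latency, different region for protection"
    if Goal.REGION_FAILURE in goals:
        return "different region"
    if Goal.LOW_LATENCY in goals:
        # The same metro also covers a data center or availability zone failure.
        return "same metro"
    return "same metro or region"


print(placement_for([Goal.ZONE_FAILURE, Goal.LOW_LATENCY]))
```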

Common use cases for Cloud-Adjacent Storage are:

  • More affordable long-term storage. Cloud providers vary greatly in what they charge, both for storing data and for transferring it. Storing large amounts of data on Equinix Metal can be a very good value.
  • Protection from vendor lock-in. Many enterprises worry about being locked into a single cloud provider. When all your data is tied up inside that single provider, the difficulty of migrating or replicating processes to another vendor is often the biggest such concern. By storing data on Equinix Metal, you protect yourself by keeping critical data under your control.
  • Protection from cloud provider availability issues. Simply by having your data stored on an alternate provider, with their own independent infrastructure, you can protect yourself from cloud provider outages. Cloud providers work very hard to prevent outages, let alone those that spread beyond one availability zone or region, but they do happen.
  • Data sovereignty. Many countries have laws that require data to be stored within the country. This can be a problem if you are using a cloud provider that does not have a data center in that country. By storing data on Equinix Metal, you can meet those requirements. Read the full list of Equinix locations.
  • Data storage compliance. Many jurisdictions and industries have compliance regulations that require you to store data on physical storage media, servers or services dedicated to your use, rather than in a shared environment. With Equinix Metal's bare metal offerings, you are guaranteed that no one but you operates on your servers or your storage.

Finally, it is important to acknowledge the value of data in today's market, especially for AI training. The only way to guarantee that your data is not used by others is to store it yourself, especially since data privacy policies can change without adequate warning.

With Equinix Metal offerings, you do not rent by the gigabyte or terabyte, colocated with many other tenants on unknown storage arrays. Rather, you pay for the actual storage devices, whether drives on storage-optimized Equinix Metal servers or dedicated managed storage arrays, and put your data on drives dedicated to you.

Backup and Disaster Recovery

Backup and disaster recovery are like insurance. They're the situations you don't want to think about, until you really need to. Equinix has an excellent guide to disaster recovery.

Backup and disaster recovery have two distinct purposes. Backup is about protecting your data in case of sudden loss or human error. For example, your database becomes corrupted, someone accidentally deletes critical files, or your storage array has electrical issues that delete everything on it. Disaster recovery, on the other hand, is about protecting your data in case of a catastrophic event, such as a fire or earthquake that destroys your data center, most importantly along with your data storage.

Because of these different purposes, you may have different requirements for each.

From a storage perspective, for both backup and disaster recovery, you need to have sufficient storage to keep all of the data you will need. You also need to have a plan for recovering that data in time of need, and you need to test that plan regularly.

The differences in the requirements are as follows.

For backup, you need not only a copy of the most recent set of data, but also periodic copies. This is where versioning technology is helpful. Backups handle not just the case of, "Oops, I deleted a file", but also, "What were the contents of that file 6 days ago? How about 2 weeks or 3 months ago?" Additionally, storage snapshots can be used to create point-in-time copies of your data, allowing for quick recovery and minimal data loss. Both versioning and snapshots require more storage, as well as management software that lets you manage and retrieve different versions of your data over time.
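To make the "6 days ago" question concrete, here is a minimal sketch of a point-in-time lookup over a list of backup versions. The `version_as_of` function, the daily backup schedule and the labels are hypothetical, chosen for illustration only. Real backup software keeps its own catalog, so treat this as a picture of the idea, not of any particular product.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def version_as_of(versions, when):
    """Return the newest (timestamp, label) pair taken at or before `when`.

    `versions` must be sorted by timestamp, oldest first.
    Returns None when no version is old enough.
    """
    timestamps = [ts for ts, _ in versions]
    index = bisect_right(timestamps, when)
    if index == 0:
        return None
    return versions[index - 1]


now = datetime(2024, 7, 3, 12, 0)
# Hypothetical history: one backup per day for the last 100 days, oldest first.
versions = [(now - timedelta(days=d), f"backup-{d}d-ago") for d in range(100, -1, -1)]

six_days_ago = version_as_of(versions, now - timedelta(days=6))
print(six_days_ago[1])
```

The longer you want to be able to look back, the more versions you keep, which is why versioning drives up the storage you need.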

For disaster recovery, you need only the most current copy of data, just like in your regular systems. However, unlike backups, you need to have the data ready-to-run in a format that can be brought online quickly, matching your existing systems' expectations.

Further, because disaster recovery includes events that affect entire geographies, like hurricanes and earthquakes, you need to have the data stored in a location sufficiently distant from your primary data location, such as in another region, so that it is unaffected by those events. Backups, by contrast, can be stored close to your primary processes, since they are intended for quick retrieval of lost or corrupted data.

Databases

Databases are the lifeblood of most businesses. They are the repository of all of the data that you need to run your organization: user accounts, balances, orders, activities, everything is in there.

Most critically, databases often are required for real-time critical information. When your user wants to place an order online, they will not wait minutes to access some distant, slow storage location.

For these reasons, databases almost always have storage in close proximity to the database process itself, usually on the same server, or on a well-cached network storage device in the same data center. On the other hand, because of the critical nature of the data, databases are often replicated to a distant location.

This merges the best features of local performance and disaster recovery: the real-time performance of access to local data, and the protection of that critical data by having as close to a real-time duplicate as possible in a distant location, with another copy of the database colocated with it.

AI and Machine Learning

Artificial Intelligence and Machine Learning are critical technologies in the IT world. Every organization either is actively adopting AI or exploring its usage in their organization, both internally and externally.

While AI, and especially Large Language Models (LLMs), is famous for requiring large amounts of compute, notably specialized GPUs, for both training and inference, it also requires large amounts of storage.

Storage for AI models has several purposes.

First, if you are starting by training a model, you need a place to store your untrained model. Normally, the model is defined in a general-purpose programming language such as Python, which is popular among AI programmers and has a wide array of tools for working with AI, so the files are ordinary source files, like .py. Some AI frameworks do not require programming; instead, you control them via configuration files, such as JSON or XML. Either way, these files are often stored in a version control system, like Git.

Second, you need a place to store your training data. This is the data that you will use to train your model: you run your model over some period of time, having it learn from your input data. Input data is typically a very large set, sometimes terabytes or larger. This storage is necessary during the training phase. However, you also need to keep your training data around after training. You will want it when testing improved models or new training datasets, which requires comparing against existing datasets. You also might be required to keep the training data for compliance reasons.

Training data comes in all formats, from comma-separated values (CSV) files to JPEG images to MP4 videos to simple text files and everything in between.

Third, when your model is done training, you need a place to store the trained model. This is the model that actually will analyze incoming data and make decisions or respond to requests. This model is the valuable part of the AI process, and you need to keep it safe and secure. Trained models are saved in a variety of formats, such as PyTorch's .pth, HDF5's .h5, Python pickle files (.pkl, commonly used with scikit-learn), or .onnx.
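As one way to picture "safe and secure" for a stored model, the sketch below saves a model in pickle format and records a SHA-256 checksum, then refuses to load the file if the checksum no longer matches. The `save_model` and `load_model` helpers, the toy dictionary standing in for a trained model, and the file name are all hypothetical. Only the standard-library `pickle`, `hashlib`, `pathlib` and `tempfile` modules are used.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Pickle `model` to `path` and return the SHA-256 hex digest of the file."""
    data = pickle.dumps(model)
    Path(path).write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def load_model(path, expected_digest):
    """Load a pickled model only if the file still matches its recorded digest."""
    data = Path(path).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError(f"checksum mismatch for {path}")
    # Unpickling can execute arbitrary code, so only load files you trust.
    return pickle.loads(data)


# Toy stand-in for a trained model: a dictionary of weights.
model = {"weights": [0.25, -1.5, 3.0], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    digest = save_model(model, model_path)
    restored = load_model(model_path, digest)

print(restored == model)
```

Storing the digest separately from the model file lets you detect corruption or tampering before the model is put back into service.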

Finally, if your model is in production, you need a place to store the data that the model is analyzing and the results provided. This is a good idea in general, subject to regulatory and privacy constraints, as it allows you to analyze the data later, to improve your model, or to understand the results of your model. It also allows users to go back and see their history. In some regulated environments, you may be required to keep this data.

All of these data types - untrained models, input training data, trained models, and production data - need to be stored in a reliable, accessible location, from whence you and your AI model can access them when requested.

Virtualization

Virtualization is the process of running multiple virtual machines on a single physical machine. This is done using a hypervisor, which is a piece of software that runs on the physical machine and manages the virtual machines. Open-source hypervisors include Xen and KVM, while commercial hypervisors include VMware, Hyper-V and Nutanix.

Virtualization is a critical element of organizations' IT strategy. It has many benefits, including:

  • better utilization of hardware, by running multiple virtual machines on a single physical machine
  • better isolation of services, by running each service in its own isolated virtual machine
  • easier management of services, by running each machine as a virtual machine, an item that can be stopped, started or replaced at will, without affecting the underlying hardware

In order to support virtualization, two kinds of storage are necessary.

First, you need storage for the virtual machine images themselves. These are the files, normally read-only, that define the virtual machine and from which it is launched. Most organizations have "golden images", IT-approved images that are the base for all virtual machines. These images are sometimes extended with critical software, creating purpose-specific images. Both the golden images and the purpose-specific images need to be stored in a reliable, accessible location, from whence the hypervisor can launch them when requested.

Second, virtual machines, like all machines, need storage for their data. This is the storage that the virtual machine mounts and accesses to store data for whichever process is running on it: databases, file shares, web servers, etc.

File shares

File shares, the simple storing and sharing of files, have existed since the dawn of computing. Yet, despite their simplicity, most organizations could not exist without them. Every organization needs a place to store Word docs, Excel sheets, PowerPoint presentations, PDF records, and marketing images and videos.

File shares have several key characteristics. First, they must be reliable. When file shares stop working, people stop working. Second, they have to have a reasonable latency. They do not need to be sub-millisecond like a real-time database access; people expect to wait a few moments while their Excel spreadsheet loads up, but they are unwilling to wait a minute or longer. Third, they must be shared via protocols that basic computers understand. For modern desktops, laptops and even smartphones, that generally means SMB.

A key element is ensuring that your file share servers are accessible to all your workers, whether via direct connections, VPN or over the Internet.

Last updated

03 July, 2024
