...

Goals of the case study
  • To specify the information that needs to be exchanged between Warren and the client Data Center (DC), in order to decide whether the current feature set of Warren can fulfill the requirements of the DC.

  • To gather all necessary data, taking into account the wide variety of technical possibilities that could satisfy the service model and commercial goals of the DC.

  • To help Warren discover the features the DC requires, together with business value vs development effort estimates, in order to provide maximum business value to the DC.



Introduction

Necessary information from the Data Center can be gathered with a set of simple, unambiguous questions, divided into three subsets based on the goal they are meant to achieve. Once all subset questions are answered, any ambiguity and misunderstanding over technical and other determining factors will be cleared. The technical clarity achieved will lay the groundwork for a successful cooperation.

...

    1.  Architectural Decisions:
      1. Which external libraries, components and standards to use to meet the requirements of the majority of DCs in Warren's client target groups. 
      2. How to architect features across components to provide the value Warren aims to offer, while maintaining quality of service and the necessary processes the DC had in place before adopting Warren.
    2.  Business Value and Marketing:
      1. Can Warren guarantee the functionality it claims to offer?
      2. Can Warren provide the functionality at a sufficient level of reliability in a particular domain of service for the DC?
      3. Will development effort for Warren be in proportion to the business value the developed functionality is forecasted to provide?  
      4. Do (and will) Warren's features and functionality match the actual requirements of the DC?

Trade-Offs between Availability vs Locality and Software- vs Hardware-Defined Control

There are two fundamental tradeoffs that a DC needs to decide on:

...

Network and Storage are tightly coupled, as decisions in one domain are influenced by the properties of the other. Once the connection between the Network and Storage domains has been analysed in full, decisions on the following two trade-offs become possible.

Availability vs Locality

The biggest trade-off decision concerns multi-location computing, where distributed cache and storage are simultaneously good and evil.

  1. Availability in this context denotes:
    1. Spatial continuity - data or service is concurrently available to recipients/consumers in different locations rather than just one.
      Example: many virtual machines using the same database that resides in distributed storage.
    2. Temporal continuity - data or service is kept available even in case of software or hardware failures ("High Availability").

    In cloud computing, both spatial and temporal continuity may seem desirable, but the downside is lower delivery speed and higher latency. For example, spatial availability through distributed storage without high-end hardware may not offer the latency that storage-sensitive applications need. Assuring application high availability involves several software and hardware redundancies, which require buying and maintaining additional hardware and thus raise the Total Cost of Ownership.

  2. Locality (local storage vs distributed storage) denotes the physical distance of a functional domain, such as storage, from the compute resources (CPU, RAM).

    As High Availability is achieved by involving both distributed and redundant resources, locality is not free from redundancy costs either. However, this redundancy usually sits at the sub-server level and is therefore less expensive. Local storage also has much lower latency, which is desirable, but its total capacity is very limited. In addition, as data on local storage is not available to other devices without additional control and services, additional data duplication demands are introduced on top of the requirements for achieving High Availability. This means extra development work and higher costs.
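The redundancy trade-off can be made concrete with the textbook formula for parallel redundancy: if one component is available with probability a, then N independent replicas, any one of which is enough to serve requests, are available with probability 1 - (1 - a)^N. The Python sketch below assumes independent failures and perfect failover (neither holds fully in a real DC); the function names are illustrative and not part of Warren.

```python
# Minimal sketch: availability gained by N redundant replicas.
# Assumes independent failures and that any single healthy replica
# can serve requests; real systems only approximate both assumptions.

HOURS_PER_YEAR = 24 * 365


def parallel_availability(single: float, replicas: int) -> float:
    """Probability that at least one of `replicas` components is up."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - single) ** replicas


def downtime_hours_per_year(availability: float) -> float:
    """Expected yearly downtime implied by an availability figure."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = parallel_availability(0.99, n)
        # Every extra replica is extra hardware to buy and maintain (TCO).
        print(f"{n} replica(s): availability={a:.6f}, "
              f"downtime={downtime_hours_per_year(a):.3f} h/year")
```

With a = 0.99, a second replica raises availability to roughly 0.9999 and cuts expected downtime from about 87.6 to under one hour per year, at the price of doubling the hardware involved.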

Software- vs Hardware-Defined Domains of Control

This can mostly be described as: 

  1. Software-Defined (SD) - slower, but flexible, automatically reconfigurable and easily portable
  2. Hardware-Defined (HD) - high speed, low portability; automatic configuration is limited or impossible

The general tendency is towards the concept of a "solely software-defined DC", largely because of the automation and management benefits it offers. An exception to this tendency is the popularity of bare-metal provisioning, as demand for direct control over hardware still exists. Some types of applications require it for speed and for independence from general software-level system failures.

Clearly Defined Roles of DC System Administration and Warren System Support

The role of DC system administrators depends on the size of the company and the DC, the nature and complexity of the system itself, the infrastructure, regional peculiarities, the job description and many other factors. However, in cooperation between the DC and Warren, a strict distinction can be made between the roles of DC system administrators and Warren system support personnel. It is important to note that Warren's system support role differs from the support provided for third-party software. Where exactly the borderline is drawn varies with the DC and its installation requirements, which is why setting and defining it needs to be discussed.

The roles and responsibilities must be addressed thoroughly before the final decision on adopting Warren. It is a mistake to assume that supportability is a second-grade matter that can be addressed after production-grade systems are set up. In fact, it is one of the most important parts of service provisioning, so much so that it deserves a chapter in the development documentation. Complex software systems must be developed with efficient observability and support in mind.


Considerations in the Network Domain

Several factors in a DC's network setup dictate what we need to think through in the Warren application development process. Such factors include:

Network topology (tree, Clos, fat-tree, torus, etc.)

This aspect defines network traffic between components, servers and racks, and also between the DC and the internet. It sets the DC's physical extensibility properties; thus, we need to consider:

...

We cannot fine-tune our setup for every topology type: topology is not a standalone factor, so the set of variables in such an analysis is large, and the analysis would be too costly compared to the business value of the outcome. We can, however, target a solution that covers the topologies most commonly used in DCs with a sufficient degree of quality. Service reliability and availability metrics cannot be calculated purely theoretically for a platform that is under heavy development; they will instead be deduced from the DCs' adoption process. The current assumption is that the most widely used topologies in the probable target DC group are fat-tree and various forms of Clos. Based on that, most optimizations are made for these two topology types.
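To illustrate how a topology fixes a DC's physical extensibility, the sketch below computes the dimensions of a standard k-ary fat-tree built from identical k-port switches: k pods, each with k/2 edge and k/2 aggregation switches, (k/2)^2 core switches, and k^3/4 hosts. It assumes a non-oversubscribed three-tier fat-tree; other Clos variants found in DCs will have different counts, and the `FatTree` class is a hypothetical helper, not part of Warren.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTree:
    """Dimensions of a three-tier k-ary fat-tree of identical k-port switches."""

    k: int  # ports per switch; must be even

    def __post_init__(self) -> None:
        if self.k < 2 or self.k % 2:
            raise ValueError("k must be an even integer >= 2")

    @property
    def pods(self) -> int:
        return self.k

    @property
    def core_switches(self) -> int:
        return (self.k // 2) ** 2

    @property
    def aggregation_switches(self) -> int:
        return self.pods * (self.k // 2)  # k/2 per pod

    @property
    def edge_switches(self) -> int:
        return self.pods * (self.k // 2)  # k/2 per pod

    @property
    def total_switches(self) -> int:
        return self.core_switches + self.aggregation_switches + self.edge_switches

    @property
    def hosts(self) -> int:
        # Each edge switch uses k/2 ports for hosts and k/2 for uplinks.
        return self.edge_switches * (self.k // 2)


if __name__ == "__main__":
    for k in (4, 48):
        t = FatTree(k)
        print(f"k={k}: {t.hosts} hosts, {t.total_switches} switches")
```

For example, 48-port switches cap such a fabric at 27,648 hosts and 2,880 switches; growing beyond that requires a different switch generation or topology, not just more racks.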

Nature of applications and services offered by DC

Although both this point and the next may seem trivial compared to real problem magnets such as network topology, adopting an SDN solution or, better yet, consolidating different SDN solutions, this has become a major issue in public clouds (and presumably also in private ones, where such issues are usually not published as a series of scientific papers). Like almost all network-related considerations (except perhaps SDN), this one is quantity-dependent in nature.

In-DC traffic volume between racks

The larger the volume of data flowing between hardware devices, the bigger a problem it tends to be. This traffic (and also in-DC traffic between silos, if a larger DC is under consideration) is what measures the efficiency of the service system (Warren). It is a two-fold problem: first, the traffic generated by the clients; second, the traffic generated by Warren as a management system. Warren's goal is to reallocate resources to minimize in-DC traffic, and in rare cases doing so can destabilize the network flow for a short period of time. Management flow must always take precedence when client flow is causing problems, even if it decreases client throughput further, because its purpose is to restore the previous state, or at least to maximize efficiency with the currently limited resources.
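The precedence rule for management traffic amounts to strict-priority scheduling. A minimal sketch, assuming just two hypothetical traffic classes and FIFO order within each class; how Warren actually classifies and queues flows is not specified here:

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical traffic classes: a lower number means higher priority.
MANAGEMENT = 0
CLIENT = 1


@dataclass(order=True)
class Flow:
    priority: int
    seq: int  # arrival counter; keeps FIFO order within a class
    name: str = field(compare=False)


class StrictPriorityScheduler:
    """Always serves queued management flows before any client flow."""

    def __init__(self) -> None:
        self._heap: List[Flow] = []
        self._arrivals = itertools.count()

    def enqueue(self, name: str, traffic_class: int) -> None:
        heapq.heappush(self._heap, Flow(traffic_class, next(self._arrivals), name))

    def dequeue(self) -> Optional[str]:
        """Return the next flow name, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).name

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    sched = StrictPriorityScheduler()
    sched.enqueue("client-a", CLIENT)
    sched.enqueue("client-b", CLIENT)
    sched.enqueue("mgmt-migrate", MANAGEMENT)
    while len(sched):
        print(sched.dequeue())
```

Enqueuing `client-a`, `client-b` and then `mgmt-migrate` dequeues `mgmt-migrate` first, then the client flows in arrival order. Strict priority can starve client traffic, which is acceptable under the rule above only as long as management bursts stay short.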

Existing SDN solution

In general, all SDN systems are based on the same principles and are largely derived from two prevalent SDN frameworks. There are several protocols for network device configuration, among which OpenFlow is still the dominant one. Almost all needed routing protocols are also supported by all major SDN solutions.

Consequently, no drastic problems should arise at the connectivity level (which does not mean it is a trivial task!). However, there is an exception to this hypothetical balance: the security domain. Every SDN system implements one or more security domains, whether at the client level or system-wide. Configuring two or more SDN systems to cooperate simultaneously in that domain might be more time-consuming than migrating the whole system to a new one.


Considerations in the Storage Domain

The Warren storage domain offers three options:

...

To determine the right solution, one must consider several factors required to implement a particular storage type. As storage holds the most valuable asset, client data, the choice has the greatest impact on reliability and QoS. After all, a network outage only affects the availability of data, whereas storage problems may lead to permanent data loss.

Distributed - expensive, but reliable, multi-functional and scalable

The cost of distributed storage (which may also be shared-distributed) comes from the fact that it is usually (though not always; one exception is HCI) implemented as one or more separate clusters. There are therefore three main types of costs, plus an additional, optional one:

...

The reliability advantage compared to shared or local storage should be obvious.

Shared - cheaper, faster, half-baked reliability

If the infrastructure includes a direct-attached storage unit used as a shared storage solution, there is a high chance that the vendor has supplied software that operates the device. There may even be a distributed solution running in this unit, but keep in mind that this kind of storage is distributed only within the device itself. If the storage device fails, all of its data becomes unreachable: data protection works at the disk level.

To extend the protection domain to the rack level, several such storage units must be placed in one rack. Now, if the infrastructure contains more than two racks (which should be the normal case for a DC), why are these units not separated from the compute units to form an autonomous distributed storage cluster? One answer might be performance. As off-the-shelf storage units usually include "real RAID" controllers (with a dedicated CPU and cache), and the connection to compute units is direct (not over the network), their performance may be significantly higher than what distributed storage could offer.
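By contrast, a distributed cluster can spread the copies of each object across racks, so that protection extends beyond a single device. A minimal sketch of rack-aware replica placement, assuming a static node-to-rack map and ignoring capacity and load balancing (production systems such as Ceph's CRUSH use weighted pseudo-random placement instead); all node and rack names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List


def place_replicas(nodes: Dict[str, str], replicas: int) -> List[str]:
    """Pick `replicas` nodes, each in a different rack.

    `nodes` maps node name -> rack name. For determinism, racks are taken in
    sorted order and the first node (sorted by name) of each rack is chosen.
    """
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for node, rack in nodes.items():
        by_rack[rack].append(node)
    if replicas > len(by_rack):
        raise ValueError(f"need {replicas} racks, only {len(by_rack)} available")
    chosen_racks = sorted(by_rack)[:replicas]
    return [sorted(by_rack[rack])[0] for rack in chosen_racks]


def survives_rack_loss(placement: List[str], nodes: Dict[str, str], failed_rack: str) -> bool:
    """True if at least one replica lives outside the failed rack."""
    return any(nodes[n] != failed_rack for n in placement)


if __name__ == "__main__":
    cluster = {"n1": "rack-a", "n2": "rack-a", "n3": "rack-b", "n4": "rack-c"}
    placement = place_replicas(cluster, 3)
    print(placement)
    print(all(survives_rack_loss(placement, cluster, r) for r in set(cluster.values())))
```

A placement that keeps every copy in one rack (for example `["n1", "n2"]`, both in `rack-a`) fails the same check as soon as that rack is lost, which mirrors the device-level limitation of shared storage described above.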

Local - cheapest, but not necessarily fastest

Nowadays, the cost per TB of a single disk is very low compared to the same capacity implemented as an advanced storage device. However fast a single disk might be, it cannot compare with a direct-attached, performance-tuned shared storage system.

...