...
- Architectural decisions:
- Which external libraries, components, and standards should we use to meet the requirements of the majority of DCs in the target group?
- How should we design the functionality of component systems so that we provide the value we claim to offer, without degrading the quality of the services and processes that existed before Warren adoption?
- Marketing content and business value:
- Can we actually offer the functionality we are claiming to offer?
- Can we offer the functionality at a sufficient level of reliability in a particular service domain of a DC?
- Are we doing it in a sensible way, i.e. is the development effort proportionate to the actual value the result provides?
- Do all the features and functionality we provide, or plan to provide, correspond to actual requirements?
Conflicting nature of service requirements
To enhance the analysis result and make it directly usable as an input to the development process, let's partition the hypothetical DC stack (hardware, firmware, software) into functional domains that share properties with corresponding Warren components. Two of these functional domains that exist before Warren adoption are more influential than the others, both for future development and for the adoption process: Network and Storage. They are also tightly coupled, as decisions in one domain heavily depend on the properties of the other. The connection between these two domains is best expressed in the decision-making process as two fundamental trade-offs: availability vs locality, and software- vs hardware-defined domains of control.
Availability vs Locality
This is the biggest trade-off in multi-site computing (distributed cache and storage are thus a blessing and a curse at the same time).
- Availability in this context denotes:
- Spatial - data or service is concurrently available to recipients/consumers in multiple locations rather than just one (e.g. many Virtual Machines using the same database that resides in distributed storage)
- Temporal continuity - data or service is kept available even in case of soft- or hardware failures ("High Availability")
Both of these aspects may seem very desirable, especially in cloud computing, but the downside is delivery speed in various forms. For example, distributed storage without high-end hardware may not offer sufficiently low latency for storage-sensitive applications. Also, keeping application availability high involves software redundancy and several levels of hardware redundancy, which means buying additional devices and keeping them constantly running.
- Locality denotes the physical distance of some functional domain from the compute resource (local storage vs distributed storage)
While High Availability metrics are achieved by involving distributed and redundant resources, locality is not free from redundancy cost either; however, that redundancy is usually at the sub-server level, so it is definitely less expensive. Local storage also has much lower latency, but its total capacity is very limited, and, because data on this storage is not available to outside devices without additional control services, it introduces a need for further data duplication beyond what is already required for High Availability.
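The capacity side of this trade-off can be illustrated with simple arithmetic. The sketch below compares the usable capacity left after redundancy overhead in a distributed setup versus a local one; the 3x replication factor and RAID-1 mirroring are assumptions chosen for illustration, not figures from any particular system.

```python
# Illustration (with assumed figures): redundancy overhead of
# cluster-level replication vs. sub-server-level (local RAID) redundancy.

def usable_capacity_tb(raw_tb: float, redundancy_factor: float) -> float:
    """Usable capacity is raw capacity divided by the redundancy factor."""
    return raw_tb / redundancy_factor

raw = 100.0  # TB of raw disk in either setup (assumed)

# Distributed storage with 3-way replication for High Availability:
distributed = usable_capacity_tb(raw, 3.0)

# Local storage with RAID-1 mirroring (sub-server redundancy):
local = usable_capacity_tb(raw, 2.0)

print(f"distributed (3x replication): {distributed:.1f} TB usable")
print(f"local (RAID-1 mirror):        {local:.1f} TB usable")
```

The local setup keeps more usable capacity per raw TB, but, as noted above, its data is invisible to other devices, so any sharing or disaster-recovery need adds duplication on top.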
Software- vs Hardware-defined domains of control
...
This can mostly be described as:
...
The general tendency is towards the concept of a "software-defined DC", largely because of the automation and management benefits it offers. The exception to this tendency is the popularity of bare-metal provisioning, which can be explained by the still-existing demand for direct control over hardware (required by some types of applications), independence from general software-system failures, and raw speed.
The responsibility borderline between administration and support
The system administrator's role depends on the size of the company, the nature and complexity of the system itself, regional peculiarities, the job description, and many other factors. However, there is usually a strict distinction between the system administrator and system support personnel roles, and the latter, in turn, differs from third-party support. Where that borderline is drawn varies, and that is the reason it should be discussed.
This is one of the topics that must be addressed thoroughly before a final settlement for Warren adoption. It is a mistake to assume that supportability is a second-grade matter that can be addressed after a production system is up and running. As a matter of fact, it is such an important part of service provisioning that it deserves its own chapter in the development documentation. Complex software systems must be developed with efficient observability and support in mind.
Considerations in the network domain
...
The cost of distributed storage (which may also be shared-distributed) comes from the fact that it is usually (though not always; hyperconverged infrastructure, HCI, is one exception) implemented as one or more separate clusters. So there are three main types of cost and an additional, optional one:
Upfront cost - the devices themselves, including a dedicated network for storage (fixed cost)
Repair/management costs + cost of space (fixed over a long period of time)
Energy cost (roughly fixed over a long period of time, with seasonal spikes)
Optional license cost, when a commercial distributed storage system is used
When summed up and amortized into monthly payments over a period equal to the service life of the server units, the energy cost is by far the highest. So although it may seem wiser to reuse old server hardware for storage clusters, it is actually not. It is much wiser to buy new, specially configured, low-power-consumption hardware, which may even come with a distributed storage system pre-installed and configured. Such specially configured devices offer another benefit: fast cluster extendability.
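The amortization above can be sketched as follows. All figures here are made-up assumptions for illustration; they are chosen so that, consistent with the argument above, the amortized energy cost dominates the monthly total.

```python
# Sketch of the cost amortization described above (all figures assumed).
SERVICE_LIFE_MONTHS = 5 * 12   # assume a 5-year service life for the cluster

upfront = 30_000.0             # devices + dedicated storage network (one-off)
repair_and_space = 150.0       # per month: repairs, management, rack space
energy = 800.0                 # per month: power and cooling
license_fee = 0.0              # optional commercial storage system license

monthly_costs = {
    "upfront (amortized)": upfront / SERVICE_LIFE_MONTHS,
    "repair/space": repair_and_space,
    "energy": energy,
    "license": license_fee,
}
monthly_total = sum(monthly_costs.values())

for name, cost in monthly_costs.items():
    print(f"{name}: {cost:.2f}")
print(f"total per month: {monthly_total:.2f}")
```

Old, power-hungry hardware pushes the dominant `energy` line up every month of the service life, which is why new low-power hardware can win despite the higher upfront cost.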
A typical distributed storage solution implements both object and block storage, which makes it possible (when they are implemented as separate clusters) to use the object storage as a base for backup or disaster-recovery implementations, in addition to its main purpose.
The reliability advantage compared to shared or local storage should be obvious.
Shared - cheaper, faster, half-baked reliability
...
Nowadays, the cost per TB of a single disk is very low compared to the same capacity implemented in the form of an advanced storage device. However fast a single disk might be, it cannot compete with a direct-attached, performance-tuned shared storage system.
Arguments that local storage is less expensive than the two networked options above are not entirely correct if you value the data on those disks. To be prepared for hardware failures, one has to back up the data constantly, and this is pointless unless it is done outside the machine/cluster. This does not mean that local disks installed in servers cannot be used: many workloads need caching or swap space, and local storage is a perfect fit for such needs.
Warren components placement in DC
...