Virtualization

From GO Wiki
Latest revision as of 15:18, 24 March 2011

Overview

This document describes the rationale, high-level software choices, and possible problems of a move to a virtualized infrastructure.

Progress

Updates on tests and experimental implementations of a virtualized infrastructure at BBOP can be found here: Virtualization_progress

Rationale

In addition to the general advantages of virtualization in a computing infrastructure, there are potentially several advantages especially important for GO.

Consistency of Development Platform

As an example, AmiGO's environments (production, development, 3rd-party development) are very distinct--different operating systems, package management, and maintenance cycles. This can make agreement on packages and versions not only hard to discover, but also hard to maintain due to dependencies that may not be directly related to software development.

Using a shared VM image eliminates the above problems, giving a single unified development platform. In a less intensive form, a VM would allow there to be a common reference platform that development had to be tested against before being accepted.

In addition, especially with programmers at different locations, virtualization could help deal with hard-to-debug problems by literally sending a copy of the current machine to somebody who might understand the issues better.

GO Software Distribution

VM images are potentially a great way to distribute GO software. Either as part of a larger infrastructure or from the command line on an individual machine, getting a VM running on a machine greatly simplifies the installation process of complicated software packages (similar to a bootable ISO image).

Also, with the flexible size of a VM image (compared to around 5GB max for a DVD image), it would be possible to add live databases and large data files directly into the image for easy use. For researchers who might not have local software support to help in the setup and maintenance of complicated packages and dependencies, it could reduce the amount of time spent on non-productive activities.
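As a sketch of what that could look like for an end user: booting a pre-built image collapses a complicated install into one download and one command. The image name, memory size, and port forwarding below are invented for illustration (no such GO image is published), and KVM/QEMU is only one possible runtime; the command is printed rather than executed, since booting is interactive.

```shell
# Hypothetical: a user has downloaded a pre-built GO VM image
# ("go-software.qcow2" is an invented name). The boot command is
# echoed instead of run, because booting starts an interactive VM.
IMAGE="go-software.qcow2"

# 1 GB of RAM; forward host port 8080 to the guest's web server on port 80.
BOOT_CMD="qemu-system-x86_64 -m 1024 -hda $IMAGE -net nic -net user,hostfwd=tcp::8080-:80"

echo "$BOOT_CMD"
```

Compared to installing and configuring each package by hand, the entire setup is one download plus that one command.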

Speed to Production

A bonus to having this as a software distribution method is that it could be used internally to speed the release cycle of server-based GO software (e.g. AmiGO, Solr and data files). A small set of changes could convert a development image into a production image, which could then be immediately deployed. If a new production image was problematic, it could be immediately switched with an older one, causing minimal disruption to users.
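One hedged sketch of how such an image-based release cycle might work, using qemu-img copy-on-write overlays; the file names are hypothetical and this is only one possible mechanism, not a description of our current setup. "Production" is a thin overlay on the tested development image, so rollback means discarding the overlay.

```shell
# Sketch: promote a development image to "production" as a thin
# copy-on-write overlay; rolling back is just discarding the overlay.
# All file names are hypothetical; requires qemu-img to be installed.
if command -v qemu-img >/dev/null 2>&1; then
    # Stand-in for the tested development image (created empty here).
    qemu-img create -f qcow2 dev.qcow2 1G

    # Production is an overlay backed by the dev image; the base image
    # is never written to, so the known-good state stays intact.
    qemu-img create -f qcow2 -b dev.qcow2 -F qcow2 prod.qcow2

    # Rollback after a problematic release: drop the overlay and
    # re-create it from the known-good base.
    rm -f prod.qcow2
    qemu-img create -f qcow2 -b dev.qcow2 -F qcow2 prod.qcow2

    rm -f dev.qcow2 prod.qcow2   # clean up the demo files
else
    echo "qemu-img not installed; commands shown for illustration only"
fi
```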

Redundancy and Recovery

Given that no infrastructure is bullet-proof, the following capabilities are worth having:

  • Ability to take a running image and move it to a different machine or facility with minimal downtime is a great feature.
  • Ability to roll back great swaths of infrastructure to a known working point in the past can help to deal with unexpected problems in a way that does not affect end users.
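For concreteness, with a libvirt-based stack both capabilities are single commands. The domain name and destination host below are invented examples, and the block only runs the commands if such a domain actually exists locally.

```shell
# Sketch: live migration and snapshot rollback with libvirt's virsh.
# "go-prod" and the destination URI are hypothetical; nothing is run
# unless a local libvirt domain by that name actually exists.
DOMAIN="go-prod"
DEST="qemu+ssh://backup-host/system"

if command -v virsh >/dev/null 2>&1 && virsh dominfo "$DOMAIN" >/dev/null 2>&1; then
    # Record a known working point before making changes.
    virsh snapshot-create-as "$DOMAIN" known-good

    # Move the running VM to another machine with minimal downtime.
    virsh migrate --live "$DOMAIN" "$DEST"

    # If something breaks later, roll back to the recorded state:
    # virsh snapshot-revert "$DOMAIN" known-good
else
    echo "no local libvirt domain '$DOMAIN'; commands shown for illustration"
fi
```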

Cloud Computing (buzzword!)

Although we do not currently have any (non-experimental) plans to implement any infrastructure on the cloud (permanently or on-demand), it seems like a good option to keep open; for example, it could be useful as an emergency backup or to create regional mirrors of data.

Platforms

In addition to the software mentioned below (what we currently believe to be the best solution for what we want), we have tried: UEC/Eucalyptus, Open Nebula, XenServer, CentOS, Debian, and openSUSE.

Below is a brief discussion of the current experimental list and some explanation.

Ubuntu LTS

The preferred platform is Ubuntu 10.04 (two paragraphs of reasons cut out here). In particular, Ubuntu will have integration and support for the virtualization infrastructure we currently prefer and has been active in making virtualization work well out of the box with the system.

KVM

KVM seems to be better supported by the Linux kernel and is the preferred method of virtualization for Ubuntu. In addition, Xen can be hard to maintain (especially when dealing with kernel changes), we've had some filesystem problems with it in the past, and Citrix has been unresponsive to renewal issues.

UEC/?

While we attempted to implement infrastructure using UEC/Eucalyptus and Amazon services, there was a fundamental mismatch between our choice of KVM and Amazon/Eucalyptus. Since we have good reasons for the former, the latter had to go. Fortunately, with the next release of Ubuntu, a more flexible and open stack will be available: OpenStack.

Again, it should supply both high-level VM and storage interfaces and support a fairly broad range of underlying technologies. It should also be possible to integrate with remote infrastructure.

OpenStack API (future)

UEC (or a similar Ubuntu package) should hopefully integrate well with OpenStack's "cloud" APIs, allowing some blurring between our private infrastructure and anybody who implements their API.
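The practical upshot of a shared API is that the same client tooling works against any conforming endpoint. A hedged sketch with EC2-style tools (euca2ools), where the endpoint URL and image ID are invented; the commands only run if the tools and credentials are actually present.

```shell
# Sketch: with an EC2-compatible API, switching between a private
# cloud and a public one is largely a matter of changing the endpoint.
# The URL and image ID below are hypothetical.
export EC2_URL="http://private-cloud.example.org:8773/services/Cloud"

if command -v euca-describe-instances >/dev/null 2>&1 && [ -n "${EC2_ACCESS_KEY:-}" ]; then
    # Same commands work against any endpoint implementing the API.
    euca-describe-instances
    # euca-run-instances emi-12345678 -k go-keypair -t m1.small
else
    echo "euca2ools not configured; endpoint and commands shown for illustration"
fi
```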

Caveats

(The word "cons" will be saved for a discussion on Lisp.)

There are several points which need to be considered.

Initial infrastructure cost

While most newer machines support virtualization in the hardware (necessary to make virtualization worthwhile), it is not worthwhile on older hardware or, given KVM's current connection to x86 architectures, on non-x86 systems.
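Whether a given host has that hardware support can be checked from its CPU flags (Intel VT-x shows up as "vmx", AMD-V as "svm"); a small sketch:

```shell
# Check for hardware virtualization support on a Linux host.
# Intel's extensions appear as the "vmx" CPU flag, AMD's as "svm".
if grep -qE 'vmx|svm' /proc/cpuinfo 2>/dev/null; then
    echo "hardware virtualization supported"
else
    echo "no hardware virtualization flags found (or not a Linux host)"
fi
```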

In addition, moving large images around can be a slow and time-consuming process. Either patience or switching to faster networking hardware (where not already in place) would be necessary to get all of the benefits.

Both of these issues are at least partially addressed by the fact that new infrastructure rollout is necessary anyways as machines are replaced in the normal course of things. If a virtualization infrastructure is planned and rolled out over time, additional cost could be minimal.

Also, at least in the beginning, most of the virtualization will be done at single sites where networking will go fairly fast internally. While the full benefits aren't realized without fast networking between sites, a lot of the significant ones are.

Increased complexity

Even though virtualization gives better management and resource allocation at a high level, the fact is that there is at least one additional layer between software and hardware, and management and resource allocation at a low level becomes much more complicated, albeit largely hidden from the administrator.

This can make troubleshooting of some kinds of problems more difficult and increases the number of ways that things can go wrong (also connected to para-virtualization versus full-virtualization).

Some of this is inevitable when higher-level abstractions are created over old ones. The increased flexibility presented by virtualization will hopefully pay for the increased complexity and problems associated with it.

Inexperience

Learning is fun, even the hard way.

Related to the above, as with any new way of doing things, there is going to be a learning period where the solutions to new problems are slow and non-optimal (things go "sproing!" and nobody knows why). Hopefully, by using a well-supported software infrastructure with a large community around it, this will be kept to a minimum.

Monoculture

Put all your eggs in the one basket and--WATCH THAT BASKET.
— Pudd'nhead Wilson's Calendar

As an example, AmiGO is currently developed on one platform, put into production on two others, and has software developed for it on a fairly wide variety. Just the act of having AmiGO function in all of these different places helps maintain a clean architecture and good coding practices (e.g. bugs that are not apparent on one platform cause crashes on another). To some extent, it becomes a tradeoff between robustness and time to develop and get into production. Given resource limitations, favoring the latter is probably the best at this time.

On a more paranoid angle, monoculture also increases the risk of a bug or security hole on one instance being trivially exploitable on all instances. I'm unaware of any specific attack against GO software (only general attacks against, say, LBL), but the potential is there.

OpenStack

It seems possible that eventually the Eucalyptus/AWS architecture will be replaced by OpenStack.

  • More open
  • Not dependent on a commercial software provider (a la Citrix)
    • Less lock-in
    • Less for-pay feature-itis

Ubuntu has started to include this in their repositories, but there is currently no information about when/if this will become the preferred architecture.

This possible future change is likely not a large problem anyway, as we will still gain necessary experience and should be able to port most of our VM infrastructure over intact.