[Saga-devel] saga-projects SVN commit 880: /papers/clouds/
amerzky at cct.lsu.edu
Mon Jan 26 04:31:37 CST 2009
User: amerzky
Date: 2009/01/26 04:31 AM
Added:
/papers/clouds/
adaptor_arch.tex, application_setup.tex, cloud_setup.tex
Log:
snippets from Andre. Not spell checked etc, sorry. Hopefully
in time...
Please let me know where else I can best contribute. Best
possibly by mobile phone: +49 - 1 51 - 56 06 53 04
Cheers, Andre.
File Changes:
Directory: /papers/clouds/
==========================
File [added]: adaptor_arch.tex
Delta lines: +216 -0
===================================================================
--- papers/clouds/adaptor_arch.tex 2009-01-26 04:00:07 UTC (rev 879)
+++ papers/clouds/adaptor_arch.tex 2009-01-26 10:31:17 UTC (rev 880)
@@ -0,0 +1,216 @@
+
+\newcommand{\T}[1]{\texttt{#1}\xspace}
+\newcommand{\I}[1]{\textit{#1}\xspace}
+\newcommand{\B}[1]{\textbf{#1}\xspace}
+
+\newcommand{\ssh}{\texttt{ssh}\xspace}
+\newcommand{\scp}{\texttt{scp}\xspace}
+\newcommand{\sshfs}{\texttt{sshfs}\xspace}
+
+ % this section describes how the adaptors used for the experiments
+ % have been implemented.  It assumes that the adaptor based
+ % architecture of SAGA has already been explained (briefly) before.
+
+ The adaptor implementations for the presented Cloud-Grid
+ interoperability experiments are rather straightforward.  This
+ section describes the various sets of adaptors that have been used
+ in these experiments.
+
+
+ \subsection{Local Adaptors}
+
+ Although SAGA's default local adaptors have little to do with the
+ presented experiments as such, their importance for the
+ implementation of the various remote adaptors used here will become
+ clear later on.
+
+ The local job adaptor utilizes \T{boost::process} (on Windows) and
+ plain \T{fork/exec} (on Unix derivatives) to spawn, control and
+ watch local job instances.  The local file adaptor uses the
+ \T{boost::filesystem} classes for filesystem navigation, and
+ \T{std::fstream} for local file I/O. % 'nuf said?
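+
+ For illustration, the following minimal sketch exercises only the
+ local adaptors (the \T{fork://localhost} and \T{file://localhost}
+ URLs used to select them are an assumption of this sketch; an empty
+ URL typically selects the local default as well):
+
+\begin{verbatim}
+ #include <vector>
+ #include <saga/saga.hpp>
+
+ int main ()
+ {
+   // job submission is handled by the local job adaptor (fork/exec)
+   saga::job::service js ("fork://localhost/");
+   saga::job::job     j  = js.run_job ("/bin/date");
+   j.wait ();
+
+   // directory listing is handled by the local file adaptor
+   saga::filesystem::directory d ("file://localhost/tmp/");
+   std::vector <saga::url> ls = d.list ();
+
+   return 0;
+ }
+\end{verbatim}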
+
+
+ \subsection{SSH adaptors}
+
+ The SSH adaptors are based on three different command line tools,
+ namely \ssh, \scp and \sshfs.  Further, all ssh adaptors rely on
+ the availability of ssh security credentials for remote operations.
+ The ssh context adaptor implements mechanisms to (a) discover
+ available keypairs automatically, and (b) verify the validity and
+ usability of the discovered or otherwise specified credentials.
+
+ \ssh is used to spawn remote job instances.  For that, the ssh job
+ adaptor instantiates a \I{local} \T{saga::job::service} instance,
+ and submits the respective ssh command lines to it.  The local job
+ adaptor described above then takes care of process I/O, detachment,
+ etc.
+
+ A significant drawback of that approach is that several SAGA methods
+ act upon the local ssh process instead of the remote application
+ instance, which is clearly not wanted.  Some of these operations can
+ be delegated to the remote host via separate ssh calls, but that
+ process is complicated by the fact that ssh does not report the
+ remote process ID back to the local job adaptor.  We circumvent that
+ problem by setting a uniquely identifying environment variable for
+ the remote process, which allows us to identify that
+ process\footnote{That scheme is not completely implemented yet.}.
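+
+ A sketch of that identification scheme follows; the variable name
+ and helper function are illustrative, not the adaptor's actual
+ implementation:
+
+\begin{verbatim}
+ #include <string>
+ #include <sstream>
+
+ // Build the ssh command line for a remote job, tagging the remote
+ // process with a unique environment variable so that it can later
+ // be identified again (e.g. via 'ps' on the remote host).
+ std::string build_ssh_command (std::string const & host,
+                                std::string const & executable,
+                                std::string const & job_uuid)
+ {
+   std::stringstream ss;
+   ss << "ssh " << host
+      << " env SAGA_JOB_ID=" << job_uuid
+      << " "   << executable;
+   return ss.str ();  // submitted to the *local* job adaptor
+ }
+\end{verbatim}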
+
+ \sshfs is used to access remote files via ssh services.  \sshfs is a
+ user space file system driver which uses FUSE~\cite{fuse}, and is
+ available for MacOS, Linux, and some other Unix derivatives.  It
+ allows one to mount a remote file system into the local namespace,
+ and transparently forwards all file access operations via ssh to the
+ remote host.  The ssh file adaptor uses the local job adaptor to
+ call the sshfs process and mount the remote filesystem, and then
+ forwards all file access requests to the local file adaptor, which
+ operates on the locally mounted file system.  The ssh adaptor
+ thereby translates URLs from the ssh namespace into the local
+ namespace, and back.
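+
+ That URL translation can be pictured as follows (the mount point
+ layout and helper function are illustrative):
+
+\begin{verbatim}
+ #include <string>
+
+ // Translate an ssh URL into a path below the local sshfs mount
+ // point, e.g.   ssh://remote.host.net/data/1GB.txt
+ //            -> /tmp/saga_sshfs/remote.host.net/data/1GB.txt
+ std::string ssh_to_local (std::string const & ssh_url,
+                           std::string const & mnt = "/tmp/saga_sshfs/")
+ {
+   std::string prefix = "ssh://";
+   std::string rest   = ssh_url.substr (prefix.size ());
+   return mnt + rest;   // the host name stays part of the local path
+ }
+\end{verbatim}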
+
+ \scp is used by both the ssh job and file adaptors to transfer
+ utility scripts to the remote host, e.g.\ to check the remote system
+ configuration, or to distribute ssh credentials.
+
+
+ \subsubsection{SSH/SSHFS credential management}
+
+ When starting a remote application via ssh, we assume valid SSH
+ credentials (i.e. private/public key pairs, gsi credentials, etc.)
+ to be available.  The type and location of these credentials
+ are specified by the local application, by using respective
+ \T{saga::context} instances.  In order to facilitate home-calling,
+ i.e. the ability of the remotely started application to use the
+ same ssh infrastructure to call back to the original host, e.g. by
+ spawning jobs in the opposite direction, or by accessing the
+ original host's file system via sshfs, we install the originally
+ used ssh credential in a temporary location on the remote host.
+ The remote application is informed about these credentials, and the
+ ssh context adaptor picks them up by default, so that home-calling
+ is available without the need for any application level intervention.
+ Also, a respective entry is added to the local \T{authorized\_keys}
+ file\footnote{SSH key distribution is optional, and disabled by
+ default.}.
+
+ For example, the following pseudo code would be possible:
+
+\begin{verbatim}
+ --- local application -------------------
+ saga::context c ("ssh", "$HOME/.ssh/my_ssh_key");
+ saga::session s (c);
+
+ saga::job::service js (s, "ssh://remote.host.net");
+ saga::job::job j = js.run_job ("saga-ls ssh://local.host.net/data/");
+ -----------------------------------------
+
+ --- remote application (saga-ls) --------
+ saga::context c ("ssh"); // pick up defaults
+ saga::session s (c);
+
+ saga::filesystem::directory d (argv[1]);
+ std::vector <saga::url> ls = d.list ();
+ ...
+ -----------------------------------------
+\end{verbatim}
+
+ The remote application would ultimately call \sshfs (see above) to
+ mount the original filesystem, and then use the local file adaptor
+ to access that mounted file system for I/O.  The complete key
+ management is transparent to the application.
+
+
+
+ \subsection{AWS adaptors}
+
+ SAGA's AWS\footnote{\B{A}mazon \B{W}eb \B{S}ervices} adaptor suite
+ interfaces to services which implement the cloud web service
+ interfaces as specified by Amazon~\cite{aws-devel-url}.  These
+ interfaces are not only used by Amazon to allow programmatic access
+ to their Cloud infrastructures, such as EC2 and S3, but are also
+ used by several other Cloud service providers, such as
+ Eucalyptus~\cite{euca} and Nimbus~\cite{nimbus}.  The AWS adaptors
+ are thus able to interface to a variety of Cloud infrastructures,
+ as long as they adhere to the AWS interfaces.
+
+ The AWS adaptors do not communicate directly with the remote
+ services, but instead rely on Amazon's set of Java based command
+ line tools.  Those are able to access the different infrastructures
+ when configured accordingly via specific environment variables.
+
+ The aws job adaptor uses the local job adaptor to manage the
+ invocation of the command line tools, e.g. to spawn new virtual
+ machine (VM) instances, to search for existing VM instances, etc.
+ Once a VM instance is found to be available and ready to accept
+ jobs, an ssh job service instance for that VM is created, which
+ henceforth takes care of all job management operations.  The aws
+ job adaptor is thus only responsible for VM discovery and management
+ -- the actual job creation and operations are performed by the ssh
+ job adaptor (which in turn utilizes the local job adaptor for its
+ operations).
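+
+ From the application perspective, targeting a Cloud resource thus
+ only requires an \T{ec2} style URL.  A sketch is shown below; the
+ \T{ec2} context type string is an assumption of this sketch, and the
+ URL syntax is simplified:
+
+\begin{verbatim}
+ saga::context c ("ec2");   // handled by the aws context adaptor
+ saga::session s (c);
+
+ // existing VM instance: the adaptor locates and reuses it
+ saga::job::service js_1 (s, "ec2://i-760c8c1f/");
+
+ // no instance id given: a VM instance is provisioned on the fly
+ saga::job::service js_2 (s, "ec2://");
+
+ saga::job::job j = js_1.run_job ("/bin/date");
+\end{verbatim}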
+
+ The security credentials to be used by the internal ssh job service
+ instance are derived from the security credentials used to create or
+ access the VM instance: upon VM instance creation, an aws keypair is
+ used to authenticate the user against her 'cloud account'.  That
+ keypair is automatically registered at the new VM instance to allow
+ for remote ssh access.  The aws context adaptor collects both
+ the public and private aws keys\footnote{The public key needs to be
+ collected from the remote instance.}, creates a respective ssh context,
+ and thus allows the ssh adaptors to perform job and file based SAGA
+ operations on the VM instance.
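+
+ Conceptually, that derivation resembles the following steps, which
+ the aws context adaptor performs internally (the attribute name
+ shown is illustrative):
+
+\begin{verbatim}
+ // aws keypair used for VM creation and access ...
+ saga::context ec2_ctx ("ec2");
+
+ // ... from which an ssh context pointing at the same key is
+ // derived, so that the ssh job and file adaptors can reach the VM.
+ saga::context ssh_ctx ("ssh");
+ ssh_ctx.set_attribute ("UserKey",
+                        ec2_ctx.get_attribute ("UserKey"));
+
+ saga::session s (ssh_ctx);
+\end{verbatim}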
+
+ Note that there is an important semantic difference between 'normal'
+ (e.g. grid based) and 'cloud' job services in SAGA: a normal job
+ service is assumed to have a lifetime which is completely
+ independent of the application which accesses that service.  For
+ example, a Gram gatekeeper has a lifetime of days and weeks, and
+ allows a large number of applications to utilize it.  An aws job
+ service, however, points to a potentially volatile resource, or even
+ to a non-existing resource -- the resource then needs to be created
+ on the fly.
+
+ That has two important implications.  For one, the startup time for
+ an aws job service is typically much larger than for other remote
+ job services, at least in the case where a VM is created on the fly:
+ the VM image needs to be deployed to some remote resource, the image
+ must be booted, and potentially needs to be configured to enable the
+ hosting of custom applications\footnote{The aws job adaptor allows
+ the execution of custom startup scripts on newly instantiated VMs,
+ for example to install additional software packages, or to test for
+ the availability of certain resources.}.
+
+ The second implication is that the \I{end} of the job service
+ lifetime is usually of no consequence for normal remote job
+ services.  For a dynamically provisioned VM instance, however, it
+ raises the question of whether that instance should be shut down
+ right away, shut down automatically after all remote applications
+ have finished, survive for a specific time, or survive indefinitely.
+ By design, it is not possible to control these VM lifetime
+ attributes via the current SAGA API.  Instead, we allow one of these
+ policies to be chosen either implicitly (e.g. by using special URLs
+ to request dynamic provisioning), or explicitly via SAGA
+ configuration files or environment variables\footnote{Only some of
+ these policies are implemented at the moment.}.  Future SAGA
+ extensions, in particular Resource Discovery and Resource
+ Reservation extensions, may have a more direct and explicit notion
+ of resource lifetime management.
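+
+ For illustration, an explicit policy selection via an environment
+ variable could look as follows; the variable name and policy value
+ in this sketch are made up and do not reflect the implemented option
+ names:
+
+\begin{verbatim}
+ #include <cstdlib>
+ #include <saga/saga.hpp>
+
+ int main ()
+ {
+   // hypothetical policy switch: shut the VM down once the last
+   // job submitted through this service has finished
+   ::setenv ("SAGA_AWS_VM_SHUTDOWN_POLICY", "on_last_job", 1);
+
+   saga::session      s;
+   saga::job::service js (s, "ec2://");   // provision a VM on the fly
+   saga::job::job     j  = js.run_job ("/bin/date");
+   j.wait ();
+
+   return 0;
+ }
+\end{verbatim}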
+
+
+ \subsection{Globus Adaptors}
+
+ SAGA's Globus adaptor suite is amongst the most-utilized
+ adaptors.  As with ssh, security credentials are expected to be
+ managed out-of-band, but different credentials can be utilized by
+ pointing \T{saga::context} instances to them as needed.  Unlike
+ the aws and ssh adaptors, the Globus adaptors do not rely on command
+ line tools, but rather link directly against the respective Globus
+ libraries: the Globus job adaptor is thus a GRAM client, the Globus
+ file adaptor a GridFTP client.
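+
+ As with the other remote adaptors, the backend selection is driven
+ by the URL scheme and the security context, so that the application
+ code itself stays unchanged.  In the sketch below, the \T{globus}
+ context type string is an assumption:
+
+\begin{verbatim}
+ saga::context c ("globus");   // picks up the default grid proxy
+ saga::session s (c);
+
+ // the same application code as for ssh:// -- only the URL changes
+ saga::job::service js (s, "gram://qb1.loni.org:2119/jobmanager-pbs");
+ saga::job::job     j  = js.run_job ("/bin/date");
+\end{verbatim}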
+
+ In the presented experiments, non-cloud jobs have been started
+ either by using gram or ssh.  In either case, file I/O has been
+ performed either via ssh, or via a shared Lustre filesystem -- the
+ gridftp functionality has thus not been tested in these
+ experiments\footnote{For a performance comparison between the Lustre
+ FS and GridFTP, see~\cite{micelis}.}.
+
File [added]: application_setup.tex
Delta lines: +97 -0
===================================================================
--- papers/clouds/application_setup.tex 2009-01-26 04:00:07 UTC (rev 879)
+++ papers/clouds/application_setup.tex 2009-01-26 10:31:17 UTC (rev 880)
@@ -0,0 +1,97 @@
+
+ The single most prominent feature of our SAGA based MapReduce
+ implementation is the ability to run the application without code
+ changes on a wide range of infrastructures, such as clusters, Grids,
+ Clouds, and in fact any other local or distributed compute system
+ which can be accessed by the respective set of SAGA adaptors.  When
+ deploying compute clients on a \I{diverse} set of remote nodes, the
+ question arises if and how these clients need to be configured to
+ function properly in the overall application scheme.
+
+ Our MapReduce compute clients (aka 'workers') require two
+ pieces of information to function: (a) the contact address of the
+ advert service used for coordinating the clients, and for
+ distributing work items to them; and (b) a unique worker ID to
+ register with in that advert service, so that the master can start to
+ assign work items.  Both pieces of information are provided to the
+ worker via command line parameters at startup time.
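+
+ A worker invocation and its startup code thus look roughly as
+ follows; this is a sketch, and the actual option handling and advert
+ layout of our implementation may differ in detail:
+
+\begin{verbatim}
+ // invoked as:  mapreduce_worker <advert-url> <worker-id>
+ #include <string>
+ #include <saga/saga.hpp>
+
+ int main (int argc, char* argv[])
+ {
+   std::string advert_url = argv[1];  // coordination database
+   std::string worker_id  = argv[2];  // ID assigned by the master
+
+   // register in the coordination database, so that the master can
+   // start to assign work items to this worker
+   saga::advert::directory db (advert_url, saga::advert::ReadWrite);
+   saga::advert::entry     me = db.open (worker_id,
+                                         saga::advert::Create |
+                                         saga::advert::ReadWrite);
+   me.set_attribute ("state", "idle");
+
+   // main work loop: wait for assigned chunks, map/reduce, report back
+   return 0;
+ }
+\end{verbatim}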
+
+ The master application requires a number of additional pieces of
+ information: the set of systems on which the workers are supposed to
+ run, the location of the input data, the location of the output
+ data, and also the contact point of the advert service used for
+ coordination and communication.
+
+ A typical configuration file looks like this (slightly shortened for
+ presentation):
+
+\begin{verbatim}
+ <?xml version="1.0" encoding="..."?>
+ <MRDL version="1.0" xmlns="..." xmlns:xsi="..."
+
+ <MapReduceSession name="WordCount" ...>
+
+ <OrchestratorDB>
+ <Host> advert://fortytwo.cct.lsu.edu/ </Host>
+ </OrchestratorDB>
+
+ <TargetHosts>
+ <Host OS="globus" ...> gram://qb1.loni.org:2119/jobmanager-pbs </Host>
+ <Host OS="ec2" ...> ec2://i-760c8c1f/ </Host>
+ <Host OS="ec2" ...> ec2:// </Host>
+ </TargetHosts>
+
+ <ApplicationBinaries>
+ <BinaryImage arch="i386" OS="globus" ...> /lustre/merzky/saga/bin/mapreduce_worker </BinaryImage>
+ <BinaryImage arch="i386" OS="ec2" ...> /usr/local/saga/bin/mapreduce_worker </BinaryImage>
+ </ApplicationBinaries>
+
+ <OutputPrefix>any://qb3.loni.org/lustre/merzky/mapreduce/</OutputPrefix>
+
+ <ApplicationFiles>
+ <File> any://merzky@qb4.loni.org/lustre/merzky/mapreduce/1GB.txt </File>
+ </ApplicationFiles>
+
+ </MapReduceSession>
+
+ </MRDL>
+\end{verbatim}
+
+ In this example, we will create three worker instances: one is
+ started via gram and PBS on qb1.loni.org, one is started on a
+ pre-instantiated ec2 image (instance-id \T{i-760c8c1f}), and one will
+ be running on a dynamically deployed ec2 instance (no instance id
+ given).  Note that the startup times for the individual workers may
+ vary over several orders of magnitude, depending on the PBS queue
+ waiting time and VM startup time.  The mapreduce master will start to
+ utilize workers as soon as they are able to register themselves, and
+ will not wait until all workers are available.  That mechanism both
+ minimizes time-to-solution, and increases resilience against worker
+ loss.
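+
+ On the master side, that early-binding behaviour amounts to
+ repeatedly scanning the advert directory for newly registered
+ workers, as sketched below (the real implementation differs in
+ detail):
+
+\begin{verbatim}
+ #include <vector>
+ #include <saga/saga.hpp>
+
+ // Return the workers which have registered so far.  The master
+ // assigns work items to these right away, instead of waiting for
+ // the full set of workers to become available.
+ std::vector <saga::url> known_workers (saga::advert::directory & db)
+ {
+   return db.list ();   // one advert entry per registered worker
+ }
+\end{verbatim}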
+
+ The example configuration file above also includes another important
+ feature, in the URL of the input data set, which is given as
+ \T{any://merzky@qb4.loni.org/lustre/merzky/mapreduce/1GB.txt}.  The
+ scheme \T{any} acts here as a placeholder for SAGA, so that the SAGA
+ engine can choose whatever adaptor fits the task best.  The master
+ would access the file via the default local file adaptor.  The Globus
+ clients may use either the GridFTP or ssh adaptor for remote file
+ access (but in our experimental setup would actually also succeed
+ with using the local file adaptor, as the lustre FS is mounted on the
+ cluster nodes), and the ec2 workers would use the ssh file adaptor
+ for remote access.  Thus, the use of the placeholder scheme frees us
+ from specifying and maintaining a concise list of remote data access
+ mechanisms per worker.  Also, it allows for additional resilience
+ against service errors and changing configurations, as it leaves it
+ up to the SAGA engine's adaptor selection mechanism to find a
+ suitable access mechanism at runtime -- as we have seen above, the
+ Globus nodes can utilize a variety of mechanisms for accessing the
+ data in question.
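+
+ In code, that placeholder usage is simply (a sketch; the input
+ partitioning detail is not part of the configuration shown above):
+
+\begin{verbatim}
+ // identical code runs on the master, the Globus workers and the
+ // ec2 workers -- the engine binds the URL to whichever file
+ // adaptor (local, GridFTP, ssh) is usable at runtime.
+ saga::filesystem::file f
+   ("any://merzky@qb4.loni.org/lustre/merzky/mapreduce/1GB.txt");
+
+ saga::off_t size = f.get_size ();  // used to partition the input
+\end{verbatim}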
+
+ % include as needed
+ A parameter not shown in the above configuration example controls the
+ number of workers created on each compute node.  By increasing that
+ number, compute and communication times can often be interleaved, and
+ the overall system utilization can increase.
+
+
File [added]: cloud_setup.tex
Delta lines: +54 -0
===================================================================
--- papers/clouds/cloud_setup.tex 2009-01-26 04:00:07 UTC (rev 879)
+++ papers/clouds/cloud_setup.tex 2009-01-26 10:31:17 UTC (rev 880)
@@ -0,0 +1,54 @@
+
+ In order to fully utilize cloud infrastructures for SAGA
+ applications, the VM instances need to fulfill a couple of
+ prerequisites: the SAGA libraries and their dependencies need to be
+ deployed, as do some external tools which are used by the SAGA
+ adaptors at runtime, such as ssh, scp, and sshfs.  The latter needs
+ the FUSE kernel module to function -- so if remote access to the
+ cloud compute node's file system is wanted, the respective kernel
+ module needs to be installed as well.
+
+ There are two basic options to achieve the above: either a
+ customized VM image which includes the respective software is used,
+ or the respective packages are installed after VM instantiation, on
+ the fly.  Hybrid approaches are of course possible as well.
+
+ We support the runtime configuration of VM instances by staging a
+ preparation script to the VM after its creation, and executing it
+ with root permissions.  In particular for apt-get based Linux
+ distributions, the post-instantiation software deployment is
+ actually fairly painless, but it naturally adds a significant amount
+ of time to the overall VM startup\footnote{The long VM startup times
+ encourage the use of SAGA's asynchronous operations.}.
+
+ For the presented experiments, we prepared custom VM images with all
+ prerequisites pre-installed.  We utilize the preparation script
+ solely for some fine tuning of parameters: for example, we are able
+ to deploy custom saga.ini files, or to ensure the finalization of
+ service startups before application deployment\footnote{For example,
+ when SAGA applications are started before the VM's random number
+ generator is initialized, our current uuid generator fails to
+ function properly -- the preparation script checks for the
+ availability of proper uuids, and delays the application deployment
+ as needed.}.
+
+ % as needed:
+ Eucalyptus and Nimbus VM images \amnote{please confirm for Nimbus}
+ are basically customized Xen hypervisor images, as are Amazon's VM
+ images.  Customized means in this context that the images are
+ accompanied by a set of metadata which tie them to specific kernel and
+ ramdisk images.  Also, the images contain specific configurations and
+ startup services which allow the VM to bootstrap cleanly in the
+ respective Cloud environment, e.g. to obtain the necessary user
+ credentials, and to perform the wanted firewall setup, etc.
+
+ As these systems all use Xen based images, a conversion of these
+ images for the different cloud systems should be straightforward.
+ The sparse documentation and the lack of automated tools, however,
+ make this a certain challenge, at least for the average end user.
+ In comparison, the derivation of customized images from existing
+ images is well documented and tool supported, as long as the target
+ image is to be used in the same Cloud system as the original one.
+
+ % add text about gumbo cloud / EPC setup here, if we need / want it
+