[Saga-devel] saga-projects SVN commit 876: /papers/clouds/
sjha at cct.lsu.edu
Sun Jan 25 17:33:54 CST 2009
User: sjha
Date: 2009/01/25 05:33 PM
Modified:
/papers/clouds/
saga_cloud_interop.tex
Log:
added some text from andre
on cloud adapters
File Changes:
Directory: /papers/clouds/
==========================
File [modified]: saga_cloud_interop.tex
Delta lines: +233 -4
===================================================================
--- papers/clouds/saga_cloud_interop.tex 2009-01-25 21:47:27 UTC (rev 875)
+++ papers/clouds/saga_cloud_interop.tex 2009-01-25 23:33:53 UTC (rev 876)
@@ -75,6 +75,15 @@
\newcommand{\upp}{\vspace*{-0.5em}}
\newcommand{\up}{\vspace*{-0.25em}}
+\newcommand{\T}[1]{\texttt{#1}}
+\newcommand{\I}[1]{\textit{#1}}
+\newcommand{\B}[1]{\textbf{#1}}
+
+\newcommand{\ssh}{\texttt{ssh}}
+\newcommand{\scp}{\texttt{scp}}
+\newcommand{\sshfs}{\texttt{sshfs}}
+
+
\begin{document}
\maketitle
@@ -402,13 +411,202 @@
\subsection{Cloud Adaptors: Design and Implementation}
+ % this section describes how the adaptors used for the experiments
+ % have been implemented. It assumes that the adaptor based
+ % architecture of SAGA has (shortly) been explained before.
+  This section describes the various sets of adaptors used for the
+  presented Cloud-Grid interoperability experiments.  Their
+  implementation is rather straightforward.
+
+  \subsubsection{Local Adaptors}
+
+  Although SAGA's default local adaptors are not directly involved in
+  the presented experiments, their importance for the implementation
+  of the various remote adaptors will become clear below.
+
+  The local job adaptor utilizes \T{boost::process} (on Windows) and
+  plain \T{fork/exec} (on Unix derivatives) to spawn, control and
+  watch local job instances.  The local file adaptor uses the
+  \T{boost::filesystem} classes for filesystem navigation, and
+  \T{std::fstream} for local file I/O.
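+
+  As an illustration, the following sketch shows the basic
+  \T{fork/exec} pattern the local job adaptor builds upon on Unix
+  (simplified; the actual adaptor additionally manages I/O
+  redirection and job state):
+
+\begin{verbatim}
+  // minimal fork/exec sketch (Unix)
+  #include <unistd.h>
+
+  pid_t spawn (const char * exe, char * const argv[])
+  {
+    pid_t pid = ::fork ();
+
+    if ( pid == 0 )        // child: run the target executable
+    {
+      ::execv (exe, argv);
+      ::_exit (1);         // only reached if execv() fails
+    }
+
+    return pid;            // parent: watch and control the child
+  }
+\end{verbatim}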
+
+
+  \subsubsection{SSH Adaptors}
+
+  The SSH adaptors are based on three different command line tools,
+  namely \ssh{}, \scp{} and \sshfs{}.  Further, all ssh adaptors rely
+  on the availability of ssh security credentials for remote
+  operations.  The ssh context adaptor implements mechanisms to
+  (a) automatically discover available keypairs, and (b) verify the
+  validity and usability of the discovered or otherwise specified
+  credentials.
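+
+  For illustration, keypair discovery can be as simple as probing the
+  usual OpenSSH key locations -- a sketch, using the same
+  \T{boost::filesystem} classes as the local adaptors (the probed
+  file names are the OpenSSH defaults):
+
+\begin{verbatim}
+  #include <string>
+  #include <vector>
+  #include <boost/filesystem.hpp>
+
+  // return all default keypairs which exist in $HOME/.ssh
+  std::vector <std::string> find_keypairs (std::string home)
+  {
+    namespace fs = boost::filesystem;
+
+    const char * names[] = { "id_rsa", "id_dsa", "identity" };
+    std::vector <std::string> keys;
+
+    for ( int i = 0; i < 3; i++ )
+    {
+      fs::path key = fs::path (home) / ".ssh" / names[i];
+      fs::path pub = fs::path (home) / ".ssh" /
+                     (std::string (names[i]) + ".pub");
+
+      if ( fs::exists (key) && fs::exists (pub) )
+        keys.push_back (key.string ());
+    }
+
+    return keys;
+  }
+\end{verbatim}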
+
+  \ssh{} is used to spawn remote job instances.  For that, the ssh
+  job adaptor instantiates a \I{local} \T{saga::job::service}
+  instance, and submits the respective ssh command lines to it.  The
+  local job adaptor described above then takes care of process I/O,
+  detachment, etc.
+
+  A significant drawback of that approach is that several SAGA
+  methods act upon the local ssh process instead of the remote
+  application instance, which is clearly not intended.  Some of these
+  operations can be delegated to the remote host via separate ssh
+  calls, but that process is complicated by the fact that ssh does
+  not report the remote process ID back to the local job adaptor.  We
+  circumvent that problem by setting a uniquely identifying
+  environment variable for the remote process, which allows us to
+  identify that process\footnote{That scheme is not completely
+  implemented, yet.}.
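+
+  A sketch of the resulting submission to the local job service
+  follows; the variable name \T{SAGA\_JOB\_UID} and the \T{fork://}
+  URL of the local service are illustrative choices, not the
+  adaptor's actual names:
+
+\begin{verbatim}
+  // submit the ssh command line to the *local* job service;
+  // the environment marker identifies the remote process
+  saga::session s;
+  saga::job::service local_js (s, "fork://localhost");
+  saga::job::job j = local_js.run_job
+    ("ssh remote.host.net "
+     "env SAGA_JOB_UID=<uid> /usr/bin/my_app");
+
+  // a separate ssh call can later scan the remote process
+  // table (e.g. /proc/<pid>/environ on Linux) for that marker
+\end{verbatim}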
+
+  \sshfs{} is used to access remote files via ssh services.  \sshfs{}
+  is a user space file system driver based on FUSE~\cite{fuse}, and
+  is available for MacOS, Linux, and some other Unix derivatives.  It
+  allows one to mount a remote file system into the local namespace,
+  and transparently forwards all file access operations via ssh to
+  the remote host.  The ssh file adaptor uses the local job adaptor
+  to call the sshfs process and mount the remote filesystem, and then
+  forwards all file access requests to the local file adaptor, which
+  operates on the locally mounted file system.  The ssh adaptor
+  thereby translates URLs between the ssh namespace and the local
+  namespace.
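+
+  A sketch of the mount step and of the URL translation (the mount
+  point naming and, as before, the local service URL are
+  assumptions):
+
+\begin{verbatim}
+  // mount the remote filesystem via sshfs, spawned
+  // through the local job adaptor
+  saga::session s;
+  saga::job::service local_js (s, "fork://localhost");
+  local_js.run_job
+    ("sshfs remote.host.net:/ /tmp/saga/remote.host.net");
+
+  // URL translation performed by the ssh file adaptor:
+  //   ssh://remote.host.net/data/file.txt
+  //   -->  /tmp/saga/remote.host.net/data/file.txt
+\end{verbatim}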
+
+  \scp{} is used by both the ssh job and file adaptors to transfer
+  utility scripts to the remote host, e.g., to check the remote
+  system configuration, or to distribute ssh credentials.
+
+
+  \paragraph{SSH/SSHFS Credential Management}
+
+  When starting a remote application via ssh, we assume valid SSH
+  credentials (i.e., private/public key pairs, GSI credentials, etc.)
+  to be available.  The type and location of these credentials are
+  specified by the local application, using respective
+  \T{saga::context} instances.  In order to facilitate home-calling,
+  i.e., the ability of the remotely started application to use the
+  same ssh infrastructure to call back to the original host, e.g., by
+  spawning jobs in the opposite direction, or by accessing the
+  original host's file system via sshfs, we install the originally
+  used ssh credential in a temporary location on the remote host.
+  The remote application is informed about these credentials, and the
+  ssh context adaptor picks them up by default, so that home-calling
+  is available without the need for any application level
+  intervention.  Also, a respective entry is added to the local
+  \T{authorized\_keys} file\footnote{ssh key distribution is
+  optional, and disabled by default.}.
+
+  For example, the following pseudo code illustrates home-calling:
+
+ \begin{verbatim}
+ --- local application -------------------
+ saga::context c ("ssh", "$HOME/.ssh/my_ssh_key");
+ saga::session s (c);
+
+ saga::job::service js (s, "ssh://remote.host.net");
+ saga::job::job j = js.run_job ("saga-ls ssh://local.host.net/data/");
+ -----------------------------------------
+
+ --- remote application (saga-ls) --------
+ saga::context c ("ssh"); // pick up defaults
+ saga::session s (c);
+
+  saga::filesystem::directory d (argv[1]);
+ std::vector <saga::url> ls = d.list ();
+ ...
+ -----------------------------------------
+\end{verbatim}
+
+
+  The remote application would ultimately call \sshfs{} (see above)
+  to mount the original filesystem, and then use the local file
+  adaptor for I/O on that mounted file system.  The complete key
+  management is transparent.
+
+
+
+  \subsubsection{AWS Adaptors}
+
+  SAGA's AWS\footnote{\B{A}mazon \B{W}eb \B{S}ervices} adaptor suite
+  interfaces to services which implement the cloud web service
+  interfaces specified by Amazon~\cite{aws-devel-url}.  These
+  interfaces are not only used by Amazon to provide programmatic
+  access to their Cloud infrastructures EC2 and S3, amongst others,
+  but are also used by several other Cloud service providers, such as
+  Eucalyptus~\cite{euca} and Nimbus~\cite{nimbus}.  The AWS adaptors
+  are thus able to interface to a variety of Cloud infrastructures,
+  as long as they adhere to the AWS interfaces.
+
+  The AWS adaptors do not communicate directly with the remote
+  services, but instead rely on Amazon's set of Java based command
+  line tools.  Those are able to access the different
+  infrastructures, when configured correctly via specific environment
+  variables.
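+
+  For illustration, the sketch below invokes one of the command line
+  tools through the local job adaptor.  The environment variable
+  names are those used by Amazon's EC2 tools; the key paths and the
+  endpoint URL are placeholders for, e.g., a Eucalyptus or Nimbus
+  installation:
+
+\begin{verbatim}
+  std::vector <std::string> env;
+  env.push_back ("EC2_PRIVATE_KEY=/home/user/.ec2/pk.pem");
+  env.push_back ("EC2_CERT=/home/user/.ec2/cert.pem");
+  env.push_back ("EC2_URL=https://cloud.example.org:8443/");
+
+  saga::job::description jd;
+  jd.set_attribute        ("Executable",  "ec2-describe-instances");
+  jd.set_vector_attribute ("Environment", env);
+
+  saga::session s;
+  saga::job::service local_js (s, "fork://localhost");
+  saga::job::job     j = local_js.create_job (jd);
+  j.run ();
+\end{verbatim}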
+
+  The aws job adaptor uses the local job adaptor to manage the
+  invocation of the command line tools, e.g., to spawn new virtual
+  machine (VM) instances, to search for existing VM instances, etc.
+  Once a VM instance is found to be available and ready to accept
+  jobs, an ssh job service instance for that VM is created, which
+  henceforth takes care of all job management operations.  The aws
+  job adaptor is thus only responsible for VM discovery and
+  management -- the actual job creation and operations are performed
+  by the ssh job adaptor (which in turn utilizes the local job
+  adaptor for its operations).
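+
+  Schematically, a job submission through the aws job adaptor thus
+  proceeds as follows (a sketch; \T{find\_or\_create\_vm} stands in
+  for the adaptor-internal VM management):
+
+\begin{verbatim}
+  // 1) discover or create a VM instance,
+  //    via the AWS command line tools
+  std::string vm_host = find_or_create_vm ();
+
+  // 2) delegate all job operations to an
+  //    internal ssh job service for that VM
+  saga::session s;
+  saga::job::service ssh_js (s, "ssh://" + vm_host);
+  saga::job::job     j = ssh_js.run_job ("/tmp/my_prog");
+\end{verbatim}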
+
+  The security credentials used by the internal ssh job service
+  instance are derived from the security credentials used to create
+  or access the VM instance: upon VM instance creation, an AWS
+  keypair is used to authenticate the user against her `cloud
+  account'.  That keypair is automatically registered at the new VM
+  instance to allow for remote ssh access.  The aws context adaptor
+  collects both the public and private AWS keys\footnote{The public
+  key needs to be collected from the remote instance.}, creates a
+  respective ssh context, and thus allows the ssh adaptors to perform
+  job and file based SAGA operations on the VM instance.
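+
+  A sketch of that derivation (the attribute names follow the SAGA
+  context API; the key file locations are assumptions):
+
+\begin{verbatim}
+  // wrap the collected AWS keypair into an ssh context
+  saga::context ssh_ctx ("ssh");
+  ssh_ctx.set_attribute ("UserKey",  "/tmp/aws_keypair");
+  ssh_ctx.set_attribute ("UserCert", "/tmp/aws_keypair.pub");
+
+  // the ssh adaptors can now operate on the VM instance
+  saga::session s;
+  s.add_context (ssh_ctx);
+\end{verbatim}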
+
+  Note that there is an important semantic difference between
+  `normal' (e.g., grid based) and `cloud' job services in SAGA: a
+  normal job service is assumed to have a lifetime which is
+  completely independent of the application which accesses that
+  service.  For example, a Gram gatekeeper has a lifetime of days and
+  weeks, and allows a large number of applications to utilize it.  An
+  aws job service, however, points to a potentially volatile
+  resource, or even to a non-existing resource -- the resource then
+  needs to be created on the fly.
+
+  That has two important implications.  For one, the startup time for
+  an aws job service is typically much longer than for other remote
+  job services, at least in the case where a VM is created on the
+  fly: the VM image needs to be deployed to some remote resource, the
+  image must be booted, and it potentially needs to be configured to
+  enable the hosting of custom applications\footnote{The aws job
+  adaptor can execute custom startup scripts on newly instantiated
+  VMs, for example to install additional software packages, or to
+  test for the availability of certain resources.}.
+
+  The second implication is that the \I{end} of the job service
+  lifetime is usually of no consequence for normal remote job
+  services.  For a dynamically provisioned VM instance, however, it
+  raises the question whether that instance should be closed down, or
+  should automatically shut down after all remote applications
+  finish, or should survive for a specific time, or forever.
+  Ultimately, it is not possible to control these VM lifetime
+  attributes via the current SAGA API (by design).  Instead, we allow
+  one of these policies to be chosen either implicitly (e.g., by
+  using special URLs to request dynamic provisioning), or explicitly
+  via SAGA config files or environment variables\footnote{Only some
+  of these policies are implemented at the moment.}.  Future SAGA
+  extensions, in particular the Resource Discovery and Resource
+  Reservation extensions, may have a more direct and explicit notion
+  of resource lifetime management.
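+
+  Purely for illustration, such a policy selection could look as
+  follows; the URL query and the variable name are hypothetical
+  placeholders, not the adaptor's actual syntax:
+
+\begin{verbatim}
+  // implicit: a special URL requests on-the-fly provisioning
+  saga::job::service js (s, "aws://ec2/?provision=yes");
+
+  // explicit: a lifetime policy via the environment, e.g.
+  //   export SAGA_AWS_VM_SHUTDOWN=on_idle
+\end{verbatim}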
+
+
\begin{figure}[!ht]
-\upp
+\upp
\begin{center}
\begin{mycode}[label=SAGA Job Launch via GRAM gatekeeper]
- { // contact a GRAM gatekeeper
+ { // contact a GRAM gatekeeper
saga::job::service js;
saga::job::description jd;
jd.set_attribute ("Executable", "/tmp/my_prog");
@@ -425,6 +623,34 @@
\upp
\end{figure}
+
+
\begin{figure}[!ht]
\upp
\begin{center}
@@ -821,7 +1047,7 @@
\begin{tabular}{ccccc}
\hline
\multicolumn{2}{c}{Number-of-Workers} & data size & $T_c$ & $T_{spawn}$ \\
- TeraGrid & AWS & (GB) & (sec) & (sec) \\
+ TeraGrid & AWS & (MB) & (sec) & (sec) \\
\hline
6 & 0 & 10 & 153.5 & 103 \\
10 & 0 & 10 & 433.0 & 299 \\
@@ -834,7 +1060,10 @@
\hline \hline
\end{tabular}
\upp
-\caption{}
+\caption{Performance data for different configurations of worker
+  placements.  The master is always on a desktop, with the choice of
+  workers placed either on Clouds or on the TeraGrid (QueenBee).  The
+  configurations can be classified into three types -- all workers on
+  EC2, all workers on the TeraGrid, and workers divided between the
+  TeraGrid and EC2.  Every worker is assigned to a unique VM.  It is
+  interesting to note the significant spawning times, and their
+  dependence on the number of VMs.  \jhanote{Andre you'll have to
+  work with me to determine if I've parsed the data-files correctly}}
\label{stuff}
\upp
\upp