[Saga-devel] saga-projects SVN commit 893: /papers/clouds/
sjha at cct.lsu.edu
Wed Jan 28 00:51:50 CST 2009
User: sjha
Date: 2009/01/28 12:51 AM
Modified:
/papers/clouds/
saga_cloud_interop.tex
Log:
further refinements and redundancy removed
File Changes:
Directory: /papers/clouds/
==========================
File [modified]: saga_cloud_interop.tex
Delta lines: +94 -106
===================================================================
--- papers/clouds/saga_cloud_interop.tex 2009-01-28 06:19:30 UTC (rev 892)
+++ papers/clouds/saga_cloud_interop.tex 2009-01-28 06:51:48 UTC (rev 893)
@@ -971,37 +971,34 @@
against service errors and changing configurations, as it leaves it up
to the SAGA engine's adaptor selection mechanism to find a suitable
access mechanism at runtime.
-
+%A parameter not shown in the above configuration example
+A simple parameter controls the number of workers created on each
+compute node. By varying it, the chances are good that compute and
+communication times can be interleaved, and that the overall system
+utilization increases (especially in the absence of precise
+knowledge of the execution system).
+
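+For illustration, such a setting could look as follows in an
+ini-style configuration file; the section and key names here are
+hypothetical, not the exact ones used by our implementation:
+\begin{verbatim}
+  [mapreduce]
+    # two workers per compute node, so that one worker
+    # can compute while another communicates
+    workers_per_node = 2
+\end{verbatim}
+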
% As we have seen above, the globus nodes
% can utilize a variety of mechanisms for accessing the data in
% question.
-
% include as needed
-A parameter not shown in the above configuration example controls the
-number of workers created on each compute node. By increasing that
-number, the chances are good that copute and communication times can
-be interleaved, and that the overall system utilization can increase.
-
\section{SAGA-MapReduce on Clouds and Grids}
-... Thanks to the low overhead of developing adaptors, SAGA has been
-deployed on three Cloud Systems -- Amazon, Nimbus~\cite{nimbus} and
+% \subsection*{Infrastructure Used} We first describe the infrastructure
+% that we employ for the interoperabilty tests. \jhanote{Kate}
+% {\it Amazon EC2:}
+% {\it Eucalyptus, ECP:}
+% {\it Eucalyptus, GumboCloud:}
+% % And describe in a few sentences.
+
+Thanks to the low overhead of developing adaptors, SAGA has been
+deployed on three Cloud Systems -- Amazon,
Eucalyptus~\cite{eucalyptus} (we have a local installation of
-Eucalyptus, referred to as GumboCloud). In this paper, we focus on
-EC2 and Eucalyptus.
+Eucalyptus at LSU -- named GumboCloud) and Nimbus~\cite{nimbus}. In
+this paper, we focus on EC2 and Eucalyptus.
+\subsection{Deployment Details}
-\subsection*{Infrastructure Used} We first describe the infrastructure
-that we employ for the interoperabilty tests. \jhanote{Kate}
-
-{\it Amazon EC2:}
-
-{\it Eucalyptus, ECP:}
-
-{\it Eucalyptus, GumboCloud:}
-
-% And describe in a few sentences.
-
In order to fully utilize cloud infrastructures for SAGA applications,
the VM instances need to fulfill a couple of prerequisites: the SAGA
libraries and their dependencies need to be deployed, need some external
@@ -1019,110 +1016,102 @@
root permissions. In particular for apt-get based Linux distributions, the
post-instantiation software deployment is actually fairly painless,
but naturally adds a significant amount of time to the overall VM
-startup\footnote{The long VM startup times encourage the use of SAGA's
- asynchronous operations.}.
+startup (which encourages the use of asynchronous operations).
+% \footnote{The long VM startup times encourage the use of SAGA's
+% asynchronous operations.}.
+For the experiments in this paper, we prepared custom VM images with
+all prerequisites pre-installed. We utilize the preparation script
+solely for some fine tuning of parameters: for example, we are able to
+deploy custom saga.ini files, or ensure the finalization of service
+startups before application deployment\footnote{For example, when
+  SAGA applications are started before the VM's random number
+  generator is initialized, our current uuid generator fails to
+  function properly -- the preparation script checks for the
+  availability of proper uuids, and delays the application deployment
+  as needed.}.
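+A minimal sketch of such a readiness check (in C++ for illustration;
+reading /proc/sys/kernel/random/uuid is an assumption here, not
+necessarily the exact source our script inspects):
+\begin{verbatim}
+  #include <fstream>
+  #include <string>
+  #include <unistd.h>
+
+  // poll the kernel uuid source until it yields a
+  // usable (non-empty, non-nil) uuid
+  static bool uuid_ready (void)
+  {
+    std::ifstream f ("/proc/sys/kernel/random/uuid");
+    std::string uuid;
+    std::getline (f, uuid);
+    return ! uuid.empty () &&
+      uuid != "00000000-0000-0000-0000-000000000000";
+  }
+
+  int main ()
+  {
+    while ( ! uuid_ready () )
+      sleep (1);          // delay application deployment
+    return 0;             // safe to start the application
+  }
+\end{verbatim}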
-For the presented experiments, we prepared custom VM images with all
-prerequisites pre-installed. We utilize the preparation script solely
-for some fine tuning of parameters: for example, we are able to deploy
-custom saga.ini files, or ensure the finalization of service startups
-before application deployment\footnote{For example, when starting SAGA
- applications are started befor the VM's random generator is
- initialized, our current uuid generator fails to function properly
- -- the preperation script checks for the availability of proper
- uuids, and delays the application deployment as needed.}.
-
- % as needed:
Eucalyptus VM images are basically customized Xen hypervisor images,
as are EC2 VM images. Customized in this context means that the
images are accompanied by a set of metadata which tie them to specific
kernel and ramdisk images. Also, the images contain specific
configurations and startup services which allow the VM to bootstrap
cleanly in the respective Cloud environment, e.g. to obtain the
-enccessary user credentials, and to perform the wanted firewall setup
-etc.
-
-As these systems all use Xen based images, a conversion of these
+necessary user credentials, and to perform the desired firewall setup,
+etc. As these systems all use Xen-based images, a conversion of these
images for the different cloud systems is in principle
straightforward. Sparse documentation and a lack of automated
-tools however, amount to a certain challenge to that, at least to the
-average end user. Compared to that, the derivation of customized
-images frim existing images is well documented and tool supported, as
+tools, however, make it challenging, at least for the average end
+user. In contrast, the derivation of customized images from
+existing images is well documented and supported by tools -- as
long as the target image is to be used in the same Cloud system as the
original one.
- % add text about gumbo cloud / EPC setup here, if we need / want it
-
-
-
-
-\subsection{Deployment Details}
-
-We have also deployed \sagamapreduce to work on Cloud platforms. It
-is critical to mention that the \sagamapreduce code did not undergo
-any changes whatsoever. The change lies in the run-time system and
-deployment architecture. For example, when running \sagamapreduce on
-EC2, the master process resides on one VM, while workers reside on
-different VMs. Depending on the available adaptors, Master and Worker
-can either perform local I/O on a global/distributed file system, or
-remote I/O on a remote, non-shared file systems. In our current
-implementation, the VMs hosting the master and workers share the same
-ssh credentials and a shared file-system (using sshfs/FUSE).
+% We have also deployed \sagamapreduce to work on Cloud platforms. It
+% is critical to mention that the \sagamapreduce code did not undergo
+% any changes whatsoever.
+In executing \sagamapreduce on different Clouds, the changes lie only
+in the run-time system and deployment architecture. For example, when
+running \sagamapreduce on EC2, the master process resides on one VM,
+while workers reside on different VMs. Depending on the available
+adaptors, master and workers can either perform local I/O on a
+global/distributed file system, or remote I/O on remote, non-shared
+file systems. % In our current implementation, the VMs hosting the
+% master and workers share the same ssh credentials and a shared
+% file-system (using sshfs/FUSE).
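+This distinction is transparent to the application: only the URL
+handed to the SAGA file package changes, and the engine selects a
+matching adaptor at runtime. A minimal sketch (schemes, hosts and
+paths are illustrative only, not those of our actual deployment):
+\begin{verbatim}
+  #include <saga/saga.hpp>
+
+  int main ()
+  {
+    // local I/O on the shared (sshfs) file system
+    saga::filesystem::file in
+        (saga::url ("file://localhost/data/chunk_0"));
+
+    // replicate to a worker's non-shared file system;
+    // the url scheme selects the adaptor at runtime
+    in.copy (saga::url ("ssh://worker-vm/tmp/chunk_0"));
+
+    return 0;
+  }
+\end{verbatim}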
Application deployment and configuration (as discussed above) are also
-performed via that sshfs. \jhanote{Andre, Kate please add on the
- above..}
+performed via a shared sshfs mount. On EC2, we created a custom
+virtual machine (VM) image with pre-installed SAGA. For Eucalyptus, a
+bootstrapping script equips a standard VM instance with SAGA and
+SAGA's prerequisites (mainly boost). To us, a mixed approach seemed most
+favourable, where the bulk software installation is statically done
+via a custom VM image, but software configuration and application
+deployment are done dynamically during VM startup.
-On EC2, we created custom virtual machine (VM) image with
-pre-installed SAGA. For Eucalyptus, a boot strapping script equips a
-standard VM instance with SAGA, and SAGA's prerequisites (mainly
-boost). To us, a mixed approach seemed most favourable, where the
-bulk software installation is statically done via a custom VM image,
-but software configuration and application deployment are done
-dynamically during VM startup. \jhanote{more details on how we create
- VMs, how we launch jobs and transfer files to these backend}
+%\jhanote{Andre, Kate please add on the above..}
-\subsection{Demonstrating Cloud-Grid Interoperabilty}
+\subsection{Demonstrating Interoperability}
-There are several aspects to Cloud Interoperability. A simple form of
+There are several aspects to interoperability. A simple form of
interoperability -- more akin to interchangeability -- is that an
application can use any of the three Cloud systems without any
changes to the application: the application simply needs to
instantiate a different set of security credentials for the respective
-runtime environment, aka cloud. Interestingly, SAGA provides this
-level of interoperability quite trivially thanks to the adaptors. By
-almost trivial extension, SAGA also provides Grid-Cloud
-interoperability, as shown in Fig.~\ref{gramjob} and ~\ref{vmjob},
-where exactly the same interface and functional calls lead to job
-submission on Grids or on Clouds. Although syntactically identical,
-the semantics of the calls and back-end management are somewhat
-different. For example, for Grids, a \texttt{job\_service} instance
-represents a live job submission endpoint, whilst for Clouds it
-represents a VM instance created on the fly.
+runtime environment; we refer to this as Cloud-Cloud
+interoperability. By almost trivial extension, SAGA also provides
+Grid-Cloud interoperability, as shown in Figs.~\ref{gramjob}
+and~\ref{vmjob}, where exactly the same interface and functional calls
+lead to job submission on Grids or on Clouds. Although syntactically
+identical, the semantics of the calls and back-end management are
+somewhat different. As discussed, SAGA provides interoperability
+quite trivially thanks to the dynamic loading of adaptors.
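+A minimal sketch in the spirit of those figures (the executable path
+and endpoint URLs are illustrative only):
+\begin{verbatim}
+  #include <saga/saga.hpp>
+
+  int main ()
+  {
+    saga::job::description jd;
+    jd.set_attribute
+        (saga::job::attributes::description_executable,
+         "/tmp/mapreduce_worker");
+
+    // Grid back-end: a live GRAM submission endpoint ...
+    saga::job::service js (saga::url ("gram://qb1.loni.org/"));
+    // ... or Cloud back-end: a VM created on the fly
+    // saga::job::service js (saga::url ("ec2://"));
+
+    saga::job::job j = js.create_job (jd);
+    j.run  ();   // identical calls in both cases
+    j.wait ();
+
+    return 0;
+  }
+\end{verbatim}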
-\subsection{Experiments} In an earlier paper
-(Ref~\cite{saga_ccgrid09}), we had carried out the following tests, to
-demonstrate how \sagamapreduce utilizes different infrastructrure and
-control over task-data placement, and gain insight into performance on
-``vanilla'' Grids. Some specific tests we performed and used to
-understand performance of \sagamapreduce in distributed environments
-were:
-\begin{enumerate}
-\item We began by distributing \sagamapreduce workers (compute) and
- the data they operate on locally. We varied the number of workers
- vary from 1 to 10, and the data-set sizes varying from 1 to
- 10GB.
-\item We then understood the effect of performance when using a
- distributed FS, that is we had \sagamapreduce workers compute local
- (to master), but using a distributed FS (HDFS). We varied
- the underlying distributed-FS (used KFS in lieu of HDFS).
+% For example, for Grids, a \texttt{job\_service} instance
+% represents a live job submission endpoint, whilst for Clouds it
+% represents a VM instance created on the fly.
+
+% Some specific tests we performed and used to understand performance
+% of \sagamapreduce in distributed environments were:
+% \begin{enumerate}
+% \item We began by distributing \sagamapreduce workers (compute) and
+% the data they operate on locally. We varied the number of workers
+% vary from 1 to 10, and the data-set sizes varying from 1 to
+% 10GB.
+% \item We then understood the effect of performance when using a
+% distributed FS, that is we had \sagamapreduce workers compute local
+% (to master), but using a distributed FS (HDFS). We varied
+% the underlying distributed-FS (used KFS in lieu of HDFS).
+% \end{enumerate}
% \item Same as Exp. \#2, but using a different distributed FS
% (KFS); the number of workers varies from 1-10
% \item We then distributed the \sagamapreduce workers distributed compute (workers) and distributed file-system (KFS)
% \item Distributed compute (workers) but using local file-systems (using GridFTP for transfer)
-\end{enumerate}
-Mirroring the same strucuture, in this paper, we perform the following
-experiments:
+\subsection{Experiments} In an earlier paper
+(Ref.~\cite{saga_ccgrid09}), we performed tests to demonstrate how
+\sagamapreduce utilizes different infrastructure and provides control
+over task-data placement; this led to insight into performance on
+``vanilla'' Grids. Mirroring the same structure, in this paper, we
+perform the following experiments:
\begin{enumerate}
\item We take \sagamapreduce and compare its performance for the
following configurations when exclusively running in Clouds to the
@@ -1140,13 +1129,12 @@
(QB/TeraGrid) and Clouds (EC2, with one job per VM). We
compare the performance from the two hybrid (EC2-Grid,
EC2-Eucalyptus distribution) cases to the pure distributed case.
+\end{enumerate}
% \item Distributed compute (workers) but using GridFTP for
% transfer. This corresponds to the case where workers are able to
-% communicate directly with each other. \jhanote{I doubt we will get
-% to this scenario, hence if we can do the above three, that is more
-% than enough.}
-\end{enumerate}
-
+% communicate directly with each other. \jhanote{I doubt we will
+% get to this scenario, hence if we can do the above three, that
+% is more than enough.}
The primary aim of this work is to establish, via well-structured and
well-designed experiments, the fact that \sagamapreduce has been used
to demonstrate Cloud-Cloud interoperability and Cloud-Grid
@@ -1172,7 +1160,7 @@
\subsection{Results and Analysis}
-Our image size is ... \jhanote{fix and provide details}
+%Our image size is ... \jhanote{fix and provide details}
It takes SAGA about 45 seconds to instantiate a VM on Eucalyptus
\jhanote{Andre is this still true?} and about 100 seconds on average