[Saga-devel] saga-projects SVN commit 863: /papers/clouds/
sjha at cct.lsu.edu
Sat Jan 24 06:48:45 CST 2009
User: sjha
Date: 2009/01/24 06:48 AM
Modified:
/papers/clouds/
saga_cloud_interop.tex
Log:
refined introduction
added notes (for further elaboration) in the last section
WIP
File Changes:
Directory: /papers/clouds/
==========================
File [modified]: saga_cloud_interop.tex
Delta lines: +60 -42
===================================================================
--- papers/clouds/saga_cloud_interop.tex 2009-01-22 16:27:29 UTC (rev 862)
+++ papers/clouds/saga_cloud_interop.tex 2009-01-24 12:48:40 UTC (rev 863)
@@ -48,7 +48,7 @@
% \title{SAGA-MapReduce: Providing Infrastructure Independence and
% Cloud-Grid Interoperability}
\title{Application Level Interoperability between Clouds and Grids}
-\author{Andre Merzky$^{1}$, Shantenu Jha$^{123}$, Kate Stamou$^{1}$\\
+\author{Andre Merzky$^{1}$, Kate Stamou$^{1}$, Shantenu Jha$^{123}$\\
\small{\emph{$^{1}$Center for Computation \& Technology, Louisiana
State University, USA}}\\
\small{\emph{$^{2}$Department of Computer Science, Louisiana State
@@ -80,38 +80,25 @@
\maketitle
\begin{abstract}
- The landscape of computing is getting Cloudy.
-
- There exist both technical reasons and social engineering problems
- responsible for low uptake of Grids. One universally accepted reason
- is the complexity of Grid systems -- the interface, software stack
- and underlying complexity of deploying distributed application.
-
+% The landscape of computing is getting Cloudy.
SAGA is a high-level programming interface which provides the
ability to create distributed applications in an infrastructure
- independent way.
-
- In an earlier paper, we discussed how we have developed MapReduce
- using SAGA, and how a SAGA-based MapReduce provided
- i) infrastructure independence and ii)
- could be used to utilize distributed infr
-
- In this paper, we show how MapReduce has been implemented using SAGA
- and demonstrate its interoperability across Clouds and Grids. We
- discuss how a range of {\it cloud adapters} have been developed for
- SAGA. We discuss the advantages of programmatically developing
- MapReduce using SAGA, by demonstrating that the SAGA-based
- implementation is infrastructure independent whilst still providing
- control over the deployment, distribution and run-time
- decomposition. .... The ability to control the distribution and
- placement of the computation units (workers) is critical in order to
- implement the ability to move computational work to the data. This
- is required to keep data network transfer low and in the case of
- commercial Clouds the monetary cost of computing the solution low...
- Using data-sets of size up to 10GB, and up to 10 workers, we provide
- detailed performance analysis of the SAGA-MapReduce implementation,
- and show how controlling the distribution of computation and the
- payload per worker helps enhance performance.
+ independent way. In an earlier paper, we discussed how we have
+ developed MapReduce using SAGA, and how a SAGA-based MapReduce
+ i) provided infrastructure independence and ii) could be used to
+ utilize distributed infrastructure. In this paper, we use a
+ SAGA-based implementation of MapReduce, and demonstrate its
+ interoperability across Clouds and Grids. We discuss how a range of
+ {\it cloud adapters} have been developed for SAGA. The major
+ contribution of this paper is the demonstration, possibly the
+ first ever, of interoperability between different Clouds and Grids,
+ without any changes to the application. Interestingly, it is not the
+ case that our application uses different Clouds and Grids at
+ different instances of time, but that \sagamapreduce uses multiple,
+ different, heterogeneous infrastructures concurrently and for the same
+ application. We do not focus on performance, but on the
+ proof-of-concept, and illustrate the importance and power of
+ application-level interoperability.
\end{abstract}
\section{Introduction}
@@ -123,6 +110,26 @@
processing of data, it can be argued that there is a greater premium
than ever before on abstractions at multiple levels.
+
+ There exist both technical reasons and social engineering problems
+ responsible for low uptake of Grids. One universally accepted reason
+ is the complexity of Grid systems -- the interface, software stack
+ and underlying complexity of deploying distributed applications.
+
+ We discuss the advantages of programmatically developing MapReduce
+ using SAGA, by demonstrating that the SAGA-based implementation is
+ infrastructure independent whilst still providing control over the
+ deployment, distribution and run-time decomposition. .... The
+ ability to control the distribution and placement of the computation
+ units (workers) is critical in order to implement the ability to
+ move computational work to the data. This is required to keep data
+ network transfer low and in the case of commercial Clouds the
+ monetary cost of computing the solution low... Using data-sets of
+ size up to 10GB, and up to 10 workers, we provide detailed
+ performance analysis of the SAGA-MapReduce implementation, and show
+ how controlling the distribution of computation and the payload per
+ worker helps enhance performance.
+
Although Clouds are a nascent infrastructure, with the
force-of-industry behind their development and uptake (and not just
the hype), their impact can not be ignored. Specifically, with the
@@ -182,6 +189,9 @@
and test for inter-operability between different flavours of Clouds as
well as between Clouds and Grids.
+
+
+
\section{SAGA}
SAGA~\cite{saga-core} is a high level API that provides a simple,
standard and uniform interface for the most commonly required
@@ -685,14 +695,6 @@
leading to an overall reduction in $T_c$. The advantages of a
greater number of workers are manifest for larger data-sets.}
\label{grids1}
-\upp
-\upp
-\upp
-\upp
-\upp
-\upp
-\upp
-\upp
\end{figure}
{\bf SAGA-MapReduce on Cloud-like infrastructure: } Accounting for the
@@ -809,11 +811,9 @@
\section*{Notes}
\subsubsection*{Why Interoperability:}
-
\begin{itemize}
\item Intellectual curiosity, what programming challenges does this
bring about?
-\item
\item Infrastructure independent programming
\item Here we discuss homogeneous workers, but workers (tasks) can be
 heterogeneous and thus may have greater data-compute affinity or
@@ -823,7 +823,26 @@
 explicitly (already discussed)
\end{itemize}
+\subsubsection*{Grid vs Cloud Interoperability}
+\begin{itemize}
+\item Clouds provide services at different levels (IaaS, PaaS, SaaS);
+ standard interfaces to these different levels do not
+ exist. An immediate consequence of this is the lack of interoperability
+ between today's Clouds; though there is little business motivation
+ for Cloud providers to define, implement and support new/standard
+ interfaces, there is a case to be made that applications would
+ benefit from multiple Cloud interoperability. Even better if
+ Cloud-Grid interoperability came about for free!
+
+\item How does interoperability in Grids differ from interoperability on
+ Clouds? Many details differ, but viewed from the level of application
+ interoperability, the differences are minor and inconsequential.
+\end{itemize}
+
+Mention Sawzall as a language that builds upon MapReduce; one could
+build Sawzall using SAGA.
+
\subsubsection*{Network, System Configuration and Experiment Details}
GumboGrid
@@ -833,7 +852,6 @@
All this is new technology, hence makes sense to try to list some of
the challenges we faced
-
Discuss affinity: Current Clouds compute-data affinity
Simplicity of Cloud interface: While certainly not true of all cases,