[Saga-devel] saga-projects SVN commit 863: /papers/clouds/

Sat Jan 24 06:48:45 CST 2009

User: sjha
Date: 2009/01/24 06:48 AM

Modified:
 /papers/clouds/
  saga_cloud_interop.tex

Log:
 refined introduction
    added notes (for further elaboration) in the last section
    WIP

File Changes:

Directory: /papers/clouds/
==========================

File [modified]: saga_cloud_interop.tex
Delta lines: +60 -42
===================================================================

--- papers/clouds/saga_cloud_interop.tex	2009-01-22 16:27:29 UTC (rev 862)
+++ papers/clouds/saga_cloud_interop.tex	2009-01-24 12:48:40 UTC (rev 863)
@@ -48,7 +48,7 @@
 % \title{SAGA-MapReduce: Providing Infrastructure Independence and
 %   Cloud-Grid Interoperability}
 \title{Application Level Interoperability between Clouds and Grids}
-\author{Andre Merzky$^{1}$, Shantenu Jha$^{123}$, Kate Stamou$^{1}$\\
+\author{Andre Merzky$^{1}$,  Kate Stamou$^{1}$, Shantenu Jha$^{123}$\\
   \small{\emph{$^{1}$Center for Computation \& Technology, Louisiana
       State University, USA}}\\
   \small{\emph{$^{2}$Department of Computer Science, Louisiana State
@@ -80,38 +80,25 @@
 \maketitle
 
 \begin{abstract}
-  The landscape of computing is getting Cloudy.
-  
-  There exist both technical reasons and social engineering problems
-  responsible for low uptake of Grids. One universally accepted reason
-  is the complexity of Grid systems -- the interface, software stack
-  and underlying complexity of deploying distributed application.
-
+%  The landscape of computing is getting Cloudy.
   SAGA is a high-level programming interface which provides the
   ability to create distributed applications in an infrastructure
-  independent way. 
-
-  In an earlier paper, we discussed how we have developed MapReduce
-  using SAGA, and how  a SAGA-based MapReduce provided 
-  i) infrastructure independence and ii) 
-  could be used to utilize distributed infr
-
-  In this paper, we show how MapReduce has been implemented using SAGA
-  and demonstrate its interoperability across Clouds and Grids.  We
-  discuss how a range of {\it cloud adapters} have been developed for
-  SAGA.  We discuss the advantages of programmatically developing
-  MapReduce using SAGA, by demonstrating that the SAGA-based
-  implementation is infrastructure independent whilst still providing
-  control over the deployment, distribution and run-time
-  decomposition.  .... The ability to control the distribution and
-  placement of the computation units (workers) is critical in order to
-  implement the ability to move computational work to the data. This
-  is required to keep data network transfer low and in the case of
-  commercial Clouds the monetary cost of computing the solution low...
-  Using data-sets of size up to 10GB, and up to 10 workers, we provide
-  detailed performance analysis of the SAGA-MapReduce implementation,
-  and show how controlling the distribution of computation and the
-  payload per worker helps enhance performance.
+  independent way.  In an earlier paper, we discussed how we have
+  developed MapReduce using SAGA, and how a SAGA-based MapReduce
+  i) provided infrastructure independence and ii) could be used to
+  utilize distributed infrastructure. In this paper, we use a
+  SAGA-based implementation of MapReduce, and demonstrate its
+  interoperability across Clouds and Grids.  We discuss how a range of
+  {\it cloud adapters} have been developed for SAGA.  The major
+  contribution of this paper is the demonstration -- possibly the
+  first ever -- of interoperability between different Clouds and Grids,
+  without any changes to the application. Interestingly, it is not the
+  case that our application uses different Clouds and Grids at
+  different instances of time, but that \sagamapreduce uses multiple,
+  different, heterogeneous infrastructure concurrently and for the same
+  application. We do not focus on performance, but on the
+  proof-of-concept, to illustrate the importance and power of
+  application-level interoperability.
 \end{abstract}
 
 \section{Introduction} 
@@ -123,6 +110,26 @@
 processing of data, it can be argued that there is a greater premium
 than ever before on abstractions at multiple levels.
 
+
+  There exist both technical reasons and social engineering problems
+  responsible for low uptake of Grids. One universally accepted reason
+  is the complexity of Grid systems -- the interface, software stack
+  and underlying complexity of deploying distributed applications.
+
+  We discuss the advantages of programmatically developing MapReduce
+  using SAGA, by demonstrating that the SAGA-based implementation is
+  infrastructure independent whilst still providing control over the
+  deployment, distribution and run-time decomposition.  .... The
+  ability to control the distribution and placement of the computation
+  units (workers) is critical in order to implement the ability to
+  move computational work to the data. This is required to keep data
+  network transfer low and in the case of commercial Clouds the
+  monetary cost of computing the solution low...  Using data-sets of
+  size up to 10GB, and up to 10 workers, we provide detailed
+  performance analysis of the SAGA-MapReduce implementation, and show
+  how controlling the distribution of computation and the payload per
+  worker helps enhance performance.
+
 Although Clouds are a nascent infrastructure, with the
 force-of-industry behind their development and uptake (and not just
 the hype), their impact can not be ignored.  Specifically, with the
@@ -182,6 +189,9 @@
 and test for inter-operability between different flavours of Clouds as
 well as between Clouds and Grids.
 
+
+
+
 \section{SAGA}
 SAGA~\cite{saga-core} is a high level API that provides a simple,
 standard and uniform interface for the most commonly required
@@ -685,14 +695,6 @@
     leading to an overall reduction in $T_c$. The advantages of a
     greater number of workers is manifest for larger data-sets.}
 \label{grids1}
-\upp
-\upp
-\upp
-\upp
-\upp
-\upp
-\upp
-\upp
 \end{figure}
 
 {\bf SAGA-MapReduce on Cloud-like infrastructure: } Accounting for the
@@ -809,11 +811,9 @@
 \section*{Notes}
 
 \subsubsection*{Why Interoperability:}
-
 \begin{itemize}
 \item Intellectual curiosity, what programming challenges does this 
   bring about?
-\item   
 \item Infrastructure independent programming
 \item Here we discuss homogeneous workers, but workers (tasks) can be
 heterogeneous and thus may have greater data-compute affinity  or
@@ -823,7 +823,26 @@
 explicitly  (already discussed)
 \end{itemize}
 
+\subsubsection*{Grid vs Cloud Interoperability}
 
+\begin{itemize}
+\item Clouds provide services at different levels (IaaS, PaaS, SaaS);
+  standard interfaces to these different levels do not
+  exist. An immediate consequence of this is the lack of interoperability
+  between today's Clouds; though there is little business motivation
+  for Cloud providers to define, implement and support new/standard
+  interfaces, there is a case to be made that applications would
+  benefit from multiple Cloud interoperability.  Even better if
+  Cloud-Grid interoperability came about for free!
+
+\item How does interoperability in Grids differ from interoperability on
+  Clouds?  Many details differ, but viewed at the level of application
+  interoperability the differences are minor and inconsequential.
+\end{itemize}
+
+Mention Sawzall as a language that builds upon MapReduce; one could
+build Sawzall using SAGA.
+
 \subsubsection*{Network, System Configuration and Experiment Details}
 
 GumboGrid 
@@ -833,7 +852,6 @@
 All this is new technology, hence makes sense to try to list some of
 the challenges we faced
 
-
 Discuss affinity: Current Clouds compute-data affinity 
 
 Simplicity of Cloud interface: While certainly not true of all cases,