[Saga-devel] saga-projects SVN commit 870: /papers/clouds/
sjha at cct.lsu.edu
Sun Jan 25 04:38:37 CST 2009
User: sjha
Date: 2009/01/25 04:38 AM
Modified:
/papers/clouds/
saga_cloud_interop.tex
Log:
continued refinement of the introduction.
File Changes:
Directory: /papers/clouds/
==========================
File [modified]: saga_cloud_interop.tex
Delta lines: +101 -77
===================================================================
--- papers/clouds/saga_cloud_interop.tex 2009-01-24 23:25:17 UTC (rev 869)
+++ papers/clouds/saga_cloud_interop.tex 2009-01-25 10:38:32 UTC (rev 870)
@@ -118,60 +118,49 @@
Although Clouds are a nascent infrastructure, with the
force-of-industry behind their development and uptake (and not just
-the hype), their impact can not be ignored. Specifically, with the
-emergence of Clouds as important distributed computing infrastructure,
-we need abstractions that can support existing and emerging
-programming models for Clouds. Inevitably, the unified concept of a
-Cloud is evolving into different flavours and implementations on the
-ground. For example, there are already multiple implementations of
-Google's Bigtable, such as HyberTable, Cassandara, HBase. There is
-bound to be a continued proliferation of such Cloud-like
-infrastructure; this is reminiscent of the plethora of grid middleware
-distributions. Thus application-level support and inter-operability
-with different Cloud infrastructure is critical. And issues of scale
-aside, the transition of existing distributed programming models and
-styles, must be as seamless and as least disruptive as possible, else
-it risks engendering technical and political horror stories
-reminiscent of Globus, which became a disastrous by-word for
-everything wrong with the complexity of Grids.
+the hype), their impact cannot be ignored. % Specifically, with the
+% emergence of Clouds as important distributed computing infrastructure,
+% we need abstractions that can support existing and emerging
+% programming models for Clouds.
+Inevitably, the unified concept of a Cloud is evolving into different
+flavours and implementations, with distinct underlying system
+interfaces and infrastructure. For example, there are already multiple
+implementations of Google's Bigtable, such as HyperTable, Cassandra,
+and HBase. There is bound to be a continued proliferation of such
+Cloud-based infrastructure; this is reminiscent of the plethora of
+grid middleware distributions. The complication arising from this
+proliferation exists over and above the complexity of the actual
+transition from Grids. Thus application-level support and
+interoperability for different applications on different Cloud
+infrastructures is critical. And issues of scale aside, the
+transition of existing distributed programming models and styles must
+be as seamless and minimally disruptive as possible, else it risks
+engendering technical and political horror stories reminiscent of
+Globus, which became a disastrous by-word for everything wrong with
+the complexity of Grids.
Programming Models for Cloud: It is unclear what kind of programming
models will emerge; this in turn will depend on, among other things,
the kinds of applications that will come forward to try to utilise Clouds.
-
-We discuss the advantages of programmatically developing MapReduce
-using SAGA, by demonstrating that the SAGA-based implementation is
-infrastructure independent whilst still providing control over the
-deployment, distribution and run-time decomposition. The ability to
-control the distribution and placement of the computation units
-(workers) is critical in order to implement the ability to move
-computational work to the data. This is required to keep data network
-transfer low and in the case of commercial Clouds the monetary cost of
-computing the solution low. Using data-sets of size up to 10GB, and up
-to 10 workers, we provide detailed performance analysis of the
-SAGA-MapReduce implementation, and show how controlling the
-distribution of computation and the payload per worker helps enhance
-performance.
-
-{\it Application-level} programming and data-access patterns remain
-essentially invariant on different infrastructure. Thus the ability to
-support application specific data-access patterns is both useful and
-important~\cite{dpa-paper}. There are however, infrastructure
-specific features -- technical and policy, that need to be
-addressed. For example, Amazon, the archetypal Cloud System has a
-well-defined cost model for data transfer across {\it its}
-network. Hence, Programming Models for Clouds must be cognizant of the
-requirement to programmatically control the placement of compute and
-data relative to each other -- both statically and even dynamically.
-It is not that traditional Grids applications do not have this
-interesting requirement, but that, such explicit support is typically
-required for very large-scale and high-performing applications. In
-contrast, for most Cloud applications such control is required in
+... But the importance of {\it application-level} programming and
+data-access patterns remains essentially invariant across different
+infrastructure. Thus the ability to support application-specific
+data-access patterns is both useful and important~\cite{dpa-paper}.
+There are, however, infrastructure-specific features -- technical and
+policy-related -- that need to be addressed. For example, Amazon, the
+archetypal Cloud system, has a well-defined cost model for data
+transfer across {\it its} network. Hence, programming models for
+Clouds must be cognizant of the requirement to programmatically
+control the placement of compute and data relative to each other --
+both statically and even dynamically. % It is not that traditional
+% Grids applications do not have this interesting requirement, but
+% that, such explicit support is typically required for very
+% large-scale and high-performing applications.
+In contrast, for most Cloud applications such control is required in
order to ensure basic cost minimization, i.e., the same computational
task can be priced very differently for possibly the same performance.
-These factors and trends place a critical importance on effective
-programming abstractions for data-intensive applications for both
-Clouds and Grids and importantly in bridging the gap between the two.
+% These factors and trends place a critical importance on effective
+% programming abstractions for data-intensive applications for both
+% Clouds and Grids and importantly in bridging the gap between the two.
Any {\it effective} abstraction will be cognizant of and provide at
least the above features, viz., relative compute-data placement,
application-level patterns and interoperability.
@@ -190,6 +179,18 @@
verify if SAGA had the expressiveness to implement data-parallel
programming and was capable of supporting acceptable levels of
performance (as compared with native implementations of MapReduce).
+We demonstrated that the SAGA-based implementation is
+infrastructure-independent whilst still providing control over the
+deployment, distribution and run-time decomposition. The ability to
+control the distribution and placement of the computation units
+(workers) is critical in order to move computational work to the
+data. This is required to keep data transfer over the network low
+and, in the case of commercial Clouds, to keep the monetary cost of
+computing the solution low. Using data-sets of sizes up to 10GB, and
+up to 10 workers, we provide detailed performance analysis of the
+SAGA-MapReduce implementation, and show how controlling the
+distribution of computation and the payload per worker helps enhance
+performance.
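+A minimal sketch of this placement control is shown below (it assumes
+the SAGA C++ job package; the worker executable path and the job
+service URL are illustrative placeholders, not our actual
+configuration):
+\begin{verbatim}
+// Sketch: spawn a SAGA-MapReduce worker on a chosen resource.
+// The job service URL selects the backend (Grid or Cloud);
+// the application code itself does not change.
+#include <saga/saga.hpp>
+#include <string>
+#include <vector>
+
+void spawn_worker (std::string const& js_url, // e.g. "gram://host"
+                   std::string const& chunk)  // data chunk for worker
+{
+  saga::job::description jd;
+  jd.set_attribute (saga::job::attributes::description_executable,
+                    "/path/to/mapreduce_worker");  // placeholder
+  std::vector <std::string> args;
+  args.push_back (chunk);
+  jd.set_vector_attribute (
+      saga::job::attributes::description_arguments, args);
+
+  saga::job::service js (saga::url (js_url));
+  saga::job::job job = js.create_job (jd);
+  job.run ();  // submit; the master later monitors job state
+}
+\end{verbatim}
+Launching the same worker near the data is then only a matter of
+passing a different job service URL.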
The primary focus of this paper is, however, the interoperability of
the above-mentioned \sagamapreduce program. We will demonstrate beyond
@@ -200,11 +201,17 @@
for interoperability between different flavours of Clouds as well as
between Clouds and Grids.
-\jhanote{Few sentence around what is application-level
- interoperability?}
-
-{\it Application-level Interoperability (ALI):} Some defining features
-of ALI include:
+Clouds provide services at different levels (IaaS, PaaS, SaaS);
+standard interfaces to these different levels do not exist. An
+immediate consequence of this is the lack of interoperability between
+today's Clouds; though there is little business motivation for Cloud
+providers to define, implement and support new/standard interfaces,
+there is a case to be made that applications would benefit from
+interoperability across multiple Clouds. It would be a desirable
+situation if Cloud-Grid interoperability came about for free; we
+argue that by addressing interoperability at the application level
+this can be easily achieved. But first we outline some defining
+features of {\it Application-level Interoperability (ALI)}:
\begin{enumerate}
\item Other than compiling on a different or new platform, there are no
further changes required of the application
@@ -222,7 +229,6 @@
distinct and possibly distributed components.
-
It can be asked whether the emphasis on utilising multiple
Clouds/Grids is premature, given that programming models/systems are
just emerging. In many ways the emphasis on interoperability is an
@@ -259,14 +265,22 @@
Additionally, with Clouds -- and different Cloud providers -- fronting
different economic models of computing, it is important to be able to
-utilise the ``right resource''.
-%influence programming models and require explicity (already discussed)
+utilise the ``right resource'', in the right way. We briefly
+discussed how moving prodigious amounts of data across Cloud
+networks, as opposed to moving the compute unit, could be expensive;
+this is an example of using a given resource in the right way.
+However, in the absence of autonomic performance models, and as
+current programming models don't provide explicit support/control
+for affinity, the end-user is left with performance management, and
+thus with the responsibility of explicitly determining which resource
+is optimal. Clearly, interoperability between Clouds and Grids is an
+important prerequisite.
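+As a back-of-the-envelope illustration (the prices here are purely
+hypothetical, chosen for arithmetic convenience): at a transfer price
+of \$0.10 per GB, staging a 10GB data-set onto a Cloud costs \$1.00
+per run, whereas shipping a worker binary of a few MB to the data
+costs a fraction of a cent; repeated over many runs, compute-data
+affinity dominates the monetary cost of the solution.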
\section*{Notes}
-\subsubsection*{Why Interoperability:}
-\begin{itemize}
+%\subsubsection*{Why Interoperability:}
+%\begin{itemize}
% \item Intellectual curiosity, what programming challenges does this
% bring about?
% \item Infrastructure independent programming
@@ -275,9 +289,9 @@
% data-data affinity, which makes it more prudent to map to Cloud than
% regular grid environments (or vice-versa). What about complex
% dependency and inter-relationship between sub-tasks.
-\item Economic Models of computing, influence programming models and
- require explicity (already discussed)
-\end{itemize}
+% \item Economic Models of computing, influence programming models and
+% require explicity (already discussed)
+% \end{itemize}
\subsubsection*{Grid vs Cloud Interoperability}
@@ -290,7 +304,6 @@
interfaces, there is a case to be made that applications would
benefit from multiple Cloud interoperability. Even better if
Cloud-Grid interoperability came about for free!
-
\item How does interoperability in Grids differ from interoperability
on Clouds? There are many details, but viewed from the perspective of
application-level interoperability the differences are minor and
inconsequential.
@@ -778,20 +791,34 @@
\section{Demonstrating Cloud-Grid Interoperability}
+\subsection*{Infrastructure Used} We first describe the
+infrastructure that we employ for the interoperability tests.
-In an earlier paper, we had essentially done the following:
+{\it Amazon EC2:}
+
+{\it Eucalyptus, ECP:}
+
+{\it Eucalyptus, GumboCloud:}
+
+And describe LONI in a few sentences. {\textcolor{blue}{KS}}
+
+In an earlier paper~\cite{saga_ccgrid09}, we carried out the
+following tests to demonstrate how \sagamapreduce utilizes different
+infrastructure and provides control over task-data placement, and to
+gain insight into performance on ``vanilla'' Grids. Some specific
+tests we performed are:
\begin{enumerate}
-\item Both \sagamapreduce workers
- (compute) and data-distribution are local. Number of workers vary
- from 1 to 10, and the data-set sizes varying from 1 to 10GB. % Here we
-% will also compare \sagamapreduce with native MapReduce (using HDFS
-% and Hadoop)
-\item \sagamapreduce workers compute local (to master), but using a
- distributed FS (HDFS)
-% upto 3 workers (upto a data-set size of 10GB).
-\item Same as Exp. \#2, but using a different distributed FS
- (KFS); the number of workers varies from 1-10
-\item \sagamapreduce using distributed compute (workers) and distributed file-system (KFS)
+\item We began by placing both the \sagamapreduce workers (compute)
+  and the data they operate on locally. We varied the number of
+  workers from 1 to 10, and the data-set sizes from 1 to 10GB.
+\item We then investigated the effect on performance of using a
+  distributed FS: \sagamapreduce workers remained local (to the
+  master) but used a distributed FS (HDFS). We also varied the
+  underlying distributed FS (using KFS in lieu of HDFS); a sketch of
+  the backend-independent file access follows this list.
+% \item Same as Exp. \#2, but using a different distributed FS
+% (KFS); the number of workers varies from 1-10
+\item We then distributed both the \sagamapreduce compute (workers)
+  and the file-system (KFS).
\item Distributed compute (workers) but using local file-systems (using GridFTP for transfer)
\end{enumerate}
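+The file access underlying Exps.\ 2--4 can be sketched as follows
+(assuming the SAGA C++ file package with HDFS/KFS adaptors; the URL
+schemes and the exact buffer signature are indicative rather than
+verbatim from our code):
+\begin{verbatim}
+// Sketch: read one data chunk through the SAGA file API. Only the
+// URL scheme (file://, hdfs://, kfs://, ...) selects the backend
+// adaptor; the calling code is unchanged across file-systems.
+#include <saga/saga.hpp>
+#include <vector>
+
+std::vector<char> read_chunk (saga::url const& u, std::size_t len)
+{
+  saga::filesystem::file f (u, saga::filesystem::Read);
+  std::vector<char> data (len);
+  f.read (saga::buffer (&data[0], len));  // fills data with chunk
+  return data;
+}
+\end{verbatim}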
@@ -815,9 +842,6 @@
communicate directly with each other.
\end{enumerate}
-\subsection*{Infrastructure Used} Describe GumboCloud, ECP in a few
-sentences. And describe LONI in a few sentences. {\textcolor{blue}
- {KS}}
\subsection*{Results}