[Saga-devel] saga-projects SVN commit 900: /papers/clouds/

sjha at cct.lsu.edu
Wed Jan 28 19:37:45 CST 2009


User: sjha
Date: 2009/01/28 07:37 PM

Modified:
 /papers/clouds/
  saga_cloud_interop.tex, saga_data_intensive.bib

Log:
 added cloud-cloud interop data
    and reformatted previous data

File Changes:

Directory: /papers/clouds/
==========================

File [modified]: saga_cloud_interop.tex
Delta lines: +69 -50
===================================================================
--- papers/clouds/saga_cloud_interop.tex	2009-01-28 19:57:00 UTC (rev 899)
+++ papers/clouds/saga_cloud_interop.tex	2009-01-29 01:37:43 UTC (rev 900)
@@ -722,19 +722,20 @@
 ssh adaptors, the Globus adaptors do not rely on command line tools,
 but rather link directly against the respective Globus libraries: the
 Globus job adaptor is thus a GRAM client, the Globus file adaptor a
-gridftp client.  In experiments, non-cloud jobs have been started
-either by using gram or ssh.  In either case, file I/O has been
-performed either via ssh, or via a shared Lustre filesystem -- the
-gridftp functionality has thus not been tested in these
-experiments\footnote{For performance comparision between the Lustre FS
-  and GridFTP, see Ref~\cite{saga_cc09}}
+gridftp client.  In these experiments, non-cloud jobs were started
+using either GRAM or ssh.  In either case, file I/O was performed
+either via ssh or via a shared Lustre filesystem -- the gridftp
+functionality was thus not tested in these experiments.
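Because adaptor selection is driven purely by the URL scheme, switching a job
between GRAM and ssh submission requires no change to the application logic.
The following is a minimal sketch, assuming the standard saga-cpp job API; the
host name and executable are placeholders, not values from the experiments:

    #include <saga/saga.hpp>

    int main ()
    {
      // the job description is identical for every backend
      saga::job::description jd;
      jd.set_attribute (saga::job::attributes::description_executable,
                        "/bin/date");

      // the URL scheme alone selects the adaptor:
      // "gram://..." binds to the Globus/GRAM adaptor,
      // "ssh://..."  binds to the ssh adaptor.
      saga::job::service js (saga::url ("gram://example.host.org"));

      saga::job::job j = js.create_job (jd);
      j.run  ();
      j.wait ();

      return 0;
    }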
 
-In a nutshell, SAGA on clouds differs from SAGA on Grids in the
-following ways.....  \jhanote{The aim of the remainder of this section
-  is to discuss how SAGA on Clouds differs from SAGA for Grids with
-  specifics Everything from i) job submission ii) file transfer...iii)
-  others..}
+% \footnote{For a performance comparison between the Lustre FS
+%   and GridFTP, see Ref~\cite{saga_ccgrid09}}
 
+% In a nutshell, SAGA on clouds differs from SAGA on Grids in the
+% following ways.....  \jhanote{The aim of the remainder of this section
+%   is to discuss how SAGA on Clouds differs from SAGA for Grids with
+%   specifics Everything from i) job submission ii) file transfer...iii)
+%   others..}
+
 \section{SAGA-based MapReduce}
 In this paper we will demonstrate the use of SAGA in implementing well
 known programming patterns for data intensive computing --
@@ -742,7 +743,7 @@
 real scientific applications using SAGA based implementations of these
 patterns: multiple sequence alignment can be orchestrated using the
 SAGA-All-pairs implementation, and genome searching can be implemented
-using SAGA-MapReduce (see Ref.~\cite{saga_cc09}).
+using SAGA-MapReduce (see Ref.~\cite{saga_ccgrid09}).
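At the SAGA level, both patterns reduce to a master process spawning and
coordinating workers through the job API. A structural sketch of the map-phase
spawning loop follows; this is an illustration under the saga-cpp job API, not
the actual SAGA-MapReduce code, and the worker path and resource URL are
placeholders:

    #include <saga/saga.hpp>
    #include <string>
    #include <vector>

    // spawn n map workers on one resource and wait for completion;
    // chunking and the reduce step are elided
    void run_map_phase (std::string const & rm_url, int n_workers)
    {
      saga::job::service js (saga::url (rm_url));
      std::vector <saga::job::job> jobs;

      for ( int i = 0; i < n_workers; ++i )
      {
        saga::job::description jd;
        jd.set_attribute (saga::job::attributes::description_executable,
                          "/path/to/mapreduce_worker");   // placeholder
        saga::job::job j = js.create_job (jd);
        j.run ();
        jobs.push_back (j);
      }

      for ( std::size_t i = 0; i < jobs.size (); ++i )
        jobs[i].wait ();   // map phase ends when all workers return
    }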
 
 % {\bf MapReduce:} MapReduce is a programming framework which supports
 % applications which operate on very large data sets on clusters of
@@ -1171,8 +1172,8 @@
 simple configuration to determine how many jobs (in this case workers)
 are assigned to a single VM. The default case is 1 worker per VM, but
 it is important to be able to vary the number of workers per VM (akin
-to Grids where we were able to vary the number of workers per
-machine).
+to Grids, where the number of workers per
+machine was varied).
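In practice this placement amounts to multiplexing a configurable number of
workers onto each VM's job service. A hedged sketch of such a loop; the URL
list, worker path, and per-VM count are illustrative assumptions, not the
actual configuration mechanism:

    #include <saga/saga.hpp>
    #include <string>
    #include <vector>

    // submit 'per_vm' workers against every VM's job service;
    // per_vm == 1 reproduces the default one-worker-per-VM case
    void place_workers (std::vector <std::string> const & vm_urls,
                        int per_vm)
    {
      for ( std::size_t v = 0; v < vm_urls.size (); ++v )
      {
        saga::job::service js (saga::url (vm_urls[v]));

        for ( int w = 0; w < per_vm; ++w )
        {
          saga::job::description jd;
          jd.set_attribute (saga::job::attributes::description_executable,
                            "/path/to/mapreduce_worker");   // placeholder
          js.create_job (jd).run ();
        }
      }
    }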
 
 \begin{table}
 \upp
@@ -1181,45 +1182,67 @@
   \multicolumn{2}{c}{Number-of-Workers}  &  Data size   &  $T_c$  & $T_{spawn}$ \\   
   TeraGrid &  AWS &   (MB)  & (sec) & (sec)  \\
   \hline
-  6 & 0 & 10  &  12.4 &  10.2 \\
-  10 & 0 & 10  & 20.8 & 17.3 \\  
+%  6 & - & 10  &  12.4 &  10.2 \\
+  10 & - & 10  & 20.8 & 17.3 \\  
   \hline 
-  0 & 1 & 10 & 4.3 & 2.8 \\
-  0 & 2 & 10 & 7.8 & 5.3 \\ 
-  0 & 3 & 10 & 8.7 & 7.7 \\
-  0 & 4 & 10 & 13.0 & 10.3 \\
-  0 & 4 (1) & 10 &  11.3 & 8.6 \\
+  - & 1 & 10 & 4.3 & 2.8 \\
+  - & 2 & 10 & 7.8 & 5.3 \\ 
+  - & 3 & 10 & 8.7 & 7.7 \\
+  - & 4 & 10 & 13.0 & 10.3 \\
+%  - & 4 (1) & 10 &  11.3 & 8.6 \\
+\hline \hline
+%  10 & -  & 100 & 10.4 & 8.86 \\
+  -  & 2  & 100 & 7.9  & 5.3 \\
+  -  & 4  & 100 & 12.4 & 9.2 \\
+  -  & 10 & 100 & 29.0 & 25.1 \\
+ \hline \hline
+  - & 4 (1) & 100 & 16.2 & 8.7 \\ 
+  - & 4 (2) & 100 & 12.3 & 8.5 \\
+  - & 6 (3) & 100 & 18.7 & 13.5 \\
+  - & 8 (1) & 100 & 31.07 & 18.3\\
+  - & 8 (2) & 100 & 27.9 & 19.8 \\
+  - & 8 (4) & 100 & 27.4 & 19.9 \\
+  \hline \hline
+\end{tabular}
+\upp
+\caption{Performance data for different configurations of worker placements. The master is always on a desktop, with workers placed either on Clouds or on the TeraGrid (QueenBee). The configurations are of three types: all workers on EC2, all workers on the TeraGrid, and workers divided between the TeraGrid and EC2. Unless explicitly indicated
+  by a number in parentheses, every worker is assigned to a unique VM; the number in parentheses indicates the number of VMs used. It is interesting to note the significant spawning times and their dependence on the number of VMs. Spawning time does not include VM instantiation.}
+\label{stuff1}
+\upp
+\upp
+\end{table}
+
+\begin{table}
+\upp
+\begin{tabular}{cccccc}
+  \hline
+  \multicolumn{3}{c}{Number-of-Workers}  &  Data size   &  $T_c$  & $T_{spawn}$ \\   
+  TeraGrid &  AWS & Eucalyptus &  (MB)  & (sec) & (sec)  \\
+  \hline
+  - & 1 & 1 & 100  & 6.2 & 3.6 \\
   \hline 
-  2 & 2 & 10 & 7.4 & 5.9 \\
-  3 & 3 & 10 & 11.6 & 10.3 \\
-  4 & 4 & 10 & 13.7 & 11.6 \\
-  5 & 5 & 10 & 33.2 & 29.4 \\ 
-  10 & 10 & 10 & 32.2 & 28.8 \\
+  2 & 2 & - & 10 & 7.4 & 5.9 \\
+  3 & 3 & - & 10 & 11.6 & 10.3 \\
+  4 & 4 & - & 10 & 13.7 & 11.6 \\
+  5 & 5 & - & 10 & 33.2 & 29.4 \\ 
+  10 & 10 & - & 10 & 32.2 & 28.8 \\
   \hline
   \hline 
-  10 & 0 & 100 & 10.4 & 8.86 \\
-  0 & 2 & 100 & 7.9 & 5.3 \\
-  0 & 10 & 100 &  29.0 & 25.1 \\
-  1 & 1 & 100 & 5.4 & 3.1 \\
-  3 & 3 & 100 & 11.1 & 8.7 \\
-  \hline \hline
-  0 & 4 (1) & 100 & 16.2 & 8.7 \\ 
-  0 & 8 (1) & 100 & 31.07 & 18.3\\
-  0 & 4 (2) & 100 & 12.3 & 8.5 \\
-  0 & 4 (4) & 100 & 12.4 & 9.2 \\
-  0 & 6 (3) & 100 & 18.7 & 13.5 \\
-  0 & 8 (2) & 100 & 27.9 & 19.8 \\
-  0 & 8 (4) & 100 & 27.4 & 19.9 \\
-  \hline \hline
+  1 & 1 & - & 100 & 5.4 & 3.1 \\
+  3 & 3 & - & 100 & 11.1 & 8.7 \\
 \end{tabular}
 \upp
-\caption{Performance data for different configurations of worker placements. The master is always on a desktop, with the choice of workers placed on either Clouds or on the TeraGrid (QueenBee). The configurations can be classified as of three types -- all workers on EC2, all workers on the TeraGrid and workers divied between the TeraGrid and EC2. Unless otherwise explicitly indicated
-  by a number in parenthesis, every worker is assigned to a unique VM; the number  in parenthesis indicates the number of VMs used. It is interesting to note the significant spawning times, and its dependence on the number of VM. Spawning time is without instantiation.}
+\caption{Performance data for different configurations of worker placements
+  on the TeraGrid, the Eucalyptus Cloud and EC2. The first row of data
+  establishes Cloud-Cloud interoperability. The second set (rows 2-6) represents interoperability between Grids and Clouds (EC2). The experimental
+  conditions and measurements are similar to those in Table 1.}
 \label{stuff}
 \upp
 \upp
 \end{table}
 
+
+
 The total time to completion ($T_c$) of a \sagamapreduce job, can be
 decomposed into three primary components: $t_{over}$ defined as the
 time for pre-processing -- which is the time to chunk into fixed size
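Only the first of the three components is named in the excerpt above. Judging
from the quantities reported in the tables ($T_c$ and $T_{spawn}$), a
plausible form of the decomposition, with the second and third terms inferred
rather than quoted from the source, is

    \[ T_c = t_{over} + t_{spawn} + t_{comp} \]

where $t_{spawn}$ would be the worker spawning time reported in the tables and
$t_{comp}$ the time spent in the map and reduce computation proper.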
@@ -1362,15 +1385,11 @@
 contrived. Although we would like to preempt such a point-of-view, we
 will work towards developing a SAGA based applications that can use a
 very different beast -- Google's AppEngine, i.e. \sagamapreduce that
-uses Google's Cloud infrastructure.
+uses Google's Cloud infrastructure.  Finally, it is worth mentioning
+that while we were computing in the Clouds (i.e., ``'scuse me while I
+kiss the sky''~\cite{purplehaze}, or at least the clouds), it cost us
+upwards of \$150 to perform these experiments on EC2.
 
-It is worth mentioning that while we were computing in the Clouds in a
-state of `` scuse me while I kiss the sky''\cite{purplehaze} (or at
-least the clouds), it cost us upwards of \$150 to perform these
-experiments on EC2.
-
-
-
 %   somewhat similar; a very different beast is Google's AppEngine.  We
 %   will in the near future be working towards providing \sagamapreduce
 %   via Google's AppEngine

File [modified]: saga_data_intensive.bib
Delta lines: +1 -1
===================================================================
--- papers/clouds/saga_data_intensive.bib	2009-01-28 19:57:00 UTC (rev 899)
+++ papers/clouds/saga_data_intensive.bib	2009-01-29 01:37:43 UTC (rev 900)
@@ -6779,6 +6779,6 @@
 
 @misc{cloud-ontology, note = {Towards a Unified Ontology of Cloud Computing, http://www.cs.ucsb.edu/~lyouseff/CCOntology/CloudOntology.pdf}}
 
-@misc{saga_ccgrid09, note ={C. Miceli et al, Programming Abstractions for Data-Intensive Computing on Clouds and Grids, submitted to International Workshop on Cloud Computing (Cloud 2009) held in conjunction with CCGrid 2009, Shangai.}}
+@misc{saga_ccgrid09, note ={C. Miceli et al., Programming Abstractions for Data-Intensive Computing on Clouds and Grids, submitted to International Workshop on Cloud Computing (Cloud 2009) held in conjunction with CCGrid 2009, Shanghai. Draft available at \url{http://www.cct.lsu.edu/~sjha/publications/saga_data_intensive.pdf}}}
 
 @misc{purplehaze, note = {\url{http://en.wikipedia.org/wiki/Purple_Haze}}}
\ No newline at end of file


