[Saga-devel] Fwd: help/ideas needed (fwd)
Hartmut Kaiser
hkaiser at cct.lsu.edu
Fri Jan 23 10:02:58 CST 2009
Andre,
From what I can see, the UUID generator is screwed up. I see identical UUIDs
for different entities, which might be the reason for the mess.
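One way to confirm this is to group the logged IDs by value and look for any
ID that belongs to more than one entity. A minimal sketch, assuming the
(entity, uuid) pairs have already been pulled out of the logs by hand;
find_shared_uuids and the entity labels are made up for illustration and are
not part of SAGA:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of SAGA: given (entity, uuid) pairs, return
// every uuid string that is attached to more than one distinct entity.
std::map<std::string, std::set<std::string>>
find_shared_uuids(const std::vector<std::pair<std::string, std::string>>& seen)
{
    // Group entity labels by the uuid they were logged with.
    std::map<std::string, std::set<std::string>> by_uuid;
    for (const auto& entry : seen)
        by_uuid[entry.second].insert(entry.first);

    // Keep only the uuids that are shared by two or more entities.
    std::map<std::string, std::set<std::string>> shared;
    for (const auto& item : by_uuid)
        if (item.second.size() > 1)
            shared.insert(item);
    return shared;
}
```

An empty result would mean every logged UUID maps to exactly one entity, which
would point away from the generator.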
SAGA's UUID generator supports different modes (see
impl/engine/uuid/saga_uuid.h:66):
enum {
    SAGA_UUID_MAKE_V1 = (1 << 0),     /* DCE 1.1 v1 SAGA_UUID */
    SAGA_UUID_MAKE_V3 = (1 << 1),     /* DCE 1.1 v3 SAGA_UUID */
    SAGA_UUID_MAKE_V4 = (1 << 2),     /* DCE 1.1 v4 SAGA_UUID */
#if defined(WIN32) || defined(WIN64)
    SAGA_UUID_MAKE_SYSTEM = (1 << 3), /* rely on the operating system to
                                         generate an uuid */
#else
    SAGA_UUID_MAKE_SYSTEM = SAGA_UUID_MAKE_V1,
#endif
    SAGA_UUID_MAKE_MC = (1 << 4)      /* enforce multi-cast MAC address */
};
The current mode is always: SAGA_UUID_MAKE_SYSTEM (see:
impl/engine/uuid.hpp:120).
Could you try to change that to SAGA_UUID_MAKE_V3, SAGA_UUID_MAKE_V4,
SAGA_UUID_MAKE_V3|SAGA_UUID_MAKE_MC, or SAGA_UUID_MAKE_V4|SAGA_UUID_MAKE_MC
and see if that changes the picture?
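To make the combinations concrete, here is a small sketch of how the mode
bits compose. It copies only the non-Windows branch of the enum quoted above;
describe_mode is a hypothetical helper written for this mail, not something
that exists in SAGA:

```cpp
#include <string>

// Copy of the mode flags from saga_uuid.h, non-Windows branch only.
enum {
    SAGA_UUID_MAKE_V1     = (1 << 0),
    SAGA_UUID_MAKE_V3     = (1 << 1),
    SAGA_UUID_MAKE_V4     = (1 << 2),
    SAGA_UUID_MAKE_SYSTEM = SAGA_UUID_MAKE_V1,
    SAGA_UUID_MAKE_MC     = (1 << 4)
};

// On non-Windows builds "system" is just another name for v1.
static_assert(SAGA_UUID_MAKE_SYSTEM == SAGA_UUID_MAKE_V1,
              "SYSTEM aliases V1 outside of Windows");

// Hypothetical helper (not part of SAGA): spell out which bits are set
// in a mode value, e.g. "v4 mc" for SAGA_UUID_MAKE_V4 | SAGA_UUID_MAKE_MC.
std::string describe_mode(unsigned mode)
{
    std::string out;
    if (mode & SAGA_UUID_MAKE_V1) out += "v1 ";
    if (mode & SAGA_UUID_MAKE_V3) out += "v3 ";
    if (mode & SAGA_UUID_MAKE_V4) out += "v4 ";
    if (mode & SAGA_UUID_MAKE_MC) out += "mc ";
    if (!out.empty()) out.pop_back();   // drop the trailing space
    return out;
}
```

So on Linux the current default is effectively a plain v1 UUID, and each of
the four alternatives above selects a different version bit, optionally
combined with the MC bit.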
Just a thought...
Regards Hartmut
> -----Original Message-----
> From: saga-devel-bounces at cct.lsu.edu [mailto:saga-devel-bounces at cct.lsu.edu] On Behalf Of Shantenu Jha
> Sent: Friday, January 23, 2009 8:30 AM
> To: saga-devel at cct.lsu.edu
> Subject: [Saga-devel] Fwd: help/ideas needed (fwd)
>
>
> From Andre, who is having e-mail problems.
>
> Please read through. Thanks.
>
> Shantenu
>
>
> ---------- Forwarded message ----------
> From: Andre Merzky <andre at merzky.net>
> Date: 2009/1/23
> Subject: Re: help/ideas needed
> To: SAGA Impl List <saga-devel at cct.lsu.edu>
>
>
> attachments...
>
> A
>
> Quoting [Andre Merzky] (Jan 23 2009):
> > Date: Fri, 23 Jan 2009 14:06:12 +0100
> > From: Andre Merzky <andre at merzky.net>
> > To: SAGA Impl List <saga-devel at cct.lsu.edu>
> > Bcc: Andre Merzky <andre at merzky.net>
> > Subject: help/ideas needed
> >
> > Hi Folx,
> >
> > I stumbled over a really strange problem, and have no idea
> > how to explain, nor how to fix it. So, I'd like to invite
> > you all to brainstorm - all ideas are appreciated.
> >
> > So, what is happening?
> >
> > I execute the following script:
> >
> > ------------------------------------------------------------------------
> > #!/bin/sh
> >
> > export SAGA_VERBOSE=100
> > export SAGA_PARENT_JOBID=saga-parent-id10039
> > export SAGA_SSH_KEY=`ls /tmp/saga_saga-parent-id*_ssh`
> > export SAGA_SSH_PUB=`ls /tmp/saga_saga-parent-id*_ssh.pub`
> > export SAGA_SSH_USER=amerzky
> >
> > env > /tmp/env-0
> > /root/saga/examples/misc/context ssh                 1> /tmp/o-0-0 2> /tmp/e-0-0
> > saga-file list_dir /                                 1> /tmp/o-0-1 2> /tmp/e-0-1
> > saga-file list_dir ssh://amerzky@gg101.cct.lsu.edu/  1> /tmp/o-0-2 2> /tmp/e-0-2
> >
> > sleep 600
> >
> > env > /tmp/env-1
> > /root/saga/examples/misc/context ssh                 1> /tmp/o-1-0 2> /tmp/e-1-0
> > saga-file list_dir /                                 1> /tmp/o-1-1 2> /tmp/e-1-1
> > saga-file list_dir ssh://amerzky@gg101.cct.lsu.edu/  1> /tmp/o-1-2 2> /tmp/e-1-2
> > ------------------------------------------------------------------------
> >
> > You will notice that this is doing the same thing twice,
> > after a sleep of 10 minutes (sleep 600). This script is executed on a
> > fresh EC2 virtual machine instance, which has SAGA and all
> > requisites preinstalled. As soon as the ssh daemon is up, I
> > copy a couple of ssh keys around, and then run that script.
> > Nobody else is accessing the instance. All other running
> > processes look innocent, and should not (tm) interfere with
> > SAGA.
> >
> > Now, what is the problem?
> >
> > In the above setup, the following files should show no
> > differences, modulo logged saga object IDs:
> >
> > /tmp/env-0 /tmp/env-1
> > /tmp/o-0-0 /tmp/o-1-0
> > /tmp/e-0-0 /tmp/e-1-0
> > /tmp/o-0-1 /tmp/o-1-1
> > /tmp/e-0-1 /tmp/e-1-1
> > /tmp/o-0-2 /tmp/o-1-2
> > /tmp/e-0-2 /tmp/e-1-2
> >
> > Well, the env does not show any diff, but the saga logs do
> > (they are attached).
> > In particular, I make the following observations:
> >
> > - the set of _loaded_ adaptors seems to be the same
> > - the set of _used_ adaptors differs
> >
> > I don't understand how that can be! In the
> > examples/misc/context case: the first run shows that the aws
> > context adaptor fails (this is the _only_ adaptor used).
> > The second run shows that aws_context, default_advert,
> > default_replica, and gridsam_job fail - they all implement
> > the context cpi. The ssh_context adaptor (which is the one
> > I need) succeeds.
> >
> > ------------------------------------------------------------------------
> > e-0-0:
> > Created exception: SAGA(NoSuccess): ini.cpp(646): Cannot open file /etc/saga.ini
> > Created exception: SAGA(NoSuccess): ini.cpp(646): Cannot open file /etc/saga.ini
> > INFO : engine.cpp : loading static adaptors
> > INFO : engine.cpp : loading dynamic adaptors
> > Created exception: SAGA(BadParameter): aws_context: aws_context_adaptor.cpp(285): Can't handle context types others than ec2 eucalyptus gumbocloud - found ssh
> > Created exception: SAGA(NoAdaptor): adaptor_selector.cpp(184): Could not select any matching adaptor for: context_cpi::__init__
> > Created exception: SAGA(NoAdaptor): proxy.cpp(227): No adaptor succeeded in executing constructor for context_cpi
> > Created exception: SAGA(BadParameter):
> > SAGA(BadParameter): aws_context: aws_context_adaptor.cpp(285): Can't handle context types others than ec2 eucalyptus gumbocloud - found ssh
> > SAGA(NoAdaptor): proxy.cpp(227): No adaptor succeeded in executing constructor for context_cpi
> >
> > SAGA(BadParameter):
> > SAGA(BadParameter): aws_context: aws_context_adaptor.cpp(285): Can't handle context types others than ec2 eucalyptus gumbocloud - found ssh
> > SAGA(NoAdaptor): proxy.cpp(227): No adaptor succeeded in executing constructor for context_cpi
> > ------------------------------------------------------------------------
> >
> > ------------------------------------------------------------------------
> > e-1-0:
> > Created exception: SAGA(NoSuccess): ini.cpp(646): Cannot open file /etc/saga.ini
> > Created exception: SAGA(NoSuccess): ini.cpp(646): Cannot open file /etc/saga.ini
> > INFO : engine.cpp : loading static adaptors
> > INFO : engine.cpp : loading dynamic adaptors
> > Created exception: SAGA(BadParameter): aws_context: aws_context_adaptor.cpp(285): Can't handle context types others than ec2 eucalyptus gumbocloud - found ssh
> > Created exception: SAGA(BadParameter): default_advert: context.cpp(34): Can't handle context types others than 'default_advert_db' (got: ssh)
> > Created exception: SAGA(BadParameter): default_replica: context.cpp(33): Can't handle context types others than 'default_replica_db' (got: ssh)
> > Created exception: SAGA(BadParameter): omii_gridsam_job: omii_gridsam_context.cpp(30): Can't handle context types others than 'omii_gridsam' (got: ssh)
> > (no error shown as the ssh_context wins)
> > ------------------------------------------------------------------------
> >
> > So, how can it be that identical runs behave differently
> > after some waiting time? Why are some adaptors not
> > tried, after they loaded successfully?
> >
> > It is not the case that the second run always succeeds - the
> > number of runs does not seem to have any effect. Rather, it
> > is the waiting time after the first run and the success
> > chance of a later run which seem correlated.
> >
> > The above behaviour is screwing with my performance
> > measurements, as you might expect. Also, it explains the
> > sudden loss of some of the mapreduce workers I saw (the
> > early-starters fail).
> >
> > For now, I can add a 10 min waiting time - then the
> > following SAGA jobs mostly succeed (at the moment). Well,
> > consider I want to do 10 MapReduce workers - that are 10 ec2
> > instances, times 10 minutes - that is about 1.7 hours per test run
> > - if everything else is perfect! Difficult to do a parameter
> > sweep, or to get some statistics.
> >
> > BTW, and unrelated: MapReduce should use async ops for
> > creating job services and for creating/running jobs!
> >
> > So, that is where I am stuck. As said: any idea on what I
> > could investigate further would be appreciated!
> >
> > Cheers, Andre.
> >
> >
> > PS.: CCT is swallowing my mails again -- it is that time of
> > year again it seems. So, I won't see any answers sent via
> > the cct mailer - my mail address itself seems fine (get the
> > usual amount of spam ;-). Sorry for that additional
> > inconvenience - a ticket is filed (and dormant :-().
> --
> Nothing is ever easy.