[Bigjob-users] BigJob Condor / [BigJob] Unable to control SPMDVariation during pilot_job submission (#1)

Ole Weidner oweidner at cct.lsu.edu
Wed Jan 4 11:03:59 CST 2012


Hi Andre,

On Jan 4, 2012, at 2:36 AM, Andre Luckow wrote:

> Hi Ole, hi Sharath,
> regarding the current Condor support in BJ.
> 
> My understanding is:
> 
> 1) We use the URL scheme condorg://<hostname> to trigger the Condor
> specific BJ plugin.

That's the Condor-G case. However, OSG prefers "vanilla" Condor (condor://localhost) job submission, since this is backed by their glidein-WMS system. Using Condor-G directly is discouraged since it overrides their fair-share policies and confuses the schedulers.
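
To make the two cases concrete, here is a minimal sketch of how the URL scheme could select the submission mode. It is an illustration only, not the code in the BJ plugin: the function name submission_mode, the SUBMISSION_MODES table, and the mode labels are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to submission mode.
# The labels are illustrative and are not BigJob or SAGA identifiers.
SUBMISSION_MODES = {
    "condorg": "condor-g",  # Condor-G, e.g. condorg://<hostname>
    "condor": "vanilla",    # vanilla Condor, e.g. condor://localhost
}


def submission_mode(resource_url):
    """Return the Condor submission mode implied by the URL scheme."""
    scheme = urlparse(resource_url).scheme.lower()
    try:
        return SUBMISSION_MODES[scheme]
    except KeyError:
        raise ValueError("not a Condor resource URL: %r" % resource_url)
```

With such a helper, condor://localhost/ would map to the vanilla case and a condorg:// URL to the Condor-G case, while any other scheme is rejected early.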


> 2) We have a special version of the agent
> bigjob/bigjob_agent_condor.py (not sure whether we need this - since
> most of the code is the same).

Hm. So what is it for then? What was it used for and on what platform? 

> 3) When bigjob.start_pilot_job is called we submit a Condor-G job via
> SAGA/Condor which starts the BJ agent.

Again, we could. But: Condor-G is not the way to do it on OSG.

> 4) Sub-Jobs/Work-Units are spawned via the BJ agent - there is no
> Condor pool that is actually used.

In the Condor-G case I assume the answer is 'yes'. In the 'vanilla' case: I have *no* idea how that is supposed to work. 

Cheers
Ole

> 
> Is this correct?
> 
> @Sharath:
> Question 1: Did you check everything needed into the BJ SVN? I see
> that the BJ agent is launched using different parameters than the
> normal BJ agent:
> 
> jd.arguments = [ "-a", self.coordination.get_address(), "-b",self.pilot_url]
> 
> Looking at the bigjob_agent_condor.py, I don't see the place where
> these parameters are processed.
> 
> Question 2: Do you have a basic example / documentation of how to use
> this? What needs to be in the .ini file of the Condor adaptor, how
> does the file staging work,...?
> 
> Thanks!
> 
> Andre
> 
> 
> 
> ---------- Forwarded message ----------
> From: Ole Weidner
> <reply+i-2719914-9f9254f617a8ce4c25241ff10510c37e53c4ba22-222015 at reply.github.com>
> Date: Wed, Jan 4, 2012 at 7:31 AM
> Subject: [BigJob] Unable to control SPMDVariation during pilot_job
> submission (#1)
> To: Andre Luckow <andre.luckow at googlemail.com>
> 
> 
> When I try to run BigJob via the Condor adaptor, I get the following error:
> 
> 2012-01-04 01:24:49,684 - bigjob - DEBUG - Submit pilot job to:
> condor://localhost/
> 2012-01-04 01:24:49,691 - bigjob.server - ERROR - Exception:
> SAGA(BadParameter): condor_job: Problem launching condor job:
> (std::exception caught: SAGA(NotImplemented): condor_job: Condor
> adaptor does not support the 'SPMDVariation' attribute.
> 
> While this error comes from the condor adaptor (it doesn't support
> SPMDVariation), I can't find a way to unset jd.spmd_variation. It
> seems that it is set explicitly in bigjob/bigjob_manager.py:239.
> 
> Would it be possible to make this an option for the
> bigjob.start_pilot_job() method, give it a default value of None, and
> not set it at all in that case?
> 
> Is SPMDVariation relevant at all during pilot_job
> submission, or is it just set "for completeness"? In that case, we
> could remove it completely.
> 
> ---
> Reply to this email directly or view it on GitHub:
> https://github.com/drelu/BigJob/issues/1
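
As a rough illustration of the proposal in the forwarded issue, the sketch below only sets spmd_variation when the caller passes a value. It is not the actual code in bigjob/bigjob_manager.py: JobDescription is a plain stand-in class rather than the SAGA job description, and build_pilot_description and its parameters are hypothetical names chosen for this example.

```python
class JobDescription(object):
    """Stand-in for a SAGA job description (assumption, not the SAGA class)."""
    pass


def build_pilot_description(executable, number_of_processes,
                            spmd_variation=None):
    """Build a job description, leaving spmd_variation unset by default."""
    jd = JobDescription()
    jd.executable = executable
    jd.number_of_processes = str(number_of_processes)
    # Only set the attribute when the caller asks for it, so adaptors that
    # do not support SPMDVariation never see it.
    if spmd_variation is not None:
        jd.spmd_variation = spmd_variation
    return jd
```

With a default of None, a submission to condor://localhost would never carry the attribute, while callers targeting other adaptors could still pass a value explicitly.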


