[Bigjob-users] BigJob Condor / [BigJob] Unable to control SPMDVariation during pilot_job submission (#1)

Sharath Maddineni smaddineni at cct.lsu.edu
Wed Jan 4 19:48:32 CST 2012


Hi Andre,

Sorry for the late reply.

On Wed, Jan 4, 2012 at 3:36 AM, Andre Luckow <aluckow at cct.lsu.edu> wrote:

> Hi Ole, hi Sharath,
> regarding the current Condor support in BJ.
>
> My understanding is:
>
> 1) We use the URL scheme condorg://<hostname> to trigger the Condor
> specific BJ plugin.
> 2) We have a special version of the agent
> bigjob/bigjob_agent_condor.py (not sure whether we need this - since
> most of the code is the same).

> 3) When bigjob.start_pilot_job is called, we submit a Condorg job via
> SAGA/Condor, which starts the BJ agent.
> 4) Sub-Jobs/Work-Units are spawned via the BJ agent; there is no
> Condor pool that is actually used.
>
> Is this correct?
>

Yes, you are absolutely right. I used a separate bigjob_agent_condor.py to
make sure the sub-jobs (BFAST) work correctly. BJ needs to be installed
differently on each Condor-G instantiation. I am attaching a diff between
bigjob_agent.py and bigjob_agent_condor.py as I modified it.
I agree they are almost the same!

Basically, instead of specifying python as the executable, I am using
"bigjob-condor-bootstrap.py". Condor-G transfers it to the remote
OSG/Condor-G resource (by default Condor-G transfers the executable) and
executes it there, and the script then installs BigJob.
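The selection described here and in Andre's point 1), where a condorg:// URL makes the pilot use the bootstrap script instead of python as its executable, can be sketched in a few lines of Python. This is a hypothetical illustration only: `select_agent_executable`, `AGENT_LAUNCHERS`, and the example URLs are made up for the sketch and are not BigJob's actual plugin API.

```python
from urllib.parse import urlparse

# Hypothetical mapping from resource URL scheme to the pilot's executable;
# these names are illustrative and are not BigJob's actual API.
AGENT_LAUNCHERS = {
    "condorg": "bigjob-condor-bootstrap.py",  # Condor-G: ship the bootstrap script
}
DEFAULT_LAUNCHER = "python"  # other schemes: run the agent with the Python interpreter


def select_agent_executable(lrms_url: str) -> str:
    """Return the executable a pilot job would use for the given resource URL."""
    scheme = urlparse(lrms_url).scheme
    return AGENT_LAUNCHERS.get(scheme, DEFAULT_LAUNCHER)


if __name__ == "__main__":
    print(select_agent_executable("condorg://osg.example.org"))  # bigjob-condor-bootstrap.py
    print(select_agent_executable("fork://localhost"))           # python
```

Keying on the parsed scheme rather than on a string prefix keeps the check independent of host names and paths in the URL.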

In bigjob_manager.py:

    if lrms_saga_url.scheme == "condorg":
        # Pass the coordination address (-a) and pilot URL (-b) to the bootstrap script
        jd.arguments = ["-a", self.coordination.get_address(), "-b", self.pilot_url]
        print "\n\n-a", self.coordination.get_address(), "-b", self.pilot_url
        # Use bigjob-condor-bootstrap.py (two levels up, in bootstrap/) as the
        # executable; Condor-G transfers it to the remote resource
        agent_exe = os.path.abspath(os.path.join(os.path.abspath(__file__), "..", "..", "bootstrap", "bigjob-condor-bootstrap.py"))
        print agent_exe
        jd.executable = agent_exe

(https://svn.cct.lsu.edu/repos/saga-projects/applications/bigjob/trunk/generic/bootstrap/bigjob-condor-bootstrap.py)

bigjob-condor-bootstrap.py is also customized to install BJ on the remote
resource. I basically adapted the bootstrap code that you embedded as a
quoted string in bigjob_manager.py.


> @Sharath:
> Question 1: Did you check everything needed into the BJ SVN? I see
> that the BJ agent is launched using different parameters than the
> normal BJ agent:
>
> jd.arguments = ["-a", self.coordination.get_address(), "-b", self.pilot_url]
>
> Looking at the bigjob_agent_condor.py, I don't see the place where
> these parameters are processed.
>

bigjob_agent_condor.py is not where these are processed; they are handled by
the bootstrap script
(https://svn.cct.lsu.edu/repos/saga-projects/applications/bigjob/trunk/generic/bootstrap/bigjob-condor-bootstrap.py).
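Since the -a and -b options are consumed by bigjob-condor-bootstrap.py rather than by the agent, the argument handling on the remote side can be pictured along these lines. This is a minimal sketch assuming only what jd.arguments shows (-a carries the coordination address, -b the pilot URL); `parse_bootstrap_args` and the sample values are hypothetical and not taken from the actual script.

```python
import argparse


def parse_bootstrap_args(argv=None):
    """Parse the two options that bigjob_manager.py passes for condorg:// pilots.

    Hypothetical sketch: the meaning of each option is inferred from
    jd.arguments = ["-a", <coordination address>, "-b", <pilot URL>].
    """
    parser = argparse.ArgumentParser(description="BigJob Condor-G bootstrap (sketch)")
    parser.add_argument("-a", dest="coordination_address", required=True,
                        help="address of the coordination backend")
    parser.add_argument("-b", dest="pilot_url", required=True,
                        help="URL identifying this pilot job")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Placeholder values; a real bootstrap would read sys.argv (argv=None),
    # install BigJob, and then start the agent with these two values.
    args = parse_bootstrap_args(["-a", "redis://localhost:6379", "-b", "example-pilot-url"])
    print(args.coordination_address, args.pilot_url)
```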


>
> Question 2: Do you have a basic example / documentation of how to use
> this? What needs to be in the .ini file of the Condor adaptor, how
> does the file staging work,...?
>

I am really sorry for the scrappy documentation. I have not really tested
my bigjob-condorg hacks on anything other than my (Renci) machine.

Sharath.


> Thanks!
>
> Andre
>
>
>
> ---------- Forwarded message ----------
> From: Ole Weidner <reply+i-2719914-9f9254f617a8ce4c25241ff10510c37e53c4ba22-222015 at reply.github.com>
> Date: Wed, Jan 4, 2012 at 7:31 AM
> Subject: [BigJob] Unable to control SPMDVariation during pilot_job submission (#1)
> To: Andre Luckow <andre.luckow at googlemail.com>
>
>
> When I try to run BigJob via the Condor adaptor, I get the following error:
>
> 2012-01-04 01:24:49,684 - bigjob - DEBUG - Submit pilot job to: condor://localhost/
> 2012-01-04 01:24:49,691 - bigjob.server - ERROR - Exception: SAGA(BadParameter): condor_job: Problem launching condor job: (std::exception caught: SAGA(NotImplemented): condor_job: Condor adaptor does not support the 'SPMDVariation' attribute.
>
> While this error comes from the condor adaptor (it doesn't support
> SPMDVariation), I can't find a way to unset jd.spmd_variation. It
> seems that it is set explicitly in bigjob/bigjob_manager.py:239.
>
> Would it be possible to make this an option for the
> bigjob.start_pilot_job() method, give it a default value of None, and
> not set it at all in that case?
>
> Is SPMDVariation relevant at all during pilot_job
> submission, or is it just set "for completeness"? In the latter case, we
> could remove it completely.
>
> ---
> Reply to this email directly or view it on GitHub:
> https://github.com/drelu/BigJob/issues/1
>
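Ole's proposal, an optional parameter that defaults to None and is simply not applied in that case, can be sketched as below. This is an assumption-laden illustration: a plain dict stands in for the SAGA job description, `build_pilot_description` is a hypothetical helper rather than part of bigjob.start_pilot_job(), and "single" is only an example value.

```python
from typing import Optional


def build_pilot_description(executable: str, arguments: list,
                            spmd_variation: Optional[str] = None) -> dict:
    """Collect pilot job attributes, leaving out SPMDVariation when it is None.

    Hypothetical sketch: the dict stands in for a SAGA job description, so an
    adaptor that does not support the attribute never receives it.
    """
    attributes = {"Executable": executable, "Arguments": list(arguments)}
    if spmd_variation is not None:
        attributes["SPMDVariation"] = spmd_variation
    return attributes


if __name__ == "__main__":
    # Condor-G style pilot: no SPMDVariation key at all.
    print(build_pilot_description("bigjob-condor-bootstrap.py", ["-a", "addr", "-b", "url"]))
    # Caller that explicitly asks for a variation ("single" is just an example).
    print(build_pilot_description("python", ["agent.py"], spmd_variation="single"))
```

The sketch omits the key altogether rather than setting an empty value, since the Condor adaptor error appears to be triggered by the attribute being present at all.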