[Bigjob-users] Inconsistency with number_nodes , processes_per_node & jd.number_of_processes parameters.

pradeep kumar Mantha pradeepm66 at gmail.com
Fri Sep 30 19:45:01 CDT 2011


Hi Andre!

There seems to be an inconsistency between the number_nodes, processes_per_node,
and jd.number_of_processes parameters.

You mentioned that in the case of multiple pilots we should not use
processes_per_node in the resource specification. I have a few questions
here.
   1. Does BigJob automatically determine processes_per_node for all
resources in the resource list?
   2. Is this restriction specific to LONI, or does it apply to all
infrastructures, such as the LSU machines & FutureGrid?

I tried the script without specifying processes_per_node, but when
jd.number_of_processes (processors) > number_nodes (nodes), it causes trouble
even though enough resources are available: the script just spins, and the jobs
are never scheduled or queued because it cannot find enough resources.
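To make the symptom concrete, here is a small standalone sketch of the kind of
capacity check I suspect is being applied. The helper name fits() and the
assumption that number_nodes is treated as the total slot count whenever
processes_per_node is unset are my own guesses, not BigJob source code:

```python
# Hypothetical reconstruction of the suspected scheduling check.
# Assumption (mine, not from the BigJob source): with processes_per_node
# unset, number_nodes is treated as the total number of available slots.

def fits(number_of_processes, number_nodes, processes_per_node=None):
    """Return True if a subjob asking for number_of_processes slots
    can be placed on a pilot of the given size."""
    if processes_per_node is None:
        total_slots = number_nodes                      # suspected behaviour
    else:
        total_slots = number_nodes * processes_per_node
    return number_of_processes <= total_slots

# With processes_per_node unset, a 4-process subjob never fits a
# 2-node pilot, which matches what I observe:
fits(4, 2)        # -> False: subjob stays unscheduled
fits(2, 2)        # -> True:  anything <= number_nodes works
fits(4, 2, 4)     # -> True:  fits once processes_per_node is known
```

If this guess is right, the scheduler simply never learns the per-node core
count when processes_per_node is omitted.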

For example:

        resource_list.append({"resource_url" : "gram://eric1.loni.org/jobmanager-pbs",
                              "number_nodes" : "2",
                              "allocation" : None, "queue" : "checkpt",
                              "working_directory": (os.getcwd() + "/agent"),
                              "walltime": 20 })

        resource_list.append({"resource_url" : "gram://poseidon1.loni.org/jobmanager-pbs",
                              "number_nodes" : "2",
                              "allocation" : None, "queue" : "checkpt",
                              "working_directory": (os.getcwd() + "/agent"),
                              "walltime": 20 })

  Each BigJob's size: 8 processors.
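Spelling out the numbers (a sanity check using only the figures from this
mail):

```python
# Figures taken from the two resource requests and the pilot size above.
number_nodes = 2          # nodes requested per pilot
bigjob_size = 8           # processors each pilot actually reports

cores_per_node = bigjob_size // number_nodes
print(cores_per_node)     # 4 -> a 4-process subjob should fit on one pilot

# Yet subjobs are only scheduled when
# jd.number_of_processes <= number_nodes, i.e. <= 2.
```

So by the acquired pilot size there is clearly enough room for the 4-process
subjobs below.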

  for i in range(0, NUMBER_JOBS):
            jd = saga.job.description()
            jd.executable = "/bin/sleep"
            jd.arguments = ["10"]
            jd.number_of_processes = "4"   # if 2 or 1 is given it works,
                                           # i.e. anything <= number_nodes works
            jd.spmd_variation = "single"
            jd.working_directory = os.getcwd()
            jd.output = "stdout-" + str(i) + ".txt"
            jd.error = "stderr-" + str(i) + ".txt"
            subjob = mjs.create_job(jd)
            subjob.run()
            print "Submitted sub-job " + str(i) + "."
            jobs.append(subjob)
            job_start_times[subjob] = time.time()
            job_states[subjob] = subjob.get_state()


thanks
pradeep
