[Bigjob-users] Attempt to run example_manyjob_affinity.py

Andre Luckow aluckow at cct.lsu.edu
Fri Dec 9 15:56:31 CST 2011


Hi Paula,
for launching remote jobs you certainly want to use Globus, i.e. GRAM
URLs. The bigjob_agent key in the resource_list does not need to be
set anymore!
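
As a minimal sketch of what that could mean for the hotel entry: the
values below are copied from the configuration quoted further down, the
GRAM URL is the one mentioned in the question, and it is an assumption
that all other keys can stay unchanged.

```python
import os

resource_list = []

# Remote entry for hotel, using the GRAM URL from the question instead of ssh://.
# The "bigjob_agent" key is left out, since it no longer needs to be set.
# All other values are taken unchanged from the quoted configuration (assumption:
# they are still appropriate for a GRAM submission).
resource_list.append({
    "resource_url": "gram://hotel.futuregrid.org/jobmanager-pbs",
    "number_of_processes": "4",
    "allocation": "myAllocation",
    "queue": "workq",
    "working_directory": os.getcwd() + "/agent",
    "walltime": 10,
    "affinity": "affinity1",
})
```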

Please try to submit the job first from the hotel front node. Once you
are sure that Globus etc. is working you can do a remote submission.
E.g., you need to make sure that you use the right BJ version on hotel
as well (i.e., you need to double-check your PYTHONPATH etc.)
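
One possible way to double-check which BJ version Python picks up on a
given machine is to look for BigJob eggs on sys.path. This is only a
sketch: the helper name is hypothetical, and the egg naming pattern is
an assumption based on the sys.path dump quoted further down
(e.g. BigJob-0.3.2-py2.7.egg).

```python
import re
import sys

# Matches path components such as "BigJob-0.3.2-py2.7.egg" and captures the
# version number. The pattern is an assumption based on the quoted sys.path.
_BIGJOB_EGG = re.compile(r"BigJob-(\d+(?:\.\d+)*)")


def bigjob_versions_on_path(paths):
    """Return (version, path) pairs for BigJob entries, in sys.path order."""
    found = []
    for entry in paths:
        match = _BIGJOB_EGG.search(entry)
        if match:
            found.append((match.group(1), entry))
    return found


if __name__ == "__main__":
    # The first entry listed is the one Python will import from.
    for version, entry in bigjob_versions_on_path(sys.path):
        print("%s  %s" % (version, entry))
```

Running this on both the submission host and on hotel should show whether
an older system-wide egg comes before the userspace installation.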

Hope that helps.

Best,
Andre

On Fri, Dec 9, 2011 at 9:21 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
> Hi Andre,
>
>>
>> Please change number_nodes to number_of_processes in line 51:
>>
>> resource_list.append( {"resource_url" : "fork://localhost/",
>> "number_of_processes" : "2", "allocation" : "myAllocation", ...
>
>
> I changed number_nodes to number_of_processes and it now runs. However, it
> seems like it is in an infinite loop, perhaps because my second machine is
> not configured properly? This is the configuration for my second machine:
>
> resource_list.append( {"resource_url" : "ssh://hotel.futuregrid.org",
> "number_of_processes" : "4", "allocation" :  "myAllocation", "queue" :
> "workq", "bigjob_agent":
> ("/N/soft/SAGA/saga/1.5.3/gcc-4.1.2/lib/python2.7.1/site-packages/bigjob/bigjob_agent_launcher.sh"),
> "working_directory": (os.getcwd() + "/agent"), "walltime":10, "affinity" :
> "affinity1"})
>
> Here, I used the bigjob_agent_launcher.sh in the directory written above
> because it was the only place I could find it. Should I use something else
> for "bigjob_agent"? Also, I'm not sure whether ssh://hotel.futuregrid.org or
> gram://hotel.futuregrid.org/jobmanager-pbs should be used.
>
> This is the output that keeps coming up:
>
> Current states: {'New': 2, 'Unknown': 6}
> DEBUG:root:Reschedule Thread
> DEBUG:root:Big Job: bigjob:50ec186c-22a2-11e1-9a15-002215124496:localhost
> Cores: 0/2 State: Running Terminated: False #Required Cores: 1
> DEBUG:root:Big Job:
> bigjob:51d0e1cc-22a2-11e1-9a15-002215124496:hotel.futuregrid.org Cores: 4/4
> State: Unknown Terminated: False #Required Cores: 1
> DEBUG:root:found no active resource for sub-job => (re-) queue it
> DEBUG:root:free_cores: [0, 0] total_free_cores: 0
>
>>
>> Also, the BJ part of CSA is a bit old and contains some bugs. If
>> possible, try to install BJ in userspace (as outlined on the Wiki
>> page).
>>
>> It's fixed in BigJob-0.3.31 and SVN.
>
>
> I followed the instructions on the Wiki (b. Python Packaging and Virtualenv)
> and did an update. It looks like I have BigJob-0.3.31.
>
> Thanks,
>
> Paula
>
>
> On Thu, Dec 8, 2011 at 3:07 PM, Andre Luckow <aluckow at cct.lsu.edu> wrote:
>>
>> Hi Paula,
>> there is a small bug in the example:
>>
>> Please change number_nodes to number_of_processes in line 51:
>>
>> resource_list.append( {"resource_url" : "fork://localhost/",
>> "number_of_processes" : "2", "allocation" : "myAllocation", ...
>>
>> Also, the BJ part of CSA is a bit old and contains some bugs. If
>> possible, try to install BJ in userspace (as outlined on the Wiki
>> page).
>>
>> It's fixed in BigJob-0.3.31 and SVN.
>>
>> Best,
>> Andre
>>
>>
>> On Thu, Dec 8, 2011 at 9:42 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu>
>> wrote:
>> > Hi,
>> >
>> > I'm trying to run example_manyjob_affinity.py on Sierra, but it doesn't
>> > complete (see my output below). I'm submitting the job from Sierra and
>> > would
>> > like to use Hotel as my second machine. Could you please advise me on
>> > how to
>> > proceed?
>> >
>> > In addition, is there anything wrong with the cct advert service? I
>> > could
>> > run example_local_single.py, but now it's not working.
>> >
>> > Thanks,
>> >
>> > Paula
>> >
>> > ManyJob load test with 8 jobs.
>> > Create manyjob service
>> > DEBUG:root:start bigjob at: fork://localhost/
>> > DEBUG:root:init BigJob w/: advert://advert.cct.lsu.edu:8080
>> > DEBUG:root:['/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff/../',
>> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/virtualenv-1.6.4-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages',
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python27.zip',
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7',
>> >
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/plat-linux2',
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-tk',
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-old',
>> >
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-dynload',
>> >
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/site-packages',
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic',
>> >
>> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic']
>> > DEBUG:root:Utilizing ADVERT Backend
>> > DEBUG:root:Parsing URL: advert://advert.cct.lsu.edu:8080
>> > DEBUG:root:Server: advert.cct.lsu.edu Port 8080 server_connect_url: None
>> > DEBUG:root:initialized BigJob:
>> > bigjob:6638ce9e-21d6-11e1-ac76-002215124496
>> > Traceback (most recent call last):
>> >   File "example_manyjob_affinity.py", line 61, in <module>
>> >     mjs = many_job_affinity_service(resource_list, COORDINATION_URL)
>> >   File
>> >
>> > "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job_affinity.py",
>> > line 19, in __init__
>> >     super(many_job_affinity_service, self).__init__(bigjob_list,
>> > advert_host)
>> >   File
>> >
>> > "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py",
>> > line 59, in __init__
>> >     self.__init_bigjobs()
>> >   File
>> >
>> > "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py",
>> > line 74, in __init_bigjobs
>> >     self.__start_bigjob(i)
>> >   File
>> >
>> > "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py",
>> > line 98, in __start_bigjob
>> >     bj_dict["number_of_processes"],
>> > KeyError: 'number_of_processes'
>> > Cancel Pilot Job
>> > stop pilot job:
>> > DEBUG:root:create advert entry: advert://advert.cct.lsu.edu:8080/
>> > DEBUG:root:update state of pilot job to: Done Stopped: True
>> > DEBUG:root:delete pilot job:
>> >
>> > _______________________________________________
>> > Bigjob-users mailing list
>> > Bigjob-users at mail.cct.lsu.edu
>> > https://mail.cct.lsu.edu/mailman/listinfo/bigjob-users
>> >
>
>

