[Bigjob-users] Attempt to run example_manyjob_affinity.py

pradeep kumar Mantha pradeepm66 at gmail.com
Sat Dec 10 11:58:26 CST 2011


Hi Paula!

Could you please enable passwordless login to your localhost and retry?
If it still doesn't work, please send the example and agent log files
created in the working directory.
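
The "Password:" prompt in the middle of your output suggests that the
local submission is waiting for a password, which would explain why the
state never leaves "Running". A minimal sketch for checking this from
Python, assuming an OpenSSH client is on the PATH (the function names
below are made up for illustration and are not part of BigJob):

```python
import subprocess


def passwordless_ssh_command(host="localhost", timeout_s=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                   # never ask for a password or passphrase
        "-o", "ConnectTimeout=%d" % timeout_s,   # give up quickly if the host is unreachable
        host,
        "true",                                  # trivial remote command; only the exit code matters
    ]


def passwordless_login_works(host="localhost", timeout_s=5, runner=subprocess.call):
    """Return True if ssh to `host` succeeds without any interactive prompt.

    `runner` is a hypothetical hook so the check can be exercised without
    actually opening a connection; by default it is subprocess.call.
    """
    return runner(passwordless_ssh_command(host, timeout_s)) == 0
```

Calling passwordless_login_works() should return True once key-based
login is set up, which is typically done by generating a key pair with
ssh-keygen and appending the public key to ~/.ssh/authorized_keys. A
False result can also mean that the host key for localhost has not been
accepted yet, so it is worth running a plain "ssh localhost" once by hand.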



On Sat, Dec 10, 2011 at 11:52 AM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:

> Hi Andre,
>
> I installed BigJob on Hotel and ran example_local_single.py. I'm getting
> the output below. Any idea why it keeps "running"?
>
> Thanks!
>
> Paula
>
> Start Pilot Job/BigJob at: fork://localhost
>
> DEBUG:root:init BigJob w/: advert://advert.cct.lsu.edu:8080
> DEBUG:root:Load Advert Coordination
> DEBUG:root:['/gpfs/home/paulasoo/examples/../',
> '/gpfs/home/paulasoo/examples',
> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/pip-1.0.2-py2.7.egg',
> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/BigJob-0.3.31-py2.7.egg',
> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg',
> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/virtualenv-1.6.4-py2.7.egg',
> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages',
> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages',
> '/gpfs/home/paulasoo/examples',
> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python27.zip',
> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7',
> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/plat-linux2',
> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-tk',
> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-old',
> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-dynload',
> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/site-packages',
> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/BigJob-0.3.31-py2.7.egg/bigjob']
>
> DEBUG:root:Utilizing ADVERT Backend
> DEBUG:root:Parsing URL: advert://advert.cct.lsu.edu:8080
> DEBUG:root:Server: advert.cct.lsu.edu Port 8080 server_connect_url: None
> DEBUG:root:Initialized Coordination to: advert://advert.cct.lsu.edu:8080/(DB: )
> DEBUG:root:initialized BigJob: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac
> DEBUG:root:create pilot job entry on backend server: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost
> DEBUG:root:create advert entry: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost?
> DEBUG:root:update state of pilot job to: Unknown Stopped: False
> DEBUG:root:set pilot state to: Unknown
> Adaptor specific modifications: fork
> Working directory: /gpfs/home/paulasoo/examples/agent
> use standard proxy
> Submit pilot job to: fork://localhost/
> Pilot Job/BigJob URL: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost State: Running
> DEBUG:root:add subjob to queue of PJ: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost
> DEBUG:root:create dictionary for job description. Job-URL: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost:jobs:91209314-2355-11e1-9985-00215ec9e3ac
> DEBUG:root:Job URL: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac?, Job Description URL: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac/job-description?
> DEBUG:root:initialized advert entry for job: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac?
> DEBUG:root:Set state of job: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac? to: Unknown
> state: Unknown
> state: Unknown
> Password: state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
> state: Running
>
>
> On Fri, Dec 9, 2011 at 3:56 PM, Andre Luckow <aluckow at cct.lsu.edu> wrote:
>
>> Hi Paula,
>> for launching remote jobs you certainly want to use Globus, i.e. GRAM
>> URLs. The bigjob_agent key in the resource_list does not need to be
>> set anymore!
>>
>> Please try to submit the job first from the hotel front node. Once you
>> are sure that Globus etc. is working you can do a remote submission.
>> E.g., you need to make sure that you use the right BJ version on hotel
>> as well (i.e., you need to double-check your PYTHONPATH etc.)
>>
>> Hope that helps.
>>
>> Best,
>> Andre
>>
>> On Fri, Dec 9, 2011 at 9:21 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
>> > Hi Andre,
>> >
>> >>
>> >> Please change number_nodes to number_of_processes in line 51:
>> >>
>> >> resource_list.append( {"resource_url" : "fork://localhost/",
>> >> "number_of_processes" : "2", "allocation" : "myAllocation", ...
>> >
>> >
>> > I changed number_nodes to number_of_processes and it now runs. However, it
>> > seems like it is in an infinite loop, perhaps because my second machine is
>> > not configured properly? This is the configuration for my second machine:
>> >
>> > resource_list.append( {"resource_url" : "ssh://hotel.futuregrid.org",
>> > "number_of_processes" : "4", "allocation" : "myAllocation", "queue" : "workq",
>> > "bigjob_agent": ("/N/soft/SAGA/saga/1.5.3/gcc-4.1.2/lib/python2.7.1/site-packages/bigjob/bigjob_agent_launcher.sh"),
>> > "working_directory": (os.getcwd() + "/agent"), "walltime":10,
>> > "affinity" : "affinity1"})
>> >
>> > Here, I used the bigjob_agent_launcher.sh in the directory written above
>> > because it was the only place I could find it. Should I use something else
>> > for "bigjob_agent"? Also, I'm not sure whether ssh://hotel.futuregrid.org or
>> > gram://hotel.futuregrid.org/jobmanager-pbs should be used.
>> >
>> > This is the output that keeps coming up:
>> >
>> > Current states: {'New': 2, 'Unknown': 6}
>> > DEBUG:root:Reschedule Thread
>> > DEBUG:root:Big Job: bigjob:50ec186c-22a2-11e1-9a15-002215124496:localhost Cores: 0/2 State: Running Terminated: False #Required Cores: 1
>> > DEBUG:root:Big Job: bigjob:51d0e1cc-22a2-11e1-9a15-002215124496:hotel.futuregrid.org Cores: 4/4 State: Unknown Terminated: False #Required Cores: 1
>> > DEBUG:root:found no active resource for sub-job => (re-) queue it
>> > DEBUG:root:free_cores: [0, 0] total_free_cores: 0
>> >
>> >>
>> >> Also, the BJ part of CSA is a bit old and contains some bugs. If
>> >> possible, try to install BJ in userspace (as outlined on the Wiki
>> >> page).
>> >>
>> >> It's fixed in BigJob-0.3.31 and SVN.
>> >
>> >
>> > I followed the instructions on the Wiki (b. Python Packaging and Virtualenv)
>> > and did an update. It looks like I have BigJob-0.3.31.
>> >
>> > Thanks,
>> >
>> > Paula
>> >
>> >
>> > On Thu, Dec 8, 2011 at 3:07 PM, Andre Luckow <aluckow at cct.lsu.edu> wrote:
>> >>
>> >> Hi Paula,
>> >> there is a small bug in the example:
>> >>
>> >> Please change number_nodes to number_of_processes in line 51:
>> >>
>> >> resource_list.append( {"resource_url" : "fork://localhost/",
>> >> "number_of_processes" : "2", "allocation" : "myAllocation", ...
>> >>
>> >> Also, the BJ part of CSA is a bit old and contains some bugs. If
>> >> possible, try to install BJ in userspace (as outlined on the Wiki
>> >> page).
>> >>
>> >> It's fixed in BigJob-0.3.31 and SVN.
>> >>
>> >> Best,
>> >> Andre
>> >>
>> >>
>> >> On Thu, Dec 8, 2011 at 9:42 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
>> >> > Hi,
>> >> >
>> >> > I'm trying to run example_manyjob_affinity.py on Sierra, but it doesn't
>> >> > complete (see my output below). I'm submitting the job from Sierra and would
>> >> > like to use Hotel as my second machine. Could you please advise me on how to
>> >> > proceed?
>> >> >
>> >> > In addition, is there anything wrong with the cct advert service? I could
>> >> > run example_local_single.py, but now it's not working.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Paula
>> >> >
>> >> > ManyJob load test with 8 jobs.
>> >> > Create manyjob service
>> >> > DEBUG:root:start bigjob at: fork://localhost/
>> >> > DEBUG:root:init BigJob w/: advert://advert.cct.lsu.edu:8080
>> >> > DEBUG:root:['/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff/../',
>> >> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/virtualenv-1.6.4-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>> >> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python27.zip',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/plat-linux2',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-tk',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-old',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-dynload',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/site-packages',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic',
>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic']
>> >> > DEBUG:root:Utilizing ADVERT Backend
>> >> > DEBUG:root:Parsing URL: advert://advert.cct.lsu.edu:8080
>> >> > DEBUG:root:Server: advert.cct.lsu.edu Port 8080 server_connect_url: None
>> >> > DEBUG:root:initialized BigJob: bigjob:6638ce9e-21d6-11e1-ac76-002215124496
>> >> > Traceback (most recent call last):
>> >> >   File "example_manyjob_affinity.py", line 61, in <module>
>> >> >     mjs = many_job_affinity_service(resource_list, COORDINATION_URL)
>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job_affinity.py", line 19, in __init__
>> >> >     super(many_job_affinity_service, self).__init__(bigjob_list, advert_host)
>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 59, in __init__
>> >> >     self.__init_bigjobs()
>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 74, in __init_bigjobs
>> >> >     self.__start_bigjob(i)
>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 98, in __start_bigjob
>> >> >     bj_dict["number_of_processes"],
>> >> > KeyError: 'number_of_processes'
>> >> > Cancel Pilot Job
>> >> > stop pilot job:
>> >> > DEBUG:root:create advert entry: advert://advert.cct.lsu.edu:8080/
>> >> > DEBUG:root:update state of pilot job to: Done Stopped: True
>> >> > DEBUG:root:delete pilot job:
>> >> >
>> >> > _______________________________________________
>> >> > Bigjob-users mailing list
>> >> > Bigjob-users at mail.cct.lsu.edu
>> >> > https://mail.cct.lsu.edu/mailman/listinfo/bigjob-users
>> >> >
>> >
>> >
>>
>
>