[Bigjob-users] Attempt to run example_manyjob_affinity.py

pradeep kumar Mantha pradeepm66 at gmail.com
Sat Dec 10 12:37:55 CST 2011


Hi!

Localhost is the machine you are logged on to and currently working on.
Yes, in this case it is Hotel.

Could you please confirm whether you were able to log on to the machine
without being asked for a password?

  ssh localhost
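
If you prefer a check that cannot hang on a password prompt, ssh can be run in
batch mode, where it fails instead of prompting. The following is only a
minimal sketch, assuming the OpenSSH client is on your PATH; the function names
batch_ssh_command and passwordless_login_works are made up for illustration and
are not part of BigJob:

```python
import subprocess


def batch_ssh_command(host="localhost"):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes tells the OpenSSH client never to ask for a password
    # or passphrase; ConnectTimeout keeps the check from waiting too long.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def passwordless_login_works(host="localhost"):
    """Return True if `ssh host true` succeeds without any prompt."""
    try:
        return subprocess.call(batch_ssh_command(host)) == 0
    except OSError:
        # The ssh client is not installed or not on the PATH.
        return False
```

If passwordless_login_works() returns False, the key setup for localhost
probably still needs attention before retrying the example.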

The agent files are stderr-bigjob_agent-* and stdout-bigjob_agent-* in the
$HOME/examples/agent directory.
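
To collect those files before sending them, something like the sketch below
should do. It only relies on the two file name patterns mentioned in this
thread; the helper name find_agent_logs is hypothetical, and the directory is
assumed to be the agent working directory of your run:

```python
import glob
import os


def find_agent_logs(agent_dir):
    """Return sorted paths of the agent's stdout/stderr files in agent_dir."""
    logs = []
    for pattern in ("stdout-bigjob_agent-*", "stderr-bigjob_agent-*"):
        logs.extend(glob.glob(os.path.join(agent_dir, pattern)))
    return sorted(logs)
```

For example, find_agent_logs(os.path.expanduser("~/examples/agent")) should
list the files for the run above, provided the agent was started at all.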

thanks
pradeep


On Sat, Dec 10, 2011 at 12:17 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:

> Hi Pradeep!
>
> What do you mean by localhost? If you mean a passwordless login to Hotel,
> that's what I did.
>
> Where is the agent log file? I have "advert-launcher-machines*" files in
> my $HOME and "stderr-bigjob_agent-*" and "stdout-bigjob_agent-*" files in
> $HOME/examples/agent/. I don't know which one is the log file.
>
> Thanks,
>
> Paula
>
>
> On Sat, Dec 10, 2011 at 11:58 AM, pradeep kumar Mantha <pradeepm66 at gmail.com> wrote:
>
>> Hi Paula!
>>
>> Could you please enable a passwordless login to your localhost and
>> retry?
>> If it doesn't work, please send the example and agent log files created
>> in the working directory.
>>
>>
>>
>> On Sat, Dec 10, 2011 at 11:52 AM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
>>
>>> Hi Andre,
>>>
>>> I installed BigJob on Hotel and ran example_local_single.py. I'm getting
>>> the output below. Any idea why it keeps "running"?
>>>
>>> Thanks!
>>>
>>> Paula
>>>
>>> Start Pilot Job/BigJob at: fork://localhost
>>>
>>> DEBUG:root:init BigJob w/: advert://advert.cct.lsu.edu:8080
>>> DEBUG:root:Load Advert Coordination
>>> DEBUG:root:['/gpfs/home/paulasoo/examples/../',
>>> '/gpfs/home/paulasoo/examples',
>>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/pip-1.0.2-py2.7.egg',
>>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/BigJob-0.3.31-py2.7.egg',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/virtualenv-1.6.4-py2.7.egg',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages',
>>> '/gpfs/home/paulasoo/examples',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python27.zip',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/plat-linux2',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-tk',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-old',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-dynload',
>>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/site-packages',
>>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/BigJob-0.3.31-py2.7.egg/bigjob']
>>>
>>> DEBUG:root:Utilizing ADVERT Backend
>>> DEBUG:root:Parsing URL: advert://advert.cct.lsu.edu:8080
>>> DEBUG:root:Server: advert.cct.lsu.edu Port 8080 server_connect_url: None
>>> DEBUG:root:Initialized Coordination to: advert://advert.cct.lsu.edu:8080/ (DB: )
>>> DEBUG:root:initialized BigJob: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac
>>> DEBUG:root:create pilot job entry on backend server: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost
>>> DEBUG:root:create advert entry: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost?
>>> DEBUG:root:update state of pilot job to: Unknown Stopped: False
>>> DEBUG:root:set pilot state to: Unknown
>>> Adaptor specific modifications: fork
>>> Working directory: /gpfs/home/paulasoo/examples/agent
>>> use standard proxy
>>> Submit pilot job to: fork://localhost/
>>> Pilot Job/BigJob URL: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost State: Running
>>> DEBUG:root:add subjob to queue of PJ: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost
>>> DEBUG:root:create dictionary for job description. Job-URL: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost:jobs:91209314-2355-11e1-9985-00215ec9e3ac
>>> DEBUG:root:Job URL: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac?, Job Description URL: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac/job-description?
>>> DEBUG:root:initialized advert entry for job: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac?
>>> DEBUG:root:Set state of job: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac? to: Unknown
>>> state: Unknown
>>> state: Unknown
>>> Password: state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>> state: Running
>>>
>>>
>>> On Fri, Dec 9, 2011 at 3:56 PM, Andre Luckow <aluckow at cct.lsu.edu> wrote:
>>>
>>>> Hi Paula,
>>>> for launching remote jobs you certainly want to use Globus, i.e. GRAM
>>>> URLs. The bigjob_agent key in the resource_list does not need to be
>>>> set anymore!
>>>>
>>>> Please try to submit the job first from the hotel front node. Once you
>>>> are sure that Globus etc. is working you can do a remote submission.
>>>> E.g., you need to make sure that you use the right BJ version on hotel
>>>> as well (i.e., you need to double-check your PYTHONPATH etc.)
>>>>
>>>> Hope that helps.
>>>>
>>>> Best,
>>>> Andre
>>>>
>>>> On Fri, Dec 9, 2011 at 9:21 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu>
>>>> wrote:
>>>> > Hi Andre,
>>>> >
>>>> >>
>>>> >> Please change number_nodes to number_of_processes in line 51:
>>>> >>
>>>> >> resource_list.append( {"resource_url" : "fork://localhost/",
>>>> >> "number_of_processes" : "2", "allocation" : "myAllocation", ...
>>>> >
>>>> >
>>>> > I changed number_nodes to number_of_processes and it now runs. However,
>>>> > it seems like it is in an infinite loop, perhaps because my second
>>>> > machine is not configured properly? This is the configuration for my
>>>> > second machine:
>>>> >
>>>> > resource_list.append( {"resource_url" : "ssh://hotel.futuregrid.org",
>>>> > "number_of_processes" : "4", "allocation" :  "myAllocation", "queue" :
>>>> > "workq", "bigjob_agent":
>>>> > ("/N/soft/SAGA/saga/1.5.3/gcc-4.1.2/lib/python2.7.1/site-packages/bigjob/bigjob_agent_launcher.sh"),
>>>> > "working_directory": (os.getcwd() + "/agent"), "walltime":10,
>>>> > "affinity" : "affinity1"})
>>>> >
>>>> > Here, I used the bigjob_agent_launcher.sh in the directory written above
>>>> > because it was the only place I could find it. Should I use something
>>>> > else for "bigjob_agent"? Also, I'm not sure whether
>>>> > ssh://hotel.futuregrid.org or gram://hotel.futuregrid.org/jobmanager-pbs
>>>> > should be used.
>>>> >
>>>> > This is the output that keeps coming up:
>>>> >
>>>> > Current states: {'New': 2, 'Unknown': 6}
>>>> > DEBUG:root:Reschedule Thread
>>>> > DEBUG:root:Big Job: bigjob:50ec186c-22a2-11e1-9a15-002215124496:localhost Cores: 0/2 State: Running Terminated: False #Required Cores: 1
>>>> > DEBUG:root:Big Job: bigjob:51d0e1cc-22a2-11e1-9a15-002215124496:hotel.futuregrid.org Cores: 4/4 State: Unknown Terminated: False #Required Cores: 1
>>>> > DEBUG:root:found no active resource for sub-job => (re-) queue it
>>>> > DEBUG:root:free_cores: [0, 0] total_free_cores: 0
>>>> >
>>>> >>
>>>> >> Also, the BJ part of CSA is a bit old and contains some bugs. If
>>>> >> possible, try to install BJ in userspace (as outlined on the Wiki
>>>> >> page).
>>>> >>
>>>> >> It's fixed in BigJob-0.3.31 and SVN.
>>>> >
>>>> >
>>>> > I followed the instructions on the Wiki (b. Python Packaging and
>>>> > Virtualenv) and did an update. It looks like I have BigJob-0.3.31.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Paula
>>>> >
>>>> >
>>>> > On Thu, Dec 8, 2011 at 3:07 PM, Andre Luckow <aluckow at cct.lsu.edu> wrote:
>>>> >>
>>>> >> Hi Paula,
>>>> >> there is a small bug in the example:
>>>> >>
>>>> >> Please change number_nodes to number_of_processes in line 51:
>>>> >>
>>>> >> resource_list.append( {"resource_url" : "fork://localhost/",
>>>> >> "number_of_processes" : "2", "allocation" : "myAllocation", ...
>>>> >>
>>>> >> Also, the BJ part of CSA is a bit old and contains some bugs. If
>>>> >> possible, try to install BJ in userspace (as outlined on the Wiki
>>>> >> page).
>>>> >>
>>>> >> It's fixed in BigJob-0.3.31 and SVN.
>>>> >>
>>>> >> Best,
>>>> >> Andre
>>>> >>
>>>> >>
>>>> >> On Thu, Dec 8, 2011 at 9:42 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
>>>> >> > Hi,
>>>> >> >
>>>> >> > I'm trying to run example_manyjob_affinity.py on Sierra, but it
>>>> >> > doesn't complete (see my output below). I'm submitting the job from
>>>> >> > Sierra and would like to use Hotel as my second machine. Could you
>>>> >> > please advise me on how to proceed?
>>>> >> >
>>>> >> > In addition, is there anything wrong with the cct advert service? I
>>>> >> > could run example_local_single.py, but now it's not working.
>>>> >> >
>>>> >> > Thanks,
>>>> >> >
>>>> >> > Paula
>>>> >> >
>>>> >> > ManyJob load test with 8 jobs.
>>>> >> > Create manyjob service
>>>> >> > DEBUG:root:start bigjob at: fork://localhost/
>>>> >> > DEBUG:root:init BigJob w/: advert://advert.cct.lsu.edu:8080
>>>> >> > DEBUG:root:['/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff/../',
>>>> >> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/virtualenv-1.6.4-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>>>> >> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python27.zip',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/plat-linux2',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-tk',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-old',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-dynload',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/site-packages',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic',
>>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic']
>>>> >> > DEBUG:root:Utilizing ADVERT Backend
>>>> >> > DEBUG:root:Parsing URL: advert://advert.cct.lsu.edu:8080
>>>> >> > DEBUG:root:Server: advert.cct.lsu.edu Port 8080 server_connect_url: None
>>>> >> > DEBUG:root:initialized BigJob: bigjob:6638ce9e-21d6-11e1-ac76-002215124496
>>>> >> > Traceback (most recent call last):
>>>> >> >   File "example_manyjob_affinity.py", line 61, in <module>
>>>> >> >     mjs = many_job_affinity_service(resource_list, COORDINATION_URL)
>>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job_affinity.py", line 19, in __init__
>>>> >> >     super(many_job_affinity_service, self).__init__(bigjob_list, advert_host)
>>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 59, in __init__
>>>> >> >     self.__init_bigjobs()
>>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 74, in __init_bigjobs
>>>> >> >     self.__start_bigjob(i)
>>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 98, in __start_bigjob
>>>> >> >     bj_dict["number_of_processes"],
>>>> >> > KeyError: 'number_of_processes'
>>>> >> > Cancel Pilot Job
>>>> >> > stop pilot job:
>>>> >> > DEBUG:root:create advert entry: advert://advert.cct.lsu.edu:8080/
>>>> >> > DEBUG:root:update state of pilot job to: Done Stopped: True
>>>> >> > DEBUG:root:delete pilot job:
>>>> >> >
>>>> >> > _______________________________________________
>>>> >> > Bigjob-users mailing list
>>>> >> > Bigjob-users at mail.cct.lsu.edu
>>>> >> > https://mail.cct.lsu.edu/mailman/listinfo/bigjob-users
>>>> >> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>>
>>
>