[Bigjob-users] Attempt to run example_manyjob_affinity.py

Paula Sanematsu psanem1 at tigers.lsu.edu
Sat Dec 10 12:17:26 CST 2011


Hi Pradeep!

What do you mean by localhost? If you mean a passwordless login to Hotel,
that's what I did.
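The "Password:" prompt in the log further down suggests that the fork://localhost pilot logs in to the submitting machine itself, so the passwordless login has to work for localhost and not only for Hotel. A minimal sketch for checking this from Python, assuming the OpenSSH client is on PATH; the helper names are hypothetical and not part of BigJob.

```python
import shutil
import subprocess


def batch_ssh_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes makes OpenSSH exit with an error rather than ask for a
    # password; ConnectTimeout keeps the check from hanging on a dead host.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def has_passwordless_login(host="localhost"):
    """Return True if `ssh host true` succeeds without any interactive prompt."""
    if shutil.which("ssh") is None:
        return False  # no ssh client available, so the check cannot pass
    result = subprocess.run(
        batch_ssh_command(host),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling has_passwordless_login("localhost") on the submitting node should return True before the fork://localhost example is retried.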

Where is the agent log file? I have "advert-launcher-machines*" files in my
$HOME and "stderr-bigjob_agent-*" and "stdout-bigjob_agent-*" files in
$HOME/examples/agent/. I don't know which one is the log file.
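Assuming the agent log Pradeep asks for is the pair of stdout-bigjob_agent-* and stderr-bigjob_agent-* files in the pilot's working directory (reported in the log below as /gpfs/home/paulasoo/examples/agent), a small hypothetical helper can pick the most recent pair to attach:

```python
import glob
import os


def latest_agent_logs(working_dir):
    """Return the newest stdout/stderr file of the BigJob agent in working_dir.

    Assumes the agent writes files named stdout-bigjob_agent-* and
    stderr-bigjob_agent-*, as seen in the working directory in this thread.
    """
    newest = {}
    for stream in ("stdout", "stderr"):
        pattern = os.path.join(working_dir, stream + "-bigjob_agent-*")
        matches = glob.glob(pattern)
        # Pick the most recently modified file, or None if there is none.
        newest[stream] = max(matches, key=os.path.getmtime) if matches else None
    return newest


if __name__ == "__main__":
    agent_dir = os.path.join(os.path.expanduser("~"), "examples", "agent")
    for stream, path in latest_agent_logs(agent_dir).items():
        print(stream, "->", path)
```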

Thanks,

Paula

On Sat, Dec 10, 2011 at 11:58 AM, pradeep kumar Mantha <pradeepm66 at gmail.com> wrote:

> Hi Paula!
>
> Could you please enable a passwordless login to your localhost and retry?
> If it doesn't work, please send the example and agent log files created in
> the working directory.
>
>
>
> On Sat, Dec 10, 2011 at 11:52 AM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
>
>> Hi Andre,
>>
>> I installed BigJob on Hotel and ran example_local_single.py. I'm getting
>> the output below. Any idea why it keeps "running"?
>>
>> Thanks!
>>
>> Paula
>>
>> Start Pilot Job/BigJob at: fork://localhost
>>
>> DEBUG:root:init BigJob w/: advert://advert.cct.lsu.edu:8080
>> DEBUG:root:Load Advert Coordination
>> DEBUG:root:['/gpfs/home/paulasoo/examples/../',
>> '/gpfs/home/paulasoo/examples',
>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/pip-1.0.2-py2.7.egg',
>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/BigJob-0.3.31-py2.7.egg',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/virtualenv-1.6.4-py2.7.egg',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages',
>> '/gpfs/home/paulasoo/examples',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python27.zip',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/plat-linux2',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-tk',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-old',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-dynload',
>> '/gpfs/software/x86_64/el5/hotel/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/site-packages',
>> '/gpfs/home/paulasoo/.bigjob/python/lib/python2.7/site-packages/BigJob-0.3.31-py2.7.egg/bigjob']
>>
>> DEBUG:root:Utilizing ADVERT Backend
>> DEBUG:root:Parsing URL: advert://advert.cct.lsu.edu:8080
>> DEBUG:root:Server: advert.cct.lsu.edu Port 8080 server_connect_url: None
>> DEBUG:root:Initialized Coordination to: advert://advert.cct.lsu.edu:8080/(DB: )
>> DEBUG:root:initialized BigJob: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac
>> DEBUG:root:create pilot job entry on backend server: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost
>> DEBUG:root:create advert entry: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost?
>> DEBUG:root:update state of pilot job to: Unknown Stopped: False
>> DEBUG:root:set pilot state to: Unknown
>> Adaptor specific modifications: fork
>> Working directory: /gpfs/home/paulasoo/examples/agent
>> use standard proxy
>> Submit pilot job to: fork://localhost/
>> Pilot Job/BigJob URL: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost State: Running
>> DEBUG:root:add subjob to queue of PJ: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost
>> DEBUG:root:create dictionary for job description. Job-URL: bigjob:8f1b52fc-2355-11e1-9985-00215ec9e3ac:localhost:jobs:91209314-2355-11e1-9985-00215ec9e3ac
>> DEBUG:root:Job URL: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac?, Job Description URL: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac/job-description?
>> DEBUG:root:initialized advert entry for job: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac?
>> DEBUG:root:Set state of job: advert://advert.cct.lsu.edu:8080//bigjob/8f1b52fc-2355-11e1-9985-00215ec9e3ac/localhost/jobs/91209314-2355-11e1-9985-00215ec9e3ac? to: Unknown
>> state: Unknown
>> state: Unknown
>> Password: state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
>> state: Running
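The example keeps printing the sub-job state because it polls until the job reaches a final state; if the sub-job never leaves Running (for instance because the agent cannot start it), the loop never ends. A bounded wait makes that failure visible. This is a sketch: get_state is a hypothetical stand-in for whatever call the example uses to read the sub-job state, and the set of final states is an assumption ("Done" appears in the logs in this thread; "Failed" and "Canceled" are guesses).

```python
import time

# Assumed terminal states; only "Done" is confirmed by the logs in this thread.
FINAL_STATES = {"Done", "Failed", "Canceled"}


def wait_for_final_state(get_state, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Poll get_state() until it returns a final state or timeout seconds pass.

    get_state is a hypothetical zero-argument callable returning a state string.
    Returns the final state; raises TimeoutError if none is reached in time.
    The elapsed time counts only the sleeps, not time spent inside get_state.
    """
    waited = 0.0
    while True:
        state = get_state()
        print("state:", state)
        if state in FINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError("sub-job still %r after %.0f s" % (state, waited))
        sleep(interval)
        waited += interval
```

The sleep parameter is injectable so the loop can be exercised without real delays.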
>>
>>
>> On Fri, Dec 9, 2011 at 3:56 PM, Andre Luckow <aluckow at cct.lsu.edu> wrote:
>>
>>> Hi Paula,
>>> for launching remote jobs you certainly want to use Globus, i.e. GRAM
>>> URLs. The bigjob_agent key in the resource_list does not need to be
>>> set anymore!
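Following this advice, the Hotel entry from Paula's configuration would switch to a GRAM URL and drop the bigjob_agent key. A sketch only: the URL gram://hotel.futuregrid.org/jobmanager-pbs is the one Paula mentions, and whether Hotel's GRAM endpoint uses exactly this jobmanager name is an assumption to verify; the other values are copied from her entry.

```python
import os

resource_list = []

# Hotel entry submitted through Globus GRAM instead of ssh://.
# The jobmanager name "jobmanager-pbs" is an assumption to verify on Hotel.
resource_list.append({
    "resource_url": "gram://hotel.futuregrid.org/jobmanager-pbs",
    "number_of_processes": "4",
    "allocation": "myAllocation",
    "queue": "workq",
    # No "bigjob_agent" key: per Andre, it no longer needs to be set.
    "working_directory": os.getcwd() + "/agent",
    "walltime": 10,
    "affinity": "affinity1",
})
```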
>>>
>>> Please try to submit the job from the Hotel front node first. Once you
>>> are sure that Globus etc. is working, you can do a remote submission.
>>> E.g., you need to make sure that you use the right BJ version on Hotel
>>> as well (i.e., you need to double-check your PYTHONPATH etc.).
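Paula's sys.path on Hotel, printed in her December 10 log, contains both BigJob-0.3.31 (from ~/.bigjob) and BigJob-0.3.2 (from the SAGA install), so it is worth confirming which version is on the path first. A hypothetical helper that lists BigJob egg entries and their positions; the naming pattern BigJob-<version>-py2.7.egg is taken from the logs in this thread.

```python
import re
import sys

# Matches egg directories such as BigJob-0.3.31-py2.7.egg (naming seen in this thread).
EGG_RE = re.compile(r"BigJob-([0-9][0-9.]*[0-9])-py[0-9.]+\.egg")


def bigjob_eggs(paths=None):
    """Return (index, version, path) for each path entry naming a BigJob egg.

    Only entries whose last component is the egg itself are reported, so
    subdirectories like .../BigJob-0.3.2-py2.7.egg/bigjob are skipped.
    """
    if paths is None:
        paths = sys.path
    found = []
    for index, entry in enumerate(paths):
        last = entry.rstrip("/").rsplit("/", 1)[-1]
        match = EGG_RE.fullmatch(last)
        if match:
            found.append((index, match.group(1), entry))
    return found
```

If more than one version shows up, the entry with the lowest index is normally the one imported; running python -c "import bigjob; print(bigjob.__file__)" shows the module that was actually loaded.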
>>>
>>> Hope that helps.
>>>
>>> Best,
>>> Andre
>>>
>>> On Fri, Dec 9, 2011 at 9:21 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
>>> > Hi Andre,
>>> >
>>> >>
>>> >> Please change number_nodes to number_of_processes in line 51:
>>> >>
>>> >> resource_list.append( {"resource_url" : "fork://localhost/",
>>> >> "number_of_processes" : "2", "allocation" : "myAllocation", ...
>>> >
>>> >
>>> > I changed number_nodes to number_of_processes and it now runs. However, it
>>> > seems to be stuck in an infinite loop, perhaps because my second machine is
>>> > not configured properly? This is the configuration for my second machine:
>>> >
>>> > resource_list.append( {"resource_url" : "ssh://hotel.futuregrid.org",
>>> > "number_of_processes" : "4", "allocation" : "myAllocation",
>>> > "queue" : "workq",
>>> > "bigjob_agent" : ("/N/soft/SAGA/saga/1.5.3/gcc-4.1.2/lib/python2.7.1/site-packages/bigjob/bigjob_agent_launcher.sh"),
>>> > "working_directory" : (os.getcwd() + "/agent"), "walltime" : 10,
>>> > "affinity" : "affinity1"})
>>> >
>>> > Here, I used the bigjob_agent_launcher.sh from the directory shown above
>>> > because it was the only place I could find it. Should I use something else
>>> > for "bigjob_agent"? Also, I'm not sure whether ssh://hotel.futuregrid.org or
>>> > gram://hotel.futuregrid.org/jobmanager-pbs should be used.
>>> >
>>> > This is the output that keeps coming up:
>>> >
>>> > Current states: {'New': 2, 'Unknown': 6}
>>> > DEBUG:root:Reschedule Thread
>>> > DEBUG:root:Big Job: bigjob:50ec186c-22a2-11e1-9a15-002215124496:localhost Cores: 0/2 State: Running Terminated: False #Required Cores: 1
>>> > DEBUG:root:Big Job: bigjob:51d0e1cc-22a2-11e1-9a15-002215124496:hotel.futuregrid.org Cores: 4/4 State: Unknown Terminated: False #Required Cores: 1
>>> > DEBUG:root:found no active resource for sub-job => (re-) queue it
>>> > DEBUG:root:free_cores: [0, 0] total_free_cores: 0
>>> >
>>> >>
>>> >> Also, the BJ part of CSA is a bit old and contains some bugs. If
>>> >> possible, try to install BJ in userspace (as outlined on the Wiki
>>> >> page).
>>> >>
>>> >> It's fixed in BigJob-0.3.31 and SVN.
>>> >
>>> >
>>> > I followed the instructions on the Wiki (b. Python Packaging and Virtualenv)
>>> > and did an update. It looks like I have BigJob-0.3.31.
>>> >
>>> > Thanks,
>>> >
>>> > Paula
>>> >
>>> >
>>> > On Thu, Dec 8, 2011 at 3:07 PM, Andre Luckow <aluckow at cct.lsu.edu> wrote:
>>> >>
>>> >> Hi Paula,
>>> >> there is a small bug in the example:
>>> >>
>>> >> Please change number_nodes to number_of_processes in line 51:
>>> >>
>>> >> resource_list.append( {"resource_url" : "fork://localhost/",
>>> >> "number_of_processes" : "2", "allocation" : "myAllocation", ...
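The traceback in Paula's first message ends in KeyError: 'number_of_processes' because many_job.py reads bj_dict["number_of_processes"] from each resource_list entry. A hypothetical pre-check (not part of BigJob) can report missing keys before the service is created. Only number_of_processes is confirmed as required by that traceback; treating resource_url as required is an assumption.

```python
# Keys each resource_list entry is assumed to need; "number_of_processes" is the
# one the KeyError in this thread shows many_job.py reading.
REQUIRED_KEYS = ("resource_url", "number_of_processes")


def missing_keys(resource_list):
    """Map entry index -> list of required keys absent from that entry."""
    problems = {}
    for index, entry in enumerate(resource_list):
        absent = [key for key in REQUIRED_KEYS if key not in entry]
        if absent:
            problems[index] = absent
    return problems


# The localhost entry before and after the fix (remaining keys omitted).
broken = [{"resource_url": "fork://localhost/", "number_nodes": "2",
           "allocation": "myAllocation"}]
fixed = [{"resource_url": "fork://localhost/", "number_of_processes": "2",
          "allocation": "myAllocation"}]

print(missing_keys(broken))  # {0: ['number_of_processes']}
print(missing_keys(fixed))   # {}
```

Running such a check just before many_job_affinity_service(resource_list, COORDINATION_URL) turns the deep traceback into a one-line message naming the offending entry.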
>>> >>
>>> >> Also, the BJ part of CSA is a bit old and contains some bugs. If
>>> >> possible, try to install BJ in userspace (as outlined on the Wiki
>>> >> page).
>>> >>
>>> >> It's fixed in BigJob-0.3.31 and SVN.
>>> >>
>>> >> Best,
>>> >> Andre
>>> >>
>>> >>
>>> >> On Thu, Dec 8, 2011 at 9:42 PM, Paula Sanematsu <psanem1 at tigers.lsu.edu> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I'm trying to run example_manyjob_affinity.py on Sierra, but it doesn't
>>> >> > complete (see my output below). I'm submitting the job from Sierra and
>>> >> > would like to use Hotel as my second machine. Could you please advise me
>>> >> > on how to proceed?
>>> >> >
>>> >> > In addition, is there anything wrong with the cct advert service? I could
>>> >> > run example_local_single.py, but now it's not working.
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Paula
>>> >> >
>>> >> > ManyJob load test with 8 jobs.
>>> >> > Create manyjob service
>>> >> > DEBUG:root:start bigjob at: fork://localhost/
>>> >> > DEBUG:root:init BigJob w/: advert://advert.cct.lsu.edu:8080
>>> >> > DEBUG:root:['/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff/../',
>>> >> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/virtualenv-1.6.4-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>>> >> > '/N/u/paulasoo/HW06_E3/examples/manySJ_2BJ_diff',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python27.zip',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/plat-linux2',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-tk',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-old',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/lib-dynload',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/python2.7/site-packages',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>>> >> > '/N/soft/SAGA/external/python/2.7.1/gcc-4.1.2/lib/2.7/site-packages',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic',
>>> >> > '/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic']
>>> >> > DEBUG:root:Utilizing ADVERT Backend
>>> >> > DEBUG:root:Parsing URL: advert://advert.cct.lsu.edu:8080
>>> >> > DEBUG:root:Server: advert.cct.lsu.edu Port 8080 server_connect_url: None
>>> >> > DEBUG:root:initialized BigJob: bigjob:6638ce9e-21d6-11e1-ac76-002215124496
>>> >> > Traceback (most recent call last):
>>> >> >   File "example_manyjob_affinity.py", line 61, in <module>
>>> >> >     mjs = many_job_affinity_service(resource_list, COORDINATION_URL)
>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job_affinity.py", line 19, in __init__
>>> >> >     super(many_job_affinity_service, self).__init__(bigjob_list, advert_host)
>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 59, in __init__
>>> >> >     self.__init_bigjobs()
>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 74, in __init_bigjobs
>>> >> >     self.__start_bigjob(i)
>>> >> >   File "/N/soft/SAGA/saga/1.6/gcc-4.1.2/lib/python2.7/site-packages/BigJob-0.3.2-py2.7.egg/bigjob_dynamic/many_job.py", line 98, in __start_bigjob
>>> >> >     bj_dict["number_of_processes"],
>>> >> > KeyError: 'number_of_processes'
>>> >> > Cancel Pilot Job
>>> >> > stop pilot job:
>>> >> > DEBUG:root:create advert entry: advert://advert.cct.lsu.edu:8080/
>>> >> > DEBUG:root:update state of pilot job to: Done Stopped: True
>>> >> > DEBUG:root:delete pilot job:
>>> >> >
>>> >> > _______________________________________________
>>> >> > Bigjob-users mailing list
>>> >> > Bigjob-users at mail.cct.lsu.edu
>>> >> > https://mail.cct.lsu.edu/mailman/listinfo/bigjob-users
>>> >> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>