[Bigjob-users] BigJob - Remote Job Submission Error!

Sai Saripalli ssarip1 at tigers.lsu.edu
Mon Feb 6 15:23:46 CST 2012


Hi All,

I was able to run BigJob on a remote machine from sierra (e.g. from sierra
on india), but not from queenbee. I get the error below when I run
example_fg_single.py. The BigJob directory is created on the remote machine,
but the corresponding subjob directory is not. These are the parameters in
my Python script:

    queue=None # if None default queue is used
    project=None # if None default allocation is used
    walltime=10
    processes_per_node=8
    number_nodes = 1
    workingdirectory="/N/u/ssarip1/agent"
    #workingdirectory="/home/ssaripal/bigjob_examples/examples" # working directory for agent
    userproxy=None # userproxy (not supported yet due to context issue w/ SAGA)

COORDINATION_URL = "redis://cyder.cct.lsu.edu:2525"
lrms_url = "pbs-ssh://ssarip1@india.futuregrid.org" # resource url to run the jobs on localhost
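
As a sanity check on the two URLs above, the following minimal sketch splits each one into scheme, user, host and port using only the Python 3 standard library. It is not BigJob's own URL handling; the helper name describe_url is hypothetical and only illustrates what each URL would be expected to resolve to.

```python
from urllib.parse import urlsplit

# Values copied from the script parameters above.
COORDINATION_URL = "redis://cyder.cct.lsu.edu:2525"
LRMS_URL = "pbs-ssh://ssarip1@india.futuregrid.org"


def describe_url(url):
    """Return (scheme, user, host, port); parts absent from the URL are None.

    Hypothetical helper for illustration, not part of BigJob.
    """
    parts = urlsplit(url)
    return parts.scheme, parts.username, parts.hostname, parts.port


if __name__ == "__main__":
    for url in (COORDINATION_URL, LRMS_URL):
        print(url, "->", describe_url(url))
```

If the user or host printed here differs from the account that works from sierra, that difference is a reasonable first thing to check.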



Error:
(python)[ssaripal@qb3 examples]$ python example_fg_single.py
02/06/2012 03:11:30 PM - bigjob - DEBUG - Loading BigJob version: 0.4.33
02/06/2012 03:11:30 PM - bigjob - DEBUG - read configfile:
/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/BigJob-0.4.33-py2.7.egg/bigjob/../bigjob.conf
02/06/2012 03:11:30 PM - bigjob - DEBUG - Using SAGA C++/Python.
Start Pilot Job/BigJob at: pbs-ssh://ssarip1@india.futuregrid.org
02/06/2012 03:11:31 PM - bigjob - DEBUG - init BigJob w/: redis://cyder.cct.lsu.edu:2525
02/06/2012 03:11:31 PM - root - DEBUG -
['/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/BigJob-0.4.33-py2.7.egg/coordination/../ext/redis-2.4.9/',
'/home/ssaripal/bigjob_examples/examples/../',
'/home/ssaripal/bigjob_examples/examples',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/pip-1.0.2-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/BigJob-0.4.33-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/bliss-0.1.17-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/virtualenv-1.7-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/uuid-1.30-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/paramiko_on_pypi-1.7.6-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/openssh_wrapper-0.2-py2.7.egg',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/pycrypto_on_pypi-2.3-py2.7-linux-x86_64.egg',
'/project/tg_csa/saga/saga/1.5.3/gcc-3.4.6/lib/python2.7.1/site-packages',
'/project/tg_csa/saga/external/python/2.7.1/gcc-3.4.6/lib/python2.7/site-packages',
'/home/ssaripal/bigjob_examples/examples/PYTHONPATH=/usr/local/packages/python/2.6.4/gcc-4.3.2.new/bin/python2.6',
'/usr/local/packages/saga/1.5.3/saga-bindings-python-0.9.0',
'/usr/local/packages/saga/1.5.3/lib/python2.3.4/site-packages',
'/home/ssaripal/bigjob_examples/examples',
'/home/packages/teragrid/pacman-3.26-r1/src',
'/home/ssaripal/.bigjob/python/lib/python27.zip',
'/home/ssaripal/.bigjob/python/lib/python2.7',
'/home/ssaripal/.bigjob/python/lib/python2.7/plat-linux2',
'/home/ssaripal/.bigjob/python/lib/python2.7/lib-tk',
'/home/ssaripal/.bigjob/python/lib/python2.7/lib-old',
'/home/ssaripal/.bigjob/python/lib/python2.7/lib-dynload',
'/project/tg_csa/saga/external/python/2.7.1/gcc-3.4.6/lib/python2.7',
'/project/tg_csa/saga/external/python/2.7.1/gcc-3.4.6/lib/python2.7/plat-linux2',
'/project/tg_csa/saga/external/python/2.7.1/gcc-3.4.6/lib/python2.7/lib-tk',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages',
'/home/ssaripal/.bigjob/python/lib/python2.7/site-packages/BigJob-0.4.33-py2.7.egg/bigjob']
02/06/2012 03:11:31 PM - bigjob - DEBUG - Utilizing Redis Backend
02/06/2012 03:11:31 PM - bigjob - DEBUG - Parsing URL: redis://cyder.cct.lsu.edu:2525
02/06/2012 03:11:31 PM - bigjob - DEBUG - redis:// cyder.cct.lsu.edu 2525
02/06/2012 03:11:31 PM - bigjob - DEBUG - Connect to Redis:
cyder.cct.lsu.edu Port: 2525
02/06/2012 03:11:31 PM - bigjob - DEBUG - initialized BigJob:
bigjob:bj-2426ab10-5107-11e1-857a-0060dd46c5ff
02/06/2012 03:11:31 PM - bigjob - DEBUG - create pilot job entry on backend
server: bigjob:bj-2426ab10-5107-11e1-857a-0060dd46c5ff:india.futuregrid.org
02/06/2012 03:11:31 PM - bigjob - DEBUG - update state of pilot job to:
Unknown
02/06/2012 03:11:31 PM - bigjob - DEBUG - set pilot state to: Unknown
02/06/2012 03:11:31 PM - bigjob - DEBUG - Create remote directory; scheme:
ssh://, host: india.futuregrid.org, path:
/N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff
02/06/2012 03:11:31 PM - bigjob - DEBUG - discovered user: ssarip1
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - starting thread
(client mode): 0x9f0f0b50L
02/06/2012 03:11:31 PM - paramiko.transport - INFO - Connected (version
2.0, client OpenSSH_4.6)
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - kex
algos:['diffie-hellman-group-exchange-sha256',
'diffie-hellman-group-exchange-sha1', 'diffie-hellman-group14-sha1',
'diffie-hellman-group1-sha1'] server key:['ssh-rsa', 'ssh-dss'] client
encrypt:['aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc',
'arcfour128', 'arcfour256', 'arcfour', 'aes192-cbc', 'aes256-cbc',
'rijndael-cbc@lysator.liu.se', 'aes128-ctr', 'aes192-ctr', 'aes256-ctr']
server encrypt:['aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc',
'arcfour128', 'arcfour256', 'arcfour', 'aes192-cbc', 'aes256-cbc',
'rijndael-cbc@lysator.liu.se', 'aes128-ctr', 'aes192-ctr', 'aes256-ctr']
client mac:['hmac-md5', 'hmac-sha1', 'hmac-ripemd160',
'hmac-ripemd160@openssh.com', 'hmac-sha1-96', 'hmac-md5-96'] server
mac:['hmac-md5', 'hmac-sha1', 'hmac-ripemd160', 'hmac-ripemd160@openssh.com',
'hmac-sha1-96', 'hmac-md5-96'] client compress:['none', 'zlib@openssh.com']
server compress:['none', 'zlib@openssh.com'] client lang:[''] server
lang:[''] kex follows?False
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - Ciphers agreed:
local=aes128-ctr, remote=aes128-ctr
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - using kex
diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local
aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1;
compression: local none, remote none
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - Switch to new keys ...
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - Trying discovered key
b5c71a8a8f0422d09f2b765f546f07af in /home/ssaripal/.ssh/id_rsa
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - userauth is OK
02/06/2012 03:11:31 PM - paramiko.transport - INFO - Authentication
(publickey) successful!
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - [chan 1] Max packet
in: 34816 bytes
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - [chan 1] Max packet
out: 32768 bytes
02/06/2012 03:11:31 PM - paramiko.transport - INFO - Secsh channel 1 opened.
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - [chan 1] Sesch
channel 1 request ok
02/06/2012 03:11:31 PM - paramiko.transport.sftp - INFO - [chan 1] Opened
sftp connection (server version 3)
02/06/2012 03:11:31 PM - paramiko.transport.sftp - DEBUG - [chan 1]
mkdir('/N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff', 511)
02/06/2012 03:11:31 PM - paramiko.transport.sftp - INFO - [chan 1] sftp
session closed.
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - [chan 1] EOF sent (1)
02/06/2012 03:11:31 PM - paramiko.transport - DEBUG - EOF in transport
thread
02/06/2012 03:11:31 PM - bigjob - DEBUG - Stage: None to ssh://ssarip1@india.futuregrid.org/N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff
02/06/2012 03:11:31 PM - bigjob - DEBUG - BJ Working Directory:
/N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff
02/06/2012 03:11:31 PM - bigjob - DEBUG - Adaptor specific modifications:
pbs-ssh
02/06/2012 03:11:31 PM - bigjob - DEBUG - Escape SSH
02/06/2012 03:11:31 PM - bigjob - DEBUG - "import sys
import os
import urllib
import sys
import time
import textwrap

qsub_file_name=\"bigjob_pbs_ssh\"

qsub_file = open(qsub_file_name, \"w\")
qsub_file.write(\"#PBS -l nodes=1:ppn=8\")
qsub_file.write(\"\n\")
qsub_file.write(\"#PBS -l walltime=0:10:00\")
qsub_file.write(\"\n\")
qsub_file.write(\"#PBS -o /N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff/stdout-bigjob_agent.txt\")
qsub_file.write(\"\n\")
qsub_file.write(\"#PBS -e /N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff/stderr-bigjob_agent.txt\")
qsub_file.write(\"\n\")
qsub_file.write(\"cd /N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff\")
qsub_file.write(\"\n\")
qsub_file.write(\"python -c \\\"\" + textwrap.dedent(\"\"\"import sys
import os
import urllib
import sys
import time
start_time = time.time()
home = os.environ.get(\\\\\"HOME\\\\\")
BIGJOB_AGENT_DIR= os.path.join(home, \\\\\".bigjob\\\\\")
if not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR)
BIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+\\\\\"/python/\\\\\"
BOOTSTRAP_URL=\\\\\"https://raw.github.com/drelu/BigJob/master/bootstrap/bigjob-bootstrap.py\\\\\"
BOOTSTRAP_FILE=BIGJOB_AGENT_DIR+\\\\\"/bigjob-bootstrap.py\\\\\"
#ensure that BJ in .bigjob is upfront in sys.path
sys.path.insert(0, os.getcwd() + \\\\\"/../\\\\\")
sys.path.insert(0, os.getcwd() + \\\\\"/../../\\\\\")
p = list()
for i in sys.path:
    if i.find(\\\\\".bigjob/python\\\\\")>1:
          p.insert(0, i)
for i in p: sys.path.insert(0, i)
print str(sys.path)
try: import saga
except: print \\\\\"SAGA and SAGA Python Bindings not found: BigJob only work w/ non-SAGA backends e.g. Redis, ZMQ.\\\\\";print \\\\\"Python version: \\\\\",  os.system(\\\\\"python -V\\\\\");print \\\\\"Python path: \\\\\" + str(sys.path)
try: import bigjob.bigjob_agent
except: print \\\\\"BigJob not installed. Attempting to install it.\\\\\"; opener = urllib.FancyURLopener({}); opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE); os.system(\\\\\"python \\\\\" + BOOTSTRAP_FILE + \\\\\" \\\\\" + BIGJOB_PYTHON_DIR); activate_this = BIGJOB_PYTHON_DIR+\\\\\"bin/activate_this.py\\\\\"; execfile(activate_this, dict(__file__=activate_this))
#try to import BJ once again
import bigjob.bigjob_agent
# execute bj agent
args = list()
args.append(\\\\\"bigjob_agent.py\\\\\")
args.append(\\\\\"redis://cyder.cct.lsu.edu:2525\\\\\")
args.append(\\\\\"bigjob:bj-2426ab10-5107-11e1-857a-0060dd46c5ff:india.futuregrid.org\\\\\")
print \\\\\"Bootstrap time: \\\\\" + str(time.time()-start_time)
print \\\\\"Starting BigJob Agents with following args: \\\\\" + str(args)
bigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)
\"\"\") + \"\\\"\")
qsub_file.close()
os.system( \"qsub  \" + qsub_file_name)
"
use standard proxy
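
For readability, the PBS header that the logged bootstrap code writes into bigjob_pbs_ssh can be sketched as a small function. This is an illustration in Python 3, not BigJob's implementation: the function name pbs_header is hypothetical, and it assumes walltime is given in minutes, which is consistent with walltime=10 producing 0:10:00 in the log above.

```python
def pbs_header(number_nodes, processes_per_node, walltime_minutes, workdir):
    """Build the PBS directives seen in the log as one string.

    Hypothetical helper; assumes walltime is expressed in minutes.
    """
    hours, minutes = divmod(walltime_minutes, 60)
    lines = [
        "#PBS -l nodes=%d:ppn=%d" % (number_nodes, processes_per_node),
        "#PBS -l walltime=%d:%02d:00" % (hours, minutes),
        "#PBS -o %s/stdout-bigjob_agent.txt" % workdir,
        "#PBS -e %s/stderr-bigjob_agent.txt" % workdir,
        "cd %s" % workdir,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Parameters taken from the script and log above.
    print(pbs_header(
        number_nodes=1,
        processes_per_node=8,
        walltime_minutes=10,
        workdir="/N/u/ssarip1/agent/bj-2426ab10-5107-11e1-857a-0060dd46c5ff",
    ))
```

The -o and -e paths point at the stdout-bigjob_agent.txt and stderr-bigjob_agent.txt files inside the BigJob directory on the remote machine, so those files are where any output from the agent job itself would be expected to appear.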



Thank you,
Sai Saripalli