19 March 2012

111. Ecce (nwchem) on Debian, and ROCKS/Centos

If you're using nwchem, chances are that you've considered using ECCE to parse the output:
http://ecce.emsl.pnl.gov/

First of all you'll need to register at https://eus.emsl.pnl.gov/Portal/ -- and you can only do that if you're faculty. Postdocs and PhD students need not apply. Other than that, it's free, but you'll have to wait a couple of days to get your registration approved.

As much as I like nwchem owing to the clear syntax, I feel less warmly about ecce. Don't get me wrong -- it's pretty. It just feels archaic and cobbled together. Even worse, it's not open source and its workings feel a bit opaque at times. Still, there's no better program for visually parsing nwchem output at this point. Anyway...

--start here --
Debian:
Download the install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh file to ~/tmp/ecce

There's no md5sum supplied but here's what I got:
2ee70cc817dee9f80b11be5eac6e53e5
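
If you want to compare your download against that value, here's a minimal sketch. The function name verify_md5 is made up for this example, and the digest is only what I got for my download, not an official checksum.

```shell
# verify_md5 FILE EXPECTED
# Succeeds (exit status 0) only if the MD5 digest of FILE equals EXPECTED.
verify_md5() {
    actual=$(md5sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Usage, run in the directory holding the installer:
# verify_md5 install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh 2ee70cc817dee9f80b11be5eac6e53e5 \
#     && echo "checksum matches" || echo "checksum differs"
```

A mismatch only tells you that your file differs from mine; it could simply be a newer build of the installer.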

If you haven't already, install csh:
sudo apt-get install csh 

OK, moving on...
cd ~/tmp/ecce
chmod +x  install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
./install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh


Main ECCE installation menu
===========================
0) Help on main menu options
1) Full install
2) Full upgrade
3) Application software install
4) Application software upgrade
5) Server install
6) Server upgrade

Pick 1 if you're installing on your desktop and there's no server that you know of. 

Once the installation is over you get:
***************************************************************
!! You MUST perform the following steps in order to use ECCE !!
-- Unless only the user 'me' will be running ECCE,
   start the ECCE server as 'me' with:
     /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
-- To register machines to run computational codes, please see
   the installation and compute resource registration manuals
   at http://ecce.pnl.gov/using/installguide.shtml
-- To run ECCE each user must source either the runtime_setup
   (csh/tcsh) or runtime_setup.sh (sh/bash/ksh) script in the
   directory /home/me/tmp/ecce/ecce-v6.2/apps/scripts
   from their shell environment setup script.  For example,
   with csh or tcsh, add the following to ~/.cshrc:
     if (-e /home/me/tmp/ecce/ecce-v6.2/apps/scripts/runtime_setup) then
       source /home/me/tmp/ecce/ecce-v6.2/apps/scripts/runtime_setup
     endif
***************************************************************
Which translates to:
1. sh  /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
2. Sourcing that file makes no sense. Instead, add the following to your ~/.bashrc
export ECCE_HOME=/home/me/tmp/ecce/ecce-v6.2/apps
export PATH=${ECCE_HOME}/scripts:${PATH}
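
If you'd rather leave PATH alone when the ECCE directory is missing (the same idea as the -e test in the installer's csh snippet), a guarded variant could look like the sketch below. The function name ecce_env is hypothetical; the path in the last line is the one from this install.

```shell
# ecce_env ROOT
# Export ECCE_HOME and prepend ROOT/apps/scripts to PATH, but only if that
# directory exists and is not already on PATH.
ecce_env() {
    if [ -d "$1/apps/scripts" ]; then
        export ECCE_HOME="$1/apps"
        case ":$PATH:" in
            *":$ECCE_HOME/scripts:"*) ;;   # already on PATH, nothing to do
            *) PATH="$ECCE_HOME/scripts:$PATH"; export PATH ;;
        esac
    fi
}

# In ~/.bashrc (does nothing if the directory is not there):
ecce_env "$HOME/tmp/ecce/ecce-v6.2"
```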

Assuming you've sourced your ~/.bashrc, start ecce by typing
ecce

...which takes an unreasonably long time (ca 1 min) after which you're greeted by
Press Any Key
Type in a password -- any password -- which will be your password from now on.
You're then taken to
Click on Viewer (assuming you've got something to look at)
Pay attention to the fine print
Have a look at the text box in the bottom right corner... and pay attention. In my particular case I have 6 cores and have compiled an MPI-aware nwchem 6.0. I bet that's better than whatever comes bundled with ecce. Also, the

To change those settings, go to the machine browser (see screenshot #2), click on set up remote access and make sure that everything is working by clicking on e.g. processes:

Then click on the Machine menu (top left), select Register Machine while your machine is selected.
You can now change your options.

Running:
So, before using ecce you always need to
sh  /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
first. The server will run until you stop it or reboot.
Next, start ecce
ecce
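
The two steps can be wrapped in a small shell function. This is only a sketch: run_ecce is a made-up name, it calls the server script with sh as above, and I haven't checked how start_ecce_server behaves if the server is already running.

```shell
# run_ecce ROOT
# Start the ECCE server found under ROOT, then launch the ecce client.
# Assumes 'ecce' is already on PATH.
run_ecce() {
    server="$1/server/ecce-utils/start_ecce_server"
    if [ ! -f "$server" ]; then
        echo "start_ecce_server not found under $1" >&2
        return 1
    fi
    sh "$server" || return 1   # give up if the server script reports failure
    ecce
}

# e.g. run_ecce /home/me/tmp/ecce/ecce-v6.2
```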

Integration with nwchem
Most people would probably set up their nwchem jobs by hand, because it's so simple. All you need to do is to include the statement
ecce_print ecce.out
in the beginning, and you'll get an ecce.out file which you can then IMPORT (not open regularly, but import) into ecce.
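
To show where the statement goes, here's a sketch that writes a small nwchem input file from the shell. The molecule (water), the basis set and the file names are just illustrative assumptions; the only line that matters for ecce is ecce_print.

```shell
# write_example_input FILE
# Write a minimal nwchem input to FILE with ecce_print near the top.
write_example_input() {
    cat > "$1" <<'EOF'
start water
title "water, SCF energy"
ecce_print ecce.out

geometry units angstroms
  O   0.000   0.000   0.000
  H   0.757   0.586   0.000
  H  -0.757   0.586   0.000
end

basis
  * library sto-3g
end

task scf energy
EOF
}

# e.g. write_example_input water.nw, then run: nwchem water.nw
```

After the run, ecce.out should sit in the directory where nwchem was started.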

Click on Viewer, Import Calculation From Output File, select your ecce.out and voilà:
ECCE: homo (benzene)
If you're running debian, you're done now.



ROCKS 5.4.3/Centos 5.6:
This isn't a fix as much as a rant. The problem with ROCKS 5.4.3 is that csh is so broken that it's a struggle just to install ecce. I mean, I do show how to get ecce running in the end, but ROCKS feels like an unfinished piece of work compared to a normal debian install.

--Demonstration only -- don't do --
First back up ssh-key.sh and ssh-key.csh in /etc/profile.d

So...you start by
chmod +x install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
./install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
...and nothing's happening.

You then try just typing in
csh

/etc/profile.d/ssh-key.sh: line 211: return: can only `return' from a function or sourced script
It appears that you have not set up your ssh key.
This process will make the files:
     /export/home/me/.ssh/id_rsa.pub
     /export/home/me/.ssh/id_rsa
     /export/home/me/.ssh/authorized_keys
Generating public/private rsa key pair.
/export/home/me/.ssh/id_rsa already exists.
Overwrite (y/n)? 

Turns out there's a bug in ROCKS 5.4.3.  You can fix that by:
rpm -Uvh ftp://www.rocksclusters.org/pub/rocks/updates/5.4.3/x86_64/RPMS/rocks-config-server-5.4.3-1.x86_64.rpm

So far so good.
csh
...and nothing. It just exits. Or so you think. But the problem is bigger than that --  try opening a new terminal in e.g. gnome (gnome-terminal or xterm) -- it exits immediately. No error message or anything.

You can get csh to start by moving /etc/csh.cshrc out of the way, but you're still screwed as to opening a new terminal. The only way to get back a working system is to restore ssh-key.sh and ssh-key.csh.

--- Demonstration over ---

--Start here --
You could also get around all this by running
csh -f
But then none of your environment variables get loaded, which can lead to problems of its own.

Anyway:
csh -f install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh

The install starts. Just follow the instructions.

After installation, start the server:
csh -f ecce-v6.2/server/ecce-utils/start_ecce_server

Hit enter until you get a workable prompt back...
Edit your ~/.bashrc and add

export ECCE_HOME=/home/me/tmp/ecce/ecce-v6.2/apps
export PATH=${ECCE_HOME}/scripts:${PATH}

Don't bother sourcing your ~/.bashrc. It's easier to just open a new terminal.
Type
ecce
and you should be up and running...sort of. Under ROCKS I couldn't import ecce.out files because I couldn't actually connect to the server. I don't know why, but it came down to not being able to open a remote shell on the host.

NOTE:
This worked fine on one box, but not on another one that I was setting up remotely. On that one I had to edit

ecce/apps/siteconfig/Dataservers
and
ecce/apps/siteconfig/jndi.properties 

In particular, I had to change references to eccetera.emsl.pnl.gov.
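
A sketch of that edit, assuming the references get replaced by the hostname of the machine that runs your ECCE server. The function name replace_ecce_host and the example hostname are hypothetical. It keeps a .bak copy of each file it touches.

```shell
# replace_ecce_host SITECONFIG_DIR NEW_HOST
# Replace eccetera.emsl.pnl.gov with NEW_HOST in Dataservers and
# jndi.properties under SITECONFIG_DIR. NEW_HOST must not contain '/'.
replace_ecce_host() {
    for f in "$1/Dataservers" "$1/jndi.properties"; do
        [ -f "$f" ] || continue
        # Show the lines that are about to change (no match is not an error).
        grep -n 'eccetera\.emsl\.pnl\.gov' "$f" || true
        # Edit in place; the original is kept as $f.bak.
        sed -i.bak "s/eccetera\.emsl\.pnl\.gov/$2/g" "$f"
    done
}

# e.g. replace_ecce_host ecce/apps/siteconfig myhost.example.org
```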

9 comments:

  1. Thanks for the clear explanation. I have Ubuntu 12.04 and I followed your instructions for Debian. Everything seems fine, except that when I type 'ecce' in a terminal I get the following error message
    "Starting ECCE ... please wait
    ./gateway: symbol lookup error: /usr/lib/i386-linux-gnu/libXfixes.so.3: undefined symbol: _XGetRequest"

    any ideas why? Thanks!

    Replies
    1. Anon,
      Looking online I found this thread (http://forum.ovh.com/showthread.php?t=79256) in French -- basically it's a problem with the Ubuntu versions of libxfixes.


      In http://www.le-libriste.fr/2012/03/installer-sur-ubuntu-hubic-proposant-25-go-gratuit-sur-le-cloud/comment-page-1/#comment-7601 one person solves it by using the debian package instead.

      1. The easiest solution will be to file a bug report with Canonical if you have the patience -- if it's a Canonical bug then the response may be fast. If not, then they might wait for upstream, which can take time.

      2. If not you can try compiling everything yourself including a version of libxfixes (not terribly difficult with ecce at least) or
      3. you can try to use the debian package (tricky -- can break things).

      If you're not working with a critical system (the fact that you're using i386 makes it look like it might just be a virtual machine) you can try downloading from here http://packages.debian.org/wheezy/i386/libxfixes3/download

      Then just do
      sudo dpkg -i libxfixes3_5.0-4_i386.deb
      in the directory that you downloaded the file in.

      If you're working on a more important system, I wouldn't recommend mixing packages though.

      -----
      My 'tests' on debian wheezy
      aptitude show libxfixes
      gives v 1:5.0-4.

      locate libXfixes.so gives me
      /usr/lib/x86_64-linux-gnu/libXfixes.so
      /usr/lib/x86_64-linux-gnu/libXfixes.so.3
      /usr/lib/x86_64-linux-gnu/libXfixes.so.3.1.0
      /usr/lib32/libXfixes.so
      /usr/lib32/libXfixes.so.3
      /usr/lib32/libXfixes.so.3.1.0

      ls -lah shows
      /usr/lib/x86_64-linux-gnu/libXfixes.so -> libXfixes.so.3.1.0
      /usr/lib32/libXfixes.so -> libXfixes.so.3.1.0

      Neither of
      cat /usr/lib32/libXfixes.*|strings|grep XGe
      cat /usr/lib/x86_64-linux-gnu/libXfixes.*|strings|grep XGe
      gives me anything.
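
      The same kind of check, one file at a time and with the full symbol name, can be sketched like this. The function name mentions_symbol is made up; it only tells you whether the name occurs anywhere in the file, not whether the symbol is defined or merely referenced.

```shell
# mentions_symbol LIB SYMBOL
# Succeeds if the literal string SYMBOL occurs anywhere in the file LIB.
# -a treats the binary as text, -F disables regex, -q suppresses output.
mentions_symbol() {
    grep -a -q -F -- "$2" "$1"
}

# e.g. mentions_symbol /usr/lib32/libXfixes.so.3 _XGetRequest \
#     && echo "mentioned" || echo "not mentioned"
```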


      libXfixes is a requirement for ECCE as shown in the long list posted here: http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id423/ECCE_6.3_apps_won't_start.html/

    2. I solved this on someone's ubuntu laptop by compiling ECCE version 6.4. Worked perfectly.

  2. Hi! Thank you for your blog. All the information is very important. I have this problem, can you help me?

    Starting ECCE ... please wait
    libGL error: failed to load driver: nouveau
    libGL error: Try again with LIBGL_DEBUG=verbose for more details.
    libGL error: failed to load driver: swrast
    libGL error: Try again with LIBGL_DEBUG=verbose for more details.
    The program 'builder' received an X Window System error.
    This probably reflects a bug in the program.
    The error was 'BadAlloc (insufficient resources for operation)'.
    (Details: serial 1926 error_code 11 request_code 154 minor_code 24)
    (Note to programmers: normally, X errors are reported asynchronously;
    that is, you will receive the error a while after causing it.
    To debug your program, run it with the --sync command line
    option to change this behavior. You can then get a meaningful
    backtrace from your debugger if you break on the gdk_x_error() function.)

    Replies
    1. The key is here: "libGL error: failed to load driver: nouveau" -- nouveau is an open source driver for nvidia cards.

      So the questions are:
      1. What kind of hardware are you doing this on? Have you installed ecce on a desktop with a monitor? Or are you ssh:ing to a headless box?

      2. If physically on the same machine, do you have any DEs installed?

  3. Thanks Lindqvist,

    1) My desktop PC runs 64-bit Linux Mint: Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz, 12 GB RAM, 1 TB hard drive, NVIDIA GT216 [GeForce GT 220] graphics card. As for the nouveau driver, lspci -k shows:
    01:00.0 VGA compatible controller: NVIDIA Corporation GT216 [GeForce GT 220] (rev a2)
    Subsystem: Micro-Star International Co., Ltd. Device 2022
    Kernel driver in use: nouveau
    Kernel modules: nouveau, nvidiafb


    Replies
    1. Not sure what's going on then.

      The first step would be to try without openGL -- go to the apps/siteconfig/ directory under the ecce root and open site_runtime. Change
      ECCE_MESA_OPENGL true
      to false.

      You might have to edit apps/siteconfig/RemoteServer/site_runtime as well.
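
      If you'd rather do it from the command line, here's a sketch. It assumes the setting sits on a line of its own in the form quoted above; disable_mesa_opengl is a made-up name, and a .bak copy of the file is kept.

```shell
# disable_mesa_opengl FILE
# Flip "ECCE_MESA_OPENGL true" to "ECCE_MESA_OPENGL false" in FILE,
# leaving every other line untouched. The original is kept as FILE.bak.
disable_mesa_opengl() {
    sed -i.bak -E 's/^([[:space:]]*ECCE_MESA_OPENGL[[:space:]]+)true/\1false/' "$1"
}

# e.g. (paths relative to the ecce root)
# disable_mesa_opengl apps/siteconfig/site_runtime
# disable_mesa_opengl apps/siteconfig/RemoteServer/site_runtime
```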

      Otherwise, have you tried using the nvidia binary driver? Maybe the acceleration is required? I'm just guessing though.

      http://verahill.blogspot.com.au/2012/09/setting-up-asus-nvidia-gf-210-on-debian.html

  4. I just installed ECCE on my computer (ECCE v7.0, of course). I added "ecce_print ecce.out" at the beginning of my .nw file and got an "ecce.out" file. However, when I import the ecce.out file into ECCE, the "run statistic", "vibrational frequencies" and "energy gradient" are not shown. Do you know why?

    Replies
    1. I can't reproduce your error. When I open an ecce.out file everything gets imported. Note that you might have to wait a few seconds (or longer).

      I think there was a post on the ecce/nwchem forum about something similar, where the poster needed to open the file twice to get the full output.

      If you want to troubleshoot on your own:
      * the ecce.out file is in plain text, so you can make sure that the information is there.
      * the information is imported using the scripts in apps/scripts/parsers -- start by looking at nwchem.desc which shows the triggers and scripts for different properties
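
      Both checks come down to searching plain text files, so a quick sketch could look like this. The function name show_mentions is made up and the keyword is a guess; I'm assuming the property names appear more or less literally in the files.

```shell
# show_mentions FILE KEYWORD
# Print every line of FILE that contains KEYWORD (case-insensitive),
# prefixed with its line number. Exit status is non-zero if nothing matches.
show_mentions() {
    grep -i -n -- "$2" "$1"
}

# e.g. show_mentions ecce.out gradient
#      show_mentions apps/scripts/parsers/nwchem.desc gradient
```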
