10 June 2014

580. Python script: Interpolate between structures in a multi-xyz file

I'm doing a lot of NEB (nudged elastic band) calculations using nwchem at the moment, and while getting 'neb convergence' is simple enough, I often get an error along the lines of
3965547 @neb NEB calculation converged
3965548 @neb However NEB Gmax not converged...Try increasing the number of beads.

While that sounds simple enough it's nicer if you don't have to go back to the beginning and e.g. run a more fine-grained PES job to generate a reasonable trajectory (straight linear interpolation often doesn't work), then keep on running neb iterations. One way to cut down on time (presumably) is to simply pad the neb path xyz with intermediate structures, and that is what this python (2.7) script does.

Oh, and I really wish blogspot would support code inclusion better...

How to use:
python nebinterpolate.py -i neb_A_F.neb_final.xyz  -o test.xyz

Example:
Say we have the structure of methanol, and methanol in which the oxygen-carbon distance is 3.0 Ångström:

Here's the corresponding xyz file, which we'll call first.xyz:
6 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 0.83960 0.81877 0.83960 H 1.38912 0.20156 1.38912 6 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 1.76087 1.75017 1.76087 H 2.31039 1.13296 2.31039

Run nebinterpolate -i first.xyz -o second.xyz and you'll get a new xyz file with three structures -- the first one plus an intermediate structure:

Here's the new file, second.xyz:
6 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 0.83960 0.81877 0.83960 H 1.38912 0.20156 1.38912 6 structure 2 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 1.30024 1.28447 1.30024 H 1.84976 0.66726 1.84976 6 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 1.76087 1.75017 1.76087 H 2.31039 1.13296 2.31039

 Run it again, nebinterpolate -i second.xyz -o third.xyz, and you'll get:

Here's the new file, third.xyz:
6 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 0.83960 0.81877 0.83960 H 1.38912 0.20156 1.38912 6 structure 2 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 1.06992 1.05162 1.06992 H 1.61944 0.43441 1.61944 6 structure 2 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 1.30024 1.28447 1.30024 H 1.84976 0.66726 1.84976 6 structure 4 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 1.53056 1.51732 1.53056 H 2.08007 0.90011 2.08007 6 C 0.03517 0.00549 0.03517 H -0.61778 -0.63407 0.66798 H 0.66798 -0.63407 -0.61778 H -0.60514 0.64647 -0.60514 O 1.76087 1.75017 1.76087 H 2.31039 1.13296 2.31039

Now, the real use for this is when you've been optimising a string of structures using NEB and want to increase the number of images because you're not getting gmax convergence, or if you want to do a quick and rough optimisation and then get a prettier looking set of coordinates.

You can load the multi-xyz file in nwchem by using

NEB
   ...
  XYZ_PATH path.xyz
END


nebinterpolate.py 

#!/usr/bin/python

import sys

def getvars(arguments):
 switches={}

 version='0.1'
 
 try:
  if "-i" in arguments:
   switches['in_one']=arguments[arguments.index('-i')+1]
   print 'Input: %s '% (switches['in_one'])
  else:
   arguments="--help";
 except:
  arguments="--help";
  
 try:
  if "-o" in arguments:
   switches['o']=arguments[arguments.index('-o')+1].lower()
   print 'Output: %s'% switches['o']
  else:
   arguments="--help";
 except:
  arguments="--help";

 try:
  if "-w" in arguments:
   switches['w']=float(arguments[arguments.index('-w')+1])
   print 'Weighting: %i'% switches['w']
  else:
   print 'Assuming no weighting'
   switches['w']=1.0;
 except:
  switches['w']=1.0;

 doexit=0
 try:
  if ("-h" in arguments) or ("--help" in arguments):
   print '\t\t bytes2words version %s' % version
   print 'Creates interpolated structures'
   print 'from an multixyz file'
   print '--input \t-i\t multi-xyz file to morph'
   print '--output\t-o\t output file'
   print '--weight\t-w\t weight first structure vs second one (1=average; 0=start; 2=end)'
   print 'Exiting'
   doexit=1
 except:
  a=0 # do nothing
 if doexit==1:
  sys.exit(0)

 return switches

def getcmpds(switches):
 
 cmpds={}
 
 g=open(switches['in_one'],'r') 
 n=0
 xyz=[]
 atoms=[]
 structure_id=1

 for line in g:

  try:
   if len(xyz)==cmpds['atoms_'+str(structure_id)]:
    cmpds['coords_'+str(structure_id)]=xyz
    cmpds['elements_'+str(structure_id)]=atoms   
    structure_id+=2
    n=0
    atoms=[]
    xyz=[]
  except:
   pass

  n+=1
  line=line.rstrip('\n')
   
  if n==1:
   cmpds['atoms_'+str(structure_id)]=int(line)
  elif n==2:
   cmpds['title_'+str(structure_id)]=line
  else:
   line=line.split(' ')
   line=filter(None,line)
   xyz+=[[float(line[1]),float(line[2]),float(line[3])]]
   atoms+=[line[0].capitalize()]
 g.close

 cmpds['coords_'+str(structure_id)]=xyz
 cmpds['elements_'+str(structure_id)]=atoms   
 cmpds['w']=switches['w']
 cmpds['structures']=(structure_id)
 
 return cmpds

def morph(cmpds,structure_id):
 coords_one=cmpds['coords_'+str(structure_id)]
 coords_two=cmpds['coords_'+str(structure_id+2)]
 
 coords_morph=[]
 coords_diff=[]
 for n in range(0,cmpds['atoms_'+str(structure_id)]):
  morph_x=coords_one[n][0]+cmpds['w']*(coords_two[n][0]-coords_one[n][0])/2.0
  morph_y=coords_one[n][1]+cmpds['w']*(coords_two[n][1]-coords_one[n][1])/2.0
  morph_z=coords_one[n][2]+cmpds['w']*(coords_two[n][2]-coords_one[n][2])/2.0
  diff_x=coords_two[n][0]-coords_one[n][0]
  diff_y=coords_two[n][1]-coords_one[n][1]
  diff_z=coords_two[n][2]-coords_one[n][2]
  coords_morph+=[[morph_x,morph_y,morph_z]]
  coords_diff+=[[diff_x,diff_y,diff_z]]

 cmpds['atoms_'+str(structure_id+1)]=cmpds['atoms_'+str(structure_id)]
 cmpds['elements_'+str(structure_id+1)]=cmpds['elements_'+str(structure_id)]
 cmpds['title_'+str(structure_id+1)]='structure '+str(structure_id+1)
 cmpds['coords_'+str(structure_id+1)]=coords_morph

 return cmpds

def genxyzstring(coords,element):
 x_str='%10.5f'% coords[0]
 y_str='%10.5f'% coords[1]
 z_str='%10.5f'% coords[2]
 
 xyz_string=element+(3-len(element))*' '+10*' '+\
 (8-len(x_str))*' '+x_str+10*' '+(8-len(y_str))*' '+y_str+10*' '+(8-len(z_str))*' '+z_str+'\n'
 
 return xyz_string

def writemorph(cmpds,outfile):
 g=open(outfile,'w') 

 for m in range(1,cmpds['structures']+1):
  g.write(str(cmpds['atoms_'+str(m)])+'\n')
  g.write(str(cmpds['title_'+str(m)])+'\n')
  for n in range(0,cmpds['atoms_'+str(m)]):
   coords=cmpds['coords_'+str(m)][n]
   g.write(genxyzstring(coords, cmpds['elements_'+str(m)][n]))
 g.close

 return 0

if __name__=="__main__":
 arguments=sys.argv[1:len(sys.argv)]
 switches=getvars(arguments)
 cmpds=getcmpds(switches)

#check that the structures are compatible
 for n in range(1,cmpds['structures'],2):

  if cmpds['atoms_'+str(n)]!=cmpds['atoms_'+str(n+2)]:
   print 'The number of atoms differ. Exiting'
   sys.exit(1)
  elif cmpds['elements_'+str(n)]!=cmpds['elements_'+str(n+2)]:
   print 'The types of atoms differ. Exiting'
   sys.exit(1)
  cmpds=morph(cmpds,n)
 success=writemorph(cmpds,switches['o'])
 if success==0:
  print 'Conversion seems successful'

3 comments:

  1. I have not used NEB before and I intend to use it in the next few weeks. It will be of great help for me (and possibly many others) if you can,
    1. Give a simple example with a simple xyz file of how the script works.
    2. Provide a little more information (with a modified script) to show how this script can be interfaced with nwchem or other quantum mechanical packages.

    ReplyDelete
    Replies
    1. 1. I've added a bit more information. Hope it helps.
      2. See 1. You don't call the script from nwchem, but you can use it to generate a set of structures for nwchem.

      Note that if you start with two (or three) structures it's no different from what nwchem does when you give a start geometry and an 'endgeom' (and a 'midgeom') (see http://www.nwchem-sw.org/index.php/Release62:Nudged_Elastic_Band_%28NEB%29_and_Zero_Temperature_String_Methods#Setting_up_initial_path) in the sense that both my script and nwchem simply interpolates between the xyz coordinates for the atoms at the beginning and end of the reaction.

      I'm planning on writing a smarter script that takes the cartesian coordinates but generates the intermediate structures using z-matrices. The advantage with that is that it will deal with rotations better as it works with bond lengths, angles, dihedrals, and not simply atom positions.

      But it will take a bit of time...

      Delete
    2. Thank you so much. The example makes it much more clear. I look forward to your z matrix transformed script

      Delete