G0W0 crashes on frontera supercomputer




nicholas_dimakis1
Newbie
Posts: 28
Joined: Tue Sep 15, 2020 3:36 pm

G0W0 crashes on frontera supercomputer

#1 Post by nicholas_dimakis1 » Thu Feb 20, 2025 6:05 pm

Hello

I am trying to run G0W0 on a 2x2 MoS2 monolayer. The POSCAR is shown below:

Code: Select all

Mo2 S4                                  
1.00000000000000
  6.3806309700000003  0.0000000000000000  0.0000000000000000
 -3.1903140134000001  5.5257893618000002  0.0000000000000000
  0.0000000000000000  0.0000000000000000 14.8790035247999999
Mo S
4 8
Direct
0.1666665429999981 0.3333335530000028 0.2500000080000007
0.1666665429999981 0.8333335530000028 0.2500000080000007
0.6666665429999981 0.3333335530000028 0.2500000080000007
0.6666665429999981 0.8333335530000028 0.2500000080000007
0.3333335680000005 0.1666665050000020 0.3549265529975554
0.3333335680000005 0.6666665050000020 0.3549265529975554
0.8333335680000005 0.1666665050000020 0.3549265529975554
0.8333335680000005 0.6666665050000020 0.3549265529975554
0.3333335680000005 0.1666665050000020 0.1450734630024459
0.3333335680000005 0.6666665050000020 0.1450734630024459
0.8333335680000005 0.1666665050000020 0.1450734630024459
0.8333335680000005 0.6666665050000020 0.1450734630024459

I run the SCF step, followed by an ALGO = EXACT step, and finally the GW run. The INCAR for the GW step is given below:

Code: Select all

SYSTEM  = MoS2

#NCORE  = 4
KPAR   = 2

ENCUT   = 500
#IBRION  = -1

ISMEAR  = 0
SIGMA   = 0.01
#NBANDS  = 96
NEDOS   = 3000

#LOPTICS = TRUE
#ALGO =EXACT
#NELM =1
NBANDS= 300
#ISIF = 2 ; IBRION = 2; NSW = 100


ALGO = EVGW0
NELMGW = 1
NOMEGA = 50


PREC    = Single
EDIFF   = 1.e-8
LREAL   = Auto
LASPH   = True

Running grep memory OUTCAR gives the following information:

Code: Select all

 total amount of memory used by VASP MPI-rank0    59275. kBytes
 available memory per node:   22.90 GB, setting MAXMEM to   23450
 files read and symmetry switched off, memory is now:
 total amount of memory used by VASP MPI-rank0   617433. kBytes
 min. memory requirement per mpi rank  21762.4 MB, per node 152336.9 MB
 all allocation done, memory is now:
 total amount of memory used by VASP MPI-rank0 22734744. kBytes

I am running this job on the TACC Frontera supercomputer using 20 nodes while keeping the number of CPUs per node low:

Code: Select all

#SBATCH -n 140
#SBATCH -N 20

Each node has 192 GB of RAM, so the total RAM is about 3.8 TB. However, the GW run crashes.
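A quick sanity check of the OUTCAR figures above (a rough sketch, assuming the 140 tasks are spread evenly as 7 MPI ranks per node):

```python
# Back-of-the-envelope memory check based on the OUTCAR numbers above.
kb_per_rank = 22_734_744           # "memory used by VASP MPI-rank0" in kBytes
ranks_per_node = 140 // 20         # 7 ranks per node if tasks are spread evenly
gb_per_node = kb_per_rank * ranks_per_node / 1024**2
print(f"{gb_per_node:.1f} GB per node")  # ~151.8 GB, close to the 192 GB limit
```

So even with a perfectly even distribution, each node sits only a few tens of GB below its physical limit; any imbalance could push a node over.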

I have similar crashes using the Lonestar 6 supercomputer.

Thank you-Nick

Last edited by manuel_engel1 on Fri Feb 21, 2025 8:54 am, edited 1 time in total.
Reason: Put input/output into code blocks for improved readability

manuel_engel1
Global Moderator
Posts: 188
Joined: Mon May 08, 2023 4:08 pm

Re: G0W0 crashes on frontera supercomputer

#2 Post by manuel_engel1 » Fri Feb 21, 2025 10:27 am

Hello Nick,

Thanks for reaching out. I suspect that the crash is due to insufficient memory, but it's not yet 100% clear to me. The first thing I would like you to try is to lower the memory requirements. This will tell us whether the crash is really due to memory constraints.
Here are a few things you can try to reduce memory consumption:

  • Lower ENCUTGW. This should drastically reduce the required memory.

  • Reduce the number of cores per node even further.

  • Reduce the number of bands in your calculation.
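For instance, a reduced-memory INCAR fragment might look like the following (the values here are illustrative placeholders, not converged settings; convergence must be checked separately):

Code: Select all

ENCUTGW = 200      # response-function cutoff; the default is 2/3 * ENCUT
NBANDS  = 200      # fewer bands than the current 300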

If the crash is due to insufficient memory, you will have to look for ways to converge your calculation in the given memory constraints.

One more thing that might be worth considering is the load distribution across the different nodes. With

Code: Select all

#SBATCH -n 140
#SBATCH -N 20

Slurm will try to allocate the cores evenly across the nodes. However, it is probably a good idea to check whether this is actually the case. A load imbalance could easily cause your calculation to run out of memory. I recommend starting the job using

Code: Select all

#SBATCH --nodes=20
#SBATCH --ntasks-per-node=7

which is more explicit about the placement of cores/tasks across nodes.
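One quick way to verify the actual task placement (a sketch; the exact output format depends on your Slurm setup) is to add this line to the job script:

Code: Select all

srun hostname | sort | uniq -c   # every node should report the same task count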

Let me know how it goes.

Kind regards

Manuel
VASP developer

nicholas_dimakis1
Newbie
Posts: 28
Joined: Tue Sep 15, 2020 3:36 pm

Re: G0W0 crashes on frontera supercomputer

#3 Post by nicholas_dimakis1 » Fri Feb 21, 2025 3:56 pm

Thank you very much for your help. I set ENCUT = 400, and the calculation now runs.

Nick

