internal error in: mkpoints_full.F at line: 1172 | internal error in SET_INDPW_FULL: insufficient memory

Posted: Fri Dec 06, 2024 5:38 pm
by pzguns

A calculation on an oxide unit cell (12 atoms) with SCAN + 10% HF crashes at the outset with an "insufficient memory" error.

Technical details: VASP 6.4.2; Intel Xeon (40 cores); about 200 GB RAM.

Parallelization: NPAR = 2; KPAR = 2

$ more nohup.out
running 40 mpi-ranks, on 1 nodes
distrk: each k-point on 20 cores, 2 groups
distr: one band on 10 cores, 2 groups
vasp.6.4.2 20Jul23 (build Jun 05 2024 18:57:13) complex

POSCAR found type information on POSCAR ZnW O
POSCAR found : 3 types and 12 ions
scaLAPACK will be used
LDA part: xc-table for Pade appr. of Perdew
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: mkpoints_full.F at line: 1172                        |
|                                                                             |
|     internal error in SET_INDPW_FULL: insufficient memory (see wave.F       |
|     safeguard) 375 374                                                      |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------
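
The rank distribution reported in the log follows from simple division of the MPI ranks by KPAR and then by NPAR. A minimal Python sketch of that arithmetic is given here, assuming the ranks divide evenly at each level; `rank_layout` is a hypothetical helper written for illustration and is not part of VASP.

```python
def rank_layout(total_ranks: int, kpar: int, npar: int) -> dict:
    """Split MPI ranks over k-point groups (KPAR), then over band groups (NPAR).

    Assumption: both divisions are exact. This only mirrors the arithmetic
    behind the 'distrk' and 'distr' lines in the log; it does not call VASP.
    """
    if total_ranks % kpar != 0:
        raise ValueError("total_ranks is assumed to be divisible by KPAR")
    ranks_per_kpoint_group = total_ranks // kpar
    if ranks_per_kpoint_group % npar != 0:
        raise ValueError("ranks per k-point group is assumed to be divisible by NPAR")
    return {
        "ranks_per_kpoint_group": ranks_per_kpoint_group,
        "band_groups": npar,
        "ranks_per_band": ranks_per_kpoint_group // npar,
    }


# Settings from this post: 40 MPI ranks, KPAR = 2, NPAR = 2
layout = rank_layout(total_ranks=40, kpar=2, npar=2)
print(layout)
```

For the settings above this gives 20 ranks per k-point group and 10 ranks per band, in line with the "each k-point on 20 cores" and "one band on 10 cores" lines in nohup.out.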


Re: internal error in: mkpoints_full.F at line: 1172 | internal error in SET_INDPW_FULL: insufficient memory

Posted: Mon Dec 09, 2024 11:44 am
by merzuk.kaltak

Dear pzguns,

The reason your job is crashing is that the value set for NPAR is too low.
I suggest you either remove the tag from the INCAR or increase it to a reasonable value.
You seem to have 40 cores available, so NPAR > 6 should be a reasonable setting.
I have tested NCORE=8 and NCORE=4 on a machine with 24 cores.
Here are the timings for an electronic step:

OUTCAR.NPAR4:      LOOP:  cpu time      1.3897: real time      1.3947
OUTCAR.NPAR4:      LOOP:  cpu time      0.9160: real time      0.9211
OUTCAR.NPAR4:      LOOP:  cpu time      0.8861: real time      0.8897
OUTCAR.NPAR4:      LOOP:  cpu time      0.8886: real time      0.8920
OUTCAR.NPAR4:      LOOP:  cpu time     55.5923: real time     55.7984
OUTCAR.NPAR4:      LOOP:  cpu time     55.3798: real time     55.5852
OUTCAR.NPAR8:      LOOP:  cpu time      1.5465: real time      1.5530
OUTCAR.NPAR8:      LOOP:  cpu time      1.1081: real time      1.1128
OUTCAR.NPAR8:      LOOP:  cpu time      1.0886: real time      1.0930
OUTCAR.NPAR8:      LOOP:  cpu time      1.1591: real time      1.1636
OUTCAR.NPAR8:      LOOP:  cpu time     45.0355: real time     45.2358
OUTCAR.NPAR8:      LOOP:  cpu time     44.8045: real time     44.9897
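
To compare such listings more systematically, the LOOP lines can be parsed and averaged. The following is only a sketch in Python: it assumes the `label: LOOP: cpu time ...: real time ...` layout shown above, and the helper names (`real_times_by_label`, `mean_last`) are made up for this example. It averages the last two steps of each run, which are the long ones in this listing.

```python
import re
from collections import defaultdict

# Matches lines such as
# "OUTCAR.NPAR4:      LOOP:  cpu time      1.3897: real time      1.3947"
LOOP_RE = re.compile(
    r"^(?P<label>\S+):\s+LOOP:\s+cpu\s+time\s+(?P<cpu>[\d.]+):"
    r"\s+real\s+time\s+(?P<real>[\d.]+)"
)


def real_times_by_label(lines):
    """Collect the 'real time' of every LOOP line, grouped by file label."""
    times = defaultdict(list)
    for line in lines:
        match = LOOP_RE.match(line.strip())
        if match:
            times[match.group("label")].append(float(match.group("real")))
    return dict(times)


def mean_last(values, count):
    """Mean of the last `count` entries of a list."""
    tail = values[-count:]
    return sum(tail) / len(tail)


# The listing from this post, copied verbatim
sample = """\
OUTCAR.NPAR4:      LOOP:  cpu time      1.3897: real time      1.3947
OUTCAR.NPAR4:      LOOP:  cpu time      0.9160: real time      0.9211
OUTCAR.NPAR4:      LOOP:  cpu time      0.8861: real time      0.8897
OUTCAR.NPAR4:      LOOP:  cpu time      0.8886: real time      0.8920
OUTCAR.NPAR4:      LOOP:  cpu time     55.5923: real time     55.7984
OUTCAR.NPAR4:      LOOP:  cpu time     55.3798: real time     55.5852
OUTCAR.NPAR8:      LOOP:  cpu time      1.5465: real time      1.5530
OUTCAR.NPAR8:      LOOP:  cpu time      1.1081: real time      1.1128
OUTCAR.NPAR8:      LOOP:  cpu time      1.0886: real time      1.0930
OUTCAR.NPAR8:      LOOP:  cpu time      1.1591: real time      1.1636
OUTCAR.NPAR8:      LOOP:  cpu time     45.0355: real time     45.2358
OUTCAR.NPAR8:      LOOP:  cpu time     44.8045: real time     44.9897
"""

times = real_times_by_label(sample.splitlines())
for label, values in times.items():
    print(f"{label}: mean real time of last 2 steps = {mean_last(values, 2):.2f} s")
```

For the numbers above, the long steps average roughly 55.7 s for the NPAR4 run and 45.1 s for the NPAR8 run.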

It seems that NCORE=8 is slightly more efficient on my machine.
However, I suggest you do your own testing.
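
When setting up such tests, one possible way to pick the NPAR values to try is to list those that divide the number of ranks per k-point group evenly. This is a simplifying assumption made for the sketch below, not a statement about what VASP requires, and `npar_candidates` is a hypothetical helper.

```python
def npar_candidates(total_ranks: int, kpar: int = 1) -> list:
    """List NPAR values that divide the ranks per k-point group evenly.

    Assumption: only even splits are of interest for benchmarking.
    """
    ranks_per_kpoint_group, remainder = divmod(total_ranks, kpar)
    if remainder:
        raise ValueError("total_ranks is assumed to be divisible by KPAR")
    return [
        n
        for n in range(1, ranks_per_kpoint_group + 1)
        if ranks_per_kpoint_group % n == 0
    ]


# 40 MPI ranks with KPAR = 2, as in the original post
candidates = npar_candidates(40, kpar=2)
print(candidates)  # [1, 2, 4, 5, 10, 20]
```

Each candidate can then be run for a few electronic steps and the LOOP timings compared.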

Unfortunately, the error message is quite misleading. We will try to change it in a future release.
Thank you for your bug report.