
[netcdfgroup] 4.7.4 failing on multiple nodes

Hi,

I thought I'd give netCDF 4.7.4 a try for the compression in parallel I/O (using 
HDF5 1.10.7, PnetCDF 1.9.0, netcdf-fortran-4.5.3) on a NOAA cluster. I've been 
using Intel 19 with MVAPICH2 2.3, which worked fine with earlier versions 
(4.3.something). The problem is that it works fine on a single node, but I get 
various failures when trying to run a job that uses 2 or more nodes. It also 
fails if the I/O is not parallel (standard netCDF-4, where each process writes 
its data in turn).

I have also compiled everything (including the cloud model code) with Intel MPI, 
which fails promptly with a seg fault when it tries to run on 2 nodes. (Here I 
am comparing 4 or 9 threads on a single node against 16 threads split across 2 
nodes. If I force the 16-thread version to run on a single node, it runs fine.)

The problem seems to be reproducible with a simple write/read test adapted from 
ftst_parallel.F, so it does not seem to be specific to my model code. It fails 
with both PnetCDF and MPI-IO.

Any ideas what could be the issue here? I am stumped.

-- Ted
