Hi Thomas,

You wrote:

> OK, some update on that one: I applied the workaround of compiling
> dumplib.o with -O0. This makes `make check` (OK, in my case, `gmake
> check` ... ) succeed, but the resulting ncdump is still broken.
>
> Again, two points:
> 1. I suggest adding another test case, with the cdl file I am about to
> paste.

Thanks for this new test. As it's apparently stricter than the ncdump
tests we have, we'll add it.

> 2. I again would like to know if someone reported this to Sun. This
> miscompilation is really a serious issue and should be addressed. I
> will report it myself if there is no one giving notice...

The user who reported and helped investigate this problem in early
February also committed to reporting the bug to Sun. You can read
about my unsuccessful attempts to isolate the bug to a smaller program
than ncdump, or to find a workaround that would not trigger the bug,
here:

  http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg05358.html

The details of the bug, as reported by that user, are:

  here's the solution with the Sun compiler:

  ncdump/dumplib.c must be compiled using -O0 explicitly, otherwise
  -O2 is used by default. By hand, just remove dumplib.o, add -O0 to
  CFLAGS in the Makefile (second occurrence), and gmake. The depending
  programs are recompiled and the tests succeed.

  This seems to be an optimizer bug, I've checked the code produced,
  and it does not set the xmm0 register in the complicated version and
  breaks calling ABI for libc, whereas your simple code below shows
  that it is set as expected. The value printed is just a random value
  in xmm0 used for something before. I've just halted the code before
  entering snprintf and set xmm0 explicitly to the value, continued,
  and, voila, the value printed is correct!

  The instructions generated are TOTALLY different, a symptom I've
  seen very often, just adding a line somewhere completely changes the
  generated code, which makes it really hard to track down such
  errors.

  I'll report this to Sun, maybe they've a better clue why this
  happens.

I've been unable to determine that it got successfully logged as a Sun
compiler bug. If we can't find it after a little more searching, we'll
report it again.

> Down to the mode of failure. I generate a test NetCDF file from this
> CDL:
>
> netcdf bubble {
> dimensions:
>         element = 1000 ;
>         variable = 1 ;
>         base = 1 ;
>         time = UNLIMITED ; // (0 currently)
> variables:
>         double time(time) ;
>         double coefficient(time, element, variable, base) ;
>
> // global attributes:
>         :info = "Model state for the AWI DG model, ThOr breed." ;
>         :par_stringsize = 30 ;
>         :par_base_grades = 0, 0, 0 ;
>         :par_grid_elements = 10, 10, 10 ;
>         :par_hill_params = 0.01, 0.1, 0.1, 0.1 ;
>         :par_linad_speed = 1., 1., 1. ;
>         :par_oro_types = "null null
> null" ;
>         :par_shallow_gravity = 1. ;
>         :par_sys_name = "linear advection" ;
>         :par_timeint_rksteps = 1 ;
>         :par_timeint_step = 0.1 ;
>         :par_trans_gradients = 2., 2., 2. ;
>         :par_trans_types = "linear linear
> linear" ;
>         :par_world_dims = 3 ;
>         :par_world_lengths = 10., 10., 10. ;
> data:
> }
>
>
> shell$ ncgen -o bubble.nc bubble.cdl
>
> Now I have a look at it with ncdump compiled with CFLAGS=-m64 overall,
> but dumplib.o being built with CFLAGS='-O0 -m64' instead:
>
> shell$ ncdump bubble.nc
> netcdf bubble {
> dimensions:
>         element = 1000 ;
>         variable = 1 ;
>         base = 1 ;
>         time = UNLIMITED ; // (0 currently)
> variables:
>         double time(time) ;
>         double coefficient(time, element, variable, base) ;
>
> // global attributes:
>         :info = "Model state for the AWI DG model, ThOr breed." ;
>         :par_stringsize = 30 ;
>         :par_base_grades = 0, 0, 0 ;
>         :par_grid_elements = 10, 10, 10 ;
>         :par_hill_params = 2.22044604925031e-16, 0.999999992549419,
> 0.999999992549419, 0.999999992549419 ;
>         :par_linad_speed = 0.999999992549419, 0.999999992549419,
> 0.999999992549419 ;
>         :par_oro_types = "null null
> null" ;
>         :par_shallow_gravity = 0.999999992549419 ;
>         :par_sys_name = "linear advection" ;
>         :par_timeint_rksteps = 1 ;
>         :par_timeint_step = 0.999999992549419 ;
>         :par_trans_gradients = 0.999999992549419, 0.999999992549419,
> 0.999999992549419 ;
>         :par_trans_types = "linear linear
> linear" ;
>         :par_world_dims = 3 ;
>         :par_world_lengths = 0.999999992549419, 0.999999992549419,
> 0.999999992549419 ;
> data:
> }
>
> That looks grossly wrong. Rebuilding everything inside the ncdump/
> directory with CFLAGS="-O0 -m64" results in a working ncdump binary,
> output is identical to input CDL file. This is disturbing also as it
> leads to the question if my application will be affected by the same
> bug that harasses ncdump when building with Sun Studio. Did really
> no one investigate the mode of breakage and why it apparently(?!) does
> not affect other parts of NetCDF?

Yes, we investigated to the point of determining that it was a
compiler bug when compiling with -m64 for a 64-bit environment, and we
tried unsuccessfully to find a workaround other than using -O0 when
compiling.

> So... shall one start crying at Sun to fix their compiler on
> Solaris/x86-64 with NetCDF or is there some hidden wisdom already that
> I am not aware of?

It would probably help if you could also report this bug.

--Russ
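
The by-eye comparison above (ncdump output against the input CDL) could be automated roughly as follows. This is only a minimal sketch in Python, not part of the netCDF test suite. The names `numeric_attrs` and `mismatched` are made up for illustration, and the parser assumes clean CDL text (no mail-quoting prefixes) and looks only at plain numeric attributes. The sample values are a subset of those shown in the message.

```python
import math
import re

# Matches attribute assignments such as ":par_timeint_step = 0.1 ;".
# The value part may wrap onto a continuation line.
ATTR_RE = re.compile(r":(\w+)\s*=\s*([^;]*);")


def numeric_attrs(cdl_text):
    """Return {attribute name: [floats]} for plain numeric attributes."""
    attrs = {}
    for name, value in ATTR_RE.findall(cdl_text):
        if '"' in value:
            continue  # string attribute, not compared here
        try:
            attrs[name] = [float(item) for item in value.split(",")]
        except ValueError:
            continue  # e.g. values with CDL type suffixes; skipped in this sketch
    return attrs


def mismatched(expected_cdl, actual_cdl, rel_tol=1e-12):
    """Names of numeric attributes whose values differ between two CDL texts."""
    want = numeric_attrs(expected_cdl)
    got = numeric_attrs(actual_cdl)
    bad = []
    for name, values in want.items():
        other = got.get(name)
        same = (
            other is not None
            and len(other) == len(values)
            and all(math.isclose(a, b, rel_tol=rel_tol)
                    for a, b in zip(values, other))
        )
        if not same:
            bad.append(name)
    return sorted(bad)


# Subset of the input CDL and of the broken ncdump output from the message.
EXPECTED = """
    :par_stringsize = 30 ;
    :par_hill_params = 0.01, 0.1, 0.1, 0.1 ;
    :par_sys_name = "linear advection" ;
    :par_timeint_step = 0.1 ;
"""
ACTUAL = """
    :par_stringsize = 30 ;
    :par_hill_params = 2.22044604925031e-16, 0.999999992549419,
        0.999999992549419, 0.999999992549419 ;
    :par_sys_name = "linear advection" ;
    :par_timeint_step = 0.999999992549419 ;
"""

if __name__ == "__main__":
    print(mismatched(EXPECTED, ACTUAL))
```

With the sample data, the integer and string attributes pass and only the two floating-point attributes are flagged, which matches the pattern in the broken output: integers and text survive, doubles do not.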