
Re: [netcdfgroup] unlimited dimension and chunking breaking in 4.3.1.1?

  • From: Chris Barker <chris.barker@xxxxxxxx>
  • Date: Mon, 24 Mar 2014 15:20:42 -0700
On Mon, Mar 24, 2014 at 3:04 PM, Ben <Benjamin.M.Auer@xxxxxxxx> wrote:

> Unfortunately I still can't reproduce this outside of our model in a stand
> alone tester.
>

Darn -- that is tricky to debug.

> I did notice one thing that struck me as peculiar. In my stand alone tester,
> when I run with 4.4.2.1 and create a netcdf 4 file with an unlimited
> dimension called time and make a 1-D variable time using this dimension, an
> ncdump shows this for the time variable:
>
>         int time(time) ;
>                 time:_Storage = "chunked" ;
>                 time:_ChunkSizes = 1 ;
>                 time:_Endianness = "little" ;
>
> with the same code and 4.3.2-rc1 I get this
>
>         int time(time) ;
>                 time:_Storage = "chunked" ;
>                 time:_ChunkSizes = 1048576 ;
>                 time:_Endianness = "little" ;
>

Right -- that was a deliberate change. Variables that use an unlimited
dimension must be chunked, and the code for determining the default chunking
was designed with higher-dimensional variables in mind. The result was the
chunk size of 1 you see in your first example, and a chunk size of 1 is a
really, really bad idea: it's very inefficient and requires a huge tree to
keep track of all the tiny little pieces of data. I discovered this when I
was writing a big file, and the result was crashes on one platform and
really large files with really slow writing on others.

So the second version is, in fact, much better.
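
To put rough numbers on that, here is a small stand-alone sketch in C. It is
not netCDF code and the function names are made up for illustration; it just
counts how many chunks a 1-D record variable needs for a given chunk length,
assuming 4-byte ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of netCDF): number of chunks needed to
 * hold n_records values along a 1-D record dimension, i.e. the ceiling
 * of n_records / chunk_len. */
size_t chunks_needed(size_t n_records, size_t chunk_len)
{
    assert(chunk_len > 0);  /* a zero-length chunk makes no sense */
    return (n_records + chunk_len - 1) / chunk_len;
}

/* Hypothetical helper: bytes of data held by one full chunk. */
size_t chunk_bytes(size_t chunk_len, size_t type_size)
{
    return chunk_len * type_size;
}
```

For ten million time values, chunks_needed(10000000, 1) is 10000000 chunks
of 4 bytes each, all of which have to be tracked in the chunk index, while
chunks_needed(10000000, 1048576) is 10 chunks of 4194304 bytes each.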

But, of course, it shouldn't crash in your other case, so there's a bug in
there somewhere...

-Chris




>
> in both cases it says this about the time dimension
>
>         time = UNLIMITED ; // (0 currently)
>
> In any case, I am sending the ncdump -sh output for a typical file we had
> no problem creating with 4.4.2.1
>
>
> On 03/05/2014 02:46 PM, Russ Rew wrote:
>
>> Hi Ben,
>>
>>  A couple of questions.  Your suggestion of setting the chunking
>>> explicitly doesn't seem like it would help, as one first has to define the
>>> variable before setting the chunk size, and we are crashing when
>>> defining it, unless I am missing something. We did notice that there are
>>> several code changes between 4.2.1 and 4.3.1.1 in
>>> nc4_find_default_chunksizes2 in nc4var.c, which is where we are crashing.
>>>
>>> In 4.2.1 if the dimension is unlimited the chunksize is set to 1 and
>>> it looks like one would skip the code where we are crashing:
>>>
>>> suggested_size = (pow((double)DEFAULT_CHUNK_SIZE/(num_values * type_size),
>>>                   1/(double)(var->ndims - num_set)) * var->dim[d]->len - .5);
>>>
>>> In 4.3.1.1 the setting of the unlimited dimension chunksize to 1 was
>>> removed. I'm guessing the code previous to 4.3.1.1 that set the
>>> chunksize to 1 for unlimited dimensions was saving us. We did notice
>>> that the latest version of nc4var.c on github has extra code after the
>>> line in question for setting chunksizes of 1-D record variables, as well
>>> as a few other changes, so I'm wondering if this is a bug fix?
>>>
>> I committed a change to make sure the statement above wouldn't divide by
>> zero if num_values is zero, but I'm not sure that was the cause of the
>> crash you encountered.
>>
>>  Unfortunately we have not been able to reproduce this in a small
>>> example program, but has there been some change under the hood
>>> that we should be taking a look at?
>>>
>> Could you please capture the output of "ncdump -sh yourfile.nc", for the
>> file you were able to create using netCDF version 4.2.1, and send it to
>> support-netcdf@xxxxxxxxxxxxxxxx?  With that, we might be able to
>> reproduce the problem using the ncgen utility.
>>
>> Thanks.
>>
>> --Russ
>>
>
>
> --
> Ben Auer, PhD   SSAI, Scientific Programmer/Analyst
> NASA GSFC,  Global Modeling and Assimilation Office
> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD  20771
> Phone: 301-286-9176               Fax: 301-614-6246
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx