Hi Constantine,
> My profiling results show that NetCDF Classic is very slow if the  
> following output scheme is used:
> 
> // case 1
> for var in variables
>    define var
>    write var
> endfor
Rob Latham is exactly right about case 1, including the recommendation
to use the "underbar underbar" function nc__enddef() to reserve extra
space in the header, which optimizes for this case if you need netCDF-3
files.
> or even if something like this is done (assuming that all the  
> variables are defined already and they depend on an unlimited  
> dimension):
> 
> // case 2
> for var in variables
>    append var
> endfor
> 
> It seems to me that case 1 is slow because NetCDF (Classic) keeps the  
> file header as small as possible (Section 4 of the NetCDF User's Guide  
> is perfectly clear about this). Case 2, on the other hand, seems to be  
> slow because (please correct me if I'm wrong) variables are stored  
> contiguously. (In other words: if variables A and B are defined in  
> this order, then appending X bytes to A requires moving B over by X  
> bytes.)
No, that's not the case.  The data for record variables (those that use
an unlimited dimension) is interlaced by the unlimited dimension, so
that appending data for the nth record of all record variables is
efficient, especially if you append the variables in the same order in
which they were defined.  In this case appending data is just sequential
I/O, except that the number of records in the header must also be
updated once when a new record is first written.
You may be seeing what you think is an inefficiency because all the fill
values for a record are written the first time the file is extended to
contain that record, unless you have "no-fill mode" set.  Hence all the
record values are typically written twice, once to fill the record with
fill values of the appropriate type for each variable, and a second time
when a data value overwrites the associated fill value.
If you know you will always write all the values in a record, you can
set no-fill mode (nc_set_fill() with NC_NOFILL) before writing, to
eliminate the overhead of writing fill values.
> My question is:
> 
> How does NetCDF-4 compare to NetCDF Classic in this regard? Would  
> switching to it improve write performance? (This is two questions,  
> really: I'm interested in cases 1 and 2 separately.)
For case 1, netCDF-4 supports efficient addition of new variables, with
no need to move data around to make more space in the "header",
because there is no single contiguous header that stores all the
metadata.  Instead, it's distributed throughout the file.  So either use
nc__enddef() to reserve extra space in the header of netCDF-3 files, or
use netCDF-4 if the software to access the data has been upgraded to
netCDF-4.
For case 2, netCDF-4 is no more efficient than netCDF-3, but it's more
flexible, because it supports multiple unlimited dimensions.
--Russ