
Re: [netcdfgroup] Dimensions IDs



Thanks, Russ.

I thought I had solved my problem by defining a list of "unique" dimension IDs, where the "key" is the ID, but I have now found a test case where I generated a netCDF file that has *duplicated* dimension IDs!

So, I need some additional insight into this, please...



ncks is both a "display only" program like ncdump and a selective copy program like nccopy; that is, it can generate a new file from selected variables.

Here's the test case, partial CDL syntax

I generated a command that selects the following variables to the output file


netcdf in_grp {   //root
dimensions:
 time=unlimited;
 lev=3;
 vrt_nbr=2;
variables:
 float ilev(lev,vrt_nbr);
  //coordinate variable (/lev)
 float lev(lev);
 //coordinate variable (/time)
 double time(time);

group: g8 {
 dimensions:
   lev=3;
   vrt_nbr=2;
 variables:
   //coordinate variable (/g8/lev)
   float lev(lev);
   //coordinate variable (/g8/vrt_nbr)
   float vrt_nbr(vrt_nbr);
   float ilev(lev,vrt_nbr);
 } // end g8

 group: g10 {
 variables:
   float two_dmn_rec_var(time,lev);
   }// end g10


This test case has 5 dimensions
3 in root
time=unlimited;
 lev=3;
 vrt_nbr=2;

and 2 dimensions  in group /g8

group: g8 {
 dimensions:
   lev=3;
   vrt_nbr=2;

and some variables that use these dimensions, some using the "local" ones and some using the ones in root.


Note that the /g8 dimensions have the same relative name as the ones on the root




These variables are written the following way:

1) Obtain the dimension IDs for the variable

(void)nco_inq_vardimid(grp_in_id,var_in_id,dmn_in_id_var);

2) Loop dimensions for variable

for(int dmn_idx=0;dmn_idx<nbr_dmn_var;dmn_idx++){

3) I have now defined a list of dimensions where the "key" is the unique dimension ID in the input file; this key returns an object that has all the information about the dimension (path, number of coordinate variables, etc.).

For this case I only need the full name (path) as it appears in the *input* file, the CDL above.

I need to know if that dimension name is defined for the output group (for simplicity, let's consider that the output group has the same location of the input)

In a netCDF3 case, all dimensions are in the same group, things can be done with

nc_inq_dimid(nc_id,dmn_nm,dmn_id);

that is, simply inquire whether the dimension exists.

4) Since I am defining the output, I have to check if the group was created

/* Test existence of group and create if not existent */
   if(nco_inq_grp_full_ncid_flg(nc_out_id,grp_out_fll,&grp_dmn_out_id)){

5) then obtain its dimensions

/* Check output group (only) dimensions  */
   (void)nco_inq_dimids(grp_dmn_out_id,&nbr_dmn_out_grp,dmn_out_id_grp,0);

6) Loop group dimensions and match the *relative* name with the *relative* name of the variable

A match tells me that the dimension was already defined in that group, and I store the ID to pass later to the variable definition.

/* A relative name for variable and group exists for this group...the dimension is already defined */
     if(strcmp(dmn_nm_grp,dmn_nm) == 0){

/* Assign the defined ID to the dimension ID array for the variable */
       dmn_out_id[dmn_idx]=dmn_out_id_grp[dmn_idx_grp];
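As a minimal sketch of step 6, assuming the group's dimension names and IDs have already been collected into a plain array (the `grp_dim` struct and `find_grp_dim_id` are hypothetical names, not part of NCO or the netCDF API), the relative-name match could look like this:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record standing in for one entry obtained from
 * nc_inq_dimids()/nc_inq_dimname() for a single group. */
struct grp_dim {
    int id;            /* dimension ID as reported for the group */
    const char *name;  /* relative (simple) dimension name */
};

/* Return the ID of the dimension called `name` in `dims`,
 * or -1 if the group does not define it yet. */
int find_grp_dim_id(const struct grp_dim *dims, int nbr_dims, const char *name)
{
    assert(name != NULL);
    for (int idx = 0; idx < nbr_dims; idx++) {
        if (strcmp(dims[idx].name, name) == 0)
            return dims[idx].id;
    }
    return -1;
}
```

A return value of -1 would be the signal to define the dimension in the output group before defining the variable.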



This works well... I do get *distinct* IDs if I print them following the above calls.

This function returns the dimension ID:

/* Define dimension and obtain dimension ID */
       (void)nco_def_dim(grp_dmn_out_id,dmn_nm,dmn_sz,&dmn_id_out);



Here are the IDs for this case, as you can see they go from 0 to 4 ( 5 in total)


ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=0 index [0]:</time> with size=10
ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=1 index [1]:</lev> with size=3
ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=2 index [0]:</g8/lev> with size=3
ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=3 index [1]:</g8/vrt_nbr> with size=2
ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=4 index [1]:</vrt_nbr> with size=2


In the variable definition, I get the assignment I expect, for example </g10/two_dmn_rec_var> is defined with /time and /lev dimensions from root (IDs 0 and 1) , like in the input file

ncks: INFO nco_cpy_var_dfn() DEFINING variable </g10/two_dmn_rec_var> with dimension IDS = #0 #1


So, all seems OK, so far

But...when I try to read the generated file, all things go terribly wrong, because I do have duplicated IDs now in the generated file...

I changed my whole model on the assumption that dimension IDs are unique. In this case the output file does *not* have unique IDs, so I get the wrong dimension when reading the variable's data.


Here are the dimensions in the output file, from my generated unique dimension list.

the symbol # stands for ID, then full path starting with /, and dimension size in ()


(#0/time) record dimension(10)
(#1/lev) dimension(3)
(#4/vrt_nbr) dimension(2)
(#0/g8/lev) dimension(3)
(#1/g8/vrt_nbr) dimension(2)
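
A quick way to confirm the problem is to scan the collected IDs for repeats. The sketch below is plain C with no netCDF calls; `has_duplicate_ids` is a hypothetical helper, and it only assumes the IDs have been gathered into an array.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if any value occurs more than once in ids[0..nbr-1], else 0.
 * Quadratic, which is fine for the handful of dimensions in a file. */
int has_duplicate_ids(const int *ids, size_t nbr)
{
    assert(ids != NULL || nbr == 0);
    for (size_t i = 0; i < nbr; i++)
        for (size_t j = i + 1; j < nbr; j++)
            if (ids[i] == ids[j])
                return 1;
    return 0;
}
```

Applied to the IDs listed for the output file (0, 1, 4, 0, 1) it reports a duplicate, while the IDs printed in define mode (0 to 4) pass.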


Just to make sure, I went to see what ncdump was telling me about these IDs, by reading the generated file.

ncdump does not print dimension IDs, but I put this call in line 1375 of ncdump.c

printf("#%d,",dimids_grp[d_grp]);
print_name(dims[d_grp].name);

in the loop where you print the dimension name

Sure enough, I do get the same ID numbers

netcdf out {
dimensions:
#0,time = UNLIMITED ; // (10 currently)
#1,lev = 3 ;
#4,vrt_nbr = 2 ;
variables:
float ilev(lev, vrt_nbr) ;
float lev(lev) ;
double time(time) ;

group: g10 {
 variables:
  float two_dmn_rec_var(time, lev) ;
 } // group g10

group: g8 {
 dimensions:
  #0,lev = 3 ;
  #1,vrt_nbr = 2 ;
 variables:
  float ilev(lev, vrt_nbr) ;
 } // group g8
}



ncdump is a "print on the fly" tool; that is, it reads and prints things as the groups are iterated. With it I do get the correct data from the generated file, because IDs are not stored, only used at the moment of reading.

But in my new ncks model, these dimension IDs are stored in the lists I mentioned before, from my previous comments:


The fact that group dimension IDs are in fact unique makes possible to match them with dimension IDs for variables...

But only if I have a list of

The full path of all dimensions for each variable

I already had this. I constructed my "path only" model by recursively iterating the file, starting at root, and for every group I store the current path passed as a parameter to the recursive function.

The API gets me all local info for variables, for the current group, including dimensions for variables and dimensions for groups.

The additional step is to store for each group, the dimension ID, and for every variable dimension, its ID.
Then match them.
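
A sketch of how the full-path key could be built during the recursive traversal, assuming the current group path is passed down as described. `bld_dmn_fll_nm` is a hypothetical helper, not an existing NCO or netCDF function. Keying the list on the full path rather than on the ID alone keeps entries distinct even when two groups report the same ID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join a group path ("/" or "/g8") and a relative dimension name ("lev")
 * into a full path ("/lev" or "/g8/lev"). Returns 0 on success, -1 if
 * the result does not fit in `out`. */
int bld_dmn_fll_nm(const char *grp_fll, const char *dmn_nm,
                   char *out, size_t out_sz)
{
    assert(grp_fll != NULL && dmn_nm != NULL && out != NULL);
    /* The root group is already "/", so no extra separator is needed */
    const char *sep = (strcmp(grp_fll, "/") == 0) ? "" : "/";
    int len = snprintf(out, out_sz, "%s%s%s", grp_fll, sep, dmn_nm);
    return (len >= 0 && (size_t)len < out_sz) ? 0 : -1;
}
```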


So, why do I get these duplicated dimension IDs on the generated file ?


To note

All the above ID printout , from 0 to 4

ncks: INFO nco_cpy_var_dfn() defining dimension OUT_ID=0 index [0]:</time> with size=10


was done while in *define* mode

After the routine that defines groups and dimensions (and prints those values), define mode is ended and the data for the variables is then written.

From my understanding of the netCDF intermediate layer between the public API and the HDF5 layer, things like HDF5 datasets are not actually "defined" until define mode is ended, for example to allow the dataset to be assembled with chunking, compression, etc.

Could it be that somehow I need to leave define mode (but where ?) (every *time* I define a new dimension ? ) so that those dimension IDs are "flushed" ?

Or am I wrong in assuming that the dimension IDs are in fact "unique"? Can a netCDF4 file have duplicated dimension IDs, yes or no?

If no, then the API should have complained somewhere on my file generation?


Thanks

Pedro



------
Pedro Vicente, Earth System Science
University of California, Irvine
http://www.ess.uci.edu/


----- Original Message ----- From: "Russ Rew" <russ@xxxxxxxxxxxxxxxx>
To: "Pedro Vicente" <pvicente@xxxxxxx>
Cc: <netcdfgroup@xxxxxxxxxxxxxxxx>
Sent: Monday, March 04, 2013 5:27 AM
Subject: Re: [netcdfgroup] How to find the full dimension names (paths with groups) for a variable?


Hi Pedro,

You're right that it would be useful to have additional public netCDF
functions to make it easy to get the absolute netCDF name from a
dimension ID and the reverse.

There is code for this in the source for ncdump and nccopy.  The ncdump
utility outputs the absolute dimension name when there is an ambiguity,
for example one of the test cases for ncdump outputs this variable
declaration for a case where a variable uses several dimensions named
"dim" in different groups (see ncdump/ref_tst_group_data.cdl):

   float var2(/dim, /g2/dim, dim) ;

The code for figuring out these names is in ncdump/ncdump.c, preceded by
this comment

/* Subtlety: The following code block is needed because
  * nc_inq_dimname() currently returns only a simple dimension
  * name, without a prefix identifying the group it came from.
  * That's OK unless the dimid identifies a dimension in an
  * ancestor group that has the same simple name as a
  * dimension in the current group (or some intermediate
  * group), in which case the simple name is ambiguous.  This
  * code tests for that case and provides an absolute dimname
  * only in the case where a simple name would be
  * ambiguous. */

The 20 or so subsequent lines of code that implement this should be
captured in a separate function, so other developers don't need to
rediscover how to do it with the current public API.
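
Until such a function exists, the scoping rule the comment describes can be modeled without the library: a simple name refers to the dimension in the nearest enclosing group that defines it. The sketch below assumes a flat list of full dimension paths has already been collected; `rsl_dmn_fll_nm` is a hypothetical name, and this is not the ncdump code itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve the simple name `dmn_nm`, as seen from group `grp_fll`, to the
 * full path of the nearest enclosing dimension. `dmn_lst` holds the full
 * paths of every dimension in the file. Returns 0 and fills `out` on
 * success, -1 if no enclosing group defines the name. */
int rsl_dmn_fll_nm(const char *grp_fll, const char *dmn_nm,
                   const char *const *dmn_lst, int nbr_dmn,
                   char *out, size_t out_sz)
{
    char grp[256];
    assert(grp_fll != NULL && dmn_nm != NULL && out != NULL);
    /* Group paths must be absolute and fit the scratch buffer */
    if (grp_fll[0] != '/' || strlen(grp_fll) >= sizeof grp) return -1;
    strcpy(grp, grp_fll);

    for (;;) {
        /* Candidate path in the current scope */
        const char *sep = (strcmp(grp, "/") == 0) ? "" : "/";
        int len = snprintf(out, out_sz, "%s%s%s", grp, sep, dmn_nm);
        if (len < 0 || (size_t)len >= out_sz) return -1;

        for (int idx = 0; idx < nbr_dmn; idx++)
            if (strcmp(dmn_lst[idx], out) == 0) return 0;

        /* Not found: move to the parent group, stopping after root */
        if (strcmp(grp, "/") == 0) return -1;
        char *slash = strrchr(grp, '/');
        if (slash == grp) grp[1] = '\0';   /* "/g8" -> "/" */
        else *slash = '\0';                /* "/a/b" -> "/a" */
    }
}
```

With the five dimensions of a file that defines lev both in root and in /g8, the name lev resolves to /g8/lev from inside /g8 and to /lev from inside a sibling group that defines no dimensions of its own.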

For the other direction, there is this function in ncdump/utils.c that
would also be useful in the public API:

 /* Missing functionality that should be in nc_inq_dimid(), to get
  * dimid from a full dimension path name that may include group
  * names */
 int
 nc_inq_dimid2(int ncid, const char *dimname, int *dimidp) {
   ...

We have some plans to provide API additions such as these for developers
of generic netCDF tools in a future version.  Thanks for pointing out
the need for these.

--Russ


ok, I think I found the solution...

The fact that group dimension IDs are in fact unique makes possible to match them with dimension IDs for variables...

But only if I have a list of

1) The full path of all dimensions in the file
2) The full path of all dimensions for each variable

I already had this. I constructed my "path only" model by recursively iterating the file, starting at root,
and for every group I store the current path passed as a parameter to the recursive function.

The API gets me all local info for variables, for the current group, including dimensions for variables and dimensions for groups.

The additional step is to store for each group, the dimension ID, and for every variable dimension, its ID.
Then match them.

So, I take back my comment that "IDs are a recipe for disaster", for dimensions they are actually the solution.

I was thinking more of variable IDs, that can have duplicated values for each group, somehow I missed this dimension ID issue.

Here's my output with this patch applied

ncks: INFO nco_bld_dmn_ids_trv() traversing variable </g16/g16g2/lon1_var>
match <8> for var dim </g16/lon1> and group dim </g16/lon1>

In summary

1) the API does not get me the full dimension path for each variable, but it's possible to construct them.
2) I don't need variable IDs and group IDs


Pedro



------
Pedro Vicente, Earth System Science
University of California, Irvine
http://www.ess.uci.edu/