NOTICE: This version of the NSF Unidata web site (archive.unidata.ucar.edu) is no longer being updated.
Current content can be found at unidata.ucar.edu.

Re: [netcdfgroup] storing sparse matrices data in NetCDF

Hi Sourish, Gus, and Elizabeth,

Thank you all for your suggestions. I think I've found something that works, 
except for one issue. Please excuse my likely incorrect use of terminology - 
being new to NetCDF creation I may say something incorrect, but I hope the data 
dump below speaks for itself.

Because my data is 2D (time, ID), then those are the dimensions, and 
lon,lat,x,y become variables on the ID dimension. This means my standard netcdf 
tools for slicing based on spatial dimension don't work. For example,

cdo sellonlatbox,83.5,85,-27,-28 ds.nc bar.nc

or

ncks -d lat,83.5,85 -d lon,-27,-28 ds.nc bar.nc
# ncks: ERROR dimension lat is not in input file

Is there a way to make the data 2D but have the 2nd dimension be (lon,lat)? 
Even if yes, I don't imagine the cdo and ncks tools would work on that 
dimension... Is there a cdo, nco, or ncks (or other) simple tool I'm missing 
that can work with this non-gridded data the way those tools do so easily work 
with gridded data?


Anway, here is the Python xarray code I got working to produce the NetCDF file, 
reading in the 'foo.csv' from my previous email and generating ds.nc. Once I 
understood the NetCDF structure from the file Sourish provided, I was able to 
generate something similar using a higher level API - one that takes care of 
time units, calendar, etc. I leave out (x,y,elev) for brevity.


  -k.



df = pd.read_csv('foo.csv', index_col=0, header=[0,1,2,3,4,5])
df.index = pd.to_datetime(df.index)

# Build the dataset
ds = xr.Dataset()
ds['lon'] = (('ID'), df.columns.get_level_values('lon'))
ds['lat'] = (('ID'), df.columns.get_level_values('lat'))
ds['runoff'] = (('time', 'ID'), df.values)
ds['ID'] = df.columns.get_level_values('ID')
ds['time'] = df.index

# Add metadata
ds['lon'].attrs['units'] = 'Degrees East'
ds['lon'].attrs['long_name'] = 'Longitude'
ds['lat'].attrs['units'] = 'Degrees North'
ds['lat'].attrs['long_name'] = 'Latitude'
ds['runoff'].attrs['units'] = 'm^3/day'
ds['ID'].attrs['long_name'] = 'Basin ID'

ds.to_netcdf('ds.nc')




And here is the ncdump of the file





netcdf ds {
dimensions:
        ID = 10 ;
        time = 5 ;
variables:
        string lon(ID) ;
                lon:units = "Degrees East" ;
                lon:long_name = "Longitude" ;
        string lat(ID) ;
                lat:units = "Degrees North" ;
                lat:long_name = "Latitude" ;
        double runoff(time, ID) ;
                runoff:_FillValue = NaN ;
                runoff:units = "m^3/day" ;
                runoff:long_name = "RACMO runoff" ;
        string ID(ID) ;
                ID:long_name = "Basin ID" ;
        int64 time(time) ;
                time:units = "days since 1980-01-01 00:00:00" ;
                time:calendar = "proleptic_gregorian" ;

// global attributes:
                :Creator = "Ken Mankoff" ;
                :Contact = "kdm@xxxxxxx" ;
                :Institution = "GEUS" ;
                :Version = 0.1 ;
data:

 lon = "-27.983", "-27.927", "-27.894", "-28.065", "-28.093", "-28.106", 
    "-28.155", "-27.807", "-27.455", "-27.914" ;

 lat = "83.505", "83.503", "83.501", "83.502", "83.501", "83.499", "83.498", 
    "83.485", "83.471", "83.485" ;

 runoff =
  0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
  0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
  0.024, 0.013, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
  0.025, 0.012, 0.023, 0.005, 0, 42, 0, 0, 0, 0,
  0.023, 0.005, 0.023, 0.005, 0, 0, 0, 0, 0, 0 ;

 ID = "1", "2", "5", "8", "9", "10", "12", "13", "15", "16" ;

 time = 0, 1, 2, 3, 4 ;
}