Re: [netcdfgroup] storing sparse matrices data in NetCDF

Hi Ken

That is true.
I suppose both CDO and NCO (ncks) assume that lat and lon are
monotonic (increasing or decreasing) coordinate variables, and that
runoff has (time,lat,lon) dimensions, not (time,ID).
ID is not a coordinate; I guess it is just a label for your observation
stations.
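
Since lat and lon are ordinary variables on the ID dimension, a box selection
can still be done by hand: find the station indices that fall inside the box,
then keep those columns of runoff. The sketch below uses plain Python lists
and the values from the ncdump quoted further down; the helper name
select_box is made up for illustration, and the box limits are my reading of
the cdo/ncks commands in the quoted message.

```python
# Station labels, coordinates and runoff copied from the ncdump in this thread.
ids = ["1", "2", "5", "8", "9", "10", "12", "13", "15", "16"]
lat = [83.505, 83.503, 83.501, 83.502, 83.501,
       83.499, 83.498, 83.485, 83.471, 83.485]
lon = [-27.983, -27.927, -27.894, -28.065, -28.093,
       -28.106, -28.155, -27.807, -27.455, -27.914]
# runoff[t][k]: value at time step t for station k.
runoff = [
    [0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0],
    [0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0],
    [0.024, 0.013, 0.023, 0.005, 0, 0, 0, 0, 0, 0],
    [0.025, 0.012, 0.023, 0.005, 0, 42, 0, 0, 0, 0],
    [0.023, 0.005, 0.023, 0.005, 0, 0, 0, 0, 0, 0],
]


def select_box(lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Return the indices of stations inside the (inclusive) lat/lon box."""
    return [
        k
        for k, (la, lo) in enumerate(zip(lat, lon))
        if lat_min <= la <= lat_max and lon_min <= lo <= lon_max
    ]


# Hypothetical box: 83.5 to 85 degrees north, -28 to -27 degrees east.
keep = select_box(lat, lon, 83.5, 85.0, -28.0, -27.0)

# Subset the labels and the runoff columns with the same indices,
# so every value stays attached to its station.
sub_ids = [ids[k] for k in keep]
sub_runoff = [[row[k] for k in keep] for row in runoff]

print(sub_ids)
print(sub_runoff)
```

The same index list can be applied to any other per-station variable
(x, y, elev), which is the point of keeping everything on the ID dimension.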

You could devise a more elaborate scheme: define lat and lon dimensions,
then lat(lat) and lon(lon) coordinate variables, and from there create a 3D
runoff(time,lat,lon) variable.
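
To make that scheme concrete, here is a rough sketch in plain Python, using
only the first three stations and two time steps from the ncdump quoted
below. The lat and lon axes are built from the sorted unique values, and
every grid cell without a station is filled with NaN (the same fill value
the quoted file already uses for runoff). All variable names are my own
choice, not something CDO or NCO require.

```python
import math

# First three stations and first two time steps from the quoted ncdump.
lat = [83.505, 83.503, 83.501]
lon = [-27.983, -27.927, -27.894]
# runoff[t][k]: value at time step t for station k.
runoff = [
    [0.023, 0.01, 0.023],
    [0.023, 0.01, 0.023],
]

# Coordinate axes: unique values in increasing order.
lat_axis = sorted(set(lat))
lon_axis = sorted(set(lon))
lat_index = {value: i for i, value in enumerate(lat_axis)}
lon_index = {value: j for j, value in enumerate(lon_axis)}

# grid[t][i][j] starts as all-NaN, then each station value is scattered
# into the cell given by its own lat/lon.
grid = [[[math.nan for _ in lon_axis] for _ in lat_axis] for _ in runoff]
for t, row in enumerate(runoff):
    for k, value in enumerate(row):
        grid[t][lat_index[lat[k]]][lon_index[lon[k]]] = value

# Count how many cells of one time slice actually hold data.
n_cells = len(lat_axis) * len(lon_axis)
n_filled = sum(not math.isnan(v) for plane_row in grid[0] for v in plane_row)
print(n_filled, "of", n_cells, "cells filled per time step")
```

With N stations at distinct latitudes and longitudes the grid has N*N cells
per time step but only N of them carry data, so the 3D variable is mostly
fill values.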
There are several hurdles though:
1) The values of lat and lon in your CSV file may have repetitions (this
affects the actual length of each dimension, which may be <20000).
2) The values of lat and lon in your CSV file may not be monotonically
ordered (either in increasing or decreasing order).
I didn't spot any repetitions in the sample file you sent (but the full
file may have repetitions),
but the lat and lon are definitely not monotonically ordered,
they can go up, then down, then up again ...
Bona fide coordinate variables must be monotonic.
3) Even if you weed out repetitions in lat or lon, and sort them in
increasing or decreasing order,
you would also have to reorder the corresponding runoff values, so that
they continue to belong to the correct station/location/ID,
i.e. sort the whole file with (lat,lon) as primary and secondary keys.
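
Hurdles 1) and 2) are easy to test for before restructuring anything. The
sketch below is plain Python, run on the ten lat/lon values from the ncdump
quoted below; the helper name is_strictly_monotonic is made up for
illustration. On those ten values, two latitudes occur twice although no
(lat, lon) pair repeats, and neither sequence is monotonic.

```python
# Station coordinates copied from the ncdump quoted in this thread.
lat = [83.505, 83.503, 83.501, 83.502, 83.501,
       83.499, 83.498, 83.485, 83.471, 83.485]
lon = [-27.983, -27.927, -27.894, -28.065, -28.093,
       -28.106, -28.155, -27.807, -27.455, -27.914]


def is_strictly_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


# Hurdle 1: repetitions shorten the lat and lon dimensions.
n_unique_lat = len(set(lat))
n_unique_lon = len(set(lon))
n_unique_pairs = len(set(zip(lat, lon)))

# Hurdle 2: coordinate variables need a monotonic ordering.
lat_ok = is_strictly_monotonic(lat)
lon_ok = is_strictly_monotonic(lon)

print("unique lat:", n_unique_lat, "of", len(lat))
print("unique lon:", n_unique_lon, "of", len(lon))
print("unique (lat, lon) pairs:", n_unique_pairs)
print("lat monotonic:", lat_ok, "| lon monotonic:", lon_ok)
```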

Maybe Python has a sort routine that does all that for you gracefully, some
variant of qsort perhaps.
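
For what it is worth, the built-in sorted() does this when given a key made
of (lat, lon); it is a stable merge-type sort rather than qsort, but the
effect is the same. A minimal sketch, assuming each station is kept as one
record so that its runoff series moves with it (the record layout is my own
choice; the values are the first five stations from the quoted ncdump):

```python
from operator import itemgetter

# One record per station: (ID, lat, lon, runoff time series).
stations = [
    ("1", 83.505, -27.983, [0.023, 0.023, 0.024, 0.025, 0.023]),
    ("2", 83.503, -27.927, [0.01, 0.01, 0.013, 0.012, 0.005]),
    ("5", 83.501, -27.894, [0.023, 0.023, 0.023, 0.023, 0.023]),
    ("8", 83.502, -28.065, [0.005, 0.005, 0.005, 0.005, 0.005]),
    ("9", 83.501, -28.093, [0.0, 0.0, 0.0, 0.0, 0.0]),
]

# Sort with lat as the primary key and lon as the secondary key.
# Whole records are moved, so runoff stays with its station.
ordered = sorted(stations, key=itemgetter(1, 2))

ordered_ids = [record[0] for record in ordered]
print(ordered_ids)
for station_id, la, lo, series in ordered:
    print(station_id, la, lo, series)
```

Stations "5" and "9" share a latitude, so the longitude decides their
relative order.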

Gus


On Mon, Mar 18, 2019 at 6:50 PM Ken Mankoff <mankoff@xxxxxxxxx> wrote:

> Hi Sourish, Gus, and Elizabeth,
>
> Thank you all for your suggestions. I think I've found something that
> works, except for one issue. Please excuse my likely incorrect use of
> terminology - being new to NetCDF creation I may say something incorrect,
> but I hope the data dump below speaks for itself.
>
> Because my data is 2D (time, ID), those are the dimensions, and
> lon,lat,x,y become variables on the ID dimension. This means my standard
> netcdf tools for slicing based on spatial dimension don't work. For example,
>
> cdo sellonlatbox,-28,-27,83.5,85 ds.nc bar.nc
>
> or
>
> ncks -d lat,83.5,85 -d lon,-27,-28 ds.nc bar.nc
> # ncks: ERROR dimension lat is not in input file
>
> Is there a way to make the data 2D but have the 2nd dimension be
> (lon,lat)? Even if yes, I don't imagine the cdo and ncks tools would work
> on that dimension... Is there a cdo, nco, or ncks (or other) simple tool
> I'm missing that can work with this non-gridded data the way those tools do
> so easily work with gridded data?
>
>
> Anyway, here is the Python xarray code I got working to produce the NetCDF
> file, reading in the 'foo.csv' from my previous email and generating ds.nc.
> Once I understood the NetCDF structure from the file Sourish provided, I
> was able to generate something similar using a higher level API - one that
> takes care of time units, calendar, etc. I leave out (x,y,elev) for brevity.
>
>
>   -k.
>
>
>
> import pandas as pd
> import xarray as xr
>
> df = pd.read_csv('foo.csv', index_col=0, header=[0,1,2,3,4,5])
> df.index = pd.to_datetime(df.index)
>
> # Build the dataset
> ds = xr.Dataset()
> ds['lon'] = (('ID'), df.columns.get_level_values('lon'))
> ds['lat'] = (('ID'), df.columns.get_level_values('lat'))
> ds['runoff'] = (('time', 'ID'), df.values)
> ds['ID'] = df.columns.get_level_values('ID')
> ds['time'] = df.index
>
> # Add metadata
> ds['lon'].attrs['units'] = 'Degrees East'
> ds['lon'].attrs['long_name'] = 'Longitude'
> ds['lat'].attrs['units'] = 'Degrees North'
> ds['lat'].attrs['long_name'] = 'Latitude'
> ds['runoff'].attrs['units'] = 'm^3/day'
> ds['ID'].attrs['long_name'] = 'Basin ID'
>
> ds.to_netcdf('ds.nc')
>
>
>
>
> And here is the ncdump of the file
>
>
>
>
>
> netcdf ds {
> dimensions:
>         ID = 10 ;
>         time = 5 ;
> variables:
>         string lon(ID) ;
>                 lon:units = "Degrees East" ;
>                 lon:long_name = "Longitude" ;
>         string lat(ID) ;
>                 lat:units = "Degrees North" ;
>                 lat:long_name = "Latitude" ;
>         double runoff(time, ID) ;
>                 runoff:_FillValue = NaN ;
>                 runoff:units = "m^3/day" ;
>                 runoff:long_name = "RACMO runoff" ;
>         string ID(ID) ;
>                 ID:long_name = "Basin ID" ;
>         int64 time(time) ;
>                 time:units = "days since 1980-01-01 00:00:00" ;
>                 time:calendar = "proleptic_gregorian" ;
>
> // global attributes:
>                 :Creator = "Ken Mankoff" ;
>                 :Contact = "kdm@xxxxxxx" ;
>                 :Institution = "GEUS" ;
>                 :Version = 0.1 ;
> data:
>
>  lon = "-27.983", "-27.927", "-27.894", "-28.065", "-28.093", "-28.106",
>     "-28.155", "-27.807", "-27.455", "-27.914" ;
>
>  lat = "83.505", "83.503", "83.501", "83.502", "83.501", "83.499", "83.498",
>     "83.485", "83.471", "83.485" ;
>
>  runoff =
>   0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>   0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>   0.024, 0.013, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>   0.025, 0.012, 0.023, 0.005, 0, 42, 0, 0, 0, 0,
>   0.023, 0.005, 0.023, 0.005, 0, 0, 0, 0, 0, 0 ;
>
>  ID = "1", "2", "5", "8", "9", "10", "12", "13", "15", "16" ;
>
>  time = 0, 1, 2, 3, 4 ;
> }
>