Re: [netcdf-java] Bug (concurrency issue?) when reading NCML aggregations

So you are running a model that outputs 50-70 files belonging to a
single "run".

Do you put each run in a separate directory?

Are you overwriting the files?

On Tue, Nov 17, 2015 at 1:34 PM, Clifford Harms <clifford.harms@xxxxxxxxx>
wrote:

> The data I attached is for a test case in a scenario I am trying to
> handle. I have several thousand netCDF files (some CF, some not). Most of
> them are pieces of a logical dataset that has been split along a time or Z
> axis into 30-50 files, which I must aggregate back into a single 'logical'
> dataset (I believe this is a fairly common use case). These files are
> updated daily, but due to the amount of data involved as well as other
> environmental factors, these updates happen sporadically over a span of
> about 24 hours.
>
> So what I am trying to do here is this: as the files of an aggregated
> dataset are slowly updated with newer versions of the same file, I add those
> new versions to the aggregated datasets that they belong to, while ensuring
> that the new data can be differentiated within the aggregation via its data
> creation time (be it a model run time, production time, or whatever). This
> is where joining files along a joinNew dimension comes in (in this
> example, 'runtime'), as the data creation time does not exist in the
> datasets as a coordinate variable, and in some cases is not even indicated
> in the global attributes.
>
> Ultimately, once all of the files for an aggregated dataset have been
> updated, the aggregation contains files that all have the same data
> creation or run time, until the next update starts.
>
> You seem to be indicating that I cannot perform a 'joinNew' aggregation
> between datasets that have coordinate variables with different sizes? If
> that is the case, and I missed it in the documentation somewhere, then what
> about aggregating the files with a joinNew first, and then aggregating
> those aggregations as 'joinExisting' along the time/Z axis?
>
> There is still the issue, though, of the random behavior (an exception for
> some reads, an array of values for others), which indicates a
> concurrency problem. If the read worked consistently, instead of only half
> of the time, that would still be useful to me, as my code could easily
> determine which values in the returned array were valid.
> At any rate, thanks for responding so quickly.
>
> On Sat, Nov 14, 2015 at 5:35 PM, John Caron <jcaron1129@xxxxxxxxx> wrote:
>
>> Hi Clifford:
>>
>>   <aggregation type="joinNew" dimName="runtime">
>>     <netcdf coordValue="0" location="ncom-relo-mayport_u_miw-t000.nc"/>
>>     <netcdf coordValue="24">
>>       <aggregation type="joinExisting" dimName="time">
>>         <netcdf location="ncom-relo-mayport_26_u_miw-t001.nc"/>
>>         <netcdf location="ncom-relo-mayport_26_u_miw-t000.nc"/>
>>       </aggregation>
>>     </netcdf>
>>   </aggregation>
>>
>> ncom-relo-mayport_u_miw-t000.nc has only 1 time coordinate, but the
>> inner aggregation has 2, so these are not homogeneous in the sense that
>> NcML aggregation requires.
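
The homogeneity requirement described above can be checked before an aggregation is written. The following is only a minimal sketch, not part of netcdf-java: the function name `find_mismatched_dims` is hypothetical, and the dimension sizes are typed in by hand from the description above (1 time coordinate in the single file, 2 in the inner aggregation) rather than read from the actual files.

```python
def find_mismatched_dims(members):
    """Return {dim_name: {member_name: size}} for every dimension whose
    size is not identical across all members of a planned joinNew.

    `members` maps a member label to a {dimension_name: size} dict.
    A dimension missing from a member is reported with size None.
    """
    all_dims = set()
    for dims in members.values():
        all_dims.update(dims)

    mismatched = {}
    for dim in sorted(all_dims):
        sizes = {name: dims.get(dim) for name, dims in members.items()}
        if len(set(sizes.values())) > 1:
            mismatched[dim] = sizes
    return mismatched


# Sizes taken from the description above; the labels are illustrative.
members = {
    "ncom-relo-mayport_u_miw-t000.nc": {"time": 1},
    "inner joinExisting (t001 + t000)": {"time": 2},
}

for dim, sizes in find_mismatched_dims(members).items():
    print(f"dimension '{dim}' differs between members: {sizes}")
```

An empty result means every listed dimension has the same size in every member. In practice the sizes would have to be read from the files themselves with whatever netCDF reader is at hand.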
>>
>> Could you explain more about what you are trying to do?
>>
>> John
>>
>>
>> On Fri, Nov 13, 2015 at 11:24 PM, Clifford Harms <
>> clifford.harms@xxxxxxxxx> wrote:
>>
>>> I've posted the report, sample data, sample xml, and sample code on
>>> github -> https://github.com/Unidata/thredds/issues/276
>>>
>>>
>>> --
>>> Clifford M. Harms
>>>
>>>
>>
>>
>
>
> --
> Clifford M. Harms
>