NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Ed,

> Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:
>
> > From HDF5's perspective, you have to use H5Pset_fapl_<foo>(params) to
> > choose to use a particular file driver to access a file.  Probably
> > something like this should be exported/translated out to the netCDF4
> > layer for users to choose which driver to access the file with.
>
> Here's the URL for the parallel HDF5 info currently:
>
> http://hdf.ncsa.uiuc.edu/HDF5/PHDF5/
>
> I'm seeing three steps to parallel HDF5:
>
> 1 - Initialize MPI
> 2 - When opening/creating the file, set a property in file access
>     properties.
> 3 - Every time reading or writing the file, pass a correctly set
>     transfer property.

I'm assuming you mean reading/writing "raw" data.

> Does that seem to sum it up?

That's some of it.  You also have to make certain that the functions
listed below are called correctly.

> But I see below that you are also asking that "these properties must
> be set to the same values when they are used in a parallel program,"
>
> What do you mean by that?

You can't have half the processes set a property to one value and the
other half set the same property to a different value.  (i.e. everybody
must agree that the userblock is 512 bytes, for example :-)

> In parallel I/O do multiple processes try and create the file?  Or does
> one create it, and the rest just open it?  Sorry if that seems like a
> dumb question!

In MPI-I/O, file creation is a collective operation, so all the
processes participate in the create (from our perspective at least, I
don't know how it happens internally in the MPI-I/O library).

You are going to have fun learning how to do parallel programming with
MPI - think of it as multi-threaded programs with bad debugging
support... :-/

	Quincey

> > > For reading, what does this mean to the API, if anything?
> >
> > Well, I've appended a list of HDF5 API functions that are required to
> > be performed collectively to the bottom of this document (I can't find
> > the link on our web-pages).
> > > Everyone gets to open the file read-only, and read from it to their
> > > heart's content, confident that they are getting the most recent data
> > > at that moment.  That requires no API changes.
> > >
> > > Is that it for readers?  Or do they get some special additional
> > > features, like notification of data arrival, etc?
> >
> > Users would also need the option to choose to use collective or
> > independent I/O when reading or writing data to the file.  That reminds
> > me - are y'all planning on adding any wrappers to the H5P* routines in
> > HDF5 which set/get various properties for objects?
>
> This is truly an important question that I will treat in its own
> email thread...
>
> > Quincey
> >
> > ==============================================================
> >
> > Collective functions:
> >         H5Aclose (2)
> >         H5Acreate
> >         H5Adelete
> >         H5Aiterate
> >         H5Aopen_idx
> >         H5Aopen_name
> >         H5Aread (6)
> >         H5Arename (A)
> >         H5Awrite (3)
> >
> >         H5Dclose (2)
> >         H5Dcreate
> >         H5Dfill (6) (A)
> >         H5Dopen
> >         H5Dextend (5)
> >         H5Dset_extent (5) (A)
> >
> >         H5Fclose (1)
> >         H5Fcreate
> >         H5Fflush
> >         H5Fmount
> >         H5Fopen
> >         H5Funmount
> >
> >         H5Gclose (2)
> >         H5Gcreate
> >         H5Giterate
> >         H5Glink
> >         H5Glink2 (A)
> >         H5Gmove
> >         H5Gmove2 (A)
> >         H5Gopen
> >         H5Gset_comment
> >         H5Gunlink
> >
> >         H5Idec_ref (7) (A)
> >         H5Iget_file_id (B)
> >         H5Iinc_ref (7) (A)
> >
> >         H5Pget_fill_value (6)
> >
> >         H5Rcreate
> >         H5Rdereference
> >
> >         H5Tclose (4)
> >         H5Tcommit
> >         H5Topen
> >
> > Additionally, these properties must be set to the same values when they
> > are used in a parallel program:
> >     File Creation Properties:
> >         H5Pset_userblock
> >         H5Pset_sizes
> >         H5Pset_sym_k
> >         H5Pset_istore_k
> >
> >     File Access Properties:
> >         H5Pset_fapl_mpio
> >         H5Pset_meta_block_size
> >         H5Pset_small_data_block_size
> >         H5Pset_alignment
> >         H5Pset_cache
> >         H5Pset_gc_references
> >
> >     Dataset Creation Properties:
> >         H5Pset_layout
> >         H5Pset_chunk
> >         H5Pset_fill_value
> >         H5Pset_deflate
> >         H5Pset_shuffle
> >
> >     Dataset Access Properties:
> >         H5Pset_buffer
> >         H5Pset_preserve
> >         H5Pset_hyper_cache
> >         H5Pset_btree_ratios
> >         H5Pset_dxpl_mpio
> >
> > Notes:
> > (1) - All the processes must participate only if this is the last
> >       reference to the file ID.
> > (2) - All the processes must participate only if all the file IDs
> >       for a file have been closed and this is the last outstanding
> >       object ID.
> > (3) - Because the raw data for an attribute is cached locally, all
> >       processes must participate in order to guarantee that future
> >       H5Aread calls return the correct results on all processes.
> > (4) - All processes must participate only if the datatype is for a
> >       committed datatype, all the file IDs for the file have been
> >       closed and this is the last outstanding object ID.
> > (5) - All processes must participate only if the number of chunks in
> >       the dataset actually changes.
> > (6) - All processes must participate only if the datatype of the
> >       attribute is a variable-length datatype (sequence or string).
> > (7) - This function may be called independently if the object ID
> >       does not refer to an object that was collectively opened.
> >
> > (A) - Available only in v1.6 or later versions of the library.
> > (B) - Available only in v1.7 or later versions of the library.