In the latest release of VisAD, I've added a TextFile adapter. With very helpful input from Don Murray, I've focused this on reading tab-, comma-, semicolon- and blank-separated files, with a minimum of editing (I hope ;-). In addition to the generic constructor, I've also added a signature that allows you to define the metadata as parameters, rather than having it read from the text file.

Here is the README.text that is contained in the source code:

The VisAD TextAdapter
January, 2001

The VisAD TextAdapter is designed to allow you to quickly read in data that are in the form of an ASCII text file. We fully expect this class to continue to grow to accommodate other, common variations of text file formats that might be encountered.

Two example files are also contained in the release. It is most convenient to test these using the VisAD Spreadsheet or the Jython (Python) interface. Fire up either the visad.python.JPythonFrame, or simply start it from the command line and use a sequence like:

   >>> from visad.python.JPythonMethods import *
   >>> a = load("example1.txt")
   >>> plot(a)
   >>> clearplot()
   >>> b = load("example2.csv")
   >>> plot(b)
   >>> clearplot()

Or simply load these files into the SpreadSheet, and then experiment with the mappings!

I. Text file formats

The text files usually consist of 2 header lines and then data. Optional comment lines may be interspersed throughout. The data portion of the file may be either blank-, comma-, semicolon- or tab-separated values. At present only numeric data can be read. The model for these files is spreadsheet (Excel, etc) output -- that is, column-oriented values.

A comment line is any line that starts with #, !, or %.

The file extensions recognized by the VisAD DefaultFamily and the TextAdapter:

   .bsv -> blank-separated values
   .tsv -> tab-separated values
   .csv -> comma-separated values

In addition, the VisAD DefaultFamily will recognize the extension .txt and invoke TextAdapter.
In this case, however, the TextAdapter attempts to sense the delimiter using the hierarchy:

   tab, semicolon, comma, blank

That is, if a tab character appears in the line, tab will be used. If not, then it looks for a semicolon. Otherwise, if a comma appears in the line it will be used. If no tab, semicolon, or comma appears, then blank will be used.

We tried to keep the amount of modification you might have to make to existing files to a minimum. The general layout is:

   Line 1:   functional description of the data in "VisAD" lingo (aka the "MathType")

   Line 2:   column headers, which name each parameter and possibly give them a physical unit, using delimiters defined above.

   Line 3-n: the data values (with delimiters as defined above; for filenames without recognized extensions, the delimiter used in this data section does not have to be the same as the one used on Line 2)

Please refer to the VisAD Library Developers Guide, section "3.1 MathTypes" for information on how to define the functional description. Also, take a look at the examples at the end of this file.

Also, please note that if you are using the TextAdapter constructor directly, the "Line 1" and "Line 2" values do not have to be in the file - alternate signatures allow these values to be passed as arguments. However, if your text file is used through the VisAD DefaultFamily (as in the SpreadSheet), you must provide the information right in the text file.

II. Line 1 (ignoring any preceding comment lines...)

This line specifies a functional description of the data, using the VisAD "MathType" string. There are two categories of data that may be represented in these text files:

   1) 2-D arrays of a single parameter, or
   2) 1, 2, or 3-D (domain) points of one or more parameters.

IIa. 2-D arrays

In this case, the "VisAD" functional description looks like:

   (x,y)->(temperature)
   (Longitude, Latitude)->(speed)

And the data portion of the file contains "x" values per line, and "y" lines of data.
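As a rough illustration of two ideas described above (the delimiter sensing used for .txt files, and the 2-D layout of "x" values per line by "y" lines), here is a small Python sketch. It is not the TextAdapter code: the function names sense_delimiter, split_values and read_2d_array are made up for this example, sensing the delimiter separately on every line is an assumption, and header lines and "skip" columns are not handled.

```python
def sense_delimiter(line):
    """Pick a delimiter using the order tab, semicolon, comma, blank."""
    for candidate in ("\t", ";", ","):
        if candidate in line:
            return candidate
    return " "


def split_values(line, delimiter):
    """Split one data line into floats (numeric data only)."""
    # For blank-separated data, split on any run of whitespace.
    tokens = line.split() if delimiter == " " else line.split(delimiter)
    return [float(tok) for tok in tokens if tok.strip()]


def read_2d_array(data_lines):
    """Return (nx, ny, rows): nx values per line, ny lines of data.

    Assumption: data_lines holds only the data portion (Line 3-n).
    """
    rows = []
    for line in data_lines:
        stripped = line.strip()
        # Skip empty lines and comment lines starting with #, ! or %.
        if not stripped or stripped[0] in "#!%":
            continue
        rows.append(split_values(stripped, sense_delimiter(stripped)))
    nx = len(rows[0]) if rows else 0
    ny = len(rows)
    return nx, ny, rows


if __name__ == "__main__":
    lines = ["# a comment", "0 , 17 , 34", "1 , 50 , 64"]
    print(read_2d_array(lines))
```

With the three lines in the usage block, the comment line is skipped and the remaining two lines are read as a grid with 3 values in "x" and 2 lines in "y".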
See Examples #2 and #6, below. Only the "y" domain component may have its values defined in the file.

2-D arrays are implied when:

   * there are 2 domain components
   * there is only one range component
   * there is more than one domain sampling value for the first domain component (that is, more than one data value on a line in the text file)

IIb. 1,2,or 3-D points

Just about every other form of a text file falls into this category. Examples of VisAD functional descriptions:

   (x)->(temperature, dewpoint, speed)
   (x,y,z)->(temperature, speed)

At least one of the domain variables (x,y,z) _must_ be defined by data in the file. See Examples #1, #3, #4 and #5, below.

III. Line 2 (ignoring any comment lines that might come before)

The second line of the text files defines which column of the data portion contains what parameters. (Note that, as with the "Line 1", an alternate form of the constructor is available so this information can be passed as an argument rather than being read from the file.)

If you have other information that you need to specify for a parameter, you should use a blank-separated sequence of phrases in the form "key=value", to specify what you need. Here are the possible keys:

   key        value
   ----       ---------------
   unit       name of Unit (default = no unit)
   miss       value to be treated as missing (default = no missing values)
   scale      value that each datum is multiplied by (default = 1.0)
   offset     value that is added to each scaled datum (default = 0.0)
   error      value of the estimated error for this parameter (default = none)
              ** In this release, only range error estimates are implemented.
   interval   either 'true' or 'false' to indicate that this parameter is an _interval_, like a difference (default = false)

   ** The following is _not_ implemented in this release:

   pos        column-oriented location of the data values in the form first:last (if present for one item, this MUST be supplied for all items!)
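To make the effect of the "miss", "scale" and "offset" keys concrete, here is a minimal Python sketch. It is only an illustration, not the TextAdapter implementation: parse_column and apply_keys are hypothetical names, only "key=value" phrases are handled (phrases without "=" are ignored, and the name(first:last) form is not handled), and comparing the raw value against "miss" before scaling is an assumption.

```python
import math
import re


def parse_column(spec):
    """Split 'name[key=value key=value]' into (name, dict of keys)."""
    match = re.fullmatch(r"\s*([^\[\]\s]+)\s*(?:\[(.*)\])?\s*", spec)
    if match is None:
        raise ValueError("unrecognized column spec: %r" % spec)
    name, body = match.group(1), match.group(2) or ""
    keys = {}
    for phrase in body.split():
        key, sep, value = phrase.partition("=")
        if sep:  # keep only phrases of the form key=value
            keys[key] = value
    return name, keys


def apply_keys(raw, keys):
    """Apply miss, scale and offset to one raw datum."""
    miss = keys.get("miss")
    # Assumption: the raw (unscaled) value is compared against "miss".
    if miss is not None and raw == float(miss):
        return math.nan
    scale = float(keys.get("scale", 1.0))    # default = 1.0
    offset = float(keys.get("offset", 0.0))  # default = 0.0
    # Each datum is multiplied by scale, then offset is added.
    return raw * scale + offset


if __name__ == "__main__":
    name, keys = parse_column("Longitude[scale=-1]")
    print(name, apply_keys(90.0, keys))
```

Here a column declared as Longitude[scale=-1] has the sign of each value inverted, and a value equal to the declared "miss" value is turned into NaN.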
Two short examples of Line 2:

   a, b, c, temperature[unit=degC err=.1 miss=999.9], speed[m/s]

   Longitude[scale=-1], Latitude, temp[unit=degK miss=999.9], dewPoint[unit=degK miss=999.9]

(Please note in the second example, the "scale=-1" for the Longitude serves to invert the sign of the values read from the file).

(You might also note that "C" is not the VisAD unit name for degrees Celsius..."C" means Coulomb...)

As with Line 1, there are two cases to consider when defining the contents of this Line:

IIIa. 2-D arrays

In this case, _only_ one range parameter is permitted. You may have 0 or 1 domain parameter names, as well as any "column skip" dummy names (these are only permitted _before_ the actual range parameter name, though). The simplest example:

   temperature[unit=degF]

says that the data contains only values of temperature in degrees F. The domain parameters defined in Line 1 will be computed based on the number of items per line and the number of lines of text.

If you need to skip some columns before the values of the variable start, just put in a "skip" name for each column. For example:

   skip, skip, skip, temperature[unit=degF]

indicates that the values in the first three columns should be skipped, and the rest of the values on the line will make up the "columns" of the 2-D array. The name "skip" can be _any_ unused name.

IIIb. 1,2,or 3-D points

In this case, you define which column corresponds to which parameter you named on Line 1. The order doesn't matter, only that the correct column is identified. If there are columns of data that are to be ignored, just use a "skip" name that was not defined on Line 1. For example:

   x, y, skip, temperature[unit=degC], skip, pressure[unit=hPa]

In this case, the name "skip" is a filler to indicate what column(s) should be skipped.

In both cases, you may also use this line of text to define the values of the domain component samplings in the form:

   name(first:last)

This means that

   a) "name" is a domain component, and...
   b) the (sampling) values of "name" are NOT read from the file, but are computed based on the range "first to last" and the number of lines of text (number of samples), or (in the case of 2-D arrays) possibly the number of range values on the line (see below), and....

   c) this name is ignored for the purposes of counting/locating columns for other parameters.

If the name of a domain component variable does NOT appear on Line 2, its values are assumed to be 0:(N-1) where "N" is the number of samples (number of lines) in the file. There is one exception to this: in the case of 2-D arrays, the first domain component is assumed to apply to the number of range values on each line of text, _not_ the number of "samples" or lines of text.

Finally, if you need to combine the range with other information about the parameter, it would look like:

   x(1.0:13.7)[unit=cm]

IV. Examples

Here are a few examples taken from the beginning of some files:

Example #1 - Simple CSV file, for a function value=f(x) <=

   (x)->(value)
   value
   0
   7.2
   -9.1

Example #2 - CSV file of a 2-D array <=

   (x,y)->(value)
   skip,skip,skip,skip,value
   0 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99
   1 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98

Example #3 - a ".txt" file of two range components (note that the delimiters used on "Line 2" are different than the ones used in the data) <=

   (x)->(value1, value2)
   skip value1 skip skip skip skip value2
   100 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99
   101 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98

Example #4 - CSV file of two range components located at 2-d coordinates <=

   (x,y)->(value_a, value_b)
   y,x,p,value_a[unit=degC],p,p,p,value_b[unit=degF]
   0 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99
   1 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98

Example #5 - BSV file of real data <=

   % Retrieval statistics for mlw_K+ir3_2a.ret :
   % Zbottom threshold = 0.0 km liqclouds=1
   % IWP IWP errors (dB) Dme errors (dB)
   % (g/m^2) mean rms median mean rms median
   (IPW)->(IWP_Error, Dme_error, IWP_Error_mean, Dme_error_mean)
   IPW[g/m^2] IWP_Error_mean p IWP_Error Dme_error_mean p Dme_error
   1.41 7.092 7.890 6.831 0.746 1.768 1.139 717

Example #6 - BSV file of a 2D grid, with the locations given Lat/Lon values <=

   (Longitude,Latitude)->(value)
   Longitude(-130:-40) Latitude(20:60) p value[unit=degC]
   0 0 17 34 50 64 76 86 93 98 99
   1 17 34 50 64 76 86 93 98 99 98

--
Tom Whittaker (tomw@xxxxxxxxxxxxx)
University of Wisconsin-Madison
Space Science and Engineering Center
Phone/VoiceMail: 608/262-2759
Fax: 608/262-5974