Hi Baudouin,

Sorry for the delay in responding...

re:
> are spaces handled in the regex?

Yes. A space is no different from any other character.

re:
> We push products with:
> pqinsert -p "$name $rand" $filename
>
> where $rand is a random number so NCAR can control the number of streams
> with 20 rules like (there is a space after "grib"):

I should have commented on this earlier. We think that it would be better to use a sequence number instead of a random number. The sequence number could be used to segment the stream into pieces and as a tracer that demonstrates that every product inserted into the upstream queue (your machine) is received by the downstream machine (dataportal). This is the approach we took in our CONDUIT datastream, and it has been quite successful. The only thing that you would need to worry about is when to reset the counter.

In CONDUIT, the products inserted into the queue come from the output of a model at a particular timestep. A program written to understand the model output being produced carves up the output file (which contains all fields for the timestep) into individual GRIB or GRIB2 messages (depending on what the model output), records the products in a manifest file (Manuel asked about this previously), and then inserts them into the LDM queue using information about the product (e.g., parameter, level, time, forecast time, etc.) and a monotonically increasing sequence number as part of the product header. The descriptive header can then be used by downstream machines to select tailored subsets of the stream.

In the case of TIGGE, we recommend that the downstream machine (dataportal) subset the streams from you into 10 mutually exclusive requests to start with, to see if that split is enough to allow ingestion of all products with minimal latencies. If additional splitting is needed, it can be done pretty easily.
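The scheme above can be sketched roughly as follows. This is an illustrative assumption, not ECMWF's actual insertion script: the file name, the make_header helper, and the loop are placeholders for however the upstream job enumerates its GRIB messages.

```shell
# Hedged sketch: append a monotonically increasing sequence number to the
# product header instead of $rand.  Names below are illustrative only.
make_header() {
    # $1 = GRIB file name, $2 = sequence number ending the header
    printf '%s %s' "$1" "$2"
}

seq=0
for filename in tigge_ecmf_fc_20060331_0000_0_0_snow_depth_0.grib; do
    header=$(make_header "$filename" "$seq")
    # pqinsert -p "$header" "$filename"   # real insertion on the LDM host
    echo "$header"
    seq=$((seq + 1))    # reset the counter at a natural boundary, e.g. each run
done
```

Because the sequence number ends the header, downstream request patterns can select mutually exclusive subsets by its trailing digit(s), and gaps in the received sequence reveal lost products.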
re:
> tigge_\([a-z]*\)_\([a-z]*\)_\(\d{8}\)_\(\d{4}\)_\(\d\)_.*.grib .*\(10|30|50|70|90\)$
> tigge_\([a-z]*\)_\([a-z]*\)_\(\d{8}\)_\(\d{4}\)_\(\d\)_.*.grib .*\(11|31|51|71|91\)$
> ...
>
> I have been looking at our log files, and the last group of numbers seems
> to match a number anywhere in the product name:
>
> 'tigge_ecmf_fc_20060331_0000_0_0_potential_temperature_0.grib 11335'

I assume that this is the product metadata.

> -> tigge_\([a-z]*\)_\([a-z]*\)_\(\d{8}\)_\(\d{4}\)_\(\d\)_.*.grib .*\(11|31|51|71|91\)$

If this is the regular expression being used to match the header, then I believe (but must check with our LDM expert) that your syntax is incorrect. For instance, I have never seen a regular expression like:

\d{8}

Also, where is this regular expression being used (e.g., in ldmd.conf or pqact.conf, or in the '-p' pattern on the command line of the LDM 'notifyme' utility)?

> 'tigge_ecmf_fc_20060331_0000_0_0_snow_depth_0.grib 11333'
> -> tigge_\([a-z]*\)_\([a-z]*\)_\(\d{8}\)_\(\d{4}\)_\(\d\)_.*.grib .*\(11|31|51|71|91\)$
>
> 'tigge_ecmf_fc_20060331_0000_0_0_convective_available_potential_energy_0.grib 11350'
> -> tigge_\([a-z]*\)_\([a-z]*\)_\(\d{8}\)_\(\d{4}\)_\(\d\)_.*.grib .*\(10|30|50|70|90\)$
>
> 'tigge_ecmf_fc_20060331_0000_0_0_geopotential_700.grib 11338'
> -> tigge_\([a-z]*\)_\([a-z]*\)_\(\d{8}\)_\(\d{4}\)_\(\d\)_.*.grib .*\(10|30|50|70|90\)$
>
> In the first example, the 11 from 11333 is matched. In the second, the 50
> from 11350 is matched, which is the expected behaviour.
> In this last example, it seems that the 70 from the end of the regex
> matches the 700.
> The result is that the same product matches several regexes and is sent
> several times to NCAR.
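The stray matches have a likely explanation. The LDM matches headers with POSIX extended regular expressions (EREs), in which '\(' is a literal parenthesis, not a group. The escaped parens therefore split the pattern into unanchored alternatives ('30', '50', '70', ...), so the bare '70' branch matches the '700' in a file name, and '11' matches inside '11333'. Perl-style '\d{8}' is likewise not valid ERE. A grep demonstration (grep -E stands in here for the LDM's matcher, which is an assumption on my part):

```shell
# The problem header from the log, with its trailing number 11338:
header='tigge_ecmf_fc_20060331_0000_0_0_geopotential_700.grib 11338'

# Buggy form: '\(' is literal in an ERE, so '70' is a free-standing
# branch and matches the '700' in the file name.
if echo "$header" | grep -qE '\(10|30|50|70|90\)$'; then
    echo 'buggy pattern matches'
fi

# Corrected ERE: bare parens for grouping, [0-9]{8} instead of \d{8},
# and '$' now anchors the whole alternation to the trailing number.
fixed='tigge_[a-z]*_[a-z]*_[0-9]{8}_[0-9]{4}_[0-9]_.*\.grib .*(10|30|50|70|90)$'
if echo "$header" | grep -qE "$fixed"; then
    echo 'fixed pattern matches'
else
    echo 'fixed pattern does not match (11338 ends in 38, as intended)'
fi
```

With the corrected form, each header ends in exactly one two-digit value, so each product matches exactly one of the 20 rules.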
If the objective is to use the regular expression in the 'request' line on the downstream machine (dataportal), then I suggest that the process can be made a lot simpler:

- change the random number being appended to the header to a monotonically increasing sequence number
- construct the ~ldm/etc/ldmd.conf request lines on the downstream machine to select a subset of the products

This is exactly what we did during the testing phase -- we used the sequence number Manuel had added to the products and split the request into something like 10 request lines. For instance, if the monotonically increasing sequence number is the last part of the product header, then your request lines could look like:

request EXP "0$" tigge-ldm.ecmwf.int
request EXP "1$" tigge-ldm.ecmwf.int
...
request EXP "9$" tigge-ldm.ecmwf.int

This should effectively split the feed into tenths. NB: it is possible that the feed could be split into some other fraction like fifths, twentieths, etc. We need to experiment to find out how small a split is needed to successfully get all products with a minimum of latency.

> Baudouin
> PS: Doug can you change the space in the regex to a colon (:)

If our suggestion about inclusion of the monotonically increasing sequence number is accepted, then I want to push for a simplification in the request regular expression being used.

Doug and Dave, we can take a look at what you have and make recommendations if we are allowed to log in to dataportal as 'ldm'. It might be the case that my and Mike Schmidt's logins still work; I haven't checked. The problem we encountered before was that we could see the setup and edit configuration files, but we were unable to restart the LDM since we were not allowed to log in as 'ldm'. This is OK, but it increases the time needed to effect a change (Dave can attest to this when he returns from his vacation).
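To illustrate why the ten patterns are mutually exclusive: the last character of the header is the last digit of the sequence number, so exactly one of "0$" through "9$" matches any given product. A quick check (the header is taken from the log excerpt above; using grep -E to stand in for the LDM's request matching is my assumption):

```shell
# Which of the ten request lines would select this product?
header='tigge_ecmf_fc_20060331_0000_0_0_snow_depth_0.grib 11333'

matches=0
for d in 0 1 2 3 4 5 6 7 8 9; do
    if echo "$header" | grep -qE "${d}\$"; then
        echo "request EXP \"${d}\$\" tigge-ldm.ecmwf.int   # selects this product"
        matches=$((matches + 1))
    fi
done
echo "patterns matched: $matches"
```

Since sequence number 11333 ends in 3, only the "3$" line fires; splitting into twentieths would work the same way with two-digit suffix patterns ("00$", "05$", ..., though the exact partition would need to be worked out).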
Cheers,

Tom

****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                             Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                                  http://www.unidata.ucar.edu
****************************************************************************

Ticket Details
===================
Ticket ID: EGL-584516
Department: Support IDD TIGGE
Priority: Normal
Status: Closed