to determine character encoding used by a file on a linux box, try the following:

    file -bi filename

example using a tornado warning ingested via noaaport:

    file -bi 2013021922.TOR
    application/octet-stream; charset=binary

sample of standard encoding used for example when i create my crontab files on a linux box:

    file -bi crontab.ldm
    text/plain; charset=us-ascii

it is quite possible that for us, on the receiving end, the encoding used at the source does not matter since we change that encoding while we ingest via noaaport or ldm, but that is just a guess.

cheers,

--patrick

--------------------
Patrick L. Francis
VP Media Logic Group
http://www.medialogicgroup.com
http://www.hamweather.com
http://www.alertsbroadcaster.com
http://www.modelweather.com
FB: http://www.facebook.com/wxprofessor
--

-----Original Message-----
From: ldm-users-bounces@xxxxxxxxxxxxxxxx [mailto:ldm-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of daryl herzmann
Sent: Wednesday, February 20, 2013 10:22 AM
To: ldm-users@xxxxxxxxxxxxxxxx
Subject: [ldm-users] What is the \x92 character in PNSWSH?

Hi LDM Users,

So a long term annoyance / curiosity continues to get the best of me. I figured I would spam the very smart folks on the ldm-users list and hope somebody could educate me. Attached you will find a PNS statement from WSH that came down our lovely IDD feed on LDM today. Within the file, you will find the following characters and here's some python code showing where its at :)

    >>> a = open('PNSWSH.txt').read()
    >>> a.find("\x92")
    2191
    >>> a[2190:2200]
    'ts from FAA\x92s \r\r\nTDW'

So it should have been an apostrophe, but it instead appears to be Windows CP1252 encoding for "RIGHT SINGLE QUOTATION MARK" ?

After much gnashing of teeth and conversations with NWS TOC folks, this appears to be some issue with products generated in a Word Processor getting saved to a text file without US-ASCII encoding being set during the process, so it defaults to some windows encoding? Or it is some copy/paste issue. The jury never did return a verdict on this and my support ticket with the TOC was closed, oye. I asked Unidata and they did not know.

So does anybody here know what character encoding is used for text data that come down the IDD?

If you are still reading this, you are probably wondering two things:

1) why I care.
2) if I have a life.

Well, this is an important problem when saving these products to a database. See, databases can be sticklers about character encoding and often do not accept "garbage in". If you check out the NWS website, the apostrophe is gone!

http://www.srh.noaa.gov/productview2.php?pil=pnswsh&max=51
http://www.nws.noaa.gov/view/national.php?prod=pns&sid=wsh

There is a large and vast conspiracy afoot!

daryl

--
/**
 * Daryl Herzmann
 * Assistant Scientist -- Iowa Environmental Mesonet
 * http://mesonet.agron.iastate.edu
 */
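
For anyone who hits the same problem before storing products in a database, here is a minimal Python 3 sketch of one possible workaround. It assumes, as daryl guesses above, that the stray bytes are Windows CP1252; nothing in this thread confirms what encoding the IDD text products actually use. The names PUNCTUATION_MAP, find_non_ascii and to_ascii_text are made up for illustration and are not part of LDM or any NWS tool. The product is read as bytes, the offsets of anything outside US-ASCII are reported, and the common CP1252 "smart" punctuation is folded back to plain ASCII.

```python
# Sketch only: cp1252 is an assumption taken from the thread, not a
# documented property of the IDD/NOAAPort text feed.

# Typographic punctuation that CP1252 adds to Latin-1, mapped to ASCII.
PUNCTUATION_MAP = {
    "\u2018": "'",   # left single quotation mark  (0x91 in CP1252)
    "\u2019": "'",   # right single quotation mark (0x92 in CP1252)
    "\u201c": '"',   # left double quotation mark  (0x93 in CP1252)
    "\u201d": '"',   # right double quotation mark (0x94 in CP1252)
    "\u2013": "-",   # en dash                     (0x96 in CP1252)
    "\u2014": "-",   # em dash                     (0x97 in CP1252)
}


def find_non_ascii(raw: bytes):
    """Return (offset, byte value) for every byte outside US-ASCII."""
    return [(i, b) for i, b in enumerate(raw) if b > 0x7F]


def to_ascii_text(raw: bytes) -> str:
    """Decode as CP1252 (assumed), fold smart punctuation, force ASCII."""
    text = raw.decode("cp1252", errors="replace")
    text = text.translate(str.maketrans(PUNCTUATION_MAP))
    # Anything still outside ASCII becomes '?' instead of raising.
    return text.encode("ascii", errors="replace").decode("ascii")


# The fragment quoted in the message above, as bytes.
sample = b"ts from FAA\x92s \r\r\nTDW"
print(find_non_ascii(sample))
print(repr(to_ascii_text(sample)))
```

To run this against a real product, read it in binary mode, for example open('PNSWSH.txt', 'rb').read(), so that Python does not try to guess the encoding. Replacing unknown characters with '?' is a lossy choice made here only to keep the database insert from failing; logging the offsets from find_non_ascii first preserves a record of what was changed.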