Re: Parsing and misparsing XML

Hi Benno,

Benno Blumenthal wrote:
> 
> In looking through my logs, I noticed that fastsearch.net has managed
> (somehow) to find my thredds directory, but seems to be misparsing it.
> 
> 
> The top of the directory is at
> 
> http://iridl.ldeo.columbia.edu/SOURCES/thredds.xml
> 
> and that file has lines in it like
> 
> <catalogRef xlink:title="DASILVA"
> xlink:href="http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"/>
> 
> The robot is hitting URLs like
> 
> http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"/
> 
> 
> which I presume means that it does not understand the '/>' notation
> to end the tag.
> 
> 
> Are we using a non-standard XML notation? I was just following the example I
> was given.

No, that is standard XML. Perhaps the crawler is confused by the differences
between XML and HTML. But even so, I would expect it to stop at the closing '"'
of the attribute value. Maybe the crawler would do better if there were a space
before the "/>", but either way is valid XML.
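
As a quick check of the point above, here is a minimal sketch using Python's
standard xml.etree.ElementTree. It parses the catalogRef element with and
without a space before "/>" and compares the resulting href. The xmlns:xlink
declaration is an assumption added so the fragment parses on its own (in the
real catalog it would presumably sit on an enclosing element), and href_of is
a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace URI; declared inline here only so that the
# fragment is a self-contained, namespace-well-formed document (assumption).
XLINK = "http://www.w3.org/1999/xlink"
HREF = "http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"


def href_of(fragment):
    """Parse a one-element XML fragment and return its xlink:href value."""
    elem = ET.fromstring(fragment)
    # ElementTree exposes namespaced attributes as "{namespace-uri}localname".
    return elem.get("{%s}href" % XLINK)


# The same empty-element tag, without and with a space before "/>".
without_space = (
    '<catalogRef xmlns:xlink="%s" xlink:title="DASILVA" xlink:href="%s"/>'
    % (XLINK, HREF)
)
with_space = (
    '<catalogRef xmlns:xlink="%s" xlink:title="DASILVA" xlink:href="%s" />'
    % (XLINK, HREF)
)

# A conforming parser ends the attribute value at the closing quote,
# so both spellings give the same URL with no trailing '"/'.
assert href_of(without_space) == HREF
assert href_of(with_space) == HREF
```

A conforming XML parser treats both spellings identically, which suggests the
crawler is not using a real XML parser on the catalog.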

Ethan


> Benno
> 
> 
> 
> --
> Dr. M. Benno Blumenthal          benno@xxxxxxxxxxxxxxxx
> International Research Institute for climate prediction
> The Earth Institute at Columbia University
> Lamont Campus, Palisades NY 10964-8000   (845) 680-4450
> 
> 

-- 
Ethan R. Davis                       Telephone: (303) 497-8155
Software Engineer                    Fax:       (303) 497-8690
UCAR Unidata Program Center          E-mail:    edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO  80307-3000              http://www.unidata.ucar.edu/
