
Re: DOM wrappers and UTF-8 encoding



http://www.cl.cam.ac.uk/~mgk25/unicode.html is a very complete
explanation of UTF-8 encoding/decoding. From the text, it appears that
you have to make sure that your application can or does properly decode
UTF-8 byte sequences. This may not happen automagically. You neglected
to mention what language/program you are using to decode the XML file.
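
Since the language was not stated, here is a minimal sketch in Python
(an assumption, chosen only for illustration) using the standard
xml.dom.minidom module. The Root wrapper element and the element_text
helper are hypothetical. It suggests that the parser itself resolves
&#12473; to the single character U+30B9, and that a "funny character"
can appear later, when the UTF-8 bytes of that character are displayed
or decoded with a single-byte charset such as Latin-1.

```python
import xml.dom.minidom

# Hypothetical document modelled on the example in the quoted message;
# the <Root> wrapper is an assumption added to make it well-formed.
XML_SOURCE = (
    "<Root>"
    "<Data10>Data 10 &#97;</Data10>"
    "<Data15>&#12473;</Data15>"
    "</Root>"
)


def element_text(doc, tag):
    """Return the text of the first element named `tag` (hypothetical helper)."""
    node = doc.getElementsByTagName(tag)[0]
    return node.firstChild.nodeValue


doc = xml.dom.minidom.parseString(XML_SOURCE)
data10 = element_text(doc, "Data10")
data15 = element_text(doc, "Data15")

# The parser has already replaced the numeric character references
# with the characters they name.
print(ascii(data10))
print(hex(ord(data15)))

# Encoding the character as UTF-8 yields a three-byte sequence.
utf8_bytes = data15.encode("utf-8")
print(utf8_bytes.hex())

# Decoding those bytes with the wrong charset produces garbage,
# one character per byte instead of one character in total.
misdecoded = utf8_bytes.decode("latin-1")
print(ascii(misdecoded))
```

If the value looks right at the DOM level but wrong on screen, the
place to look is probably the encoding used when the string is written
out (console, file, or HTTP response), not the parser.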
Ed.

Poorav Chaudhari wrote:
> 
> I have an xml file that contains utf-8 encoded text in &#12345;
> format. to read the data between the tags in xml, i am using the
> standard DOM method getNodeValue. if i put a simple string between the
> tags my program spits out the exact string, if i put a &#97; (97 is
> the ascii value for 'a') then the entire string is converted to 'a'.
> so basically if there is any other garbage over 128, it spits out some
> funny character.
> 
> supposing i had the following xml
> 
> <Data10>Data 10 &#97;</Data10>
> 
> the result it spits out is Data 10 a
> 
> so if the xml contains
> 
> <Data15>&#12473;</Data15>
> 
> i get a funny character.
> 
> please if someone has any idea what is going wrong please reply soon.
> thanks
> 
> poorav