On Wednesday 26 October 2005 16:42, Andrew Johnson wrote:
> Benjamin Franksen wrote:
> ...
> Marty Kraimer replied:
> > Java 5 uses 16 bits for char, which is not sufficient to encode all
> > Unicode characters.
> > It uses 2 consecutive chars to hold a Unicode character that does
> > not fit in 16 bits.
> >
> > At least some C/C++ implementations use 32 bits for wchar_t, which is
> > sufficient for all Unicode characters.
> > But what if an implementation uses 16 bits?
> >
> > Thus how will the number of characters in a UTF-8 string be used?
>
> Unicode/UTF-8 (which is what we really mean when we say UTF-8) is
> well-defined in that if a routine understands the multi-byte encoding
> rules it can scan a UTF-8 string and count the number of Unicode
> 'code points' contained in it, which is probably what Benjamin means
> when he talks about a character count.
Yes.
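
To make the counting rule concrete, here is a minimal Python sketch (an illustration only, not anything taken from the CA V4 proposal). It counts code points by skipping UTF-8 continuation bytes, and for comparison counts the 16-bit units that a UTF-16 representation such as Java's would need. The function names are made up for this example, and the input is assumed to be valid UTF-8.

```python
def count_code_points(utf8_bytes: bytes) -> int:
    """Count Unicode code points in a valid UTF-8 byte string.

    Every code point starts with exactly one byte that is not a
    continuation byte (continuation bytes look like 0b10xxxxxx).
    """
    return sum(1 for b in utf8_bytes if (b & 0xC0) != 0x80)


def count_utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to hold text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# 'a', e-acute (U+00E9), euro sign (U+20AC), musical G clef (U+1D11E)
sample = "a\u00e9\u20ac\U0001d11e"
encoded = sample.encode("utf-8")

print(len(encoded))                # 10 bytes in the UTF-8 form
print(count_code_points(encoded))  # 4 code points
print(count_utf16_units(sample))   # 5 units: U+1D11E needs a surrogate pair
```

The three numbers differ for the same string, which is why "length" has to say whether it means bytes, code points, or 16-bit units.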
> However, like Marty, I would strongly question the usefulness of this
> information to anything other than the final GUI display widget that
> is going to put the thing on a screen; even if it were using a
> monospaced font, some Unicode code points actually encode 'combining'
> characters like accents so the number of code points wouldn't always
> match the width of the final output.
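
The combining-character point can be shown with a short Python sketch using the standard unicodedata module. The helper name is hypothetical, and counting non-combining code points is only a rough stand-in for display width: real width also depends on the font, wide East Asian characters, and grapheme cluster rules that this does not attempt to handle.

```python
import unicodedata


def count_non_combining(text: str) -> int:
    """Count code points whose canonical combining class is 0.

    This is a rough approximation of "visible characters", not a
    real display-width calculation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(len(decomposed))                  # 2 code points
print(count_non_combining(decomposed))  # 1 base character
print(len(composed))                    # 1 code point after NFC normalization
```

Both strings would normally render as the same single accented letter, yet their code point counts differ.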
OK. It was just a thought. It seems you have put much more thought into
this than I ever did, so you (both) are probably right. I was just
thinking that conversion to other encodings or formats might be faster
if a character (or code point) count were readily supplied. I agree that
this is probably largely a client-side matter and thus not so critical.
Ben