On Wednesday 02 March 2005 22:02, Jeff Hill wrote:
> My solution to the wide character issue was to have the putChar
> and getChar interfaces pass type int. UTF-8 then becomes an
> implementation (an internal storage compression) issue.
Ok, this is a possibility. But it means that, for instance, an IOC will use
wide characters throughout its code. Just caring about our precious memory,
you know... ;) That is, provided it doesn't want to re-encode everything
again, of course...
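
To make the trade-off concrete, here is a minimal sketch of an int-based character interface that stores UTF-8 internally. The helper name putCharUtf8 is hypothetical and is not the dataAccess putChar signature; the sketch also ignores surrogate code points for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the dataAccess API): append one code point,
// passed as int, to a buffer that is stored as UTF-8.
inline void putCharUtf8(std::string& buf, int cp) {
    assert(cp >= 0 && cp <= 0x10FFFF);  // valid Unicode range
    if (cp < 0x80) {                    // 1 byte: plain ASCII
        buf += static_cast<char>(cp);
    } else if (cp < 0x800) {            // 2 bytes
        buf += static_cast<char>(0xC0 | (cp >> 6));
        buf += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {          // 3 bytes
        buf += static_cast<char>(0xE0 | (cp >> 12));
        buf += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        buf += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                            // 4 bytes
        buf += static_cast<char>(0xF0 | (cp >> 18));
        buf += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        buf += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        buf += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

With this kind of storage an ASCII character costs one byte, whereas code that keeps wide characters throughout pays sizeof(wchar_t) for every character.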
> > > I would bet that such an implementation is in the end a lot
> > > more efficient than any implementation based on mutability, such as
> > > imposed by the dataAccess string interface.
>
> Sorry, I reread your discussion about immutable strings, better
> understanding your suggestion. A string must be written at some point
> in its lifetime, but I am supposing that your immutable string would
> only receive its value when it was constructed? I think I see the
> distinction and that under the immutable model, if an existing string
> is written, then a new string is created and the old string is thrown
> away.
Almost. Under the immutable method, there /is no/ writing to a string, period.
There are only functions (or methods) that might return (new) strings. [But
see below, before you answer this.]
> I don't think that the dataAccess interface precludes the internal
> implementation from doing exactly that with its storage buffers, even
> if the interface makes it look like this is not the case.
Hmm. That would be a possibility I haven't considered yet. It would mean that
the immutable string implementation would not "openly" talk to dataAccess, and
so would not need to implement the daString interfaces. Instead, it could use
DA and the daString interfaces internally, and present a purely functional
interface to the user.
This sounds like a viable solution.
> > Functional style:
> >
> > res = concat(s1,s2);
> >
> > Imperative style:
> >
> > res = new string( s1.length() + s2.length() ); // or was it -1 or +1 ???
> > res.copy( s1 );
> > res.append( s2 );
>
> For example, I could easily design an interface
> that looks like "Functional Style" and implement
> it internally as "Imperative style".
> Ditto for vice versa.
>
> One could argue that, ignoring the implementation, the
> "Functional Style" programming interface is easier
> for programmers to use. Maybe so. The stringSegment interface
> is concentrating on being the simplest and clearest possible
> interface to an implementation, but the best design for an
> implementation interface and a programming interface might
> be incompatible. Therefore, we just might also need to design
> a programming interface that uses a private stringSegment
> to get at the implementation, depending
> on how often there is direct access to stringSegment
> in user plug-ins.
Yes, I think I can agree with that.
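
As an illustration of that point, here is a minimal sketch of a functional-style concat whose body is written in the imperative style. It assumes std::string as a stand-in for the real storage; it is not the stringSegment interface.

```cpp
#include <cassert>
#include <string>

// Functional interface: returns a new string and leaves both arguments
// untouched. Internally it uses the imperative steps (allocate, copy, append).
inline std::string concat(const std::string& s1, const std::string& s2) {
    std::string res;
    res.reserve(s1.size() + s2.size());  // the +1/-1 question stays hidden in here
    res.assign(s1);
    res.append(s2);
    assert(res.size() == s1.size() + s2.size());
    return res;
}
```

The caller only ever writes res = concat(s1, s2); the length bookkeeping lives in one place.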
> Also, bear in mind that one of the fundamental data access
> premises is that the user has a data container with properties
> that may be written.
Yes, I am quite aware of this. And I must admit that the conceptual simplicity
arising from the symmetry between reading and writing of a container object's
properties is not achievable without mutability of the same. At least I
haven't been able to come up with any better idea.
> Therefore, a mutable interface to strings
> is required. This certainly does not preclude throwing internal
> storage for an old string away when a new string is
> written should that turn out to be the best implementation.
Ah, I understand now what you mean by 'writing to the string object'. You
mean, of course, assignment (for instance during mutating traversal of a
propertyCatalog object). Yes, in this case the old string data needs to be
dropped (not necessarily discarded, there may be other objects using its data
or parts of it).
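
A minimal sketch of "dropped, not necessarily discarded", assuming a hypothetical ImmutableString class (not part of dataAccess) that uses std::shared_ptr for the reference counting:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string: a (offset, length) view onto shared,
// reference-counted storage that is never modified after construction.
class ImmutableString {
public:
    explicit ImmutableString(const std::string& s)
        : data_(std::make_shared<const std::string>(s)), off_(0), len_(s.size()) {}

    // Returns a new string that shares the same storage; no characters are copied.
    ImmutableString substr(std::size_t pos, std::size_t n) const {
        assert(pos <= len_ && n <= len_ - pos);
        return ImmutableString(data_, off_ + pos, n);
    }

    std::string str() const { return data_->substr(off_, len_); }

    // Number of ImmutableString objects currently sharing the storage.
    long useCount() const { return data_.use_count(); }

private:
    ImmutableString(std::shared_ptr<const std::string> d, std::size_t o, std::size_t n)
        : data_(std::move(d)), off_(o), len_(n) {}

    std::shared_ptr<const std::string> data_;
    std::size_t off_;
    std::size_t len_;
};
```

Assigning a new value to a variable only releases that variable's reference. The storage itself is freed once the last string (or substring) using it goes away.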
> I agree that your constant time internal implementation based on
> careful maintenance of reference counting might be very efficient,
> but I don't see that the stringSegment interface precludes that
> implementation.
You are right.
> > Another advantage of functional/immutable strings is that support for
> > Unicode encodings is a lot easier and less error-prone. For instance,
> > since a UTF-8 character may be longer than one byte, a UTF-8 encoded
> > string should never be written to at an arbitrary byte index. With
> > immutable strings it is much easier to maintain such invariants.
>
> And the internal implementation under dataAccess could also employ such
> optimizations when concatenating strings, should it arrange storage
> this same way.
>
> The stringSegment interface *is* indexable by the stream element.
>
> A stream element could be mapped by the implementation to a "UTF-8
> character longer than one byte". The stream maintains a current
> position which would always be placed at the start of a UTF-8
> boundary. So when reading or writing a sequence of tokens the
> overhead is low. When moving the index, it would of course be
> necessary to scan the UTF-8 tokens one-by-one, but that can't
> be avoided by any UTF-8 implementation with random access by
> token index.
Yes. My main point was that random access to the characters of a string is an
operation that is almost never really needed. Ask a Perl programmer if
she has /ever/ needed random access to parts of, or single characters in, a string.
The probable answer is "no". As long as you have sufficiently rich high-level
functions to analyze and construct strings (such as regexp matching and
substitution) you simply never even consider using index based (random)
access.
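
For reference, the cost difference between sequential and index-based access on UTF-8 can be sketched as follows. The function names utf8Next and utf8Offset are hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isContinuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Sequential access: step from one character boundary to the next.
// The cost is bounded by the length of a single character.
inline std::size_t utf8Next(const std::string& s, std::size_t pos) {
    // Precondition: pos is at the start of a character.
    assert(pos < s.size() && !isContinuation(static_cast<unsigned char>(s[pos])));
    ++pos;
    while (pos < s.size() && isContinuation(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}

// Random access: byte offset of character number charIndex. This needs a
// scan from the start, so it is linear in charIndex. Returns s.size() if
// the string has fewer characters.
inline std::size_t utf8Offset(const std::string& s, std::size_t charIndex) {
    std::size_t pos = 0;
    while (charIndex > 0 && pos < s.size()) {
        pos = utf8Next(s, pos);
        --charIndex;
    }
    return pos;
}
```

High-level functions that walk a string from front to back only ever need the first of the two.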
> I guess you have to ask if random access is useful or not.
> If useful, then it *is* a bit less efficient with a UTF-8
> implementation. That can't be avoided. Otherwise, if it's not
> needed, or we don't like to implement it, then we could drop
> that feature from the interface.
The only reason I can think of to provide random access is in order to
efficiently implement those (higher-level) features that are really useful.
Ben