[alicebot-archcomm] Whitespace clarification

Jonathan Roewen alicebot-archcomm@list.alicebot.org
Thu, 18 Sep 2003 08:08:08 +1200


> OK, although I still think we need the extra clause for some cases, unless
> we can rephrase it into something all-encompassing (I was making and
> implementing those rules as I was writing my AIML, they are based upon what
> I needed and thought was practical at that time, they're not from the top of
> my head)

I'm not trying to make these up as I go along .. I was trying to write them
from what previous AIML was supposed to do, and taking into consideration
how some people write AIML (lots of indenting etc).

Here is a better description, which also takes into account your single-space
normalisation within character data.

0. xml:space='preserve'/CDATA section = normal XML behaviour.
1. leading & trailing whitespace between an element and its nested content
should be stripped (nested content can be character data or an element).
2. whitespace within character data should be normalised to a single space.
3. whitespace separating sibling elements should be normalised to a single
space; if there is no whitespace at all between the elements, none is added.
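
To make the rules above concrete, here is a rough Python sketch of how an
interpreter might apply them to a parsed tree. It is only an illustration
under my own assumptions: the function names normalise and normalise_string
are made up, it uses the standard xml.etree.ElementTree module, and since
that parser does not report where CDATA sections begin and end, rule 0 is
only honoured for xml:space, not for CDATA.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
_WS = re.compile(r"[ \t\r\n]+")


def normalise(elem, preserve=False):
    """Apply rules 0-3 in place to elem and all of its descendants."""
    # Rule 0: xml:space is inherited until an element overrides it.
    space = elem.get(XML_SPACE)
    if space == "preserve":
        preserve = True
    elif space == "default":
        preserve = False

    children = list(elem)
    if not preserve:
        # Rule 2: collapse runs of whitespace to one space.
        # Rule 1: strip whitespace directly after the start tag.
        text = _WS.sub(" ", elem.text or "").lstrip(" ")
        if not children:
            # Rule 1: no children, so this text also touches the end tag.
            text = text.rstrip(" ")
        elem.text = text
        for i, child in enumerate(children):
            # Rule 3: whitespace after a child becomes one space; an empty
            # tail (siblings with nothing between them) stays empty.
            tail = _WS.sub(" ", child.tail or "")
            if i == len(children) - 1:
                # Rule 1: strip whitespace directly before the end tag.
                tail = tail.rstrip(" ")
            child.tail = tail

    for child in children:
        normalise(child, preserve)


def normalise_string(xml_text):
    """Parse, normalise and re-serialise a fragment (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    normalise(root)
    return ET.tostring(root, encoding="unicode")


print(normalise_string(
    "<template>\n  Hello   <b>big</b>\n  <i>world</i>  \n</template>"
))
# prints: <template>Hello <b>big</b> <i>world</i></template>
```

Note that the indentation between </b> and <i> survives as exactly one
space, while writing </b><i> with nothing in between would leave the two
elements touching.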

> I'd prefer stripped (see example in previous post where whitespace separates
> two sibling <random> elements).

I'd rather not, as it makes it hard to get a space in without having to
resort to things like &nbsp; etc. I personally think the space makes much
more sense, and it can still be eliminated by not including any whitespace
whatsoever between the elements.

> I think <br> has something to do with appearance, for example for speech
> systems it's meaningless. It also depends on the platform you're using, for
> example, for IM systems, it wouldn't be clear whether a line break also
> means sending the message (I'm not sure all systems support breaks in
> messages), mobile phones SMS doesn't have a use for it either. There should
> be nothing preventing you from implementing or using it though - most
> interpreters will just ignore unknown markup (unless they're really picky
> validating). The same goes for most of the other HTML based tags.

Well, I still think the <br /> tag would be useful, since for systems such
as IM, WAP, TTS, etc., either the 'responder' or some other output system
would be able to recognise that it can (and should) be ignored.
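
As a rough idea of what I mean, an output layer could decide per channel
what to do with <br />. The sketch below is only an assumption on my part
(the render function and its supports_breaks flag are made up, not part of
any existing interpreter): it flattens a template to plain text, emitting a
newline where the channel can show one and dropping the tag otherwise.

```python
import xml.etree.ElementTree as ET


def render(elem, supports_breaks):
    """Flatten elem to plain text, honouring or ignoring <br />."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "br":
            # Channels without line breaks (TTS, SMS, ...) just drop the tag.
            parts.append("\n" if supports_breaks else "")
        else:
            parts.append(render(child, supports_breaks))
        # Text that follows the child still belongs to this element.
        parts.append(child.tail or "")
    return "".join(parts)


template = ET.fromstring("<template>Line one. <br />Line two.</template>")
print(render(template, supports_breaks=True))
print(render(template, supports_breaks=False))
```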

Jon

PS: Sorry about the double postings coming up .. I dunno what's with my
Outlook Express/Yahoo.