[alicebot-archcomm] Whitespace clarification

Kim Sullivan alicebot-archcomm@list.alicebot.org
Tue, 16 Sep 2003 02:39:14 +0200


> From memory, the problem I had with Kim's AIML was two sets of tags
> (<srai> </srai><srai> </srai>) which were separated only by a blank line,
> and the intent was to produce output where the results of both srai's
> were separated by a single space (it was the math AIML for addition).

I checked: it was probably two srais separated by a space, or nested srais,
or something similar. I can't find the threads in the archive...

> So, if anyone has some extremely concrete definitions of how whitespace is
> to be handled, it would be greatly appreciated, preferably something in
> plain English (unlike the XML spec).

We're the arch, here we make the rules ;-) I can explain how whitespace is
to be handled, but I can't guarantee that it's in accordance with the spec.
But then, the spec is very unclear on some issues, so we may just as well
decide what is "right" (BTW, where IS the spec?).

0. If a block of data is marked up by either a CDATA section or an xml:space
attribute, the respective XML rules apply.
1. Whitespace inside element content is collapsed to a single space
character.
2. Whitespace that separates elements is stripped.
3. Whitespace between an element's tags and its content is stripped, unless
it serves to separate textual content, in which case #1 applies.
4. All variable values fall under #1, and in addition any leading and
trailing whitespace is stripped.
5. (The "ultimate" rule) After the template has been processed, all leading
and trailing whitespace is stripped, and all remaining whitespace is
collapsed to a single space character unless it is explicitly marked up
(see #0). This is only a failsafe: if rules 0-4 are applied correctly, it
should be unnecessary.
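As a rough illustration, rules 1, 2, 4 and 5 could be sketched in Python like this. It assumes the processor has already split the template into text nodes and markup; the function names are hypothetical, "whitespace" is taken to mean XML whitespace (space, tab, CR, LF), and rule 0 (CDATA, xml:space) and rule 3 are deliberately left out.

```python
import re

# XML whitespace only (space, tab, CR, LF); Python's \s would also match
# things like non-breaking spaces, which XML does not treat as whitespace.
_XML_WS = re.compile(r"[ \t\r\n]+")


def collapse(text: str) -> str:
    """Rule 1: collapse every run of whitespace to a single space."""
    return _XML_WS.sub(" ", text)


def text_node(text: str) -> str:
    """Rule 2: a whitespace-only node that merely separates elements is
    dropped; any other text node falls under rule 1."""
    if not text.strip(" \t\r\n"):
        return ""
    return collapse(text)


def variable_value(value: str) -> str:
    """Rule 4: rule 1 plus stripping leading and trailing whitespace."""
    return collapse(value).strip(" ")


def final_pass(output: str) -> str:
    """Rule 5: failsafe applied to the finished template output."""
    return collapse(output).strip(" ")
```

For example, `variable_value("  Kim \n Sullivan ")` gives `"Kim Sullivan"`, while `text_node("\n   ")` disappears entirely.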

I'm open to comments and suggestions. One complication of these rules I've
stumbled upon is this case:
<template>
<random>
<li>Some sentence.</li>
<li>Some other sentence.</li>
</random>
<random>
<li>Another sentence.</li>
<li>Yet another sentence.</li>
</random>
</template>

Here the two sentences aren't separated by a single space, as would normally
be expected (because of #2). This could easily be corrected by adding the
"unless it serves to separate textual content" clause to #2 as well, but I'm
nearly certain that it would be a real pain to implement ;-) So far, adding a
single space character after the period has done it for me. Besides, one can
imagine situations where this space would do harm (constructing German words
via concatenation, or adding word endings); if #2 were modified, you'd have
to resort to long, ugly one-liners, so I'll leave it at that.
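To make the <random> case concrete, here is a hypothetical mini-interpreter in Python that applies only rule 2 (and rule 1 to non-blank text). It understands nothing but <template>, <random> and <li>, and takes a `choose` parameter (an assumption for this sketch) so the random pick can be made deterministic. Since rule 3 is not implemented, the trailing space inside <li> survives, which matches the workaround described above.

```python
import random
import re
import xml.etree.ElementTree as ET

_XML_WS = re.compile(r"[ \t\r\n]+")


def text_node(text):
    # Rule 2: whitespace-only text between elements is dropped;
    # any other text is collapsed under rule 1.
    if not text or not text.strip(" \t\r\n"):
        return ""
    return _XML_WS.sub(" ", text)


def render(elem, choose=random.choice):
    # Hypothetical: only <template>, <random> and <li> are understood.
    if elem.tag == "random":
        return render(choose(elem.findall("li")), choose)
    parts = [text_node(elem.text)]
    for child in elem:
        parts.append(render(child, choose))
        parts.append(text_node(child.tail))  # text after the child's end tag
    return "".join(parts)


TEMPLATE = """<template>
<random>
<li>Some sentence.</li>
<li>Some other sentence.</li>
</random>
<random>
<li>Another sentence.</li>
<li>Yet another sentence.</li>
</random>
</template>"""

# The workaround: a single space after the period inside the <li>.
FIXED = TEMPLATE.replace("Some sentence.</li>", "Some sentence. </li>")


def first(items):
    # Deterministic stand-in for random.choice.
    return items[0]


print(render(ET.fromstring(TEMPLATE), choose=first))
print(render(ET.fromstring(FIXED), choose=first))
```

The first call runs the sentences together as "Some sentence.Another sentence.", because the newline between the two <random> elements is dropped by rule 2. The second call yields "Some sentence. Another sentence.".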

My 2 cents,
Kim