[alicebot-archcomm] Whitespace clarification
Kim Sullivan
alicebot-archcomm@list.alicebot.org
Wed, 17 Sep 2003 00:34:41 +0200
> > 0. If a block of data is marked up by either a CDATA section or a
> > xml:space attribute, the respective XML rules apply.
> > 1. Whitespace inside element content is mushed to a single space
> > character
> > 2. Whitespace that separates elements gets stripped
> > 3. Whitespace between element and element content is stripped, unless it
> > serves to separate textual content, in which case #1 applies.
> > 4. All variable values fall under #1, with the addition of stripping any
> > leading and trailing whitespace
> > 5. (The "ultimate" rule) After processing the template, all leading and
> > trailing whitespace is stripped, and all remaining whitespace is mushed
> > to a single space character unless it is specifically marked up, see #0
> > (only a failsafe, if rules 0-4 are applied correctly, this rule should
> > be unnecessary).
>
> Point 1 is a tiny bit unclear. This just means leading and trailing
> whitespace right?
No, that is covered by #3. I really meant whitespace inside the text. It is
about the same as what happens when writing HTML: unless you use <br>, you
won't get a line break even if you separate every word with newlines:
<template>{newline}
Some sentence.{newline}
Some other sentence.{tab}And another.{newline}
</template>
will be rendered as "Some sentence. Some other sentence. And another."
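A minimal Python sketch of this behaviour, assuming the template's text
content is already available as a plain string. The function name
normalize_text is hypothetical and not part of any AIML interpreter. It
collapses runs of whitespace as in #1 and also trims the ends, which the
rules above assign to #3:

```python
import re


def normalize_text(text: str) -> str:
    """Collapse each run of whitespace to one space, then trim both ends."""
    # \s+ matches spaces, tabs and newlines alike.
    return re.sub(r"\s+", " ", text).strip()


content = "\nSome sentence.\nSome other sentence.\tAnd another.\n"
print(normalize_text(content))
# prints: Some sentence. Some other sentence. And another.
```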
> Point 2: could you give an example of which whitespace would be stripped?
I'll use my previous example:
<template>{newline}
<random>{newline}
<li>Some sentence.</li>{newline}
<li>Some other sentence.</li>{newline}
</random>{newline}
<random>{newline}
<li>Another sentence.</li>{newline}
<li>Yet another sentence.</li>{newline}
</random>{newline}
</template>
Here, all the newlines will be stripped. It gets clearer when written on
a single line:
<template>{newline}<random>{newline}<li>Some sentence.</li>{newline}...
Point 2 covers all the whitespace that appears directly between two tags.
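As a rough illustration only, whitespace that sits directly between two
tags can be removed with a regular expression. The function name
strip_inter_tag_whitespace is hypothetical, and the regex is a shortcut
that assumes no CDATA sections, comments or attribute values containing
">" or "<"; a real interpreter would do this on the parsed tree instead:

```python
import re


def strip_inter_tag_whitespace(markup: str) -> str:
    """Remove whitespace-only runs that lie directly between two tags."""
    # ">" ends one tag, "<" starts the next; anything that is only
    # whitespace in between is dropped. Text such as "Some sentence."
    # is untouched because it is not whitespace-only.
    return re.sub(r">\s+<", "><", markup)


source = (
    "<template>\n"
    "<random>\n"
    "<li>Some sentence.</li>\n"
    "<li>Some other sentence.</li>\n"
    "</random>\n"
    "</template>"
)
print(strip_inter_tag_whitespace(source))
```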
> Point 3: isn't this just point 2 with an extra clause?
No, point 3 handles the case when you write a template like this:
<template>{nl}
{tab}Some text.{nl}
</template>
Both the newlines separate the element from its content, but do not
separate one piece of text from another, and therefore should be stripped,
resulting in "Some text." and not " Some text. "
The extra clause is supposed to handle cases like the two consecutive
<random> tags. Up to this point, we have nicely got rid of all the
superfluous whitespace, but we got rid of too much of it: the sentences from
the <random> example above won't be separated by a space as we'd want,
because the only thing that separates them is the #2 whitespace. Even if we
put an extra space in there, like this: <li>Some sentence.{space}</li>, it
would be removed. So we let this extra whitespace "live" until we're sure
that we can strip it (only elements follow, no content) or that we have to
keep it (some more text follows).
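One possible reading of this "pending whitespace" idea, sketched in Python.
The function join_fragments and its list-of-strings input are assumptions
made for illustration: each string stands for the raw text content that one
element contributed to the output, and whitespace-only fragments are
treated as inter-element whitespace and skipped:

```python
import re


def join_fragments(fragments: list[str]) -> str:
    """Join text fragments, keeping edge whitespace only between texts."""
    out: list[str] = []
    pending_space = False  # trailing whitespace waiting for more text
    for raw in fragments:
        collapsed = re.sub(r"\s+", " ", raw)  # rule #1 inside the text
        core = collapsed.strip()
        if not core:
            continue  # nothing but whitespace: treated like rule #2
        # Emit a single space only when text already precedes this
        # fragment and whitespace touched the boundary on either side.
        if out and (pending_space or collapsed.startswith(" ")):
            out.append(" ")
        out.append(core)
        pending_space = collapsed.endswith(" ")
    return "".join(out)


print(join_fragments(["Some sentence. ", "Another sentence."]))
print(join_fragments(["\n\tSome text.\n"]))
```

The trailing space of the last fragment is never emitted, because no more
text follows it.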
The reason why I think we shouldn't use the inter-element whitespace for
this is the following: let's say we want to concatenate random strings, for
example for a letter-digit code:
<random>
<li>A</li>
<li>B</li>
<li>C</li>
</random>
<random>
<li>1</li>
<li>2</li>
<li>3</li>
</random>
I'd want the result to be "A1" or "B3" and not "A 1".
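A small Python sketch of that outcome, using the standard library's
xml.etree.ElementTree. The render function and its choose parameter are
hypothetical, and the sketch only handles a template made of <random>
elements. Inter-element whitespace (the text and tail values that
ElementTree keeps around each element) is simply ignored, so the chosen
items are joined with nothing in between:

```python
import random
import xml.etree.ElementTree as ET

TEMPLATE = """<template>
<random>
<li>A</li>
<li>B</li>
<li>C</li>
</random>
<random>
<li>1</li>
<li>2</li>
<li>3</li>
</random>
</template>"""


def render(template_xml: str, choose=random.choice) -> str:
    """Pick one <li> per <random> and concatenate the picks directly."""
    root = ET.fromstring(template_xml)
    parts = []
    for rnd in root.findall("random"):
        picked = choose(rnd.findall("li"))
        # Only the text of the chosen <li> is used; the newlines
        # between tags never reach the output.
        parts.append(picked.text or "")
    return "".join(parts)


print(render(TEMPLATE))  # e.g. "A1" or "B3", never "A 1"
```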
Kim