[alicebot-archcomm] Whitespace clarification
Jonathan Roewen
alicebot-archcomm@list.alicebot.org
Sun, 14 Sep 2003 10:48:03 +1200
Hi,
One of the biggest problems I always have with XML is proper whitespace
handling.
I've been using a new XML parser, and it's just screwed up all the
whitespace even more, as I haven't applied the rules from AIML yet.
However, there are a couple of things that I'm not sure about.
If I have the following category, will the predicate's string be " a test",
" a test ", "a test ", or "a test"? And what would the template string be?
<template>{newline}
{tab}<set name="predicate">{newline}
{tab}{tab}a test{newline}
{tab}</set>{newline}
</template>
From the spec:
2.10:
1: never needs to be applied.
2: where an element is separated by any whitespace characters from any other
element or character data, a single space character is inserted. From my
understanding, that would result in a total of 4 normalised space characters:
'<template> <set name="predicate"> a test </set> </template>'
which, in most cases, is not the desired effect.
3: doesn't apply.
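To make my reading of rule 2 concrete, here is a rough Python sketch. The collapse() helper and the choice of xml.etree.ElementTree are my own assumptions for illustration, not anything the spec prescribes; it only shows what "every run of whitespace becomes one space" does to the category above.

```python
import re
import xml.etree.ElementTree as ET

# The category from this message, with {newline} and {tab} written out.
SOURCE = (
    '<template>\n'
    '\t<set name="predicate">\n'
    '\t\ta test\n'
    '\t</set>\n'
    '</template>'
)

def collapse(text):
    """Replace every run of whitespace with one space (my reading of rule 2)."""
    return re.sub(r'[ \t\r\n]+', ' ', text or '')

template = ET.fromstring(SOURCE)
set_element = template.find('set')

predicate_value = collapse(set_element.text)  # content of <set>
before_set = collapse(template.text)          # between <template> and <set>
after_set = collapse(set_element.tail)        # between </set> and </template>

print(repr(predicate_value))  # ' a test '
print(repr(before_set), repr(after_set))  # ' ' ' '
```

Under this reading the predicate would be " a test ", with one more space on each side of the set element, which accounts for the 4 space characters.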
2.11:
end of line handling is to be the same as in the XML specification, which, if
I remember correctly, simply states that each line break (CR LF or a lone CR)
is normalised to a single newline ('\n' in many programming languages), which
would count as a whitespace character for section 2.10.
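As far as I can tell, the XML 1.0 rule (its section 2.11) translates each CR LF pair and each lone CR to one LF, and leaves blank lines alone. A minimal sketch of that step, with a function name of my own choosing:

```python
import re

def normalize_line_ends(text):
    """Translate CR LF pairs and lone CRs to a single LF."""
    return re.sub(r'\r\n?', '\n', text)

sample = 'a\r\nb\rc\n\nd'
normalized = normalize_line_ends(sample)
print(repr(normalized))  # 'a\nb\nc\n\nd'
```

Each resulting '\n' is still a whitespace character, so it would then be subject to section 2.10.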
In some AIML sets I've used, whitespace is used a lot, just as in the example
above, and it looks like the current definition of whitespace handling in
the spec would add a lot of unnecessary space characters.
I'm not sure how current interpreters deal with this, and I'm also not sure
about the whole concept of ignorable whitespace in the XML specification.
Does the AIML spec imply that all ignorable whitespace is already stripped
from the content before applying rules in section 2.10?
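To show what I mean, here is a sketch of the "strip first" interpretation. It assumes, purely for illustration, that "ignorable" means text that consists only of whitespace between elements; the drop_whitespace_only() helper is hypothetical, and a real parser may only report ignorable whitespace when it has a DTD to go by.

```python
import xml.etree.ElementTree as ET

# The same category as above, with {newline} and {tab} written out.
SOURCE = (
    '<template>\n'
    '\t<set name="predicate">\n'
    '\t\ta test\n'
    '\t</set>\n'
    '</template>'
)

def drop_whitespace_only(text):
    """Return '' for missing or whitespace-only text, else the text unchanged."""
    return '' if text is None or text.strip() == '' else text

template = ET.fromstring(SOURCE)
set_element = template.find('set')

# Strip whitespace-only text before any rule from section 2.10 is applied.
template.text = drop_whitespace_only(template.text)
set_element.tail = drop_whitespace_only(set_element.tail)
set_element.text = drop_whitespace_only(set_element.text)

print(repr(template.text), repr(set_element.tail))  # '' ''
print(repr(set_element.text))  # '\n\t\ta test\n\t'
```

If that is the intent, the two outer spaces would disappear, but the whitespace around "a test" sits next to character data, so rule 2 would presumably still turn it into " a test ".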
Regards,
Jon