[alicebot-archcomm] Whitespace clarification

Ernest Lergon alicebot-archcomm@list.alicebot.org
Tue, 23 Sep 2003 20:21:14 +0200


I think we should tighten the whitespace handling, because AIML has
nothing to do with formatting. If you want to add special markup to
the bot's output, you can always use e.g. HTML or VoiceXML tags for it.

Proposed behaviour:

Any whitespace character [1] is replaced by a blank (#x20), and
afterwards all adjacent blanks are collapsed into a single one. Leading
and trailing whitespace is also removed. Whitespace inside tags where no
text content is allowed is ignored. Thus the internal processing will
always see a template as one line. The exception is the CDATA section,
which is only allowed in <javascript>, <system> and other external
processing tags like <php> or <perl>. The attribute xml:space="preserve"
is not allowed.
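
A rough sketch of the text normalization described above, in Python.
This is not taken from any existing AIML interpreter; the name
normalize_ws is made up for illustration, and CDATA sections and
whitespace between tags are not handled here.

```python
import re

# XML 'S' production [1]: space, tab, carriage return, line feed.
_XML_WS_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")


def normalize_ws(text: str) -> str:
    """Collapse each run of XML whitespace to one blank, then trim both ends."""
    # After the substitution only single blanks remain, so stripping
    # blanks is enough to remove leading and trailing whitespace.
    return _XML_WS_RUN.sub(" ", text).strip(" ")
```

With this, a template text spread over several indented lines comes out
as one line with single blanks between the words.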

Maybe the whitespace normalization has to be done twice, e.g.:

1. Normalize whitespace (template is now in one line).
2. Process template.
3. Normalize whitespace again (double whitespaces added by 2.).
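
The three steps could be wired together roughly like this (Python
sketch). The names render and drop_nick are hypothetical; drop_nick only
stands in for a real template processor, to show how step 2 can leave a
double blank that step 3 removes.

```python
import re
from typing import Callable


def normalize_ws(text: str) -> str:
    """Collapse runs of XML whitespace to one blank and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def render(template: str, process: Callable[[str], str]) -> str:
    one_line = normalize_ws(template)   # 1. template is now in one line
    expanded = process(one_line)        # 2. process template
    return normalize_ws(expanded)       # 3. remove blanks added by step 2


# Hypothetical stand-in for the template processor: the tag evaluates to
# an empty string, which leaves two adjacent blanks behind.
def drop_nick(text: str) -> str:
    return text.replace('<get name="nick"/>', "")


print(render('Hello\n  <get name="nick"/>\n  there', drop_nick))
# prints: Hello there
```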

So if you really want adjacent output, you have to omit the
whitespace. To pick up Kim's example:

...
    <li>A</li>
</random>
<random>
    <li>3</li>
...

will become

...<li>A</li></random> <random><li>3</li>... putting out 'A 3'

while

...
    <li>A</li>
</random><random>
    <li>3</li>
...

will stay

...<li>A</li></random><random><li>3</li>... putting out 'A3'.

Compare HTML markup, where whitespace is handled the same way.
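
To make the two cases above concrete, here is a small Python sketch
using the standard xml.etree.ElementTree module. The function flatten is
hypothetical and only understands <random> with <li> children; it always
picks the first <li> so the result is repeatable. It is meant as an
illustration of the proposed rule, not as an interpreter.

```python
import re
import xml.etree.ElementTree as ET


def normalize_ws(text: str) -> str:
    """Collapse runs of XML whitespace to one blank and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def flatten(template_xml: str) -> str:
    root = ET.fromstring(template_xml)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "random":
            # Whitespace directly inside <random> is ignored, because no
            # text content is allowed there; only <li> children count.
            items = child.findall("li")
            if items:
                parts.append(items[0].text or "")
        # Text following the element belongs to the template and is kept.
        parts.append(child.tail or "")
    return normalize_ws("".join(parts))


spaced = ("<template><random>\n    <li>A</li>\n</random>\n"
          "<random>\n    <li>3</li>\n</random></template>")
tight = ("<template><random>\n    <li>A</li>\n</random>"
         "<random>\n    <li>3</li>\n</random></template>")

print(repr(flatten(spaced)))  # 'A 3'
print(repr(flatten(tight)))   # 'A3'
```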

Therefore I second Jonathan's proposal for <aiml:br/> to be the only way
to put a break into the output flow of AIML.

Think of it not as a formatting tag but simply as a pause in the
conversation. Therefore <aiml:br/> should have an optional attribute
'time', as in <vxml:break/> [2].

Please don't think this contradicts my demand above for strict
whitespace handling, which is based on removing all formatting
elements from AIML.

I rather look at <aiml:br/> as a structuring element with a more
general meaning than e.g. <html:br/>: it divides the output in a more
abstract way by putting a pause in the flow.

Of course this pause has to be converted into a meaningful output at
the frontend of an Alice - some examples:

Console   ASCII   <aiml:br time="3s"/> becomes CR/LF,
                      attribute ignored
                      or waiting for 3 secs before printing the next line

Browser   HTML    <aiml:br time="3s"/> becomes <html:br/>,
                      attribute ignored
                      or waiting for 3 secs before printing the next line

Voice     TTS     <aiml:br time="3s"/> becomes <vxml:break time="3s"/>,
                      Alice is quiet for 3 secs
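
The mapping above could look roughly like this at the frontend (Python
sketch). The function render_break and the frontend names "console",
"browser" and "voice" are assumptions for illustration. Only the
"attribute ignored" variant is shown for console and browser; the
alternative of really waiting before the next line is left out.

```python
from typing import Optional


def render_break(frontend: str, time: Optional[str] = None) -> str:
    """Translate one <aiml:br/> into the markup of the given frontend."""
    if frontend == "console":
        return "\r\n"                    # CR/LF, attribute ignored
    if frontend == "browser":
        return "<html:br/>"              # attribute ignored
    if frontend == "voice":
        # Pass the pause length through to the speech markup if present.
        if time:
            return f'<vxml:break time="{time}"/>'
        return "<vxml:break/>"
    raise ValueError(f"unknown frontend: {frontend}")


print(render_break("voice", "3s"))
# prints: <vxml:break time="3s"/>
```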

Ernest


[1] http://www.w3.org/TR/REC-xml#NT-S
[2] http://www.w3.org/TR/speech-synthesis/#S2.2.3

-- 
               ProgramV - Alice on Perl - available at
                http://www.virtualitas.net/perl/aiml/

       VIRTUALITAS - Manufacturer of fine OOPPS - since 1996
*********************************************************************
* VIRTUALITAS Inc.               *       http://www.virtualitas.net *
* Ernest Lergon                  *    mailto:Ernest@virtualitas.net *
*********************************************************************
       PGP-Fingerprint 6E6F DC17 A886 342D  D63F 7880 12F5 6BA9
         PGP-Key http://www.virtualitas.net/Ernest_Lergon.asc

        Member of the Alicebot and AIML Architecture Committee
         http://www.alicebot.org/committees/architecture.html
