[alicebot-archcomm] More on: "Fetching Info from the Web"
Ernest Lergon
alicebot-archcomm@list.alicebot.org
Wed, 30 Jul 2003 21:48:36 +0200
Gary Robertson wrote:
> Jon (earlier):
>> The only problem I have is trying to follow the chain of requests,
>> and the
>> input/output expected. If you could do like a simple ascii diagram (like a
>> flowchart) of how it's all supposed to fit together, that would be awesome
>> =)
>
> WOW! I will work on this tonight; my earliest opportunity.
>
Please save yourself that work, because:
Your outline does no more than describe the CGI protocol and says
nothing about how AIML should handle the content.
The problem is that you have to know beforehand what the remote server
will give back. If you only receive HTML, you have an unstructured
answer that only a human can understand. In former times HTML was used to
structure a document with logical tags like <h1>, <em>, <cite>, <code>
etc. Today - multimedia is when everything flickers - you can't rely on
this, because any tag can be restyled by CSS to look like any other. So
HTML - as it's used nowadays - has degenerated into a text formatting
language, and it's hardly possible to extract information from an HTML
page and process it further as proposed.
An example: If you get a price list from a shop, you know that today the
prices are always in the third column of the second table on the page.
But what if this changes tomorrow? Then Alice gives back the ISBN
instead of the price, because the HTML designer added one column.
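A minimal sketch of that failure, assuming a made-up shop page: the two HTML snippets, the ISBN value and the helper names are hypothetical, and nested tables are not handled. It reads "third column of the second table" by position and shows the same code silently returning the ISBN once a column is added.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every <td> cell, grouped by table and row."""

    def __init__(self):
        super().__init__()
        self.tables = []       # list of tables; each table is a list of rows
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data.strip()


def price_by_position(html, table=1, column=2):
    """Fragile approach: first row, third column of the second table."""
    parser = TableCells()
    parser.feed(html)
    return parser.tables[table][0][column]


# Hypothetical pages: same book, but tomorrow's layout gained an ISBN column.
NAV = "<table><tr><td>Home</td></tr></table>"
TODAY = NAV + ("<table><tr><td>Perl Book</td><td>Publisher</td>"
               "<td>39.95</td></tr></table>")
TOMORROW = NAV + ("<table><tr><td>Perl Book</td><td>Publisher</td>"
                  "<td>0-00-000000-0</td><td>39.95</td></tr></table>")

print(price_by_position(TODAY))     # the price
print(price_by_position(TOMORROW))  # the ISBN, with no error raised
```

Nothing in the second call signals a problem; the bot would simply report the wrong value.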
Another: Alice posts a subscription form to a mailing-list page and
gets back an 'HTTP/1.1 200 OK' - but the text reads: 'Please go back to
the form and re-enter/correct your ZIP-code'.
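As a rough illustration, assuming the response has already been fetched: the status line alone reports success, and the only way to notice the failure is to guess at phrases in text written for humans. The function names and the list of hint phrases are hypothetical.

```python
def naive_success(status_code):
    """Treat any 2xx status as success, as a simple submit tag might."""
    return 200 <= status_code < 300


# The response from the example above.
RESPONSE_STATUS = 200
RESPONSE_BODY = "Please go back to the form and re-enter/correct your ZIP-code"

# Guessed phrases; a real page may word its error in any way it likes.
ERROR_HINTS = ("go back", "re-enter", "correct your")


def heuristic_success(status_code, body):
    """Also scan the body for error-like phrases. Still only a guess."""
    text = body.lower()
    return naive_success(status_code) and not any(h in text for h in ERROR_HINTS)


print(naive_success(RESPONSE_STATUS))                     # status says success
print(heuristic_success(RESPONSE_STATUS, RESPONSE_BODY))  # body says otherwise
```

The heuristic only works for pages whose wording was guessed in advance, which is the same "know beforehand" problem again.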
Even worse: What if the form has a CAPTCHA? See http://www.captcha.net/
- in action at http://addurl.altavista.com/addurl/new or
https://www.e-gold.com/acct/login.html
If you receive XML, it's much better, but you still have to know the
structure to be able to interpret the content.
That's why SOAP was invented and why the W3C is working on a standard
for a semantic web. Even the AIML specs mention it - see
http://www.alicebot.org/TR/2001/WD-aiml/#section-embedding-aiml
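To illustrate the XML case, here is a small sketch with an invented price-list format (the element names <pricelist>, <book>, <title>, <isbn> and <price> are assumptions, not any real shop's schema). Looking the price up by element name survives an added <isbn> element, but only because the names were agreed on beforehand.

```python
import xml.etree.ElementTree as ET

# Hypothetical price list; the structure is an assumption for this sketch.
PRICELIST = """<pricelist>
  <book>
    <title>Perl Book</title>
    <isbn>0-00-000000-0</isbn>
    <price currency="EUR">39.95</price>
  </book>
</pricelist>"""


def price_of(xml_text, title):
    """Return the price text of the book with the given title, or None."""
    root = ET.fromstring(xml_text)
    for book in root.findall("book"):
        if book.findtext("title") == title:
            return book.findtext("price")
    return None


print(price_of(PRICELIST, "Perl Book"))
print(price_of(PRICELIST, "Unknown Book"))
```

If the remote side renamed <price> to something else, the lookup would return None, so the structure still has to be known to both parties.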
There are so many boundary conditions that specifying a simple
<aiml:submit> as proposed will open Pandora's box.
BTW: Your example <submit> on
http://studio.tellme.com/vxml2/ref/elements/submit.html is pure VoiceXML
and has nothing to do with fetching web content - it is just a hyperlink to
another VXML "page", which of course can be created dynamically by any
means, including a CGI script. But it always returns valid VXML!
An interesting feature of <vxml:submit> is the audio file that is played
while fetching the next utterance of the bot - useful for lengthy
database queries - e.g.:
user> Do you have a flight to New York on Saturday?
bot> uhm... let me see... please hold the line...
lots of traffic on the net today... maybe...
please wait... uhm... let me see... please hold the line...
lots of traffic on the net today... maybe...
Here it is: Air America at 15:30 - Ok?
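The same idea outside VoiceXML could look roughly like the sketch below: the slow lookup runs in a worker thread while the caller keeps emitting a filler line until the result arrives. The function names, the interval and the fake flight lookup are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def answer_with_filler(query, filler, say, interval=0.05):
    """Run query() in the background; say the filler until it finishes."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(query)
        while True:
            try:
                # Wait a little for the result; on timeout, stall again.
                return future.result(timeout=interval)
            except TimeoutError:
                say(filler)


def slow_flight_lookup():
    """Stand-in for a lengthy database query (hypothetical)."""
    time.sleep(0.2)
    return "Here it is: Air America at 15:30 - Ok?"


spoken = []
reply = answer_with_filler(
    slow_flight_lookup,
    "uhm... let me see... please hold the line...",
    spoken.append,
)
print(len(spoken), "filler lines, then:", reply)
```

How many filler lines are spoken depends on timing, so the count is not fixed.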
It might be better to think about a mechanism by which different ALICEs
can communicate with each other.
Imagine the following AIML tags:
<aiml>
  <startup>...<bot>...
    <friends>
      <friend name="Jimmy" url="http://jimmy.home.de:3421"/>
      <friend name="JonDoe" url="http://www.unknown.org/aiml_remote"/>
      <friend name="Elvis" url="http://presley.com/elvis.cgi"/>
    </friends>
  ...</bot>...</startup>
  <category>
    <pattern>DONTKNOWSELF *</pattern>
    <template>
      One moment please, I'll ask my friends.
      <remote><star/></remote>
    </template>
  </category>
</aiml>
The tag <remote> triggers the interpreter to create a well-structured
query, send it via the internet to its remote buddies defined in
<friends>, and receive a well-structured answer.
While the hidden knowledge-exchange part of AIML might be complicated
and therefore must be very well designed (using SOAP or something
home-made), the interface for the AIML writer remains simple - isn't
that what AIML is all about?
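A very rough sketch of what the interpreter side of <remote> might do, using a home-made XML exchange rather than SOAP. Everything here is an assumption for discussion: the <query>/<answer> element names, the "known" attribute, and the send() callable that stands in for the actual HTTP transport (faked below so the sketch runs without a network).

```python
import xml.etree.ElementTree as ET

# Friends as declared in the <friends> block above.
FRIENDS = {
    "Jimmy": "http://jimmy.home.de:3421",
    "JonDoe": "http://www.unknown.org/aiml_remote",
    "Elvis": "http://presley.com/elvis.cgi",
}


def build_query(sender, text):
    """Wrap the user's input in a small, well-structured XML query."""
    query = ET.Element("query", {"from": sender})
    ET.SubElement(query, "input").text = text
    return ET.tostring(query, encoding="unicode")


def parse_answer(xml_text):
    """Return the friend's output, or None if it did not know an answer."""
    root = ET.fromstring(xml_text)
    if root.tag != "answer" or root.get("known") != "yes":
        return None
    return root.findtext("output")


def ask_friends(text, send, sender="Alice"):
    """Ask each friend in turn; return (name, reply) of the first that knows."""
    query = build_query(sender, text)
    for name, url in FRIENDS.items():
        try:
            reply = parse_answer(send(url, query))
        except (OSError, ET.ParseError):
            continue  # unreachable friend or malformed answer: try the next
        if reply is not None:
            return name, reply
    return None


def fake_send(url, query):
    """Stand-in transport: only the 'elvis' URL pretends to know an answer."""
    asked = ET.fromstring(query).findtext("input")
    if "elvis" in url:
        return f'<answer known="yes"><output>I know about {asked}.</output></answer>'
    return '<answer known="no"/>'


print(ask_friends("WHO IS THE KING", fake_send))
```

The AIML writer would see none of this; the template only contains <remote><star/></remote>.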
Just a raw idea to be discussed.
Issues of security and trustworthiness omitted for now ;-)
Ernest
--
ProgramV - Alice on Perl - available at
http://www.virtualitas.net/perl/aiml/
VIRTUALITAS - Manufacturer of fine OOPPS - since 1996
*********************************************************************
* VIRTUALITAS Inc. * http://www.virtualitas.net *
* Ernest Lergon * mailto:Ernest@virtualitas.net *
*********************************************************************
PGP-Fingerprint 6E6F DC17 A886 342D D63F 7880 12F5 6BA9
PGP-Key http://www.virtualitas.net/Ernest_Lergon.asc