[alicebot-developer] A.L.I.C.E Web page Spider
Gary Dubuque
gdubuque at scattercreek.com
Fri Jun 2 00:14:25 PDT 2006
AIMLpad (Program N) can retrieve a web page using its scripting language.
With careful scripting it can probably find the text on the page. Once
found, it can split the text into sentences. It can extract from those
sentences nouns, verbs, etc. using WordNet. When used in conjunction with
ConceptNet it can summarize the text into a few sentences or tag the
sentences with the parts of speech or even predict the mood of the text.
AIMLpad is designed to use its scripting language to create AIML categories
and even put them into the appropriate files. When used with OpenCyc, it
can reason about the concepts such as "humans eat food" or "humans have two
hands", etc. OpenCyc does deductive reasoning. AIMLpad also has a simple
expert rule system (both forwards and backwards chaining) built in where you
could create the transformations to apply to the web page's text. Its fuzzy
logic is not very effective, since multi-valued variables don't carry a
confidence factor on each value, but the framework is there and could be
fixed to make this work too. While all these tools are available in the one
editor, it still is not an easy solution to scrape web pages for content and
make it into AIML. In theory, it should be possible. In practice, it
probably won't happen.
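For anyone who wants to try the first two steps (find the text on the page,
then split it into sentences) outside AIMLpad, here is a minimal sketch in
Python using only the standard library. It is not AIMLpad's scripting
language, and the names VisibleText and page_sentences are made up for
illustration. The sentence splitter is deliberately naive and will break on
abbreviations such as "etc.".

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def page_sentences(html):
    """Return the visible text of an HTML string as a list of sentences."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    # Naive split: break after '.', '!' or '?' when whitespace follows.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```

The HTML string could come from urllib.request.urlopen(url).read() decoded
with the page's character set. Everything after that point (WordNet tagging,
summarizing, reasoning) is the hard part described above and is not attempted
here.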
On the other hand, if the content is laid out correctly, i.e., question
followed by answer followed by a blank line repeated for however many
questions you have, AIMLpad will automatically convert that into AIML
categories just by selecting one of its menu options (Tools --> Quick Build
Categories...).
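To show what that layout amounts to, here is a rough Python sketch of the
same idea. It is only an assumption about the general technique, not a
description of what AIMLpad's menu option actually does internally. The
function names quick_build and normalize_pattern are hypothetical. Each block
of text separated by a blank line is treated as a question on the first line
and an answer on the remaining lines.

```python
import re
from xml.sax.saxutils import escape


def normalize_pattern(question):
    """Upper-case the question and drop punctuation, as AIML patterns expect."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", question)
    return " ".join(cleaned.upper().split())


def quick_build(text):
    """Turn question/answer/blank-line text into AIML <category> elements."""
    categories = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue  # need at least a question and an answer
        question, answer = lines[0], " ".join(lines[1:])
        categories.append(
            "<category>\n"
            f"  <pattern>{escape(normalize_pattern(question))}</pattern>\n"
            f"  <template>{escape(answer)}</template>\n"
            "</category>"
        )
    return "\n".join(categories)
```

Note that stripping punctuation turns "What's" into "WHAT S", so contractions
would need their own normalization step before this is useful on real text.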
I have used AIMLpad to get the quote or joke of the day or to get the
weather report by hacking web pages.
-----Original Message-----
From: alicebot-developer-bounces at list.alicebot.org
[mailto:alicebot-developer-bounces at list.alicebot.org]On Behalf Of Ty
Ademosu
Sent: Thursday, June 01, 2006 10:03 PM
To: alicebot-developer at list.alicebot.org
Subject: [alicebot-developer] A.L.I.C.E Web page Spider
Please, please help with this.
I need a spider created that will crawl a webpage and grammatically parse
the page and create AIML (Artificial Intelligence Markup Language) data. This
data will be saved into an AIML file and used to teach a chatterbot the
contents of the web page.
The way we see it working is:
1. The spider crawls a page examining the text of each sentence.
2. Then, using a grammatical parser, it will reformulate that sentence data
into possible patterns and responses to be entered as data in the AIML file.
3. Then it will format this into a standard AIML file and allow you to save
this code or copy and paste it to another source.
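Steps 2 and 3 above can be sketched in Python as follows. This is a toy
under loud assumptions: there is no real grammatical parser here, only one
hypothetical rule that turns a sentence of the form "X is Y" or "X are Y"
into a WHAT IS X or WHAT ARE X pattern with the original sentence as the
response. The names COPULA, sentence_to_category and build_aiml are invented
for this example, and the sentences are assumed to have been extracted from
the page already.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule: "<subject> is/are <rest>" becomes a WHAT IS/ARE question.
COPULA = re.compile(
    r"^(?P<subject>.+?)\s+(?P<verb>is|are)\s+(?P<rest>.+?)[.!]?$",
    re.IGNORECASE,
)


def sentence_to_category(sentence):
    """Return (pattern, template) for a matching sentence, else None."""
    sentence = sentence.strip()
    match = COPULA.match(sentence)
    if match is None:
        return None
    subject = re.sub(r"[^A-Za-z0-9 ]+", " ", match.group("subject"))
    subject = " ".join(subject.upper().split())
    if not subject:
        return None
    return f"WHAT {match.group('verb').upper()} {subject}", sentence


def build_aiml(sentences):
    """Wrap every sentence the rule can handle in an AIML 1.0.1 document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', '<aiml version="1.0.1">']
    for sentence in sentences:
        pair = sentence_to_category(sentence)
        if pair is None:
            continue  # the single rule does not apply; skip the sentence
        pattern, template = pair
        lines.append("  <category>")
        lines.append(f"    <pattern>{escape(pattern)}</pattern>")
        lines.append(f"    <template>{escape(template)}</template>")
        lines.append("  </category>")
    lines.append("</aiml>")
    return "\n".join(lines)
```

The string returned by build_aiml can be written to a file or copied
elsewhere. Most sentences on a real page will not match such a simple rule,
which is why a proper parser is asked for.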
This will require someone experienced in AIML as well as grammatical sentence
parsing.
The idea for this project is a stand-alone script, possibly a desktop VB
script. But I hear that existing Perl and PHP extensions may make this
easier. I am open to other suggestions, preferably a desktop application to
start.