[alicebot-developer] A.L.I.C.E Web page Spider
Gary Dubuque
gdubuque at scattercreek.com
Fri Jun 16 09:30:37 PDT 2006
Ty--
Last question first... Does that happen on the View menu option where you
check MS Agent on/off? To turn off the agent manually, edit the options.ini
file (found in the same folder as AIMLpad.exe) and set MSagent=0. It will
probably be 1 or 2 before you change it. The most likely reason the program
stops when trying to set MS Agent preferences is that the agent software is
not installed correctly. There is a
tutorial at AIMLpad.com for figuring out the correct installation of MS
Agent.
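If hand-editing is awkward, the same change can be scripted. This is only a rough Python sketch and not part of AIMLpad. It assumes the setting sits on its own line in the form MSagent=<number>, and the function names are made up for illustration. Close AIMLpad and keep a backup of options.ini before trying anything like it.

```python
import re
from pathlib import Path

# Assumption: the setting appears on its own line as "MSagent=<number>".
_MSAGENT_LINE = re.compile(
    r"^([ \t]*MSagent[ \t]*=[ \t]*)[^\r\n]*",
    re.IGNORECASE | re.MULTILINE,
)


def disable_ms_agent(ini_text: str) -> str:
    """Return ini_text with every MSagent=<value> line rewritten to MSagent=0."""
    return _MSAGENT_LINE.sub(lambda m: m.group(1) + "0", ini_text)


def disable_ms_agent_in_file(path: str) -> None:
    """Rewrite the file in place (hypothetical helper; back up the file first)."""
    ini = Path(path)
    ini.write_text(disable_ms_agent(ini.read_text()))
```

All other lines pass through unchanged, so the rest of the options file is left as it was.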
Question 3. I am assuming you mean WordNet because Worldnet is an AT&T
product. It is not necessary to use the public site for WordNet since, as a
dictionary of sorts, a local copy is just as current. The problem with
WordNet is that it is a file-based system and really needs to be on your
computer to work efficiently. Actually I haven't visited WordNet for
several months, so if there is a public interface through the internet, it
is new to me. I'll have to look into that.
Once WordNet is installed, you can use AIMLpad script commands to extract
nouns, verbs, etc. from given text. There are tutorials at AIMLpad.com for
learning how to do this too. The prime example takes a new category added
for a response that was not matched and adds further, similar categories by
substituting words.
For ConceptNet you can use the public site if you want. I had to add a fix
to the software for the server to offer the full functionality of the user
interface. I doubt that the public version even exposes the XML-RPC
services. If it does, the script code sample provided in your download can
be easily modified to use that resource. You need to know that ConceptNet
is only interfaced through the AIMLpad scripting language. The provided
script examples show how to get related concepts, paraphrase a text block,
calculate emotional content as well as a few other things like parsing
sentences into their parts of speech.
Question 2. The AIML for utilizing OpenCyc needs to be loaded. Kino
Coursey extended AIML with new tags to accommodate functions provided by
Cyc. Unless the AIML is written to use those extensions, the connection to
the public server will not be accessed. How AIMLpad operates with Cyc is
explained in a separate PDF located in the doc folder of the standard
install. I believe sample AIML files to try out are included too.
Question 1. May I suggest something? Part of the problem with web pages is
finding the text you want to "index" or "translate". Perhaps you
might want to consider RSS feeds which already have this organized in a way
that makes it much easier to find. That said, I have not played with RSS
yet (although I bet AIMLpad is capable of doing so.)
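To show why a feed is easier to work with than a raw page, here is a small Python sketch (not AIMLpad script) that pulls the title and description out of each item. It assumes a plain RSS 2.0 feed without XML namespaces. The sample feed and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample feed, used only to illustrate the RSS 2.0 layout.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Joke of the day</title>
      <description>Why did the bot cross the road?</description>
    </item>
    <item>
      <title>Weather</title>
      <description>Sunny with light wind.</description>
    </item>
  </channel>
</rss>"""


def rss_items(rss_xml: str) -> list[dict[str, str]]:
    """Return the title and description of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


for entry in rss_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["description"])
```

Each item arrives already separated from the navigation and advertising that surround the same text on the web page.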
This is another example where script commands are required. Again there are
tutorials explaining the techniques. Briefly, a script command named "url",
followed by the http address, loads the web page into variables (predicates
in the AIML terminology) starting with URLdata1 and continuing through
URLdata2, URLdata3, etc. in chunks of 32K. If there are a lot of
advertisements on a page, you often have more than 32K and a whole bunch to
wade through before you get to readable content like news or weather or even
the joke of the day, etc. You can use the "find" command and the SUBSTR()
function to extract the text you want.
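The idea can be pictured with a Python sketch. This is not AIMLpad script syntax, only an imitation of the behavior described above: the page is cut into numbered URLdata pieces, then a find-and-substring step pulls out the wanted text. It assumes "32K" means 32768 characters, and the function names and markers are made up for illustration.

```python
from typing import Optional

CHUNK_SIZE = 32 * 1024  # assumption: "32K" means 32768 characters


def split_into_chunks(page: str, size: int = CHUNK_SIZE) -> dict[str, str]:
    """Store the page as URLdata1, URLdata2, ... in pieces of `size` characters."""
    return {
        f"URLdata{i + 1}": page[start:start + size]
        for i, start in enumerate(range(0, len(page), size))
    }


def extract_between(text: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between two markers, or None if either marker is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]
```

One thing to watch for with fixed-size chunks: a marker can straddle the boundary between two variables, so it may be necessary to join neighboring chunks before searching.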
So here we are with text to convert. How is the question. How do we take
text and extract patterns to index it? I suppose you could send it to
ConceptNet to extract the keywords the text most likely contains. Given the
words the text is about, are you guessing forms like "What is ???" or
"Who is ???" to be the prompts to respond with the web search information?
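To make that guess concrete, here is a Python sketch of the "manufacture categories from keywords" approach. It does not call ConceptNet. A naive word-frequency count stands in for the keyword extraction, and the stopword list, function names, and sample sentence are all invented for illustration.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it", "that"}


def top_keywords(text: str, count: int = 3) -> list[str]:
    """Naive stand-in for keyword extraction: the most frequent non-stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in freq.most_common(count)]


def categories_for(text: str, count: int = 3) -> str:
    """Build "WHAT IS x" / "WHO IS x" categories that answer with the text."""
    template = escape(" ".join(text.split()))
    out = []
    for keyword in top_keywords(text, count):
        for form in ("WHAT IS", "WHO IS"):
            out.append(
                f"<category><pattern>{form} {keyword.upper()}</pattern>"
                f"<template>{template}</template></category>"
            )
    return "\n".join(out)


print(categories_for("Merlin is a wizard. Merlin lives in a tower.", count=1))
```

Even this toy version emits two categories per keyword, which gives a feel for how quickly the category count grows when every crawled page is treated this way.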
If you are thinking this, perhaps you want to explore the AnswerBus link
which can be adapted to look up keywords and only needs a couple of AIML
categories instead of manufacturing many, many categories from web crawling.
The bottom line comes down to what kind of text you want to turn into
AIML. The techniques probably will vary for different kinds of information.
Again, AIMLpad has some simple utilities for converting strictly formatted
text to AIML.
If you can figure out the process for transforming the text, I'd be glad to
help make the tool to do it.
BTW, Google tries something like this and probably has the largest computer
complex in the world to do so. IBM provides a specialized version of web
extraction (called WebFountain), but it is extremely expensive and
hand-tailored to each specific request. As I have said before, the
greatest minds and billions and billions of dollars are currently dedicated
to this - it is not a simple task.
Hope this helps,
Gary Dubuque
-----Original Message-----
From: alicebot-developer-bounces at list.alicebot.org
[mailto:alicebot-developer-bounces at list.alicebot.org] On Behalf Of Ty Ademosu
Sent: Thursday, June 15, 2006 1:17 PM
To: alicebot-developer at list.alicebot.org
Subject: Re: [alicebot-developer] A.L.I.C.E Web page Spider
Gary --
1) How do I fetch a webpage with aimlpad?
2) I tried to connect to a public opencyc server but it doesn't seem to be
helping my chat sessions.
3) Also are there public worldnet and concept net servers and how would
aimlpad connect to them?
4) Finally, I cannot seem to disable the little merlin msagent, aimlpad
crashes when I try to access that tab.
Many Thanks
Ty
--
View this message in context:
http://www.nabble.com/Help%2C-Need-A.L.I.C.E-Web-page-Spider-t1720827.html#a4889869
Sent from the Alicebot Developer forum at Nabble.com.