[alicebot-archcomm] more random <random>
Dr. Richard S. Wallace
alicebot-archcomm@list.alicebot.org
Sun, 28 Apr 2002 11:37:58 -0700
I think a statistician would say all of this in four words: random,
independent, uniform selection.
Whether or not the underlying computer or language implements a truly
independent uniform random number generator is another question entirely,
and well beyond the scope of AIML. Entire books, conferences and academic
journals, indeed whole careers, are devoted to solving just that one little
problem.
Rich
----- Original Message -----
From: "Noel Bush" <noel@alicebot.org>
To: "Alicebot and AIML Architecture Committee"
<alicebot-archcomm@list.alicebot.org>
Sent: Sunday, April 28, 2002 9:01 AM
Subject: [alicebot-archcomm] more random <random>
> It has been observed by several people that the <random> processor in
> Program D didn't produce results that seemed very random. In a given
> <random> element, certain <li>s would seem to be favored over others;
> some would even appear to be almost never chosen.
>
> The topic of useful pseudo-random number generation is a deep one (not
> that I am one to speak on the subject), and I am sure that AIML
> processing is not the most demanding application of this. In any case,
> thinking about how to improve the situation led me to a couple of
> insights about what <random> "really means" -- that is, what the
> specification ought to say that it means.
>
> Take two categories, each of which has a template containing a <random>
> element. Say that the two <random> elements have an unequal number of
> <li> children (this is not essential to make the point, but helps make
> it clearer).
>
> Let's say you have a single-bot, single-user environment. Now express
> what you intuitively expect the <random> elements to produce over time.
> Which of the following two statements would you agree with?
>
> 1. The choice of which <li> is selected from one of the <random>
> elements at any given activation of its category is somehow related to
> previous choices of <li> elements from the other <random> element.
>
> 2. The choice of which <li> is selected from each <random> element is
> independent of choices made about other <random> elements.
>
> I would say #2 is the correct and desirable statement, and I think so
> would most people. If one of the <random>s has five <li>s, and the
> other one has seven, I want a "pure" 1:5 chance that any of the elements
> from the first will be chosen when it's activated, and a "pure" 1:7
> chance for any of the second.
>
> But if I only use a single pseudo-random number generator, I don't get
> the "purity" I'm looking for. This is because a pseudo-random number
> generator is not actually producing "true" randomness -- it's using some
> algorithm to produce a sequence of numbers that fits some formal
> description of a kind of "randomness". Most (all?) pseudo-random number
> generators produce the same sequence of numbers each time they are
> executed with the same "seed" -- this seed can of course be varied, but
> the source of this variation is subject to the same problems. And
> regardless of whether you always produce the same sequence or produce a
> different sequence each time (a sequence of sequences), it is critical
> that the numbers in the sequence *are* related to one another, however
> inscrutable that relationship might be to human eyes.
>
> A common way of getting a series of pseudo-random numbers in Java is
> like this:
>
> Random generator = new Random();
> for (int count = 0; count < 100000; count++)
> {
> System.out.println(generator.nextInt());
> }
>
> This will print 100000 pseudo-random integers between 0 and 2^32. Each
> time you run the program, the same sequence will be produced.
>
> If we want to get a random number within a different range, we can use
> the form
>
> generator.nextInt(n)
>
> which produces numbers between 0 and n - 1. Or it is common to see
> something like
>
> (int)Math.ceil(maximum * generator.nextDouble())
>
> which is in fact the form proposed by a contributor to the code a while
> back. The nextDouble() method returns a decimal number between 0.0
> inclusive and 1.0 (exclusive). The ceil function gets the closest
> integer that is higher than the number.
>
> In either case, or whatever variant, the fact is that the sequence has a
> "hidden" order to it -- I am sure there is a better way to describe
> this, but I'm no mathematician. The point is that because an algorithm,
> and not some natural, "truly random" source produces the sequence, we
> run into complications if we want to use this randomness for multiple
> purposes that are supposed to be unrelated.
>
> For all intents and purposes, the sequence of pseudo-random numbers can
> be said to be static, and when I ask for the "next" one I'm always going
> to get the same "next" one as long as I have that particular sequence
> (if I vary the seeds, the method of variation just multiplies the
> sequences). The problem is that if two different <random> elements are
> imagined to each have an independent random "space", tying all of their
> choices to the same sequence of pseudo-random numbers really throws a
> wrench into the works. The sequence in which the <random>s are chosen
> will determine which of the numbers from the master pseudo-random
> sequence are chosen. Whether or not that would have a more randomizing
> or less randomizing effect is probably not possible to say generically.
> What *is* possible to say is that the randomness can no longer be
> guaranteed to meet whatever standard is asserted by the random number
> generator in use.
>
> To make matters worse, in a real bot application, you have multiple
> users, and perhaps even multiple bots, all of whom might be sharing that
> random number generator. It seems that there is a great likelihood that
> the result will not be what people intuit about the <random> function.
> No matter how nice a random number generator you find, it won't solve
> this problem.
>
> So I propose (and have implemented a rudimentary variant of) an
> amplification in the AIML specification about <random>. It would be
> something like this:
>
> * Each <random> element should have its own "space". This means that,
> for example, if <random> element A contains five <li> children, and
> <random> element B contains seven <li> children, the probability that
> any given <li> of A will be chosen when A is activated should
> consistently be 1:5, and the probability that any given <li> of B will
> be chosen when B is chosen should consistently be 1:7. Essentially,
> each <random> must have its own unique sequence of random numbers.
>
> * A "unique space" requirement exists as well on a per-user basis: each
> user should have an equivalent experience of randomness for each
> <random> element, independent of any other users.
>
> * The individual bot also has a uniqueness requirement, multiplying the
> previous two. In effect, if there are m bots, n users, and p random
> elements, there are (potentially) m * n * n independent random number
> sequences.
>
> An example implementation of this is in the latest CVS version of
> RandomProcessor. I also found a nice class called MersenneTwister which
> claims to do better random number generation, but like I said above,
> this in itself doesn't really deal with the issue at hand.
>
> What do you think?
>
> P.S. Sorry if I offend any real mathematicians with my sloppy
> phraseology here.
>
>
> _______________________________________________
> alicebot-archcomm mailing list
> alicebot-archcomm@list.alicebot.org
> http://list.alicebot.org/mailman/listinfo/alicebot-archcomm