[alicebot-archcomm] more random <random>
Noel Bush
alicebot-archcomm@list.alicebot.org
28 Apr 2002 20:01:19 +0400
It has been observed by several people that the <random> processor in
Program D didn't produce results that seemed very random. In a given
<random> element, certain <li>s would seem to be favored over others;
some would even appear to be almost never chosen.
The topic of useful pseudo-random number generation is a deep one (not
that I am one to speak on the subject), and I am sure that AIML
processing is not the most demanding application of this. In any case,
thinking about how to improve the situation led me to a couple of
insights about what <random> "really means" -- that is, what the
specification ought to say that it means.
Take two categories, each of which has a template containing a <random>
element. Say that the two <random> elements have an unequal number of
<li> children (this is not essential to make the point, but helps make
it clearer).
Let's say you have a single-bot, single-user environment. Now express
what you intuitively expect the <random> elements to produce over time.
Which of the following two statements would you agree with?
1. The choice of which <li> is selected from one of the <random>
elements at any given activation of its category is somehow related to
previous choices of <li> elements from the other <random> element.
2. The choice of which <li> is selected from each <random> element is
independent of choices made about other <random> elements.
I would say #2 is the correct and desirable statement, and I think so
would most people. If one of the <random>s has five <li>s, and the
other one has seven, I want a "pure" 1:5 chance that any of the elements
from the first will be chosen when it's activated, and a "pure" 1:7
chance for any of the second.
But if I only use a single pseudo-random number generator, I don't get
the "purity" I'm looking for. This is because a pseudo-random number
generator is not actually producing "true" randomness -- it's using some
algorithm to produce a sequence of numbers that fits some formal
description of a kind of "randomness". Most (all?) pseudo-random number
generators produce the same sequence of numbers each time they are
executed with the same "seed" -- this seed can of course be varied, but
the source of this variation is subject to the same problems. And
regardless of whether you always produce the same sequence or produce a
different sequence each time (a sequence of sequences), it is critical
that the numbers in the sequence *are* related to one another, however
inscrutable that relationship might be to human eyes.
A common way of getting a series of pseudo-random numbers in Java is
like this:
Random generator = new Random();
for (int count = 0; count < 100000; count++)
{
System.out.println(generator.nextInt());
}
This will print 100000 pseudo-random integers between 0 and 2^32. Each
time you run the program, the same sequence will be produced.
If we want to get a random number within a different range, we can use
the form
generator.nextInt(n)
which produces numbers between 0 and n - 1. Or it is common to see
something like
(int)Math.ceil(maximum * generator.nextDouble())
which is in fact the form proposed by a contributor to the code a while
back. The nextDouble() method returns a decimal number between 0.0
inclusive and 1.0 (exclusive). The ceil function gets the closest
integer that is higher than the number.
In either case, or whatever variant, the fact is that the sequence has a
"hidden" order to it -- I am sure there is a better way to describe
this, but I'm no mathematician. The point is that because an algorithm,
and not some natural, "truly random" source produces the sequence, we
run into complications if we want to use this randomness for multiple
purposes that are supposed to be unrelated.
For all intents and purposes, the sequence of pseudo-random numbers can
be said to be static, and when I ask for the "next" one I'm always going
to get the same "next" one as long as I have that particular sequence
(if I vary the seeds, the method of variation just multiplies the
sequences). The problem is that if two different <random> elements are
imagined to each have an independent random "space", tying all of their
choices to the same sequence of pseudo-random numbers really throws a
wrench into the works. The sequence in which the <random>s are chosen
will determine which of the numbers from the master pseudo-random
sequence are chosen. Whether or not that would have a more randomizing
or less randomizing effect is probably not possible to say generically.
What *is* possible to say is that the randomness can no longer be
guaranteed to meet whatever standard is asserted by the random number
generator in use.
To make matters worse, in a real bot application, you have multiple
users, and perhaps even multiple bots, all of whom might be sharing that
random number generator. It seems that there is a great likelihood that
the result will not be what people intuit about the <random> function.
No matter how nice a random number generator you find, it won't solve
this problem.
So I propose (and have implemented a rudimentary variant of) an
amplification in the AIML specification about <random>. It would be
something like this:
* Each <random> element should have its own "space". This means that,
for example, if <random> element A contains five <li> children, and
<random> element B contains seven <li> children, the probability that
any given <li> of A will be chosen when A is activated should
consistently be 1:5, and the probability that any given <li> of B will
be chosen when B is chosen should consistently be 1:7. Essentially,
each <random> must have its own unique sequence of random numbers.
* A "unique space" requirement exists as well on a per-user basis: each
user should have an equivalent experience of randomness for each
<random> element, independent of any other users.
* The individual bot also has a uniqueness requirement, multiplying the
previous two. In effect, if there are m bots, n users, and p random
elements, there are (potentially) m * n * n independent random number
sequences.
An example implementation of this is in the latest CVS version of
RandomProcessor. I also found a nice class called MersenneTwister which
claims to do better random number generation, but like I said above,
this in itself doesn't really deal with the issue at hand.
What do you think?
P.S. Sorry if I offend any real mathematicians with my sloppy
phraseology here.