[alicebot-archcomm] performance testing

Noel Bush alicebot-archcomm@list.alicebot.org
Wed, 14 Nov 2001 13:21:30 +0300


As I have hinted, I have been "renovating" Program D quite extensively
over the last few weeks.  There is virtually no piece of code that I
have not touched.  At minimum I have cleaned up formatting and
introduced a nominal coding style to almost all of the code; in a great
many methods I have introduced what seem to be successful optimizations;
in several cases I have completely reworked the
architecture (Classifier is now just one of many possible
"Multiplexors", AIMLReader is a faster state machine, AIMLParser uses
so-called "pluggable" processors, substitutions are handled by
externally-configurable maps, logging and tracing are much cleaner and
more configurable, etc.).

I have also partially succeeded in excising some of the third-party code
that was used, although this will take longer in some cases.

In any case, it's still in progress, but at almost any given moment you
should find that the latest material in CVS will pass almost all of the
AIML test cases.  (The only ones it doesn't pass are some of the new
ones I added.)  There are a number of issues about text formatting still
to be resolved, and a whole list of various small quirks, but I have
great hope of ironing those out within about a week.

As always the overall goal here is to provide a clean, readable, useful
reference implementation that will be of use to vendors who implement
their own AIML interpreters and bots.  Naturally this owes a deep debt
to Tom & Pedro, Jon, Richard Wallace, and the zillions of contributors.

Point is, this is nearing the state when some performance testing would
be really useful.  This is one of those big unanswered questions, and
until recently answering it probably wouldn't have been practical, since
despite the (rather funny) major-version designation of "4", the code
has long been in a very transitional state -- tracking down causes of
performance problems would have been a major headache.  But I think the
situation is now quite different.

One thought I have is to provide a "benchmark mode", in which the
engine, being fed with a giant set of inputs for hours or days at a
time, will periodically write out some stats about running average
response time, memory usage, and so on.  The response timing is easy;
the memory usage I'm less sure about.  I would like to provide this in
such a fashion that we can invite people from the general community to run
benchmarks and send us their results.  We'll need to collect standard
information about the environment in which they're running it, and we
should supply a giant set of inputs to all volunteers.
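
To make the "benchmark mode" idea a bit more concrete, here is a minimal
sketch of the bookkeeping it would need.  Nothing here exists in Program D:
the class name BenchmarkStats and all of its methods are hypothetical, and
it assumes a current JDK (String.format is used for the report line).  It
keeps a running average of response times without storing every sample,
and collects the standard environment facts from JVM system properties.

```java
import java.util.Locale;

/** Hypothetical helper for a benchmark mode; not part of Program D. */
final class BenchmarkStats {
    private long count = 0;
    private double meanMillis = 0.0;

    /** Folds one response time into the running average (no samples are stored). */
    void record(long elapsedMillis) {
        count++;
        meanMillis += (elapsedMillis - meanMillis) / count;
    }

    long count() {
        return count;
    }

    double averageMillis() {
        return meanMillis;
    }

    /** One line of stats, suitable for writing out periodically. */
    String report() {
        return String.format(Locale.ROOT, "responses=%d avg=%.2f ms", count, meanMillis);
    }

    /** Basic facts about the environment, read from standard JVM system properties. */
    static String environment() {
        return System.getProperty("java.vendor") + " " + System.getProperty("java.version")
                + " on " + System.getProperty("os.name") + " " + System.getProperty("os.version")
                + " (" + System.getProperty("os.arch") + ")";
    }
}
```

The engine would take System.currentTimeMillis() before and after each
response, pass the difference to record(), and write report() to a log
every N inputs; environment() would be written once at startup.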

Question 1: Who can give pointers on the memory usage question?  Without
bolting on some big third-party package, I'd like to know if I can get
at this info from Java at all.

Question 2: Who has suggestions about the giant set of inputs?  Since
A.L.I.C.E. has been running on the alicebot.org machine, we have a big
set there.  But there are naturally the privacy concerns to deal with.
On the other hand, supplying a lot of "fake text" won't be helpful.
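
On Question 1, one possible starting point that needs no third-party
package is java.lang.Runtime, whose totalMemory() and freeMemory() methods
report the JVM's current heap size and the unused part of it.  The sketch
below is only an illustration: the MemorySnapshot class is hypothetical,
and the "used" figure is approximate, since it includes garbage that has
not yet been collected.

```java
/** Hypothetical helper; approximates heap usage using java.lang.Runtime only. */
final class MemorySnapshot {
    final long usedBytes;
    final long totalBytes;

    private MemorySnapshot(long usedBytes, long totalBytes) {
        this.usedBytes = usedBytes;
        this.totalBytes = totalBytes;
    }

    /** Reads the current heap size and free space from the running JVM. */
    static MemorySnapshot take() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // part of that heap not in use
        return new MemorySnapshot(total - free, total);
    }

    @Override
    public String toString() {
        return "heap used=" + (usedBytes / 1024) + " KB of " + (totalBytes / 1024) + " KB";
    }
}
```

A snapshot taken at each reporting interval would at least show whether
usage trends upward over a long run, which is the degradation we care
about here, even if individual readings are noisy.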

Note that this is mainly concerned with looking for performance
degradation in a single-client scenario.  The multiple client scenario
is more complicated, and I believe such testing should wait until we
succeed in making Jetty wholly optional for the server.  (Among other
reasons, most of the 3rd-party code has licensing issues.)  This may
already be easier given some changes I've made, but I haven't yet tried,
for instance, Rich's homegrown web server.  In any case, I think that
this kind of testing is better handled using an external testing program
rather than something built into the engine.

All thoughts welcome.