Wednesday, November 21, 2007

Round 2, Zillions 2 vs. fmax4, Draw in 9 moves!

With remarkable brevity, the new version of fmax4 and Zillions 2 concluded their game in only 9 moves with a 3-fold repetition draw!

Zillions 2 vs. fmax4

With the Vortex-SMIRF game ending rather quickly, and with a new copy of fmax freshly installed, I decided to see how it would fare against Zillions. The game began in the style of the Gothic Queenside Petroff that is somewhat popular with new Gothic Chess programs. Then, fmax allowed the d5 pawn to fall, but it could have won the d4 pawn in upcoming moves with gain of tempi.

With the search not being deep enough to see the regain of the pawn, it could only see the "0" score of the repetition draw by continuing to chase the white Archbishop. Zillions 2 did not risk placing it anywhere other than on f3 and d5, so the draw was achieved on move 9!

15 comments:

Smirf said...

Hmm ... it really is somehow frustrating to find SMIRF now even behind those two point gatherers. ;-)

So I must hope for my SMIRF 'zombie' to finally slowly move out of its grave ...

DuelingBishops said...

There are still 12 rounds left to play, aren't there? So your program could technically still finish 12-2.

Victor Vector said...

These things happen. This game was a strange draw. It just shows how hard it is to make a program play the way we expect.

ChessCarpenter said...

I think this game will forever be the shortest Gothic game ever "played"! Some work is needed to improve fmax4, but you have to beat Zillions 2x's. These are the gimme games :)

geography_teacher said...

I've got to agree that Zillions is the whipping post and should be handled 2-0 by the rest of the field. We should still show the newcomers some respect especially now that the bar has apparently been raised rather high by Vortex. Remember, smirf has scored wins against Vortex in the past, which must mean it has been improved also. It's just tough to say how good each program is with not even 2 rounds done.

I read the postings at chessvariants.org mostly and you'd think ChessV and Zillions were the only 10x8 capable engines on the planet. Most of the games they play are just silly, variants that are more like cartoon versions of chess than actual candidates to replace chess like Gothic Chess is (though I still play mostly chess.)

Why aren't there more posts about Gothic Chess on that site? It's a mystery to me. You'd think the variant community would recognize an outstanding variant when they saw one. Perhaps they are too caught up in the folly of introducing variant after variant and they can't be bothered with the "detail" that Gothic Chess is such an excellent game!

Who knows.

I think we should let those people know where ChessV and Zillions stand as the tournament progresses. What do you all think?

H.G.Muller said...

I am ashamed to say that the version of fMax that played in the second round contained several bugs. I did repair a bug that caused it to think it was always in the end game, but it turned out its middle-game evaluation values were not tuned well and discouraged Pawn moves too much, with totally passive play as a result.

In addition, it turned out that all fMaxes so far had a bug that made their hash table totally inoperative (and therefore also their repetition-draw detection, as the game history is stored in the hash table).

I submitted a new version for the next round, where all this is corrected.
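
To illustrate why a dead hash table also kills repetition-draw detection, here is a minimal Python sketch. It is not fMax's code (fMax is written in C and keeps its history inside the hash table itself); the class `GameHistory` and the string position keys are hypothetical names chosen for illustration. The idea is only that repetitions are found by looking up the current position among the positions already stored, so if nothing is ever stored or found, no repetition is ever seen.

```python
from collections import Counter


class GameHistory:
    """Hypothetical helper: counts how often each position key has occurred."""

    def __init__(self):
        self.counts = Counter()  # position key -> number of occurrences
        self.stack = []          # keys in the order the positions arose

    def push(self, key):
        """Record that the position identified by `key` has been reached."""
        self.stack.append(key)
        self.counts[key] += 1

    def pop(self):
        """Undo the most recent push (used when the search takes a move back)."""
        key = self.stack.pop()
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]

    def is_threefold(self, key):
        """True once the position has occurred at least three times."""
        return self.counts[key] >= 3


# A piece shuffling between two squares produces positions A, B, A, B, A.
history = GameHistory()
for key in ["A", "B", "A", "B", "A"]:
    history.push(key)

print(history.is_threefold("A"))  # third occurrence of A: draw can be claimed
```

A real engine would use a Zobrist-style hash of the board as the key rather than a string, and would usually score even a single repetition inside the search as a draw.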

GothicChessInventor said...

Don't worry H.G., take a look at this blog entry:

http://gothicchess.blogspot.com/2007/09/vortex-vs-smirf-blog-game.html

Vortex played a blog game against SMIRF at one day per move and got wiped off the board even with a huge lead in development! I stopped counting the bugs I introduced into that version.

There is only one way to determine if a change to one's source code produces better play: you must run matches against the older version, and diligently review the games by creating a "log" of what was happening during the search.

This was the hard lesson I had to learn after Vortex was killed by SMIRF. It took me over 120 hours of coding and testing to produce this new version of Vortex, and even yesterday ChessCarpenter sent me a game he won against it in the endgame no less! So, it was time to put the code through the microscope again, and I found something else that was not working as I had intended it (not really a bug, just untested code!)

So, it's a process, sometimes a long process, but once you have your program fully tuned, it is very rewarding.

H.G.Muller said...

Yes, I know. But fMax was an entirely new thing, that played games the version from which it was derived would not play at all. Perhaps I should have tested it for normal Chess, in particular if it could still solve Fine #70. But I figured the bugs would be likely to only show up when I would use the new features.

A non-functioning hash table is not very obvious. It slows you down, but if you have nothing to compare to you won't notice that. I was set on the trail when it started to not avoid rep-draws when ahead. And that became only apparent when I corrected the game-stage-detection bug, as with this bug it was so eager to push Pawns that it would not think about repeating moves anyway.

I did have time to properly test the current version, though. I played a match of 67 blitz games against TSCP-G, which did not end too disastrously. (18+, 7=, 42- is 32%) I understand that TSCP-G is really a state-of-the-art program, not at all like its normal-Chess cousin (which does not have null-move or hash table). So I guess this is really the best result you could hope for, from a ~100-line engine...

GothicChessInventor said...

From my email exchange with Michel Langeveld of a few years ago, I am 100% certain that TSCP Gothic is much stronger than the TSCP Chess counterpart. Michel debugged TSCP and found several ways to improve it. Michel is the author of the NullMover chess program on ICC, so he is well versed in chess engine design.

TSCP Gothic has a rather "flat" evaluation function that searches very deeply. It outsearches Vortex in nominal depth, and its nodes/second is almost 3 times as fast as Vortex.

Vortex gains the upper hand in its new "extended extensions" code, which I have now fully debugged. Also, I think Vortex has the most elaborate Gothic Chess specific evaluation function, which helps a great deal.

I would like to run some engine matches between fmax and TSCP also, but my version of Winboard only looks for chess engines, and it finds no Gothic Chess engines for some reason.

I must be doing something wrong and I'll play with it more on my day off tomorrow.

Victor Vector said...

Ed, in your opinion, what are the ratings of the programs you have seen so far? It's tough to keep track with all of the leapfrogging going on. You know, Vortex beats SMIRF, SMIRF has a new version, SMIRF beats Vortex, Vortex has a new version, Vortex beats SMIRF again, new programs show up and nobody knows how good they are and so forth.

GothicChessInventor said...

Well we'll have a nice sample to do a provisional rating after this tournament, certainly. If you include the results from the 2004 Championship and later, I would make the following guesses as to rating performances:

Gothic Vortex = 2375
SMIRF = 2100
TSCP Gothic = 2075
ChessV = 1650
Zillions = 1500

Pulverizer = unknown
Tornado = unknown
fmax = unknown

I base this on the fact that SMIRF has improved but TSCP Gothic has only gotten faster. So, TSCP has to be 400 points over its next nearest rival (it searches 1 million nodes per second after all!) and SMIRF is probably slightly stronger.

ChessV just plays weak Class C style, winning only when the opponent leaves something hanging. Zillions is the same, only weaker yet.

Vortex may be underrated, but I can't claim a higher rating without more proof. It lost badly to SMIRF in the blog game, after all, so it should account for this in its forecast of rating. If it goes 14-0 here without a misplay, then I think it will be closer to 2425. If more bugs are discovered and then corrected, time will tell what its true rating is.

This is all speculation of course, but it's my best guess at this point.

Smirf said...

Such Elo ratings could be calibrated by giving SMIRF's engine an 8x8 Chess Elo value, obtained by playing a sufficient number of traditional Chess games against engines of known strength. This is possible because SMIRF uses only ONE unique engine for all its variants. Unfortunately nobody is interested in doing that task, because SMIRF is neither a Winboard nor a UCI engine, and people are no longer willing to perform tests by hand ...

GothicChessInventor said...

I just performed the first official Winboard_F UCI engine match between the newest version of fmax (version 8s) and TSCP Gothic.

Each program lost as white and won with black under the tournament time control.

I will post a download link and instructions for those interested in downloading Winboard_F and TSCP Gothic as well as fmax.

H.G.Muller said...

I did a match between fMax and TSCP-G at 40 moves per 2 min time control (on Core 2 Duo 2,4GHz, so these are really long minutes :-) )

In 67 games TSCP-G clearly had the upper hand: 42+, 7=, 18-, and this was the old 32-bit version of TSCP-G as downloaded from this website.

Translated to a rating this would mean that fMax is about 125 Elo points weaker than TSCP-G. I have no idea how this translates to longer time controls.
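
The conversion from a match result to a rating gap can be sketched as follows. This assumes the common logistic Elo model; the function names `match_score` and `elo_difference` are illustrative only, and the model behind the figure quoted above is not stated, so a slightly different number is to be expected.

```python
import math


def match_score(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Rating difference implied by a score under the logistic Elo model.

    Positive means stronger than the opponent. The score must lie strictly
    between 0 and 1, since a 0% or 100% result has no finite rating gap.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)


# fMax's result against TSCP-G from the 67-game match: 18 wins, 7 draws, 42 losses.
score = match_score(18, 7, 42)
diff = elo_difference(score)
print(f"score = {score:.1%}, Elo difference = {diff:+.0f}")
```

With these inputs the score is about 32% and the logistic formula puts fMax roughly 130 points below TSCP-G, in the same neighbourhood as the 125 quoted. With only 67 games the statistical uncertainty is far larger than the gap between those two figures.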

GothicChessInventor said...

Hello H.G.,

On vacation and running matches I see! How dedicated!

You may have noticed one of the drawbacks about blogs. Once a post reaches a certain "age", replies to it are not seen as often.

That's why we also have a discussion board. Anyone can create a topic of discussion on there, and when you have something to let everyone know about, it is rather easy for them to see.

Discussion Board Link

Dr. Hyatt was the last person to join the board, so feel free to join as well!