Since I posted the WSTest results I've been wondering what the results would be with serialization methods other than XML, and how much of a difference there really is between the Aalto and SJSXP XML parsers.
I've set up a test using JSON and native Java serialization. The JSON bits are handled by the Jackson processor because its claim to fame is that it's fast, and in addition it supports easy binding to Java objects.
As always, the code is available for download.
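For reference, the "java serialization" column relies on the JDK's built-in object streams. Here is a minimal sketch of what such a round trip looks like. The `Order` class and its fields are made up for illustration and are not the real WSTest types, and the actual benchmark code may well be organized differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationRoundTrip
{
    // Serialize an object to a byte array using native Java serialization.
    static byte[] toBytes(Serializable obj)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes))
        {
            out.writeObject(obj);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from the bytes produced by toBytes().
    static Object fromBytes(byte[] data)
    {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data)))
        {
            return in.readObject();
        }
        catch (IOException | ClassNotFoundException e)
        {
            throw new IllegalStateException("could not deserialize", e);
        }
    }

    public static void main(String[] args)
    {
        byte[] wire = toBytes(new Order(42, "acme"));
        Order copy = (Order) fromBytes(wire);
        System.out.println(copy.orderId + " " + copy.customer + " (" + wire.length + " bytes)");
    }
}

// Hypothetical payload type, not one of the real WSTest classes.
class Order implements Serializable
{
    private static final long serialVersionUID = 1L;

    final int orderId;
    final String customer;

    Order(int orderId, String customer)
    {
        this.orderId = orderId;
        this.customer = customer;
    }
}
```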
| Test Name | xml - sjsxp | xml - aalto | json - jackson | java serialization |
|---|---|---|---|---|
| GetOrder-20 | 8662 | 11632 | 11650 | 11465 |
| GetOrder-100 | 4509 | 5126 | 5651 | 7077 |
| EchoStruct-20 | 9441 | 12773 | 14788 | 12914 |
| EchoStruct-100 | 4133 | 5389 | 6796 | 7575 |
| EchoList-20 | 8610 | 12986 | 13889 | 13649 |
| EchoList-100 | 3846 | 5129 | 6067 | 7238 |
What are the conclusions?
For these tests Aalto seems to be about 30-40% faster than SJSXP (the bundled StAX parser).
Jackson is screamingly fast; it gives Java serialization a run for its money, especially for small message sizes.
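To see where the Aalto versus SJSXP figure comes from, the gain can be computed per test straight from the table. The sketch below does just that (the class and method names are mine). Most tests land between 30% and 35%, with GetOrder-100 lower at roughly 14% and EchoList-20 higher at roughly 51%.

```java
import java.util.Locale;

class Speedup
{
    // Relative gain of `candidate` over `baseline`, in percent.
    static double gainPercent(int baseline, int candidate)
    {
        return (candidate - baseline) * 100.0 / baseline;
    }

    public static void main(String[] args)
    {
        // Values copied from the "xml - sjsxp" and "xml - aalto" columns of the table.
        String[] tests = { "GetOrder-20", "GetOrder-100", "EchoStruct-20",
                           "EchoStruct-100", "EchoList-20", "EchoList-100" };
        int[] sjsxp = { 8662, 4509, 9441, 4133, 8610, 3846 };
        int[] aalto = { 11632, 5126, 12773, 5389, 12986, 5129 };

        for (int i = 0; i < tests.length; i++)
        {
            System.out.printf(Locale.ROOT, "%-15s %+.1f%%%n",
                              tests[i], gainPercent(sjsxp[i], aalto[i]));
        }
    }
}
```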
In my last post I talked about my Java implementation of WSTest, which can compete performance-wise with the Microsoft implementation. My initial tests were performed using virtual machines and Windows2003, but since then I've managed to get my hands on an install CD of Windows2008 and a gigabit switch, so I could finally perform tests with some semblance of a valid setup.
I still don't have server-class hardware to play with, but at least I can run the tests on more than one machine. The web service host server is a Thinkpad T61p (Core2 Duo T7700 @ 2.40GHz, 4GB RAM), the client machines are a Macbook (Core2 Duo @ 2.16GHz, 1GB RAM) and a HP Compaq 2510p (Core2 Duo U7600 @ 1.20GHz, 2GB RAM).
I've used ApacheBench as the load-generating tool. An example of its use for the GetOrder test:
ab -c30 -n500000 -k -p post_files/getorder20.xml -H 'SOAPAction: "uri:WSTestWeb-TestService/GetOrder"' -T "text/xml;charset=UTF-8" http://172.17.1.100:8050/WSTest
The WCF implementation tested was the "WSTestSelfHost" (the numbers for the "WSTest_IISHosted" are lower) running on Windows2008 Std with all the latest updates and .NET 3.5SP1. The Java implementation runs on Ubuntu 8.10 with the Generic kernel using the sun-java-jdk-1.6.10 jvm.
| Test Name | Windows/.NET/WCF | Windows/Mina | Linux/Mina | Linux/Grizzly |
|---|---|---|---|---|
| GetOrder-20 | 5095 | 7660 | 8569 | 9962 |
| GetOrder-100 | 2773 | 3907 | 4214 | 4923 |
| EchoStruct-20 | 2808 | 8412 | 9939 | 11343 |
| EchoStruct-100 | 1582 | 3913 | 4170 | 4742 |
| EchoList-20 | 2520 | 7844 | 9200 | 10112 |
| EchoList-100 | 1346 | 3766 | 4124 | 4531 |
Some notes:
Update: I've updated the values to include test runs with the Mina-based HTTP engine, both on Windows and on Linux. Results for Grizzly on Windows are not included because Grizzly aborts/resets connections way too frequently when running on Windows2008.
I've recently come across Microsoft's updated versions of the WSTest Web Services Benchmark and the .NET StockTrader Sample Application. They wasted no time bragging about the results :-)
Microsoft encourages people to download the benchmark kit and perform their own tests, so I did that. I will ignore the StockTrader App for now because it is more complex to install and analyze; I will focus on the WSTest benchmark. The .NET/WCF results are very good and the guys at the benchmark labs seem to really know their stuff. It's a pity that the benchmark chose to compare .NET/WCF against WebSphere, probably the most expensive, slow and cumbersome of all Java application servers.
In Java-land there are faster solutions to choose from. I decided to implement my own version of the benchmark to verify just how fast or how slow a Java implementation can be.
The test is essentially an XML serialization/deserialization benchmark, so I picked the speedy JiBX as the framework for Java/XML data binding. JiBX is only as fast as the underlying XML parser, and the fastest StAX parser that I know of is the Aalto XML Processor. We will also need an HTTP layer, and for this I really like the Mina Http Codec.
With all the ingredients in place it didn't take long to produce a benchmark implementation that doesn't suck :-). The code is available here.
And what about the results? Unfortunately I don't have a server-class machine lying around for running proper tests. However, I do have VirtualBox and two virtual machines, one with "Windows2003 Server" that runs the "Self-Hosted" WSTest application and another with "Ubuntu 8.10 Server" that runs the Java implementation using the sun-java-jdk-1.6.10 jvm. Using soapUI as a load generator, the "linux/java" setup runs circles around the "windows/.net/wcf" one; in some cases the throughput numbers are more than twice as high. Of course, these results should be taken with a truckload of salt. The tests should have been performed with a proper server machine, using Windows 2008 Server in the .NET setup and with several machines running the load generators. I would love to hear from someone who has a "benchmark lab".
Update: The http bits are now handled by Grizzly, the performance seems to be better.
Update: Check the follow-up post for a more detailed performance test.
I'm all set for PGDay, the European PostgreSQL conference.
It would be cool to be joined by fellow "tugas" :-)
Sapo Broker just got a huge performance boost.
There are two reasons for it. The first was the change from H2 to Berkeley DB. H2 is a very nice database but not adequate as a high-performance message store. I chose H2 initially because of its ease of use and my familiarity with SQL, but it just doesn't hold up against our usage patterns. I picked BDB only looking for increased stability under load, which I got; the boost in performance was a surprise. Tests in various scenarios show that BDB is rock solid and brings an increase in throughput that ranges from 100% to 300%!
The other source of performance improvement has to do with XML parsing. We now use Woodstox. I had heard of Woodstox, but I didn't imagine that the performance difference would be so significant compared with SJSXP, which comes bundled with the JVM. Zero code changes and an extra ".jar" file, that was the cost of having nearly twice as much throughput... I couldn't believe my eyes when I saw it.
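The reason an extra jar can be enough is that code written against `javax.xml.stream` never names the parser. Here is a minimal sketch of that idea; the XML snippet and class name are made up for illustration and this is not Sapo Broker code. `XMLInputFactory.newInstance()` looks up a StAX implementation at runtime and falls back to the one bundled with the JDK when no other provider is found.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxDemo
{
    // Returns the text of the first <id> element, or null if there is none.
    static String firstId(String xml)
    {
        try
        {
            // No parser class is named here, only the standard StAX API.
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try
            {
                while (reader.hasNext())
                {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "id".equals(reader.getLocalName()))
                    {
                        return reader.getElementText();
                    }
                }
                return null;
            }
            finally
            {
                reader.close();
            }
        }
        catch (XMLStreamException e)
        {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    public static void main(String[] args)
    {
        // Shows which implementation was picked up from the classpath.
        System.out.println("StAX factory in use: "
                           + XMLInputFactory.newInstance().getClass().getName());
        System.out.println(firstId("<order><id>42</id></order>"));
    }
}
```

Running it with and without another StAX implementation on the classpath should print a different factory class name while the rest of the program stays untouched.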
Unexpected performance improvements.... I like them!
At work I'm constantly seeing the (anti)pattern of (ab)using the RDBMS as a datasource. The main reason for not using a cache for data lookup seems to be the fear of serving stale data. This is easily solved if all access to the database goes through some kind of service layer; in this case the service layer must have the added responsibility of updating the cache. If there isn't a service layer in place, what is needed is a way for the database to send "cache eviction events" whenever there is a change in some record.
Pgmemcache is a solution to this problem, but it's tightly coupled to memcached and in the future we may want to receive events not related to caching. Also, although memcached supports a UDP protocol, pgmemcache seems to require a persistent TCP connection to the memcached servers, which is something that makes me uncomfortable.
After reading the excellent documentation on how to write custom C functions I came up with this:
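To make the service layer variant concrete, here is a small single-threaded sketch. Everything in it is hypothetical: `ItemService` is an invented name, and a plain map stands in for the RDBMS. Reads go through the cache, and every write evicts the cached entry so the next read reloads it.

```java
import java.util.HashMap;
import java.util.Map;

class ItemService
{
    // Stand-in for the real database, just to keep the sketch self-contained.
    private final Map<Integer, String> database = new HashMap<>();
    private final Map<Integer, String> cache = new HashMap<>();

    // Counts how often a read had to go to the "database".
    int databaseReads = 0;

    String find(int id)
    {
        String cached = cache.get(id);
        if (cached != null)
        {
            return cached;
        }
        databaseReads++;
        String value = database.get(id);
        if (value != null)
        {
            cache.put(id, value);
        }
        return value;
    }

    void save(int id, String value)
    {
        database.put(id, value);
        // The write path owns cache consistency: evict so no stale copy survives.
        cache.remove(id);
    }

    public static void main(String[] args)
    {
        ItemService service = new ItemService();
        service.save(1, "first");
        service.find(1);
        service.find(1); // served from the cache
        service.save(1, "second");
        System.out.println(service.find(1) + ", database reads: " + service.databaseReads);
    }
}
```

This only works as long as nothing writes to the database behind the service's back, which is exactly the case the rest of this post deals with.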
#include "postgres.h"
#include "fmgr.h"

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(publish_event);

Datum
publish_event(PG_FUNCTION_ARGS)
{
    int32 port = PG_GETARG_INT32(0);       /* destination port */
    text *payload = PG_GETARG_TEXT_P(1);   /* message payload */
    int len = VARSIZE(payload) - VARHDRSZ; /* payload size without the varlena header */
    struct sockaddr_in dst_addr;
    int flag = 1;
    int ret;
    int fd = socket(PF_INET, SOCK_DGRAM, 0);

    /* check the descriptor before using it */
    if (fd < 0)
    {
        elog(NOTICE, "publish_event: %s", strerror(errno));
        PG_RETURN_VOID();
    }
    setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &flag, sizeof(flag));

    /* inet_aton() does not resolve host names, so use the loopback address directly */
    memset(&dst_addr, 0, sizeof(dst_addr));
    dst_addr.sin_family = AF_INET;
    dst_addr.sin_port = htons((uint16) port);
    dst_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    ret = sendto(fd, VARDATA(payload), len, 0,
                 (struct sockaddr *) &dst_addr, sizeof(dst_addr));
    if (ret < 0)
        elog(NOTICE, "publish_event: %s", strerror(errno));

    closesocket(fd);
    PG_RETURN_VOID();
}
And this is an example of how this function can be used:
CREATE OR REPLACE FUNCTION item_trg_upd() RETURNS trigger AS $BODY$
BEGIN
    IF ROW(OLD.*) IS DISTINCT FROM ROW(NEW.*) THEN
        PERFORM publish_event(1234, '<id>' || OLD.id || '</id>');
    END IF;
    RETURN NEW;
END;
$BODY$ LANGUAGE 'plpgsql' VOLATILE;

CREATE TRIGGER item_trg_upd AFTER UPDATE ON item
    FOR EACH ROW EXECUTE PROCEDURE item_trg_upd();
And that's it, whenever there is a change in some record in the "item" table a notification is sent. In order for this to work there must be a local daemon listening on whichever port the message is sent to, something like this:
import static java.lang.System.out;

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class SimpleDaemon
{
    public static void main(String[] args) throws Exception
    {
        InetAddress inet = InetAddress.getByName("localhost");
        DatagramSocket socket = new DatagramSocket(1234, inet);
        // increase the rcv buffer size to minimize packet loss
        socket.setReceiveBufferSize(1024 * 1024);

        // 64 KiB is enough to hold the largest possible UDP datagram
        byte[] buffer = new byte[1 << 16];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        out.println("listening ...");
        while (true)
        {
            socket.receive(packet);
            // decode only the bytes that were actually received
            String message = new String(packet.getData(), packet.getOffset(), packet.getLength(), "UTF-8");
            out.println(message);
        }
    }
}
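To try out the listening side without PostgreSQL in the picture, a datagram can be sent from Java as well. The sketch below is only an illustration: the class name, the two-second timeout and the use of a random free port instead of 1234 are my choices here. It sends one payload over the loopback interface and reads it back the same way the daemon does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class LoopbackCheck
{
    // Sends `message` over UDP to a loopback listener and returns what arrived.
    static String sendAndReceive(String message)
    {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket listener = new DatagramSocket(0, loopback); // port 0 = any free port
             DatagramSocket sender = new DatagramSocket())
        {
            // UDP gives no delivery guarantee, so never block forever.
            listener.setSoTimeout(2000);

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                                           loopback, listener.getLocalPort()));

            byte[] buffer = new byte[1 << 16];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            listener.receive(packet);
            return new String(packet.getData(), packet.getOffset(),
                              packet.getLength(), StandardCharsets.UTF_8);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(sendAndReceive("<id>7</id>"));
    }
}
```

To exercise SimpleDaemon itself, the same `send` call can target port 1234 while the daemon is running.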