GT3 security performance evaluation
Olle Mulmo, PDC
I have created a small test service with GT3 and then turned on security. What do I see?
You see exactly what the people in the LCG Grid Technology Area at CERN showcased in their DummyService tests: response times increase by a factor of 10. In addition, it takes 5 seconds or more to interact with the service factory and create a new service.
5 seconds? Surely, this can't be?
The short answer is unfortunately yes. There are many factors that contribute though: I have found several of them and will try to explain what I have found as thoroughly as possible.
NOTE: Actual timings don't mean a thing as they only apply to the computer you run on. Below, you will find the timings I measured on my instrumented GT3 installation running on my particular hardware with my particular version of the JVM. The important thing to remember from the figures is their RELATIVE sizes.
Timings are denoted in milliseconds throughout this document. Prefixes K
and M denote kibi and mebi, not kilo and mega.
So the service creation takes several seconds? It must be an overly complex operation?
Actually, the service creation process itself takes about 50 msec. It's negligible.
What hits you and accounts for the many seconds is the initialization of the underlying tooling in your client: initializing the Axis handlers and the XML security libraries accounts for the major part of this time.
In addition, there are many other one-time operations happening,
such as initializing the secure random number generator, loading the
proxy certificate and your trusted CA certificates from disk,
loading the WSDL defining the factory service porttype, and finally
establishing a security context with the server when using
GSI-SecureConversation (which you tend to do). None of this is
performed if you were to create a second service instance in the
same security context.
OK, but there's still a factor 10 penalty on my method invocations regardless of this. Is it the complicated and expensive crypto operations, perhaps?
What I tried to showcase (but failed miserably) in an earlier email was that the XML security implementation is broken in that it generates a huge overhead: parsing the surrounding XML is 2-3 times more expensive than the crypto operation itself.
The DummyService tests make use of HMAC, a lightweight form of creating a digital signature to ensure message integrity. Another alternative is full-blown encryption. The table below displays characteristics for both. The case of no encryption (plain) is shown as well.
| Security | Payload (b) | SOAPsize (b) | T(engine) | T(roundtrip) |
| plain | 16 | 481 | 0 | 15 |
| plain | 16000 | 16465 | <10 | 23 |
| plain | 160000 | 160465 | 20 | 106 |
| HMAC | 16 | 1390 | 40 | 120 |
| HMAC | 16000 | 17490 | 70 | 250 |
| HMAC | 160000 | 161400 | 360 | 1100 |
| Encr | 16 | 1570 | 30 | 75 |
| Encr | 16000 | 23200 | 80 | 220 |
| Encr | 160000 | 218050 | 500 | 1380 |
- SOAPsize is the average size of the request/response messages for a Service.echo(Payload) operation: the incoming and outgoing messages differ by only 50-60 bytes. Encrypted data gets Base64-encoded, which increases the wire message size by 33%.
- T(engine) is the time spent in the Axis engine itself on the server. Time spent (de-)serializing the SOAP messages and doing the HTTP transfer is NOT included, but can be derived from the roundtrip time.
- T(roundtrip) is the total roundtrip time, measured on the client side.
- The Encr and HMAC timings do NOT include the time for establishing a shared security context between client and server (GSI-SecureConversation).
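The 33% figure for Base64 can be checked with a few lines of Python. This is a minimal sketch that encodes zero-filled buffers of the three payload sizes from the table; it assumes the ciphertext is about as large as the plaintext and ignores the surrounding SOAP envelope, so the results are lower bounds on the SOAPsize column, not a reproduction of it. The helper name `base64_size` is made up for this illustration.

```python
import base64

def base64_size(n_bytes: int) -> int:
    """Length of the padded Base64 encoding of n_bytes of binary data."""
    # Every group of 3 input bytes (rounded up) becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)

for payload in (16, 16000, 160000):
    encoded = base64.b64encode(bytes(payload))  # zero-filled stand-in data
    assert len(encoded) == base64_size(payload)
    growth = len(encoded) / payload - 1
    print(f"{payload:>7} bytes -> {len(encoded):>7} bytes (+{growth:.1%})")
```

For the tiny 16-byte payload the padding dominates and the growth is larger than 33%; for the two larger payloads it settles at one third.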
How can you say that? Clearly, I see figures differing by more than a factor 10 in that table!
Yes, but remember that the load on the client is the same as the load on the server! If you look at the size-16 case, you get a penalty of 40 msec on both ends in the HMAC case. The enveloping of the encrypted or signed data enlarges the message size, but I wouldn't attribute too much of the overhead to slower network transfer and XML (de-)serializing.
I think this is where the DummyService tests went WRONG: they put many clients on the same machine, expecting to create a huge load on the server. In fact, the load on the client machine was just as high, and this may have contributed to the fact that the CPU utilization on the server was not 100% when it reached "max".
Notice also that the poor implementation of XML security hits you
4-fold on the roundtrip time, as you need to encrypt and decrypt on
both client and server side, in sequence. Thus, for every msec of
overhead that we can save by improving the implementation of this
library, we will gain 4.
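To make the 4-fold argument concrete, the sketch below takes the roundtrip times from the table above and computes two things: the slowdown relative to the plain case, and the extra roundtrip time divided by the four security operations (secure and verify, on both client and server). Attributing the entire difference to those four operations is a simplifying assumption of this sketch, not a measurement, and the function names are made up for illustration.

```python
# Roundtrip times in msec, copied from the table above: {security: {payload: T}}
ROUNDTRIP_MS = {
    "plain": {16: 15, 16000: 23, 160000: 106},
    "HMAC": {16: 120, 16000: 250, 160000: 1100},
    "Encr": {16: 75, 16000: 220, 160000: 1380},
}

# Client secures the request, server verifies it, server secures the
# response, client verifies it: four operations in sequence.
SECURITY_OPS_PER_ROUNDTRIP = 4

def slowdown(security: str, payload: int) -> float:
    """Secure roundtrip time as a multiple of the plain roundtrip time."""
    return ROUNDTRIP_MS[security][payload] / ROUNDTRIP_MS["plain"][payload]

def per_operation_overhead(security: str, payload: int) -> float:
    """Extra roundtrip msec per security operation (assumes an even split)."""
    extra = ROUNDTRIP_MS[security][payload] - ROUNDTRIP_MS["plain"][payload]
    return extra / SECURITY_OPS_PER_ROUNDTRIP

for security in ("HMAC", "Encr"):
    for payload in (16, 16000, 160000):
        print(f"{security} {payload:>6}: x{slowdown(security, payload):.1f}, "
              f"{per_operation_overhead(security, payload):.1f} msec per operation")
```

Read the other way around, shaving 1 msec off each security operation removes 4 msec from every roundtrip in this model.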
But these figures are still in the millisecond range: how do you attribute my factor of 10 slowdown?
The table does not include the overhead for establishing a shared secret used to encrypt the data, and this is what hits you.
The handshake between the client and the server is an exchange of 3
roundtrip messages in total. In my case, the sizes of the SOAP messages
sent were {request,response}:
{787,3275} {2606,787} {670,686}
This sequence of messages takes roughly 150 msec to complete.
BUT, this is for establishing a context only. In the
DummyService case, the client performs a credential delegation to the
service instance as well; this costs an additional roundtrip. The exchange
pattern and corresponding message sizes in my case were:
{787,3275} {2606,787} {670,1131} {1322,686}
A handshake with delegation implies much more work for the involved
parties as it involves the creation and validation of an RSA key
pair: the four roundtrips take roughly 400 msec to complete.
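The two handshake patterns can be compared side by side with a short calculation. The sketch below uses only the message sizes and the rough 150 msec and 400 msec totals quoted above; the `summarize` helper is made up for this illustration, and dividing the total time evenly across roundtrips is a simplification (the delegation step is presumably the expensive one).

```python
# (request, response) SOAP message sizes in bytes, as listed above.
CONTEXT_ONLY = [(787, 3275), (2606, 787), (670, 686)]
WITH_DELEGATION = [(787, 3275), (2606, 787), (670, 1131), (1322, 686)]

def summarize(exchange, total_ms):
    """Roundtrip count, bytes on the wire, and average msec per roundtrip."""
    total_bytes = sum(request + response for request, response in exchange)
    return {
        "roundtrips": len(exchange),
        "bytes": total_bytes,
        "ms_per_roundtrip": total_ms / len(exchange),
    }

print("context only:   ", summarize(CONTEXT_ONLY, total_ms=150))
print("with delegation:", summarize(WITH_DELEGATION, total_ms=400))
```

The delegation handshake moves only about a quarter more bytes but takes well over twice as long, which supports the point that key-pair work, not message size, dominates the cost.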
But I already had my secure context established -- I did that when talking to the factory!
No: you establish a new context for each new port type reference that you make use of.
In the case of the secure DummyService client (source code), this means the following message pattern:
- Factory:
  - Establish context with DummyServiceFactory port type
  - invoke create()
- Service:
  - Establish context with Dummy port type, perform delegation
  - invoke echo()
  - invoke getTime()
  - invoke destroy() on GridService port type
Notice the BUG in the client program: the service instance is
destroyed without any security! Furthermore, since the DummyService
does not extend the GridService port type, the client needs to grab
a new reference to a GridService port type from the same service
locator. The new reference would need 3 additional roundtrips in
order to establish a new security context with this new port type,
before invoking destroy(). (One can argue heavily that
all this is nothing but a good example of stupid shortcomings in the
tooling.)
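Counting the roundtrips in this message pattern shows where the extra messages come from. The sketch below is a simple tally, assuming 3 roundtrips per plain handshake and 4 per handshake with delegation as described earlier, and 1 roundtrip per method invocation; the step labels and the `count_roundtrips` helper are made up for illustration.

```python
HANDSHAKE = 3                  # roundtrips to establish a context
HANDSHAKE_WITH_DELEGATION = 4  # one extra roundtrip for the delegation
APPLICATION_INVOCATIONS = 4    # create(), echo(), getTime(), destroy()

def count_roundtrips(secure_destroy: bool) -> int:
    """Total roundtrips for the secure DummyService client."""
    steps = [
        ("factory: establish context", HANDSHAKE),
        ("factory: create()", 1),
        ("service: establish context + delegation", HANDSHAKE_WITH_DELEGATION),
        ("service: echo()", 1),
        ("service: getTime()", 1),
        # The client as written skips this handshake (the BUG noted above).
        ("GridService: establish context", HANDSHAKE if secure_destroy else 0),
        ("GridService: destroy()", 1),
    ]
    return sum(roundtrips for _, roundtrips in steps)

for secure_destroy in (False, True):
    total = count_roundtrips(secure_destroy)
    print(f"secure destroy={secure_destroy}: {total} roundtrips, "
          f"{total / APPLICATION_INVOCATIONS:.2f}x the {APPLICATION_INVOCATIONS} invocations")
```

With the bug left in, the client makes 11 roundtrips for 4 invocations; securing destroy() would raise that to 14.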
Huh? I thought I would understand more by reading all this?
The following is a bit misleading and wrong, but makes for a good conceptual summary of what happens: When using security, you get a factor-3 performance penalty when you invoke a single message. Furthermore, you invoke 3 times as many messages: multiply them together, and you have your slowdown of a factor of 10.
Anything else?
Clearly, the tooling does not handle the simple use case very well. The initialization overheads are way off the chart, and invoking a single method securely increases the overhead by a factor of 10.
The tooling works much better in scenarios where the initialization cost is negligible, for instance when you perform hundreds of invocations using the same security context: the overhead can then be measured in additional tens of milliseconds for small payloads.
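The effect of reusing a security context can be illustrated by spreading the one-time cost over a number of invocations. This is a rough sketch: it takes the 150 msec context-only handshake and the 120 msec HMAC roundtrip for a 16-byte payload from the figures above, ignores the multi-second client initialization and any Hotspot warm-up, and the function name is made up for illustration.

```python
def average_roundtrip_ms(invocations: int, one_time_ms: float, secure_ms: float) -> float:
    """Average cost per invocation when a one-time cost is paid once up front."""
    return (one_time_ms + invocations * secure_ms) / invocations

HANDSHAKE_MS = 150   # context-only handshake, from the text above
HMAC_SMALL_MS = 120  # HMAC roundtrip, 16-byte payload, from the table above

for n in (1, 10, 100, 1000):
    print(f"{n:>4} invocations: {average_roundtrip_ms(n, HANDSHAKE_MS, HMAC_SMALL_MS):.2f} msec each")
```

A single invocation pays the full handshake on top of its own roundtrip, while after a hundred invocations the handshake adds only about 1.5 msec per call.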
On a related note, Java Hotspot works quite well on GT3: while performing
hundreds of invocations on the same service instance, you will gradually
see improvements on both client and server side that eventually cut as
much as 40% of the roundtrip time.
Conclusions?
Not really. Clearly, we need a solution for the invoke-once-then-die usage scenario, which the current tooling is simply not built for. It will take some serious thinking to work out how to fix this.
In the meantime, I suggest we concentrate on the XML security library and its duplicate parsing and internal use of XPath queries: every millisecond saved there cuts 4 msec off the roundtrip time.
Notes:
Security costs:
Using SSL imposes a performance penalty. To place the above security analysis in context, the following numbers, taken from this webpage, give an indication of how a server performs when insecure and secure connections are made to it. No SSL accelerators were used for these numbers.
A study by Networkshop showed a Pentium server with Linux and Apache supporting 322 unsecured sessions; when SSL was turned on, the connects per second decreased to 2.4.
GT3 Security Code:
One issue that was apparent when profiling the GT3 security code was that the conversion from the Axis representation to DOM was flagged as an expensive operation. In the Axis 1.2 release, the SOAP API will implement the DOM API, so the conversion will not be required.
Apache XML Security Library:
- The library has initialization code which reads the configuration file and parses it. This is quite expensive, but is invoked once per JVM. This affects the client side the most, since it is a good percentage of the round trip time.
- XPath queries on the configuration files (appear to) have been reworked.
- In the case of GSI Secure Conversation with encryption, an XPath query has been eliminated, which should improve performance.
- In the case of GSI Secure Conversation, optimizations have been added to reduce the size of the context.
Security Mechanisms:
GSI Secure Conversation, especially when used with delegation, is an expensive operation. It is recommended that this mechanism not be used unless delegation is required.
For all operations not requiring delegation, GSI Secure Message is an option with a lower performance penalty. It currently offers only the integrity (signature) option.
Round trip numbers (server and client run on separate machines):
| Security | Payload (b) | T(roundtrip) |
| plain | 16 | 13.56 |
| plain | 16000 | 53.53 |
| HMAC | 16 | 275.51 |
| HMAC | 16000 | 626.09 |
| Encr | 16 | 163.58 |
| Encr | 16000 | 603.72 |