Communication protocol

This page describes the slow communication protocol used between the testbed, its services, and experiments.

Catkit2 uses a standard request-reply, client-server model for communication. The Testbed and each Service run a server that responds to requests about its API. To connect to a Service, the ServiceProxy first contacts the Testbed to get a port. Communication is done over TCP using ZeroMQ sockets in the REQ-REP pattern. A typical exchange goes like this:

1. The client sends a request message to the server. This message contains two parts. The first part is a string indicating the type of request, for example set_property_request. The second part contains the data for this request, in this case the property name and the new property value. This data is serialized using Protocol Buffers (protobuf), which yields the binary data for the second part of the message.
2. The server receives this message and looks up the handler for that type of request in its dictionary of request handlers.
3. The server executes the request handler. If the request was successful, the handler returns data serialized with protobuf. In this case, the data would contain the new property value. If there was an error, the request handler can raise/throw an exception.
4. The server sends a reply to the client. This reply contains two parts. The first part indicates whether the request was successful and contains either OK or ERROR. The second part contains either the data returned by the request handler or the exception message.
5. The client receives the reply. If the request failed on the server, the client raises/throws an error with the error message. Otherwise, the data is returned.
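
The exchange above can be sketched in a few lines of Python. This is a minimal illustration and not the catkit2 implementation: JSON stands in for protobuf serialization, a direct function call stands in for the ZeroMQ socket, and the names serve_one, make_request, on_set_property and the properties dictionary are hypothetical.

```python
import json

# Hypothetical in-memory state standing in for a Service's properties.
properties = {"exposure_time": 1000.0}


def on_set_property(data):
    """Handler for set_property_request; returns the new value, serialized."""
    request = json.loads(data)  # JSON stands in for protobuf deserialization
    name = request["property_name"]
    if name not in properties:
        raise ValueError(f"Unknown property: {name}")
    properties[name] = request["property_value"]
    return json.dumps({"property_value": properties[name]}).encode()


# Step 2: the server's dictionary of request handlers, keyed by request type.
request_handlers = {"set_property_request": on_set_property}


def serve_one(frames):
    """Steps 2-4: dispatch a two-part request and build the two-part reply."""
    request_type, data = frames[0].decode(), frames[1]
    handler = request_handlers.get(request_type)
    if handler is None:
        return [b"ERROR", f"Unknown request type: {request_type}".encode()]
    try:
        return [b"OK", handler(data)]
    except Exception as error:
        # Any exception raised by the handler becomes an ERROR reply.
        return [b"ERROR", str(error).encode()]


def make_request(request_type, payload):
    """Steps 1 and 5: build the two-part request, then unpack the reply."""
    frames = [request_type.encode(), json.dumps(payload).encode()]
    status, body = serve_one(frames)  # a real client would use a REQ socket here
    if status != b"OK":
        raise RuntimeError(body.decode())
    return json.loads(body)
```

Calling make_request("set_property_request", {"property_name": "exposure_time", "property_value": 2000.0}) returns the new property value, while a request for an unknown property raises a RuntimeError on the client side carrying the server's error message.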

There are some implementation details worth mentioning here. First, the server currently performs all request handling on a single thread. This means that while a long-running request handler is executing, the server cannot respond to new requests. Therefore, long-running requests should be avoided. For example, there should be no command run_wavefront_control(num_iterations) that runs a few iterations of wavefront control and only then returns, but rather a command start_wavefront_control(num_iterations) that starts the wavefront control loop on the main thread of the service and returns immediately. This pattern might take some getting used to for people who have not worked with it before.
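
A minimal sketch of this "start, don't run" pattern is shown below. It assumes a queue is used to hand work from the request handler to the service's main loop; the queue, the main_loop function and the completed_iterations list are illustrative assumptions, not part of catkit2.

```python
import queue
import threading
import time

# Jobs requested by handlers and picked up by the service's main loop.
pending_jobs = queue.Queue()
completed_iterations = []


def start_wavefront_control(num_iterations):
    """Request handler: only queue the job, so the server stays responsive."""
    pending_jobs.put(num_iterations)
    return "started"


def main_loop(shutdown_flag):
    """Main loop of the service: runs queued control loops until shutdown."""
    while not shutdown_flag.is_set():
        try:
            num_iterations = pending_jobs.get(timeout=0.01)
        except queue.Empty:
            continue
        for i in range(num_iterations):
            time.sleep(0.001)  # placeholder for one wavefront control iteration
            completed_iterations.append(i)
        pending_jobs.task_done()
```

The handler returns as soon as the job is queued, so the request thread is free for the next request while main_loop does the actual work. A client that needs to know about progress would issue further short requests rather than block on one long request.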

Secondly, the client reuses sockets as much as possible. It maintains a pool of unused sockets, and when client->MakeRequest() is called, a socket from that pool is used to send the message to the server. If the pool is empty, a new socket is created, which might take a short time to connect. After the request is finished, the used socket is automatically returned to the socket pool. During a request, the socket is therefore used exclusively for that request, which avoids mixing of requests. If the server doesn't respond to the request within a certain amount of time, the request is considered lost. If the server still sends the reply after this time, it will be ignored. This is handled internally by ZeroMQ through the ZMQ_REQ_CORRELATE socket option, which prefixes each request with an identifier so that received replies can be matched to their corresponding requests.
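
The socket pool can be pictured roughly as follows. This is a sketch under assumptions, not the actual client code: SocketPool and borrow are hypothetical names, and the factory passed to the pool produces numbered labels instead of connected ZeroMQ REQ sockets.

```python
import itertools
import threading
from contextlib import contextmanager


class SocketPool:
    """Keeps idle sockets around so that later requests can reuse them."""

    def __init__(self, create_socket):
        self._create_socket = create_socket  # would create and connect a REQ socket
        self._idle = []
        self._lock = threading.Lock()

    @contextmanager
    def borrow(self):
        # Take an idle socket if one is available, otherwise create a new one.
        with self._lock:
            socket = self._idle.pop() if self._idle else None
        if socket is None:
            socket = self._create_socket()
        try:
            yield socket  # the caller has exclusive use during the request
        finally:
            # Return the socket to the pool once the request is finished.
            with self._lock:
                self._idle.append(socket)


# Stand-in factory: each "socket" is just a numbered label.
socket_ids = itertools.count()
pool = SocketPool(lambda: f"socket-{next(socket_ids)}")

with pool.borrow() as first:
    with pool.borrow() as second:  # pool is empty here, so a second socket is created
        pass
with pool.borrow() as third:  # an idle socket is reused; nothing new is created
    pass
```

Two overlapping requests each get their own socket, while a later request reuses one that was returned to the pool. Timeouts and the matching of late replies are left out of this sketch, since the text above attributes those to ZeroMQ itself.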