Sockets on OpenShift
In this article, a follow-up to my previous post regarding long-poll servers and Python, we investigate the changes that were necessary to make the code work when deployed on OpenShift.
In the previous post, we implemented IO polling to watch for client disconnects at the same time we were waiting for messages on a message bus:
poll = zmq.Poller()
poll.register(subsock, zmq.POLLIN)
poll.register(rfile, zmq.POLLIN)
events = dict(poll.poll())
...
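For reference, here is a rough sketch of how those poll events can be used to tell “a message arrived on the bus” apart from “the client went away”. The function wrapper and the exact handling are my own illustration rather than the previous post’s verbatim code; subsock is the ZeroMQ SUB socket and rfile is the file object wrapping the client connection, as before:

import zmq


def forward_until_disconnect(subsock, rfile):
    """Yield bus messages to a long-poll client until it disconnects.

    subsock -- a connected zmq.SUB socket, as in the previous post
    rfile   -- the file object wrapping the client's HTTP connection
    """
    poll = zmq.Poller()
    poll.register(subsock, zmq.POLLIN)
    poll.register(rfile, zmq.POLLIN)

    while True:
        events = dict(poll.poll())

        # pyzmq reports objects registered by file descriptor under their
        # fileno(); readable activity on the client socket while we are
        # still long-polling normally means the client has hung up.
        if rfile.fileno() in events:
            return

        # A message arrived on the bus; pass it along to the client.
        if subsock in events:
            yield subsock.recv()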
If you were to try this at home, you would find that everything worked as described…but if you were to deploy the same code to OpenShift, you would find that the problem we were trying to solve (the server holding file descriptors open after a client disconnected) would still exist.
So, what’s going on here? I spent a chunk of time trying to figure this out myself. I finally found this post by Marak Jelen discussing issues with WebSockets in OpenShift, which says, among other things:
For OpenShift as a PaaS provider, WebSockets were a big challenge. The routing layer that sits between the user’s browser and your application must be able to route and handle WebSockets. OpenShift uses Apache as a reverse proxy server and a main component to route requests throughout the platform. However, Apache’s mod_proxy has been problematic with WebSockets, so OpenShift implemented a new Node.js based routing layer that provides scalability and the possibility to expand features provided to our users.
In order to work around these problems, an alternate Node.js based front-end has been deployed on port 8000. So if your application is normally available at http://myapplication-myname.rhcloud.com, you can also access it at http://myapplication-myname.rhcloud.com:8000.
Not unexpectedly, it seems that the same things that can cause difficulties with WebSocket connections can also interfere with the operation of a long-poll server. The root of the problem is that your service running on OpenShift never receives notifications of client disconnects. You can see this by opening up a connection to your service. Assuming that you’ve deployed the pubsub example, you can run something like this:
$ curl http://myapplication-myname.rhcloud.com/sub
Leave the connection open and log in to your OpenShift instance. Run netstat to see the existing connection:
$ netstat -tan |
grep $OPENSHIFT_PYTHON_IP |
grep $OPENSHIFT_PYTHON_PORT |
grep ESTABLISHED
tcp 0 0 127.6.26.1:15368 127.6.26.1:8080 ESTABLISHED
tcp 0 0 127.6.26.1:8080 127.6.26.1:15368 ESTABLISHED
Now close your client, and re-run the netstat command on your OpenShift instance. You will find that the client connection from the front-end proxies to your server is still active. Because the server never receives any notification that the client has closed the connection, no amount of select or poll or anything else will solve this problem.
Now, try the same experiment using port 8000. That is, run:
$ curl http://myapplication-myname.rhcloud.com:8000/sub
Verify that when you close your client, the connection is no longer evident on your server. This means that we need to modify our JavaScript code to poll using port 8000, which is why in pubsub.js you will find the following:
if (using_openshift) {
  poll_url = location.protocol + "//" + location.hostname + ":8000/sub";
} else {
  poll_url = "/sub";
}
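For this to work, the pubsub example also needs to know whether it is running on OpenShift at all. One plausible way to make that decision (an assumption on my part; the example may set the flag differently) is to check for one of the OPENSHIFT_* environment variables that the platform exports inside each gear, such as the OPENSHIFT_PYTHON_IP variable used in the netstat command above:

import os

# Assumption: treat the presence of OPENSHIFT_PYTHON_IP as "running on
# OpenShift"; the platform exports this variable inside each gear.
using_openshift = 'OPENSHIFT_PYTHON_IP' in os.environ

However you derive it, the same flag has to be visible both to the JavaScript above and to the Python code below, so that the two sides agree on which URL is being polled.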
But wait, there’s more!
If you were to deploy the above code with no other changes, you would find a mysterious problem: even though your JavaScript console would show that your code was successfully polling the server, your client would never update. This is because, by introducing an alternate port number for the poll operation, you are now running afoul of your browser’s same-origin policy, a security policy that restricts JavaScript in your browser from interacting with sites other than the one from which the script was loaded.
The CORS standard introduces a mechanism to work around this restriction. An HTTP response can contain additional access control headers that instruct your browser to permit access to the resource from a select set of other origins. The header is called Access-Control-Allow-Origin, and you will find it in the pubsub example in pubsub.py:
if using_openshift:
    bottle.response.headers['Access-Control-Allow-Origin'] = '*'
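To put that line in context, here is roughly what a long-poll handler looks like with the header applied; the route and the elided body are an illustrative sketch under the assumptions above, not the actual contents of pubsub.py:

import bottle

using_openshift = True  # or detect it via the environment, as sketched above


@bottle.route('/sub')
def subscribe():
    if using_openshift:
        # The page is served from the default front end, but polls this
        # endpoint on port 8000, which the browser treats as a different
        # origin; this header tells it that doing so is allowed.
        bottle.response.headers['Access-Control-Allow-Origin'] = '*'

    # ... long-poll body: wait for a message on the bus and return it ...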
With this header in place, your JavaScript can poll your OpenShift-hosted application on port 8000 and everything will work as expected…
…barring bugs in my code, which, if discovered, should be reported here.