So, I've been rethinking the architecture we used with a customer, and which we are planning on using, in slightly modified form, for some in-house projects.
The application is, basically, a cloud-based service which takes data from devices, analyzes and stores it, and presents it to the user on a website.
We had decided, early on, to go with Java as our language of choice, both because it was what the customer was comfortable with, and because we felt like it offered the best performance/flexibility tradeoff (and, yes, performance is an issue here, especially on the data collection side).
The pieces of our architecture include:
- A RESTful web service, both to collect the data and to serve as an API for apps which want to access the data
- Some sort of queuing mechanism for taking the data in, and processing it on the back end asynchronously
- A database of some sort for storing the readings
- A database (possibly the same as the previous one) for storing the website data
- A notification framework for sending notifications (emails, SMS, etc) of various events from the devices
Now, one of the things I've learned about architecture (or, at least, I thought I had learned) was to only use what you absolutely need, nothing more. A lean architecture is a good architecture. However, we made some decisions early on which I now feel violate that principle.
We opted to put the code in an app server for various reasons. We needed a Servlet engine to facilitate the web service, but we decided to use JBoss AS 7 as the app server because we'd all had experience with it in the past, and it was being used by the customer currently. We felt like the container could manage all the "plumbing" (threads, database connections, etc), rather than dealing with it ourselves. I think that decision came more from old architectural design assumptions ("three-tiered good, two-tiered bad", "managing resources is hard") which are no longer really relevant.
However, more importantly (and I don't mean to bash JBoss here, I think this would be true of any app server), we ended up severely limiting our "functional" scalability by using an app server. Let me explain:
When I refer to "functional" vs. "theoretical" scalability, I'm referring to the fact that, while JBoss is certainly very scalable in theory (you can cluster the instances, and add new instances dynamically), in practice, it's really frickin' difficult to scale. The reason: you can't just "add another instance", you need to go through the hell that is cluster configuration to add a new instance, which means, literally, making on the order of a dozen changes to the stock JBoss configuration files.
To put an even finer point on it: the question you should ask yourself in terms of scalability is: if I were to get a call at 2 AM that we need to scale up because the application's pegged, could I scale it quickly and easily, even half-asleep? The answer in this case would be a resounding "No!".
A lot of this came out of a gig we had auditing another consulting company for a customer. They hired us to talk to the other company with them, because things weren't going well and they needed a second opinion on whether they were being "took" (we said "yes, kinda"). However, the most interesting thing was that they had a very similar architecture to ours, but it was seriously slow, so they were constantly having to add in servers in the back end. The sys admin who worked there (and who was probably the smartest guy in the company, as far as I could tell), pointed out that starting up new instances was trivial, because their backend server was a simple process - no app server, no nothing. He could start it up from the command line if he wanted.
That's how it all should work! It's the Google/Facebook architecture vs. the old-school "Big Iron" approach. It was a serious "Aha!" moment for me.
So, what are we getting from using JBoss? EJBs? We are using them, but we don't need them (they're not on the list I mentioned above) - the web service should be our internal as well as our external API. Thread pools? They used to be complicated, but since JDK 1.5, they're trivial. Database connection pools? Same thing. Caching? We're using a third-party caching mechanism, so no.
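To back up the "thread pools are trivial" claim, here is a minimal sketch using the java.util.concurrent classes that arrived in JDK 1.5. The class name ReadingProcessor and the analyze() method are made up for illustration, and the lambdas assume Java 8 or later; this is not code from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ReadingProcessor {

    // Hypothetical unit of work: "analyze" one raw reading from a device.
    static int analyze(int rawReading) {
        return rawReading * 2;
    }

    // Runs analyze() for every reading on a fixed-size pool and
    // returns the results in the same order as the input.
    static List<Integer> processAll(List<Integer> readings, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (final int r : readings) {
                tasks.add(() -> analyze(r));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            // No container needed to manage the pool's lifecycle.
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(Arrays.asList(1, 2, 3), 4)); // prints [2, 4, 6]
    }
}
```

The whole "plumbing" amounts to creating the pool and shutting it down, which is roughly what we were leaning on the container for.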
Time to ditch JBoss.
The queue that comes with JBoss is HornetQ. It's supposedly a very fast queue, and it comes embedded in JBoss. However, there are several issues with it:
- It, also, doesn't functionally scale well. You can cluster it, but it's painful, and a configuration nightmare, just like JBoss
- It doesn't handle having the queue fill up very gracefully, in my opinion. While I was doing load testing of the app, sometimes the back end would not quite be able to keep up, and when a large number of journal files ended up in the queue, it froze, and needed not just to be restarted, but to have the journal files cleared out first.
- It's not trivial swapping out a different queue in JBoss. It's doable, but not very well documented, and it's hard to say how JBoss would handle it.
Another questionable architectural decision (and I can say this because I was one of the big advocates) was to use Spring. Now, don't get me wrong - I love Spring, but it has a tendency to take over once you start using it. We had even talked about, early on, how we weren't going to use Spring for "everything", but we ended up using it way too much, IMHO. Plus, I think if we rearchitect everything to be in terms of standalone services, we'll have a lot less need for that sort of configurational complexity.
Emphasize the Positive
So, those are the things I think we should ditch. What should our architecture look like? I think we could function well with the web services being in something like Tomcat or Jetty, with everything else being standalone Java services (i.e., no containers, just POJOs with a main()).
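As a rough idea of what "just POJOs with a main()" could look like, here is a sketch of a standalone service. Everything in it is an assumption for illustration: the name ReadingService, the in-memory BlockingQueue standing in for whatever actually feeds the service, and the counter standing in for real work.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadingService implements Runnable {
    private final BlockingQueue<String> inbox;
    private final AtomicInteger processed = new AtomicInteger();
    private volatile boolean running = true;

    public ReadingService(BlockingQueue<String> inbox) {
        this.inbox = inbox;
    }

    @Override
    public void run() {
        try {
            // Keep going until asked to stop AND the inbox has been drained.
            while (running || !inbox.isEmpty()) {
                String msg = inbox.poll(100, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    processed.incrementAndGet(); // stand-in for real work
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void stop() {
        running = false;
    }

    public int processedCount() {
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // In this sketch nothing feeds the inbox, so the service just idles
        // until the process is told to shut down.
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        ReadingService service = new ReadingService(inbox);
        final Thread worker = new Thread(service, "reading-service");

        // On Ctrl-C or kill, ask the loop to stop and wait for it to drain.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            service.stop();
            try {
                worker.join();
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }));

        worker.start();
        worker.join();
    }
}
```

Starting another instance is then just another `java ReadingService` from the command line, which is exactly the property the sys admin was pointing at.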
That takes care of the big pieces, but what about the glue to put it together? I was looking around for scalable, distributed queues, when I came across Kafka. It's not really a queue so much as a pub/sub framework, which you can use like a queue. However, once I started playing with it (which will be the subject of a different post), I came to the realization that it could be used for more than just the queue - it could be the communication mechanism between the services, basically a communication bus for the architecture. And, it's functionally scalable, because you can add new server instances dynamically using ZooKeeper. We could also reuse ZooKeeper as our own service locator to allow us to dynamically add new services.
Now, suddenly, things are looking very different. Rather than an app server with all our code deployed on it, we would have compact, self-contained services which communicate via Kafka, find new instances via ZooKeeper, and which scale by just starting up new instances to take on the load. Doing this then implies that the web service front-end would have to be very thin - just a façade for taking in web service requests, which would then be delegated to the back-end services.
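To make the "very thin front-end" idea concrete, here is a sketch of a façade that does nothing but validate a request and hand it to a bus. Note what is swapped in: the MessageBus interface and InMemoryBus are hypothetical stand-ins so the example runs without a broker (a real version would wrap a Kafka client), and the "readings" topic name, the ReadingFacade class, and the status codes are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class ThinFacadeSketch {

    // Hypothetical bus abstraction; a real version would wrap a Kafka client.
    interface MessageBus {
        void publish(String topic, String payload);
        void subscribe(String topic, Consumer<String> handler);
    }

    // In-memory stand-in: delivers each message synchronously to all subscribers.
    static class InMemoryBus implements MessageBus {
        private final Map<String, List<Consumer<String>>> handlers = new ConcurrentHashMap<>();

        public void publish(String topic, String payload) {
            for (Consumer<String> h : handlers.getOrDefault(topic, new ArrayList<>())) {
                h.accept(payload);
            }
        }

        public void subscribe(String topic, Consumer<String> handler) {
            handlers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
        }
    }

    // The facade does no processing: it checks the request and hands it off.
    static class ReadingFacade {
        private final MessageBus bus;

        ReadingFacade(MessageBus bus) {
            this.bus = bus;
        }

        // Returns an HTTP-style status code: 202 accepted, 400 bad request.
        int postReading(String deviceId, String body) {
            if (deviceId == null || deviceId.isEmpty() || body == null) {
                return 400;
            }
            bus.publish("readings", deviceId + ":" + body);
            return 202;
        }
    }

    public static void main(String[] args) {
        InMemoryBus bus = new InMemoryBus();
        List<String> stored = new ArrayList<>();
        bus.subscribe("readings", stored::add); // stand-in for a back-end service

        ReadingFacade facade = new ReadingFacade(bus);
        System.out.println(facade.postReading("dev-1", "42.0")); // prints 202
        System.out.println(stored);                              // prints [dev-1:42.0]
    }
}
```

Because the façade only knows about the bus, back-end services can be added or restarted without the web tier noticing.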
The key would be to define the services well, so they would be small, light, decoupled, and cohesive.