Service Oriented Architecture (SOA) Defined
Service Oriented Architecture (SOA) seems like a vague, fuzzy term. At its core, SOA means separating software into isolated, reusable components (“services”) that are accessed through well-defined interfaces. It emphasizes loose coupling, so that instead of directly calling into another component (like when using a shared library, static linking, etc.), components communicate using a network protocol. In practical terms, this means that changes can be made to a service without affecting its clients (“consumers”), as long as the protocol remains compatible.
SOA is an “architectural style”, not an implementation. There are many ways to implement it, including REST and SOAP web services, language-specific mechanisms such as Java RMI, and older RPC frameworks such as CORBA (which dates back to the ’90s!). Some folks build their own service / RPC layer on top of serialization formats such as Protocol Buffers or MsgPack. Others go as far as implementing custom network protocols at the wire level. There is also Thrift, which we use here at Tracelytics.
SOA with Modern Web Applications
Many web applications start off as a monolithic code base, then move to a more distributed system using SOA. One example I’ve seen at another company: an application that generates reports from search engine marketing data. The original web application was written in PHP, with all processing, database queries, etc. occurring synchronously in the context of the HTTP request. Large reports could take several minutes (and, sometimes, hours!) to complete, and users would become impatient waiting for their browser to finally display the result. To improve the user experience, it was decided that reports would be run asynchronously, in the background, which allowed users to check on their status, receive emails when they were finished processing, archive the report for later, etc.
This was done by moving the report processing logic into a separate component: a SOAP web service implemented using JBoss, a Java application server. The front-end web application remained in PHP. The PHP web app would send a SOAP request over HTTP to the report service, asking it to start a report. If there was too much load on the system (for example, more than N reports running in parallel), the request would be queued for later processing. The user could continue working with the application and would be notified when their report was complete. The system also performed better, since the new service code was more efficient than the previous PHP implementation, and the number of actively processing reports could be capped based on system capacity.
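To make the "no more than N reports in parallel" idea concrete, here is a minimal sketch in Python. It is not the original JBoss service: the class name ReportService, the status strings, and the value of N are all assumptions chosen for illustration. A fixed pool of worker threads pulls requests off a queue, so anything beyond the limit simply waits its turn.

```python
import queue
import threading
import time

MAX_PARALLEL_REPORTS = 2  # the "N" from the text; this value is an assumption


class ReportService:
    """Hypothetical service: runs at most max_parallel reports at once."""

    def __init__(self, max_parallel=MAX_PARALLEL_REPORTS):
        self._pending = queue.Queue()   # requests waiting for a free worker
        self._status = {}               # report_id -> status string
        self._lock = threading.Lock()
        for _ in range(max_parallel):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, report_id, run_report):
        """Queue a report; returns immediately so the caller is not blocked."""
        self._set_status(report_id, "queued")
        self._pending.put((report_id, run_report))

    def status(self, report_id):
        with self._lock:
            return self._status.get(report_id, "unknown")

    def wait_until_idle(self):
        """Block until every submitted report has finished."""
        self._pending.join()

    def _set_status(self, report_id, value):
        with self._lock:
            self._status[report_id] = value

    def _work(self):
        # Each worker handles one report at a time, so the number of
        # workers is the cap on concurrently running reports.
        while True:
            report_id, run_report = self._pending.get()
            self._set_status(report_id, "running")
            try:
                run_report()
                self._set_status(report_id, "complete")
            except Exception:
                self._set_status(report_id, "failed")
            finally:
                self._pending.task_done()


if __name__ == "__main__":
    service = ReportService(max_parallel=2)
    for report_id in ("r1", "r2", "r3"):
        service.submit(report_id, lambda: time.sleep(0.1))
    service.wait_until_idle()
    print(service.status("r3"))
```

The web front end would call something like submit() and return right away, then poll status() (or wait for an email) instead of holding the HTTP request open.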
Also, it was a more effective use of development resources since two completely separate teams could develop the code: a front-end / PHP team, and a back-end / Java services team. These teams only needed to agree on the interface (in this case, the WSDL), which defined the method names, parameters, return values and any associated data structures.
How We Use SOA Inside Tracelytics
Tracelytics is also built on SOA. One example is our data-sharding service (internally known as “shardy”). When the web application needs to retrieve certain types of data, for example the heat map / latency data for a particular customer app over a particular time range (last day, last 7 days, etc.), it makes a request to the data-sharding service. The service locates the data, which, depending on the time range, could be spread across dozens of tables and database servers (“shards”). It then runs several SQL queries in parallel, performs the necessary aggregation (such as filtering and sorting of the merged data set), and returns a single result set to the application.
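The general pattern (locate the shards, query them in parallel, merge, sort) can be sketched in a few lines of Python. This is not shardy's actual code: the in-memory SHARDS dictionary stands in for real database servers, and the function names and row layout are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter

# Hypothetical stand-ins for database shards.
# Each row is (timestamp, app_id, latency_ms).
SHARDS = {
    "shard_a": [(100, "app1", 12.5), (160, "app2", 40.0)],
    "shard_b": [(130, "app1", 20.0), (220, "app1", 9.0)],
    "shard_c": [(110, "app1", 31.0)],
}


def locate_shards(start, end):
    """Return the shards that may hold data for the time range.

    A real service would consult a shard map; this sketch returns all of them.
    """
    return list(SHARDS)


def query_shard(name, app_id, start, end):
    """Stand-in for one SQL query against one shard."""
    return [
        row for row in SHARDS[name]
        if row[1] == app_id and start <= row[0] < end
    ]


def fetch_latency(app_id, start, end):
    """Query every relevant shard in parallel and return one sorted result set."""
    names = locate_shards(start, end)
    with ThreadPoolExecutor(max_workers=len(names) or 1) as pool:
        per_shard = pool.map(
            lambda name: query_shard(name, app_id, start, end), names
        )
        merged = [row for rows in per_shard for row in rows]
    return sorted(merged, key=itemgetter(0))  # order by timestamp


if __name__ == "__main__":
    for row in fetch_latency("app1", 100, 200):
        print(row)
```

The caller sees one function and one result set; how many shards were involved is the service's business, which is exactly the kind of detail SOA lets you hide.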
This service was developed using Thrift, a framework for building services. Though originally started as a Facebook project, Thrift has since found a home at Apache. It is cross-language, meaning a service can be written in one language and called from another. It supports C++, PHP, Python, Perl, Java, and many other languages. Currently, our service and client (web app) are both written in Python, but this compatibility means that there is no reason this couldn’t change in the future.
Developing a Service using Apache Thrift
Before developing any type of service (whether with Thrift or not!), one should design the service interface. This involves specifying the service calls, which includes the service name, method names, parameters, data types, and return values in a generic, language-independent way. With Thrift, this is done by writing an IDL (interface definition language) file. There are many tutorials on how to do this, so we won’t repeat the same information here.
The IDL file is then used as input to a code generator (“thrift -gen …”), which generates both client and server stubs for you. Thrift takes care of a lot of the tedious work: you don’t have to worry about writing error-prone network communications code. Of course, the server stub will then need to be filled in to actually implement your service (the “business logic” that does the real work of your application).
Tracing Performance Of Thrift Services
If you’re using a Thrift service, you must monitor the performance of that service to see how it’s affecting the rest of your application. Tracelytics allows you to do this by customizing your client code. Internally, we’ve actually modified the Thrift code generator to make this automatic. Please let us know if you’re interested in this functionality!
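If you just want a rough idea of what "customizing your client code" can look like, here is a generic sketch in Python. It is not the Tracelytics API and not our modified code generator: TimedClient and the record callback are hypothetical names, and the wrapper works on any client object, Thrift-generated or otherwise.

```python
import functools
import time


class TimedClient:
    """Hypothetical wrapper: times every method call made on a client object."""

    def __init__(self, client, record):
        self._client = client   # the real client, e.g. a generated Thrift client
        self._record = record   # callback taking (method_name, seconds)

    def __getattr__(self, name):
        # Only called for attributes not found on TimedClient itself,
        # so lookups fall through to the wrapped client.
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def timed(*args, **kwargs):
            started = time.perf_counter()
            try:
                return attr(*args, **kwargs)
            finally:
                # Record the duration even if the call raised.
                self._record(name, time.perf_counter() - started)

        return timed


if __name__ == "__main__":
    class FakeClient:
        """Stand-in for a generated service client."""

        def get_heatmap(self, app_id):
            time.sleep(0.01)
            return {"app": app_id}

    def report(name, seconds):
        print(f"{name} took {seconds * 1000:.1f} ms")

    client = TimedClient(FakeClient(), report)
    client.get_heatmap("app1")
```

Because the wrapper exposes the same methods as the client it wraps, the calling code does not need to change; you only swap the object you hand to it.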
(Photo Credit: southbeachcars)