API Design is not a joke
In the past I have posted some articles on API design and they can be found here. It's such a vast topic, and it depends so much on the client's business, that it's hard to describe how to build a complete API in a blog post. However, in this particular case, I will share an experience where a small change from ../products/search or ../orders/search to ../search not only saved the client a lot of effort, and hence money, upfront, but also scaled the app in such a way that 30% of the traffic was diverted to the new system. This helped the app handle tens of millions more hits a day than it was previously capable of.
This is how it started. A senior dev came over to the architects' zen (where 3 architects from the client and I, the only consultant, take pride in using and sometimes misusing our whiteboards to write some great foundational code and API architecture) right before we were going to head out for lunch. The question was simple.
"I am using Web API and would like to create a method each for 'Search' inside my API controllers. For example, if there is a Products controller, I'd like to have a method 'Search'. I'd like to do the same for Orders, Customers and any other controller that requires any search functionality."
My fellow architects seemed to take it lightly and were almost ready to consent when one of them asked what the reasoning behind this approach was. The answer was "It's simple. Also, since I am using Entity Framework 6 code first to make queries, I'd like to keep the entities specific to Products in a Products Controller, Orders in an Orders Controller and so on. It also is the right thing to do given my understanding of Single Responsibility Principle (SRP)".
This is where I sensed something wrong in the reasoning.
- First, to me it appeared as if it's not simple. It's actually deceptively simple.
- The solution already assumed a particular technology, and that technology was the reason behind the architecture, which suggests the developer was treating it as a constraint. This is not a criticism of the developer; it's something we deal with on a daily basis and it's important to consider. But as an architect, and even more so as a consultant, I'm supposed to do the right thing not just for the present but also with non-functional requirements like scalability and maintainability in mind.
- I sensed that it was in agreement with SRP but might not be in agreement with other software engineering principles.
Well, so by now I'm sure you would have guessed that there ain't any lunch happening :)
My developer was right in understanding and implementing SRP to an extent, as SRP says 'There should never be more than one reason for a class to change'. Great! There isn't. Anything to do with Products creation, updating or retrieval is going to be the responsibility of the Products controller, and in a way it only deals with one responsibility. Some people might argue that retrieval and creating/updating are different responsibilities, but for the sake of simplicity let's keep it that way.
The major issue, however, is the principle of DRY, or Don't Repeat Yourself. Imagine having to write similar code in all these controllers. After all, all they are doing is search, and if designed correctly they should share a similar way of retrieving data based on some kind of query string. Now assume we have to rip out Entity Framework later in favor of another ORM, or rip it out completely because we are ripping out SQL Server altogether: every one of those search methods would have to change. The counter argument is: 'When was the last time you had to do that?' Well, not quite surprisingly, about 7 times in the last 4 years. Towards the end of the article you will see the benefit of polyglot persistence, how I've been able to save my clients millions of dollars with something like this, and how I have been able to show a record percentage growth in sales after implementing some of these techniques. Again, I am a consultant and I don't own the IP, so I will keep it sacred by not mentioning any of the clients. For the sake of an example, we will discuss a hypothetical case study from Best Buy, as it's a name familiar to all of us and I haven't worked with them yet, so it's safe to talk about it.
Again, what if we don't want our search queries to go to a SQL Server in the first place? How about creating a better user experience for the customer and speeding up the first few pages for him by speeding up search?
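To make the DRY argument concrete, here is a minimal sketch in Python (not the Web API/C# code from the actual project) of keeping search behind one small interface, so the storage technology can be swapped without touching every controller. The names `Hit`, `SearchBackend`, `InMemoryBackend` and `run_search` are all hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    kind: str   # e.g. "product" or "order"
    title: str


class SearchBackend(Protocol):
    """Anything that can answer a free-text query (hypothetical interface)."""

    def search(self, kind: str, text: str) -> list[Hit]: ...


class InMemoryBackend:
    """Toy stand-in for a real store (an ORM, a search engine, ...)."""

    def __init__(self, hits: list[Hit]) -> None:
        self._hits = hits

    def search(self, kind: str, text: str) -> list[Hit]:
        needle = text.lower()
        return [
            h for h in self._hits
            if h.kind == kind and needle in h.title.lower()
        ]


def run_search(backend: SearchBackend, kind: str, text: str) -> list[Hit]:
    # The one place callers go for search; they never see the storage technology.
    return backend.search(kind, text)


catalog = InMemoryBackend([
    Hit("product", '24" LED Monitor'),
    Hit("product", "Wireless Mouse"),
    Hit("order", 'Order 1001: 24" LED Monitor'),
])
results = run_search(catalog, "product", "monitor")
print([h.title for h in results])
```

Replacing `InMemoryBackend` with a class backed by a different store changes one class rather than one search method per controller, which is the point of the argument above.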
Hypothetical Case Study: The CXO of Best Buy has been tasked to increase sales of Best Buy by 30%. Increasing sales by 30% could mean a lot of things, and a lot of data has to be taken into account. For example: How many users arrive at our website daily? How many new users buy goods online every day/month/year? How many old users buy once or more than once a day/month/year? How many users drop off by the time they find what they need and have seen the price? How long does it take for them to get there? How many users actually leave before they see something really tangible? And a lot more questions.
However, you may be surprised by how little patience today's user has for waiting. If I am Best Buy, I'd like to gauge what the competitors are saying about the same problem. So here is what Amazon and Google have to say about this.
- Amazon - Just an extra 1/10th of a second on their response times will cost them 1% in sales.
- Google - ½ a second increase in latency caused traffic to drop by a fifth.
Psychological studies suggest that anything above 200ms makes the user feel that he has been waiting.
Wow! This is huge. That means my first task is to at least match what my competitors have. So, here is my argument. If I am a user on the Best Buy website and all I am looking for is a 24" monitor, I personally don't care about anything else Best Buy has to show. The best user experience for me is a fully functional and intuitive 'Search Box' which is loaded and ready to use before the rest of the scripts fire on the page. I really want the search queries on the server to take less than 15ms; adding about 40ms of average latency, my response time should be less than 55ms.
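The latency budget above is simple arithmetic, spelled out here as a small Python sketch. The three numbers come straight from the text; the function and constant names are made up for illustration.

```python
SERVER_QUERY_MS = 15      # target time for the search query on the server
AVERAGE_LATENCY_MS = 40   # average latency added on top
PERCEIVED_WAIT_MS = 200   # point at which the user feels he is waiting


def response_time_ms(server_ms: int, latency_ms: int) -> int:
    """Total time the user sees: server work plus latency."""
    return server_ms + latency_ms


total = response_time_ms(SERVER_QUERY_MS, AVERAGE_LATENCY_MS)
headroom = PERCEIVED_WAIT_MS - total
print(f"response time: {total}ms, headroom before 200ms: {headroom}ms")
```

The headroom is what is left for rendering, slower networks and bad days before the user starts to feel the wait.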
I am sure I don't have to convince any of you smart techies reading this article about why that is going to increase my sales but I'd like to outline the workflow here.
- The user searches for a 24" monitor.
- Even before the scripts load on the page the search box is ready to go and has a highly intuitive, responsive, smart and fast auto-complete which pulls up some relevant data.
- If the user hits the 'Search' button the user should be able to see enough details about the monitor including a thumbnail, a small description, make, model, price etc or whatever information he needs to make an informed decision.
- Maybe as a company I would want to show him a 27" monitor which is on sale. This is called 'boosting'.
- Lastly, I'd like to maybe show some recommendations on what similar products he could buy.
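The auto-complete and boosting steps in this workflow can be sketched with a toy in-memory example. This is not how a search engine does it internally; it only illustrates the idea that a match score can be multiplied for items the business wants to push. The catalog, the prices, the `SALE_BOOST` value and the function names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    price: float
    on_sale: bool = False


# Hypothetical catalog, for illustration only.
CATALOG = [
    Product('24" LED Monitor', 129.99),
    Product('27" LED Monitor', 179.99, on_sale=True),
    Product('24" Monitor Stand', 39.99),
    Product("Wireless Mouse", 19.99),
]

SALE_BOOST = 2.0  # assumed multiplier for items on sale


def score(product: Product, terms: list[str]) -> float:
    """Count query terms that prefix-match a word in the name, then boost."""
    words = product.name.lower().split()
    matched = sum(1 for t in terms if any(w.startswith(t) for w in words))
    if matched == 0:
        return 0.0
    return matched * (SALE_BOOST if product.on_sale else 1.0)


def suggest(query: str, limit: int = 3) -> list[Product]:
    """Return the best-scoring products for a partially typed query."""
    terms = query.lower().split()
    scored = [(score(p, terms), p) for p in CATALOG]
    matches = [sp for sp in scored if sp[0] > 0]
    ranked = sorted(matches, key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked[:limit]]


for p in suggest("24 mon"):
    print(p.name, p.price)
```

In this sketch the 27" monitor matches only one of the two typed terms, yet the sale boost lifts it into the same suggestion list as the 24" items.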
Try doing this with your RDBMS in a traditional web app.
- The RDBMS is already under a lot of load, assuming you are a website like Best Buy, and we all know that RDBMSes are good for a lot of things, especially ACID transactions, but can't scale out easily. The last thing you want is for your database to be a bottleneck, so that just in order to favor ACID you end up having to deal with the acidity (lol) caused by the RDBMS.
- A lot of people end up creating different services to scale the same solution. The services are still hitting the same database. Services do help in creating granular pieces of code that are easy to maintain and can be written by independent teams; however, that still doesn't help scale your app.
- We all know the RDBMSes are more robust, and provide some great advantages but I guess at this point of time speed is the Number 1 consideration.
- We can always fall back to an RDBMS after the customer has found the monitor he really needs, and once we've locked him down there is no harm in using a robust but slow RDBMS for the ecommerce part of the transaction.
So, the approach is simple. We need a NoSQL search database, and we can pick between Lucene, FAST, Solr and Elasticsearch. I generally pick Elasticsearch, but there have been times when I have used, and had great success with, the other three too. While Elasticsearch does let developers get started fairly quickly, my personal experience is that you still need a search expert, as it's easy to overlook some important considerations when you are dealing with abstractions like Solr or Elasticsearch.
So, in my case I'd recommend Elasticsearch because
- I can speed up the customer search experience and for the most part keep it under 100ms for every request he makes.
- Since this number is less than half of 200ms, I can be assured at least from the Technical Architect's point of view that the reason for not buying won't be technical anymore. It could be price related, product related but surely not related to the user experience.
- I am sure I can take at least 30% of the traffic away from my RDBMS, and since Elasticsearch is built with scalability in mind I know that it can scale out with commodity level machines, without a huge expenditure on the top-end hardware that's needed in the case of an RDBMS.
- We can still use the RDBMS for the eCommerce and other important parts of the user experience.
- We've already saved a lot of money and have prevented our RDBMS from becoming a bottleneck, at least for the major part of the Best Buy website, which is searching and buying products.
- Since we can scale with commodity level machines, adding data to Elasticsearch is not a problem at all. We can duplicate as much data as we need and create a much more efficient indexing strategy that gives us better results than SQL indexing.
- The work required to do this is also less. So, we save on development costs.
- Adding features like recommendation engines, boosting, faceting etc are a lot of work with an RDBMS whereas with a Search database we get all these features out of the box.
- Elasticsearch is free.
- And the best part: I don't even need an ORM.
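As a rough illustration of the "out of the box" point, here is a Python sketch that builds an Elasticsearch-style request body combining a full-text match, a boost for on-sale items and a facet-like aggregation. The field names `name`, `on_sale` and `brand` are assumptions about a hypothetical index, and the exact query syntax should be checked against the Elasticsearch version in use.

```python
import json


def build_search_body(text: str, size: int = 10) -> dict:
    """Build a search request body (hypothetical index fields)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # The document must match the user's text.
                "must": [{"match": {"name": text}}],
                # Optional clause: on-sale items score higher (boosting).
                "should": [
                    {"term": {"on_sale": {"value": True, "boost": 2.0}}}
                ],
            }
        },
        # Facet-style counts per brand, returned alongside the hits.
        "aggs": {"by_brand": {"terms": {"field": "brand"}}},
    }


body = build_search_body("24 inch monitor")
print(json.dumps(body, indent=2))
```

The body is plain JSON, so it can be sent over HTTP from any language, which is part of why no ORM is needed on the search path.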
By the end of two hours I was able to convince the architecture team of the approach. Now does the URL matter? Yes, it does. While you could keep the URL as ../products/search or ../orders/search, it makes more sense to move to a single ../search, for these reasons:
- Keep it simple: Anytime we have search now we have the user going to one place. Something like http://bestbuy.com/search? ………(followed by a generic query string). This approach creates a standard way of querying.
- SRP: The search controller now has one and only one reason to change.
- We are not constrained by an existing framework like Entity Framework or even the related RDBMS.
- We have already figured out a way to scale this solution.
- While fixing this problem we have also put off, for a while, the need to partition, purge or apply any other hacks to our RDBMS as the data grows.
- The solution is futuristic, takes less development time, improves user experience, is faster and will surely lead to an increase in sales. So, the business is happy too :). After all, they are the ones who pay the bills.
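To show what "one place with a generic query string" could look like, here is a small Python sketch that parses a single ../search URL into a normalized request. The parameter names `type`, `q` and `page`, the allowed types and the defaults are all hypothetical; the original design does not spell out its query string.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical set of searchable resource types.
ALLOWED_TYPES = {"products", "orders", "customers"}


def parse_search_url(url: str) -> dict:
    """Turn a /search URL into a normalized search request."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != "/search":
        raise ValueError(f"not a search URL: {parts.path}")
    params = parse_qs(parts.query)
    kind = params.get("type", ["products"])[0]  # assumed default
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown search type: {kind}")
    return {
        "type": kind,
        "q": params.get("q", [""])[0],
        "page": int(params.get("page", ["1"])[0]),
    }


request = parse_search_url(
    "http://bestbuy.com/search?type=orders&q=24+monitor&page=2"
)
print(request)
```

With this shape, adding search for a new resource means adding a value to the allowed types rather than a new endpoint and a new search method.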
Conclusion: When we do API design, one thing to keep in mind is that the person designing the API should not be technical alone; he/she should be well versed in the business, or work with the business as 50-50 partners, to make the design a success. With Web APIs there are a lot of ways things can go wrong, and someone needs to think these things through well beforehand.
If we had gone the route suggested by the developer and had to rewrite this after a year and hundreds of thousands of developer man-hours, we would not just have lost millions but would also have ended up with a solution where the RDBMS would not scale beyond a point, or would become costly to scale.
I think skipping one lunch to save millions is worth it.
PS: Here's the closest to the actual architecture I ended up proposing, but again it encompassed not just the Search elements but evidently a lot of other problems we were trying to solve. We have scaled it to more than a billion hits a day and will be talking about it at TechEd Europe on Oct 27, 2014.
Again, it's hard to get the real amount of detail out in blog posts, as most of the projects I work on are under NDA and, very sadly, I can't talk about the best projects I work on, on my website, for that reason. However, I wanted to get the point across. If you'd like to discuss your own problems, please feel free to contact me.