Here at GO-MMT, we run our services over HTTP with JSON payloads. As discussed in my last post, latency has always been a concern for us, and some third parties were asking us for Protocol Buffer responses.
That gave us one more reason to migrate to gRPC, which uses Protocol Buffers natively and runs over HTTP/2.
Why Protocol buffer?
So what is the benefit of proto over JSON, and how does it address our bottleneck? Across multiple published benchmarks, Protobuf serializes faster and produces smaller payloads than JSON.
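To make the size difference concrete, here is a minimal sketch that hand-encodes a single integer field the way the Protobuf wire format does (a key byte followed by a base-128 varint) and compares it with the equivalent JSON text. The field name `id`, field number 1, and the value 150 are made up for illustration; real code would use classes generated by `protoc` rather than encoding by hand.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class WireSizeDemo {
    // Encodes a non-negative value as a base-128 varint:
    // 7 bits per byte, high bit set on every byte except the last.
    static byte[] varint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Encodes one varint-typed field: key = (fieldNumber << 3) | wireType 0,
    // followed by the value itself.
    static byte[] varintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] key = varint((long) fieldNumber << 3);
        out.write(key, 0, key.length);
        byte[] encoded = varint(value);
        out.write(encoded, 0, encoded.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] proto = varintField(1, 150); // hypothetical field: id = 150
        byte[] json = "{\"id\":150}".getBytes(StandardCharsets.UTF_8);
        System.out.println("protobuf bytes: " + proto.length); // 3
        System.out.println("JSON bytes: " + json.length);      // 10
    }
}
```

The field name never travels on the wire in Protobuf, only its number, which is where much of the saving comes from on payloads with many repeated keys.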
HTTP vs HTTP2
I will not spend much time advocating for HTTP/2: it is the clear winner, otherwise why would it have been developed in the first place? I have added references in the footnote.
- Single Connection. Only one connection to the server is used to load a website, and that connection remains open as long as the website is open. This reduces the number of round trips needed to set up multiple TCP connections.
- Multiplexing. Multiple requests are allowed at the same time, on the same connection. Previously, with HTTP/1.1, each transfer would have to wait for other transfers to complete.
- Server Push. Additional resources can be sent to a client for future use.
- Prioritization. Requests are assigned dependency levels that the server can use to deliver higher priority resources faster.
- Binary. Makes HTTP/2 easier for a server to parse, more compact, and less error-prone. No additional time is wasted translating information from the text to binary, which is the computer’s native language.
- Header Compression. HTTP/2 uses HPACK compressions, which reduces overhead. Many headers were sent with the same values in every request in HTTP/1.1.
It does come with a couple of drawbacks:
- Encryption: Most browsers supporting HTTP/2 require HTTPS, which may add a little overhead for certificate validation.
- Potentially wasted bandwidth: Because HTTP/2 allows servers to push anticipated assets, you might be wasting bandwidth. Just because a page load might need an asset doesn’t mean it will. You could be sending data that isn’t needed.
So we started building our solution: we migrated the business logic to proto and defined a service contract between the two services.
Then it was time to go live. It was not easy, though, as a couple of surprises came up.
- gRPC runs on Netty, which doesn’t provide access logs out of the box.
- AWS didn’t support HTTP/2 load balancing :( i.e. it had no support for health checks over the gRPC protocol.
For the first problem, we used interceptors.
While researching the second problem, we came across the concept of service discovery. There are multiple solutions available, such as Consul, Apache ZooKeeper, and Envoy. Since we already had Consul in our infra and Consul supports gRPC health checks, we decided to go with the Consul-based approach.
Prediction: by going with Consul we were removing a hop (no ALB) for the bytes transferred between layers, so we expected to gain some latency advantage.
How this all works.
Once a Consul server is set up, both the client and the server connect to it and must use the same service name.
As soon as a server joins the cluster, it registers itself with Consul under that service name with a unique service ID. The server generates this ID itself, as IP:PORT:SERVICENAME. How does a server know its own IP? And what if multiple Docker containers are running on the same IP? The IP and port combination takes care of the second case. For the IP, we wrote a bash script that queries an AWS service for the host machine’s IP address and sets it as an environment variable; our application reads these values and generates the unique ID.
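A minimal sketch of that ID generation, assuming the bash script exports the values under the names `HOST_IP` and `SERVICE_PORT` (both names are hypothetical, as is the sample service name):

```java
import java.util.Map;

final class ServiceId {
    // Builds the unique ID in the IP:PORT:SERVICENAME form.
    // HOST_IP and SERVICE_PORT are assumed variable names, not the real ones.
    static String build(Map<String, String> env, String serviceName) {
        String ip = env.get("HOST_IP");
        String port = env.get("SERVICE_PORT");
        if (ip == null || port == null) {
            throw new IllegalStateException("HOST_IP and SERVICE_PORT must be set");
        }
        return ip + ":" + port + ":" + serviceName;
    }

    public static void main(String[] args) {
        // In a real deployment this would be System.getenv().
        Map<String, String> env = Map.of("HOST_IP", "10.0.0.12", "SERVICE_PORT", "8443");
        System.out.println(build(env, "search-service")); // 10.0.0.12:8443:search-service
    }
}
```

Because the port is part of the ID, two containers on the same host still get distinct IDs as long as they are mapped to different host ports.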
Whenever a server registers, it has to pass its ID (the server’s unique ID), the health check interval (how often Consul checks the server’s health), the health URL (the gRPC health endpoint), and the deregister interval (if the server doesn’t respond for this long, Consul removes it from its registry).
Once the client boots, it connects to the common Consul server using the same service name. Consul then returns the list of servers registered under that service name. The client caches this list in memory and routes requests using a round-robin algorithm [load balancing].
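The client-side cache and round-robin selection can be sketched roughly as follows. This is an illustration of the idea, not our production code; the class name and the use of plain address strings are assumptions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinBalancer {
    // Immutable snapshot of the server list last fetched from the registry.
    private volatile List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Called after each poll to swap in the fresh list.
    void update(List<String> fresh) {
        this.servers = List.copyOf(fresh);
    }

    // Returns the next server in rotation; safe to call from many threads.
    String pick() {
        List<String> snapshot = servers;
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no servers registered");
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), snapshot.size());
        return snapshot.get(index);
    }
}
```

Swapping the whole list on update, instead of mutating it in place, means a request that is in the middle of picking a server never sees a half-updated list.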
Health check: the client also takes care of health-checking the servers. Every client polls Consul every X (configurable) seconds, fetches the server list, and compares it with its cached copy. If the two differ (upscaling, downscaling, or an abnormal server), it identifies the server that dropped out and runs a fast-polling health check against it, i.e. every 500 ms, for Y (configurable) retries. If the server is still unhealthy, the client fires a deregister event, and load balancing adjusts across all peer clients. So in the worst case, once detected, an unhealthy server stays in the cluster for about Y*500 milliseconds.
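A rough sketch of that comparison and the fast-polling loop is below. The health probe is passed in as a predicate because the actual gRPC health call is out of scope here; all names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

final class ServerListWatcher {
    static final long FAST_POLL_MILLIS = 500;

    // Servers that are in the cached list but missing from the latest poll.
    static Set<String> missing(Set<String> cached, Set<String> latest) {
        Set<String> gone = new HashSet<>(cached);
        gone.removeAll(latest);
        return gone;
    }

    // Probes the server up to maxRetries times, pausing between failed probes.
    // Returns true as soon as one probe succeeds, false if all of them fail.
    static boolean recovers(String server, Predicate<String> isHealthy,
                            int maxRetries, long pauseMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (isHealthy.test(server)) {
                return true;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Upper bound on the time spent fast-polling: Y retries * 500 ms.
    static long worstCaseMillis(int maxRetries) {
        return maxRetries * FAST_POLL_MILLIS;
    }
}
```

With Y = 4, for example, `worstCaseMillis(4)` gives 2000 ms before the client gives up and fires the deregister event.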
Graceful shutdown of a server: when AWS scales down, the server fires a deregister event to Consul; clients poll Consul and update their in-memory cache. What happens to requests that are already in the queue? They would result in 5XX errors. To solve this, we added a shutdown hook with a delay to our application: as soon as the server receives the shutdown event, it deregisters itself from Consul but waits a few minutes before going down, serving in-flight requests in the meantime.
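The shape of such a hook, as a sketch: deregister first, then give in-flight work a bounded time to finish. Note that this variant waits until the worker pool drains or a timeout expires, which is slightly different from a fixed delay; the `deregister` callback and the two-minute timeout are placeholders.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class GracefulShutdown {
    // Deregisters first so clients stop routing here, then lets queued and
    // running tasks finish. Returns true if everything drained in time.
    static boolean drain(Runnable deregister, ExecutorService workers,
                         long timeout, TimeUnit unit) {
        deregister.run();
        workers.shutdown(); // reject new tasks, keep executing accepted ones
        try {
            if (workers.awaitTermination(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        workers.shutdownNow(); // timed out or interrupted: cancel what is left
        return false;
    }

    // Registers the drain sequence to run when the JVM receives a shutdown signal.
    static void install(Runnable deregister, ExecutorService workers) {
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> drain(deregister, workers, 2, TimeUnit.MINUTES)));
    }
}
```

The ordering matters: if the server stopped accepting work before deregistering, clients would keep sending requests to it until their next poll.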
Ungraceful shutdown of a server: what if the server has a memory issue, or its threads are choked, and the health check is failing? In such cases the clients come to the rescue: they fast-poll the unhealthy server and raise a remove event to Consul.
But what if this server comes back to life? The client has already removed it from the cluster, so even if it is working again, no new requests will land on it and it is wasting AWS resources.
To resolve this, we added a poller on the server side as well, though it runs less frequently than the client’s. It checks the server’s own health in Consul; if Consul says the server is missing or unhealthy, it fires a register event again and rejoins the cluster.
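The server-side poller boils down to a small check that runs on a schedule. In the sketch below, `Registry` is a hypothetical stand-in for the Consul calls, and the in-memory implementation exists only to make the example self-contained.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction over the registry; not a real Consul client API.
interface Registry {
    boolean isHealthy(String serviceId);
    void register(String serviceId);
}

// Toy registry used only for illustration.
final class InMemoryRegistry implements Registry {
    private final Set<String> ids = ConcurrentHashMap.newKeySet();
    public boolean isHealthy(String serviceId) { return ids.contains(serviceId); }
    public void register(String serviceId) { ids.add(serviceId); }
}

final class SelfHealPoller {
    // Re-registers the server if the registry no longer lists it as healthy.
    // Returns true when a re-registration was fired.
    static boolean checkOnce(Registry registry, String serviceId) {
        if (registry.isHealthy(serviceId)) {
            return false;
        }
        registry.register(serviceId);
        return true;
    }

    // Runs the check periodically on a single background thread.
    static ScheduledExecutorService start(Registry registry, String serviceId,
                                          long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> checkOnce(registry, serviceId),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Keeping the check idempotent means it is harmless when the server is already registered, so the poller can run for the whole lifetime of the process.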
Migration to AWS ALB HTTP-2
In July 2020, AWS added support for health checks over gRPC, and our existing system had some limitations as well. So we planned to move to AWS ALB.
Limitation of the existing system
- Transparency: Target registration and deregistration are not as transparent as with AWS ALB.
- Dependency on the application: Integration with ECS or Docker orchestration has to be handled in the application, and container movement due to infra scaling has to be handled by the server and the client.
- Race condition: Consul relies on REST endpoints for register/deregister, and even a little latency from Consul can impact our systems, so it is a single point of failure.
- Graceful shutdown: Graceful shutdown was something of a hack in this implementation; you can’t predict when your in-flight request queue will drain.
- Unhealthy server movements: When a server tries to rejoin the cluster, you can’t guarantee that it actually joins; in the worst case you keep retrying.
- Information duplication: We were maintaining server info in two places: at the infra level, where servers are launched, and in Consul, where the client decided when to remove a server. There was no way to sync Consul info to AWS. (Consul has a solution for this.)
The ALB approach is much simpler to set up than the Consul approach, and it needs less support from us, since AWS provides health checks and load balancing out of the box.
Below are the steps required for setup
How we went live with AWS ALB.
During the migration phase, we ran the REST service and gRPC in parallel. We exposed two ports, one for REST and one for gRPC, and configured two health endpoints on a single pool but with two different target groups. A request arriving on port 80 is routed to the REST endpoints over HTTP/1.1, and one arriving on 443 is routed to the gRPC endpoint over HTTP/2.
- Create SSL context:
SslContext sslContext = GrpcSslContexts.forClient().trustManager(new File(getCertPath())).build();
- Create Channel:
ManagedChannel channel = NettyChannelBuilder
- Configure Health Check:
Log in to your AWS prod account via OneLogin and go to EC2. Open Target Groups and configure the health check. The health check path can be given in the form /package.service/method.
- Server-side streaming: we can now send larger payloads as streamed chunks.
- 10% reduction in CPU cores
- 100 ms improvement in 95th-percentile latency
There is still scope for improvement, as there are gRPC features we haven’t explored yet.
- AWS gRPC health check only works on HTTPS 😵
- If we run proto on two separate tech stacks, for example Go and Java, and maintain the SSL context using a cert at our end, we always have to provide an implementation for each tech stack.
- If we run multiple channels, we incur SSL context overhead.
Beating JSON performance with Protobuf