Creating a Custom Load Balancer and Service Discovery System
NOTE: This article is translated using AI from the original Croatian article and may contain errors.
Introduction
For my faculty project, I chose to research existing load balancing and service discovery solutions and develop my own solution. In this article, I’ll describe which technologies I researched and how these technologies inspired me to create my own solution.
Existing Solutions
1. Docker
For the simplest example of a service discovery system, I took Docker networking. Docker networking allows containers to see each other through container names. I created an example of using Docker networking in the project repository.
In this example, I created two services: `docker-server` and `docker-client`.
Both services are Java applications using Spring Boot, and their Dockerfiles look like this:
```dockerfile
FROM openjdk:21-slim
COPY target/*-0.0.1-SNAPSHOT.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
```
In the `docker-compose.yml` file, I defined one instance of the `docker-server` service and three instances of the `docker-client` service.
Since Docker networking allows containers to see each other via container names, and container names must be unique, this means I actually have "four services," each with its own name.
```yaml
services:
  docker-server:
    image: docker-echo-server
    networks:
      - echo-network
  docker-client1:
    image: docker-echo-client
    depends_on:
      - docker-server
    environment:
      - TARGET_SERVER_ADDRESS=http://docker-server:8080
    networks:
      - echo-network
  docker-client2:
    image: docker-echo-client
    depends_on:
      - docker-server
    environment:
      - TARGET_SERVER_ADDRESS=http://docker-server:8080
    networks:
      - echo-network
  docker-client3:
    image: docker-echo-client
    depends_on:
      - docker-server
    environment:
      - TARGET_SERVER_ADDRESS=http://docker-server:8080
    networks:
      - echo-network
networks:
  echo-network:
    driver: bridge
```
Advantages
Note that this approach gives us only a service discovery system, not a load balancer.
The `docker-server` service doesn't need an open port because we limited it to communication only within our Docker network. The `docker-client1` service, which is in the same network as `docker-server`, can communicate with the `docker-server` service via the container name. This means that `docker-client1` can call `docker-server` via `http://docker-server:8080`, and Docker networking automatically routes that call to the correct IP address and port. Port `8080` is an internal port and isn't exposed to the host machine unless we explicitly request it.
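As an illustration, the way a client picks up its target from the `TARGET_SERVER_ADDRESS` environment variable set in `docker-compose.yml` can be sketched in plain Java. The class name and the localhost fallback below are hypothetical, not taken from the project:

```java
import java.net.URI;

// Sketch: how an echo client could resolve the TARGET_SERVER_ADDRESS
// variable set in docker-compose.yml. Inside the Docker network the
// hostname "docker-server" is resolved by Docker's embedded DNS.
public class EchoClientConfig {

  public static URI targetServer() {
    String address = System.getenv("TARGET_SERVER_ADDRESS");
    if (address == null || address.isBlank()) {
      // Hypothetical fallback for running outside Docker
      address = "http://localhost:8080";
    }
    return URI.create(address);
  }

  public static void main(String[] args) {
    System.out.println("Echo server is at: " + targetServer());
  }
}
```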
This is quite a simple solution that can be sufficient for some simpler applications. It’s important to be familiar with this solution as it’s the basis for many other, more advanced systems.
Disadvantages
- No load balancer
- No scaling capabilities
- Limited to one Docker host
2. Netflix Eureka
Eureka is a service discovery system developed by Netflix.
The idea of the Eureka service is to have a service discovery server and clients.
Clients register with the central server by submitting their service name and IP address along with the port they’re listening on.
After registration, the Eureka server periodically checks if the services are still available and “healthy.”
Spring Cloud has very good integration with Eureka, which I used in my example. I created an example of using the Eureka service in the project repository.
In that example, I created three services: `eureka-server`, `echo-server`, and `echo-client`.
`eureka-server` is a Spring Boot application that serves as the central Eureka server. All applications that want to be part of our network must see this service, register with it, and allow the Eureka server to contact them periodically.
`echo-server` and `echo-client` are applications that register with the Eureka server and use its service discovery capabilities. For these two services, I set them to listen on port `0`, which means Spring Boot will automatically choose an available port. This allows me to run multiple instances of the same service on the same host, each listening on a different port.
In my example, `echo-client` needs to fetch a resource located on `echo-server`. If we use only the Eureka client library, then to get the location of the `echo-server` service, we need to do it like this:
```java
@Service
public class EchoClientService {
  @Autowired
  @Lazy
  private EurekaClient eurekaClient;

  private RestTemplate restTemplate = new RestTemplate();

  public ServerResponseType getEchoServerResource() {
    var server = this.eurekaClient.getNextServerFromEureka("echo-server", false);
    // ^- InstanceInfo(host, port, ...)
    //
    // OR
    var options = this.eurekaClient.getApplication("echo-server").getInstances();
    // ^- List<InstanceInfo>

    return this.restTemplate.getForObject(
        String.format("http://%s:%s/", server.getHostName(), server.getPort()),
        ServerResponseType.class);
  }
}
```
As seen in the example, to access `echo-server` we need to use `EurekaClient` and call one of the methods that give us information about the service. Eureka keeps a local cache of all services it has registered, and we can get all instances of the `echo-server` service by calling the `getInstances()` method. Selecting the best instance isn't built into the Eureka client library; it's up to us to do that. If the `getNextServerFromEureka` method is used, then Eureka uses the round-robin method.
To get the most out of Eureka, the Ribbon and Feign libraries are used. These two libraries enable client-side load balancing, transferring part of the load-balancing responsibility from the server to the client. With Ribbon, we can also configure things so that if the instance we got happens to be unavailable at call time, Ribbon will try another instance.
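Conceptually, the round-robin selection that `getNextServerFromEureka` performs can be sketched in a few lines. This is only an illustration; the `InstanceInfo` record below is a simplified stand-in, not Netflix's class of the same name:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of client-side round-robin selection over a fixed
// instance list; InstanceInfo is a simplified stand-in record.
public class RoundRobinSelector {

  public record InstanceInfo(String host, int port) {}

  private final List<InstanceInfo> instances;
  private final AtomicInteger counter = new AtomicInteger();

  public RoundRobinSelector(List<InstanceInfo> instances) {
    this.instances = List.copyOf(instances);
  }

  public InstanceInfo next() {
    // floorMod keeps the index non-negative even after the counter overflows
    int idx = Math.floorMod(counter.getAndIncrement(), instances.size());
    return instances.get(idx);
  }

  public static void main(String[] args) {
    var selector = new RoundRobinSelector(
        List.of(new InstanceInfo("host-a", 8081), new InstanceInfo("host-b", 8082)));
    for (int i = 0; i < 4; i++) {
      System.out.println(selector.next()); // alternates between the two hosts
    }
  }
}
```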
3. Reverse Proxy Systems
Reverse proxy systems are systems that sit between the client and server. The client makes an HTTP request to the server, the request arrives at the reverse proxy server which will, based on domain, protocol, URL, or some other criterion, redirect the request to the appropriate server. These systems often also have load-balancing capabilities using the round-robin method.
The reverse proxy systems I used in this project are Nginx and Caddy.
I didn't consider it necessary to set up examples of using these systems, as they only require editing configuration files.
I found Caddy simpler to use because it has the ability to automatically obtain and set up SSL certificates. I set up Caddy on my VPS server where I deployed an example of my own implementation of a load balancer and service discovery system (which we’ll cover next).
On my DNS server, I set up an A record that points all domains of the form `*.jerkic.dev` to my VPS server. On the VPS, I ran a Caddy server that listens on port `443` and automatically obtains and sets up SSL certificates for all domains available to it. Then I just had to modify the configuration file to redirect the subdomain `service-discovery.jerkic.dev` to `localhost:8080`, and that's it. Very simple, very fast.
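The article doesn't show the actual configuration file, but a minimal Caddyfile for the setup described above might look like this (Caddy obtains the TLS certificate for the listed domain automatically):

```
service-discovery.jerkic.dev {
    reverse_proxy localhost:8080
}
```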
Custom Solution
In creating my own solution, I was inspired by all the above solutions. My system focuses only on servers that use the Spring Boot framework. The goal was to create a system that, with minimal configuration, enables load balancing, service discovery, and reverse proxy functionality.
Service Configuration
A service that we want to be part of our system would need to take the following steps:
- Add the following dependency to the `pom.xml` file:
```xml
<dependency>
  <groupId>dev.jerkic.custom-load-balancer</groupId>
  <artifactId>client</artifactId>
  <version>0.0.1-SNAPSHOT</version>
</dependency>
```
- Add the following configuration to the `application.properties` file:
```properties
# service name
spring.application.name=example-server-1
# 0 = randomly selected port
server.port=0
# context path, used by the reverse proxy
server.servlet.context-path=/example-server-1
# discovery server address
discovery-client.discoveryServerUrl=http://localhost:8080
# service name
discovery-client.serviceName=${spring.application.name}
```

(Note that in a `.properties` file a `#` comment must be on its own line; text after a value on the same line would become part of the value.)
- Add the following configuration to the `@SpringBootApplication` annotation:
```java
@SpringBootApplication(
    scanBasePackages = {
      "dev.jerkic.custom_load_balancer",
    })
public class ExampleServer1Application {
  public static void main(String[] args) {
    SpringApplication.run(ExampleServer1Application.class, args);
  }
}
```
With these steps, the service will automatically register with the discovery server and will be available via the reverse proxy.
System Architecture
In the following points, I’ll explain why the developed system works as a solution for load balancing, service discovery, and reverse proxy.
Service Discovery
When a service starts, it registers with the discovery server. The library that the service must use registers a `DiscoveryServiceConfiguration` bean that takes care of registering the service with the discovery server.
```java
/** Register on every startup of server */
@PostConstruct
public void register() {
  var registerInput =
      RegisterInput.builder()
          .serviceInfo(this.getServiceInfo())
          .serviceHealth(this.getServiceHealth())
          .build();

  try {
    var instanceId = this.clientHealthService.registerService(registerInput);
    log.info("Registered service, id is {}", instanceId);
  } catch (Exception e) {
    log.error("Error registering service. Going to retry after 10s");
    Executors.newSingleThreadScheduledExecutor()
        .schedule(
            () -> {
              log.info("Retrying registration");
              register();
            },
            10,
            TimeUnit.SECONDS);
  }
}
```
The `getServiceInfo` method collects data from the configuration, such as the service name, base path, etc. The `getServiceHealth` method collects data about the service's health, the most important of which is the number of currently active requests. After successful registration, the service receives an `instanceId`, which it then uses to update the health data of that instance. If registration fails, the service retries every 10 seconds until it succeeds.
Upon successful registration, the server records that there exists a service named `XYZ` with base path `/XYZ` on port `1234`. The IP address and port on which the instance listens are data associated with the instance, not with the service. The port is sent as data in `serviceHealth`, and the IP address is automatically inferred from the TCP connection.
It’s important to note that the system was built with the assumption that services and the platform are in the same internal network, i.e., they can communicate via private IP addresses.
After this, it's possible to query the platform for a service named `XYZ`, and the platform will recognize that it has `N` different instances of service `XYZ` and select one of them.
After successful registration, the service will periodically send health data to the discovery server.
```java
// Scheduled to run every 1 min
@Scheduled(fixedRate = 60000)
public void updateHealth() {
  try {
    var oInstanceId = this.clientHealthService.getInstanceId();
    if (oInstanceId.isEmpty()) {
      log.warn("Service not registered");
      return;
    }

    var instanceId = oInstanceId.get();
    var healthStatus =
        HealthUpdateInput.builder()
            .instanceId(instanceId)
            .serviceName(this.clientProperties.getServiceName())
            .health(this.getServiceHealth())
            .build();

    log.info("Updating health: {}", healthStatus);
    this.clientHealthService.updateHealth(healthStatus);
  } catch (Exception e) {
    log.error("Error updating health", e);
    log.warn("Trying to register again after failed health update");
    this.register();
  }
}
```
Load Balancing
When the platform receives a query for a service, the platform will select one of the service instances. Resolving the best instance is done with the following SQL query:
```sql
SELECT
  s.entry_id,
  s.service_model_id AS service_id,
  max(s.instance_recorded_at) AS latest_timestamp
FROM
  service_instance s
WHERE
  s.is_healthy = 1
  AND (strftime('%s', 'now') * 1000 - s.instance_recorded_at) <= (3 * 60 * 1000)
GROUP BY
  s.instance_id
```
The query takes all service instances that are healthy and have reported in the last 3 minutes. Each health update the service sends to the discovery server is inserted as a new row, which means that for one `service_id` and `instance_id` there can be multiple rows in the `service_instance` table. The last row inserted for a given `service_id` and `instance_id` is found using `max(s.instance_recorded_at) as latest_timestamp`; SQLite guarantees that, when a query contains a single bare `max()` aggregate, the other selected columns come from the row holding that maximum.
After retrieving the available service instances, the data is stored in a cache of the form `ConcurrentHashMap<String, PriorityQueue<UsedResolvedInstance>>`, where `UsedResolvedInstance` is a class that contains data about the service instance and the number of currently active requests. The cache is updated every 10 seconds.
When selecting the best instance, the `poll` method of the `PriorityQueue` takes the instance with the lowest number of currently active requests. After an instance is taken from the `PriorityQueue`, its number of currently active requests is incremented by one in the cache. The platform doesn't write this data to the database, only temporarily to the cache. The actual number of currently active requests is updated by the regular health update that the service itself sends to the discovery server every 60 seconds.
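The selection step described above can be sketched as a small self-contained class. This is a simplification: `UsedResolvedInstance` is reduced here to an id plus an active-request counter, while the real class also carries the instance URI and health data:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the cache-based "least active requests" selection.
public class LeastLoadedSelector {

  public static class UsedResolvedInstance {
    public final String instanceId;
    public int activeRequests;

    public UsedResolvedInstance(String instanceId, int activeRequests) {
      this.instanceId = instanceId;
      this.activeRequests = activeRequests;
    }
  }

  private final Map<String, PriorityQueue<UsedResolvedInstance>> cache =
      new ConcurrentHashMap<>();

  public void put(String serviceName, UsedResolvedInstance instance) {
    cache
        .computeIfAbsent(
            serviceName,
            k -> new PriorityQueue<UsedResolvedInstance>(
                Comparator.comparingInt(i -> i.activeRequests)))
        .add(instance);
  }

  /**
   * Poll the instance with the fewest active requests, bump its counter, and
   * re-insert it so the queue ordering stays correct for the next call.
   */
  public UsedResolvedInstance select(String serviceName) {
    var queue = cache.get(serviceName);
    if (queue == null || queue.isEmpty()) {
      return null;
    }
    var best = queue.poll();
    best.activeRequests++;
    queue.add(best);
    return best;
  }

  public static void main(String[] args) {
    var selector = new LeastLoadedSelector();
    selector.put("example-server-1", new UsedResolvedInstance("instance-a", 5));
    selector.put("example-server-1", new UsedResolvedInstance("instance-b", 0));
    // instance-b has fewer active requests, so it is selected
    System.out.println(selector.select("example-server-1").instanceId);
  }
}
```

Re-inserting the polled instance after mutating its counter matters: changing an element's priority while it sits inside a `PriorityQueue` would silently break the queue's ordering.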
The `LoadBalancingService` class returns the best available instance for a specific service, by service name or by the registered service's base path. This information can be used by the reverse proxy system or by the service itself via the `ProxyRestTemplate` class.
Reverse Proxy
The service discovery server defines a couple of controllers that serve to retrieve data about services and service instances. Their paths are prefixed with `/health` and `/register`, which means these two paths are reserved and cannot be used for reverse proxy purposes. All other requests, those that don't start with `/health` or `/register`, are caught by the following controller:
```java
@RestController
@RequiredArgsConstructor
@Slf4j
public class ProxyController {
  private final ProxyRestTemplate restTemplate;

  @RequestMapping("/**")
  public ResponseEntity<?> proxy(HttpServletRequest request) throws IOException {
    var requestedPath = request.getRequestURI();
    log.debug("Requested path: {}", requestedPath);

    var requestEntity = RequestEntityConverter.fromHttpServletRequest(request);
    return this.restTemplate.exchange(requestEntity, String.class);
  }
}
```
The `RequestEntityConverter` class converts `HttpServletRequest` to the `RequestEntity` used by the `RestTemplate` class. A `ProxyRestTemplate` class was created that extends `RestTemplate` by adding an interceptor.
```java
@Component
public class ProxyRestTemplate extends RestTemplate {
  @Autowired
  public ProxyRestTemplate(LoadBalancingService loadBalancingService) {
    super();
    this.setInterceptors(List.of(new LoadBalancingHttpRequestInterceptor(loadBalancingService)));
  }
}
```
The `LoadBalancingHttpRequestInterceptor` class is an interceptor that:
- retrieves the best service instance before sending the request
- changes the URI in the request to the URI of the best instance
- adds `X-Load-balanced` and `X-LB-instance` headers to the response
```java
@RequiredArgsConstructor
@Slf4j
public class LoadBalancingHttpRequestInterceptor implements ClientHttpRequestInterceptor {
  private final LoadBalancingService loadBalancingService;

  @Override
  public ClientHttpResponse intercept(
      HttpRequest request, byte[] body, ClientHttpRequestExecution execution) throws IOException {
    // Select best instance
    var bestInstance =
        this.loadBalancingService.getBestInstanceForBaseHref(request.getURI().toString());
    if (bestInstance.isEmpty()) {
      throw new NoInstanceFoundException(
          "No instance found for the given base href " + request.getURI());
    }

    var uri = this.getProxiedUriFromOriginal(bestInstance.get().uri(), request);

    // Recreate request
    HttpRequest newRequest =
        new HttpRequestImplementation(
            uri,
            request.getURI().toString(),
            request.getHeaders(),
            request.getMethod(),
            request.getAttributes());

    // Send request to best instance
    var result = execution.execute(newRequest, body);

    // Add metadata headers
    var responseHeaders = result.getHeaders();
    responseHeaders.add("X-Load-balanced", "true");
    bestInstance.ifPresent(
        instance -> {
          responseHeaders.add("X-LB-instance", instance.instanceId());
        });

    // Return response
    return result;
  }

  private URI getProxiedUriFromOriginal(String bestInstanceUri, HttpRequest request) {
    try {
      var query = request.getURI().getQuery() == null ? "" : "?" + request.getURI().getQuery();
      var realRequestUri = bestInstanceUri + request.getURI().getPath() + query;
      log.info("Sending request to {}", realRequestUri);
      return new URI(realRequestUri);
    } catch (URISyntaxException e) {
      log.error("Error building real uri", e);
      throw new RuntimeException(e);
    }
  }
}
```
Demo
The video above (which is available here) shows how the system works in practice.
In the example, we are on the details page of the `example-server-1` service. This service has two available instances. Each instance is assigned a color for easier recognition.
On the right side of the instance list is a list of elements that send requests to the `/example-server-1/test` and `/example-server-1/slow-request` paths every 5 seconds. Since no domain is specified, requests are sent to the current domain, i.e., the reverse proxy domain.
The requests receive some HTML content as a response, which is inserted into the corresponding element. The example was created using HTMX. Each response carries the `X-LB-instance` header with the `instanceId` of the instance that responded. We use this information to color the response in the color of the instance that responded.
This makes it clearly visible how requests are distributed to different service instances.
Conclusion
In this article, I described how I researched existing solutions for load balancing and service discovery and how I was inspired by them to create my own solution.
The entire source code of the project is available on GitHub. The example from the video is available here.