
Creating a Custom Load Balancer and Service Discovery System


NOTE: This article was translated from the original Croatian using AI and may contain errors.

Introduction

For my faculty project, I chose to research existing load balancing and service discovery solutions and develop my own solution. In this article, I’ll describe which technologies I researched and how these technologies inspired me to create my own solution.

Existing Solutions

1. Docker

For the simplest example of a service discovery system, I took Docker networking. Docker networking allows containers to see each other through container names. I created an example of using Docker networking in the project repository.

In this example, I created two services: docker-server and docker-client. Both services are Java applications using Spring Boot, and their Dockerfiles look like this:

FROM openjdk:21-slim
COPY target/*-0.0.1-SNAPSHOT.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

In the docker-compose.yml file, I defined one instance of the docker-server service and three instances of the docker-client service. Since Docker networking allows containers to see each other via container names, and container names must be unique, this means I actually have “four services,” each with its own name.

services:
    docker-server:
        image: docker-echo-server
        networks:
            - echo-network
    docker-client1:
        image: docker-echo-client
        depends_on:
            - docker-server
        environment:
            - TARGET_SERVER_ADDRESS=http://docker-server:8080
        networks:
            - echo-network
    docker-client2:
        image: docker-echo-client
        depends_on:
            - docker-server
        environment:
            - TARGET_SERVER_ADDRESS=http://docker-server:8080
        networks:
            - echo-network
    docker-client3:
        image: docker-echo-client
        depends_on:
            - docker-server
        environment:
            - TARGET_SERVER_ADDRESS=http://docker-server:8080
        networks:
            - echo-network
networks:
    echo-network:
        driver: bridge

Advantages

The docker-server service doesn’t need to have an open port, because we limited it to communication only within our Docker network. The docker-client1 service, which is in the same network as docker-server, can communicate with it via the container name.

This means that docker-client1 can call docker-server via http://docker-server:8080, and Docker networking automatically redirects that call to the correct IP address and port. Port 8080 is an internal port, and it isn’t exposed to the host machine unless we explicitly request it.
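On the client side this requires nothing special; the injected environment variable is enough, and Docker's embedded DNS does the rest. A hypothetical sketch (not the actual docker-client code) of how the target address could be picked up:

```java
// Hypothetical sketch: how docker-client could read the server address injected
// via docker-compose. Inside echo-network, Docker's embedded DNS resolves the
// hostname "docker-server" to the container's current IP address.
public class TargetAddress {
  public static String resolve() {
    // Fall back to the compose default when the variable is not set
    return System.getenv().getOrDefault("TARGET_SERVER_ADDRESS", "http://docker-server:8080");
  }

  public static void main(String[] args) {
    System.out.println(resolve());
  }
}
```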

This is quite a simple solution that can be sufficient for some simpler applications. It’s important to be familiar with this solution as it’s the basis for many other, more advanced systems.

Disadvantages

With this approach, we got only a service discovery system, but not a load balancer. And since container names must be unique, each instance has to be addressed by its own name; there is no single name behind which several interchangeable instances can sit.

2. Netflix Eureka

Eureka is a service discovery system developed by Netflix. The idea of the Eureka service is to have a service discovery server and clients. Clients register with the central server by submitting their service name and IP address along with the port they’re listening on.

After registration, the Eureka server periodically checks if the services are still available and “healthy.”

Spring Cloud has very good integration with Eureka, which I used in my example. I created an example of using the Eureka service in the project repository.

In that example, I created three services: eureka-server, echo-server, and echo-client.

eureka-server is a Spring Boot application that serves as the central Eureka server. All applications that want to be part of our network must see this service, register with it, and allow the Eureka server to contact them periodically.

echo-server and echo-client are applications that register with the Eureka server and use its service discovery capabilities. For these two services, I set them to listen on port 0, which means Spring Boot will automatically choose an available port. This allows me to run multiple instances of the same service on the same host with each listening on a different port.

In my example, echo-client needs to fetch a resource located on echo-server.

If we use only the Eureka client library, then to get the location of the echo-server service, we need to do it like this:

@Service
public class EchoClientService {
    @Autowired
    @Lazy
    private EurekaClient eurekaClient;

    private final RestTemplate restTemplate = new RestTemplate();

    public ServerResponseType getEchoServerResource() {
        // Round-robin pick of the next available instance
        var server = this.eurekaClient.getNextServerFromEureka("echo-server", false);
        //   ^- InstanceInfo(host, port, ...)
        //
        // OR: fetch all registered instances and pick one ourselves
        // var options = this.eurekaClient.getApplication("echo-server").getInstances();
        //   ^- List<InstanceInfo>

        return this.restTemplate
            .getForEntity(
                String.format("http://%s:%s/", server.getHostName(), server.getPort()),
                ServerResponseType.class)
            .getBody();
    }
}

As seen in the example, to access echo-server, we need to use EurekaClient and call one of the methods that give us information about the service. Eureka keeps a local cache of all services it has registered, and we can get all instances of the echo-server service by calling the getInstances() method.

Selecting the best instance isn’t built into the Eureka client library; it’s up to us to do that. The getNextServerFromEureka method, however, selects instances using round-robin.
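The round-robin strategy behind getNextServerFromEureka can be sketched as a tiny standalone selector (a simplified illustration, not Netflix's actual implementation):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified round-robin selector, analogous to what getNextServerFromEureka
// does over the client's local cache of registered instances.
public class RoundRobin<T> {
  private final List<T> instances;
  private final AtomicInteger counter = new AtomicInteger();

  public RoundRobin(List<T> instances) {
    this.instances = instances;
  }

  public T next() {
    // floorMod keeps the index non-negative even after integer overflow
    int index = Math.floorMod(counter.getAndIncrement(), instances.size());
    return instances.get(index);
  }

  public static void main(String[] args) {
    var rr = new RoundRobin<>(List.of("instance-1", "instance-2", "instance-3"));
    for (int i = 0; i < 4; i++) {
      System.out.println(rr.next()); // instance-1, instance-2, instance-3, instance-1
    }
  }
}
```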

To get the most out of Eureka, the Ribbon and Feign libraries are used. These two libraries enable client-side load balancing, transferring part of the load-balancing responsibility from the server to the client. With Ribbon, we can configure retries so that if the instance we got happens to be unavailable at call time, Ribbon will try another instance.

3. Reverse Proxy Systems

Reverse proxy systems are systems that sit between the client and server. The client makes an HTTP request to the server, the request arrives at the reverse proxy server which will, based on domain, protocol, URL, or some other criterion, redirect the request to the appropriate server. These systems often also have load-balancing capabilities using the round-robin method.

The reverse proxy systems I used in this project are Nginx and Caddy.

I didn’t consider it necessary to set up examples for these systems, since using them comes down to writing configuration files.

I found Caddy simpler to use because it has the ability to automatically obtain and set up SSL certificates. I set up Caddy on my VPS server where I deployed an example of my own implementation of a load balancer and service discovery system (which we’ll cover next).

On my DNS server, I set up an A record that points all domains of the form *.jerkic.dev to my VPS server. On the VPS server, I run a Caddy server that listens on port 443 and automatically obtains and sets up SSL certificates for all domains configured in it.

Then I just had to modify the configuration file to redirect the subdomain service-discovery.jerkic.dev to localhost:8080 and that’s it. Very simple, very fast.
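Assuming a standard Caddyfile, the setup described above boils down to something like this (a sketch; my actual configuration may differ):

```
service-discovery.jerkic.dev {
    # Caddy automatically provisions a TLS certificate for this site
    reverse_proxy localhost:8080
}
```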

Custom Solution

In creating my own solution, I was inspired by all the above solutions. My system focuses only on servers that use the Spring Boot framework. The goal was to create a system that, with minimal configuration, enables load balancing, service discovery, and reverse proxy functionality.

Service Configuration

A service that we want to be part of our system needs to take the following steps:

1. Add the client library as a Maven dependency:

<dependency>
    <groupId>dev.jerkic.custom-load-balancer</groupId>
    <artifactId>client</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

2. Configure application.properties:

spring.application.name=example-server-1 # service name
server.port=0 # randomly selected port
server.servlet.context-path=/example-server-1 # context path for reverse proxy

discovery-client.discoveryServerUrl=http://localhost:8080 # discovery server address
discovery-client.serviceName=${spring.application.name} # service name

3. Include the library's package in component scanning:

@SpringBootApplication(
    scanBasePackages = {
      "dev.jerkic.custom_load_balancer",
    })
public class ExampleServer1Application {

  public static void main(String[] args) {
    SpringApplication.run(ExampleServer1Application.class, args);
  }
}

With these steps, the service will automatically register with the discovery server and will be available via the reverse proxy.

System Architecture

Architecture

In the following points, I’ll explain why the developed system works as a solution for load balancing, service discovery, and reverse proxy.

Service Discovery

When starting the service, the service registers with the discovery server. The library that the service must use registers a DiscoveryServiceConfiguration bean that takes care of registering the service with the discovery server.

  /** Register on every startup of server */
  @PostConstruct
  public void register() {
    var registerInput =
        RegisterInput.builder()
            .serviceInfo(this.getServiceInfo())
            .serviceHealth(this.getServiceHealth())
            .build();

    try {
      var instanceId = this.clientHealthService.registerService(registerInput);
      log.info("Registered service, id is {}", instanceId);
    } catch (Exception e) {
      log.error("Error registering service. Going to retry after 10s", e);

      Executors.newSingleThreadScheduledExecutor()
          .schedule(
              () -> {
                log.info("Retrying registration");
                register();
              },
              10,
              TimeUnit.SECONDS);
    }
  }

The getServiceInfo method collects data from the configuration, such as service name, base path, etc. The getServiceHealth method collects data about the service’s health, among which the most important is the number of currently active requests.

After successful registration, the service receives an instanceId which the service then uses to update the health data of that instance.

If the service failed to register, it will try again after 10 seconds, and keep retrying until registration succeeds.

Upon successful registration, the server registers that there exists a service named XYZ with base path /XYZ on port 1234. The IP address and port on which the instance listens are data associated with the instance, not with the service.

The port is sent as data in serviceHealth, and the IP address is automatically inferred from the TCP connection.

It’s important to note that the system was built with the assumption that services and the platform are in the same internal network, i.e., they can communicate via private IP addresses.

After this, it’s possible to send a query to the platform for a service named XYZ, and the platform will be able to recognize that it has N different instances of service XYZ and will be able to select one of them.

After successful registration, the service will periodically send health data to the discovery server.

  // Runs every 60 seconds
  @Scheduled(fixedRate = 60000)
  public void updateHealth() {
    try {

      var oInstanceId = this.clientHealthService.getInstanceId();
      if (oInstanceId.isEmpty()) {
        log.warn("Service not registered");
        return;
      }
      var instanceId = oInstanceId.get();

      var healthStatus =
          HealthUpdateInput.builder()
              .instanceId(instanceId)
              .serviceName(this.clientProperties.getServiceName())
              .health(this.getServiceHealth())
              .build();

      log.info("Updating health: {}", healthStatus);

      this.clientHealthService.updateHealth(healthStatus);
    } catch (Exception e) {
      log.error("Error updating health", e);
      log.warn("Trying to register again after failed health update");
      this.register();
    }
  }

Load Balancing

When the platform receives a query for a service, the platform will select one of the service instances. Resolving the best instance is done with the following SQL query:

SELECT
    s.entry_id,
    s.service_model_id as service_id,
    max(s.instance_recorded_at) as latest_timestamp
FROM
    service_instance s
WHERE
    s.is_healthy = 1
    AND (strftime('%s', 'now') * 1000 - s.instance_recorded_at) <= (3*60*1000)
GROUP BY
    s.instance_id

The query takes all service instances that are healthy and have registered in the last 3 minutes. When updating service health, the service sends health data to the discovery server. This means that for one service_id and instance_id, there can be multiple rows in the service_instance table. The last row that was inserted for one service_id and instance_id is found using max(s.instance_recorded_at) as latest_timestamp.

After retrieving data about available service instances, this data is stored in a cache of the form ConcurrentHashMap<String, PriorityQueue<UsedResolvedInstance>>, where UsedResolvedInstance is a class that contains data about the service instance and the number of currently active requests. The cache is updated every 10 seconds.

When selecting the best instance, the poll method from PriorityQueue takes the instance with the lowest number of currently active requests. After an instance is taken from the PriorityQueue, the number of currently active requests is increased by one in the cache. The platform doesn’t write this data to the database, but only temporarily in the cache. The actual number of currently active requests is updated in the regular health update that the service itself sends to the discovery server every 60 seconds.
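The selection step can be illustrated with a simplified, standalone sketch (the Instance record here is a stand-in for the project's UsedResolvedInstance, not its actual definition):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class LeastLoadedDemo {
  // Simplified stand-in for UsedResolvedInstance: instance id plus active-request count
  record Instance(String id, int activeRequests) {}

  public static void main(String[] args) {
    // Queue ordered by the number of currently active requests, fewest first
    var queue = new PriorityQueue<Instance>(Comparator.comparingInt(Instance::activeRequests));
    queue.add(new Instance("instance-a", 5));
    queue.add(new Instance("instance-b", 2));

    // poll() takes the least-loaded instance...
    var best = queue.poll();
    System.out.println(best.id()); // instance-b

    // ...which is re-inserted with its active-request count bumped by one,
    // mirroring how the platform updates its in-memory cache
    queue.add(new Instance(best.id(), best.activeRequests() + 1));
    System.out.println(queue.peek().id()); // still instance-b (3 < 5)
  }
}
```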

The LoadBalancingService class returns the best available instance for a specific service by service name or by the registered service’s base path. This information can be used by the reverse proxy system or by the service itself using the ProxyRestTemplate class.

Reverse Proxy

The service discovery server defines a couple of controllers that serve to retrieve data about services and service instances. Their paths are prefixed with /health and /register, which means these two paths are reserved and cannot be used for reverse proxy purposes.

All other requests, i.e., those that don’t start with /health or /register, are caught by the following controller:

@RestController
@RequiredArgsConstructor
@Slf4j
public class ProxyController {
  private final ProxyRestTemplate restTemplate;

  @RequestMapping("/**")
  public ResponseEntity<?> proxy(HttpServletRequest request) throws IOException {
    var requestedPath = request.getRequestURI();

    log.debug("Requested path: {}", requestedPath);

    var requestEntity = RequestEntityConverter.fromHttpServletRequest(request);
    return this.restTemplate.exchange(requestEntity, String.class);
  }
}

The RequestEntityConverter class converts HttpServletRequest to RequestEntity which is used in the RestTemplate class.

A ProxyRestTemplate class was created that extends the RestTemplate class by adding an interceptor.

@Component
public class ProxyRestTemplate extends RestTemplate {
  @Autowired
  public ProxyRestTemplate(LoadBalancingService loadBalancingService) {
    super();
    this.setInterceptors(List.of(new LoadBalancingHttpRequestInterceptor(loadBalancingService)));
  }
}

The LoadBalancingHttpRequestInterceptor class is an interceptor that selects the best available instance, rewrites the request URI to point at it, and tags the response with metadata headers:

@RequiredArgsConstructor
@Slf4j
public class LoadBalancingHttpRequestInterceptor implements ClientHttpRequestInterceptor {
  private final LoadBalancingService loadBalancingService;

  @Override
  public ClientHttpResponse intercept(
      HttpRequest request, byte[] body, ClientHttpRequestExecution execution) throws IOException {

    // Select best instance
    var bestInstance =
        this.loadBalancingService.getBestInstanceForBaseHref(request.getURI().toString());

    if (bestInstance.isEmpty()) {
      throw new NoInstanceFoundException(
          "No instance found for the given base href " + request.getURI());
    }

    var uri = this.getProxiedUriFromOriginal(bestInstance.get().uri(), request);

    // Recreate request
    HttpRequest newRequest =
        new HttpRequestImplementation(
            uri,
            request.getURI().toString(),
            request.getHeaders(),
            request.getMethod(),
            request.getAttributes());
    // Send request to best instance
    var result = execution.execute(newRequest, body);

    // Add metadata headers
    var responseHeaders = result.getHeaders();
    responseHeaders.add("X-Load-balanced", "true");
    bestInstance.ifPresent(
        instance -> {
          responseHeaders.add("X-LB-instance", instance.instanceId());
        });

    // Return response
    return result;
  }

  private URI getProxiedUriFromOriginal(String bestInstanceUri, HttpRequest request) {
    try {
      var query = request.getURI().getQuery() == null ? "" : "?" + request.getURI().getQuery();

      var realRequestUri = bestInstanceUri + request.getURI().getPath() + query;
      log.info("Sending request to {}", realRequestUri);
      return new URI(realRequestUri);
    } catch (URISyntaxException e) {
      log.error("Error building real uri", e);
      throw new RuntimeException(e);
    }
  }
}

Demo

The video above (also available here) shows how the system works.

In the example, we are on the details page of the example-server-1 service. This service has two available instances. Each instance is assigned a color for easier recognition.

On the right side of the instance list is a list of elements that send requests to /example-server-1/test and /example-server-1/slow-request paths every 5 seconds. Since no domain is specified, requests are sent to the current domain, i.e., the reverse proxy domain.

Each request receives some HTML content as a response, which is inserted into the corresponding element. The example was created using HTMX. Each response carries an X-LB-instance header containing the instanceId of the instance that responded. We use this information to color the response in that instance’s color.

This makes it clearly visible how requests are distributed to different service instances.

Conclusion

In this article, I described how I researched existing solutions for load balancing and service discovery and how I was inspired by them to create my own solution.

The entire source code of the project is available on GitHub. The example from the video is available here.