Engineering

From Polling to gRPC Streaming at Cabify

Apr 22, 2026

Every 60 seconds, every driver in our logistics fleet was asking the same question: “Has anything changed?” That polling loop meant higher latency, wasted bandwidth, and a user experience that felt sluggish when parcels were being reassigned in real time. We needed drivers to know instantly when a service was assigned, updated, or cancelled. This is the story of how we replaced polling with gRPC bidirectional streaming across our Go backend and Android app, and what we learned about proxies, horizontal scaling, and contract management along the way.

We are part of Cabify Logistics, the team behind Cabify’s parcel delivery operations: same-day delivery (SND), express, and food delivery. Unlike in ride-hailing, a logistics driver can be actively managing multiple services at once: picking up, delivering, or getting services reassigned mid-route. In that context, a 60-second polling delay is not just inefficient, it’s a real operational problem.

Why gRPC?

When we set out to build the communication layer between our Go backend (the logistics-driver Backend for Frontend, or BFF) and our Android client app, we evaluated REST with JSON, WebSockets, and gRPC. We chose gRPC for several practical reasons:

  • Efficiency. gRPC runs on HTTP/2 and uses Protocol Buffers as its binary serialization format. Compared to REST with JSON, this means smaller payloads and lower latency, which is critical when drivers are on mobile networks with variable connectivity.

  • Strong contracts. Protocol Buffers enforce a typed schema at compile time on both sides. With REST, we would need to maintain OpenAPI specs manually and hope both sides stay in sync. With gRPC, the .proto file is the contract, and code generation ensures compliance.

  • Native streaming support. gRPC provides bidirectional streaming as a first-class primitive, unlike REST where we would need to bolt on WebSockets or Server-Sent Events as a separate protocol with its own connection management.

  • Ecosystem maturity. Both Go and Kotlin have well-supported gRPC libraries with code generation plugins, interceptors, and testing utilities.

Architectural Overview

Our architecture relies on four components:

  • The Android client app: The mobile application used by logistics drivers.
  • The Backend for Frontend (BFF): The logistics-driver BFF, responsible for managing driver connections.
  • The Message Bus: Our event-driven backbone, propagating changes across services.
  • The service-offers backend: Responsible for handling and notifying service assignments to drivers.

flowchart LR
    A["📱 Android App"] -->|"gRPC stream"| B["logistics-driver BFF"]
    C["service-offers backend"] -->|"service events"| D[("Message Bus")]
    D -->|"fan-out"| B

From Polling to Real-Time Streams

Previously, the Android app polled the BFF every 60 seconds to fetch updated service information. The data a driver consumes is alive: multiple upstream services write to it throughout the delivery workflow, and drivers must always see the freshest version. Polling kept the app eventually consistent with that state, but changes were slow to propagate, network traffic was high, and resource utilization was inefficient.

We implemented gRPC server streams to solve this. The backend service (service-offers) knows precisely when a service changes and notifies the client directly. We maintain a single, long-lived stream per driver. We still keep the original polling mechanism as a fallback to ensure robustness if the real-time stream or the Message Bus fails.

flowchart LR
    subgraph before ["❌ Before: Polling"]
        A1["Android App"] -->|"GET /services (every 60s)"| B1["BFF"]
        B1 -->|"response"| A1
    end
    subgraph after ["✅ After: gRPC Stream"]
        A2["Android App"] -->|"open stream"| B2["BFF"]
        B2 -->|"push on change"| A2
    end
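
On the Go side, the shape of this is a handler that parks one goroutine per driver and forwards events from an in-memory registry. A minimal sketch, with hypothetical names (pb stands in for the generated package; our actual code differs):

import "sync"

// StreamRegistry maps a driver ID to the outbound event channel of its
// single long-lived stream on this instance.
type StreamRegistry struct {
    mu      sync.RWMutex
    streams map[string]chan *pb.ServiceEvent
}

func (r *StreamRegistry) Register(driverID string) chan *pb.ServiceEvent {
    r.mu.Lock()
    defer r.mu.Unlock()
    ch := make(chan *pb.ServiceEvent, 16) // small buffer to absorb bursts
    r.streams[driverID] = ch
    return ch
}

func (r *StreamRegistry) Unregister(driverID string) {
    r.mu.Lock()
    defer r.mu.Unlock()
    delete(r.streams, driverID)
}

// WatchServices is the server-stream RPC; it blocks for the stream's lifetime.
func (s *Server) WatchServices(req *pb.WatchRequest, stream pb.Driver_WatchServicesServer) error {
    events := s.registry.Register(req.GetDriverId())
    defer s.registry.Unregister(req.GetDriverId())

    for {
        select {
        case ev := <-events:
            if err := stream.Send(ev); err != nil {
                return err // client gone; the app falls back to polling
            }
        case <-stream.Context().Done():
            return stream.Context().Err()
        }
    }
}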

The Contract Layer: A Single Source of Truth

The .proto files live in a dedicated Git repository, separate from both the backend and the mobile app. This repository is the single source of truth for the API contract, containing service definitions, message types, and all shared models.

We use buf for linting and format checking. The buf.yaml configuration enforces rules like no import cycles, consistent enum zero values, and proper formatting. Every merge request runs these checks automatically in CI.
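
For illustration, a minimal buf.yaml in this spirit might look like the following (rule names come from buf's standard lint and breaking categories; our actual configuration is more extensive):

version: v1
lint:
  use:
    - DEFAULT                           # buf's standard lint rule set
  enum_zero_value_suffix: _UNSPECIFIED  # enforce consistent enum zero values
breaking:
  use:
    - FILE                              # flag breaking changes per file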

The repository follows Semantic Versioning through a CHANGELOG.md. Every change (whether a new endpoint, a new field, or a deprecation) gets a version bump and a changelog entry.

Code Generation Pipeline

When a change is merged to master, the CI pipeline triggers code generation:

  • Go bindings: The pipeline publishes Go bindings as a versioned Go module. The backend team bumps the dependency version in go.mod to consume the new contract.

  • Kotlin bindings: A script clones the mobile repository, replaces all existing .proto files, adapts them for the Android build system (adding Java package options and adjusting import paths), commits to a new branch, and opens a Merge Request automatically via the GitLab API. The mobile team reviews through their standard code review process.

This automation eliminates manual file copying and ensures the mobile team is immediately notified of any contract change.

flowchart LR
    A["📁 Proto Repository<br/>(merge to master)"] --> B["CI Pipeline<br/>(buf lint + breaking check)"]
    B --> C["Go Module<br/>(versioned)"]
    B --> D["Kotlin script"]
    C -->|"go.mod bump"| E["🖥️ Backend (Go)"]
    D -->|"auto MR via GitLab API"| F["📱 Mobile (Kotlin)"]
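
A simplified GitLab CI sketch of that pipeline (job layout and script names are illustrative, not our actual configuration):

stages:
  - verify
  - generate

verify:
  stage: verify
  image: bufbuild/buf
  script:
    - buf lint
    - buf format --diff --exit-code
    - buf breaking --against ".git#branch=master"

generate:
  stage: generate
  image: bufbuild/buf
  rules:
    - if: $CI_COMMIT_BRANCH == "master"
  script:
    - buf generate                    # emits Go and Kotlin bindings per buf.gen.yaml
    - ./scripts/publish_go_module.sh  # hypothetical: tags and publishes the Go module
    - ./scripts/open_mobile_mr.sh     # hypothetical: clones the mobile repo and opens the MR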

Scaling gRPC Streams in Go

Connection Handling: The Fan-Out Solution

When a driver opens a server stream, the request goes to a specific BFF instance, which holds an in-memory map of active streams. The difficulty is that the Message Bus, using round-robin assignment, might send an event relevant to a driver to an instance that does not hold that driver’s active connection.

We solved this using fan-out: the broker sends a copy of every event to all subscribed instances. Each instance then:

  1. Receives the fanned-out event and reads the driver ID.
  2. Checks its own in-memory map of active streams for that driver.
  3. Forwards the message only if it holds the active gRPC connection; otherwise it discards the event with an ACK.

flowchart LR
    MB[("Message Bus")] -->|"fan-out"| I1["BFF Instance 1<br/>✅ stream active"]
    MB -->|"fan-out"| I2["BFF Instance 2<br/>❌ discard + ACK"]
    MB -->|"fan-out"| I3["BFF Instance 3<br/>❌ discard + ACK"]
    I1 -->|"gRPC push"| D["📱 Driver App"]

By localizing connection state and performing the check in memory, we achieve efficient horizontal scalability.
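
Continuing the hypothetical registry from the earlier sketch, the per-instance event handler reduces to a map lookup:

// Lookup returns the outbound channel for a driver, if this instance holds it.
func (r *StreamRegistry) Lookup(driverID string) (chan *pb.ServiceEvent, bool) {
    r.mu.RLock()
    defer r.mu.RUnlock()
    ch, ok := r.streams[driverID]
    return ch, ok
}

// handleBusEvent runs on every instance for every fanned-out Message Bus event.
func (s *Server) handleBusEvent(ev *pb.ServiceEvent) {
    ch, ok := s.registry.Lookup(ev.GetDriverId())
    if !ok {
        return // no active stream here: discard the event and ACK it
    }
    select {
    case ch <- ev: // the stream handler pushes it to the driver
    default:
        // stream is backed up: drop and let the polling fallback catch up
    }
}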

Keepalives: Bypassing the Proxy Problem

The standard gRPC keepalive mechanism relies on HTTP/2 PING frames to verify connection health. This fails in production when proxies like Nginx and Envoy sit in front of the server, because HTTP/2 PINGs are hop-by-hop: each proxy answers them on its own connection instead of forwarding them end to end.

The problem: When the server sent a PING frame, the proxy intercepted it and responded with an ACK instead of forwarding it to the client. The server incorrectly believed the connection was healthy even when the client was disconnected. Relying on server-side gRPC messages as pings was also insufficient because Nginx buffered them.

We implemented a custom, application-level keepalive by upgrading our unidirectional server stream to a bidirectional stream. This approach uses regular gRPC messages (Ping and Pong) that proxies forward reliably:

  • Client-side: Periodically sends Ping messages and expects a Pong reply. If no reply is received within a timeout, the connection is terminated.
  • Server-side: Monitors incoming Ping messages and replies with a Pong. If no Ping is received within the expected interval, the server terminates the connection.

The client pings every 10 seconds, and we rate-limit each connection to 10 pings per 10-second window on the server so that a misbehaving (but authenticated) client cannot mount a DoS attack.

sequenceDiagram
    participant C as Client
    participant P as Proxy (Nginx/Envoy)
    participant S as Server

    Note over C,S: ❌ Native HTTP/2 PING (fails through proxy)
    S->>P: HTTP/2 PING frame
    P-->>S: ACK (proxy intercepts)
    Note over C: Never receives PING — proxy hides disconnection

    Note over C,S: ✅ Custom Ping/Pong messages (works)
    C->>P: Ping message
    P->>S: Ping message
    S->>P: Pong message
    P->>C: Pong message
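
A sketch of the server half in Go, using the intervals above (message and package names are hypothetical, and the per-connection rate limit is omitted for brevity):

import (
    "time"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

const (
    pingInterval = 10 * time.Second
    pingTimeout  = 3 * pingInterval // tolerate a couple of missed pings
)

func (s *Server) ExampleStream(stream pb.ExampleService_ExampleStreamServer) error {
    pings := make(chan struct{}, 1)

    // Receive loop: answer each Ping with a Pong and feed the watchdog.
    go func() {
        for {
            req, err := stream.Recv()
            if err != nil {
                return
            }
            if req.GetPingRequest() != nil {
                select {
                case pings <- struct{}{}:
                default:
                }
                _ = stream.Send(&pb.ExampleStreamResponse{ /* Pong variant */ })
            }
        }
    }()

    // Watchdog: terminate the stream when Pings stop arriving.
    timer := time.NewTimer(pingTimeout)
    defer timer.Stop()
    for {
        select {
        case <-pings:
            timer.Reset(pingTimeout)
        case <-timer.C:
            return status.Error(codes.Unavailable, "keepalive timeout")
        case <-stream.Context().Done():
            return stream.Context().Err()
        }
    }
}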

gRPC on Android with Kotlin

On the mobile side, the proto definition drives the entire implementation. A typical service looks like this:

service ExampleService {
  rpc Example(ExampleRequest) returns (ExampleResponse);
  rpc ExampleStream(stream ExampleStreamRequest) returns (stream ExampleStreamResponse);
}

message ExampleStreamRequest {
  oneof request {
    PingRequest ping_request = 1;
  }
}

message ExampleStreamResponse {
  oneof response {
    StateChanged state_change = 1;
    ServiceAssigned service_assigned = 2;
  }
}

Single Calls

Once the project compiles, the protobuf plugin generates a Kotlin DSL builder for each message and the grpc-kotlin plugin generates a coroutine-based stub. Making a call is straightforward:

suspend fun myCall(myId: String): ExampleResponse {
    val request = exampleRequest {
        id = myId
    }
    // Unary call: suspends until the response arrives (map to a domain model as needed).
    return stub.example(request)
}

Bidirectional Streaming

For bidirectional connections, we add the stream keyword to both the request and the response in the proto definition. The generated function is no longer a suspend fun: it takes a Flow of requests and returns a Flow of responses:

public fun exampleStream(requests: Flow<ExampleStreamRequest>): Flow<ExampleStreamResponse>

We emit data through a request flow and collect responses from the response flow:

private val streamRequestFlow = MutableSharedFlow<ExampleStreamRequest>()

fun myStream(): Flow<StreamDto> {
    return stub.exampleStream(streamRequestFlow)
        .catch { _ ->
            // On stream failure, emit a default instance; it maps to Unknown below.
            emit(ExampleStreamResponse.getDefaultInstance())
        }
        .map { response ->
            when {
                response.hasStateChange() -> StreamDto.StateChangedDto(...)
                response.hasServiceAssigned() -> StreamDto.ServiceAssignedDto(...)
                else -> StreamDto.Unknown
            }
        }
}

Testing gRPC on Mobile

The grpc-kotlin plugin generates not only the client implementation but also a server base class. This means we can override service methods and return controlled responses directly in tests:

class TestExampleService : ExampleServiceGrpcKt.ExampleServiceCoroutineImplBase() {

    private var response: ExampleResponse = ExampleResponse.getDefaultInstance()

    fun setResponse(response: ExampleResponse) {
        this.response = response
    }

    override suspend fun example(
        request: ExampleRequest
    ): ExampleResponse {
        return response
    }
}

To redirect requests to this test implementation, we create an in-process server and managed channel. This is a real gRPC channel running locally, not a mock or fake:

object TestManagedChannel {
    private val exampleService = TestExampleService()
    private var channel: ManagedChannel? = null
    private var server: Server? = null

    fun setUp(): ManagedChannel {
        val serverName = UUID.randomUUID().toString()
        val serviceRegistry = MutableHandlerRegistry().apply {
            addService(exampleService)
        }

        server = InProcessServerBuilder
            .forName(serverName)
            .fallbackHandlerRegistry(serviceRegistry)
            .directExecutor()
            .build()
            .start()

        channel = InProcessChannelBuilder
            .forName(serverName)
            .directExecutor()
            .build()

        return channel!!
    }

    fun getChannel(): ManagedChannel = channel 
        ?: throw IllegalStateException("Channel not initialized. Call setUp() first.")

    fun setExampleResponse(response: ExampleResponse) {
        exampleService.setResponse(response)
    }

    fun tearDown() {
        channel?.shutdown()
        server?.shutdown()
        channel = null
        server = null
    }
}

This channel can be injected in unit or integration tests:

class ExampleTest {

    private lateinit var exampleRemoteDataSource: ExampleRemoteDataSourceImpl

    @Before
    fun setup() {
        TestManagedChannel.setUp()
        exampleRemoteDataSource = ExampleRemoteDataSourceImpl(
            channel = TestManagedChannel.getChannel()
        )
    }

    @After
    fun tearDown() {
        TestManagedChannel.tearDown()
    }

    @Test
    fun exampleTest() = runTest {
        val expectedValue = ...
        TestManagedChannel.setExampleResponse(...)
        val result = exampleRemoteDataSource.myCall(...)
        assertEquals(expectedValue, result)
    }
}

Observability

Robust observability is essential for long-lived stream connections. We monitor metrics critical to stream management, visualized on Grafana dashboards and feeding into Prometheus alerts.

In a proof of concept running 1,500 concurrent streams across three pods, resource usage stayed low:

  • Max CPU usage: 11% (12.0 millicores)
  • Max memory usage: 40% (37.7 MiB)

Memory usage is primarily attributed to goroutines, as one connection equals one goroutine in Go.

Alerts are configured for:

  • Connection Loss Detection: When the custom keepalive mechanism fails.
  • Resource Saturation: When CPU or memory usage approaches dangerous thresholds.
  • Event Fan-out Issues: When event latency in the Message Bus exceeds expected thresholds.
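
As an illustration, the first alert might be expressed as a Prometheus rule like this (metric and alert names are hypothetical):

groups:
  - name: grpc-streams
    rules:
      - alert: StreamKeepaliveTimeouts
        expr: rate(stream_keepalive_timeouts_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Driver streams are timing out faster than expected"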

Managing API Evolution

Our API started unversioned (what we now call v0). As the service model evolved, we migrated to v1 (e.g., GetServices became GetServicesV1), with the original v0 RPC still present in the proto for backward compatibility. We treat backward compatibility as a hard constraint:

  • Adding new fields: New fields get the next available field number. Existing clients ignore fields they do not recognize.
  • Deprecating fields: We mark fields with [deprecated = true], generating compiler warnings on both sides.
  • Reserving removed fields: We use the reserved keyword to prevent field number reuse, which would cause data corruption.
  • New endpoint versions: When an endpoint needs a fundamentally different shape, we add a new RPC method (e.g., GetServicesV2) rather than modifying the existing one. Both versions coexist until the old one can be retired.
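
Taken together, these conventions look roughly like this in a .proto file (names are illustrative):

service DriverService {
  // Unversioned v0 RPC, kept for backward compatibility.
  rpc GetServices(GetServicesRequest) returns (GetServicesResponse) {
    option deprecated = true;
  }
  rpc GetServicesV1(GetServicesV1Request) returns (GetServicesV1Response);
}

message ServiceInfo {
  reserved 3;             // number of a removed field, never reused
  reserved "legacy_eta";  // name of the removed field

  string id = 1;
  string eta_text = 2 [deprecated = true]; // superseded by eta_epoch_seconds
  int64 eta_epoch_seconds = 4;             // new fields take the next free number
}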

A buf breaking check validates changes against master before pushing, catching accidental breaking changes early.

This conservative approach means both teams can move at their own pace. The backend can ship a new field without waiting for the mobile team to adopt it, and the mobile team can upgrade when ready.

Pros and Cons

  • Technology. Pro: high efficiency from HTTP/2 and the Protobuf binary format. Con: standard HTTP/2 PING keepalives fail behind common proxies like Nginx and Envoy.
  • Performance. Pro: strong horizontal scalability via localized connection state and fan-out. Con: one connection means one goroutine, which raises memory consumption in Go.
  • Real-time. Pro: instant updates, eliminating the 60-second polling lag. Con: requires a custom bidirectional Ping/Pong to replace the broken native keepalive.
  • Architecture. Pro: leverages the existing Go and gRPC stack and tooling. Con: the deprecated polling mechanism must be maintained as a critical fallback.

Conclusion

gRPC bidirectional streaming delivered what we needed: sub-second push delivery, efficient resource usage, and a clean contract between backend and mobile. But the path there was not as straightforward as the gRPC documentation suggests. Two problems will hit anyone running gRPC streams behind a proxy and across multiple instances: keepalive PINGs are silently eaten by Nginx and Envoy, and the Message Bus will deliver events to instances that do not hold the driver’s connection. Both are solvable, but neither is free.

If you are considering this architecture, here is what we would tell you upfront: gRPC streaming is worth it if you need real-time push and control the proto contract end-to-end. It is not worth it if your update frequency is low enough that polling is acceptable, or if you cannot afford the operational overhead of custom keepalives and fan-out routing. In our case, the logistics scenario (drivers managing live parcel assignments in the field) made the tradeoff obvious. For a lower-frequency use case, Server-Sent Events over REST would have been simpler.

The lessons above are symptoms of a single underlying truth: long-lived streams are a different operational model from request-response. Plan for that from day one.

Hector Compañ

Senior Software Engineer

Franklin Olivares

Senior Software Engineer
