Apr 22, 2026
Every 60 seconds, every driver in our logistics fleet was asking the same question: “Has anything changed?” That polling loop meant higher latency, wasted bandwidth, and a user experience that felt sluggish when parcels were being reassigned in real time. We needed drivers to know instantly when a service was assigned, updated, or cancelled. This is the story of how we replaced polling with gRPC bidirectional streaming across our Go backend and Android app, and what we learned about proxies, horizontal scaling, and contract management along the way.
We are part of Cabify Logistics, the team behind Cabify’s parcel delivery operations: same-day delivery (SND), express, and food delivery. Unlike ride-hailing, a logistics driver can be actively managing multiple services at once: picking up, delivering, or getting services reassigned mid-route. In that context, a 60-second polling delay is not just inefficient, it’s a real operational problem.
When we set out to build the communication layer between our Go backend (the logistics-driver Backend for Frontend, or BFF) and our Android client app, we evaluated REST with JSON, WebSockets, and gRPC. We chose gRPC for several practical reasons:
Efficiency. gRPC runs on HTTP/2 and uses Protocol Buffers as its binary serialization format. Compared to REST with JSON, this means smaller payloads and lower latency, which is critical when drivers are on mobile networks with variable connectivity.
Strong contracts. Protocol Buffers enforce a typed schema at compile time on both sides. With REST, we would need to maintain OpenAPI specs manually and hope both sides stay in sync. With gRPC, the .proto file is the contract, and code generation ensures compliance.
Native streaming support. gRPC provides bidirectional streaming as a first-class primitive, unlike REST where we would need to bolt on WebSockets or Server-Sent Events as a separate protocol with its own connection management.
Ecosystem maturity. Both Go and Kotlin have well-supported gRPC libraries with code generation plugins, interceptors, and testing utilities.
Our architecture relies on four components: the Android driver app, the logistics-driver BFF (responsible for managing driver connections), the service-offers backend (which emits service events), and a Message Bus that fans those events out to the BFF instances.

```mermaid
flowchart LR
    A["📱 Android App"] -->|"gRPC stream"| B["logistics-driver BFF"]
    C["service-offers backend"] -->|"service events"| D[("Message Bus")]
    D -->|"fan-out"| B
```
Previously, the Android app polled the BFF every 60 seconds to fetch updated service information. The data a driver consumes is alive: it changes throughout the delivery workflow, and drivers must always see the freshest version. While polling partially mitigated the “multiple writers” problem, changes were slow to propagate, network traffic was high, and resource utilization was inefficient.
We implemented gRPC server streams to solve this. The backend service (service-offers) knows precisely when a service changes and notifies the client directly. We maintain a single, long-lived stream per driver. We still keep the original polling mechanism as a fallback to ensure robustness if the real-time stream or the Message Bus fails.
```mermaid
flowchart LR
    subgraph before ["❌ Before: Polling"]
        A1["Android App"] -->|"GET /services (every 60s)"| B1["BFF"]
        B1 -->|"response"| A1
    end
    subgraph after ["✅ After: gRPC Stream"]
        A2["Android App"] -->|"open stream"| B2["BFF"]
        B2 -->|"push on change"| A2
    end
```
The .proto files live in a dedicated Git repository, separate from both the backend and the mobile app. This repository is the single source of truth for the API contract, containing service definitions, message types, and all shared models.
We use buf for linting and format checking. The buf.yaml configuration enforces rules like no import cycles, consistent enum zero values, and proper formatting. Every merge request runs these checks automatically in CI.
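A minimal `buf.yaml` along these lines covers the checks described above; the exact rule selection here is illustrative (buf's `DEFAULT` lint category includes the enum zero-value check):

```yaml
version: v1
lint:
  use:
    - DEFAULT        # includes ENUM_ZERO_VALUE_SUFFIX among other hygiene rules
breaking:
  use:
    - FILE           # flag changes that would break generated code, per file
```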
The repository follows Semantic Versioning through a CHANGELOG.md. Every change (whether a new endpoint, a new field, or a deprecation) gets a version bump and a changelog entry.
When a change is merged to master, the CI pipeline triggers code generation:
Go bindings: The pipeline publishes Go bindings as a versioned Go module. The backend team bumps the dependency version in go.mod to consume the new contract.
Kotlin bindings: A script clones the mobile repository, replaces all existing .proto files, adapts them for the Android build system (adding Java package options and adjusting import paths), commits to a new branch, and opens a Merge Request automatically via the GitLab API. The mobile team reviews through their standard code review process.
This automation eliminates manual file copying and ensures the mobile team is immediately notified of any contract change.
```mermaid
flowchart LR
    A["📁 Proto Repository<br/>(merge to master)"] --> B["CI Pipeline<br/>(buf lint + breaking check)"]
    B --> C["Go Module<br/>(versioned)"]
    B --> D["Kotlin script"]
    C -->|"go.mod bump"| E["🖥️ Backend (Go)"]
    D -->|"auto MR via GitLab API"| F["📱 Mobile (Kotlin)"]
```
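Sketched as a GitLab CI configuration, the pipeline might look like this. The job names and helper scripts are hypothetical; only the `buf` commands are standard:

```yaml
stages: [check, publish]

lint:
  stage: check
  script:
    - buf lint
    - buf format --diff --exit-code   # fail if formatting drifts

breaking:
  stage: check
  script:
    - buf breaking --against '.git#branch=master'

publish-go:
  stage: publish
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
  script:
    - ./scripts/publish-go-module.sh   # hypothetical: tags and publishes the Go bindings

sync-kotlin:
  stage: publish
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
  script:
    - ./scripts/open-mobile-mr.sh      # hypothetical: adapts protos, opens MR via GitLab API
```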
When a driver opens a server stream, the request goes to a specific BFF instance, which holds an in-memory map of active streams. The difficulty is that the Message Bus, using round-robin assignment, might send an event relevant to a driver to an instance that does not hold that driver’s active connection.
We solved this using fan-out: the broker delivers a copy of every event to all subscribed instances. Each instance then checks its in-memory map of active streams: if it holds the stream for that driver, it pushes the event over gRPC; otherwise it discards the event and ACKs it so the broker does not redeliver.

```mermaid
flowchart LR
    MB[("Message Bus")] -->|"fan-out"| I1["BFF Instance 1<br/>✅ stream active"]
    MB -->|"fan-out"| I2["BFF Instance 2<br/>❌ discard + ACK"]
    MB -->|"fan-out"| I3["BFF Instance 3<br/>❌ discard + ACK"]
    I1 -->|"gRPC push"| D["📱 Driver App"]
```
By localizing connection state and performing the check in memory, we achieve efficient horizontal scalability.
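The routing decision can be sketched in Go. The types and names below are illustrative, not our production code: each value in the map stands in for a real gRPC server stream, but the in-memory check is the same idea.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified service event as it might arrive from the Message Bus.
type Event struct {
	DriverID string
	Payload  string
}

// StreamRegistry holds the active streams on this BFF instance. In the real
// service each value wraps a gRPC server stream; a channel makes the routing
// logic visible here.
type StreamRegistry struct {
	mu      sync.RWMutex
	streams map[string]chan Event
}

func NewStreamRegistry() *StreamRegistry {
	return &StreamRegistry{streams: make(map[string]chan Event)}
}

// Register is called when a driver opens a stream against this instance.
func (r *StreamRegistry) Register(driverID string) chan Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan Event, 16)
	r.streams[driverID] = ch
	return ch
}

// HandleFanOut receives every event (the bus fans out to all instances).
// It reports whether the event was pushed to a local stream; either way the
// caller ACKs so the bus does not redeliver.
func (r *StreamRegistry) HandleFanOut(ev Event) bool {
	r.mu.RLock()
	ch, ok := r.streams[ev.DriverID]
	r.mu.RUnlock()
	if !ok {
		return false // not our driver: discard + ACK
	}
	select {
	case ch <- ev:
		return true
	default:
		return false // slow consumer: drop rather than block the bus handler
	}
}

func main() {
	reg := NewStreamRegistry()
	stream := reg.Register("driver-42")

	delivered := reg.HandleFanOut(Event{DriverID: "driver-42", Payload: "service assigned"})
	discarded := reg.HandleFanOut(Event{DriverID: "driver-99", Payload: "not ours"})

	fmt.Println(delivered, discarded) // true false
	fmt.Println((<-stream).Payload)   // service assigned
}
```

The key property is that the membership check never leaves process memory, so adding instances adds capacity without adding coordination.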
The standard gRPC keepalive mechanism relies on HTTP/2 PING frames to verify connection health. This fails in production because proxies like Nginx and Envoy sit in front of the server.
The problem: When the server sent a PING frame, the proxy intercepted it and responded with an ACK instead of forwarding it to the client. The server incorrectly believed the connection was healthy even when the client was disconnected. Relying on server-side gRPC messages as pings was also insufficient because Nginx buffered them.
We implemented a custom, application-level keepalive by upgrading our unidirectional server stream to a bidirectional stream. This approach uses regular gRPC messages (Ping and Pong) that proxies forward reliably:
The protocol works in both directions:

- The client sends periodic Ping messages and expects a Pong reply. If no reply is received within a timeout, the connection is terminated.
- The server expects periodic Ping messages and replies with a Pong. If no Ping is received within the expected interval, the server terminates the connection.

We use 10-second intervals for ping frequency, with a rate limit of 10 pings per 10-second window per connection to prevent potential DoS attacks from authenticated clients.
```mermaid
sequenceDiagram
    participant C as Client
    participant P as Proxy (Nginx/Envoy)
    participant S as Server
    Note over C,S: ❌ Native HTTP/2 PING (fails through proxy)
    S->>P: HTTP/2 PING frame
    P-->>S: ACK (proxy intercepts)
    Note over C: Never receives PING — proxy hides disconnection
    Note over C,S: ✅ Custom Ping/Pong messages (works)
    C->>P: Ping message
    P->>S: Ping message
    S->>P: Pong message
    P->>C: Pong message
```
On the mobile side, the proto definition drives the entire implementation. A typical service looks like this:
```protobuf
service ExampleService {
  rpc Example(ExampleRequest) returns (ExampleResponse);
  rpc ExampleStream(stream ExampleStreamRequest) returns (stream ExampleStreamResponse);
}

message ExampleStreamRequest {
  oneof request {
    PingRequest ping_request = 1;
  }
}

message ExampleStreamResponse {
  oneof response {
    StateChanged state_change = 1;
    ServiceAssigned service_assigned = 2;
  }
}
```
Once the project compiles, the grpc-kotlin plugin generates a Kotlin DSL builder. Making a call is straightforward:
```kotlin
suspend fun myCall(myId: String) {
    val request = exampleRequest {
        this.id = myId
    }
    val result = stub.example(request)
    result.let { content ->  // map the response message to a domain model
        ...
    }
}
```
For bidirectional connections, we add the stream keyword to both request and response in the proto definition. The generated code returns a Flow instead of a suspend fun:
```kotlin
public fun exampleStream(requests: Flow<ExampleStreamRequest>): Flow<ExampleStreamResponse>
```
We emit data through a request flow and collect responses from the response flow:
```kotlin
private val streamRequestFlow = MutableSharedFlow<ExampleStreamRequest>()

fun myStream(): Flow<StreamDto> {
    return stub.exampleStream(streamRequestFlow)
        .catch { ex ->
            emit(ExampleStreamResponse.getDefaultInstance())
        }
        .map { response ->
            when {
                response.hasStateChange() -> StreamDto.StateChangedDto(...)
                response.hasServiceAssigned() -> StreamDto.ServiceAssignedDto(...)
                else -> StreamDto.Unknown
            }
        }
}
```
The grpc-kotlin plugin generates not only the client implementation but also a server base class. This means we can override service methods and return controlled responses directly in tests:
```kotlin
class TestExampleService : ExampleServiceGrpcKt.ExampleServiceCoroutineImplBase() {

    private var response: ExampleResponse = ExampleResponse.getDefaultInstance()

    fun setResponse(response: ExampleResponse) {
        this.response = response
    }

    override suspend fun example(request: ExampleRequest): ExampleResponse {
        return response
    }
}
```
To redirect requests to this test implementation, we create an in-process server and managed channel. This is a real gRPC channel running locally, not a mock or fake:
```kotlin
object TestManagedChannel {

    private val exampleService = TestExampleService()
    private var channel: ManagedChannel? = null
    private var server: Server? = null

    fun setUp(): ManagedChannel {
        val serverName = UUID.randomUUID().toString()
        val serviceRegistry = MutableHandlerRegistry().apply {
            addService(exampleService)
        }
        server = InProcessServerBuilder
            .forName(serverName)
            .fallbackHandlerRegistry(serviceRegistry)
            .directExecutor()
            .build()
            .start()
        channel = InProcessChannelBuilder
            .forName(serverName)
            .directExecutor()
            .build()
        return channel!!
    }

    fun getChannel(): ManagedChannel = channel
        ?: throw IllegalStateException("Channel not initialized. Call setUp() first.")

    fun setExampleResponse(response: ExampleResponse) {
        exampleService.setResponse(response)
    }

    fun tearDown() {
        channel?.shutdown()
        server?.shutdown()
        channel = null
        server = null
    }
}
```
This channel can be injected in unit or integration tests:
```kotlin
class ExampleTest {

    private lateinit var exampleRemoteDataSource: ExampleRemoteDataSourceImpl

    @Before
    fun setup() {
        TestManagedChannel.setUp()
        exampleRemoteDataSource = ExampleRemoteDataSourceImpl(
            channel = TestManagedChannel.getChannel()
        )
    }

    @After
    fun tearDown() {
        TestManagedChannel.tearDown()
    }

    @Test
    fun exampleTest() = runTest {
        val expectedValue = ...
        TestManagedChannel.setExampleResponse(...)

        val result = exampleRemoteDataSource.myCall(...)

        assertEquals(expectedValue, result)
    }
}
```
Robust observability is essential for long-lived stream connections. We monitor metrics critical to stream management, visualized on Grafana dashboards and feeding into Prometheus alerts.
In a Proof of Concept running 1,500 concurrent streams across three pods, the system demonstrated efficiency:
| Metric | Value |
|---|---|
| Max CPU Usage | 11% (12.0 millicores) |
| Max Memory Usage | 40% (37.7 MiB) |
Memory usage is primarily attributed to goroutines, as one connection equals one goroutine in Go.
Prometheus alerts fire when these stream metrics deviate from their expected ranges, so degradations surface before drivers notice them.
Our API started unversioned (what we now call v0). As the service model evolved, we migrated to v1 (e.g., GetServices became GetServicesV1), with the original v0 RPC still present in the proto for backward compatibility. We treat backward compatibility as a hard constraint:
- Fields and RPCs are never deleted outright; they are marked `[deprecated = true]`, generating compiler warnings on both sides.
- Removed field numbers are protected with the `reserved` keyword to prevent field number reuse, which would cause data corruption.
- A breaking change gets a new RPC (e.g., `GetServicesV2`) rather than modifying the existing one. Both versions coexist until the old one can be retired.

A `buf` breaking check validates changes against master before pushing, catching accidental breaking changes early.
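In proto terms, these conventions look roughly like the sketch below; the service and message names are hypothetical, not our actual contract:

```protobuf
service ExampleService {
  // v0 RPC kept for backward compatibility; warns at generation time.
  rpc GetServices(GetServicesRequest) returns (GetServicesResponse) {
    option deprecated = true;
  }
  rpc GetServicesV1(GetServicesV1Request) returns (GetServicesV1Response);
}

message ServiceInfo {
  reserved 3, 4;             // removed fields: numbers can never be reused
  reserved "legacy_status";  // nor can their names
  string id = 1;
  string eta = 2 [deprecated = true];  // superseded, kept so old clients still parse
}
```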
This conservative approach means both teams can move at their own pace. The backend can ship a new field without waiting for the mobile team to adopt it, and the mobile team can upgrade when ready.
| Category | Pros | Cons |
|---|---|---|
| Technology | High efficiency from HTTP/2 and Protobuf binary format | Standard gRPC HTTP/2 PING frames fail behind common proxies like Nginx/Envoy |
| Performance | Great horizontal scalability via localized connection state and fan-out | One connection = one goroutine leads to higher memory consumption in Go |
| Real-Time | Instant updates, eliminating the 60-second polling lag | Requires implementing a custom bidirectional Ping/Pong to replace broken native keepalive |
| Architecture | Leverages existing Go and gRPC stack and tooling | Requires maintaining the deprecated polling mechanism as a critical fallback |
gRPC bidirectional streaming delivered what we needed: sub-second push delivery, efficient resource usage, and a clean contract between backend and mobile. But the path there was not as straightforward as the gRPC documentation suggests. Two problems will hit anyone running gRPC streams behind a proxy at scale: keepalive frames are silently eaten by Nginx and Envoy, and a naive fan-out architecture will route events to the wrong instance. Both are solvable, but neither is free.
If you are considering this architecture, here is what we would tell you upfront: gRPC streaming is worth it if you need real-time push and control the proto contract end-to-end. It is not worth it if your update frequency is low enough that polling is acceptable, or if you cannot afford the operational overhead of custom keepalives and fan-out routing. In our case, the logistics scenario (drivers managing live parcel assignments in the field) made the tradeoff obvious. For a lower-frequency use case, Server-Sent Events over REST would have been simpler.
The lessons above are symptoms of a single underlying truth: long-lived streams are a different operational model from request-response. Plan for that from day one.