Apr 22, 2026
Every 60 seconds, every driver in our logistics fleet was asking the same question: “Has anything changed?” That polling loop meant higher latency, wasted bandwidth, and a user experience that felt sluggish when parcels were being reassigned in real time. We needed drivers to know instantly when a service was assigned, updated, or cancelled. This is the story of how we replaced polling with gRPC bidirectional streaming across our Go backend and Android app, and what we learned about proxies, horizontal scaling, and contract management along the way.
We are part of Cabify Logistics, the team behind Cabify’s parcel delivery operations: same-day delivery (SND), express, and food delivery. Unlike ride-hailing, a logistics driver can be actively managing multiple services at once: picking up, delivering, or getting services reassigned mid-route. In that context, a 60-second polling delay is not just inefficient, it’s a real operational problem.
When we set out to build the communication layer between our Go backend (the logistics-driver Backend for Frontend, or BFF) and our Android client app, we evaluated REST with JSON, WebSockets, and gRPC. We chose gRPC for several practical reasons:
Efficiency. gRPC runs on HTTP/2 and uses Protocol Buffers as its binary serialization format. Compared to REST with JSON, this means smaller payloads and lower latency, which is critical when drivers are on mobile networks with variable connectivity.
Strong contracts. Protocol Buffers enforce a typed schema at compile time on both sides. With REST, we would need to maintain OpenAPI specs manually and hope both sides stay in sync. With gRPC, the .proto file is the contract, and code generation ensures compliance.
Native streaming support. gRPC provides bidirectional streaming as a first-class primitive, unlike REST where we would need to bolt on WebSockets or Server-Sent Events as a separate protocol with its own connection management.
Ecosystem maturity. Both Go and Kotlin have well-supported gRPC libraries with code generation plugins, interceptors, and testing utilities.
Our architecture relies on four components: the Android driver app, the logistics-driver BFF (responsible for managing driver connections), the service-offers backend (which emits service events), and a Message Bus that fans those events out to the BFF instances.

```mermaid
flowchart LR
    A["📱 Android App"] -->|"gRPC stream"| B["logistics-driver BFF"]
    C["service-offers backend"] -->|"service events"| D[("Message Bus")]
    D -->|"fan-out"| B
```
Previously, the Android app polled the BFF every 60 seconds to fetch updated service information. The data a driver consumes is alive: it changes throughout the delivery workflow, and drivers must always see the freshest version. While polling partially mitigated the “multiple writers” problem, changes were slow to propagate, network traffic was high, and resource utilization was inefficient.
We implemented gRPC server streams to solve this. The backend service (service-offers) knows precisely when a service changes and notifies the client directly. We maintain a single, long-lived stream per driver. We still keep the original polling mechanism as a fallback to ensure robustness if the real-time stream or the Message Bus fails.
```mermaid
flowchart LR
    subgraph before ["❌ Before: Polling"]
        A1["Android App"] -->|"GET /services (every 60s)"| B1["BFF"]
        B1 -->|"response"| A1
    end
    subgraph after ["✅ After: gRPC Stream"]
        A2["Android App"] -->|"open stream"| B2["BFF"]
        B2 -->|"push on change"| A2
    end
```
The .proto files live in a dedicated Git repository, separate from both the backend and the mobile app. This repository is the single source of truth for the API contract, containing service definitions, message types, and all shared models.
We use buf for linting and format checking. The buf.yaml configuration enforces rules like no import cycles, consistent enum zero values, and proper formatting. Every merge request runs these checks automatically in CI.
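A minimal `buf.yaml` along these lines covers the checks described above; the exact rule selection here is illustrative (buf's `DEFAULT` lint category includes the enum zero-value check):

```yaml
version: v1
lint:
  use:
    - DEFAULT        # includes ENUM_ZERO_VALUE_SUFFIX among other hygiene rules
breaking:
  use:
    - FILE           # flag changes that would break generated code, per file
```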
The repository follows Semantic Versioning through a CHANGELOG.md. Every change (whether a new endpoint, a new field, or a deprecation) gets a version bump and a changelog entry.
When a change is merged to master, the CI pipeline triggers code generation:
Go bindings: The pipeline publishes Go bindings as a versioned Go module. The backend team bumps the dependency version in go.mod to consume the new contract.
Kotlin bindings: A script clones the mobile repository, replaces all existing .proto files, adapts them for the Android build system (adding Java package options and adjusting import paths), commits to a new branch, and opens a Merge Request automatically via the GitLab API. The mobile team reviews through their standard code review process.
This automation eliminates manual file copying and ensures the mobile team is immediately notified of any contract change.
```mermaid
flowchart LR
    A["📁 Proto Repository<br/>(merge to master)"] --> B["CI Pipeline<br/>(buf lint + breaking check)"]
    B --> C["Go Module<br/>(versioned)"]
    B --> D["Kotlin script"]
    C -->|"go.mod bump"| E["🖥️ Backend (Go)"]
    D -->|"auto MR via GitLab API"| F["📱 Mobile (Kotlin)"]
```
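Sketched as a GitLab CI configuration, the pipeline might look like this. The job names and helper scripts are hypothetical; only the `buf` commands are standard:

```yaml
stages: [check, publish]

lint:
  stage: check
  script:
    - buf lint
    - buf format --diff --exit-code   # fail if formatting drifts

breaking:
  stage: check
  script:
    - buf breaking --against '.git#branch=master'

publish-go:
  stage: publish
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
  script:
    - ./scripts/publish-go-module.sh   # hypothetical: tags and publishes the Go bindings

sync-kotlin:
  stage: publish
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
  script:
    - ./scripts/open-mobile-mr.sh      # hypothetical: adapts protos, opens MR via GitLab API
```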
When a driver opens a server stream, the request goes to a specific BFF instance, which holds an in-memory map of active streams. The difficulty is that the Message Bus, using round-robin assignment, might send an event relevant to a driver to an instance that does not hold that driver’s active connection.
We solved this using fan-out: the broker delivers a copy of every event to all subscribed instances. Each instance then checks its in-memory map of active streams: if it holds the stream for that driver, it pushes the event over gRPC; otherwise it discards the event and ACKs it so the broker does not redeliver.

```mermaid
flowchart LR
    MB[("Message Bus")] -->|"fan-out"| I1["BFF Instance 1<br/>✅ stream active"]
    MB -->|"fan-out"| I2["BFF Instance 2<br/>❌ discard + ACK"]
    MB -->|"fan-out"| I3["BFF Instance 3<br/>❌ discard + ACK"]
    I1 -->|"gRPC push"| D["📱 Driver App"]
```
By localizing connection state and performing the check in memory, we achieve efficient horizontal scalability.
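The routing decision can be sketched in Go. The types and names below are illustrative, not our production code: each value in the map stands in for a real gRPC server stream, but the in-memory check is the same idea.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified service event as it might arrive from the Message Bus.
type Event struct {
	DriverID string
	Payload  string
}

// StreamRegistry holds the active streams on this BFF instance. In the real
// service each value wraps a gRPC server stream; a channel makes the routing
// logic visible here.
type StreamRegistry struct {
	mu      sync.RWMutex
	streams map[string]chan Event
}

func NewStreamRegistry() *StreamRegistry {
	return &StreamRegistry{streams: make(map[string]chan Event)}
}

// Register is called when a driver opens a stream against this instance.
func (r *StreamRegistry) Register(driverID string) chan Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan Event, 16)
	r.streams[driverID] = ch
	return ch
}

// HandleFanOut receives every event (the bus fans out to all instances).
// It reports whether the event was pushed to a local stream; either way the
// caller ACKs so the bus does not redeliver.
func (r *StreamRegistry) HandleFanOut(ev Event) bool {
	r.mu.RLock()
	ch, ok := r.streams[ev.DriverID]
	r.mu.RUnlock()
	if !ok {
		return false // not our driver: discard + ACK
	}
	select {
	case ch <- ev:
		return true
	default:
		return false // slow consumer: drop rather than block the bus handler
	}
}

func main() {
	reg := NewStreamRegistry()
	stream := reg.Register("driver-42")

	delivered := reg.HandleFanOut(Event{DriverID: "driver-42", Payload: "service assigned"})
	discarded := reg.HandleFanOut(Event{DriverID: "driver-99", Payload: "not ours"})

	fmt.Println(delivered, discarded) // true false
	fmt.Println((<-stream).Payload)   // service assigned
}
```

The key property is that the membership check never leaves process memory, so adding instances adds capacity without adding coordination.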
The standard gRPC keepalive mechanism relies on HTTP/2 PING frames to verify connection health. This fails in production because proxies like Nginx and Envoy sit in front of the server.
The problem: When the server sent a PING frame, the proxy intercepted it and responded with an ACK instead of forwarding it to the client. The server incorrectly believed the connection was healthy even when the client was disconnected. Relying on server-side gRPC messages as pings was also insufficient because Nginx buffered them.
We implemented a custom, application-level keepalive by upgrading our unidirectional server stream to a bidirectional stream. This approach uses regular gRPC messages (Ping and Pong) that proxies forward reliably:
The protocol works in both directions:

- The client sends periodic Ping messages and expects a Pong reply. If no reply is received within a timeout, the connection is terminated.
- The server expects periodic Ping messages and replies with a Pong. If no Ping is received within the expected interval, the server terminates the connection.

We use 10-second intervals for ping frequency, with a rate limit of 10 pings per 10-second window per connection to prevent potential DoS attacks from authenticated clients.
```mermaid
sequenceDiagram
    participant C as Client
    participant P as Proxy (Nginx/Envoy)
    participant S as Server
    Note over C,S: ❌ Native HTTP/2 PING (fails through proxy)
    S->>P: HTTP/2 PING frame
    P-->>S: ACK (proxy intercepts)
    Note over C: Never receives PING — proxy hides disconnection
    Note over C,S: ✅ Custom Ping/Pong messages (works)
    C->>P: Ping message
    P->>S: Ping message
    S->>P: Pong message
    P->>C: Pong message
```
On the mobile side, the proto definition drives the entire implementation. A typical service looks like this:
```protobuf
service ExampleService {
  rpc Example(ExampleRequest) returns (ExampleResponse);
  rpc ExampleStream(stream ExampleStreamRequest) returns (stream ExampleStreamResponse);
}

message ExampleStreamRequest {
  oneof request {
    PingRequest ping_request = 1;
  }
}

message ExampleStreamResponse {
  oneof response {
    StateChanged state_change = 1;
    ServiceAssigned service_assigned = 2;
  }
}
```
Once the project compiles, the grpc-kotlin plugin generates a Kotlin DSL builder. Making a call is straightforward:
```kotlin
suspend fun myCall(myId: String) {
    val request = exampleRequest {
        this.id = myId
    }
    val result = stub.example(request)
    result.let { content ->  // map the response message to a domain model
        ...
    }
}
```
For bidirectional connections, we add the stream keyword to both request and response in the proto definition. The generated code returns a Flow instead of a suspend fun:
```kotlin
public fun exampleStream(requests: Flow<ExampleStreamRequest>): Flow<ExampleStreamResponse>
```
We emit data through a request flow and collect responses from the response flow:
```kotlin
private val streamRequestFlow = MutableSharedFlow<ExampleStreamRequest>()

fun myStream(): Flow<StreamDto> {
    return stub.exampleStream(streamRequestFlow)
        .catch { ex ->
            emit(ExampleStreamResponse.getDefaultInstance())
        }
        .map { response ->
            when {
                response.hasStateChange() -> StreamDto.StateChangedDto(...)
                response.hasServiceAssigned() -> StreamDto.ServiceAssignedDto(...)
                else -> StreamDto.Unknown
            }
        }
}
```
The grpc-kotlin plugin generates not only the client implementation but also a server base class. This means we can override service methods and return controlled responses directly in tests:
```kotlin
class TestExampleService : ExampleServiceGrpcKt.ExampleServiceCoroutineImplBase() {

    private var response: ExampleResponse = ExampleResponse.getDefaultInstance()

    fun setResponse(response: ExampleResponse) {
        this.response = response
    }

    override suspend fun example(request: ExampleRequest): ExampleResponse {
        return response
    }
}
```
To redirect requests to this test implementation, we create an in-process server and managed channel. This is a real gRPC channel running locally, not a mock or fake:
```kotlin
object TestManagedChannel {

    private val exampleService = TestExampleService()
    private var channel: ManagedChannel? = null
    private var server: Server? = null

    fun setUp(): ManagedChannel {
        val serverName = UUID.randomUUID().toString()
        val serviceRegistry = MutableHandlerRegistry().apply {
            addService(exampleService)
        }
        server = InProcessServerBuilder
            .forName(serverName)
            .fallbackHandlerRegistry(serviceRegistry)
            .directExecutor()
            .build()
            .start()
        channel = InProcessChannelBuilder
            .forName(serverName)
            .directExecutor()
            .build()
        return channel!!
    }

    fun getChannel(): ManagedChannel = channel
        ?: throw IllegalStateException("Channel not initialized. Call setUp() first.")

    fun setExampleResponse(response: ExampleResponse) {
        exampleService.setResponse(response)
    }

    fun tearDown() {
        channel?.shutdown()
        server?.shutdown()
        channel = null
        server = null
    }
}
```
This channel can be injected in unit or integration tests:
```kotlin
class ExampleTest {

    private lateinit var exampleRemoteDataSource: ExampleRemoteDataSourceImpl

    @Before
    fun setup() {
        TestManagedChannel.setUp()
        exampleRemoteDataSource = ExampleRemoteDataSourceImpl(
            channel = TestManagedChannel.getChannel()
        )
    }

    @After
    fun tearDown() {
        TestManagedChannel.tearDown()
    }

    @Test
    fun exampleTest() = runTest {
        val expectedValue = ...
        TestManagedChannel.setExampleResponse(...)

        val result = exampleRemoteDataSource.myCall(...)

        assertEquals(expectedValue, result)
    }
}
```
Robust observability is essential for long-lived stream connections. We monitor metrics critical to stream management, visualized on Grafana dashboards and feeding into Prometheus alerts.
In a Proof of Concept running 1,500 concurrent streams across three pods, the system demonstrated efficiency:
| Metric | Value |
|---|---|
| Max CPU Usage | 11% (12.0 millicores) |
| Max Memory Usage | 40% (37.7 MiB) |
Memory usage is primarily attributed to goroutines, as one connection equals one goroutine in Go.
Prometheus alerts fire when these stream metrics deviate from their expected ranges, so degradations surface before drivers notice them.
Our API started unversioned (what we now call v0). As the service model evolved, we migrated to v1 (e.g., GetServices became GetServicesV1), with the original v0 RPC still present in the proto for backward compatibility. We treat backward compatibility as a hard constraint:
- Fields and RPCs are never deleted outright; they are marked `[deprecated = true]`, generating compiler warnings on both sides.
- Removed field numbers are protected with the `reserved` keyword to prevent field number reuse, which would cause data corruption.
- A breaking change gets a new RPC (e.g., `GetServicesV2`) rather than modifying the existing one. Both versions coexist until the old one can be retired.

A `buf` breaking check validates changes against master before pushing, catching accidental breaking changes early.
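In proto terms, these conventions look roughly like the sketch below; the service and message names are hypothetical, not our actual contract:

```protobuf
service ExampleService {
  // v0 RPC kept for backward compatibility; warns at generation time.
  rpc GetServices(GetServicesRequest) returns (GetServicesResponse) {
    option deprecated = true;
  }
  rpc GetServicesV1(GetServicesV1Request) returns (GetServicesV1Response);
}

message ServiceInfo {
  reserved 3, 4;             // removed fields: numbers can never be reused
  reserved "legacy_status";  // nor can their names
  string id = 1;
  string eta = 2 [deprecated = true];  // superseded, kept so old clients still parse
}
```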
This conservative approach means both teams can move at their own pace. The backend can ship a new field without waiting for the mobile team to adopt it, and the mobile team can upgrade when ready.
| Category | Pros | Cons |
|---|---|---|
| Technology | High efficiency from HTTP/2 and Protobuf binary format | Standard gRPC HTTP/2 PING frames fail behind common proxies like Nginx/Envoy |
| Performance | Great horizontal scalability via localized connection state and fan-out | One connection = one goroutine leads to higher memory consumption in Go |
| Real-Time | Instant updates, eliminating the 60-second polling lag | Requires implementing a custom bidirectional Ping/Pong to replace broken native keepalive |
| Architecture | Leverages existing Go and gRPC stack and tooling | Requires maintaining the deprecated polling mechanism as a critical fallback |
gRPC bidirectional streaming delivered what we needed: sub-second push delivery, efficient resource usage, and a clean contract between backend and mobile. But the path there was not as straightforward as the gRPC documentation suggests. Two problems will hit anyone running gRPC streams behind a proxy at scale: keepalive frames are silently eaten by Nginx and Envoy, and a naive fan-out architecture will route events to the wrong instance. Both are solvable, but neither is free.
If you are considering this architecture, here is what we would tell you upfront: gRPC streaming is worth it if you need real-time push and control the proto contract end-to-end. It is not worth it if your update frequency is low enough that polling is acceptable, or if you cannot afford the operational overhead of custom keepalives and fan-out routing. In our case, the logistics scenario (drivers managing live parcel assignments in the field) made the tradeoff obvious. For a lower-frequency use case, Server-Sent Events over REST would have been simpler.
The lessons above are symptoms of a single underlying truth: long-lived streams are a different operational model from request-response. Plan for that from day one.