GridLink Architecture

GridLink is an EV Fleet Charging Management Platform built for reliability and scale.

Platform Overview

Design Principles

API / Worker Separation

The platform separates synchronous API requests from asynchronous background work:

| Component | Responsibility | Scaling Strategy |
| --- | --- | --- |
| API service | Handle client requests, validate input, return responses | Horizontal scaling behind a load balancer |
| Worker processes | Long-running tasks, optimization, batch operations | Scale based on queue depth |

This separation ensures API response times remain fast regardless of background workload.
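A minimal sketch of this separation, using an in-process queue and thread as stand-ins for the real queue and worker fleet (all names here, such as `handle_api_request` and the `optimize_schedule` task, are illustrative, not the platform's actual API):

```python
import queue
import threading

# The API thread enqueues heavy work and returns at once; a worker drains
# the queue, so slow jobs never block request handling.
jobs: "queue.Queue[dict]" = queue.Queue()
results: list = []

def handle_api_request(payload: dict) -> dict:
    """Validate input, enqueue the heavy work, respond immediately."""
    if "fleet_id" not in payload:
        return {"status": 400, "error": "fleet_id required"}
    jobs.put({"task": "optimize_schedule", "fleet_id": payload["fleet_id"]})
    return {"status": 202, "detail": "optimization queued"}

def worker() -> None:
    """Long-running consumer: processes jobs off the queue."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel tells the worker to stop
            break
        results.append(f"optimized fleet {job['fleet_id']}")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
response = handle_api_request({"fleet_id": "fleet-7"})
jobs.put(None)                   # shut the worker down for this demo
t.join()
```

The API returns `202 Accepted` before the optimization runs; queue depth, not request latency, drives worker scaling.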

Dedicated WebSocket Infrastructure

Charger connections require special handling:

  • Persistent connections - Each charger maintains a long-lived WebSocket
  • Auto-scaling - WebSocket instances scale independently based on connection count
  • Sticky sessions - The load balancer routes each charger to the same instance for the lifetime of its connection
  • Graceful handoff - During deploys, connections drain to new instances without message loss
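One way sticky routing can work is deterministic hashing: hash the charger ID so every frame for a given charger lands on the instance holding its live connection. This is an illustrative sketch (the instance names and `route` function are hypothetical), not the platform's actual load-balancer configuration:

```python
import hashlib

# Map each charger to one WebSocket instance deterministically, so repeat
# lookups for the same charger always return the same instance.
INSTANCES = ["ws-0", "ws-1", "ws-2"]

def route(charger_id: str, instances=INSTANCES) -> str:
    """Pick a WebSocket instance for a charger by hashing its ID."""
    digest = hashlib.sha256(charger_id.encode()).hexdigest()
    return instances[int(digest, 16) % len(instances)]

pinned = route("CHG-001")
```

In production this role is typically played by the load balancer itself (e.g. hash-based or cookie-based affinity) rather than application code.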

Message Queue for Resilience

Asynchronous communication via a message queue provides:

  • Decoupling - Services operate independently
  • Buffering - Spikes in load are absorbed by the queue
  • Retry logic - Failed operations are retried automatically
  • Guaranteed delivery - Messages persist until acknowledged

Key workflows using the queue:

  • Charging schedule optimization requests
  • Bulk charger commands
  • Webhook delivery
  • Report generation
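The retry and guaranteed-delivery properties above can be sketched with an at-least-once consumer: a message stays in the queue until the handler acknowledges it, and a transient failure triggers a retry. This is a simplified, assumed model (class and message shapes are invented for illustration):

```python
# At-least-once delivery sketch: a message persists until acknowledged;
# a failed handler call is retried up to max_attempts times.
class SimpleQueue:
    def __init__(self):
        self.pending = []        # messages not yet acknowledged

    def publish(self, msg):
        self.pending.append(msg)

    def consume(self, handler, max_attempts=3):
        delivered = []
        for msg in list(self.pending):
            for attempt in range(1, max_attempts + 1):
                try:
                    handler(msg)
                except Exception:
                    continue                 # transient failure: retry
                self.pending.remove(msg)     # acknowledge: safe to delete
                delivered.append(msg)
                break
        return delivered

q = SimpleQueue()
q.publish({"cmd": "start_charging", "charger": "CHG-42"})

attempts = {"n": 0}
def flaky_handler(msg):
    attempts["n"] += 1
    if attempts["n"] < 2:        # first delivery fails, retry succeeds
        raise RuntimeError("transient failure")

delivered = q.consume(flaky_handler)
```

Because delivery is at-least-once, handlers for workflows like bulk charger commands also need to be idempotent (see Data Consistency below).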

Caching Strategy

Multi-layer caching reduces database load:

| Cache Type | Use Case | TTL |
| --- | --- | --- |
| Session cache | Active charging sessions | Real-time |
| Configuration cache | Charger settings, rate structures | Minutes |
| Query cache | Dashboard aggregations | Seconds |
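The TTL behavior behind this table can be sketched as follows. This is a minimal assumed interface, not the platform's actual cache client; the key names are invented for illustration:

```python
import time

# Minimal TTL cache: entries expire after ttl seconds, forcing a re-read
# from the database on the next access.
class TTLCache:
    def __init__(self):
        self._store = {}         # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]             # lazy eviction on read
            return None
        return value

cache = TTLCache()
cache.set("charger:CHG-42:config", {"max_kw": 50}, ttl=60)   # minutes-scale config
hit = cache.get("charger:CHG-42:config")
cache.set("stale", "x", ttl=-1)                              # already expired
miss = cache.get("stale")
```

Short TTLs (query cache) trade freshness for database load; longer TTLs (configuration cache) suit data that changes rarely.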

Reliability Patterns

Health Checks & Circuit Breakers

  • All services expose health endpoints
  • Load balancers remove unhealthy instances automatically
  • External service calls use circuit breakers to prevent cascade failures
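A circuit breaker in its simplest form looks like the sketch below (thresholds and class shape are illustrative; production breakers also add a half-open state with timed recovery):

```python
# After `threshold` consecutive failures the circuit opens, and further
# calls fail fast instead of hammering a struggling downstream service.
class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, fn):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0        # any success resets the counter
        return result

breaker = CircuitBreaker(threshold=2)
outcomes = []

def failing_call():
    raise ConnectionError("downstream timeout")

for _ in range(4):
    try:
        breaker.call(failing_call)
        outcomes.append("ok")
    except ConnectionError:
        outcomes.append("failed")       # real call attempted, it failed
    except RuntimeError:
        outcomes.append("fast-fail")    # circuit open, no call made
```

After two real failures the breaker stops attempting calls, which is what prevents one slow dependency from cascading into platform-wide latency.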

Data Consistency

  • Database transactions for critical operations
  • Idempotency keys for API mutations
  • Event sourcing for charging session state
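Idempotency keys can be sketched like this (function and store names are hypothetical): retrying a mutation with the same key returns the stored result instead of performing the side effect twice.

```python
# Replaying a request with a known idempotency key returns the cached
# result; the side effect (sending the charge command) happens only once.
seen: dict = {}
charge_commands_sent = []

def start_charging(charger_id, idempotency_key):
    if idempotency_key in seen:
        return seen[idempotency_key]            # replay: no new side effect
    charge_commands_sent.append(charger_id)     # the real side effect
    result = {"charger": charger_id, "status": "started"}
    seen[idempotency_key] = result
    return result

first = start_charging("CHG-42", "key-abc")
retry = start_charging("CHG-42", "key-abc")     # client retried same request
```

This is what makes the queue's at-least-once delivery safe: a duplicate delivery of the same command is a no-op.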

Offline Tolerance

When connectivity is lost:

  • Chargers queue messages locally, replay on reconnect
  • Platform handles out-of-order message delivery
  • Transactions are reconstructed from queued events
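The replay-and-reorder behavior above can be sketched as follows. The event shape is hypothetical; the key idea is that events carry a charger-assigned sequence number, so sorting by it reconstructs the session even when the network delivers events out of order:

```python
# Reconstruct a charging session from replayed meter events that may
# arrive out of order after a reconnect.
def reconstruct_session(events):
    ordered = sorted(events, key=lambda e: e["seq"])
    # Energy delivered = final meter reading minus initial meter reading.
    energy_kwh = ordered[-1]["meter_kwh"] - ordered[0]["meter_kwh"]
    return {"events": [e["seq"] for e in ordered], "energy_kwh": energy_kwh}

# Events delivered out of order after a reconnect:
replayed = [
    {"seq": 3, "meter_kwh": 120.5},
    {"seq": 1, "meter_kwh": 110.0},
    {"seq": 2, "meter_kwh": 115.2},
]
session = reconstruct_session(replayed)
```

Sequencing at the charger, rather than trusting arrival order, is what lets the platform tolerate both gaps and reordering in delivery.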

Deployment Model

| Layer | Infrastructure |
| --- | --- |
| Edge | CDN for static assets, DDoS protection |
| Load balancing | Application LB for API, network LB for WebSockets |
| Compute | Container orchestration with auto-scaling |
| Database | Managed relational database with read replicas |
| Cache | Managed in-memory store with clustering |
| Queue | Managed message broker with persistence |

All components run in private subnets. Only load balancers are internet-facing.

High Availability

  • Multi-AZ deployment for all stateful components
  • Automated failover for database
  • Zero-downtime deployments via rolling updates
  • Connection draining for WebSocket instances