# GridLink Architecture
GridLink is an EV Fleet Charging Management Platform built for reliability and scale.
## Platform Overview
## Design Principles
### API / Worker Separation
The platform separates synchronous API requests from asynchronous background work:
| Component | Responsibility | Scaling Strategy |
|---|---|---|
| API Service | Handle client requests, validate input, return responses | Horizontal scaling behind load balancer |
| Worker Processes | Long-running tasks, optimization, batch operations | Scale based on queue depth |
This separation ensures API response times remain fast regardless of background workload.
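The split above can be sketched with an in-memory queue standing in for the message broker. This is an illustrative sketch only — names like `api_submit_optimization` and the job/result shapes are hypothetical, not the platform's actual API:

```python
import queue
import threading
import uuid

# Stand-in for the managed message broker: a thread-safe in-memory queue.
jobs: "queue.Queue[dict]" = queue.Queue()
results: dict[str, str] = {}

def api_submit_optimization(fleet_id: str) -> str:
    """API handler: validate, enqueue, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs.put({"job_id": job_id, "fleet_id": fleet_id})
    return job_id  # the client polls (or receives a webhook) for the result

def worker_loop() -> None:
    """Worker process: pull long-running tasks off the queue."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel for shutdown
            break
        # ... run the expensive schedule optimization here ...
        results[job["job_id"]] = f"schedule-for-{job['fleet_id']}"
        jobs.task_done()

worker = threading.Thread(target=worker_loop, daemon=True)
worker.start()
```

Because the handler only enqueues, its latency is independent of how long the optimization itself takes — which is the point of the separation.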
### Dedicated WebSocket Infrastructure
Charger connections require special handling:
- Persistent connections - Each charger maintains a long-lived WebSocket
- Auto-scaling - WebSocket instances scale independently based on connection count
- Sticky sessions - Load balancer routes each charger to the same instance for the lifetime of its connection
- Graceful handoff - During deploys, connections drain to new instances without message loss
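Graceful handoff can be sketched as a drain flag plus a tracked connection set. The `WebSocketInstance` class below is hypothetical; a real deployment would combine load-balancer deregistration with the WebSocket close handshake, and rely on chargers replaying queued messages when they reconnect:

```python
import threading

class WebSocketInstance:
    """Minimal sketch of per-instance connection tracking with graceful drain."""

    def __init__(self) -> None:
        self.connections: set[str] = set()
        self.draining = False
        self._lock = threading.Lock()

    def accept(self, charger_id: str) -> bool:
        with self._lock:
            if self.draining:
                return False  # LB routes new chargers to healthy instances
            self.connections.add(charger_id)
            return True

    def disconnect(self, charger_id: str) -> None:
        with self._lock:
            self.connections.discard(charger_id)

    def begin_drain(self) -> None:
        """Called at deploy time: refuse new connections and ask existing
        chargers to reconnect elsewhere (no messages are lost because the
        charger replays its local queue on reconnect)."""
        with self._lock:
            self.draining = True
        for charger_id in list(self.connections):
            # in reality: send a close frame asking the charger to reconnect
            self.disconnect(charger_id)
```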
### Message Queue for Resilience
Asynchronous communication via message queue provides:
- Decoupling - Services operate independently
- Buffering - Spikes in load are absorbed by the queue
- Retry logic - Failed operations are retried automatically
- Guaranteed delivery - Messages persist until acknowledged
Key workflows using the queue:
- Charging schedule optimization requests
- Bulk charger commands
- Webhook delivery
- Report generation
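The retry and guaranteed-delivery properties can be illustrated with a minimal at-least-once consumer, again using an in-memory queue as a stand-in for the managed broker. `consume_once`, `MAX_RETRIES`, and the dead-letter handling are illustrative assumptions, not the platform's actual semantics:

```python
import queue

broker: "queue.Queue[dict]" = queue.Queue()
MAX_RETRIES = 3            # illustrative retry budget
dead_letters: list[dict] = []

def publish(message: dict) -> None:
    message.setdefault("attempts", 0)
    broker.put(message)

def consume_once(handler) -> None:
    """Pull one message; ack (drop) on success, requeue on failure,
    and park the message for inspection once retries are exhausted."""
    message = broker.get()
    try:
        handler(message)          # e.g. deliver a webhook
    except Exception:
        message["attempts"] += 1
        if message["attempts"] < MAX_RETRIES:
            broker.put(message)   # message persists until acknowledged
        else:
            dead_letters.append(message)
    finally:
        broker.task_done()
```

A handler that keeps failing (say, an unreachable webhook endpoint) is retried up to the budget and then dead-lettered instead of blocking the queue.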
### Caching Strategy
Multi-layer caching reduces database load:
| Cache Type | Use Case | Typical TTL |
|---|---|---|
| Session cache | Active charging sessions | Sub-second (kept current) |
| Configuration cache | Charger settings, rate structures | Minutes |
| Query cache | Dashboard aggregations | Seconds |
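The per-entry TTLs in the table can be illustrated with a small lazy-eviction cache. This is a sketch only — in the platform itself the managed in-memory store handles expiry server-side:

```python
import time

class TTLCache:
    """Minimal per-entry TTL cache with lazy eviction on read."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object, ttl_seconds: float) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]   # expired: evict and report a miss
            return None
        return value
```

Configuration reads would use a minutes-long TTL, dashboard aggregations a seconds-long one, matching the table above.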
## Reliability Patterns
### Health Checks & Circuit Breakers
- All services expose health endpoints
- Load balancers remove unhealthy instances automatically
- External service calls use circuit breakers to prevent cascade failures
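A circuit breaker in the spirit of the bullet above might look like the sketch below. The `CircuitBreaker` class, its thresholds, and its half-open behavior are hypothetical, not the platform's actual implementation:

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; fail fast while open;
    allow a single trial call after a cooldown (half-open)."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```

Failing fast while the circuit is open is what prevents a slow or down dependency from tying up every caller and cascading the failure.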
### Data Consistency
- Database transactions for critical operations
- Idempotency keys for API mutations
- Event sourcing for charging session state
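Idempotency keys can be sketched as a lookup table keyed by the client-supplied key. In production this table would live in the database and be committed in the same transaction as the mutation itself; the function and field names here are illustrative:

```python
# Stand-in for a database table of (idempotency_key -> stored response).
processed: dict[str, dict] = {}

def handle_mutation(idempotency_key: str, payload: dict) -> dict:
    """Replay-safe API mutation: the first call does the work,
    any retry with the same key returns the stored response."""
    if idempotency_key in processed:
        return processed[idempotency_key]  # duplicate request: no side effects
    # ... perform the actual mutation here (e.g. start a charging session) ...
    response = {"status": "started", "charger": payload["charger_id"]}
    processed[idempotency_key] = response
    return response
```

This makes client retries safe: a request that times out can be resent with the same key without starting the session twice.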
### Offline Tolerance
When connectivity is lost:
- Chargers queue messages locally, replay on reconnect
- Platform handles out-of-order message delivery
- Transactions are reconstructed from queued events
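Reconstructing a session from replayed, possibly out-of-order events can be sketched as a fold over events sorted by a charger-assigned sequence number. The event types and field names below are illustrative assumptions, not the platform's actual schema:

```python
def reconstruct_session(events: list[dict]) -> dict:
    """Fold queued charger events into session state, tolerating
    out-of-order delivery by sorting on the sequence number."""
    state = {"energy_wh": 0, "status": "unknown"}
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["type"] == "start":
            state["status"] = "charging"
        elif event["type"] == "meter":
            state["energy_wh"] = event["energy_wh"]  # meter readings are absolute
        elif event["type"] == "stop":
            state["status"] = "complete"
    return state
```

Because the fold orders by sequence number rather than arrival time, a `stop` that arrives before a delayed `meter` reading still produces the correct final state.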
## Deployment Model
| Layer | Infrastructure |
|---|---|
| Edge | CDN for static assets, DDoS protection |
| Load Balancing | Application LB for API, Network LB for WebSockets |
| Compute | Container orchestration with auto-scaling |
| Database | Managed relational database with read replicas |
| Cache | Managed in-memory store with clustering |
| Queue | Managed message broker with persistence |
All components run in private subnets. Only load balancers are internet-facing.
### High Availability
- Multi-AZ deployment for all stateful components
- Automated failover for database
- Zero-downtime deployments via rolling updates
- Connection draining for WebSocket instances