Software Engineer, Production Infrastructure & Marketplace Forecasting, 11/2021 - Present
Zone Aware Routing: Improved intra-availability-zone routing with an ROI of $2 million.
- Reduced inter-AZ production traffic by 40%, saving $2M annually on data transfer costs.
- Migrated 1,416 microservices serving 500K+ requests/second to Envoy load balancing subsets.
- Deprecated error-prone load balancing components in favor of Envoy load balancing subsets.
- Wrote the design spec and built Grafana/Kibana dashboards. Worked with customer teams to debug load balancing edge cases.
Demand Forecasting Improvements: Scaled ML pipeline to support a 30x increase in model granularity for marketplace pricing.
- Implemented compression and pruning across the data pipeline, reducing over-the-network bandwidth by 80%.
- Collaborated with data scientists to validate model performance, ensuring accuracy (MAPE < 10%) throughout pipeline optimization.
- Implemented model inference handlers managing data processing, imputation, and post-inference aggregation.
No More Yaml (NoMoYa): Decreased time-to-deploy networking settings from 15 minutes to less than 30 seconds.
- Implemented a new configuration API server (Go) and user interface (TypeScript + React) handling circuit breakers, health check endpoints, traffic migrations, and network dependency allow lists.
- Improved team operations through self-service SEV mitigation, reducing context switching for team members.
Control Plane Backend Sharding: Collaborated with tech lead to simplify endpoint discovery.
- Transitioned from leader-elected writers to independent writers, enhancing service reliability and simplifying the deployment pipeline.
- Implemented a new data layer on the control plane frontend and backend with zero downtime and a 99.95% mesh availability SLA.
- Parallelized the service discovery queue, reducing endpoint query latency by roughly 50%.
Operations & Leadership
- Member of the interview revamp working group, developing new interview standards to address LLM-related changes in the interview space.
- Debugged and mitigated more than 100 incidents by analyzing Kibana and Prometheus queries and SSHing into hosts to validate networking components.
- Performed technical deep dives on load balancing, conducted 30+ candidate interviews, actively participated in team planning, and mentored interns and new hires.