Notion Slashes AI Embedding Costs 80% After Ditching Spark for Ray

2026/04/10 00:48

James Ding Apr 09, 2026 16:48

Notion migrated from Spark on EMR to Ray, cutting embedding costs 80% and improving query latency 10x. Uber and Salesforce shared similar AI infrastructure wins.

Notion has slashed its AI embedding pipeline costs by more than 80% after migrating from Apache Spark to Ray, the distributed computing framework backed by Anyscale. The productivity software company also achieved 10x improvements in query latency while consolidating three separate jobs per region into one.

The migration details emerged at Ray Day Seattle on April 9, 2026, where ML engineers from Notion, Uber, Salesforce, and Apple shared hard-won lessons about scaling AI infrastructure.

What Notion Actually Changed

Mickey Liu, a software engineer on Notion's search platform team, walked through the overhaul. Notion's original setup was a three-step Spark pipeline running on Amazon EMR: data chunking, third-party API calls for embedding generation, and writes to a vector store.

The pain points were predictable but severe: double compute costs, third-party API rate limits throttling throughput, and debugging nightmares when failures occurred across tools, since driver and executor logs weren't even persisted in YARN.

The new architecture streams Kafka data directly into a Ray cluster handling CPU chunking, GPU embedding generation, and vector store writes in a single pipeline. No intermediate S3 handoffs. What started as the backend for a Q&A feature in 2023 now powers all of Notion AI and custom agents.
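To make the single-pipeline shape concrete, here is a minimal sketch in plain Python. It is not Notion's code and does not use Ray or Kafka: an in-memory list stands in for the Kafka stream, a toy function stands in for the GPU embedding model, and a dictionary stands in for the vector store. Every name in it (chunk_text, embed, InMemoryVectorStore, run_pipeline) is hypothetical. The only point it illustrates is that chunks flow from stage to stage in memory, with no intermediate files between steps.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def chunk_text(doc_id: str, text: str, size: int = 20) -> Iterator[Tuple[str, str]]:
    """CPU stage: split one document into fixed-size character chunks."""
    for start in range(0, len(text), size):
        yield f"{doc_id}:{start // size}", text[start:start + size]


def embed(chunk: str, dim: int = 4) -> List[float]:
    """Stand-in for the GPU embedding stage: a deterministic toy unit vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(chunk):
        vec[i % dim] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Stand-in for a real vector store: keeps vectors in a dictionary."""

    def __init__(self) -> None:
        self.rows: Dict[str, List[float]] = {}

    def upsert(self, key: str, vector: List[float]) -> None:
        self.rows[key] = vector


def run_pipeline(events: Iterable[Tuple[str, str]], store: InMemoryVectorStore) -> int:
    """Chunk, embed, and write each event in one pass; returns chunks written."""
    written = 0
    for doc_id, text in events:  # in production this would be a stream consumer
        for key, chunk in chunk_text(doc_id, text):
            store.upsert(key, embed(chunk))
            written += 1
    return written
```

In a distributed version, each stage would presumably run on its own worker pool (CPU workers for chunking, GPU workers for embedding), but the data flow would keep this same shape.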

Uber and Salesforce Report Similar Gains

Uber's Peng Zhang detailed how their Michelangelo ML platform evolved from TensorFlow/Horovod to Ray with PyTorch. The standout move: separating CPU data-loading nodes from GPU training nodes in a heterogeneous cluster design. Result? GPU utilization jumped 20%, and training time dropped roughly 50% in select pipelines.
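The idea of decoupling data loading from training can be sketched with the standard library alone. This is an illustration under loud assumptions, not Uber's implementation: threads stand in for separate CPU and GPU nodes, a bounded queue stands in for the network hop between them, and the "training step" simply sums numbers. All function names are hypothetical.

```python
import queue
import threading
from typing import List, Tuple

SENTINEL = None  # marks the end of one loader's shard


def loader(shard: List[List[int]], out_q: "queue.Queue") -> None:
    """CPU-side worker: preprocess each batch and hand it to the trainer."""
    for batch in shard:
        out_q.put([x * 2 for x in batch])  # toy preprocessing step
    out_q.put(SENTINEL)


def trainer(in_q: "queue.Queue", num_loaders: int) -> Tuple[int, int]:
    """GPU-side worker: consume ready batches until every loader is done."""
    finished, steps, total = 0, 0, 0
    while finished < num_loaders:
        batch = in_q.get()
        if batch is SENTINEL:
            finished += 1
            continue
        total += sum(batch)  # placeholder for a real training step
        steps += 1
    return steps, total


def run_training(shards: List[List[List[int]]], prefetch: int = 4) -> Tuple[int, int]:
    """Run one loader per shard; the bounded queue caps prefetched batches."""
    q: "queue.Queue" = queue.Queue(maxsize=prefetch)
    threads = [threading.Thread(target=loader, args=(s, q)) for s in shards]
    for t in threads:
        t.start()
    result = trainer(q, len(shards))
    for t in threads:
        t.join()
    return result
```

The design choice the sketch highlights is that the consumer never waits on preprocessing as long as the loaders keep the queue full, which is the general mechanism by which separating the two roles can raise accelerator utilization.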

Salesforce tackled a different beast—summarizing documents up to 200,000 tokens long (roughly a short novel) with P95 latency under 15 seconds. Their team used Ray to chunk documents and run parallel inference across a distributed actor pool with vLLM, then merge results. They landed on 1-2 GPU data parallelism as the sweet spot after running scaling experiments directly on Ray.
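The chunk, parallel inference, and merge pattern can be outlined as follows. This is a sketch under stated assumptions rather than Salesforce's pipeline: a standard-library thread pool stands in for the Ray actor pool, a placeholder function that keeps the first few tokens stands in for the vLLM model call, and whitespace-separated words stand in for model tokens. The names and default values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_tokens(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def summarize_chunk(chunk: List[str], keep: int = 3) -> List[str]:
    """Placeholder for a model call: keep only the first `keep` tokens."""
    return chunk[:keep]


def summarize_document(text: str, chunk_size: int = 8, workers: int = 2) -> str:
    """Summarize chunks in parallel, then merge partial results in order."""
    chunks = split_tokens(text.split(), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the merge is stable.
        partials = list(pool.map(summarize_chunk, chunks))
    return " ".join(tok for part in partials for tok in part)
```

A real merge step would likely run a second summarization pass over the partial summaries instead of concatenating them; concatenation is used here only to keep the example deterministic.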

Why This Matters Beyond These Companies

Robert Nishihara, Ray's co-creator and Anyscale co-founder, opened the event by framing the core problem: AI infrastructure keeps getting harder. Multimodal data processing, reinforcement learning workloads, and multi-node LLM inference are pushing existing tools past their limits.

Every speaker landed on the same conclusion from different angles—their previous tooling ran out of road.

Apple engineers Charlie Chen and Haocheng Bian highlighted foundation model training challenges: massive unstructured data, billion-plus parameters, and sparse architectures like Mixture of Experts. Traditional engines fail because data pipelines and training frameworks run in separate environments with no shared context.

What's Next

Ray Day Seattle kicked off Anyscale's 2026 "Ray on the Road" tour, which covers eight cities across three countries. The company is also running invite-only customer roundtables at each stop to preview its product roadmap.

For teams hitting similar walls with Spark or other distributed frameworks, Notion's full technical writeup is available on their engineering blog under "Two Years of Vector Search at Notion." The 80% cost reduction and 10x latency improvement offer a concrete benchmark for anyone evaluating similar migrations.
