Scala Days 2013 / Spark Streaming: Fast Distributed Stream Processing with a High-Level API



Spark Streaming is a new extension to the Spark cluster computing framework that enables high-speed, fault-tolerant stream processing through a high-level Scala API. It builds on a new execution model called "discretized streams" to provide exactly-once processing without the heavy cost of the transactions required by previous systems (e.g., Storm). This allows it to process significantly higher data rates per node while still recovering from faults in seconds. It also greatly simplifies stream programming by providing a set of high-level operators in Scala (e.g., map, filter, and window operations). Perhaps the most exciting feature of Spark Streaming, however, is that it combines seamlessly with Spark's interactive and batch processing features, allowing ad-hoc queries on stream state and programs that combine streaming and historical data. Spark Streaming scales linearly to 100 nodes and has been used to build applications including session-level metrics reporting and online machine learning.
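To make the "discretized streams" idea concrete, here is a minimal, self-contained sketch in plain Scala. It does not use Spark at all. It only models the concept under stated assumptions: incoming records are cut into fixed-interval micro-batches, each batch is processed with ordinary deterministic collection operators (flatMap, filter, map, reduce by key), and a sliding window combines the results of the last few batches. All names here (`DStreamSketch`, `discretize`, `wordCounts`, `windowedCounts`) are hypothetical and chosen for illustration. In Spark Streaming itself, the same pipeline would be written against a `DStream` (for example with `flatMap`, `map`, and `reduceByKeyAndWindow`), so treat this as a conceptual model rather than the library's API.

```scala
// Toy, in-memory model of "discretized streams" (hypothetical; NOT the Spark API).
// The input is cut into fixed-interval micro-batches, and each batch is handled
// as an ordinary, deterministic collection computation.
object DStreamSketch {
  // One input record: an arrival timestamp in milliseconds and a line of text.
  final case class Event(timeMs: Long, line: String)

  // Assign each event to batch index timeMs / intervalMs; empty intervals
  // become empty batches so that batch indices stay contiguous.
  def discretize(events: Seq[Event], intervalMs: Long): Vector[Seq[String]] =
    if (events.isEmpty) Vector.empty
    else {
      val lastBatch = (events.map(_.timeMs).max / intervalMs).toInt
      val byBatch = events.groupBy(e => (e.timeMs / intervalMs).toInt)
      Vector.tabulate(lastBatch + 1)(i => byBatch.getOrElse(i, Seq.empty).map(_.line))
    }

  // Per-batch word count: flatMap -> filter -> map -> reduce by key.
  // Because this is a pure function of the batch contents, a lost result
  // could simply be recomputed from the same input batch.
  def wordCounts(batch: Seq[String]): Map[String, Int] =
    batch
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(w => (w, 1))
      .groupBy(_._1)
      .map { case (w, pairs) => w -> pairs.map(_._2).sum }

  // Add the counts in b into a.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (w, n)) => acc.updated(w, acc.getOrElse(w, 0) + n) }

  // Sliding window: at batch i, combine the counts of batches
  // max(0, i - windowBatches + 1) through i.
  def windowedCounts(perBatch: Vector[Map[String, Int]], windowBatches: Int): Vector[Map[String, Int]] =
    perBatch.indices.toVector.map { i =>
      val from = math.max(0, i - windowBatches + 1)
      perBatch.slice(from, i + 1).foldLeft(Map.empty[String, Int])(merge)
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event(100, "spark streaming"),
      Event(900, "spark"),
      Event(1500, "scala spark"),
      Event(2300, "streaming")
    )
    val batches = discretize(events, intervalMs = 1000) // 1-second batches
    val perBatch = batches.map(wordCounts)
    val windowed = windowedCounts(perBatch, windowBatches = 2) // 2-batch window
    windowed.zipWithIndex.foreach { case (counts, i) =>
      println(s"batch $i: ${counts.toSeq.sortBy(_._1).mkString(", ")}")
    }
  }
}
```

With a 1-second interval, the four events fall into three batches. The 2-batch window at batch 1 therefore sees `spark -> 3`, `streaming -> 1`, and `scala -> 1`. The design point the sketch tries to illustrate is that every step is a deterministic transformation of a bounded batch. That property is what lets a discretized-stream system recover from failures by recomputation instead of by per-record transactions.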



Tathagata Das (tathagata.das)



IntelliFactory Offices Copyright (c) 2011-2012 IntelliFactory. All rights reserved.