Being reactive in distributed systems is critical, but what does that really look like at scale, with terabytes or petabytes of data ingestion per day, and what does it mean for applications and deployment architecture?
There is a need to simplify. How can we build resilient, self-healing systems that run at massive scale, don't lose data, and support rigorous requirements amid the chaos of big data, partial failures, split brain, and eventual consistency? How would you build awareness and intelligence into your systems if 'everything fails all the time' were the starting point?
This talk looks at these problems differently, with reactive strategies and collaborating technologies, and how they help achieve more stable, self-aware systems.
Helena has been building large-scale, reactive, distributed cloud-based systems for many years, and distributed big data systems for the last four, choosing Scala, Akka, and Kafka as the core of them all. She will discuss how to simplify big data architecture and data flows with a collaborative set of supporting technologies.
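To give a flavor of what self-healing can mean with Scala and Akka, here is a minimal, illustrative sketch (not code from the talk; the actor names and the failure scenario are invented): a supervisor restarts a worker that chokes on a bad record instead of letting the failure cascade.

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart
import scala.concurrent.duration._

// Invented worker: fails when it sees a malformed (empty) record.
class IngestWorker extends Actor {
  def receive = {
    case record: String if record.isEmpty =>
      throw new IllegalArgumentException("empty record")
    case record: String =>
      println(s"ingested: $record")
  }
}

// The supervisor restarts the worker on failure instead of letting a single
// bad record take the whole ingestion path down -- a small form of self-healing.
class IngestSupervisor extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: IllegalArgumentException => Restart
    }

  private val worker = context.actorOf(Props[IngestWorker], "worker")

  def receive = { case msg => worker.forward(msg) }
}

object SelfHealingDemo extends App {
  val system = ActorSystem("reactive-demo")
  val supervisor = system.actorOf(Props[IngestSupervisor], "supervisor")
  supervisor ! "good-record"
  supervisor ! ""               // triggers a failure; the worker is restarted
  supervisor ! "another-record" // processing continues after the restart
}
```

The same pattern scales up: localize failures, restart the failing part, and keep the rest of the system flowing.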
In this hands-on talk and demonstration, I'll give a very short introduction to stream processing and then dive into writing code that demonstrates the features in Apache Flink that make truly robust stream processing possible. We'll focus on correctness and robustness in stream processing.
During this live demo we'll develop a realtime analytics application and modify it on the fly based on the topics we're working through. We'll exercise Flink's unique features, demonstrate fault recovery, explain why Event Time is such an important concept in robust stateful stream processing, and demonstrate the features a stream processor needs for robust stateful stream processing in production.
We'll also use a realtime analytics dashboard to visualize the results we're computing in realtime. This will allow us to easily see the effects of the code we're developing as we go along.
Some of the topics covered will be: Event Time, fault recovery, and the features needed for robust stateful stream processing in production.
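To make the Event Time point concrete, here is a minimal Flink sketch in Scala (an illustration under assumptions, not the demo code; the Event case class, the socket source, and the five-second out-of-orderness bound are invented): events are windowed by the timestamp they carry rather than by arrival time, so out-of-order data still lands in the right window.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Invented event type: id, reading, and the event-time timestamp it carries.
case class Event(sensorId: String, value: Double, timestamp: Long)

object EventTimeSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // Lines like "sensor-1,42.0,1462120000000" arriving on a local socket.
    val events: DataStream[Event] = env
      .socketTextStream("localhost", 9999)
      .map { line =>
        val Array(id, v, ts) = line.split(",")
        Event(id, v.toDouble, ts.toLong)
      }

    events
      // Watermarks tolerate events arriving up to 5 seconds out of order.
      .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(5)) {
          override def extractTimestamp(e: Event): Long = e.timestamp
        })
      .keyBy(_.sensorId)
      // Windows are defined by the event's own timestamp, not arrival time.
      .timeWindow(Time.minutes(1))
      .sum("value")
      .print()

    env.execute("event-time sketch")
  }
}
```

Feeding lines to the socket (for example with nc -lk 9999) drives the pipeline; the watermark lets late events still count toward the window they belong to.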
The success of Apache Spark is bringing developers to Scala.
For big data workloads, the JVM uses memory inefficiently, causing significant GC challenges. Spark's Project Tungsten addresses these problems with custom data layouts and code generation.
In this talk, we'll see what we've learned from Spark, what improvements are ongoing, and what we should do to improve Scala and the JVM for big data.
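As a rough illustration of where Tungsten helps (a sketch, not material from the talk; the Trade case class and the data are invented): the same aggregation expressed through the Dataset/DataFrame API runs on Tungsten's compact binary row format with generated code, while the hand-rolled RDD version keeps every record as a full JVM object on the heap.

```scala
import org.apache.spark.sql.SparkSession

// Invented record type for the illustration.
case class Trade(symbol: String, price: Double)

object TungstenSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tungsten-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val trades = Seq(Trade("AAPL", 101.2), Trade("AAPL", 99.8), Trade("GOOG", 700.0)).toDS()

    // Dataset/DataFrame path: rows are kept in Tungsten's compact binary format
    // and the aggregation runs through generated code.
    trades.groupBy($"symbol").avg("price").show()

    // RDD path: every record is a full JVM object on the heap, so more
    // allocation and GC pressure for the same computation.
    val avgBySymbol = trades.rdd
      .map(t => (t.symbol, (t.price, 1)))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }
    avgBySymbol.collect().foreach(println)

    spark.stop()
  }
}
```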
“I don’t need stream processing because I don’t have streaming data.” - Anonymous
People generally think of streams either as data models that are naturally streaming (infinite asynchronous messages) or as realtime big data processing. In this talk, Nitesh Kant will try to break this myth by emphasizing that streams exist everywhere, be it data read from sockets, protocols like HTTP, or microservice composition. He will explain how extending this ubiquitous interaction model into applications can result in simpler, more resilient, and more maintainable systems.
You will learn, through concrete examples, how to start thinking “streaming first” and how adopting this mental model makes writing applications easier.
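As one concrete reading of "streams exist everywhere" (a small sketch using Akka Streams, which is an assumption; the talk itself may use different libraries): a plain TCP socket is already a stream of bytes, and the connections themselves arrive as a stream, so the whole server is just stream composition.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Framing, Tcp}
import akka.util.ByteString

object StreamingFirstEcho extends App {
  implicit val system: ActorSystem = ActorSystem("streaming-first")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // A socket is a stream of bytes: frame it into lines, transform each line,
  // and stream the results straight back out.
  val perConnection: Flow[ByteString, ByteString, _] =
    Flow[ByteString]
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024, allowTruncation = true))
      .map(_.utf8String.toUpperCase)
      .map(line => ByteString(line + "\n"))

  // Connections themselves arrive as a stream; each element is handled
  // with the per-connection flow above.
  Tcp().bind("127.0.0.1", 8888).runForeach { connection =>
    connection.handleWith(perConnection)
  }
}
```

Connecting with a client such as nc 127.0.0.1 8888 and typing lines shows the same "transform a stream" shape as any big data pipeline, just at the scale of a single socket.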