Wednesday, October 5 • 11:20am - 12:10pm
Stream Processing with Apache Flink in Zalando's World of Microservices

In this talk we present Zalando's microservices architecture, introduce Saiki, our next-generation data integration and distribution platform on AWS, and show how we employ stream processing for near-real-time business intelligence.

Zalando is one of the largest online fashion retailers in Europe. In order to secure our future growth and remain competitive in this dynamic market, we are transitioning from a monolithic to a microservices architecture and from a hierarchical to an agile organization.

We first look at how business intelligence processes have worked inside Zalando over the past several years and then present our current approach, Saiki. It is a scalable, cloud-based data integration and distribution infrastructure that makes data from our many microservices readily available to analytical teams.

We no longer live in a world of static data sets, but are instead confronted with an endless stream of events that constantly inform us about relevant happenings from all over the enterprise. Processing these event streams enables us to do near-real-time business intelligence. In this context we evaluated Apache Flink against Apache Spark in order to choose the right stream processing framework. Given our requirements, we decided to use Flink as part of our technology stack, alongside Kafka and Elasticsearch.

With these technologies we are currently working on two use cases: a near-real-time business process monitoring solution and streaming ETL.

Monitoring our business processes enables us to check whether the Zalando platform is working from a technical standpoint. It also helps us analyze data streams on the fly, e.g. order and delivery velocities, and monitor compliance with service level agreements.
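
As a rough illustration of what an "order velocity" check could look like, here is a minimal sketch in plain Python. It is not the Flink job from the talk: the event shape, the "order_created" event type, the 60-second window and the threshold are all assumptions made for this example. It counts order events per fixed (tumbling) time window and flags windows that fall below a minimum rate.

```python
from collections import Counter

# Assumed window size in seconds; the talk does not state one.
WINDOW_SECONDS = 60


def order_velocity(events, window_seconds=WINDOW_SECONDS):
    """Count "order_created" events per tumbling window.

    `events` is an iterable of (epoch_seconds, event_type) tuples.
    Both the tuple shape and the event type name are hypothetical.
    Returns a dict mapping each window start (epoch seconds) to the
    number of orders seen in that window. Windows without any order
    do not appear in the result.
    """
    counts = Counter()
    for ts, event_type in events:
        if event_type != "order_created":
            continue
        # Align the timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


def sla_breaches(velocity, min_orders_per_window):
    """Return the window starts whose order count is below the threshold."""
    return [start for start, n in velocity.items() if n < min_orders_per_window]
```

A streaming framework would compute the same per-window counts continuously over an unbounded stream and would also have to deal with late and out-of-order events, which this batch-style sketch ignores.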

Streaming ETL, on the other hand, is used to free up resources in our relational data warehouse, which struggles with increasingly high loads. It also reduces latency and improves the scalability of the platform.

Finally, we give an outlook on our future use cases, e.g. near-real-time sales and price monitoring. Another aspect to be addressed is lowering the entry barrier to stream processing for our colleagues coming from a relational database background.

Speakers
Javier Lopez

Big Data Engineer, Zalando
Javier is a Colombian engineer from the National University of Colombia. During his bachelor studies he focused on Software Engineering, Telecommunication technologies and Business Intelligence. After working more than 7 years as a Software Engineer in different industries (Education, Banking, Entrepreneurship, among others), he decided to change career paths and started working as a Business Intelligence Engineer. Javier has worked 4+ years in...
Mihail Vieru

Big Data Engineer, Zalando
Mihail is passionate about designing and implementing highly scalable, performant and robust data processing solutions. He enjoys continuously learning and working with cutting-edge technologies. Mihail earned a Master's degree from the Humboldt University of Berlin, Germany, where he specialized in Big Data Analytics Systems, Data Warehousing and Software Engineering. As part of his studies, he worked on an optimization component for...


Wednesday October 5, 2016 11:20am - 12:10pm
Texas Ballroom