Network monitoring at scale is a big data problem that requires a big data platform¹. This post describes how to perform real-time analysis of network flow metadata while storing it in a data lake on AWS S3, using Spark on Databricks.

Photo by NASA on Unsplash

Background

Network visibility and analysis rely on data from the network. These days it is not uncommon for enterprise networks to carry very large amounts of traffic. Packet capture has long been the tool of choice, but it is clearly too costly to use as the preferred way of performing data analysis…

Marco Graziano

Engineer, apprentice renaissance man. I am the founder of technology start-ups in Palo Alto.
