The way many companies in the Chicago tech scene use data to achieve success may surprise you. We asked six of them, whose uses of data range from recommending cars to solving internal hiccups, how data created a recent win for their business and how they use it to measure overall success.
When working with people’s money, a company’s data needs to be kept up to date and well organized. Data Team Lead Paul Whalen and his crew of engineers at investment firm PEAK6 are on a mission to modernize the company’s critical data, helping employees throughout the business do their jobs more efficiently.
At PEAK6, the availability of data is critical to the success of every piece of software and business function. It is largely the data development team’s responsibility to ensure that data sourced from external vendors is available promptly and accurately for any internal consumer. This team is currently modernizing how we ingest data, moving from once-a-day batch jobs built on Microsoft SQL Server toward a real-time data streaming pipeline heavily reliant on Apache Kafka. We made this decision for a few reasons: it enables real-time processing using general-purpose programming languages, but more importantly, it decouples the source and processing of data from its resting state and end users. This is critical because different users can access the same type of data in very different ways, so the same data may need to be stored differently depending on how it is used.
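As a rough illustration of that decoupling (not PEAK6's actual pipeline), the sketch below uses a small in-memory class as a stand-in for a Kafka topic. The `Topic` class, the handler names, and the sample quotes are all hypothetical; a real deployment would use a Kafka client library and a broker instead. The point is only that one stream of records can feed several consumers, each keeping the data in the shape its own users need.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a Kafka topic (an append-only log)."""
    records: List[dict] = field(default_factory=list)
    subscribers: List[Callable[[dict], None]] = field(default_factory=list)

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, record: dict) -> None:
        # The producer only appends to the log; it knows nothing about storage.
        self.records.append(record)
        for handler in self.subscribers:
            handler(record)


# Two consumers store the same records in different shapes.
latest_price: Dict[str, float] = {}                    # e.g. for a live dashboard
history: Dict[str, List[float]] = defaultdict(list)    # e.g. for later analysis


def update_latest(record: dict) -> None:
    latest_price[record["symbol"]] = record["price"]


def append_history(record: dict) -> None:
    history[record["symbol"]].append(record["price"])


quotes = Topic()
quotes.subscribe(update_latest)
quotes.subscribe(append_history)

# Made-up vendor quotes, published as they "arrive".
for record in [
    {"symbol": "ABC", "price": 10.0},
    {"symbol": "XYZ", "price": 20.5},
    {"symbol": "ABC", "price": 10.25},
]:
    quotes.publish(record)
```

Adding a third consumer with yet another storage layout would not require touching the producer, which is the property the move away from batch jobs is meant to provide.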
Since the data dev team provides such a wide variety of data, it’s up to every other team, from traders to analytics to operations, to make good use of it. That breadth also makes it hard to track which specific data points are being used productively and should therefore be maintained or enhanced by the data dev team, and a few efforts have emerged to tackle this challenge. We’ve enhanced internal Python and Java libraries, which are used across the company, to expose tracking data about their usage: both how much a certain function or data point is used and the actual flow of the data, so we can track what raw data was used as an input for a more complicated calculation.
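One way such usage tracking could look in Python is a decorator that counts calls and records which raw data points feed each calculation. This is a minimal sketch under assumptions, not PEAK6's internal library: the `tracked` decorator, the counters, and the `mid_price` and `spread` functions are hypothetical names chosen for illustration.

```python
import functools
from collections import Counter
from typing import Callable, Set, Tuple

# Hypothetical in-process stores; a real library would likely ship these elsewhere.
function_calls: Counter = Counter()            # how often each function is called
data_point_reads: Counter = Counter()          # how often each raw data point is used
lineage_edges: Set[Tuple[str, str]] = set()    # (raw input, derived calculation)


def tracked(*inputs: str) -> Callable:
    """Record usage of the wrapped function and the raw data points it declares."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            function_calls[func.__name__] += 1
            for name in inputs:
                data_point_reads[name] += 1
                lineage_edges.add((name, func.__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorator


@tracked("bid", "ask")
def mid_price(bid: float, ask: float) -> float:
    return (bid + ask) / 2


@tracked("bid", "ask")
def spread(bid: float, ask: float) -> float:
    return ask - bid


mid_price(10.0, 10.5)
mid_price(20.0, 21.0)
spread(10.0, 10.5)
```

After these three calls, `function_calls` shows which calculations are actually in use, `data_point_reads` shows which raw inputs they depend on, and `lineage_edges` captures the flow from raw data to derived values.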