Startup offers a big-data storyteller

17.10.2014

The service can also be a potentially superior alternative to the venerable desktop PowerPoint presentation, in that it can be linked to live sources of data, and can be easily updated by multiple contributors. A storyboard can also be widely shared, with the owner retaining control of the presentation through a single, canonical copy.

Shahani-Mulligan formed ClearStory Data in 2011 to tackle the thorny problem of combining multiple, disparate data sources so that they can be easily used. In research with potential enterprise customers, the company found that 74 percent of organizations want to blend data from more than four data sources. Today, teams that collaborate across multiple data sources often must resort to sharing information by email or spreadsheet.

The commercial service, which debuted last year, provides an easy way for users to build charts from numerous sources of data, a task that can be arduous when the data comes in different formats. The product builds on Apache Spark, a data analysis platform that can work with multiple streams of data.

To use the service, the user uploads one or more sources of data -- such as a CSV file, spreadsheet or relational database -- or provides an API link to a live data source. The service then offers a number of ways to combine and visualize the data or to perform basic mathematical operations, such as finding averages.
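To make the idea of "blending" concrete, here is a minimal sketch of combining two sources on a shared key and computing an average. It is not ClearStory's implementation; the two CSV snippets, the `region` key, and the helper names `load` and `blend_average` are hypothetical, and only Python's standard library is used.

```python
import csv
import io
from statistics import mean

# Two hypothetical sources that share a "region" column (illustrative data only).
SALES_CSV = """region,revenue
north,120
south,80
north,100
"""

TARGETS_CSV = """region,target
north,200
south,100
"""


def load(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def blend_average(sales, targets, key="region", value="revenue", target="target"):
    """Average `value` per key in `sales`, then attach the matching target."""
    groups = {}
    for row in sales:
        groups.setdefault(row[key], []).append(float(row[value]))
    target_by_key = {row[key]: float(row[target]) for row in targets}
    return {
        k: {"avg_revenue": mean(vals), "target": target_by_key.get(k)}
        for k, vals in groups.items()
    }


result = blend_average(load(SALES_CSV), load(TARGETS_CSV))
print(result)
```

Grouping before joining keeps the average tied to the finer-grained source, while the coarser source (one target per region) is simply looked up.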

To ease the ingestion of data from multiple sources, the company built what it calls a data inference and profiling engine, which can make many basic assumptions about how a new set of data should best be formatted. The service also builds a set of metadata covering aspects such as time or location.
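The article does not describe how ClearStory's inference engine works. As a rough illustration of the general idea, the sketch below guesses each column's type by trying progressively looser types against every non-empty value. The type order (int, float, ISO date, string), the `%Y-%m-%d` date format and the function names are all assumptions made for this example.

```python
from datetime import datetime


def _is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False


def _is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def _is_date(s, fmt="%Y-%m-%d"):
    # Assumption: dates arrive as ISO-style YYYY-MM-DD strings.
    try:
        datetime.strptime(s, fmt)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Return the narrowest type that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for name, check in (("int", _is_int), ("float", _is_float), ("date", _is_date)):
        if all(check(v) for v in non_empty):
            return name
    return "string"


def profile(columns):
    """Map each column name to its inferred type."""
    return {name: infer_type(vals) for name, vals in columns.items()}


# Hypothetical sample columns.
sample = {
    "order_date": ["2014-10-17", "2014-10-18", ""],
    "amount": ["12.50", "8", "3.25"],
    "units": ["1", "2", "3"],
    "city": ["Berlin", "Palo Alto", "Boston"],
}
print(profile(sample))
```

A column inferred as `date` is the kind of field a profiler could then tag as time metadata, in the spirit of the time and location metadata the article mentions.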
