These days, most large organizations have a plan for big data integration (see Figure 1), that is, for collecting and analyzing their big data assets from many sources. For instance, an e-commerce business can sort through its CRM databases for order logs, customer correspondence, and delivery information. It can then pair that data with historical weather records to assess how temperature affects when customers are most likely to order certain products, or how changes in weather have historically affected delivery schedules.
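As a rough illustration of pairing order logs with weather records, the sketch below joins each order to that day's high temperature and counts orders per temperature band and product. The sample orders, the `daily_high_c` table, and the `orders_by_temperature_band` function are all hypothetical; a real pipeline would read from the CRM and a weather data source rather than in-memory literals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: order logs (as if exported from a CRM) and daily highs in °C.
orders = [
    (date(2024, 7, 1), "fan"),
    (date(2024, 7, 1), "fan"),
    (date(2024, 7, 2), "umbrella"),
    (date(2024, 12, 5), "heater"),
]
daily_high_c = {date(2024, 7, 1): 33.0, date(2024, 7, 2): 24.0, date(2024, 12, 5): 2.0}


def orders_by_temperature_band(band_width: float = 10.0) -> dict[tuple[float, str], int]:
    """Join each order to that day's high temperature and count orders per (band, product)."""
    counts: dict[tuple[float, str], int] = defaultdict(int)
    for order_day, product in orders:
        high = daily_high_c.get(order_day)
        if high is None:  # no weather record for that day, so the order is skipped
            continue
        band = (high // band_width) * band_width  # lower edge of the band, e.g. 33.0 -> 30.0
        counts[(band, product)] += 1
    return dict(counts)
```

With the sample data above, the two fan orders fall into the 30 °C band, which is the kind of pattern the weather pairing is meant to surface.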
One area of interest for many companies is time-series data, meaning data tied to specific points in time or to durations of time. Common kinds of time-series data include:
- Real-valued data - Floating-point values, such as time-stamped readings from a sensor.
- Continuous data - Data that can take any value within a range (e.g., the exact times at which events occurred during an evening).
- Discrete data - Data that takes distinct, separate values, such as calendar dates and holidays.
- Discrete symbolic data - Discrete values that are symbols rather than measurements, such as document indexes and directories.
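One way to make these categories concrete is to model each as a small record type. The sketch below is a hypothetical illustration: the class names, fields, and the `in_evening` helper (with an assumed 18:00 to midnight window) are not from any particular product. It shows how a continuous time value can be checked against a range.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class SensorReading:  # real-valued: a floating-point value with a timestamp
    timestamp: datetime
    value: float


@dataclass
class EventTime:  # continuous: any instant within a range
    occurred_at: datetime


@dataclass
class CalendarEntry:  # discrete: calendar dates and holidays
    day: date
    is_holiday: bool


@dataclass
class IndexEntry:  # discrete symbolic: a label such as a document index
    timestamp: datetime
    symbol: str


def in_evening(event: EventTime, start_hour: float = 18.0, end_hour: float = 24.0) -> bool:
    """Return True if the event's time of day falls in [start_hour, end_hour)."""
    t = event.occurred_at
    hour = t.hour + t.minute / 60 + t.second / 3600  # fractional hour of the day
    return start_hour <= hour < end_hour
```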
Time-series data can be analyzed both through structured queries, which assess how two discrete data points relate to one another, and through unstructured analysis, which aggregates a broad range of data that doesn’t lend itself to simple parameters.
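To picture the difference, the sketch below contrasts a query that relates two specific points with a summary computed over the whole series. Using min/max/mean as the "aggregate" side is an assumption made for illustration. Real unstructured analysis may involve far richer techniques, and the `readings` data and function names are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical hourly sensor readings keyed by timestamp.
readings = {
    datetime(2024, 1, 1, 0, 0): 10.0,
    datetime(2024, 1, 1, 1, 0): 12.5,
    datetime(2024, 1, 1, 2, 0): 11.0,
}


def change_between(t1: datetime, t2: datetime) -> float:
    """Point-to-point query: how does the reading at t2 relate to the one at t1?"""
    return readings[t2] - readings[t1]


def summarize() -> dict[str, float]:
    """Aggregate view across the whole series rather than a single pair of points."""
    values = list(readings.values())
    return {"min": min(values), "max": max(values), "mean": mean(values)}
```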
The Industrial Internet of Things has made it possible to accumulate large volumes of streaming time-series data. Consider, for instance, a seismic survey for oil and gas exploration: a single survey can collect as much as 35 petabytes of data, which can be used to build highly accurate 3D maps. The goal is to gather and analyze this data as quickly and accurately as possible.
Big data, including time-series data, can be analyzed using the Hadoop Distributed File System (HDFS) together with a traditional or NoSQL database management system. However, such systems typically rely on batch processing, in which data is grouped into batches and processed over periods ranging from several minutes to hours. That makes them poorly suited to analyzing real-time, streaming data.
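The gap between batch and streaming processing can be shown with something as simple as an average. In the sketch below, which is a generic illustration and not the API of HDFS or any database, the batch function can only answer once the whole batch has been collected. The streaming function instead updates a running mean as each reading arrives and emits a current result immediately.

```python
from typing import Iterable, Iterator


def batch_mean(readings: list[float]) -> float:
    """Batch style: wait until the whole batch is collected, then process it once."""
    return sum(readings) / len(readings)


def streaming_means(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the mean incrementally and emit it after every reading."""
    count = 0
    mean = 0.0
    for x in readings:
        count += 1
        mean += (x - mean) / count  # running-mean update; no need to store past readings
        yield mean
```

For the readings `[2.0, 4.0, 6.0]`, `streaming_means` yields `2.0`, `3.0`, and `4.0` as each value arrives, while `batch_mean` returns only the final `4.0`.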
Instead, it can be beneficial to use an Object Database Management System (ODBMS), which stores data about real-world “objects,” such as physical locations, and keeps the relationships between data points intact, so that related data does not have to be queried separately for each use.
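The sketch below contrasts the two ways of holding a relationship, using plain Python objects as a stand-in. It does not use any ODBMS product's API, and the class names, table dictionaries, and placeholder values are hypothetical. In the object model, a sensor holds a direct reference to its location. In the table-style model, the relationship is a foreign key that must be resolved with a separate lookup every time it is used.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    name: str
    latitude: float
    longitude: float


@dataclass
class Sensor:
    sensor_id: str
    location: Location  # direct reference to the related object
    readings: list[tuple[str, float]] = field(default_factory=list)


# Table-style alternative: the relationship is stored as a foreign key
# (placeholder values) and must be resolved with a separate lookup.
locations_table = {1: ("Survey Site A", 0.0, 0.0)}
sensors_table = {"s-01": {"location_id": 1}}


def location_name_via_lookup(sensor_id: str) -> str:
    """Resolve the sensor's location by following the stored key into another table."""
    location_id = sensors_table[sensor_id]["location_id"]
    return locations_table[location_id][0]


site = Location("Survey Site A", 0.0, 0.0)
sensor = Sensor("s-01", site)
sensor.readings.append(("2024-01-01T00:00:00", 0.42))
```

Both paths reach the same location name, but `sensor.location.name` follows a stored reference directly, which is the property the paragraph above attributes to an ODBMS.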
Using an ODBMS can save up to one-third of the development effort over a traditional or NoSQL database management system, and can also dramatically reduce the storage overheads needed to maintain indices and links between objects.
Learn more about the benefits of an ODBMS for time-series data by downloading our white paper.