Architecture · Data Engineering · Big Data · Hadoop

Design and Architecture Considerations for Storing Time Series Data

Key design considerations for storing time series data, including access pattern analysis, windowed storage strategies, and the trade-offs between granularity and performance.

16 June 2014 · 2 min read

Know Your Access Patterns in Advance

Query scope: Will you analyse a full day of data or just one hour? Documenting the use cases that will drive data access is highly recommended before choosing a storage model.
Granularity: The level of detail required by client applications directly influences the underlying data model.
Ingestion frequency: Identify how fast data is produced by the source system. Are there multiple data points every second?

Windowed Storage

Although we may need to persist every data point, we rarely need to store each one as a separate record in the database.

Most time series problems share similar characteristics and differ mainly in the data model. The main challenges appear when the system has to scale, and evolving schemas add another dimension of complexity.

If we define a time window and store all readings for that period as an array, we significantly reduce the number of records persisted in the database, so reading or writing a given time range touches far fewer records.
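As a rough illustration of the idea above, the following Python sketch groups individual readings into one record per time window. The function name bucket_readings, the (epoch_seconds, value) tuple layout, and the in-memory dictionary standing in for the database are all assumptions made for this example, not the API of any particular store.

```python
from collections import defaultdict


def bucket_readings(readings, window_seconds):
    """Group (epoch_seconds, value) readings into one record per window.

    Returns {window_start: [(offset_seconds, value), ...]}, where the
    offset is measured from the start of the window.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Align the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append((ts - window_start, value))
    return dict(buckets)


# Hypothetical readings: four data points spread over two 5-minute windows.
readings = [(0, 1.0), (1, 1.1), (299, 1.2), (300, 1.3)]
records = bucket_readings(readings, window_seconds=300)
print(len(readings), "readings stored as", len(records), "records")
```

Storing offsets rather than full timestamps inside each array is one possible design choice; it keeps the array entries small because the window start is already part of the record key.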

Example: Stock tick information is generated once per second for each stock, which is 86,400 ticks per stock over a 24-hour day. Storing each tick as a separate row makes reads over a time range slow, because every tick is a separate lookup. Instead, we can group five minutes, one hour, or one full day of readings into a single vector record.
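The arithmetic behind this example can be checked with a few lines of Python. This is a minimal sketch that assumes one tick per second around the clock; the helper name records_per_day is hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_day(window_seconds, ticks_per_second=1):
    """Return (records per stock per day, ticks held in each record)."""
    records = math.ceil(SECONDS_PER_DAY / window_seconds)
    ticks_per_record = window_seconds * ticks_per_second
    return records, ticks_per_record


for label, window in [("one tick", 1), ("5 minutes", 300),
                      ("1 hour", 3600), ("1 day", SECONDS_PER_DAY)]:
    records, ticks = records_per_day(window)
    print(f"{label:>10}: {records:>6} records/day, {ticks:>6} ticks per record")
```

Under these assumptions a five-minute window turns 86,400 rows per stock per day into 288, a one-hour window into 24, and a one-day window into a single record.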

The benefits of storing information in larger chunks are clear: far fewer lookups into the NoSQL store to fetch data for a specific time range. However, there is a trade-off:

Window too small: Excessive read/write operations.
Window too large: Durability concerns. Readings that have been collected but not yet written as a complete window can be lost in the event of a system failure.

You need to balance both forces.
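To make the two forces concrete, here is a small simulation sketch. It assumes readings arrive in time order and are buffered in memory until their window is complete, which is only one possible write strategy. The WindowBuffer class and the simulate helper are hypothetical names, and a Python list stands in for database writes.

```python
class WindowBuffer:
    """Buffers readings in memory and emits one record per completed window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.current_start = None
        self.pending = []   # readings not yet persisted; lost if we crash
        self.flushed = []   # stands in for records written to the database

    def add(self, ts, value):
        start = ts - (ts % self.window_seconds)
        # A reading from a new window means the previous window is complete.
        if self.current_start is not None and start != self.current_start:
            self.flush()
        self.current_start = start
        self.pending.append((ts, value))

    def flush(self):
        if self.pending:
            self.flushed.append((self.current_start, self.pending))
            self.pending = []


def simulate(window_seconds, total_seconds):
    """Feed one reading per second; return (writes so far, readings at risk)."""
    buf = WindowBuffer(window_seconds)
    for ts in range(total_seconds):
        buf.add(ts, 100.0)
    return len(buf.flushed), len(buf.pending)


# Ten minutes of one-per-second readings under three window sizes.
for window in (1, 300, 3600):
    writes, at_risk = simulate(window, total_seconds=600)
    print(f"window={window:>4}s  writes={writes:>3}  unpersisted={at_risk:>3}")
```

In this sketch a one-second window issues 599 writes and leaves a single reading unpersisted, while a one-hour window issues no writes in the first ten minutes and leaves all 600 readings exposed to a failure. A real system could narrow that exposure, for example by appending to a durable log before buffering, but that is outside what this example models.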

No One-Size-Fits-All

There is no universal template for time series storage. Each problem is different; fine-tune the system based on your requirements and access patterns.

If access patterns change in the future, you may need to re-index the data or recalculate the array size to optimise your queries. You can apply best practices, but you cannot simply import a data modelling template from one problem to another.
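Recalculating the array size amounts to regrouping existing records under a new window. The sketch below shows one way this might look, assuming records are held as {window_start: [(offset_seconds, value), ...]} with epoch-second keys. That layout and the rebucket function name are assumptions for illustration only.

```python
from collections import defaultdict


def rebucket(records, new_window_seconds):
    """Regroup windowed records into windows of a different size.

    records maps window_start -> [(offset_seconds, value), ...].
    Offsets within each input window are assumed to be in time order.
    """
    out = defaultdict(list)
    for window_start in sorted(records):
        for offset, value in records[window_start]:
            ts = window_start + offset  # recover the absolute timestamp
            new_start = ts - (ts % new_window_seconds)
            out[new_start].append((ts - new_start, value))
    return dict(out)


# Hypothetical data: two 5-minute records merged into one hourly record.
five_minute = {0: [(0, 1.0), (299, 1.2)], 300: [(0, 1.3)], 3600: [(5, 2.0)]}
hourly = rebucket(five_minute, new_window_seconds=3600)
print(hourly)
```

On a real dataset this would be a batch job over the whole store rather than an in-memory loop, so the cost of changing the window size later is worth weighing when the first size is chosen.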

Disclosure: Ideas and analysis are my own. AI assisted with drafting and editing.