About Data Streams

Feb 15, 2024 · 2 min read · aws data

What are Data Streams

A data stream is a continuous, potentially infinite flow of data records generated over time.
Data streams can originate from various sources, such as:
- sensors
- social media feeds
- financial transactions
- website clickstreams

Key characteristics of data streams:

Continuous Flow: Records arrive without a defined end, so the data cannot be collected in full before processing begins.
High Volume: Data streams often involve a high volume of data generated in real time.
Variety: Data streams can contain diverse types of data:
- structured data (e.g., records with a fixed format)
- semi-structured data (e.g., data elements expressed in JSON or XML, but with no strict schema)
- unstructured data (e.g., text documents, images, video, audio)
Velocity: Data streams have a high velocity, meaning that data is generated and needs to be processed rapidly to derive insights in near real-time.
Real-time Processing: Due to the continuous nature of data streams and the need for timely insights, processing and analysis of data streams often occur in real-time or near real-time.
Dynamic: The characteristics of a data stream, such as volume, velocity, and variety, can change over time.
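
To make the "continuous" and "real-time" characteristics concrete, here is a minimal Python sketch of processing an unbounded stream one record at a time. The `sensor_readings` source and the `sliding_average` helper are hypothetical names invented for this illustration, not part of any AWS API. The point is only that results are emitted as records arrive, without waiting for the stream to end.

```python
import itertools
import random
from collections import deque
from typing import Iterable, Iterator


def sensor_readings(seed: int = 0) -> Iterator[float]:
    """Hypothetical unbounded source: yields one reading at a time, forever."""
    rng = random.Random(seed)
    while True:
        yield 20.0 + rng.uniform(-1.0, 1.0)


def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Emit the mean of the most recent `window` records as each record arrives."""
    recent = deque(maxlen=window)  # old records fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


# The source never ends, so the demo takes only the first five results.
first_five = list(itertools.islice(sliding_average(sensor_readings(), window=3), 5))
print(first_five)
```

Because only the last `window` records are kept in memory, the same code works whether the stream produces a hundred records or never stops.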

A stream processing platform (such as Amazon Kinesis Data Streams) partitions stream data into shards.
Each shard receives a sequence of data records, which are routed to it based on each record's partition key.
Shards are processed concurrently by various AWS compute services (EC2, AWS Lambda, EKS/ECS, EMR).
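
The routing of records to shards by partition key can be sketched as follows. Kinesis hashes each partition key with MD5 into a 128-bit integer and assigns the record to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space into equal contiguous ranges, which is a simplification (after resharding, real ranges need not be equal). The function name `shard_for_key` and the sample records are made up for illustration.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key's hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Assumption: equal-sized contiguous hash ranges, one per shard.
    return hash_value * shard_count // HASH_SPACE


# Hypothetical records as (partition_key, payload) pairs.
records = [
    ("user-1", "click"),
    ("user-2", "view"),
    ("user-1", "purchase"),
    ("user-3", "click"),
]

shards = defaultdict(list)
for key, payload in records:
    shards[shard_for_key(key, shard_count=4)].append((key, payload))

for shard_id in sorted(shards):
    print(shard_id, shards[shard_id])
```

Records that share a partition key always land on the same shard, which preserves their relative order. Records with different keys may be spread across shards and processed in parallel.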