Amazon Redshift Architecture and Its Components
Leader Node
A Redshift cluster has exactly one leader node, and it acts as the single connection point for all SQL tools and applications.
It stores the catalog: all the important information about tables, schemas, and how the cluster is configured.
It reads your SQL query, parses it, and uses table statistics to figure out the fastest way to run it.
It breaks the query into smaller tasks that can run in parallel and sends them to the right compute nodes and slices, spreading the work so no node gets overloaded.
It controls how each table's data is distributed across nodes: by key, evenly, or fully copied (see the CREATE TABLE sketch after this list).
It keeps the compute nodes in sync during query execution, lets many users run queries at the same time without conflicts, and keeps data consistent.
It collects the partial results coming back from the compute nodes and combines them into the final answer.
It also keeps an eye on the health of the compute nodes so the cluster can recover when one fails.
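To make the distribution styles concrete, here is a minimal sketch of the three classic options. The table and column names (sales, customer_id, and so on) are hypothetical, but the DISTSTYLE and DISTKEY syntax is standard Redshift DDL.

```sql
-- Hypothetical tables showing the three classic distribution styles.

-- KEY: rows with the same customer_id are stored on the same slice,
-- so joins and GROUP BYs on customer_id avoid shuffling data around.
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id);

-- EVEN: rows are dealt out round-robin across slices.
CREATE TABLE web_events (
    event_id   BIGINT,
    event_time TIMESTAMP
)
DISTSTYLE EVEN;

-- ALL: a full copy of the table is kept on every node,
-- handy for small dimension tables that many queries join to.
CREATE TABLE country_dim (
    country_code CHAR(2),
    country_name VARCHAR(100)
)
DISTSTYLE ALL;
```

If you leave the distribution style out entirely, newer clusters default to DISTSTYLE AUTO, where Redshift picks and adjusts the style for you; the three above are the ones you choose explicitly.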
Compute Nodes
Compute nodes store all the actual user data in the Redshift cluster.
They receive tasks from the leader node and process their assigned portion of the data.
Each compute node has its own CPU, memory, and storage to perform heavy workloads.
They run query operations in parallel and send results back to the leader node.
Compute nodes divide their work across slices for even faster processing.
They handle scanning, filtering, joining, and aggregating data locally.
The performance of the cluster increases as more compute nodes are added.
Compute nodes work independently but stay coordinated through the leader node.
They ensure large datasets can be processed quickly through parallel execution.
All user tables and column data physically reside on the compute nodes, not on the leader node; the system-table queries sketched after this list make that visible.
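If you want to see this layout from inside the cluster, Redshift's system tables expose it. The sketch below assumes you are connected as a user who can read system tables; stv_slices and stv_blocklist are standard Redshift system views, and each block they report is a 1 MB chunk of stored column data.

```sql
-- List the compute nodes and the slices each one contains.
SELECT node, slice
FROM stv_slices
ORDER BY node, slice;

-- Rough data volume per compute node: stv_blocklist has one row per
-- 1 MB block of stored column data, so counting blocks per node gives
-- megabytes used and confirms that table data lives on the compute nodes.
SELECT s.node, COUNT(*) AS mb_used
FROM stv_blocklist b
JOIN stv_slices s ON b.slice = s.slice
GROUP BY s.node
ORDER BY s.node;
```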
Node Slices
Each compute node is divided into smaller workers called slices.
Every slice gets its own portion of the node’s CPU, memory, and storage.
Slices work independently and in parallel to process data faster.
When data is loaded, rows are spread across the slices based on the table's distribution key (the per-slice row-count query after this list shows the result).
Each slice handles only its assigned chunk of data, reducing workload per slice.
More slices mean more parallelism, which improves query performance.
The number of slices depends on the node type (larger nodes = more slices).
Slices perform tasks like scanning, filtering, and joining data locally.
They send their partial results back to the compute node, which forwards them to the leader node.
Slices ensure that Redshift uses every bit of hardware inside a compute node efficiently.
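A quick way to see the per-slice split for a table is svv_diskusage, which tracks how many values each slice holds per column. The table name 'sales' below is hypothetical; filtering on col = 0 counts the first column once per row, so the sums approximate row counts per slice.

```sql
-- Row count per slice for a hypothetical 'sales' table.
-- Roughly equal counts mean the distribution key spreads rows evenly;
-- a few heavy slices point to a skewed key.
SELECT slice, SUM(num_values) AS row_count
FROM svv_diskusage
WHERE TRIM(name) = 'sales'
  AND col = 0
GROUP BY slice
ORDER BY slice;
```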
Amazon Redshift Architecture: What Happens When a User Runs a Simple SELECT Query
The leader node receives the SELECT query from the user.
It parses the SQL to understand what the user wants.
It builds a query tree (a logical structure of the query).
The optimizer checks statistics and finds the fastest way to run the query.
It creates the execution plan, the step-by-step blueprint (see the EXPLAIN sketch after these steps).
The execution engine turns the plan into compiled C++ code.
Compute nodes run the code in parallel, each working on its portion of the data.
The leader node combines the results and returns the final output to the user.
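You can watch the parse-and-plan part of this flow by asking the leader node for its plan with EXPLAIN, without running the query on the compute nodes. The query below reuses the hypothetical sales table from earlier; the output in the comment only illustrates the shape Redshift prints, and the costs and row estimates will differ on a real cluster.

```sql
-- Ask the leader node to parse, rewrite, and plan the query
-- without actually executing it on the compute nodes.
EXPLAIN
SELECT customer_id, SUM(amount) AS total_spent
FROM sales
GROUP BY customer_id;

-- Illustrative output shape (values will differ):
--   XN HashAggregate  (cost=... rows=... width=...)
--     ->  XN Seq Scan on sales  (cost=... rows=... width=...)
```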