Thursday, December 11, 2025

Redis Interview Keypoints


Architecture & Execution

✅ 1. Redis executes commands in a single thread “Redis uses an event loop to process all client requests sequentially, ensuring atomicity without locks.”

✅ 2. Redis uses non‑blocking I/O multiplexing “Redis leverages epoll/kqueue/select to handle thousands of concurrent connections efficiently.”

✅ 3. Redis guarantees atomic operations “Because commands are executed sequentially, Redis ensures atomicity without explicit transactions.”

✅ 4. Redis supports multiple data structures “Strings, lists, sets, sorted sets, hashes, streams, bitmaps, and hyperloglogs are natively supported.”

✅ 5. Redis optimizes small collections with encodings “Small lists, hashes, and sets use compact encodings (ziplist/listpack, intset) to save memory.”

✅ 6. Redis supports pub/sub messaging “Clients can publish and subscribe to channels for real‑time message distribution.”

✅ 7. Redis supports Lua scripting “Lua scripts run atomically inside Redis, enabling complex operations without race conditions.”

✅ 8. Redis supports transactions with MULTI/EXEC “Commands queued in MULTI are executed atomically when EXEC is called.”
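
A minimal MULTI/EXEC session inside redis-cli might look like this (page:views is just an example key; replies shown are illustrative):

127.0.0.1:6379> MULTI
OK
127.0.0.1:6379> INCR page:views
QUEUED
127.0.0.1:6379> EXPIRE page:views 60
QUEUED
127.0.0.1:6379> EXEC
1) (integer) 1
2) (integer) 1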

✅ 9. Redis supports pipelines “Clients can batch multiple commands to reduce round‑trip latency.”

✅ 10. Redis supports streams for event data “Streams provide append‑only log structures with consumer groups for processing.”

Persistence

✅ 11. Redis supports RDB snapshots “RDB creates point‑in‑time dumps of the dataset into dump.rdb.”

✅ 12. Redis supports AOF logging “AOF logs every write operation into appendonly.aof for durability.”

✅ 13. Redis supports hybrid persistence “RDB + AOF can be enabled together for balance between speed and durability.”

✅ 14. Redis forks child processes for persistence “On BGSAVE or BGREWRITEAOF, Redis forks a child to handle disk writes.”

✅ 15. Redis supports configurable snapshot intervals “save 900 1 means snapshot every 900 seconds if at least 1 key changed.”

✅ 16. Redis supports appendfsync policies “Options are always, everysec, or no, balancing durability vs performance.”
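
As a quick sketch, the active policy can be inspected and changed at runtime (everysec is the usual durability/performance compromise):

redis-cli CONFIG GET appendfsync
redis-cli CONFIG SET appendfsync everysec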

✅ 17. Redis supports AOF rewrite “BGREWRITEAOF compacts the log by rewriting it in the background.”

✅ 18. Redis supports truncated AOF recovery “aof-load-truncated yes allows Redis to start even if AOF is incomplete.”

✅ 19. Redis supports stop‑writes on persistence error “stop-writes-on-bgsave-error yes makes Redis reject writes when background saves fail, so persistence problems are not silently ignored.”

✅ 20. Redis persistence files are stored in dir “Both RDB and AOF files are written to the configured data directory.”


Replication

✅ 21. Redis supports master‑replica replication “Replicas asynchronously copy data from the master.”

✅ 22. Redis supports partial resynchronization “Replicas can catch up using replication backlog without full resync.”

✅ 23. Redis supports replica promotion “SLAVEOF NO ONE promotes a replica to master.”

✅ 24. Redis supports replica authentication “masterauth config ensures replicas authenticate to the master.”

✅ 25. Redis supports min‑replicas‑to‑write “Writes are rejected unless at least the configured number of replicas are connected and sufficiently up to date.”

✅ 26. Redis supports min‑replicas‑max‑lag “Defines maximum lag allowed before writes are stopped.”

✅ 27. Redis supports diskless replication “repl-diskless-sync yes streams data directly without temporary files.”

✅ 28. Redis supports replication backlog buffer “Backlog stores recent writes for replicas to catch up after disconnects.”

✅ 29. Redis supports chained replication “Replicas can replicate from other replicas, not just the master.”

✅ 30. Redis supports replica read‑only mode “By default, replicas serve read queries but reject writes.”

High Availability (Sentinel & Cluster)

✅ 31. Redis Sentinel monitors masters “Sentinel detects failures and promotes replicas automatically.”

✅ 32. Redis Sentinel reconfigures clients “Clients are updated with new master info after failover.”

✅ 33. Redis Sentinel supports quorum “Failover requires majority agreement among Sentinels.”

✅ 34. Redis Cluster shards data “Cluster divides 16,384 hash slots across masters.”

✅ 35. Redis Cluster supports replicas per master “Each master has replicas for redundancy.”

✅ 36. Redis Cluster supports automatic rebalancing “Slots can be migrated between nodes to balance load.”

✅ 37. Redis Cluster supports resharding “Keys can be moved between slots during scaling.”

✅ 38. Redis Cluster supports gossip protocol “Nodes exchange state info via gossip messages.”

✅ 39. Redis Cluster supports failover “Replicas are promoted if a master fails.”

✅ 40. Redis Cluster requires full slot coverage “By default, cluster stops serving if slots are missing.”

Memory Management

✅ 41. Redis enforces maxmemory limits “maxmemory sets a hard RAM cap for the dataset.”

✅ 42. Redis supports eviction policies “Options include allkeys-lru, volatile-ttl, and noeviction.”
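
For illustration, a memory cap and eviction policy can be set at runtime (the 2gb value is only an example; persist the settings in redis.conf to survive restarts):

redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG GET maxmemory-policy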

✅ 43. Redis supports maxmemory‑samples “Defines how many keys are sampled for eviction decisions.”

✅ 44. Redis supports memory fragmentation monitoring “INFO memory shows fragmentation ratio for tuning.”

✅ 45. Redis supports lazy freeing “lazyfree-lazy-eviction frees memory asynchronously.”

✅ 46. Redis supports memory allocator tuning “Redis can use jemalloc or libc malloc for memory management.”

✅ 47. Redis supports memory usage command “MEMORY USAGE key shows memory consumed by a key.”

✅ 48. Redis supports memory stats “MEMORY STATS provides allocator and fragmentation details.”

✅ 49. Redis supports memory doctor “MEMORY DOCTOR suggests fixes for memory fragmentation.”

✅ 50. Redis supports eviction notifications “Clients can subscribe to keyspace events for evictions.”



Performance & Monitoring

✅ 51. Redis supports INFO command “Provides runtime stats: memory, clients, persistence, replication, CPU, keyspace.”

✅ 52. Redis supports SLOWLOG “Captures queries slower than threshold for tuning.”

✅ 53. Redis supports latency monitor “Tracks operations exceeding configured latency thresholds.”

✅ 54. Redis supports MONITOR command “Streams all commands in real time for debugging.”

✅ 55. Redis supports CLIENT LIST “Shows connected clients with IP, state, and flags.”

✅ 56. Redis supports CLIENT KILL “Terminates misbehaving or unauthorized client connections.”

✅ 57. Redis supports CONFIG GET/SET “Allows runtime inspection and modification of config parameters.”

✅ 58. Redis supports DEBUG commands “Commands like DEBUG OBJECT and DEBUG SLEEP help diagnose issues (DEBUG SEGFAULT deliberately crashes the server for testing).”

✅ 59. Redis supports command stats “INFO commandstats shows per‑command call counts and latencies.”

✅ 60. Redis supports keyspace stats “INFO keyspace shows keys per DB and TTL distribution.”


Security

✅ 61. Redis supports requirepass “Enforces password authentication for clients.”

✅ 62. Redis supports ACLs (Redis 6+) “ACLs define users, commands, and key patterns allowed.”
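
A minimal ACL sketch (appuser, the password, and the app:* key pattern are made-up examples; quotes keep the shell from interpreting > and *):

redis-cli ACL SETUSER appuser on '>S3cretPass' '~app:*' +get +set
redis-cli ACL GETUSER appuser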

✅ 63. Redis supports ACL WHOAMI “Shows which user is currently authenticated.”

✅ 64. Redis supports ACL LIST “Lists all ACL rules configured.”

✅ 65. Redis supports rename‑command “Disables or renames dangerous commands like FLUSHALL.”

✅ 66. Redis supports protected‑mode “Enabled by default, prevents unsafe external access.”

✅ 67. Redis supports TLS encryption “Redis can encrypt client‑server traffic with TLS.”

✅ 68. Redis supports firewall binding “bind restricts Redis to specific IP addresses.”

✅ 69. Redis supports port configuration “Default port is 6379, configurable via port.”

✅ 70. Redis supports AUTH command “Clients authenticate with password or ACL user credentials.”


Advanced Features

✅ 71. Redis supports modules “Modules extend Redis with custom data types and commands.”

✅ 72. Redis supports GEO commands “Stores and queries geospatial data using sorted sets.”

✅ 73. Redis supports BIT operations “Bitmaps allow efficient storage and manipulation of binary data.”

✅ 74. Redis supports HyperLogLog “Provides approximate cardinality estimation with low memory usage.”

✅ 75. Redis supports Bloom filters via modules “Modules like RedisBloom add probabilistic data structures.”

✅ 76. Redis supports JSON via modules “RedisJSON allows storing and querying JSON documents.”

✅ 77. Redis supports graph data via modules “RedisGraph enables graph queries using Cypher syntax.”

✅ 78. Redis supports time series via modules “RedisTimeSeries provides efficient time series storage and queries.”

✅ 79. Redis supports search via modules “RediSearch adds full‑text search and secondary indexing.”

✅ 80. Redis supports AI inference via modules “RedisAI integrates ML models for in‑database inference.”

Operational Practices


✅ 81. Redis supports online configuration changes “CONFIG SET allows parameters to be updated at runtime without restarting the server.”

✅ 82. Redis supports runtime inspection of configs “CONFIG GET retrieves current configuration values for verification.”

✅ 83. Redis supports persistence checks “INFO persistence shows last save time, AOF status, and background save progress.”

✅ 84. Redis supports backup via file copy “Copying dump.rdb or appendonly.aof provides a consistent backup snapshot.”

✅ 85. Redis supports restore by file placement “Placing RDB/AOF files back in the data directory restores the dataset on restart.”

✅ 86. Redis supports keyspace notifications “Clients can subscribe to events like set, expire, or evict for monitoring.”
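
For example, expiration events can be enabled and watched like this (DB 0 assumed; adding the e flag would also report eviction events):

redis-cli CONFIG SET notify-keyspace-events Ex
redis-cli PSUBSCRIBE '__keyevent@0__:expired'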

✅ 87. Redis supports database selection “SELECT <db> switches between logical databases (default is DB 0).”

✅ 88. Redis supports flushing databases “FLUSHDB clears one DB, FLUSHALL clears all DBs — dangerous in production.”

✅ 89. Redis supports migration commands “MIGRATE moves keys between instances atomically with copy or replace options.”

✅ 90. Redis supports renaming keys “RENAME changes a key’s name; RENAMENX only if new name doesn’t exist.”

Advanced Administration

✅ 91. Redis supports cluster resharding tools “redis-cli --cluster reshard moves slots between nodes during scaling.”

✅ 92. Redis supports cluster health checks “redis-cli --cluster check validates slot coverage and node states.”

✅ 93. Redis supports sentinel failover testing “SENTINEL failover <master> forces a manual failover for testing.”

✅ 94. Redis supports sentinel monitoring commands “SENTINEL masters and SENTINEL slaves list monitored nodes.”

✅ 95. Redis supports client pause “CLIENT PAUSE temporarily blocks clients for controlled failover or maintenance.”

✅ 96. Redis supports command renaming for safety “Critical commands can be renamed or disabled to prevent accidental misuse.”

✅ 97. Redis supports eviction statistics “INFO stats shows key eviction counts for monitoring memory pressure.”

✅ 98. Redis supports key expiration checks “TTL key shows remaining time before a key expires.”

✅ 99. Redis supports cluster slot mapping “CLUSTER KEYSLOT key shows which slot a key belongs to.”

✅ 100. Redis supports cluster key migration “CLUSTER GETKEYSINSLOT retrieves keys in a specific slot for migration.”

Redis DBA Daily Commands


1. Service Management

sudo systemctl start redis → Start the Redis service if it’s stopped.
sudo systemctl stop redis → Stop Redis safely (use before maintenance).
sudo systemctl restart redis → Restart Redis after config changes.
systemctl status redis → Check if Redis is running and view recent logs.

2. Connectivity & Basic Ops

redis-cli → Open the Redis command-line interface.
redis-cli ping → Quick health check; returns PONG if Redis is alive.
redis-cli SET mykey "hello" → Store a key/value pair.
redis-cli GET mykey → Retrieve the value of a key.

3. Monitoring & Health

redis-cli INFO → Full server stats (memory, clients, persistence, replication).
redis-cli INFO memory → Focused memory usage report.
redis-cli CLIENT LIST → Show all connected clients (IP, state, etc.).
redis-cli MONITOR → Stream all commands in real time (use carefully in prod).
redis-cli SLOWLOG get 10 → Show the 10 most recent slow queries.

4. Persistence & Backup

redis-cli SAVE → Force an immediate RDB snapshot (blocking).
redis-cli BGSAVE → Trigger background snapshot (non-blocking).
redis-cli INFO persistence → Check persistence status (last save, AOF enabled).

Backup files:
dump.rdb → RDB snapshot file.
appendonly.aof → AOF log file.

5. Memory & Key Management

redis-cli DBSIZE → Count how many keys are in the database.
redis-cli KEYS '*' → List all keys (avoid in large DBs, can block; see the SCAN alternative after this list).
redis-cli DEL mykey → Delete a specific key.
redis-cli EXPIRE mykey 60 → Set a 60‑second TTL on a key.
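
As a safer alternative to KEYS mentioned above, redis-cli can iterate keys incrementally with SCAN (the user:* pattern is just an example):

redis-cli --scan --pattern 'user:*'
redis-cli --scan --pattern 'user:*' | wc -l   # approximate count of matching keys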

6. Replication & High Availability

redis-cli INFO replication → Show replication role (master/replica) and status.
redis-cli SLAVEOF NO ONE → Promote a replica to master.
redis-cli SLAVEOF <master_ip> <master_port> → Make this node a replica of another.

7. Cluster Administration (if enabled)

redis-cli -c CLUSTER NODES → List all cluster nodes and their roles.
redis-cli -c CLUSTER INFO → Cluster health and slot coverage.
redis-cli --cluster check <ip:port> → Validate cluster configuration.

8. Security & Access

redis-cli -a <password> → Authenticate with password.
redis-cli ACL LIST → Show all ACL rules (Redis 6+).
redis-cli ACL WHOAMI → Show which user you’re logged in as.

Redis Daily Health‑Check Runbook

1. Service Status
systemctl status redis
Confirms Redis service is running and shows recent logs.

2. Connectivity
redis-cli ping
Quick test; should return PONG if Redis is alive.

3. Memory & Resource Usage
redis-cli INFO memory
Check total memory used, peak memory, fragmentation ratio.
Watch for memory close to system limits.

4. Persistence
redis-cli INFO persistence
Verify RDB/AOF status, last save time, and whether background saves are succeeding.

5. Replication / HA
redis-cli INFO replication
Shows if this node is master or replica.
Check replica lag and connected replicas.

6. Clients
redis-cli CLIENT LIST | wc -l
Count connected clients.
Useful to detect spikes or stuck connections.

7. Slow Queries
redis-cli SLOWLOG get 5
Review the last 5 slow queries.
Helps identify performance bottlenecks.

8. Keyspace Stats
redis-cli INFO keyspace
Shows number of keys per database and how many have TTLs.
Useful for monitoring growth and expiration behavior.

9. General Server Info
redis-cli INFO server
Check Redis version, uptime, and role.
Confirms you’re running the expected build.

10. Optional Cluster Checks (if enabled)
redis-cli -c CLUSTER INFO
redis-cli -c CLUSTER NODES
Verify cluster health, slot coverage, and node states.

✅ Daily Routine Summary

Service status → Is Redis running?
Ping test → Is it responsive?
Memory check → Is usage healthy?
Persistence check → Are RDB/AOF saves working?
Replication check → Are replicas in sync?
Client count → Any unusual spikes?
Slow queries → Any performance issues?
Keyspace stats → Growth and TTL behavior.
Server info → Version, uptime, role.
Cluster info → Slot coverage and node health (if clustered).

Thursday, December 4, 2025

Kubernetes: Complete Feature Summary for Executive Decision‑Makers


Kubernetes is a full‑scale platform that modernizes how applications are deployed, scaled, secured, and operated. It delivers value across eight major capability areas, each directly tied to business outcomes.

1. Reliability & High Availability

Self‑healing containers
Automatic failover
Rolling updates & instant rollbacks
Health checks (liveness/readiness probes)
Multi‑node clustering
ReplicaSets for redundancy
Business impact: Keeps applications online, reduces outages, and improves customer experience.

2. Scalability & Performance

Horizontal Pod Autoscaling (HPA)
Vertical Pod Autoscaling (VPA)
Cluster Autoscaler
Built‑in load balancing
Resource quotas & limits
Business impact: Handles traffic spikes automatically and optimizes resource usage.

3. Security & Compliance

Role‑Based Access Control (RBAC)
Network Policies
Secrets encryption
Pod Security Standards
Image scanning & signing
Namespace isolation
Audit logging
Business impact: Strengthens security posture and supports compliance requirements.

4. Automation & DevOps Enablement

CI/CD integration
GitOps workflows
Automated deployments & rollbacks
Declarative configuration
Infrastructure as Code (IaC)
Business impact: Accelerates delivery, reduces manual errors, and standardizes operations.

5. Environment Standardization

Namespaces for dev/test/prod
Consistent container images
ConfigMaps & Secrets for environment configs
Multi‑OS container support (CentOS, Ubuntu, Debian, etc.)
Business impact: Eliminates “works on my machine” issues and improves developer productivity.

6. Cost Optimization

Efficient bin‑packing
Autoscaling to reduce idle resources
Spot instance support
Multi‑cloud flexibility
High container density
Business impact: Lowers infrastructure costs and prevents over‑provisioning.

7. Multi‑Cloud & Hybrid Cloud Flexibility

Runs on AWS, Azure, GCP, on‑prem, or hybrid
No vendor lock‑in
Disaster recovery across regions
Edge computing support
Business impact: Future‑proofs the organization and enables global deployments.

8. Observability & Monitoring

Metrics (Prometheus, Metrics Server)
Logging (ELK, Loki)
Tracing (Jaeger, OpenTelemetry)
Dashboards (Grafana, Lens)
Business impact: Improves visibility, speeds up troubleshooting, and supports data‑driven decisions.

Wednesday, November 26, 2025

Kubernetes Knowledge Transfer Pack


1. Namespaces
Definition: Logical partitions in a cluster, used to separate environments or teams.
Commands:
kubectl get namespaces
kubectl create namespace dev-team
kubectl delete namespace dev-team

2. Pods
Definition: Smallest deployable unit in Kubernetes, wraps one or more containers.

Commands:
kubectl get pods
kubectl get pods --all-namespaces
kubectl describe pod <pod-name>
kubectl delete pod <pod-name>

3. Containers
Definition: Actual running processes inside pods (Docker/containerd images).

Commands:
kubectl logs <pod-name> -c <container-name>
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh

4. Deployments
Definition: Controller that manages pods, scaling, and rolling updates.

Commands:
kubectl create deployment nginx-deploy --image=nginx
kubectl scale deployment nginx-deploy --replicas=5
kubectl get deployments
kubectl delete deployment nginx-deploy

5. Services
Definition: Provides stable networking to pods.
Types: ClusterIP, NodePort, LoadBalancer.

Commands:
kubectl expose deployment nginx-deploy --port=80 --target-port=80 --type=ClusterIP
kubectl get svc
kubectl delete svc nginx-deploy

6. ConfigMaps
Definition: Store non‑confidential configuration data.

Commands:
kubectl create configmap app-config --from-literal=ENV=prod
kubectl get configmaps
kubectl describe configmap app-config

7. Secrets
Definition: Store sensitive data (passwords, tokens).

Commands:
kubectl create secret generic db-secret --from-literal=DB_PASSWORD=banking123
kubectl get secrets
kubectl describe secret db-secret

8. Volumes & Storage
Definition: Persistent storage for pods.

Commands:
kubectl get pv
kubectl get pvc --all-namespaces

9. StatefulSets
Definition: Manage stateful apps (databases, Kafka).

Commands:
kubectl apply -f redis-statefulset.yaml
kubectl get statefulsets

10. DaemonSets
Definition: Ensures one pod runs on every node (logging, monitoring).

Commands:
kubectl get daemonsets -n kube-system

11. Jobs & CronJobs
Job: Runs pods until completion.
CronJob: Runs jobs on a schedule.

Commands:
kubectl create job pi --image=perl -- perl -Mbignum=bpi -wle 'print bpi(2000)'
kubectl get jobs
kubectl create cronjob hello --image=busybox --schedule="*/1 * * * *" -- echo "Hello World"
kubectl get cronjobs

12. Ingress
Definition: Manages external HTTP/HTTPS access to services.

Commands:
kubectl apply -f ingress.yaml
kubectl get ingress

🏗 Kubernetes Architecture

Control Plane Components

API Server → Entry point for all requests.

etcd → Cluster state database.

Controller Manager → Ensures desired state.

Scheduler → Assigns pods to nodes.

Node Components

Kubelet → Agent ensuring containers run.

Kube-proxy → Networking rules.

Container Runtime → Runs containers (Docker, containerd).

Add‑ons
CoreDNS → DNS service discovery.

CNI Plugin (Flannel/Calico) → Pod networking.

Metrics Server → Resource monitoring.

📊 Monitoring & Health Commands

kubectl get nodes -o wide
kubectl get pods --all-namespaces -w
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
kubectl top nodes
kubectl top pods
systemctl status kubelet
systemctl status containerd

Start and Stop Kubernetes Services


Proper Stop/Start Cycle

1. Stop services:

sudo systemctl stop kubelet
sudo systemctl stop containerd

2. Verify stopped:

systemctl status kubelet
systemctl status containerd

How to Confirm They’re Really Down

# Check kubelet process
ps -ef | grep kubelet

# Check container runtime process
ps -ef | grep containerd

# List running containers (if using containerd)
sudo crictl ps

# If using Docker runtime
sudo docker ps

3. Start services again:

sudo systemctl start containerd
sudo systemctl start kubelet

4. Verify Recovery

After a minute or two, check again:

kubectl get nodes
kubectl get pods -n kube-system
kubectl get componentstatuses


How to Build Multi‑Environment Pods with CentOS, Ubuntu, Debian, and More


Step 1: Create Namespaces

kubectl create namespace mqmdev
kubectl create namespace mqmtest
kubectl create namespace mqmprod

Step 2: Create Pods (example with 7 OS containers)

Save YAML files (mqmdev.yaml, mqmtest.yaml, mqmprod.yaml) with multiple containers inside each pod.

Example for mqmtest:

apiVersion: v1
kind: Pod
metadata:
  name: mqmtest-pod
  namespace: mqmtest
spec:
  containers:
  - name: centos-container
    image: centos:7
    command: ["/bin/bash", "-c", "sleep infinity"]
  - name: redhat-container
    image: registry.access.redhat.com/ubi8/ubi
    command: ["/bin/bash", "-c", "sleep infinity"]
  - name: ubuntu-container
    image: ubuntu:22.04
    command: ["/bin/bash", "-c", "sleep infinity"]
  - name: debian-container
    image: debian:stable
    command: ["/bin/bash", "-c", "sleep infinity"]
  - name: fedora-container
    image: fedora:latest
    command: ["/bin/bash", "-c", "sleep infinity"]
  - name: oraclelinux-container
    image: oraclelinux:8
    command: ["/bin/bash", "-c", "sleep infinity"]
  - name: alpine-container
    image: alpine:latest
    command: ["/bin/sh", "-c", "sleep infinity"]

Apply:

kubectl apply -f mqmdev.yaml
kubectl apply -f mqmtest.yaml
kubectl apply -f mqmprod.yaml

Step 3: Check Pod Status

kubectl get pods -n mqmdev
kubectl get pods -n mqmtest
kubectl get pods -n mqmprod

Step 4: List Container Names Inside a Pod

kubectl get pod mqmtest-pod -n mqmtest -o jsonpath="{.spec.containers[*].name}"

Output example:

centos-container redhat-container ubuntu-container debian-container fedora-container oraclelinux-container alpine-container

Step 5: Connect to a Particular Container

Use kubectl exec with -c <container-name>:

# CentOS
kubectl exec -it mqmtest-pod -n mqmtest -c centos-container -- /bin/bash

# Red Hat UBI
kubectl exec -it mqmtest-pod -n mqmtest -c redhat-container -- /bin/bash

# Ubuntu
kubectl exec -it mqmtest-pod -n mqmtest -c ubuntu-container -- /bin/bash

# Debian
kubectl exec -it mqmtest-pod -n mqmtest -c debian-container -- /bin/bash

# Fedora
kubectl exec -it mqmtest-pod -n mqmtest -c fedora-container -- /bin/bash

# Oracle Linux
kubectl exec -it mqmtest-pod -n mqmtest -c oraclelinux-container -- /bin/bash

# Alpine (use sh instead of bash)
kubectl exec -it mqmtest-pod -n mqmtest -c alpine-container -- /bin/sh

Step 6: Verify OS Inside Container

Once inside, run:
cat /etc/os-release

This confirms which OS environment you’re connected to.

✅ Summary

Create namespaces → mqmdev, mqmtest, mqmprod.
Apply pod YAMLs with 7 containers each.
Check pod status → kubectl get pods -n <namespace>.
List container names → kubectl get pod <pod> -n <namespace> -o jsonpath=....
Connect to container → kubectl exec -it ... -c <container-name> -- /bin/bash.
Verify OS → cat /etc/os-release.

Nodes in Kubernetes: The Unsung Heroes of Container Orchestration


What is a Node in Kubernetes?
A Node is a worker machine in Kubernetes. It can be a physical server or a virtual machine in the cloud. Nodes are where your pods (and therefore your containers) actually run.

Think of it like this:

Cluster = a team of machines.
Node = one machine in that team.
Pod = a unit of work scheduled onto a node.
Container = the actual application process inside the pod.


🔹 Node Components
Each node runs several critical services:

Kubelet → Agent that talks to the control plane and ensures pods are running.
Container Runtime → Runs containers (Docker, containerd, CRI‑O).
Kube‑proxy → Manages networking rules so pods can communicate with each other and with services.

🔹 Types of Nodes

Control Plane Node → Runs cluster management components (API server, etcd, scheduler, controller manager).
Worker Node → Runs user workloads (pods and containers).

🔹 Example: Checking Nodes

When you run:

[root@centosmqm ~]# kubectl get nodes
NAME        STATUS   ROLES           AGE   VERSION
centosmqm   Ready    control-plane   28h   v1.30.14

Explanation of Output:
NAME → centosmqm → the hostname of your node.
STATUS → Ready → the node is healthy and can accept pods.
ROLES → control-plane → this node is acting as the master/control plane, not a worker.
AGE → 28h → the node has been part of the cluster for 28 hours.
VERSION → v1.30.14 → the Kubernetes version running on this node.

👉 In this example, your cluster currently has one node (centosmqm), and it is the control plane node. If you added worker nodes, they would also appear in this list with roles like <none> or worker.

🔹 Commands to Work with Nodes

# List all nodes
kubectl get nodes

# Detailed info about a node
kubectl describe node centosmqm

# Show nodes with more details (IP, OS, version)
kubectl get nodes -o wide

✅ Summary

A Node is the machine (VM or physical) that runs pods.
Nodes can be control plane (managing the cluster) or worker nodes (running workloads).
Your example shows a single control-plane node named centosmqm, which is healthy and running Kubernetes v1.30.14.

Thursday, November 13, 2025

How to Copy an AMI Across AWS Regions: A Step-by-Step Guide


Step-by-Step Guide: Copying an AMI to a Different AWS Region:

Step 1: Create an AMI from Your EC2 Instance

Go to the EC2 Dashboard in the AWS Console.
Select your instance → click Actions → Image and templates → Create image.
Enter a name and description.
Choose whether to reboot the instance (enable the “No reboot” option to avoid downtime).
Click Create image.

Step 2: Wait for the AMI to Become Available

Navigate to AMIs in the EC2 Dashboard.
Monitor the status until it changes to Available.

Step 3: Copy the AMI to Another Region

In the AMIs section, select your AMI.
Click Actions → Copy AMI.
Choose the destination region.
Optionally rename the AMI and configure encryption.
Click Copy AMI.
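
The same copy can be scripted with the AWS CLI; this is a sketch with placeholder regions and AMI ID:

aws ec2 copy-image \
  --source-region us-east-1 \
  --source-image-id ami-0123456789abcdef0 \
  --region eu-west-1 \
  --name "my-ami-copy"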

Step 4: Switch to the Destination Region

Change your region in the AWS Console to the target region.
Go to AMIs and confirm the copied AMI is listed and available.

Step 5: Launch an EC2 Instance from the Copied AMI

Select the copied AMI → click Launch instance.
Configure instance details, storage, and security groups.
Launch the instance.

Friday, November 7, 2025

Creating EC2 Launch Template for Auto Scaling Web Server


Step 1: Create a Launch Template

Go to EC2 Dashboard → Launch Templates → Create launch template

Fill in:

Name: WebServerTemplate
AMI: Amazon Linux 2 (or your preferred AMI)
Instance Type: t2.micro/t3.micro (Free Tier Eligible)
Key Pair: Select your key
Security Group: Must allow HTTP (port 80)

User Data:

#!/bin/bash
# User data already runs as root, so no sudo/su is needed
# Update installed packages
yum update -y

# Install Apache HTTP server
yum install -y httpd

# Install Stress Command
yum install -y stress

# Start the httpd service
systemctl start httpd

# Optional: Create a simple index.html page
echo "<h1>Hello from EC201</h1>" > /var/www/html/index.html

# Enable httpd to start on boot
chkconfig httpd on

✅ Enable Auto Scaling guidance: At the bottom of the form, check the box labeled “Provide guidance for Auto Scaling”. This ensures the template is optimized for use with Auto Scaling Groups.

Click Create launch template
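
If you prefer the CLI, a rough equivalent looks like this (AMI ID, key pair, and security group are placeholders, and UserData must be base64-encoded):

aws ec2 create-launch-template \
  --launch-template-name WebServerTemplate \
  --launch-template-data '{"ImageId":"ami-0123456789abcdef0","InstanceType":"t3.micro","KeyName":"my-key","SecurityGroupIds":["sg-0123456789abcdef0"],"UserData":"<base64-encoded user data>"}'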

Thursday, October 23, 2025

How to Change the Backup Location in pgBackRest for PostgreSQL 15


This guide walks you through updating pgBackRest to use a new backup directory — in this case, /u01/pgbackrest.

📁 Step 1: Create the New Backup Directory

mkdir -p /u01/pgbackrest
chown postgres:postgres /u01/pgbackrest
chmod 750 /u01/pgbackrest

⚙️ Step 2: Update pgBackRest Configuration

Edit the config file:

vi /etc/pgbackrest.conf

Update the [global] section to use the new path:


[global]
repo1-path=/u01/pgbackrest
repo1-retention-full=2
repo1-retention-diff=4
log-level-console=info
log-level-file=debug
log-path=/var/log/pgbackrest

[pg15]
pg1-path=/u01/postgres/pgdata_15/data
pg1-port=5432
pg1-user=postgres

🔐 Step 3: Set Permissions on Config File

chmod 640 /etc/pgbackrest.conf
chown postgres:postgres /etc/pgbackrest.conf

🧱 Step 4: Recreate the Stanza

Since the backup location changed, you need to recreate the stanza:
sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 stanza-create

🔍 Step 5: Verify the New Location

Run a check to confirm everything is working:

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 check

✅ You should see a message like:

INFO: WAL segment ... successfully archived to '/u01/pgbackrest/archive/pg15/...'

💾 Step 6: Run a Test Backup

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --type=full backup

📊 Step 7: Confirm Backup Storage

Check that backups are stored in the new location:

ls -l /u01/pgbackrest/pg15/backup

🧹 Optional: Clean Up Old Backup Location

If you no longer need the old backups:


rm -rf /var/lib/pgbackrest

⚠️ Only do this if you're sure the new backups are working and complete.

pgBackRest Setup and PostgreSQL 15 Backup Guide


This guide walks you through installing, configuring, backing up, restoring, and automating PostgreSQL 15 backups using pgBackRest.

Installation

# Add PostgreSQL Yum repo

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Disable built-in PostgreSQL module

sudo dnf -qy module disable postgresql

# Install pgBackRest

sudo dnf install -y pgbackrest --nobest


# Verify installation

which pgbackrest

# Output: /usr/bin/pgbackrest

1. Create Backup Directory

mkdir -p /u01/pgbackrest
chown postgres:postgres /u01/pgbackrest
chmod 750 /u01/pgbackrest

2. Configure pgBackRest

Backup and edit the config file:

cp /etc/pgbackrest.conf /etc/pgbackrest.conf_bkp

vi /etc/pgbackrest.conf

[global]
repo1-path=/u01/pgbackrest
repo1-retention-full=2
repo1-retention-diff=4
log-level-console=info
log-level-file=debug
log-path=/var/log/pgbackrest

[pg15]
pg1-path=/u01/postgres/pgdata_15/data
pg1-port=5432
pg1-user=postgres

3. Set Permissions

chmod 640 /etc/pgbackrest.conf
chown postgres:postgres /etc/pgbackrest.conf

4. Confirm PostgreSQL Is Running

ps aux | grep postgres

Look for:

postgres ... -D /u01/postgres/pgdata_15/data

5. Test PostgreSQL Connection

sudo -u postgres psql -h /tmp -p 5432 -d postgres

6. Create Stanza

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --log-level-console=info stanza-create

7. Configure WAL Archiving

vi /u01/postgres/pgdata_15/data/postgresql.conf
archive_mode = on
archive_command = 'pgbackrest --stanza=pg15 archive-push %p'

Reload PostgreSQL:

sudo -u postgres pg_ctl reload -D /u01/postgres/pgdata_15/data
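
Optionally confirm the settings took effect (using the same /tmp socket as elsewhere in this guide):

sudo -u postgres psql -h /tmp -c "SHOW archive_mode;"
sudo -u postgres psql -h /tmp -c "SHOW archive_command;"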

8. Verify Stanza and Archiving

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --log-level-console=info check

9. Run Full Backup

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --type=full backup

10. View Backup Info

sudo -u postgres pgbackrest --stanza=pg15 info

PostgreSQL Test Database Setup

# Log in

sudo -u postgres psql

# Create Database

CREATE DATABASE test_db;

\c test_db

# Create Table

CREATE TABLE employees (id SERIAL PRIMARY KEY, name VARCHAR(50) NOT NULL, position VARCHAR(50), salary NUMERIC(10, 2));

# Insert Sample Data

INSERT INTO employees (name, position, salary) VALUES
('Ayaan Siddiqui', 'Manager', 75000.00),
('Zara Qureshi', 'Sr. Developer', 60000.00),
('Imran Sheikh', 'Jr. Developer', 50000.00);

# Query Table

SELECT * FROM employees;

# Exit

\q

Backup Types

Full Backup:

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --type=full backup

Differential Backup:

\c test_db

INSERT INTO employees (name, position, salary) VALUES
('Fatima Ansari', 'DBA', 70000.00),
('Yusuf Khan', 'Sr. Developer', 50000.00),
('Nadia Rahman', 'Jr. Developer', 40000.00);

SELECT * FROM employees;

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --type=diff backup

Incremental Backup:

\c test_db

INSERT INTO employees (name, position, salary) VALUES
('Bilal Ahmed', 'DEO', 34000.00),
('Sana Mirza', 'System Admin', 45000.00),
('Tariq Hussain', 'Accountant', 50000.00);

SELECT * FROM employees;

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --type=incr backup

Verify Backups:

sudo -u postgres pgbackrest --stanza=pg15 info

Restore Options:

Option 1: Partial Restore

systemctl stop postgresql
sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 --delta restore
systemctl start postgresql
systemctl status postgresql

Option 2: Full Clean Restore

systemctl stop postgresql
mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main_backup_$(date +%F)
mkdir /var/lib/postgresql/15/main
chown -R postgres:postgres /var/lib/postgresql/15/main
chmod 700 /var/lib/postgresql/15/main
sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 restore
systemctl start postgresql
systemctl status postgresql

Verify Restore:

sudo -u postgres psql -c "SELECT datname FROM pg_database;"

tail -f /var/log/pgbackrest/pgbackrest.log

Automate Backups with Cron:

crontab -u postgres -e

Add:

# Full Backup - Sunday 2 AM
0 2 * * 0 pgbackrest --stanza=pg15 --type=full backup

# Incremental Backup - Mon-Sat 2 AM
0 2 * * 1-6 pgbackrest --stanza=pg15 --type=incr backup

Monitor and Test:

Monitor Logs

tail -f /var/log/pgbackrest/pgbackrest.log

journalctl -u postgresql

Test Recovery Plan:

Regularly test backup restoration
Document recovery steps

Troubleshooting:

sudo -u postgres env PGHOST=/tmp pgbackrest --stanza=pg15 check




Sunday, August 31, 2025

Step-by-Step: Sanity Check for Cascading Replication in PostgreSQL


Cascading replication allows PostgreSQL replicas to stream data not just from the primary, but from other replicas as well. This setup is ideal for scaling read operations and distributing replication load.

In this guide, we’ll walk through a simple sanity check to ensure your 4-node cascading replication setup is working as expected.

🧩 Assumptions
We’re working with a 4-node setup:

Node 1 is the Primary
Node 2 replicates from Node 1
Node 3 replicates from Node 2
Node 4 replicates from Node 3

Replication is already configured and running, and you have access to psql on all nodes.

Step 1: Create reptest Database on Primary (Node 1)

Open a terminal and connect to PostgreSQL:

psql -U postgres

Create the database and switch to it:


CREATE DATABASE reptest;

\c reptest

Step 2: Create a Sample Table

Inside the reptest database, create a simple table:


CREATE TABLE sanity_check (id SERIAL PRIMARY KEY, message TEXT, created_at TIMESTAMP DEFAULT now());

Step 3: Insert Sample Data

Add a few rows to test replication:


INSERT INTO sanity_check (message) VALUES ('Hello from Node 1'), ('Replication test'), ('Cascading is cool!');

Step 4: Wait a Few Seconds

Give replication a moment to propagate. Cascading replication may introduce slight delays, especially for downstream nodes like Node 4.
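
Instead of only waiting, each standby can also be checked directly; a simple sketch (add -p <port> if a standby listens on a non-default port):

psql -U postgres -c "SELECT pg_is_in_recovery(), now() - pg_last_xact_replay_timestamp() AS replay_delay;"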

Step 5: Verify Data on Nodes 2, 3, and 4

On each replica node, connect to the reptest database:
psql -U postgres -d reptest

Then run:
SELECT * FROM sanity_check;

You should see the same rows that were inserted on the primary.

✅ Conclusion

If the data appears correctly on all replicas, your cascading replication setup is working as expected. This simple sanity check can be repeated periodically or automated to ensure ongoing replication health.

PostgreSQL Cascading Replication: Full 4-Node Setup Guide


Cascading replication allows PostgreSQL replicas to stream data not only from the primary node but also from other replicas. This setup is ideal for scaling read operations and distributing replication load across multiple servers.

🗂️ Node Architecture and Port Assignment

Node 1 (Primary) — Port 5432 — Accepts read/write operations and streams to Node 2
Node 2 (Standby 1) — Port 5433 — Receives from Node 1 and streams to Node 3.
Node 3 (Standby 2) — Port 5434 — Receives from Node 2 and streams to Node 4.
Node 4 (Standby 3) — Port 5435 — Receives from Node 3.

IP assignments:

Node 1 → 172.19.0.21
Node 2 → 172.19.0.22
Node 3 → 172.19.0.23
Node 4 → 172.19.0.24

⚙️ Prerequisites

PostgreSQL installed on all four nodes (same version recommended).
Network connectivity between nodes.
A replication user (repuser) with REPLICATION privilege created on the primary.
PostgreSQL data directories initialized for each node.

Step 1: Configure the Primary Server (Node 1, Port 5432)

🔹 Modify postgresql.conf

listen_addresses = '*'
port = 5432
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB
hot_standby = on

🔹 Update pg_hba.conf to allow replication from Node 2

Generic:
host replication repuser <Node2_IP>/32 trust

Actual:
host replication repuser 172.19.0.22/32 trust

🔹 Reload PostgreSQL Configuration

pg_ctl reload
Or from SQL:
SELECT pg_reload_conf();

🔹 Start Node 1

pg_ctl -D /path/to/node1_data -o '-p 5432' -l logfile start

Actual:
pg_ctl -D /u01/postgres/pgdata_15/data -o '-p 5432' -l logfile start

🔹 Create Replication User

CREATE USER repuser WITH REPLICATION ENCRYPTED PASSWORD 'your_password';

🔹 Verify Node 1 is Not in Recovery

SELECT pg_is_in_recovery();

Step 2: Set Up Standby 1 (Node 2, Port 5433)

🔹 Take a Base Backup from Node 1

Generic:
pg_basebackup -h <Node1_IP> -p 5432 -D /path/to/node2_data -U repuser -Fp -Xs -P -R

Actual:
pg_basebackup -h 172.19.0.21 -p 5432 -D /u01/postgres/pgdata_15/data -U repuser -Fp -Xs -P -R

If you get an error:
pg_basebackup: error: directory "/u01/postgres/pgdata_15/data" exists but is not empty

Clear the directory:
rm -rf /u01/postgres/pgdata_15/data/*

🔹 Modify postgresql.conf on Node 2

listen_addresses = '*'
port = 5433
hot_standby = on
primary_conninfo = 'host=172.19.0.21 port=5432 user=repuser password=your_password application_name=node2'

🔹 Start Node 2

Generic:
pg_ctl -D /path/to/node2_data -o '-p 5433' -l logfile start

Actual:
pg_ctl -D /u01/postgres/pgdata_15/data -o '-p 5433' -l logfile start

🔹 Verify Replication on Node 1

SELECT * FROM pg_stat_replication WHERE application_name = 'node2';

Step 3: Set Up Standby 2 (Node 3, Port 5434)

🔹 Allow Replication from Node 3 in Node 2’s pg_hba.conf

Generic:
host replication repuser <Node3_IP>/32 trust

Actual:
host replication repuser 172.19.0.23/32 trust

🔹 Reload Node 2

pg_ctl -D /u01/postgres/pgdata_15/data reload

🔹 Take a Base Backup from Node 2

Generic:
pg_basebackup -h <Node2_IP> -p 5433 -D /path/to/node3_data -U repuser -Fp -Xs -P -R

Actual:
pg_basebackup -h 172.19.0.22 -p 5433 -D /u01/postgres/pgdata_15/data -U repuser -Fp -Xs -P -R

🔹 Modify postgresql.conf on Node 3

listen_addresses = '*'
port = 5434
hot_standby = on
primary_conninfo = 'host=172.19.0.22 port=5433 user=repuser password=your_password application_name=node3'

🔹 Start Node 3

Generic:
pg_ctl -D /path/to/node3_data -o '-p 5434' -l logfile start

Actual:
pg_ctl -D /u01/postgres/pgdata_15/data -o '-p 5434' -l logfile start

🔹 Verify Replication on Node 2

SELECT * FROM pg_stat_replication;

Step 4: Set Up Standby 3 (Node 4, Port 5435)

🔹 Allow Replication from Node 4 in Node 3’s pg_hba.conf

Generic:
host replication repuser <Node4_IP>/32 trust

Actual:
host replication repuser 172.19.0.24/32 trust

🔹 Reload Node 3

pg_ctl -D /u01/postgres/pgdata_15/data reload

🔹 Take a Base Backup from Node 3

Generic:
pg_basebackup -h <Node3_IP> -p 5434 -D /path/to/node4_data -U repuser -Fp -Xs -P -R

Actual:
pg_basebackup -h 172.19.0.23 -p 5434 -D /u01/postgres/pgdata_15/data -U repuser -Fp -Xs -P -R

🔹 Modify postgresql.conf on Node 4

listen_addresses = '*'
port = 5435
hot_standby = on
primary_conninfo = 'host=172.19.0.23 port=5434 user=repuser password=your_password application_name=node4'

🔹 Start Node 4

Generic:
pg_ctl -D /path/to/node4_data -o '-p 5435' -l logfile start

Actual:
pg_ctl -D /u01/postgres/pgdata_15/data -o '-p 5435' -l logfile start

🔹 Verify Replication on Node 3

SELECT * FROM pg_stat_replication;

✅ Final Verification & Monitoring

🔹 Check Replication Status

On Node 1:
SELECT * FROM pg_stat_replication;

On Nodes 2 and 3:

SELECT * FROM pg_stat_wal_receiver;
SELECT * FROM pg_stat_replication;

On Node 4:

SELECT pg_is_in_recovery();
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS byte_lag;

🔹 Confirm Listening Ports

sudo netstat -plnt | grep postgres
You should see ports 5432, 5433, 5434, and 5435 active.

🔹 Monitor WAL Activity

SELECT * FROM pg_stat_wal;

🔹 Measure Replication Lag

On Primary:
SELECT application_name, client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS byte_lag FROM pg_stat_replication;

Tuesday, August 26, 2025

The Docker Copy Machine: How to Clone a Container and Save Hours of Setup Time


Have you ever spent precious time meticulously setting up a Docker container—installing packages, configuring software, and tuning the system—only to need an identical copy and dread starting from scratch?

If you’ve found yourself in this situation, there's a faster way. Docker provides a straightforward method to "clone" or "template" a fully configured container. This is perfect for creating development, staging, and testing environments that are perfectly identical, or for rapidly scaling out services like PostgreSQL without reinstalling everything each time.

In this guide, I’ll show you the simple, practical commands to create a master copy of your container and spin up duplicates in minutes, not hours.

The Problem: Why Not Just Use a Dockerfile?

A Dockerfile is fantastic for reproducible, version-controlled builds. But sometimes, you’ve already done the work inside a running container. You've installed software, run complex configuration scripts, or set up a database schema. Manually translating every step back into a Dockerfile can be tedious and error-prone.

The solution? Treat your perfectly configured container as a gold master image.

Our Practical, Command-Line Approach

Let's walk through the process. Imagine your initial container, named BOX01, is a CentOS system with PostgreSQL already installed and configured.

Step 1: Create Your Master Image with docker commit

The magic command is docker commit. It takes a running (or stopped) container and saves its current state—all the changes made to the filesystem—as a new, reusable Docker image.

# Format: docker commit [CONTAINER_NAME] [NEW_IMAGE_NAME]

docker commit BOX01 centos-postgres-master

What this does: It packages up everything inside BOX01 (except the volumes, more on that later) and creates a new local Docker image named centos-postgres-master.

Verify it worked:

docker images

You should see centos-postgres-master listed among your images.

Step 2: Spin Up Your Clones

Now, you can create as many copies as you need from your new master image. The key is to give each new container a unique name, hostname, and volume to avoid conflicts.

# Create your first clone

docker run -it -d --privileged=true --restart=always --network MQMNTRK -v my-vol102:/u01 --ip 172.19.0.21 -h HOST02 --name BOX02 centos-postgres-master /usr/sbin/init

# Create a second clone

docker run -it -d --privileged=true --restart=always --network MQMNTRK -v my-vol103:/u01 --ip 172.19.0.22 -h HOST03 --name BOX03 centos-postgres-master /usr/sbin/init

Repeat this pattern for as many copies as you need (BOX04, HOST04, etc.). In under a minute, you'll have multiple identical containers running.
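
If you need several clones, a small shell loop saves typing; the names, volume numbers, and IPs below follow the same pattern but are only examples (make sure the IPs are free on your Docker network):

for i in 4 5 6; do
  docker run -it -d --privileged=true --restart=always \
    --network MQMNTRK -v my-vol10$i:/u01 --ip 172.19.0.2$i \
    -h HOST0$i --name BOX0$i centos-postgres-master /usr/sbin/init
done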

⚠️ Important Consideration: Handling Data Volumes

This is the most critical caveat. The docker commit command does not capture data stored in named volumes (like my-vol101).

The Good: Your PostgreSQL software and configuration are baked into the image.

The Gotcha: The actual database data lives in the volume, which is separate.

Your options for data:

Fresh Data: Use a new, empty volume for each clone (as shown above). This is ideal for creating new, independent database instances.

Copy Data: If you need to duplicate the data from the original container, you must copy the volume contents. Here's a quick way to do it:

# Backup data from the original volume (my-vol101)

docker run --rm -v my-vol101:/source -v $(pwd):/backup alpine tar cvf /backup/backup.tar -C /source .

# Restore that data into a new volume (e.g., my-vol102)

docker run --rm -v my-vol102:/target -v $(pwd):/backup alpine bash -c "tar xvf /backup/backup.tar -C /target"

When Should You Use This Method?

Rapid Prototyping: Quickly duplicate a complex environment for testing.
Legacy or Complex Setups: When the installation process is complicated and not easily scripted in a Dockerfile.
Learning and Experimentation: Easily roll back to a known good state by committing before you try something risky.

When Should You Stick to a Dockerfile?

Production Builds: Dockerfiles are transparent, reproducible, and version-controlled.
CI/CD Pipelines: Automated builds from a Dockerfile are a standard best practice.
When Size Matters: Each commit can create a large image layer. Dockerfiles can be optimized to produce smaller images.

Conclusion

The docker commit command is Docker's built-in "copy machine." It’s an incredibly useful tool for developers who need to work quickly and avoid repetitive setup tasks. While it doesn't replace the need for well-defined Dockerfiles in a production workflow, it solves a very real problem for day-to-day development and testing.

So next time you find yourself with a perfectly configured container, don't rebuild—clone it!

Next Steps: Try creating your master image and a clone. For an even more manageable setup, look into writing a simple docker-compose.yml file to define all your container parameters in one place.

Thursday, August 21, 2025

Simplifying PostgreSQL Restarts: How to Choose the Right Mode for Your Needs


Three Restart Modes in PostgreSQL

1. smart mode (-m smart) - Gentle & Safe

Waits patiently for all clients to disconnect on their own
Completes all transactions normally before shutting down
No data loss - everything commits properly
Best for production during active hours
Slowest method but most graceful

2. fast mode (-m fast) - Quick & Controlled

Rolls back active transactions immediately
Disconnects all clients abruptly but safely
Restarts quickly without waiting
No data corruption - maintains database integrity
Perfect for maintenance windows
Medium risk - some transactions get aborted

3. immediate mode (-m immediate) - Emergency Only

Forceful shutdown like a crash
No waiting - kills everything immediately
May require recovery on next startup
Risk of temporary inconsistencies
Only for emergencies when nothing else works
PostgreSQL will automatically recover using WAL logs
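
The three modes above map directly to pg_ctl's -m option; for example (data directory path as used elsewhere on this blog):

pg_ctl -D /u01/postgres/pgdata_15/data -m smart restart
pg_ctl -D /u01/postgres/pgdata_15/data -m fast restart
pg_ctl -D /u01/postgres/pgdata_15/data -m immediate restart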

💡 Key Points to Remember:

All three modes are safe for the database itself
Data integrity is maintained through WAL (Write-Ahead Logging)
Choose based on your situation: gentle vs fast vs emergency
Fast mode is completely normal for planned maintenance
Always warn users before using fast or immediate modes

🚀 When to Use Each:

Use smart mode when:
Database is in production use
You can afford to wait
No transactions should be interrupted

Use fast mode when:
You need to restart quickly
During maintenance windows
Some transaction rollback is acceptable

Use immediate mode when:
Database is unresponsive
Emergency situations only
You have no other choice

All modes will bring your database back safely - they just differ in how gently they treat connected clients and active transactions!

Step-by-Step: Converting PostgreSQL Sync Replication to Async in Production


Simple Steps

📋 Prerequisites

PostgreSQL 9.1 or higher.
Superuser access to the primary database.
Existing replication setup.

🚀 Quick Change Steps (No Restart Required)

Step 1: Connect to Primary Database
psql -U postgres -h your-primary-server

Step 2: Check Current Sync Status
SELECT name, setting FROM pg_settings WHERE name IN ('synchronous_commit', 'synchronous_standby_names');

Step 3: Change to Asynchronous Mode
-- Disable synchronous commit
ALTER SYSTEM SET synchronous_commit = 'off';

-- Clear synchronous standby names
ALTER SYSTEM SET synchronous_standby_names = '';

-- Reload configuration (no restart needed)

SELECT pg_reload_conf();

Step 4: Verify the Change
-- Confirm settings changed

SELECT name, setting FROM pg_settings WHERE name IN ('synchronous_commit', 'synchronous_standby_names');

-- Check replication status (should show 'async')

SELECT application_name, sync_state, state FROM pg_stat_replication;

📝 Configuration File Method (Alternative)

Edit postgresql.conf on Primary:
sudo nano /etc/postgresql/14/main/postgresql.conf
Change these lines:
synchronous_commit = off
synchronous_standby_names = ''

Reload Configuration:
sudo systemctl reload postgresql
# or
sudo -u postgres pg_ctl reload -D /path/to/data

🧪 Testing the Change

Test 1: Performance Check
-- Time an insert operation
\timing on
INSERT INTO test_table (data) VALUES ('async test');
\timing off

Test 2: Verify Async Operation
-- Check replication lag
SELECT application_name, sync_state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_lag FROM pg_stat_replication;

Test 3: Data Consistency Check

# On primary:
p -c "SELECT COUNT(*) FROM your_table;"

# On standby:
psql -h standby-server -c "SELECT COUNT(*) FROM your_table;"

⚠️ Important Considerations

Before Changing:

Understand the risks: Async replication may cause data loss during failover
Check business requirements: Ensure async meets your data consistency needs
Monitor performance: Async should improve write performance but may increase replication lag

After Changing:

Monitor replication lag regularly
Set up alerts for significant lag increases
Test failover procedures to understand potential data loss scenarios
Review application behavior - some apps may need adjustments

🔄 Reverting to Synchronous Mode

If you need to switch back:

ALTER SYSTEM SET synchronous_commit = 'on';
ALTER SYSTEM SET synchronous_standby_names = 'your_standby_name';

SELECT pg_reload_conf();

🎯 When to Use This Change

Good candidates for Async:

Read-heavy workloads where write performance matters
Non-critical data that can tolerate minor data loss
High-volume logging or analytics data
Geographically distributed systems with high latency

Keep Synchronous For:

Financial transactions.
User account data.
Critical business operations.
Systems requiring zero data loss.

💡 Pro Tips

Use a hybrid approach: Some systems support both sync and async simultaneously.
Monitor continuously: Use tools like pg_stat_replication to watch lag times.
Test thoroughly: Always test replication changes in a staging environment first.
Document the change: Keep records of why and when you changed replication modes.
The change from sync to async takes effect immediately and requires no downtime, making it a relatively low-risk operation that can significantly improve write performance for appropriate use cases.

The Real-Time Database Dilemma: When to Use Synchronous vs Asynchronous Replication in PostgreSQL


Synchronous Replication

✅ Benefits:

Zero Data Loss: Guarantees that data is safely written to both primary and standby before confirming success.
Strong Consistency: All servers have exactly the same data at all times.
Automatic Failover: Standby servers are always up-to-date and ready to take over instantly.
Perfect for Critical Data: Ideal for financial transactions, user accounts, and mission-critical information.

❌ Drawbacks:

Slower Performance: Write operations wait for standby confirmation, increasing latency.
Availability Risk: If standby goes down, primary may become unavailable or slow.
Higher Resource Usage: Requires more network bandwidth and standby server resources.
Complex Setup: More configuration and maintenance required.

⚡ Asynchronous Replication

✅ Benefits:

Faster Performance: Write operations complete immediately without waiting for standby.
Better Availability: Primary continues working even if standby servers disconnect.
Long-Distance Friendly: Works well across geographical distances with higher latency.
Simpler Management: Easier to configure and maintain.

❌ Drawbacks:

Potential Data Loss: Recent transactions may be lost if primary fails before replication completes.
Data Lag: Standby servers might be slightly behind the primary.
Weaker Consistency: Temporary mismatches between primary and standby possible.
Failover Complexity: May require manual intervention to ensure data consistency.

🏭 Production Environment Preferences

📊 For Most Production Systems:

Mixed Approach: Use synchronous for critical data, asynchronous for less critical data.
Tiered Architecture: Synchronous for primary standby, asynchronous for additional read replicas.
Monitoring Essential: Track replication lag and performance metrics continuously.

⚡ For Real-Time Systems:

Synchronous Preferred: When data consistency is more important than raw speed.
Financial Systems: Banking, payments, and transactions require synchronous replication.
E-commerce: Order processing and inventory management often need synchronous guarantees.

🚀 When to Choose Async:

High-Write Systems: Applications with heavy write workloads (logging, analytics).
Read-Heavy Applications: When you need multiple read replicas for performance.
Geographic Distribution: Systems spanning multiple regions with higher latency.
Non-Critical Data: Caching, session data, or temporary information.

🎯 Key Recommendations

Start with Async for most applications, then move to sync for critical components.
Monitor Replication Lag constantly - anything beyond a few seconds needs attention (see the query after this list).
Test Failover Regularly - ensure your standby can take over when needed.
Use Connection Pooling to help manage the performance impact of synchronous replication.
Consider Hybrid Approaches - some databases support both modes simultaneously.
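
To put the lag-monitoring advice into practice, one simple check on the primary (the lag columns are available in PostgreSQL 10+):

psql -U postgres -x -c "SELECT application_name, state, sync_state, write_lag, flush_lag, replay_lag FROM pg_stat_replication;"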

🔍 Real-World Example

Social Media Platform:

Synchronous: User accounts, direct messages, financial transactions.
Asynchronous: Activity feeds, notifications, analytics, likes.

E-commerce Site:

Synchronous: Orders, payments, inventory updates.
Asynchronous: Product recommendations, user reviews, search indexing.

Monday, August 18, 2025

Step-by-Step Guide to PostgreSQL Streaming Replication (Primary-Standby Setup)


Here's the properly organized step-by-step guide for setting up PostgreSQL 12/15 streaming replication between Primary (172.19.0.20) and Standby (172.19.0.30):


✅ Pre-Requisites

Two servers with PostgreSQL 12/15.3 installed
Database cluster initialized on both using initdb (service not started yet)

IP configuration:

Primary: 172.19.0.20
Standby: 172.19.0.30

🔧 Primary Server Configuration

Backup config file:


cd /u01/postgres/pgdata_15/data
cp postgresql.conf postgresql.conf_bkp_$(date +%b%d_%Y)


Edit postgresql.conf:


vi postgresql.conf

Set these parameters:

listen_addresses = '*'
port = 5432
wal_level = replica
archive_mode = on
archive_command = 'cp %p /u01/postgres/pgdata_15/data/archive_logs/%f'
max_wal_senders = 10
wal_keep_segments = 50   # PostgreSQL 12; on 13+ use wal_keep_size instead (e.g. wal_keep_size = '800MB')

Also create the archive directory referenced by archive_command before starting: mkdir -p /u01/postgres/pgdata_15/data/archive_logs

Start PostgreSQL:

bash

/u01/postgres/pgdata_15/bin/pg_ctl -D /u01/postgres/pgdata_15/data -l logfile start

Create replication user:

sql

CREATE USER repl_user WITH REPLICATION ENCRYPTED PASSWORD 'strongpassword';

Configure pg_hba.conf:

bash

cp pg_hba.conf pg_hba.conf_bkp_$(date +%b%d_%Y)

vi pg_hba.conf

Add:
host replication repl_user 172.19.0.30/32 md5

Reload config:

bash

/u01/postgres/pgdata_15/bin/pg_ctl -D /u01/postgres/pgdata_15/data -l logfile reload

🔄 Standby Server Configuration

Clear data directory:

bash

rm -rf /u01/postgres/pgdata_15/data/*

Run pg_basebackup:

bash

pg_basebackup -h 172.19.0.20 -U repl_user -p 5432 -D /u01/postgres/pgdata_15/data -Fp -Xs -P -R

Start PostgreSQL:

bash

pg_ctl -D /u01/postgres/pgdata_15/data -l logfile start

Verify logs:

bash

tail -f /u01/postgres/pgdata_15/data/logfile

Look for:

text

started streaming WAL from primary at 0/3000000 on timeline 1

🔍 Verification Steps

On Primary:

sql

CREATE DATABASE reptest;

\c reptest

CREATE TABLE test_table (id int);

INSERT INTO test_table VALUES (1);

On Standby:

sql

\c reptest

SELECT * FROM test_table; -- Should return 1 row

Check processes:

Primary:

bash

ps -ef | grep postgres

Look for walsender process

Standby:

Look for walreceiver process

Network test:

bash

nc -zv 172.19.0.20 5432
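
Optional extra check (not in the original steps): the built-in statistics views report replication state and lag directly from psql.

sql

-- On the primary: one row per connected standby, with its state and replay lag in bytes
SELECT client_addr, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the standby: confirm recovery mode and the last WAL positions received/replayed
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn();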


📌 Key Notes

- Replace paths/versions according to your setup
- Ensure the firewall allows port 5432 between servers
- For Oracle Linux 6.8, adjust service commands if needed
- Monitor logs (logfile) on both servers for errors


Detailed steps:

✅ Pre-Requisites

Two servers with PostgreSQL 12 or PostgreSQL 15.3 installed.
Database cluster initialized on both using initdb (but do not start the service yet).

IPs configured:

Primary: 172.19.0.20
Standby: 172.19.0.30

Make the necessary changes to the postgresql.conf file on the primary server.

[postgres@HOST01 data]$ pwd

/u01/postgres/pgdata_15/data

[postgres@HOST01 data]$ cp postgresql.conf postgresql.conf_bkp_Aug18_2025

[postgres@HOST01 data]$ vi postgresql.conf


listen_addresses = '*'
port = 5432
wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /u01/postgres/pgdata_15/data/archive_logs/%f'
max_wal_senders = 10

After making the necessary changes to postgresql.conf, start the database server on the primary using the command below.

[postgres@HOST01 data]$ /u01/postgres/pgdata_15/bin/pg_ctl -D /u01/postgres/pgdata_15/data -l logfile start

Once the database server has started, invoke psql as the superuser (postgres) and create a role for replication.

CREATE USER repl_user WITH REPLICATION ENCRYPTED PASSWORD 'strongpassword';

Allow connections from the standby server for the newly created user by editing pg_hba.conf.

[postgres@HOST01 data]$ pwd

/u01/postgres/pgdata_15/data

[postgres@HOST01 data]$ ls

archive_logs pg_commit_ts pg_ident.conf pg_replslot pg_stat_tmp PG_VERSION postgresql.conf

base pg_dynshmem pg_logical pg_serial pg_subtrans pg_wal postgresql.conf_old

global pg_hba.conf pg_multixact pg_snapshots pg_tblspc pg_xact postmaster.opts

logfile pg_hba.conf_bkp_Aug18_2025 pg_notify pg_stat pg_twophase postgresql.auto.conf postmaster.pid


[postgres@HOST01 data]$ cp pg_hba.conf pg_hba.conf_bkp_Aug18_2025

[postgres@HOST01 data]$ vi pg_hba.conf

host replication repl_user 172.19.0.30/32 md5

Reload the cluster settings.

[postgres@HOST01 data]$ /u01/postgres/pgdata_15/bin/pg_ctl -D /u01/postgres/pgdata_15/data -l logfile stop

[postgres@HOST01 data]$ /u01/postgres/pgdata_15/bin/pg_ctl -D /u01/postgres/pgdata_15/data -l logfile start

(OR)

[postgres@HOST01 data]$ /u01/postgres/pgdata_15/bin/pg_ctl -D /u01/postgres/pgdata_15/data -l logfile reload

We made the changes to the primary server's configuration. Now we'll use pg_basebackup to copy the data directory (/u01/postgres/pgdata_15/data) to the standby server.

Standby Server Configurations.

[postgres@HOST02 ~]$ pg_basebackup -h 172.19.0.20 -U repl_user -p 5432 -D /u01/postgres/pgdata_15/data -Fp -Xs -P -R

Password:

pg_basebackup: error: directory "/u01/postgres/pgdata_15/data" exists but is not empty

Navigate to data directory and empty it.

cd /u01/postgres/pgdata_15/data

rm -rf *

[postgres@HOST02 ~]$ pg_basebackup -h 172.19.0.20 -U repl_user -p 5432 -D /u01/postgres/pgdata_15/data -Fp -Xs -P -R

Password:

38761/38761 kB (100%), 1/1 tablespace

Now start the database server on the standby using the command below.


[postgres@HOST02 ~]$ pg_ctl -D /u01/postgres/pgdata_15/data -l logfile start

waiting for server to start.... done

server started

Check the logfile to gather some more information.

[postgres@HOST02 data]$ tail -f logfile

2025-08-18 21:12:09.487 UTC [11349] LOG: listening on IPv4 address "0.0.0.0", port 5432
2025-08-18 21:12:09.487 UTC [11349] LOG: listening on IPv6 address "::", port 5432
2025-08-18 21:12:09.491 UTC [11349] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-08-18 21:12:09.498 UTC [11352] LOG: database system was interrupted; last known up at 2025-08-18 21:11:13 UTC
2025-08-18 21:12:09.513 UTC [11352] LOG: entering standby mode
2025-08-18 21:12:09.518 UTC [11352] LOG: redo starts at 0/2000028
2025-08-18 21:12:09.519 UTC [11352] LOG: consistent recovery state reached at 0/2000100
2025-08-18 21:12:09.520 UTC [11349] LOG: database system is ready to accept read-only connections
2025-08-18 21:12:09.557 UTC [11353] LOG: started streaming WAL from primary at 0/3000000 on timeline 1

Let’s now check the replication. Invoke psql and perform some operations.

Primary:

create database qadar_reptest;
\c qadar_reptest;
create table mqm_reptst_table (serial int);
insert into mqm_reptst_table values (1);

Standby:

\c qadar_reptest;
select count(*) from mqm_reptst_table;
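
Because the standby runs in hot-standby (read-only) mode, any write attempted there should be rejected, for example:

insert into mqm_reptst_table values (2);
-- ERROR:  cannot execute INSERT in a read-only transaction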

Check the background process involved in streaming replication:

Primary:

[postgres@HOST01 data]$ ps -ef|grep postgres

root 11329 160 0 19:40 pts/1 00:00:00 su - postgres
postgres 11330 11329 0 19:40 pts/1 00:00:04 -bash
root 11608 11496 0 21:07 pts/2 00:00:00 su - postgres
postgres 11609 11608 0 21:07 pts/2 00:00:00 -bash
postgres 11635 11609 0 21:07 pts/2 00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=tail /usr/bin/tail -f logfile
postgres 11639 1 0 21:08 ? 00:00:00 /u01/postgres/pgdata_15/bin/postgres -D /u01/postgres/pgdata_15/data
postgres 11640 11639 0 21:08 ? 00:00:00 postgres: checkpointer
postgres 11641 11639 0 21:08 ? 00:00:00 postgres: background writer
postgres 11643 11639 0 21:08 ? 00:00:00 postgres: walwriter
postgres 11644 11639 0 21:08 ? 00:00:00 postgres: autovacuum launcher
postgres 11645 11639 0 21:08 ? 00:00:00 postgres: archiver last was 000000010000000000000002.00000028.backup
postgres 11646 11639 0 21:08 ? 00:00:00 postgres: logical replication launcher
postgres 11663 11639 0 21:12 ? 00:00:00 postgres: walsender repl_user 172.19.0.30(38248) streaming 0/3000060
postgres 11671 11330 0 21:15 pts/1 00:00:00 ps -ef
postgres 11672 11330 0 21:15 pts/1 00:00:00 grep --color=auto postgres

Standby:

[postgres@HOST02 ~]$ ps -ef|grep postgres

root 11300 119 0 20:59 pts/1 00:00:00 su - postgres
postgres 11301 11300 0 20:59 pts/1 00:00:00 -bash
postgres 11349 1 0 21:12 ? 00:00:00 /u01/postgres/pgdata_15/bin/postgres -D /u01/postgres/pgdata_15/data
postgres 11350 11349 0 21:12 ? 00:00:00 postgres: checkpointer
postgres 11351 11349 0 21:12 ? 00:00:00 postgres: background writer
postgres 11352 11349 0 21:12 ? 00:00:00 postgres: startup recovering 000000010000000000000003
postgres 11353 11349 0 21:12 ? 00:00:00 postgres: walreceiver streaming 0/3000148
postgres 11356 11301 0 21:16 pts/1 00:00:00 ps -ef
postgres 11357 11301 0 21:16 pts/1 00:00:00 grep --color=auto postgres


Check the network connectivity between the servers.

Standby:

[postgres@HOST02 ~] yum install nc
[postgres@HOST02 ~]$ nc -zv 172.19.0.20 5432
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 172.19.0.20:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.

ERROR with "Failed to set locale, defaulting to C" on Centos at the docker environment (yum install)



Fix:

As the root user, run the following:

sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*

yum update -y

Thursday, May 15, 2025

Top 30 Oracle Performance Issues and Fixes for Banking Experts



Category 1: Transaction and Locking Issues

1. Long-Running Transactions
Short Description: A batch update on a customer table takes hours, blocking users.
Impact: Delays and timeouts for other users.
Solution: Optimize the transaction and reduce locking.

Steps:
Check V$SESSION_LONGOPS: SELECT SID, OPNAME, SOFAR, TOTALWORK FROM V$SESSION_LONGOPS WHERE OPNAME = 'Transaction';
Find blocking sessions: SELECT SID, BLOCKING_SESSION FROM V$SESSION WHERE STATUS = 'ACTIVE';
Analyze SQL: EXPLAIN PLAN FOR <your_sql>; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Add index: CREATE INDEX idx_cust_id ON customer(cust_id);
Use batches: UPDATE customer SET balance = balance + 100 WHERE cust_id BETWEEN 1 AND 1000; COMMIT;
Monitor: V$SESSION_LONGOPS.
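
A minimal PL/SQL sketch of the batching step above (table and column names follow the example and are illustrative; assumes cust_id is a positive numeric key):

DECLARE
  v_max_id customer.cust_id%TYPE;
BEGIN
  SELECT MAX(cust_id) INTO v_max_id FROM customer;
  -- Update 1,000 IDs at a time and commit between batches to release row locks early
  FOR i IN 0 .. TRUNC(v_max_id / 1000) LOOP
    UPDATE customer
       SET balance = balance + 100
     WHERE cust_id BETWEEN i * 1000 + 1 AND (i + 1) * 1000;
    COMMIT;
  END LOOP;
END;
/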

2. Enqueue Waits
Short Description: TX enqueue waits from row-level lock contention.
Impact: Delays in committing transactions.
Solution: Resolve locking conflicts.

Steps:
Identify waits: SELECT EVENT, P1, P2 FROM V$SESSION_WAIT WHERE EVENT LIKE 'enq: TX%';
Find blockers: SELECT SID, BLOCKING_SESSION FROM V$SESSION WHERE BLOCKING_SESSION IS NOT NULL;
Review SQL: SELECT SQL_TEXT FROM V$SQL WHERE SQL_ID IN (SELECT SQL_ID FROM V$SESSION WHERE SID = <blocking_sid>);
Kill session if safe: ALTER SYSTEM KILL SESSION '<sid>,<serial#>' IMMEDIATE;
Optimize app: Commit more often.
Monitor: V$ENQUEUE_STAT.

3. Deadlock Situations
Short Description: Deadlocks during concurrent updates.
Impact: Transaction failures for users.
Solution: Resolve deadlocks.

Steps:
Check alert log: SELECT MESSAGE_TEXT FROM V$DIAG_ALERT_EXT WHERE MESSAGE_TEXT LIKE '%deadlock%';
Trace session: ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
Identify SQL: SELECT SQL_TEXT FROM V$SQL WHERE SQL_ID IN (SELECT SQL_ID FROM V$SESSION WHERE SID = <sid>);
Redesign app: Update in consistent order.
Add retry logic in code.
Monitor: V$SESSION.

Category 2: Resource Contention Issues

4. High CPU Utilization
Short Description: CPU spikes to 95% from inefficient SQL.
Impact: Slows all operations.
Solution: Tune high-CPU SQL.

Steps:
Find top users: SELECT S.SID, S.USERNAME, ST.VALUE/100 AS CPU_SECONDS FROM V$SESSTAT ST JOIN V$SESSION S ON S.SID = ST.SID WHERE ST.STATISTIC# = (SELECT STATISTIC# FROM V$STATNAME WHERE NAME = 'CPU used by this session') ORDER BY ST.VALUE DESC;
Get SQL: SELECT SQL_TEXT FROM V$SQL WHERE SQL_ID IN (SELECT SQL_ID FROM V$SESSION WHERE SID = <sid>);
Check plan: EXPLAIN PLAN FOR <sql>; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Add index or rewrite SQL.
Use Resource Manager: BEGIN DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('PLAN1', 'GROUP1', 'Limit CPU', CPU_P1 => 50); END;
Monitor: V$SYSSTAT.
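
Note that plan directives must be created inside a pending area, and the plan needs a directive for OTHER_GROUPS before it validates. A fuller sketch with illustrative plan/group names (mgmt_p1 is the current name for the CPU share; cpu_p1 is the legacy parameter):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(plan => 'PLAN1', comment => 'Limit CPU for batch work');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(consumer_group => 'GROUP1', comment => 'Batch sessions');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(plan => 'PLAN1', group_or_subplan => 'GROUP1',
                                              comment => 'Limit CPU', mgmt_p1 => 50);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(plan => 'PLAN1', group_or_subplan => 'OTHER_GROUPS',
                                              comment => 'Everything else', mgmt_p1 => 50);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
-- Activate the plan
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'PLAN1';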

5. I/O Bottlenecks
Short Description: Slow db file sequential read due to disk delays.
Impact: Slow query responses.
Solution: Optimize I/O distribution.

Steps:
Check waits: SELECT EVENT, TOTAL_WAITS, TIME_WAITED FROM V$SYSTEM_EVENT WHERE EVENT LIKE 'db file%read';
Find hot files: SELECT FILE#, READS, WRITES FROM V$FILESTAT;
Move to SSD (online datafile move, 12c+): ALTER DATABASE MOVE DATAFILE 'old_path' TO '/ssd_path/data01.dbf';
Increase cache: ALTER SYSTEM SET DB_CACHE_SIZE = 1G;
Rebuild indexes: ALTER INDEX idx_name REBUILD;
Monitor: V$FILESTAT.

6. Latch Contention
Short Description: library cache latch contention from high parses.
Impact: Slows SQL execution.
Solution: Reduce parsing.

Steps:
Check waits: SELECT NAME, GETS, MISSES FROM V$LATCH WHERE NAME = 'library cache';
Find sessions: SELECT SID, EVENT FROM V$SESSION_WAIT WHERE EVENT LIKE 'latch%';
Review SQL: SELECT SQL_TEXT FROM V$SQL WHERE PARSE_CALLS > 100;
Use bind variables.
Increase pool: ALTER SYSTEM SET SHARED_POOL_SIZE = 500M;
Verify: V$LATCH.

7. Buffer Cache Contention
Short Description: Contention on cache buffers chains from hot blocks.
Impact: Slows data access.
Solution: Distribute load.

Steps:
Check contention: SELECT NAME, GETS, MISSES, SLEEPS FROM V$LATCH WHERE NAME = 'cache buffers chains';
Find hot segments: SELECT OBJECT_NAME, VALUE FROM V$SEGMENT_STATISTICS WHERE STATISTIC_NAME = 'physical reads' ORDER BY VALUE DESC;
Partition table (12.2+): ALTER TABLE accounts MODIFY PARTITION BY RANGE (account_id) (PARTITION p1 VALUES LESS THAN (1000), PARTITION p2 VALUES LESS THAN (MAXVALUE)) ONLINE;
Rebuild indexes: ALTER INDEX idx_accounts REBUILD;
Increase cache: ALTER SYSTEM SET DB_CACHE_SIZE = 2G;
Verify: V$LATCH.

Category 3: Memory and Storage Issues

8. Library Cache Misses
Short Description: High hard parses from unshared SQL.
Impact: Increases CPU usage.
Solution: Minimize hard parses.

Steps:
Check ratio: SELECT NAMESPACE, GETS, GETHITRATIO FROM V$LIBRARYCACHE;
Find SQL: SELECT SQL_TEXT, PARSE_CALLS FROM V$SQL ORDER BY PARSE_CALLS DESC;
Use bind variables: SELECT * FROM accounts WHERE id = :1;
Set sharing: ALTER SYSTEM SET CURSOR_SHARING = FORCE;
Flush pool: ALTER SYSTEM FLUSH SHARED_POOL;
Monitor: V$LIBRARYCACHE.
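
Statements that differ only in literal values share the same FORCE_MATCHING_SIGNATURE in V$SQL, so a quick way to spot bind-variable candidates is:

SELECT FORCE_MATCHING_SIGNATURE, COUNT(*) AS SIMILAR_STATEMENTS
  FROM V$SQL
 WHERE FORCE_MATCHING_SIGNATURE <> 0
 GROUP BY FORCE_MATCHING_SIGNATURE
HAVING COUNT(*) > 20
 ORDER BY SIMILAR_STATEMENTS DESC;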

9. Log Buffer Space Waits
Short Description: log buffer space waits from slow redo log writes.
Impact: Slows commits.
Solution: Optimize redo handling.

Steps:
Check waits: SELECT NAME, VALUE FROM V$SYSSTAT WHERE NAME LIKE 'redo%';
Increase buffer: ALTER SYSTEM SET LOG_BUFFER = 10M SCOPE=SPFILE; (static parameter; requires a restart)
Add log group: ALTER DATABASE ADD LOGFILE GROUP 3 ('/path/redo03.log') SIZE 100M;
Check switches: SELECT GROUP#, ARCHIVED, STATUS FROM V$LOG;
Adjust checkpoint: ALTER SYSTEM SET FAST_START_MTTR_TARGET = 300;
Monitor: V$SYSSTAT.

10. Undo Tablespace Issues
Short Description: snapshot too old errors from insufficient undo space.
Impact: Transaction failures.
Solution: Optimize undo management.

Steps:
Check usage: SELECT BEGIN_TIME, UNDOBLKS, MAXQUERYLEN, SSOLDERRCNT, NOSPACEERRCNT FROM V$UNDOSTAT;
Resize space: ALTER DATABASE DATAFILE '/path/undo01.dbf' RESIZE 500M;
Set retention: ALTER SYSTEM SET UNDO_RETENTION = 3600;
Add datafile: ALTER TABLESPACE UNDO_TBS ADD DATAFILE '/path/undo02.dbf' SIZE 200M;
Find queries: SELECT SID, SQL_TEXT FROM V$SESSION JOIN V$SQL ON V$SESSION.SQL_ID = V$SQL.SQL_ID WHERE STATUS = 'ACTIVE';
Monitor: V$UNDOSTAT.

11. Memory Allocation Issues
Short Description: SGA/PGA misconfiguration causes swapping.
Impact: Slows database performance.
Solution: Tune memory settings.

Steps:
Check SGA: SELECT POOL, NAME, BYTES FROM V$SGASTAT;
Review PGA: SELECT NAME, VALUE FROM V$PGASTAT WHERE NAME LIKE 'total PGA%';
Increase SGA: ALTER SYSTEM SET SGA_TARGET = 4G;
Adjust PGA: ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 1G;
Check OS swapping (e.g., vmstat on Unix).
Monitor: V$SGASTAT and V$PGASTAT.
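
Before resizing, the advisor views estimate how DB time and PGA over-allocation would change at other target sizes; a short sketch:

-- SGA advice: estimated DB time at alternative SGA sizes
SELECT SGA_SIZE, SGA_SIZE_FACTOR, ESTD_DB_TIME
  FROM V$SGA_TARGET_ADVICE
 ORDER BY SGA_SIZE;

-- PGA advice: ESTD_OVERALLOC_COUNT should be 0 at the chosen target
SELECT PGA_TARGET_FOR_ESTIMATE, PGA_TARGET_FACTOR, ESTD_OVERALLOC_COUNT
  FROM V$PGA_TARGET_ADVICE
 ORDER BY PGA_TARGET_FOR_ESTIMATE;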

Category 4: Query and Execution Issues

12. SQL Plan Regression
Short Description: Query slows after statistics refresh.
Impact: Degraded report performance.
Solution: Stabilize the plan.

Steps:
Find SQL: SELECT SQL_ID, EXECUTIONS, ELAPSED_TIME FROM V$SQL WHERE ELAPSED_TIME > 1000000;
Compare plans: SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>'));
Create baseline: DECLARE l_plan PLS_INTEGER; BEGIN l_plan := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE('<sql_id>'); END;
Pin plan: ALTER SESSION SET OPTIMIZER_USE_SQL_PLAN_BASELINES = TRUE;
Recompute stats: EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'TABLE', CASCADE => TRUE);
Monitor: V$SQL.
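
One way to confirm the baseline was captured and is eligible for use is to list the stored plans:

SELECT SQL_HANDLE, PLAN_NAME, ENABLED, ACCEPTED, FIXED
  FROM DBA_SQL_PLAN_BASELINES
 ORDER BY CREATED DESC;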

13. Parallel Execution Overhead
Short Description: Excessive parallel queries consume CPU.
Impact: Slows other operations.
Solution: Control parallel execution.

Steps:
Check usage: SELECT SQL_ID, PX_SERVERS_EXECUTED FROM V$SQL WHERE PX_SERVERS_EXECUTED > 0;
Limit servers: ALTER SYSTEM SET PARALLEL_MAX_SERVERS = 16;
Adjust percent: ALTER SESSION SET PARALLEL_MIN_PERCENT = 50;
Rewrite SQL: Remove PARALLEL hint.
Monitor CPU: V$SYSSTAT.
Verify: V$PX_PROCESS.

14. Index Contention
Short Description: enq: TX - index contention during inserts.
Impact: Delays in DML operations.
Solution: Reduce contention.

Steps:
Identify waits: SELECT EVENT, P1, P2 FROM V$SESSION_WAIT WHERE EVENT LIKE 'enq: TX - index%';
Find index: SELECT OBJECT_NAME FROM DBA_OBJECTS WHERE OBJECT_ID = <P2_value>;
Reduce hot-block contention on the index: rebuild it as a reverse-key index (ALTER INDEX idx_trans REBUILD REVERSE;) or recreate it as a global hash-partitioned index.
Rebuild index: ALTER INDEX idx_trans REBUILD;
Monitor: V$SESSION_WAIT.
Adjust app: Reduce insert frequency.

Category 5: Logging and Archiving Issues

15. Checkpoint Inefficiency
Short Description: Frequent checkpoints cause I/O spikes.
Impact: Performance dips during peaks.
Solution: Optimize checkpoint frequency.

Steps:
Check waits: SELECT EVENT, TOTAL_WAITS FROM V$SYSTEM_EVENT WHERE EVENT LIKE 'log file switch%';
Increase log size: ALTER DATABASE DROP LOGFILE GROUP 1; ALTER DATABASE ADD LOGFILE GROUP 1 ('/path/redo01.log') SIZE 200M; (the group must be INACTIVE and archived before it can be dropped)
Adjust target: ALTER SYSTEM SET FAST_START_MTTR_TARGET = 600;
Add log group: ALTER DATABASE ADD LOGFILE GROUP 4 ('/path/redo04.log') SIZE 200M;
Monitor switches: SELECT GROUP#, ARCHIVED, STATUS FROM V$LOG;
Verify: V$SYSTEM_EVENT.

16. Archive Log Generation Lag
Short Description: Archiving lags, risking downtime.
Impact: Database hangs due to full destinations.
Solution: Improve archive management.

Steps:
Check status: SELECT DEST_ID, STATUS, DESTINATION FROM V$ARCHIVE_DEST_STATUS;
Increase space: Add disk to /archivelog_path.
Add destination: ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 = 'LOCATION=/new_path';
Force archive: ALTER SYSTEM ARCHIVE LOG CURRENT;
Adjust processes: ALTER SYSTEM SET LOG_ARCHIVE_MAX_PROCESSES = 8;
Monitor: V$ARCHIVE_DEST_STATUS.

Category 6: Temporary and Cluster Issues

17. Temp Tablespace Contention
Short Description: High direct path read/write waits from temp overload.
Impact: Slows sorts and joins.
Solution: Optimize temp usage.

Steps:
Check usage: SELECT TABLESPACE_NAME, BYTES_USED, BYTES_FREE FROM V$TEMP_SPACE_HEADER;
Add files: ALTER TABLESPACE TEMP ADD TEMPFILE '/path/temp02.dbf' SIZE 500M;
Increase PGA: ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 2G;
Optimize SQL: Add indexes.
Monitor: V$TEMP_SPACE_HEADER and V$TEMPSEG_USAGE.
Verify: V$SESSION_WAIT.

18. RAC Interconnect Issues
Short Description: gc buffer busy waits in RAC from slow interconnect.
Impact: Degrades cluster performance.
Solution: Optimize interconnect traffic.

Steps:
Check waits: SELECT EVENT, TOTAL_WAITS FROM GV$SYSTEM_EVENT WHERE EVENT LIKE 'gc buffer busy%';
Verify performance: SELECT INST_ID, VALUE FROM GV$SYSSTAT WHERE NAME = 'gc blocks lost';
Increase bandwidth: Work with network team.
Adjust fusion: ALTER SYSTEM SET "_gc_affinity_limit" = 50; (hidden parameter; note the double quotes)
Partition tables: Distribute block sharing.
Monitor: GV$SYSTEM_EVENT.

Category 7: Application and Sequence Issues

19. Application-Level Block Contention
Short Description: Sequence generator contention slows inserts.
Impact: Delays in transaction processing.
Solution: Optimize sequence usage.

Steps:
Find hot blocks: SELECT OBJECT_NAME, VALUE FROM V$SEGMENT_STATISTICS WHERE STATISTIC_NAME = 'physical reads' ORDER BY VALUE DESC;
Check settings: SELECT SEQUENCE_NAME, CACHE_SIZE FROM USER_SEQUENCES WHERE SEQUENCE_NAME = 'SEQ_NAME';
Increase cache: ALTER SEQUENCE SEQ_NAME CACHE 1000;
Avoid NOCACHE on hot sequences (it increases contention); in RAC, consider NOORDER as well.
Partition table: Distribute inserts.
Monitor: V$SEGMENT_STATISTICS.

20. High Network Latency
Short Description: High SQL*Net message from client waits from network delays.
Impact: Slow responses for remote users.
Solution: Reduce round-trips.

Steps:
Identify waits: SELECT EVENT, TOTAL_WAITS FROM V$SESSION_EVENT WHERE EVENT LIKE 'SQL*Net%';
Trace session: EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(SESSION_ID => <sid>, SERIAL_NUM => <serial#>, WAITS => TRUE);
Review the trace file (in the ADR trace directory; user_dump_dest/udump on older releases).
Optimize SQL: Use FORALL in PL/SQL.
Improve bandwidth: Work with network team.
Monitor: V$SESSION_EVENT.
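
A minimal FORALL sketch (transactions, staging_txns, txn_id and status are hypothetical names): moving the row-by-row loop into a single server-side bulk operation avoids per-row round trips.

DECLARE
  TYPE t_id_tab IS TABLE OF NUMBER;
  l_ids t_id_tab;
BEGIN
  SELECT txn_id BULK COLLECT INTO l_ids FROM staging_txns;  -- hypothetical staging table
  FORALL i IN 1 .. l_ids.COUNT                              -- single bulk-bound execution
    UPDATE transactions
       SET status = 'POSTED'
     WHERE txn_id = l_ids(i);
  COMMIT;
END;
/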

21. Unexpected Query Delay
Short Description: A daily query that finishes in 30 seconds is now running over 2 hours.
Impact: Delays critical reports and transactions.
Solution: Diagnose and optimize the query.

Steps:
Identify the query: SELECT SQL_ID, SQL_TEXT FROM V$SQL WHERE SQL_TEXT LIKE '%<keyword>%' ORDER BY LAST_ACTIVE_TIME DESC;
Check execution time: SELECT SQL_ID, EXECUTIONS, ELAPSED_TIME/1000000 AS ELAPSED_SECONDS FROM V$SQL WHERE SQL_ID = '<sql_id>';
Compare plans: SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>'));
Check for locks or waits: SELECT EVENT FROM V$SESSION_WAIT WHERE SID = <sid>;
Add index or hint: CREATE INDEX idx_col ON table(col); or SELECT /*+ INDEX(table idx_col) */ * FROM table;
Monitor: V$SESSION and re-run to confirm.

22. Post-Upgrade Memory Inefficiency
Short Description: After upgrading to Oracle 19c, SGA settings cause high memory usage.
Impact: Slow performance due to excessive paging.
Solution: Reconfigure memory parameters.

Steps:
Check current SGA: SELECT POOL, NAME, BYTES FROM V$SGASTAT;
Review AWR for memory waits: SELECT * FROM DBA_HIST_SYSTEM_EVENT WHERE EVENT_NAME LIKE '%memory%';
Reset SGA: ALTER SYSTEM SET SGA_TARGET = 3G;
Adjust PGA: ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 800M;
Validate with OS tools (e.g., top or vmstat).
Monitor: V$SGASTAT post-adjustment.

23. Post-Migration Query Performance Drop
Short Description: After migrating to a new server, a key query runs 50% slower.
Impact: Delays in critical banking reports.
Solution: Optimize post-migration configuration.

Steps:
Compare plans: SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>')); on old and new.
Check stats: SELECT TABLE_NAME, LAST_ANALYZED FROM USER_TABLES WHERE TABLE_NAME = '<table>';
Gather stats: EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', '<table>', CASCADE => TRUE);
Adjust I/O (online datafile move, 12c+): ALTER DATABASE MOVE DATAFILE 'old_path' TO '/new_path';
Test query: Run with SQL_TRACE enabled.
Monitor: V$SQL for elapsed time.

24. Post-Upgrade Archive Lag
Short Description: After upgrading to 19c, archive log generation slows.
Impact: Increased risk of database stalls.
Solution: Tune archiving post-upgrade.

Steps:
Check status: SELECT DEST_ID, STATUS FROM V$ARCHIVE_DEST_STATUS;
Review redo log size: SELECT GROUP#, BYTES FROM V$LOG;
Increase log size: ALTER DATABASE ADD LOGFILE GROUP 5 ('/path/redo05.log') SIZE 300M;
Adjust processes: ALTER SYSTEM SET LOG_ARCHIVE_MAX_PROCESSES = 10;
Force a log switch: ALTER SYSTEM SWITCH LOGFILE;
Monitor: V$ARCHIVE_DEST_STATUS.

25. Post-Migration Cluster Latency
Short Description: After migrating to RAC, interconnect latency increases.
Impact: Slows cluster-wide operations.
Solution: Optimize RAC configuration.

Steps:
Check waits: SELECT EVENT, TOTAL_WAITS FROM GV$SYSTEM_EVENT WHERE EVENT LIKE 'gc%';
Review interconnect: SELECT INST_ID, NAME, VALUE FROM GV$SYSSTAT WHERE NAME LIKE 'gc%lost';
Adjust network: Work with team to reduce latency.
Tune parameters: ALTER SYSTEM SET "_gc_policy_time" = 100; (hidden parameter; note the double quotes)
Redistribute data: ALTER TABLE accounts MOVE PARTITION p1 ONLINE; (or use DBMS_REDEFINITION)
Monitor: GV$SYSTEM_EVENT.

26. Excessive Cursor Caching
Short Description: Post-upgrade, too many cursors are cached, slowing session performance.
Impact: Increased memory usage and session delays.
Solution: Adjust cursor settings.

Steps:
Check cursor usage: SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'session cursor cache hits';
Review parameter: SHOW PARAMETER open_cursors;
Increase limit: ALTER SYSTEM SET OPEN_CURSORS = 1000;
Clear cache: ALTER SYSTEM FLUSH SHARED_POOL;
Test application performance.
Monitor: V$SESSTAT for cursor hits.

27. Post-Migration Data Skew
Short Description: After migration, data distribution causes uneven query performance.
Impact: Slows down specific queries on skewed tables.
Solution: Rebalance data distribution.

Steps:
Identify skewed tables: SELECT TABLE_NAME, NUM_ROWS FROM USER_TABLES WHERE NUM_ROWS > 1000000 ORDER BY NUM_ROWS DESC;
Check histograms: SELECT COLUMN_NAME, HISTOGRAM FROM USER_TAB_COLS WHERE TABLE_NAME = '<table>';
Gather stats with histograms: EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', '<table>', METHOD_OPT => 'FOR ALL COLUMNS SIZE 254');
Partition table if needed (12.2+): ALTER TABLE accounts MODIFY PARTITION BY HASH (account_id) PARTITIONS 4 ONLINE;
Test query: Run with EXPLAIN PLAN.
Monitor: V$SQL for performance.

28. Post-Upgrade Index Fragmentation
Short Description: After upgrading, indexes are fragmented, slowing DML.
Impact: Increased I/O and slower updates.
Solution: Rebuild fragmented indexes.

Steps:
Analyze the index (this populates INDEX_STATS): ANALYZE INDEX idx_name VALIDATE STRUCTURE;
Check fragmentation: SELECT NAME, LF_ROWS, DEL_LF_ROWS FROM INDEX_STATS;
Rebuild index: ALTER INDEX idx_name REBUILD;
Verify space: SELECT INDEX_NAME, BLEVEL FROM USER_INDEXES;
Test DML performance.
Monitor: V$SEGMENT_STATISTICS.
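
INDEX_STATS holds one row for the index most recently validated in the session; a common rule-of-thumb rebuild signal (thresholds vary) is a high share of deleted leaf rows:

SELECT NAME,
       LF_ROWS,
       DEL_LF_ROWS,
       ROUND(DEL_LF_ROWS / NULLIF(LF_ROWS, 0) * 100, 1) AS PCT_DELETED_LEAF_ROWS
  FROM INDEX_STATS;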

29. Post-Migration Network Overhead
Short Description: After migration, network latency increases query times.
Impact: Slows remote client operations.
Solution: Optimize network settings.

Steps:
Check network waits: SELECT EVENT, TOTAL_WAITS FROM V$SESSION_EVENT WHERE EVENT LIKE 'SQL*Net%';
Trace session: EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(SESSION_ID => <sid>, WAITS => TRUE);
Review the trace file (in the ADR trace directory; user_dump_dest/udump on older releases).
Adjust SQL for batching: Use FORALL in PL/SQL.
Collaborate with network team to reduce latency.
Monitor: V$SESSION_EVENT.

30. Post-Upgrade Session Wait Surge
Short Description: After upgrading, session waits spike due to new defaults.
Impact: Degrades overall system responsiveness.
Solution: Tune wait-related parameters.

Steps:
Check waits: SELECT EVENT, TOTAL_WAITS FROM V$SYSTEM_EVENT ORDER BY TOTAL_WAITS DESC;
Review AWR: SELECT * FROM DBA_HIST_SYSTEM_EVENT WHERE EVENT_NAME LIKE '%wait%';
Adjust parameters: ALTER SYSTEM SET "_small_table_threshold" = 100 SCOPE=SPFILE; (hidden parameter; note the double quotes)
Increase resources: ALTER SYSTEM SET PROCESSES = 1000 SCOPE=SPFILE; (static parameter; requires a restart)
Test with load: Run typical workload.
Monitor: V$SYSTEM_EVENT.