Cloud & Infrastructure · Updated: July 8, 2025 · 19 min read

Cloud Data Platform Architecture: AWS vs GCP vs Azure

A comprehensive comparison of the three major cloud platforms for a Data Platform: AWS (Redshift, Glue, S3), GCP (BigQuery, Dataflow, GCS), and Azure (Synapse, Data Factory, ADLS). Includes architecture diagrams, a cost comparison, a decision framework, and case studies from the Vietnamese market.

Nguyễn Minh Tuấn


Principal Data Architect

Three cloud platforms (AWS, GCP, Azure) comparison showing data warehouse, data lake, ETL, and BI components for each platform with architecture diagrams and decision points
#AWS #GCP #Azure #Cloud Architecture #BigQuery #Redshift #Synapse #Data Platform #Cloud Comparison #Multi-Cloud

TL;DR

Choosing a cloud for your Data Platform is one of the most critical decisions you will make. Each cloud has its strengths:

Quick Recommendation:

  • Startup/Scale-up with analytics focus → GCP (BigQuery best-in-class, simpler pricing)
  • Enterprise with AWS ecosystem → AWS (most mature, broadest services)
  • Microsoft shop (Office 365, Power BI) → Azure (tight integration)

Component Comparison:

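Component      | AWS            | GCP                | Azure
---------------|----------------|--------------------|------------------
Data Warehouse | Redshift       | BigQuery ⭐        | Synapse Analytics
Data Lake      | S3             | GCS                | ADLS Gen2
ETL/ELT        | Glue           | Dataflow           | Data Factory
Orchestration  | MWAA (Airflow) | Cloud Composer     | Data Factory
BI             | QuickSight     | Looker             | Power BI ⭐
ML             | SageMaker      | Vertex AI          | Azure ML
Streaming      | Kinesis        | Pub/Sub + Dataflow | Event Hubs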

Cost Comparison (same workload: 10TB lake, 30TB scanned/month; details in section 5.2):

  • AWS: ~$5,560/month (complex pricing)
  • GCP: ~$890/month (simpler, transparent)
  • Azure: ~$1,400/month (good for existing licenses)

Vietnamese Market:

  • Startups: 60% choose GCP (BigQuery, ease of use)
  • Enterprises: 30% AWS, 10% Azure

Case Studies:

  • Vietnamese fintech: GCP BigQuery → 5x faster queries, 50% cost reduction vs Redshift
  • E-commerce: AWS (existing infra) → comprehensive ecosystem
  • Enterprise: Azure (Microsoft stack) → seamless Power BI integration

This post deep-dives into each cloud with architecture diagrams, pricing details, and a decision framework.


1. Why Cloud-Native Data Platform?

1.1. On-Premise Pain Points

Traditional setup (Oracle, SQL Server on-premise):

Problems:
❌ Upfront cost: $500K+ hardware investment
❌ Provisioning time: 3-6 months to procure servers
❌ Over-provisioning: Buy for peak capacity (90% of capacity sits idle most of the time)
❌ Maintenance: DBAs spend 40% time on patching, backups
❌ Scaling: Hard limits, can't handle traffic spikes
❌ Disaster recovery: Complex, expensive redundancy

Example: Vietnamese bank

  • Bought $2M Oracle Exadata in 2015
  • Used 30% capacity on average
  • Takes 6 months to add capacity
  • DR site costs another $1M

1.2. Cloud Advantages

✅ Scalability: Scale from 1GB to 1PB in minutes
✅ Pay-as-you-go: No upfront costs, pay for what you use
✅ Managed services: No server management, auto-patching
✅ Global: Deploy across multiple regions easily
✅ Innovation: New features released continuously
✅ Disaster recovery: Built-in redundancy, multi-region replication

Cost comparison:

On-Premise (3 years TCO):
  Hardware: $500K
  Software licenses: $300K
  Data center: $150K
  Staff (3 DBAs): $540K
  Total: $1.49M

Cloud (3 years, average $5K/month):
  Monthly cost: $5K × 36 = $180K
  Staff (1 engineer, less maintenance): $180K
  Total: $360K

Savings: $1.13M (76% reduction)
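
The arithmetic above is easy to rerun with your own numbers. The sketch below simply re-adds the line items from the comparison; the class and field names are made up for illustration, and the dollar figures are the article's estimates rather than quotes from any vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcoEstimate:
    """A labelled set of cost line items in USD, covering the whole period."""
    label: str
    line_items: dict

    @property
    def total(self) -> int:
        return sum(self.line_items.values())


YEARS = 3

# Line items copied from the 3-year comparison above.
on_prem = TcoEstimate("On-Premise", {
    "hardware": 500_000,
    "software_licenses": 300_000,
    "data_center": 150_000,
    "staff_3_dbas": 540_000,
})
cloud = TcoEstimate("Cloud", {
    "platform": 5_000 * 12 * YEARS,   # $5K/month for 36 months
    "staff_1_engineer": 180_000,
})

savings = on_prem.total - cloud.total
savings_pct = savings / on_prem.total * 100

print(f"{on_prem.label}: ${on_prem.total:,}")
print(f"{cloud.label}: ${cloud.total:,}")
print(f"Savings: ${savings:,} ({savings_pct:.0f}%)")
```

Swapping in your own staffing and hardware figures is usually where the comparison changes most, since staff is the largest single line on the on-premise side.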

Note: Cloud is cheaper for most workloads, BUT it can get expensive if not optimized (see the upcoming Part 32: Cost Optimization).


2. AWS: The Mature Giant

2.1. Overview

Launched: 2006 (first cloud provider)
Market share: 32% globally (Gartner 2024)
Strengths: Broadest services (200+), most mature, largest community
Weaknesses: Complex, steeper learning curve, pricing complicated

2.2. Data Platform Components

Architecture:

┌─────────────────────────────────────────────────────────┐
│              AWS Data Platform Architecture             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Data Sources                                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐            │
│  │ RDS      │  │ APIs     │  │ SaaS     │            │
│  │(Postgres)│  │          │  │ (Shopify)│            │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘            │
│       │             │              │                   │
│       └─────────────┴──────────────┘                   │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐           │
│  │ Ingestion Layer                         │           │
│  │  - DMS (Database Migration Service)     │           │
│  │  - Kinesis (Streaming)                  │           │
│  │  - Lambda (Event-driven)                │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│                    ▼                                    │
│  ┌─────────────────────────────────────────┐           │
│  │ Data Lake (S3)                          │           │
│  │  - Raw zone: s3://raw/                  │           │
│  │  - Processed: s3://processed/           │           │
│  │  - Analytics: s3://analytics/           │           │
│  │  - Format: Parquet, Delta Lake          │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│        ┌───────────┼───────────┐                       │
│        │           │           │                       │
│        ▼           ▼           ▼                       │
│  ┌─────────┐ ┌──────────┐ ┌──────────┐               │
│  │ Glue    │ │ EMR      │ │ Lambda   │               │
│  │ (ETL)   │ │ (Spark)  │ │(Serverless)              │
│  └────┬────┘ └────┬─────┘ └────┬─────┘               │
│       │           │            │                       │
│       └───────────┴────────────┘                       │
│                   │                                     │
│                   ▼                                     │
│  ┌─────────────────────────────────────────┐           │
│  │ Data Warehouse (Redshift)               │           │
│  │  - Clusters: ra3.4xlarge nodes          │           │
│  │  - Auto-scaling                         │           │
│  │  - Redshift Spectrum (query S3)         │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│        ┌───────────┴───────────┐                       │
│        │                       │                       │
│        ▼                       ▼                       │
│  ┌─────────┐            ┌──────────┐                  │
│  │QuickSight│            │ Tableau  │                  │
│  │  (BI)   │            │(3rd party)│                  │
│  └─────────┘            └──────────┘                  │
│                                                         │
│  Orchestration: MWAA (Managed Airflow)                 │
│  Governance: Lake Formation, Glue Data Catalog         │
│  Security: IAM, KMS encryption                         │
└─────────────────────────────────────────────────────────┘

2.3. Key Services Deep-Dive

Amazon Redshift (Data Warehouse)

Pros:

  • ✅ Columnar storage (fast analytics)
  • ✅ Massively parallel processing (MPP)
  • ✅ Redshift Spectrum: Query S3 directly (no loading)
  • ✅ Auto-scaling: Add nodes on demand
  • ✅ Mature ecosystem

Cons:

  • ❌ Provisioned clusters require management (Redshift Serverless exists, but the sizing in this post assumes provisioned RA3 nodes)
  • ❌ Pauses/resumes manually (or via scheduler)
  • ❌ Complex pricing (compute + storage separate)
  • ❌ Slower than BigQuery for ad-hoc queries

Pricing:

ra3.4xlarge node:
  - $3.26/hour × 720 hours = $2,347/month
  - 96GB RAM, 12 vCPUs
  - Managed storage: $0.024/GB/month

Example cluster (2 nodes):
  Compute: $2,347 × 2 = $4,694/month
  Storage (10TB): 10,000 GB × $0.024 = $240/month
  Total: ~$4,934/month
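
To size a different cluster, the same calculation can be wrapped in a small function. This is a rough sketch: the function name and the 720-hour month are assumptions for illustration, and the rates are the list prices quoted above, which vary by region and change over time.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, the convention the figures above imply


def redshift_monthly_cost(nodes, hourly_rate_usd, storage_gb,
                          storage_rate_usd_per_gb, hours_running=HOURS_PER_MONTH):
    """Estimate monthly cost of a provisioned cluster: compute plus managed storage."""
    compute = nodes * hourly_rate_usd * hours_running
    storage = storage_gb * storage_rate_usd_per_gb
    return {"compute": compute, "storage": storage, "total": compute + storage}


# The 2-node ra3.4xlarge example with 10TB of managed storage.
always_on = redshift_monthly_cost(
    nodes=2, hourly_rate_usd=3.26, storage_gb=10_000, storage_rate_usd_per_gb=0.024
)
print(f"Always on: ${always_on['total']:,.0f}/month")

# Hypothetical schedule: cluster paused outside 12 hours x 22 working days.
paused = redshift_monthly_cost(
    nodes=2, hourly_rate_usd=3.26, storage_gb=10_000,
    storage_rate_usd_per_gb=0.024, hours_running=12 * 22,
)
print(f"Paused off-hours: ${paused['total']:,.0f}/month")
```

The second call shows why the pause/resume con matters: compute dominates the bill, so a scheduler that pauses the cluster off-hours cuts most of it, while storage is charged either way.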

AWS Glue (ETL)

Pros:

  • ✅ Serverless (no infrastructure)
  • ✅ Auto-scaling
  • ✅ Python/Spark based
  • ✅ Data Catalog (metadata repository)

Cons:

  • ❌ Learning curve (PySpark knowledge needed)
  • ❌ Debugging difficult (limited logs)
  • ❌ Cold starts (5-10 min for large jobs)

Pricing: $0.44 per DPU-hour (Data Processing Unit)
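
DPU-hour pricing is easier to reason about per job run. Below is a minimal sketch, assuming cost scales linearly with DPUs and runtime and ignoring any minimum billing duration; the job size (2 DPUs, 10 minutes) is a hypothetical example, while the 50 runs/day figure matches the sample workload used later in this post.

```python
def glue_run_cost(dpus, runtime_minutes, rate_per_dpu_hour=0.44):
    """Cost of one job run: DPUs x hours x rate (linear approximation)."""
    return dpus * (runtime_minutes / 60) * rate_per_dpu_hour


# Hypothetical job: 2 DPUs for 10 minutes, 50 runs a day, 30 days a month.
per_run = glue_run_cost(dpus=2, runtime_minutes=10)
monthly = per_run * 50 * 30

print(f"Per run: ${per_run:.3f}")
print(f"Per month: ${monthly:,.0f}")
```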

Amazon S3 (Data Lake)

Pros:

  • ✅ 99.999999999% durability (11 nines)
  • ✅ Unlimited scalability
  • ✅ Lifecycle policies (auto-archive to Glacier)
  • ✅ Versioning, encryption

Pricing:

Storage tiers:
  - S3 Standard: $0.023/GB/month (frequent access)
  - S3 Infrequent Access: $0.0125/GB/month
  - S3 Glacier: $0.004/GB/month (archive)

10TB example:
  Standard: 10,000 × $0.023 = $230/month
  With lifecycle (50% to IA after 30 days):
    5,000 × $0.023 + 5,000 × $0.0125 = $178/month
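
The lifecycle saving generalizes to any mix of tiers. The sketch below computes a blended monthly cost from the share of data sitting in each tier; the function and dictionary names are illustrative, the rates are the ones listed above, and request or retrieval fees are deliberately left out.

```python
# Per-GB monthly rates from the tier list above.
S3_RATES = {"standard": 0.023, "infrequent_access": 0.0125, "glacier": 0.004}


def blended_storage_cost(total_gb, tier_shares, rates=S3_RATES):
    """Monthly storage cost when `tier_shares` (fractions summing to 1) split the data."""
    if abs(sum(tier_shares.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1.0")
    return sum(total_gb * share * rates[tier] for tier, share in tier_shares.items())


all_standard = blended_storage_cost(10_000, {"standard": 1.0})
with_lifecycle = blended_storage_cost(
    10_000, {"standard": 0.5, "infrequent_access": 0.5}
)

print(f"All Standard:   ${all_standard:,.2f}/month")
print(f"With lifecycle: ${with_lifecycle:,.2f}/month")
```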

2.4. When to Choose AWS

✅ Already on AWS: Existing EC2, RDS → easier to stay
✅ Need broad services: AWS has most services (IoT, ML, etc.)
✅ Enterprise contracts: Discounts via EDP (Enterprise Discount Program)
✅ Hybrid cloud: AWS Outposts for on-premise integration
✅ Regulatory: Some Vietnamese banks require AWS (proven compliance)


3. GCP: The Analytics Powerhouse

3.1. Overview

Launched: 2008 (Google App Engine)
Market share: 11% globally
Strengths: Best analytics (BigQuery), simpler pricing, innovation
Weaknesses: Smaller ecosystem than AWS, fewer enterprise features

3.2. Data Platform Components

Architecture:

┌─────────────────────────────────────────────────────────┐
│              GCP Data Platform Architecture             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Data Sources                                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐            │
│  │Cloud SQL │  │ APIs     │  │ SaaS     │            │
│  │(Postgres)│  │          │  │          │            │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘            │
│       │             │              │                   │
│       └─────────────┴──────────────┘                   │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐           │
│  │ Ingestion Layer                         │           │
│  │  - Pub/Sub (Streaming)                  │           │
│  │  - Cloud Functions (Event-driven)       │           │
│  │  - Dataflow (Batch + Streaming)         │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│                    ▼                                    │
│  ┌─────────────────────────────────────────┐           │
│  │ Data Lake (GCS - Cloud Storage)         │           │
│  │  - Buckets: gs://raw/, gs://processed/  │           │
│  │  - Format: Parquet, Avro                │           │
│  │  - Lifecycle: Auto-archive after 30d    │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│                    ▼                                    │
│  ┌─────────────────────────────────────────┐           │
│  │ Transformation (Dataflow)               │           │
│  │  - Apache Beam (Python/Java)            │           │
│  │  - Auto-scaling workers                 │           │
│  │  - Batch + Streaming unified            │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│                    ▼                                    │
│  ┌─────────────────────────────────────────┐           │
│  │ Data Warehouse (BigQuery) ⭐            │           │
│  │  - Serverless (no clusters!)            │           │
│  │  - Slots auto-allocated                 │           │
│  │  - Petabyte-scale                       │           │
│  │  - ML built-in (BigQuery ML)            │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│        ┌───────────┴───────────┐                       │
│        │                       │                       │
│        ▼                       ▼                       │
│  ┌─────────┐            ┌──────────┐                  │
│  │ Looker  │            │ Metabase │                  │
│  │  (BI)   │            │(3rd party)│                  │
│  └─────────┘            └──────────┘                  │
│                                                         │
│  Orchestration: Cloud Composer (Managed Airflow)       │
│  Governance: Data Catalog, Dataplex                    │
│  Security: IAM, CMEK encryption                        │
└─────────────────────────────────────────────────────────┘

3.3. Key Services Deep-Dive

BigQuery (Data Warehouse) ⭐ Best-in-Class

Pros:

  • ✅ Fully serverless: No clusters, no nodes, no management
  • ✅ Blazing fast: Dremel architecture, scans terabytes in seconds and petabytes in minutes
  • ✅ Transparent pricing: $5/TB scanned (on-demand)
  • ✅ Standard SQL: Easy to learn
  • ✅ Built-in ML: BigQuery ML (train models in SQL)
  • ✅ Auto-scaling: Scales to petabytes without capacity planning

Cons:

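  • ❌ Cost spiral if not careful (scan-based pricing: an unfiltered SELECT * is billed for the whole table)
  • ❌ Procedural SQL (scripting, loops) is less mature than in traditional warehouses
  • ❌ Partitioning and clustering needed on large tables to keep scan costs down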
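
To see why partitioning matters under scan-based pricing, here is a back-of-the-envelope sketch. It assumes evenly sized daily partitions and the $5/TB on-demand rate used in this post; the table size and query window are hypothetical.

```python
def scan_cost(table_tb, partitions_total, partitions_read, rate_per_tb=5.0):
    """Approximate on-demand cost of one query, assuming evenly sized partitions."""
    scanned_tb = table_tb * partitions_read / partitions_total
    return scanned_tb * rate_per_tb


# Hypothetical 10TB table holding one year of daily partitions.
full_scan = scan_cost(table_tb=10, partitions_total=365, partitions_read=365)
last_week = scan_cost(table_tb=10, partitions_total=365, partitions_read=7)

print(f"Full scan:  ${full_scan:.2f} per query")
print(f"Last 7 days: ${last_week:.2f} per query")
```

A dashboard that refreshes hourly multiplies whichever number applies by about 720 per month, which is how an unpartitioned table turns into the cost spiral mentioned above.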

Pricing:

On-Demand:
  $5 per TB scanned
  Example: 100GB query = $0.50

  Monthly estimate:
    1TB scanned/day × 30 days = 30TB
    30TB × $5 = $150/month

Flat-Rate (Reserved slots):
  $2,000/month = 100 slots (fixed cost, unlimited queries)

  Break-even: 400TB scanned/month
  Good for: Heavy query workloads

Storage:
  Active: $0.02/GB/month
  Long-term (90+ days): $0.01/GB/month

  10TB storage: $200/month
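
The on-demand vs flat-rate choice comes down to one division. The sketch below reproduces the 400TB break-even from the figures above; the function names are illustrative, and the rates are the ones quoted in this post, so check the current price sheet before committing.

```python
def on_demand_cost(tb_scanned, rate_per_tb=5.0):
    """Monthly query cost under scan-based pricing."""
    return tb_scanned * rate_per_tb


def break_even_tb(flat_rate_monthly, rate_per_tb=5.0):
    """Monthly scan volume at which a fixed commitment costs the same as on-demand."""
    return flat_rate_monthly / rate_per_tb


def cheaper_plan(tb_scanned, flat_rate_monthly=2_000, rate_per_tb=5.0):
    """Pick the cheaper plan for a given monthly scan volume."""
    if on_demand_cost(tb_scanned, rate_per_tb) < flat_rate_monthly:
        return "on-demand"
    return "flat-rate"


print(break_even_tb(2_000))   # TB/month where both plans cost the same
print(on_demand_cost(30))     # the 1TB/day example
print(cheaper_plan(30), cheaper_plan(500))
```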

Google Cloud Storage (Data Lake)

Pros:

  • ✅ Simple pricing (clearer than S3)
  • ✅ Multi-region and dual-region bucket locations available out of the box
  • ✅ Strong integration with BigQuery

Pricing:

Storage classes:
  - Standard: $0.020/GB/month
  - Nearline (30+ days): $0.010/GB/month
  - Coldline (90+ days): $0.004/GB/month
  - Archive (365+ days): $0.0012/GB/month

10TB example (Standard): $200/month

Dataflow (ETL)

Pros:

  • ✅ Apache Beam (portable, run on Spark/Flink too)
  • ✅ Unified batch + streaming
  • ✅ Auto-scaling

Cons:

  • ❌ Complex for simple ETL (overkill)
  • ❌ Beam learning curve

Alternative: Many use dbt on BigQuery instead (simpler for SQL transformations)

3.4. When to Choose GCP

✅ Analytics-first: BigQuery is the best data warehouse
✅ Startups: Simpler, faster to get started
✅ Data science heavy: Vertex AI excellent for ML
✅ Want simplicity: Fewer moving parts than AWS
✅ Budget transparency: Clearer pricing

Vietnamese market: 60% startups choose GCP (ease of use + BigQuery)


4. Azure: The Microsoft Ecosystem

4.1. Overview

Launched: 2010 (Windows Azure)
Market share: 23% globally
Strengths: Microsoft integration (Office 365, AD), Power BI, hybrid cloud
Weaknesses: Playing catch-up as a later entrant, less data-specific innovation

4.2. Data Platform Components

Architecture:

┌─────────────────────────────────────────────────────────┐
│             Azure Data Platform Architecture            │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Data Sources                                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐            │
│  │Azure SQL │  │ APIs     │  │ Dynamics │            │
│  │ Database │  │          │  │   365    │            │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘            │
│       │             │              │                   │
│       └─────────────┴──────────────┘                   │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐           │
│  │ Ingestion Layer                         │           │
│  │  - Event Hubs (Streaming)               │           │
│  │  - Azure Functions (Event-driven)       │           │
│  │  - Data Factory (Batch)                 │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│                    ▼                                    │
│  ┌─────────────────────────────────────────┐           │
│  │ Data Lake (ADLS Gen2)                   │           │
│  │  - Hierarchical namespace               │           │
│  │  - ACLs (POSIX-style)                   │           │
│  │  - Integration with Azure AD            │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│                    ▼                                    │
│  ┌─────────────────────────────────────────┐           │
│  │ Transformation (Data Factory/Synapse)   │           │
│  │  - Mapping Data Flows                   │           │
│  │  - Spark pools                          │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│                    ▼                                    │
│  ┌─────────────────────────────────────────┐           │
│  │ Data Warehouse (Synapse Analytics)      │           │
│  │  - Dedicated SQL pools                  │           │
│  │  - Serverless SQL pools                 │           │
│  │  - Spark pools (unified)                │           │
│  └─────────────────┬───────────────────────┘           │
│                    │                                    │
│        ┌───────────┴───────────┐                       │
│        │                       │                       │
│        ▼                       ▼                       │
│  ┌─────────┐            ┌──────────┐                  │
│  │Power BI │⭐          │ Tableau  │                  │
│  │  (BI)   │            │(3rd party)│                  │
│  └─────────┘            └──────────┘                  │
│                                                         │
│  Orchestration: Data Factory, Synapse Pipelines        │
│  Governance: Purview                                   │
│  Security: Azure AD, Key Vault                         │
└─────────────────────────────────────────────────────────┘

4.3. Key Services Deep-Dive

Azure Synapse Analytics (Data Warehouse)

Pros:

  • ✅ Unified analytics (SQL + Spark in one place)
  • ✅ Serverless SQL pools (pay-per-query like BigQuery)
  • ✅ Tight Power BI integration
  • ✅ Data Explorer (time-series analytics)

Cons:

  • ❌ Complex (many components to learn)
  • ❌ Performance varies (not as fast as BigQuery)
  • ❌ Dedicated pools expensive

Pricing:

Serverless SQL Pool:
  $5 per TB scanned (same as BigQuery)

Dedicated SQL Pool (DW100c):
  $1.20/hour = $864/month
  (Small warehouse, 100 compute units)

DW500c (typical production):
  $6/hour = $4,320/month
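
Whether a dedicated pool or the serverless pool is cheaper depends on how much you scan and how many hours the dedicated pool actually runs. This is a rough sketch using the rates above; the function names and the office-hours schedule are assumptions for illustration, and it relies on dedicated pool compute not being billed while the pool is paused.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours


def dedicated_pool_cost(hourly_rate_usd, hours_running=HOURS_PER_MONTH):
    """Monthly compute cost of a dedicated SQL pool for the hours it is running."""
    return hourly_rate_usd * hours_running


def serverless_break_even_tb(dedicated_cost_usd, rate_per_tb_usd=5.0):
    """Monthly scan volume at which serverless costs as much as the dedicated pool."""
    return dedicated_cost_usd / rate_per_tb_usd


always_on = dedicated_pool_cost(1.20)                          # DW100c, 24/7
office_hours = dedicated_pool_cost(1.20, hours_running=10 * 22)  # hypothetical schedule

print(f"DW100c always on: ${always_on:,.0f} "
      f"(break-even {serverless_break_even_tb(always_on):.1f} TB/month)")
print(f"DW100c office hours: ${office_hours:,.0f} "
      f"(break-even {serverless_break_even_tb(office_hours):.1f} TB/month)")
```

Below the break-even scan volume, the serverless pool is the cheaper option at these rates; above it, a dedicated pool starts to pay off.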

ADLS Gen2 (Data Lake)

Pros:

  • ✅ Hierarchical namespace (like filesystem)
  • ✅ POSIX ACLs (fine-grained permissions)
  • ✅ Lowest hot-tier list price of the three ($0.0184/GB vs $0.020 for GCS Standard and $0.023 for S3 Standard)

Pricing:

Hot tier: $0.0184/GB/month
Cool tier (30+ days): $0.01/GB/month
Archive: $0.00099/GB/month

10TB example: $184/month

Power BI ⭐ Best BI Tool

Pros:

  • ✅ Leading visualization tool (consistently named a Leader in Gartner's Magic Quadrant for analytics and BI platforms)
  • ✅ Excel-like familiarity (easy for business users)
  • ✅ Tight Office 365 integration
  • ✅ Embedded analytics

Pricing:

Power BI Pro: $10/user/month
Power BI Premium: $20/user/month (more features)

For teams:
  10 users × $10 = $100/month (cheap!)

4.4. When to Choose Azure

✅ Microsoft shop: Office 365, Active Directory, .NET stack
✅ Power BI requirement: Best BI tool, tightly integrated
✅ Hybrid cloud: Azure Stack for on-premise
✅ Enterprise agreements: Existing Microsoft licenses (savings)
✅ Vietnam government: Some prefer Azure (Microsoft presence)


5. Side-by-Side Comparison

5.1. Feature Matrix

Feature | AWS | GCP | Azure
Ease of Use⭐⭐⭐⭐⭐⭐⭐⭐⭐
Documentation⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Warehouse Performance⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
ML Capabilities⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
BI Integration⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Pricing Transparency⭐⭐⭐⭐⭐⭐⭐⭐⭐
Service Breadth⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Nearest Regions to Vietnam | Singapore, Japan | Singapore | Singapore

5.2. Cost Comparison (Same Workload)

Workload: E-commerce startup

  • 10TB storage (data lake)
  • 5TB active warehouse data
  • 30TB query scan/month
  • 50 DAG runs/day (ETL)
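
Component           | AWS                      | GCP                       | Azure
--------------------|--------------------------|---------------------------|--------------------------------
Storage (10TB lake) | $230                     | $200                      | $184
Warehouse           | $4,934 (Redshift 2-node) | $150 (BigQuery on-demand) | $864 (Synapse dedicated DW100c)
ETL                 | $300 (Glue)              | $200 (Dataflow)           | $250 (Data Factory)
Orchestration       | $50 (MWAA)               | $40 (Composer)            | Included
BI                  | $50 (QuickSight)         | $300 (Looker, 5 users)    | $100 (Power BI, 10 users)
Total/Month         | $5,564                   | $890 🏆                   | $1,398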

Note: GCP is significantly cheaper for this analytics-heavy workload because BigQuery on-demand bills only for the 30TB scanned ($150) instead of an always-on cluster.

Different workload (less queries, more storage):

  • 50TB storage
  • 10TB warehouse
  • 5TB query scan/month (light usage)

Component      | AWS    | GCP    | Azure
---------------|--------|--------|-------
Storage (50TB) | $1,150 | $1,000 | $920
Warehouse      | $4,934 | $150   | $864
ETL            | $300   | $200   | $250
Total          | $6,384 | $1,350 | $2,034

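Takeaway: At these list prices GCP is cheapest for both workloads, mainly because the always-on Redshift cluster dominates the AWS bill, while storage prices differ by less than half a cent per GB. Azure sits in between and fits best in a Microsoft ecosystem.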
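
The totals for the first workload (10TB lake, 30TB scanned/month) can be recomputed, and the dominant line item spotted, with a few lines of Python. This is only a sketch that re-adds the component estimates from the cost comparison in this section; the dictionary layout and function names are illustrative.

```python
# Monthly component estimates (USD) for the first workload in this section.
WORKLOAD_1 = {
    "AWS":   {"storage": 230, "warehouse": 4934, "etl": 300, "orchestration": 50, "bi": 50},
    "GCP":   {"storage": 200, "warehouse": 150,  "etl": 200, "orchestration": 40, "bi": 300},
    "Azure": {"storage": 184, "warehouse": 864,  "etl": 250, "orchestration": 0,  "bi": 100},
}


def monthly_totals(costs):
    """Sum the line items for each cloud."""
    return {cloud: sum(items.values()) for cloud, items in costs.items()}


def largest_item(items):
    """Return (name, share of total) for the biggest line item."""
    name = max(items, key=items.get)
    return name, items[name] / sum(items.values())


totals = monthly_totals(WORKLOAD_1)
cheapest = min(totals, key=totals.get)

for cloud, total in totals.items():
    name, share = largest_item(WORKLOAD_1[cloud])
    print(f"{cloud}: ${total:,}/month, largest item: {name} ({share:.0%})")
print(f"Cheapest: {cheapest}")
```

Running it shows that the warehouse line accounts for nearly nine tenths of the AWS estimate, so the comparison is mostly a statement about provisioned versus pay-per-scan warehouses.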


6. Decision Framework

6.1. Choose AWS If...

✅ Already heavily invested in AWS ecosystem
✅ Need broadest service catalog (IoT, ML, etc.)
✅ Enterprise with an EDP discount
✅ Specific services only on AWS (e.g., Aurora)
✅ Team expertise in AWS
✅ Hybrid cloud with AWS Outposts
✅ Vietnamese enterprises (familiar, compliant)

6.2. Choose GCP If...

✅ Analytics is core use case (BigQuery best)
✅ Startup/scale-up (fast, simple)
✅ Want transparent pricing
✅ Heavy data science (Vertex AI excellent)
✅ Modern stack (dbt, Fivetran, Looker)
✅ Small team (less ops overhead)
✅ Cost-conscious (often cheapest)

6.3. Choose Azure If...

✅ Microsoft shop (Office 365, AD, .NET)
✅ Power BI is requirement (best BI tool)
✅ Existing Microsoft licenses (cost savings)
✅ Hybrid cloud with Azure Stack
✅ Enterprise agreements with Microsoft
✅ Government/regulated industries (Microsoft compliance)

6.4. Decision Tree

Start
  │
  ├─ Using Office 365 / Power BI? → YES → Azure
  │
  ├─ Analytics-first, small team? → YES → GCP
  │
  ├─ Already on AWS, large ecosystem? → YES → AWS
  │
  └─ Greenfield, no constraints?
       │
       ├─ Budget < $5K/month → GCP (often cheapest)
       ├─ Need ML/AI heavy → GCP (Vertex AI best)
       ├─ Need breadth of services → AWS
       └─ Team Microsoft-familiar → Azure
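
The decision tree can also be written down as a function, which makes the order of the checks explicit. This is a sketch under one assumption: branches are evaluated top to bottom exactly as drawn, so the first matching question wins. The `Profile` fields are hypothetical names for the questions in the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """Answers to the questions in the decision tree (field names are illustrative)."""
    uses_office365_or_power_bi: bool = False
    analytics_first_small_team: bool = False
    already_on_aws: bool = False
    monthly_budget_usd: Optional[float] = None
    ml_heavy: bool = False
    needs_service_breadth: bool = False
    team_microsoft_familiar: bool = False


def recommend_cloud(p: Profile) -> Optional[str]:
    """Walk the tree top to bottom; return None if no branch matches."""
    if p.uses_office365_or_power_bi:
        return "Azure"
    if p.analytics_first_small_team:
        return "GCP"
    if p.already_on_aws:
        return "AWS"
    # Greenfield branch: no existing constraints.
    if p.monthly_budget_usd is not None and p.monthly_budget_usd < 5_000:
        return "GCP"
    if p.ml_heavy:
        return "GCP"
    if p.needs_service_breadth:
        return "AWS"
    if p.team_microsoft_familiar:
        return "Azure"
    return None


print(recommend_cloud(Profile(analytics_first_small_team=True)))
print(recommend_cloud(Profile(monthly_budget_usd=3_000)))
print(recommend_cloud(Profile(needs_service_breadth=True)))
```

Because the first match wins, a company that uses Power BI and already runs on AWS lands on Azure here; if that ordering does not reflect your priorities, reorder the checks rather than the answers.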

7. Real-World Case Studies

7.1. Vietnamese Fintech: GCP BigQuery

Company: Lending platform (500K users)

Before: AWS Redshift

  • 2-node ra3.4xlarge cluster
  • Cost: $5,000/month
  • Query performance: 30-60 seconds for complex queries
  • Maintenance: 20% engineer time on cluster management

Migration to GCP BigQuery:

  • Serverless (no cluster management)
  • Cost: $2,500/month (50% reduction)
  • Query performance: 5-10 seconds (5x faster)
  • Maintenance: 0% time (fully managed)

Why BigQuery won:

  • No cluster sizing decisions
  • Auto-scaling for traffic spikes
  • BigQuery ML for churn prediction (no separate ML infra)
  • Transparent pricing (easy to forecast)

Quote from CTO:

"BigQuery transformed our analytics. Queries that took minutes now take seconds. Engineers focus on insights, not infrastructure."

7.2. E-Commerce: Staying on AWS

Company: Fashion e-commerce (2M customers)

Stack: All AWS

  • Redshift for data warehouse
  • S3 for data lake
  • EMR for Spark jobs
  • QuickSight for BI

Why stayed on AWS:

  • Existing infra: 50+ EC2 instances, RDS databases
  • Team expertise: 5 engineers AWS-certified
  • EDP discount: 20% discount via enterprise agreement
  • Ecosystem: Tight integration with existing services
  • Switching cost: 6 months + $200K migration effort

Performance: Good enough (queries < 1 minute acceptable)

Quote:

"We considered BigQuery, but migration cost + team retraining outweighed benefits. AWS meets our needs."

7.3. Enterprise: Azure for Microsoft Stack

Company: Manufacturing company (3,000 employees)

Before: On-premise SQL Server

  • $500K upfront hardware
  • 3 DBAs maintaining
  • Limited scalability

Migration to Azure:

  • Synapse Analytics for warehouse
  • Power BI for BI (already used)
  • Azure Data Factory for ETL

Why Azure:

  • Power BI: Already company-wide (1000+ users)
  • Active Directory: Seamless SSO
  • Microsoft licenses: Enterprise agreement savings
  • Hybrid: Some data still on-premise (Azure Stack)
  • Training: Team familiar with Microsoft tools

Cost: $8,000/month (vs $1.5M TCO on-premise over 3 years)

Quote:

"Azure was natural choice. Power BI integration alone worth it - business users self-serve analytics now."


8. Multi-Cloud Strategy

8.1. When Multi-Cloud Makes Sense

Pros:

  • Avoid vendor lock-in: Can switch if pricing/service changes
  • Best-of-breed: Use BigQuery (GCP) + Power BI (Azure)
  • Redundancy: Disaster recovery across clouds
  • Geographic: Some regions only on certain clouds

Cons:

  • Complexity: 2-3x operational overhead
  • Cost: Data transfer between clouds expensive
  • Expertise: Team needs to know multiple platforms
  • Integration: Harder to connect services

8.2. Hybrid Example

Company: Scale-up using best of each

GCP:
  - BigQuery (data warehouse) ← Core analytics
  - Vertex AI (ML models)
  - Cloud Storage (data lake)

Azure:
  - Power BI (BI tool) ← Business users
  - Active Directory (SSO)

AWS:
  - EC2 (application servers) ← Legacy infra

Integration layer:
  - Fivetran: Sync data to BigQuery
  - Power BI: Connect to BigQuery (connector)
  - Terraform: Manage all 3 clouds

Cost: 10% higher than single-cloud, but flexibility worth it

8.3. Recommendation

For most Vietnamese companies: Single cloud (simpler)

  • Startups → GCP
  • Enterprises → AWS or Azure (depending on ecosystem)

Multi-cloud only if:

  • Large enterprise (>500 employees)
  • Specific requirements (certain service only on X cloud)
  • Mergers/acquisitions (inherit different clouds)

9. Vietnam-Specific Considerations

9.1. Data Residency

Law: Decree 72/2013, Cybersecurity Law 24/2018

  • Some data (banking, telecom) must stay in Vietnam

Cloud options:

  • AWS: Singapore region (closest, 10ms latency)
  • GCP: Singapore region
  • Azure: Singapore region
  • Note: No Vietnam regions yet (2025)

Workaround: Some enterprises use local hosting (Viettel Cloud, VNG Cloud) for regulated data, cloud for analytics.

9.2. Pricing (USD vs VND)

Cloud bills in USD → Exchange rate risk

Example:

  • Contract in Jan 2023: $5,000/month = 115M VND (rate 23,000)
  • By Dec 2023: $5,000/month = 125M VND (rate 25,000)
  • 10M VND increase (8.7%) despite same usage
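
The exchange-rate effect is simple to model when budgeting. The sketch below reproduces the example figures using integer arithmetic and checks whether a 10% VND buffer would have covered the year-end bill; the variable names are illustrative and the rates are the ones from the example above.

```python
def vnd_cost(usd_monthly: int, vnd_per_usd: int) -> int:
    """Convert a monthly USD bill to VND at a given exchange rate."""
    return usd_monthly * vnd_per_usd


USD_BILL = 5_000

jan = vnd_cost(USD_BILL, 23_000)
dec = vnd_cost(USD_BILL, 25_000)
increase = dec - jan
increase_pct = increase / jan * 100

# Budget set in January with a 10% buffer, in whole VND.
budget_with_buffer = jan * 110 // 100

print(f"Jan: {jan:,} VND, Dec: {dec:,} VND")
print(f"Increase: {increase:,} VND ({increase_pct:.1f}%)")
print(f"Budget with 10% buffer: {budget_with_buffer:,} VND, "
      f"covers December: {budget_with_buffer >= dec}")
```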

Mitigation:

  • Use GCP committed use discounts (lock USD price)
  • Budget in VND with 10% buffer
  • Monitor exchange rates

9.3. Support

AWS:

  • ✅ Vietnam office (Hanoi, HCMC)
  • ✅ Vietnamese support team
  • ✅ Local AWS user groups

GCP:

  • ✅ Vietnam office (HCMC)
  • ✅ Growing Vietnamese community
  • ✅ Startup programs (credits)

Azure:

  • ✅ Microsoft Vietnam (long presence)
  • ✅ Strong enterprise support
  • ✅ Power BI community

All 3 have a good presence in Vietnam, so support is not a differentiator.


Conclusion

There is no single "best" cloud; it depends on your needs.

Quick Recommendations:

Your Situation            | Recommendation
--------------------------|----------------------------
Startup, analytics-heavy  | GCP (BigQuery + simplicity)
Enterprise, AWS ecosystem | AWS (breadth + maturity)
Microsoft shop, Power BI  | Azure (integration)
Budget-constrained        | GCP (often cheapest)
Need ML/AI                | GCP (Vertex AI best)
Broadest services         | AWS (200+ services)

Key Takeaways:

  1. GCP: Best for analytics (BigQuery unmatched), simpler, often cheaper
  2. AWS: Most mature, broadest services, enterprise-friendly
  3. Azure: Best for Microsoft ecosystem, Power BI integration
  4. Cost: GCP often cheapest for analytics workloads
  5. Vietnam market: GCP popular with startups (60%), AWS with enterprises (30%)
  6. Multi-cloud: Only for large enterprises (complexity cost)

Next Steps:

  • ✅ Assess current infrastructure (on-premise vs cloud)
  • ✅ Define requirements (analytics vs storage vs ML)
  • ✅ Try free tiers (GCP $300, AWS 12 months free, Azure $200)
  • ✅ Read Cost Optimization to control cloud spend (Part 32, upcoming)
  • ✅ Read Lakehouse Architecture for a modern data lake (Part 33)

Need help? Carptech has implemented Data Platforms on all 3 clouds for Vietnamese companies. Book consultation to discuss your cloud strategy.


Related Posts:

  • Coming: Cost Optimization for Data Platform (Part 32)
  • Coming: Lakehouse Architecture (Part 33)
  • Coming: Serverless Data Architecture (Part 34)
  • Data Platform 101 - Foundation concepts
