TL;DR
Choosing a cloud for your data platform is one of the most critical decisions you will make. Each cloud has its strengths:
Quick Recommendation:
- Startup/Scale-up with analytics focus → GCP (BigQuery best-in-class, simpler pricing)
- Enterprise with AWS ecosystem → AWS (most mature, broadest services)
- Microsoft shop (Office 365, Power BI) → Azure (tight integration)
Component Comparison:
| Component | AWS | GCP | Azure |
|---|---|---|---|
| Data Warehouse | Redshift | BigQuery ⭐ | Synapse Analytics |
| Data Lake | S3 | GCS | ADLS Gen2 |
| ETL/ELT | Glue | Dataflow | Data Factory |
| Orchestration | MWAA (Airflow) | Cloud Composer | Data Factory |
| BI | QuickSight | Looker | Power BI ⭐ |
| ML | SageMaker | Vertex AI | Azure ML |
| Streaming | Kinesis | Pub/Sub + Dataflow | Event Hubs |
Cost Comparison (same workload: 10TB lake, 30TB scanned/month; see section 5.2):
- AWS: ~$5,560/month (complex pricing)
- GCP: ~$890/month (simpler, transparent)
- Azure: ~$1,400/month (good for existing licenses)
Vietnamese Market:
- Startups: 60% choose GCP (BigQuery, ease of use)
- Enterprises: 30% AWS, 10% Azure
Case Studies:
- Vietnamese fintech: GCP BigQuery → 5x faster queries, 50% cost reduction vs Redshift
- E-commerce: AWS (existing infra) → comprehensive ecosystem
- Enterprise: Azure (Microsoft stack) → seamless Power BI integration
This post takes a deep dive into each cloud, with architecture diagrams, pricing details, and a decision framework.
1. Why Cloud-Native Data Platform?
1.1. On-Premise Pain Points
Traditional setup (Oracle, SQL Server on-premise):
Problems:
❌ Upfront cost: $500K+ hardware investment
❌ Provisioning time: 3-6 months to procure servers
❌ Over-provisioning: Buy for peak capacity (90% idle most of the time)
❌ Maintenance: DBAs spend 40% of their time on patching and backups
❌ Scaling: Hard limits, can't handle traffic spikes
❌ Disaster recovery: Complex, expensive redundancy
Example: Vietnamese bank
- Bought $2M Oracle Exadata in 2015
- Used 30% capacity on average
- Takes 6 months to add capacity
- DR site costs another $1M
1.2. Cloud Advantages
✅ Scalability: Scale from 1GB to 1PB in minutes
✅ Pay-as-you-go: No upfront costs, pay for what you use
✅ Managed services: No server management, auto-patching
✅ Global: Deploy across multiple regions easily
✅ Innovation: New features released continuously
✅ Disaster recovery: Built-in redundancy, multi-region replication
Cost comparison:
On-Premise (3 years TCO):
Hardware: $500K
Software licenses: $300K
Data center: $150K
Staff (3 DBAs): $540K
Total: $1.49M
Cloud (3 years, average $5K/month):
Monthly cost: $5K × 36 = $180K
Staff (1 engineer, less maintenance): $180K
Total: $360K
Savings: $1.13M (76% reduction)
Note: Cloud is cheaper for most workloads, but it can get expensive if not optimized (see the upcoming Post 32: Cost Optimization).
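To make the arithmetic behind the 3-year comparison easy to rerun with your own numbers, here is a minimal Python sketch. The cost lines are the figures quoted above; the names `ON_PREMISE`, `CLOUD`, `total_cost`, and `savings` are made up for illustration.

```python
MONTHS = 36  # three-year horizon used in the comparison above

# 3-year cost lines in USD, copied from the comparison above.
ON_PREMISE = {
    "hardware": 500_000,
    "software_licenses": 300_000,
    "data_center": 150_000,
    "staff_3_dbas": 540_000,
}
CLOUD = {
    "cloud_bill": 5_000 * MONTHS,  # average $5K/month
    "staff_1_engineer": 180_000,
}


def total_cost(cost_lines: dict) -> int:
    """Sum every cost line of one scenario."""
    return sum(cost_lines.values())


def savings(baseline: int, alternative: int) -> tuple:
    """Return (absolute savings, savings as a fraction of the baseline)."""
    saved = baseline - alternative
    return saved, saved / baseline


on_prem_total = total_cost(ON_PREMISE)
cloud_total = total_cost(CLOUD)
saved, fraction = savings(on_prem_total, cloud_total)

print(f"On-premise: ${on_prem_total:,}")
print(f"Cloud:      ${cloud_total:,}")
print(f"Savings:    ${saved:,} ({fraction:.0%})")
```

Swapping in your own monthly cloud bill or staffing figures shows quickly how sensitive the 76% figure is to the assumed $5K/month average.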
2. AWS: The Mature Giant
2.1. Overview
Launched: 2006 (first cloud provider)
Market share: 32% globally (Gartner 2024)
Strengths: Broadest services (200+), most mature, largest community
Weaknesses: Complex, steeper learning curve, pricing complicated
2.2. Data Platform Components
Architecture:
┌─────────────────────────────────────────────────────────┐
│             AWS Data Platform Architecture              │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Data Sources                                           │
│  ┌──────────┐  ┌──────────┐   ┌──────────┐              │
│  │   RDS    │  │   APIs   │   │   SaaS   │              │
│  │(Postgres)│  │          │   │ (Shopify)│              │
│  └────┬─────┘  └────┬─────┘   └────┬─────┘              │
│       │             │              │                    │
│       └─────────────┴──────────────┘                    │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Ingestion Layer                        │            │
│  │  - DMS (Database Migration Service)     │            │
│  │  - Kinesis (Streaming)                  │            │
│  │  - Lambda (Event-driven)                │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Data Lake (S3)                         │            │
│  │  - Raw zone: s3://raw/                  │            │
│  │  - Processed: s3://processed/           │            │
│  │  - Analytics: s3://analytics/           │            │
│  │  - Format: Parquet, Delta Lake          │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│       ┌─────────────┼─────────────┐                     │
│       │             │             │                     │
│       ▼             ▼             ▼                     │
│  ┌─────────┐   ┌──────────┐  ┌────────────┐             │
│  │  Glue   │   │   EMR    │  │   Lambda   │             │
│  │  (ETL)  │   │ (Spark)  │  │(Serverless)│             │
│  └────┬────┘   └────┬─────┘  └────┬───────┘             │
│       │             │             │                     │
│       └─────────────┴─────────────┘                     │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Data Warehouse (Redshift)              │            │
│  │  - Clusters: ra3.4xlarge nodes          │            │
│  │  - Auto-scaling                         │            │
│  │  - Redshift Spectrum (query S3)         │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│         ┌───────────┴───────────┐                       │
│         │                       │                       │
│         ▼                       ▼                       │
│    ┌──────────┐           ┌───────────┐                 │
│    │QuickSight│           │  Tableau  │                 │
│    │   (BI)   │           │(3rd party)│                 │
│    └──────────┘           └───────────┘                 │
│                                                         │
│  Orchestration: MWAA (Managed Airflow)                  │
│  Governance: Lake Formation, Glue Data Catalog          │
│  Security: IAM, KMS encryption                          │
└─────────────────────────────────────────────────────────┘
2.3. Key Services Deep-Dive
Amazon Redshift (Data Warehouse)
Pros:
- ✅ Columnar storage (fast analytics)
- ✅ Massively parallel processing (MPP)
- ✅ Redshift Spectrum: Query S3 directly (no loading)
- ✅ Auto-scaling: Add nodes on demand
- ✅ Mature ecosystem
Cons:
- ❌ Requires cluster management (not fully serverless)
- ❌ Pauses/resumes manually (or via scheduler)
- ❌ Complex pricing (compute + storage separate)
- ❌ Slower than BigQuery for ad-hoc queries
Pricing:
ra3.4xlarge node:
- $3.26/hour = $2,347/month
- 96 GiB RAM, 12 vCPUs
- Managed storage: $0.024/GB/month
Example cluster (2 nodes):
Compute: $2,347 × 2 = $4,694/month
Storage (10TB): 10,000 GB × $0.024 = $240/month
Total: ~$4,934/month
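The cluster bill above is just node-hours plus managed storage, which can be sketched in a few lines of Python. The rates are the ones quoted in this section; the function name `redshift_monthly_cost` and the "office hours" schedule are hypothetical, included only to show what pausing the cluster would change.

```python
HOURS_PER_MONTH = 720  # 30-day month, matching $3.26/hour = $2,347/month above


def redshift_monthly_cost(
    nodes: int,
    storage_gb: float,
    node_usd_per_hour: float = 3.26,    # ra3.4xlarge rate quoted above
    storage_usd_per_gb: float = 0.024,  # managed storage rate quoted above
    hours_running: float = HOURS_PER_MONTH,
) -> dict:
    """Split a cluster's monthly bill into compute and storage."""
    compute = nodes * node_usd_per_hour * hours_running
    storage = storage_gb * storage_usd_per_gb
    return {"compute": compute, "storage": storage, "total": compute + storage}


always_on = redshift_monthly_cost(nodes=2, storage_gb=10_000)
# Hypothetical schedule: cluster paused outside 10 hours x 22 working days.
office_hours = redshift_monthly_cost(nodes=2, storage_gb=10_000, hours_running=10 * 22)

for label, bill in (("always on", always_on), ("office hours", office_hours)):
    print(f"{label:>12}: ${bill['total']:,.0f}/month")
```

Storage is billed whether or not the cluster is running, so only the compute part shrinks when the cluster is paused.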
AWS Glue (ETL)
Pros:
- ✅ Serverless (no infrastructure)
- ✅ Auto-scaling
- ✅ Python/Spark based
- ✅ Data Catalog (metadata repository)
Cons:
- ❌ Learning curve (PySpark knowledge needed)
- ❌ Debugging difficult (limited logs)
- ❌ Cold starts (5-10 min for large jobs)
Pricing: $0.44 per DPU-hour (Data Processing Unit)
Amazon S3 (Data Lake)
Pros:
- ✅ 99.999999999% durability (11 nines)
- ✅ Unlimited scalability
- ✅ Lifecycle policies (auto-archive to Glacier)
- ✅ Versioning, encryption
Pricing:
Storage tiers:
- S3 Standard: $0.023/GB/month (frequent access)
- S3 Infrequent Access: $0.0125/GB/month
- S3 Glacier: $0.004/GB/month (archive)
10TB example:
Standard: 10,000 GB × $0.023 = $230/month
With lifecycle (50% to IA after 30 days):
5,000 GB × $0.023 + 5,000 GB × $0.0125 = $177.50/month (≈ $178)
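The lifecycle example generalizes to any split of data across tiers. A minimal sketch, assuming the per-GB rates listed above and 1 TB = 1,000 GB as in the example; `S3_RATES` and `monthly_storage_cost` are illustrative names, and request, retrieval, and minimum-duration charges are deliberately ignored.

```python
# USD per GB-month, as listed above (1 TB = 1,000 GB, like the example).
S3_RATES = {
    "standard": 0.023,
    "infrequent_access": 0.0125,
    "glacier": 0.004,
}


def monthly_storage_cost(gb_by_tier: dict, rates: dict = S3_RATES) -> float:
    """Sum GB x rate over every tier that holds data."""
    return sum(gb * rates[tier] for tier, gb in gb_by_tier.items())


all_standard = monthly_storage_cost({"standard": 10_000})
with_lifecycle = monthly_storage_cost(
    {"standard": 5_000, "infrequent_access": 5_000}
)
saving = 1 - with_lifecycle / all_standard

print(f"All Standard:   ${all_standard:,.2f}/month")
print(f"With lifecycle: ${with_lifecycle:,.2f}/month ({saving:.0%} lower)")
```

Because colder tiers charge for retrieval, the saving only holds for data that really is read rarely.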
2.4. When to Choose AWS
✅ Already on AWS: Existing EC2, RDS → easier to stay
✅ Need broad services: AWS has most services (IoT, ML, etc.)
✅ Enterprise contracts: Discounts via EDP (Enterprise Discount Program)
✅ Hybrid cloud: AWS Outposts for on-premise integration
✅ Regulatory: Some Vietnamese banks require AWS (proven compliance)
3. GCP: The Analytics Powerhouse
3.1. Overview
Launched: 2008 (Google App Engine)
Market share: 11% globally
Strengths: Best analytics (BigQuery), simpler pricing, innovation
Weaknesses: Smaller ecosystem than AWS, fewer enterprise features
3.2. Data Platform Components
Architecture:
┌─────────────────────────────────────────────────────────┐
│             GCP Data Platform Architecture              │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Data Sources                                           │
│  ┌──────────┐  ┌──────────┐   ┌──────────┐              │
│  │Cloud SQL │  │   APIs   │   │   SaaS   │              │
│  │(Postgres)│  │          │   │          │              │
│  └────┬─────┘  └────┬─────┘   └────┬─────┘              │
│       │             │              │                    │
│       └─────────────┴──────────────┘                    │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Ingestion Layer                        │            │
│  │  - Pub/Sub (Streaming)                  │            │
│  │  - Cloud Functions (Event-driven)       │            │
│  │  - Dataflow (Batch + Streaming)         │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Data Lake (GCS - Cloud Storage)        │            │
│  │  - Buckets: gs://raw/, gs://processed/  │            │
│  │  - Format: Parquet, Avro                │            │
│  │  - Lifecycle: Auto-archive after 30d    │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Transformation (Dataflow)              │            │
│  │  - Apache Beam (Python/Java)            │            │
│  │  - Auto-scaling workers                 │            │
│  │  - Batch + Streaming unified            │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Data Warehouse (BigQuery) ⭐           │            │
│  │  - Serverless (no clusters!)            │            │
│  │  - Slots auto-allocated                 │            │
│  │  - Petabyte-scale                       │            │
│  │  - ML built-in (BigQuery ML)            │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│         ┌───────────┴───────────┐                       │
│         │                       │                       │
│         ▼                       ▼                       │
│    ┌──────────┐           ┌───────────┐                 │
│    │  Looker  │           │ Metabase  │                 │
│    │   (BI)   │           │(3rd party)│                 │
│    └──────────┘           └───────────┘                 │
│                                                         │
│  Orchestration: Cloud Composer (Managed Airflow)        │
│  Governance: Data Catalog, Dataplex                     │
│  Security: IAM, CMEK encryption                         │
└─────────────────────────────────────────────────────────┘
3.3. Key Services Deep-Dive
BigQuery (Data Warehouse) ⭐ Best-in-Class
Pros:
- ✅ Fully serverless: No clusters, no nodes, no management
- ✅ Blazing fast: Dremel architecture scans terabytes in seconds
- ✅ Transparent pricing: $5/TB scanned (on-demand)
- ✅ SQL standard: Easy to learn
- ✅ Built-in ML: BigQuery ML (train models in SQL)
- ✅ Auto-scaling: Infinitely scalable
Cons:
- ❌ Cost spiral if not careful (scan-based pricing)
- ❌ Limited procedural SQL (scripting exists, but it is less capable than PL/SQL or T-SQL for complex logic)
- ❌ Partitioning required for large tables
Pricing:
On-Demand:
$5 per TB scanned
Example: 100GB query = $0.50
Monthly estimate:
1TB scanned/day × 30 days = 30TB
30TB × $5 = $150/month
Flat-Rate (Reserved slots):
$2,000/month = 100 slots (fixed cost, unlimited queries)
Break-even: 400TB scanned/month
Good for: Heavy query workloads
Storage:
Active: $0.02/GB/month
Long-term (90+ days): $0.01/GB/month
10TB storage: $200/month
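The on-demand versus flat-rate decision comes down to one division. The sketch below uses the rates quoted in this section as constants (list prices change, so treat them as placeholders); the function names are made up for illustration.

```python
ON_DEMAND_USD_PER_TB = 5.0       # on-demand rate quoted above
FLAT_RATE_USD_PER_MONTH = 2_000  # 100 reserved slots, as quoted above


def on_demand_cost(tb_scanned: float) -> float:
    """Monthly on-demand bill for a given scan volume."""
    return tb_scanned * ON_DEMAND_USD_PER_TB


def break_even_tb() -> float:
    """Scan volume at which flat-rate and on-demand cost the same."""
    return FLAT_RATE_USD_PER_MONTH / ON_DEMAND_USD_PER_TB


def cheaper_plan(tb_scanned: float) -> str:
    """Pick the cheaper pricing model for a monthly scan volume."""
    if on_demand_cost(tb_scanned) < FLAT_RATE_USD_PER_MONTH:
        return "on-demand"
    return "flat-rate"


for tb in (30, 400, 500):
    print(f"{tb:>4} TB/month: on-demand ${on_demand_cost(tb):,.0f} -> {cheaper_plan(tb)}")
```

At 30TB scanned per month the workload sits far below the 400TB break-even, which is why on-demand is the default choice for small and mid-sized teams.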
Google Cloud Storage (Data Lake)
Pros:
- ✅ Simple pricing (clearer than S3)
- ✅ Multi-region and dual-region bucket locations (one bucket served from several regions)
- ✅ Strong integration with BigQuery
Pricing:
Storage classes:
- Standard: $0.020/GB/month
- Nearline (30+ days): $0.010/GB/month
- Coldline (90+ days): $0.004/GB/month
- Archive (365+ days): $0.0012/GB/month
10TB example (Standard): $200/month
Dataflow (ETL)
Pros:
- ✅ Apache Beam (portable, run on Spark/Flink too)
- ✅ Unified batch + streaming
- ✅ Auto-scaling
Cons:
- ❌ Complex for simple ETL (overkill)
- ❌ Beam learning curve
Alternative: Many use dbt on BigQuery instead (simpler for SQL transformations)
3.4. When to Choose GCP
✅ Analytics-first: BigQuery is the best data warehouse
✅ Startups: Simpler, faster to get started
✅ Data science heavy: Vertex AI excellent for ML
✅ Want simplicity: Fewer moving parts than AWS
✅ Budget transparency: Clearer pricing
Vietnamese market: 60% startups choose GCP (ease of use + BigQuery)
4. Azure: The Microsoft Ecosystem
4.1. Overview
Launched: 2010 (Windows Azure)
Market share: 23% globally
Strengths: Microsoft integration (Office 365, AD), Power BI, hybrid cloud
Weaknesses: Playing catch-up (a later entrant), less data-specific innovation
4.2. Data Platform Components
Architecture:
┌─────────────────────────────────────────────────────────┐
│            Azure Data Platform Architecture             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Data Sources                                           │
│  ┌──────────┐  ┌──────────┐   ┌──────────┐              │
│  │Azure SQL │  │   APIs   │   │ Dynamics │              │
│  │ Database │  │          │   │   365    │              │
│  └────┬─────┘  └────┬─────┘   └────┬─────┘              │
│       │             │              │                    │
│       └─────────────┴──────────────┘                    │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Ingestion Layer                        │            │
│  │  - Event Hubs (Streaming)               │            │
│  │  - Azure Functions (Event-driven)       │            │
│  │  - Data Factory (Batch)                 │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Data Lake (ADLS Gen2)                  │            │
│  │  - Hierarchical namespace               │            │
│  │  - ACLs (POSIX-style)                   │            │
│  │  - Integration with Azure AD            │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Transformation (Data Factory/Synapse)  │            │
│  │  - Mapping Data Flows                   │            │
│  │  - Spark pools                          │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│                     ▼                                   │
│  ┌─────────────────────────────────────────┐            │
│  │  Data Warehouse (Synapse Analytics)     │            │
│  │  - Dedicated SQL pools                  │            │
│  │  - Serverless SQL pools                 │            │
│  │  - Spark pools (unified)                │            │
│  └──────────────────┬──────────────────────┘            │
│                     │                                   │
│         ┌───────────┴───────────┐                       │
│         │                       │                       │
│         ▼                       ▼                       │
│    ┌──────────┐           ┌───────────┐                 │
│    │ Power BI │⭐         │  Tableau  │                 │
│    │   (BI)   │           │(3rd party)│                 │
│    └──────────┘           └───────────┘                 │
│                                                         │
│  Orchestration: Data Factory, Synapse Pipelines         │
│  Governance: Purview                                    │
│  Security: Azure AD, Key Vault                          │
└─────────────────────────────────────────────────────────┘
4.3. Key Services Deep-Dive
Azure Synapse Analytics (Data Warehouse)
Pros:
- ✅ Unified analytics (SQL + Spark in one place)
- ✅ Serverless SQL pools (pay-per-query like BigQuery)
- ✅ Tight Power BI integration
- ✅ Data Explorer (time-series analytics)
Cons:
- ❌ Complex (many components to learn)
- ❌ Performance varies (not as fast as BigQuery)
- ❌ Dedicated pools expensive
Pricing:
Serverless SQL Pool:
$5 per TB scanned (same as BigQuery)
Dedicated SQL Pool (DW100c):
$1.20/hour = $864/month
(Small warehouse, 100 compute units)
DW500c (typical production):
$6/hour = $4,320/month
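Since Synapse offers both a pay-per-scan and a pay-per-hour model, it helps to see where they cross over. A rough sketch using the rates quoted above; the function names and the 200-hour schedule are hypothetical, and it assumes a dedicated pool is paused whenever it is not needed.

```python
HOURS_PER_MONTH = 720  # 30-day month, matching $1.20/hour = $864/month above

DEDICATED_USD_PER_HOUR = {"DW100c": 1.20, "DW500c": 6.00}  # rates quoted above
SERVERLESS_USD_PER_TB = 5.00                               # rate quoted above


def dedicated_monthly(sku: str, hours_running: float = HOURS_PER_MONTH) -> float:
    """Compute bill for a dedicated SQL pool that runs `hours_running` hours."""
    return DEDICATED_USD_PER_HOUR[sku] * hours_running


def serverless_monthly(tb_scanned: float) -> float:
    """Bill for a serverless SQL pool that scans `tb_scanned` TB."""
    return tb_scanned * SERVERLESS_USD_PER_TB


def break_even_tb(sku: str, hours_running: float = HOURS_PER_MONTH) -> float:
    """Scan volume above which the dedicated pool becomes the cheaper option."""
    return dedicated_monthly(sku, hours_running) / SERVERLESS_USD_PER_TB


print(f"DW100c always on: ${dedicated_monthly('DW100c'):,.0f}, "
      f"break-even at {break_even_tb('DW100c'):.1f} TB scanned")
# Hypothetical schedule: pool resumed for 200 hours per month only.
print(f"DW100c for 200 h: ${dedicated_monthly('DW100c', 200):,.0f}, "
      f"break-even at {break_even_tb('DW100c', 200):.1f} TB scanned")
```

Below the break-even volume, serverless is the cheaper way to query the lake; above it, a dedicated pool starts to pay for itself.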
ADLS Gen2 (Data Lake)
Pros:
- ✅ Hierarchical namespace (like filesystem)
- ✅ POSIX ACLs (fine-grained permissions)
- ✅ Lowest list price per GB of the three (hot tier $0.0184 vs $0.020 for GCS Standard and $0.023 for S3 Standard)
Pricing:
Hot tier: $0.0184/GB/month
Cool tier (30+ days): $0.01/GB/month
Archive: $0.00099/GB/month
10TB example: $184/month
Power BI ⭐ Best BI Tool
Pros:
- ✅ Top-rated visualization tool (Microsoft is a long-running Leader in Gartner's Magic Quadrant for Analytics and BI Platforms)
- ✅ Excel-like familiarity (easy for business users)
- ✅ Tight Office 365 integration
- ✅ Embedded analytics
Pricing:
Power BI Pro: $10/user/month
Power BI Premium: $20/user/month (more features)
For teams:
10 users × $10 = $100/month (cheap!)
4.4. When to Choose Azure
✅ Microsoft shop: Office 365, Active Directory, .NET stack
✅ Power BI requirement: Best BI tool, tightly integrated
✅ Hybrid cloud: Azure Stack for on-premise
✅ Enterprise agreements: Existing Microsoft licenses (savings)
✅ Vietnam government: Some prefer Azure (Microsoft presence)
5. Side-by-Side Comparison
5.1. Feature Matrix
| Feature | AWS | GCP | Azure |
|---|---|---|---|
| Ease of Use | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Documentation | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Warehouse Performance | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| ML Capabilities | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| BI Integration | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Pricing Transparency | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Service Breadth | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Nearest Regions to Vietnam | Singapore, Japan | Singapore | Singapore |
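One way to turn the star ratings into a decision is a weighted score, where the weights reflect what matters to your team. The sketch below encodes the seven rated rows of the matrix; the weights in the example are hypothetical, and `weighted_scores` is an illustrative helper rather than an established method.

```python
# Star ratings copied from the feature matrix above.
STARS = {
    "ease_of_use": {"AWS": 2, "GCP": 4, "Azure": 3},
    "documentation": {"AWS": 4, "GCP": 3, "Azure": 3},
    "warehouse_performance": {"AWS": 3, "GCP": 5, "Azure": 3},
    "ml_capabilities": {"AWS": 4, "GCP": 5, "Azure": 3},
    "bi_integration": {"AWS": 2, "GCP": 4, "Azure": 5},
    "pricing_transparency": {"AWS": 2, "GCP": 4, "Azure": 3},
    "service_breadth": {"AWS": 5, "GCP": 3, "Azure": 4},
}


def weighted_scores(weights: dict) -> dict:
    """Weighted average of star ratings; features without a weight are ignored."""
    total_weight = sum(weights.values())
    scores = {"AWS": 0.0, "GCP": 0.0, "Azure": 0.0}
    for feature, weight in weights.items():
        for cloud, stars in STARS[feature].items():
            scores[cloud] += weight * stars / total_weight
    return scores


# Hypothetical priorities for an analytics-focused startup.
startup = weighted_scores({
    "warehouse_performance": 3,
    "ease_of_use": 2,
    "pricing_transparency": 2,
    "ml_capabilities": 1,
})
# Hypothetical priorities for a company standardized on Microsoft tools.
microsoft_shop = weighted_scores({
    "bi_integration": 3,
    "service_breadth": 1,
    "documentation": 1,
})

print("Startup:", max(startup, key=startup.get), startup)
print("Microsoft shop:", max(microsoft_shop, key=microsoft_shop.get), microsoft_shop)
```

The result is only as good as the ratings and weights fed in, so it is best used to structure a discussion rather than to settle it.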
5.2. Cost Comparison (Same Workload)
Workload: E-commerce startup
- 10TB storage (data lake)
- 5TB active warehouse data
- 30TB query scan/month
- 50 DAG runs/day (ETL)
| Component | AWS | GCP | Azure |
|---|---|---|---|
| Storage (10TB lake) | $230 | $200 | $184 |
| Warehouse | $4,934 (Redshift 2-node) | $150 (BigQuery on-demand) | $864 (Synapse dedicated pool, DW100c) |
| ETL | $300 (Glue) | $200 (Dataflow) | $250 (Data Factory) |
| Orchestration | $50 (MWAA) | $40 (Composer) | Included |
| BI | $50 (QuickSight) | $300 (Looker, 5 users) | $100 (Power BI, 10 users) |
| Total/Month | $5,564 | $890 🏆 | $1,398 |
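Note: GCP is significantly cheaper for this analytics-heavy workload, because BigQuery on-demand bills only for the 30TB actually scanned, while a provisioned Redshift cluster bills around the clock.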
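The totals in the table can be reproduced, and the dominant line item per cloud identified, with a short script. The figures are copied from the table above; `MONTHLY_USD` and `largest_item` are illustrative names.

```python
# Monthly cost lines in USD, copied from the table above.
MONTHLY_USD = {
    "AWS": {"storage": 230, "warehouse": 4_934, "etl": 300, "orchestration": 50, "bi": 50},
    "GCP": {"storage": 200, "warehouse": 150, "etl": 200, "orchestration": 40, "bi": 300},
    "Azure": {"storage": 184, "warehouse": 864, "etl": 250, "orchestration": 0, "bi": 100},
}

totals = {cloud: sum(lines.values()) for cloud, lines in MONTHLY_USD.items()}


def largest_item(cloud: str) -> str:
    """Name of the most expensive component on one cloud."""
    lines = MONTHLY_USD[cloud]
    return max(lines, key=lines.get)


for cloud, total in sorted(totals.items(), key=lambda item: item[1]):
    top = largest_item(cloud)
    share = MONTHLY_USD[cloud][top] / total
    print(f"{cloud:<5} ${total:>5,}/month, largest item: {top} ({share:.0%})")
```

On AWS the warehouse makes up most of the bill, whereas on GCP the BI seats cost more than the warehouse itself, so the comparison is mostly a comparison of warehouse pricing models.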
Different workload (fewer queries, more storage):
- 50TB storage
- 10TB warehouse
- 5TB query scan/month (light usage)
| Component | AWS | GCP | Azure |
|---|---|---|---|
| Storage (50TB) | $1,150 | $1,000 | $920 |
| Warehouse | $4,934 | $25 | $864 |
| ETL | $300 | $200 | $250 |
| Total | $6,384 | $1,225 | $2,034 |
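Takeaway: GCP is the cheapest in both scenarios, Azure has the lowest storage rates and suits Microsoft-centric teams, and on AWS the fixed Redshift cluster cost dominates the bill regardless of query volume.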
6. Decision Framework
6.1. Choose AWS If...
✅ Already heavily invested in AWS ecosystem
✅ Need broadest service catalog (IoT, ML, etc.)
✅ Enterprise with EDP discount program
✅ Specific services only on AWS (e.g., Aurora)
✅ Team expertise in AWS
✅ Hybrid cloud with AWS Outposts
✅ Vietnamese enterprises (familiar, compliant)
6.2. Choose GCP If...
✅ Analytics is core use case (BigQuery best)
✅ Startup/scale-up (fast, simple)
✅ Want transparent pricing
✅ Heavy data science (Vertex AI excellent)
✅ Modern stack (dbt, Fivetran, Looker)
✅ Small team (less ops overhead)
✅ Cost-conscious (often cheapest)
6.3. Choose Azure If...
✅ Microsoft shop (Office 365, AD, .NET)
✅ Power BI is requirement (best BI tool)
✅ Existing Microsoft licenses (cost savings)
✅ Hybrid cloud with Azure Stack
✅ Enterprise agreements with Microsoft
✅ Government/regulated industries (Microsoft compliance)
6.4. Decision Tree
Start
│
├─ Using Office 365 / Power BI? → YES → Azure
│
├─ Analytics-first, small team? → YES → GCP
│
├─ Already on AWS, large ecosystem? → YES → AWS
│
└─ Greenfield, no constraints?
   │
   ├─ Budget < $5K/month → GCP (often cheapest)
   ├─ Need ML/AI heavy → GCP (Vertex AI best)
   ├─ Need breadth of services → AWS
   └─ Team Microsoft-familiar → Azure
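The decision tree above reads naturally as a chain of early returns. Here is a minimal sketch that follows the same order of questions; the `Profile` fields and the `"undecided"` fallback are assumptions, since the tree does not say what to do when no branch matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """Answers to the questions in the decision tree (field names are illustrative)."""
    uses_office365_or_power_bi: bool = False
    analytics_first_small_team: bool = False
    already_on_aws: bool = False
    monthly_budget_usd: Optional[float] = None  # None means "not a constraint"
    ml_heavy: bool = False
    needs_service_breadth: bool = False
    team_microsoft_familiar: bool = False


def recommend_cloud(p: Profile) -> str:
    """Walk the tree top to bottom and return the first matching recommendation."""
    if p.uses_office365_or_power_bi:
        return "Azure"
    if p.analytics_first_small_team:
        return "GCP"
    if p.already_on_aws:
        return "AWS"
    # Greenfield branch: no existing constraints.
    if p.monthly_budget_usd is not None and p.monthly_budget_usd < 5_000:
        return "GCP"
    if p.ml_heavy:
        return "GCP"
    if p.needs_service_breadth:
        return "AWS"
    if p.team_microsoft_familiar:
        return "Azure"
    return "undecided"  # assumption: the tree has no default branch


print(recommend_cloud(Profile(already_on_aws=True)))
print(recommend_cloud(Profile(monthly_budget_usd=3_000)))
```

Because the checks run in order, a company that uses Power BI and already runs on AWS lands on Azure; reorder the checks if your priorities differ.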
7. Real-World Case Studies
7.1. Vietnamese Fintech: GCP BigQuery
Company: Lending platform (500K users)
Before: AWS Redshift
- 2-node ra3.4xlarge cluster
- Cost: $5,000/month
- Query performance: 30-60 seconds for complex queries
- Maintenance: 20% engineer time on cluster management
Migration to GCP BigQuery:
- Serverless (no cluster management)
- Cost: $2,500/month (50% reduction)
- Query performance: 5-10 seconds (5x faster)
- Maintenance: 0% time (fully managed)
Why BigQuery won:
- No cluster sizing decisions
- Auto-scaling for traffic spikes
- BigQuery ML for churn prediction (no separate ML infra)
- Transparent pricing (easy to forecast)
Quote from CTO:
"BigQuery transformed our analytics. Queries that took minutes now take seconds. Engineers focus on insights, not infrastructure."
7.2. E-Commerce: Staying on AWS
Company: Fashion e-commerce (2M customers)
Stack: All AWS
- Redshift for data warehouse
- S3 for data lake
- EMR for Spark jobs
- QuickSight for BI
Why stayed on AWS:
- Existing infra: 50+ EC2 instances, RDS databases
- Team expertise: 5 engineers AWS-certified
- EDP discount: 20% discount via enterprise agreement
- Ecosystem: Tight integration with existing services
- Switching cost: 6 months + $200K migration effort
Performance: Good enough (queries < 1 minute acceptable)
Quote:
"We considered BigQuery, but migration cost + team retraining outweighed benefits. AWS meets our needs."
7.3. Enterprise: Azure for Microsoft Stack
Company: Manufacturing company (3,000 employees)
Before: On-premise SQL Server
- $500K upfront hardware
- 3 DBAs maintaining
- Limited scalability
Migration to Azure:
- Synapse Analytics for warehouse
- Power BI for BI (already used)
- Azure Data Factory for ETL
Why Azure:
- Power BI: Already company-wide (1000+ users)
- Active Directory: Seamless SSO
- Microsoft licenses: Enterprise agreement savings
- Hybrid: Some data still on-premise (Azure Stack)
- Training: Team familiar with Microsoft tools
Cost: $8,000/month (about $288K over 3 years, vs $1.5M TCO on-premise)
Quote:
"Azure was natural choice. Power BI integration alone worth it - business users self-serve analytics now."
8. Multi-Cloud Strategy
8.1. When Multi-Cloud Makes Sense
Pros:
- ✅ Avoid vendor lock-in: Can switch if pricing/service changes
- ✅ Best-of-breed: Use BigQuery (GCP) + Power BI (Azure)
- ✅ Redundancy: Disaster recovery across clouds
- ✅ Geographic: Some regions only on certain clouds
Cons:
- ❌ Complexity: 2-3x operational overhead
- ❌ Cost: Data transfer between clouds expensive
- ❌ Expertise: Team needs to know multiple platforms
- ❌ Integration: Harder to connect services
8.2. Multi-Cloud Example
Company: Scale-up using best of each
GCP:
- BigQuery (data warehouse) ← Core analytics
- Vertex AI (ML models)
- Cloud Storage (data lake)
Azure:
- Power BI (BI tool) ← Business users
- Active Directory (SSO)
AWS:
- EC2 (application servers) ← Legacy infra
Integration layer:
- Fivetran: Sync data to BigQuery
- Power BI: Connect to BigQuery (connector)
- Terraform: Manage all 3 clouds
Cost: About 10% higher than single-cloud, but the flexibility was judged worth it
8.3. Recommendation
For most Vietnamese companies: Single cloud (simpler)
- Startups → GCP
- Enterprises → AWS or Azure (depending on ecosystem)
Multi-cloud only if:
- Large enterprise (>500 employees)
- Specific requirements (certain service only on X cloud)
- Mergers/acquisitions (inherit different clouds)
9. Vietnam-Specific Considerations
9.1. Data Residency
Law: Decree 72/2013, Cybersecurity Law 24/2018
- Some data (banking, telecom) must stay in Vietnam
Cloud options:
- AWS: Singapore region (closest, 10ms latency)
- GCP: Singapore region
- Azure: Singapore region
- Note: No Vietnam regions yet (2025)
Workaround: Some enterprises use local hosting (Viettel Cloud, VNG Cloud) for regulated data, cloud for analytics.
9.2. Pricing (USD vs VND)
Cloud bills in USD → Exchange rate risk
Example:
- Contract in Jan 2023: $5,000/month = 115M VND (rate 23,000)
- By Dec 2023: $5,000/month = 125M VND (rate 25,000)
- 10M VND increase (8.7%) despite same usage
Mitigation:
- Use GCP committed use discounts (lock USD price)
- Budget in VND with 10% buffer
- Monitor exchange rates
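The exchange-rate example and the 10% buffer can be checked with integer arithmetic. A small sketch using the figures above; the function names are made up for illustration.

```python
def bill_in_vnd(usd_per_month: int, vnd_per_usd: int) -> int:
    """Convert a monthly USD bill to VND at a given exchange rate."""
    return usd_per_month * vnd_per_usd


def fx_impact(usd_per_month: int, old_rate: int, new_rate: int) -> tuple:
    """Return (increase in VND, increase as a fraction of the old bill)."""
    old_bill = bill_in_vnd(usd_per_month, old_rate)
    new_bill = bill_in_vnd(usd_per_month, new_rate)
    increase = new_bill - old_bill
    return increase, increase / old_bill


def vnd_budget(usd_per_month: int, vnd_per_usd: int, buffer_pct: int = 10) -> int:
    """Monthly VND budget with a percentage buffer for exchange-rate moves."""
    return bill_in_vnd(usd_per_month, vnd_per_usd) * (100 + buffer_pct) // 100


increase, fraction = fx_impact(5_000, old_rate=23_000, new_rate=25_000)
budget = vnd_budget(5_000, 23_000)

print(f"Increase: {increase:,} VND ({fraction:.1%})")
print(f"Budget with 10% buffer: {budget:,} VND")
print("Covers the year-end bill:", budget >= bill_in_vnd(5_000, 25_000))
```

With the rates in this example, a 10% buffer set in January would have covered the December bill with a small margin.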
9.3. Support
AWS:
- ✅ Vietnam office (Hanoi, HCMC)
- ✅ Vietnamese support team
- ✅ Local AWS user groups
GCP:
- ✅ Vietnam office (HCMC)
- ✅ Growing Vietnamese community
- ✅ Startup programs (credits)
Azure:
- ✅ Microsoft Vietnam (long presence)
- ✅ Strong enterprise support
- ✅ Power BI community
All three have a good presence in Vietnam, so support is not a differentiator.
Conclusion
No "best" cloud - depends on your needs.
Quick Recommendations:
| Your Situation | Recommendation |
|---|---|
| Startup, analytics-heavy | GCP (BigQuery + simplicity) |
| Enterprise, AWS ecosystem | AWS (breadth + maturity) |
| Microsoft shop, Power BI | Azure (integration) |
| Budget-constrained | GCP (often cheapest) |
| Need ML/AI | GCP (Vertex AI best) |
| Broadest services | AWS (200+ services) |
Key Takeaways:
- GCP: Best for analytics (BigQuery unmatched), simpler, often cheaper
- AWS: Most mature, broadest services, enterprise-friendly
- Azure: Best for Microsoft ecosystem, Power BI integration
- Cost: GCP often cheapest for analytics workloads
- Vietnam market: GCP popular with startups (60%), AWS with enterprises (30%)
- Multi-cloud: Only for large enterprises (complexity cost)
Next Steps:
- ✅ Assess current infrastructure (on-premise vs cloud)
- ✅ Define requirements (analytics vs storage vs ML)
- ✅ Try free tiers (GCP $300, AWS 12 months free, Azure $200)
- ✅ Read Cost Optimization to control cloud spend (Post 32, upcoming)
- ✅ Read Lakehouse Architecture for modern data lakes (Post 33)
Need help? Carptech has implemented Data Platforms on all 3 clouds for Vietnamese companies. Book a consultation to discuss your cloud strategy.
Related Posts:
- Coming: Cost Optimization for Data Platform (Post 32)
- Coming: Lakehouse Architecture (Post 33)
- Coming: Serverless Data Architecture (Post 34)
- Data Platform 101 - Foundation concepts




