TL;DR
Data Security = protecting data from unauthorized access, theft, corruption, and destruction. In an era where data breaches cost $4.35M per incident (IBM 2023), security is not a luxury - it's business survival.
The Security Triad (CIA):
- Confidentiality: Only authorized users can access data
- Integrity: Data is not tampered with or corrupted
- Availability: Data is accessible when needed (no ransomware lockouts)
6 Defense Layers (Defense-in-Depth):
- Network Security: VPCs, firewalls, VPNs isolate data
- Identity & Access Management (IAM): Least privilege, RBAC, MFA
- Encryption: At rest (AES-256), in transit (TLS), column-level for PII
- Data Masking: Production data masked in dev/test environments
- Audit Logging: Track who accessed what, when, and from where
- Backup & Disaster Recovery: 3-2-1 rule, tested restores
Threat Landscape Vietnam:
- External: Ransomware attacks up 300% (2020-2023)
- Internal: 34% of breaches stem from insider threats (Verizon DBIR)
- Cost: Vietnamese enterprises lose average $2M per breach
Compliance Frameworks:
- ISO 27001: Information Security Management System (international standard)
- SOC 2: Service Organization Control (for SaaS companies)
- PDPA: Data protection requirements (Decree 13/2023)
Case study - Vietnamese fintech prevented a credential stuffing attack (500K login attempts):
- MFA blocked 99.8% unauthorized access
- Rate limiting stopped automated attacks
- Anomaly detection alerted security team within 5 minutes
- Result: 0 accounts compromised, $0 loss
This post walks you through a comprehensive security framework, from fundamentals to advanced threat protection.
1. The Security Triad: CIA Principles
1.1. Confidentiality
What: Ensure data is accessible only by authorized parties
Threats:
- Hacker breaches
- Stolen credentials
- Insider threats (employees stealing data)
- Misconfigured cloud storage (public S3 buckets)
Controls:
- Encryption (data unreadable without keys)
- Access controls (RBAC, least privilege)
- MFA (multi-factor authentication)
- Data classification (restrict access to sensitive data)
Example violation: 2022 Vietnamese e-commerce - public S3 bucket exposed 2M customer records (names, emails, phone, addresses)
Cost: Brand damage + PDPA fines + customer churn
1.2. Integrity
What: Ensure data is accurate and hasn't been tampered with
Threats:
- Malicious modification (hacker changes bank balances)
- Accidental corruption (software bugs, disk failures)
- Man-in-the-middle attacks (intercept + modify data in transit)
Controls:
- Checksums/hashes (detect changes)
- Digital signatures (verify authenticity)
- Version control (track changes, rollback)
- Write-once storage (immutable audit logs)
- Input validation (prevent SQL injection, XSS)
Example: Healthcare database - hacker changed patient blood types → potentially fatal consequences
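The checksum/hash control above can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes a dictionary record and a hypothetical secret key; using an HMAC (keyed hash) rather than a bare hash means an attacker who can modify the record cannot simply recompute a matching fingerprint.

```python
import hashlib
import hmac

def record_fingerprint(record: dict, secret: bytes) -> str:
    # HMAC-SHA256 over a canonical serialization of the record
    canonical = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()

def verify_record(record: dict, stored_fingerprint: str, secret: bytes) -> bool:
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(record_fingerprint(record, secret), stored_fingerprint)

secret = b"demo-key"  # illustrative only; in practice, fetch from a secret manager
patient = {"patient_id": 42, "blood_type": "O+"}
fp = record_fingerprint(patient, secret)

assert verify_record(patient, fp, secret)       # untouched record passes
patient["blood_type"] = "AB-"                   # simulated tampering
assert not verify_record(patient, fp, secret)   # tampering detected
```

Stored alongside each row (or in an append-only log), such fingerprints let a periodic job detect silent modification or corruption.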
1.3. Availability
What: Ensure data is accessible when users need it
Threats:
- Ransomware: Encrypt data, demand payment
- DDoS attacks (Distributed Denial of Service)
- Hardware failures
- Natural disasters (fire, flood, earthquake)
Controls:
- Backups (daily, tested restores)
- Redundancy (multiple servers, regions)
- DDoS protection (Cloudflare, AWS Shield)
- Disaster recovery plan
- High availability architecture (99.9%+ uptime)
Example: 2023 Vietnamese hospital - ransomware locked all patient records for 3 days
- Had to cancel surgeries
- Paid $50K ransom
- Lesson: Backups are non-negotiable
2. Threat Landscape: Know Your Enemy
2.1. External Threats
1. Ransomware
How it works:
- Phishing email with a malicious attachment
- Employee clicks → malware executes
- Malware encrypts all data
- Ransom note: "Pay 10 BTC or lose data forever"
Vietnamese context: Ransomware attacks up 300% (2020-2023)
- Targets: Healthcare, education, SMEs (weak security)
- Average ransom: $50K-$200K
- Only 60% get data back even after paying
Defense:
- Email filtering: Block phishing emails
- Endpoint protection: Antivirus, EDR (Endpoint Detection & Response)
- Backups: Air-gapped, tested restores
- Training: Employees recognize phishing
- Patch management: Keep systems updated
2. SQL Injection
How it works:
# Vulnerable code (Python building SQL)
query = f"SELECT * FROM users WHERE username = '{user_input}'"
# Attacker input: admin' OR '1'='1
# Resulting query:
#   SELECT * FROM users WHERE username = 'admin' OR '1'='1'
# Returns ALL users (authentication bypass)
Defense: Parameterized queries
# Bad (vulnerable)
cursor.execute(f"SELECT * FROM users WHERE username = '{username}'")
# Good (safe)
cursor.execute("SELECT * FROM users WHERE username = %s", [username])
3. Credential Stuffing
How it works:
- Hackers buy billions of leaked credentials from dark web
- Automated bots try credentials on your site
- People reuse passwords → high success rate (0.1%-1%)
Vietnamese example: 2023 bank - 500K login attempts in 24 hours
- 0.5% success rate = 2,500 compromised accounts
- If no MFA → hackers drain accounts
Defense:
- MFA (blocks 99.9% of attacks)
- Rate limiting: Max 5 login attempts per minute per IP
- CAPTCHA: Block bots
- Anomaly detection: Alert on login from new device/location
2.2. Internal Threats
Verizon DBIR: 34% of breaches involve internal actors
Types:
1. Malicious Insiders
- Employee steals customer database to sell
- Sabotage (disgruntled employee deletes data)
Example: 2022 Vietnamese fintech - employee exported 100K customer records before quitting
- Sold to competitor for $10K
- Company fined 80M VND (PDPA violation)
Defense:
- Access controls: Least privilege (employees only see data they need)
- Audit logs: Track all data exports
- DLP (Data Loss Prevention): Block bulk exports
- Background checks: Before hiring
- Exit procedures: Revoke access immediately when employee leaves
2. Accidental Breaches
- Employee sends email to wrong person (containing PII)
- Misconfigured cloud storage (public instead of private)
- Lost laptop (unencrypted data)
Example: 2023 HR department - sent salary spreadsheet to entire company instead of CFO
Defense:
- Training: Security awareness quarterly
- DLP: Warn before sending sensitive data externally
- Encryption: Laptop encryption mandatory
- Email controls: Prevent "Reply All" disasters
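The DLP warning control above can be sketched as a simple outbound scan. This is a toy illustration, not a real DLP engine: the two regex patterns and the `carptech.vn` company domain are assumptions for the example, and production systems use far richer detectors (document fingerprints, classifiers, exact-data matching).

```python
import re

# Illustrative PII detectors only; real DLP uses many more
PII_PATTERNS = {
    "vn_phone": re.compile(r"\b0\d{9}\b"),                 # 10-digit Vietnamese mobile
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),   # naive email pattern
}

def scan_outbound(body: str, recipient_domain: str, company_domain: str = "carptech.vn"):
    """Return the names of PII patterns found in a message leaving the company."""
    if recipient_domain == company_domain:
        return []  # internal mail: no warning
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(body)]

hits = scan_outbound("Call customer at 0912345678", "gmail.com")
# hits contains "vn_phone": the mail client can warn or block before sending
```

In practice the hook would run in the mail gateway and either warn the sender, require justification, or quarantine the message for review.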
2.3. Vietnamese Cybersecurity Landscape
Statistics (Vietnam Cybersecurity Report 2023):
- 15,000+ attacks daily on Vietnamese organizations
- Top targets: Finance (30%), E-commerce (25%), Government (20%)
- Top attack types: Phishing (40%), Ransomware (25%), DDoS (20%)
- Average cost per breach: $2M (lower than global $4.35M, but growing)
Notable incidents:
- 2022: E-commerce platform - 2M customer records leaked
- 2023: Bank - Credential stuffing, 2,500 accounts compromised
- 2023: Hospital - Ransomware, 3-day downtime
Regulatory response:
- Cybersecurity Law (2018): Data localization, reporting requirements
- PDPA (Decree 13/2023): Data protection obligations
- Circular 03/2017: Security requirements for financial institutions
3. Defense Layer 1: Network Security
3.1. Virtual Private Cloud (VPC)
What: Isolated network in the cloud, like your own private data center
Architecture:
┌─────────────────────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Public Subnet │ │ Private Subnet │ │
│ │ (10.0.1.0/24) │ │ (10.0.2.0/24) │ │
│ │ │ │ │ │
│ │ ┌─────────────┐ │ │ ┌──────────────┐ │ │
│ │ │ Load │ │ │ │ Application │ │ │
│ │ │ Balancer │────────→│ Servers │ │ │
│ │ └─────────────┘ │ │ └──────────────┘ │ │
│ │ │ │ │ │ │
│ └─────────────────────┘ │ ▼ │ │
│ │ │ ┌──────────────┐ │ │
│ ▼ │ │ Database │ │ │
│ ┌─────────────────────┐ │ │ (Private) │ │ │
│ │ Internet Gateway │ │ └──────────────┘ │ │
│ └─────────────────────┘ │ │ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
Rules:
- Public subnet: Internet-facing (load balancers)
- Private subnet: No internet access (databases)
- App servers: Can reach internet via NAT gateway (for updates)
- Database: ONLY accessible from app servers (not internet)
GCP Example:
# Create VPC
gcloud compute networks create carptech-vpc \
--subnet-mode=custom
# Create private subnet for databases
gcloud compute networks subnets create db-subnet \
--network=carptech-vpc \
--region=asia-southeast1 \
--range=10.0.2.0/24 \
--enable-private-ip-google-access
# Firewall rule: Only app servers can access database
gcloud compute firewall-rules create allow-app-to-db \
--network=carptech-vpc \
--allow=tcp:5432 \
--source-tags=app-server \
--target-tags=database \
--description="Allow app servers to access database"
# Block all other traffic to database
gcloud compute firewall-rules create deny-all-to-db \
--network=carptech-vpc \
--action=DENY \
--rules=all \
--target-tags=database \
--priority=1000
3.2. Firewalls
What: Control traffic in/out of systems
Types:
1. Network Firewall (at VPC level)
# Example: GCP firewall rules (illustrative YAML)
rules:
  - name: allow-https
    direction: INGRESS
    allow: tcp:443
    source: 0.0.0.0/0  # Allow from anywhere
    target: web-servers
  - name: allow-ssh-from-office
    direction: INGRESS
    allow: tcp:22
    source: 203.162.4.0/24  # Office IP range only
    target: all
  - name: deny-all-else
    direction: INGRESS
    action: DENY
    priority: 65534  # Lowest priority (evaluated last)
2. Web Application Firewall (WAF)
Protects against web attacks (SQL injection, XSS, etc.)
# Cloudflare WAF rules (illustrative)
rules:
  - name: Block SQL Injection
    expression: (http.request.uri.query contains "UNION SELECT")
    action: block
  - name: Rate Limit Login
    expression: (http.request.uri.path eq "/login")
    action: challenge
    rate_limit: 5 requests per minute
  - name: Block Suspicious User-Agents
    expression: (http.user_agent contains "sqlmap")
    action: block
3.3. VPN (Virtual Private Network)
Use case: Remote employees accessing internal systems
Architecture:
Employee Laptop (Home)
│
│ VPN Tunnel (Encrypted)
│
▼
VPN Gateway (Office/Cloud)
│
▼
Internal Network (Databases, Apps)
Implementation (WireGuard):
# Server config
[Interface]
Address = 10.8.0.1/24
PrivateKey = SERVER_PRIVATE_KEY
ListenPort = 51820
[Peer] # Employee 1
PublicKey = EMPLOYEE1_PUBLIC_KEY
AllowedIPs = 10.8.0.2/32
[Peer] # Employee 2
PublicKey = EMPLOYEE2_PUBLIC_KEY
AllowedIPs = 10.8.0.3/32
Alternative: Zero Trust Network Access (ZTNA)
- Example: Cloudflare Access, Google BeyondCorp
- No VPN needed, access via identity provider (Google login + MFA)
4. Defense Layer 2: Identity & Access Management (IAM)
4.1. Principle of Least Privilege
Rule: Users get minimum access needed to do their job
Bad example:
All employees → Admin access to database
Good example:
Data Analyst → Read-only access to analytics tables
Data Engineer → Read/write access to staging, read-only to production
DBA → Full access, but requires approval + audit
Implementation (PostgreSQL):
-- Create roles
CREATE ROLE analyst;
CREATE ROLE engineer;
CREATE ROLE dba_role;
-- Grant permissions
-- Analyst: Read-only on analytics schema
GRANT CONNECT ON DATABASE prod_db TO analyst;
GRANT USAGE ON SCHEMA analytics TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analyst;
-- Engineer: Read/write on staging, read-only on production
GRANT CONNECT ON DATABASE prod_db TO engineer;
GRANT ALL ON SCHEMA staging TO engineer;
GRANT SELECT ON ALL TABLES IN SCHEMA production TO engineer;
-- DBA: Full access (but individual users, not shared account)
GRANT dba_role TO alice_dba, bob_dba;
GRANT ALL PRIVILEGES ON DATABASE prod_db TO dba_role;
-- Create users and assign roles
CREATE USER analyst1 WITH PASSWORD 'secure_password';
GRANT analyst TO analyst1;
4.2. Role-Based Access Control (RBAC)
What: Assign permissions to roles, not individual users
Example: E-commerce company
roles:
  # Customer Service
  - name: cs_agent
    permissions:
      - view_customer_profile
      - view_orders
      - update_order_status
      - issue_refund (< 500K VND)
    restrictions:
      - cannot_view: customer_password_hash, payment_details
      - cannot_export: bulk data

  # Marketing Analyst
  - name: marketing_analyst
    permissions:
      - view_aggregated_analytics
      - run_queries: analytics schema
      - create_dashboards
    restrictions:
      - cannot_view: PII columns (masked)
      - cannot_export: raw customer data

  # Data Engineer
  - name: data_engineer
    permissions:
      - read_write: staging, development
      - read_only: production
      - manage_pipelines
    restrictions:
      - production_changes: require approval

  # Admin
  - name: admin
    permissions:
      - all_access
    requirements:
      - mfa: required
      - audit: all actions logged
      - approval: for sensitive operations
BigQuery RBAC Example:
-- Grant roles to users
GRANT `roles/bigquery.dataViewer`
ON TABLE `project.dataset.customers`
TO "user:analyst@company.com";
-- Custom role: Masked PII viewer
-- Note: BigQuery does not define custom roles or masking conditions in SQL;
-- this is configured via IAM custom roles and column-level security (policy tags).
-- Illustrative intent:
--   role: custom_pii_masked_viewer
--   permission: bigquery.tables.getData on project.dataset.customers
--   visibility: masked columns only (policy-tag-based column security)
--   assigned to: group:marketing@company.com
4.3. Multi-Factor Authentication (MFA)
What: Require 2+ factors to authenticate
- Something you know: Password
- Something you have: Phone (SMS code), hardware token
- Something you are: Fingerprint, face recognition
Why critical: Passwords alone are not secure
- 80% of breaches involve stolen/weak passwords (Verizon)
- MFA blocks 99.9% of automated attacks (Microsoft)
Implementation (TOTP - Time-based One-Time Password):
# Using the pyotp library (assumes helper functions encrypt/decrypt,
# a db wrapper, and a Flask app with request/session available)
import pyotp
import qrcode

# Setup MFA for a user
def setup_mfa(user_id, user_email):
    # Generate secret key
    secret = pyotp.random_base32()
    # Store secret in database (encrypted)
    db.execute(
        "UPDATE users SET mfa_secret = %s WHERE user_id = %s",
        [encrypt(secret), user_id]
    )
    # Generate QR code for the user to scan with Google Authenticator
    totp = pyotp.TOTP(secret)
    provisioning_uri = totp.provisioning_uri(
        name=user_email,
        issuer_name="Carptech"
    )
    qr = qrcode.make(provisioning_uri)
    qr.save(f'/tmp/mfa_qr_{user_id}.png')
    return qr

# Verify MFA code during login
def verify_mfa(user_id, code):
    # Get the user's secret
    secret = decrypt(db.get_mfa_secret(user_id))
    # Verify code
    totp = pyotp.TOTP(secret)
    is_valid = totp.verify(code, valid_window=1)  # Accept ±30 seconds
    if not is_valid:
        log_failed_mfa(user_id)
        return False
    log_successful_mfa(user_id)
    return True

# Login flow
@app.route('/login', methods=['POST'])
def login():
    username = request.form['username']
    password = request.form['password']
    # Step 1: Verify password
    if not verify_password(username, password):
        return {'error': 'Invalid credentials'}, 401
    user = get_user(username)
    # Step 2: Check if MFA is enabled
    if user.mfa_enabled:
        # Require MFA code
        session['pending_mfa_user'] = user.id
        return {'require_mfa': True}
    # No MFA → log in directly (not recommended)
    create_session(user.id)
    return {'success': True}

@app.route('/login/mfa', methods=['POST'])
def login_mfa():
    user_id = session.get('pending_mfa_user')
    code = request.form['mfa_code']
    if verify_mfa(user_id, code):
        create_session(user_id)
        return {'success': True}
    else:
        return {'error': 'Invalid MFA code'}, 401
Enforcement:
- Mandatory: For admins, DBAs, and anyone with production access
- Recommended: For all users
- Backup codes: Provide 10 one-time codes in case user loses phone
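The backup-code recommendation above can be sketched as follows. This is a minimal illustration under stated assumptions: codes are shown to the user once and only their hashes are stored, and each code is burned on first use. A real system would rate-limit redemption and use a salted, slow hash (e.g. bcrypt) rather than plain SHA-256.

```python
import hashlib
import secrets

def generate_backup_codes(n=10):
    """Return (plaintext codes to show the user once, hashes to store)."""
    codes = [secrets.token_hex(4) for _ in range(n)]  # 8 hex chars each
    hashes = [hashlib.sha256(c.encode()).hexdigest() for c in codes]
    return codes, hashes

def redeem_backup_code(code, stored_hashes):
    """One-time use: remove the hash on successful redemption."""
    h = hashlib.sha256(code.encode()).hexdigest()
    if h in stored_hashes:
        stored_hashes.remove(h)  # burn the code so it cannot be reused
        return True
    return False

codes, stored = generate_backup_codes()
assert redeem_backup_code(codes[0], stored)      # first use succeeds
assert not redeem_backup_code(codes[0], stored)  # reuse fails
```

The plaintext list is displayed exactly once at MFA enrollment; losing the phone then costs one backup code instead of an account lockout.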
4.4. Service Accounts
Problem: Applications need to access databases/APIs, but have no human "user" to log in as
Solution: Service accounts (machine accounts)
Best practices:
# Bad: Hardcoded credentials in code
database_url = "postgresql://admin:Password123@db.example.com/prod"

# Good: Service account with rotating keys (illustrative config)
service_accounts:
  - name: app-server-prod
    type: service_account
    permissions:
      - read: production.analytics
      - write: production.events
    key_rotation: 90 days
    key_storage: Google Secret Manager (encrypted)

  - name: etl-pipeline
    type: service_account
    permissions:
      - read: staging.*
      - write: analytics.*
    restrictions:
      - ip_whitelist: [10.0.2.0/24]  # Only from ETL servers
GCP Service Account Example:
# Create service account
gcloud iam service-accounts create app-server-prod \
--display-name="Production App Server"
# Grant BigQuery access
gcloud projects add-iam-policy-binding my-project \
--member="serviceAccount:app-server-prod@my-project.iam.gserviceaccount.com" \
--role="roles/bigquery.dataViewer"
# Create key (download JSON)
gcloud iam service-accounts keys create key.json \
--iam-account=app-server-prod@my-project.iam.gserviceaccount.com
# Store key in Secret Manager (not in code repository!)
gcloud secrets create app-server-key --data-file=key.json
# Application: Fetch key at runtime
import json
from google.cloud import secretmanager
from google.oauth2 import service_account

client = secretmanager.SecretManagerServiceClient()
key = client.access_secret_version(
    name="projects/my-project/secrets/app-server-key/versions/latest"
)
credentials = service_account.Credentials.from_service_account_info(
    json.loads(key.payload.data.decode('UTF-8'))
)
5. Defense Layer 3: Encryption
5.1. Encryption at Rest
What: Encrypt data stored on disk
Why: If someone steals hard drive, data is unreadable without key
Algorithms: AES-256 (industry standard)
Implementation levels:
1. Full Disk Encryption (OS level)
# Linux: LUKS (Linux Unified Key Setup)
cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb encrypted_disk
mkfs.ext4 /dev/mapper/encrypted_disk
mount /dev/mapper/encrypted_disk /data
2. Database Encryption (Transparent Data Encryption - TDE)
-- PostgreSQL: pgcrypto extension
CREATE EXTENSION pgcrypto;
-- Encrypt column
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
email TEXT,
phone_encrypted BYTEA -- Encrypted column
);
-- Insert with encryption
INSERT INTO customers (customer_id, email, phone_encrypted)
VALUES (
1,
'customer@example.com',
pgp_sym_encrypt('0912345678', 'encryption_key')
);
-- Query with decryption
SELECT
customer_id,
email,
pgp_sym_decrypt(phone_encrypted, 'encryption_key') AS phone
FROM customers
WHERE customer_id = 1;
3. Application-Level Encryption
from cryptography.fernet import Fernet
# Generate key (store securely, not in code!)
key = Fernet.generate_key()
cipher = Fernet(key)
# Encrypt
plaintext = "Sensitive data"
ciphertext = cipher.encrypt(plaintext.encode())
# b'gAAAAABh...' (encrypted bytes)
# Store in database
db.execute(
"INSERT INTO secrets (data_encrypted) VALUES (%s)",
[ciphertext]
)
# Decrypt when needed
ciphertext = db.query("SELECT data_encrypted FROM secrets")[0]
plaintext = cipher.decrypt(ciphertext).decode()
Cloud Provider Encryption:
# Google Cloud Storage: encryption at rest is on by default with Google-managed keys
gsutil mb -c STANDARD -l asia-southeast1 gs://my-bucket/
# Use customer-managed keys (CMEK)
gcloud kms keyrings create my-keyring --location=asia-southeast1
gcloud kms keys create my-key --keyring=my-keyring --location=asia-southeast1 --purpose=encryption
gsutil kms authorize -k projects/my-project/locations/asia-southeast1/keyRings/my-keyring/cryptoKeys/my-key
gsutil kms encryption -k projects/my-project/locations/asia-southeast1/keyRings/my-keyring/cryptoKeys/my-key gs://my-bucket/
5.2. Encryption in Transit
What: Encrypt data moving across network
Why: Prevent man-in-the-middle attacks (eavesdropping)
Protocol: TLS/SSL (Transport Layer Security)
Implementation:
1. HTTPS for websites
# Nginx config
server {
listen 443 ssl http2;
server_name carptech.vn;
# SSL certificate (from Let's Encrypt)
ssl_certificate /etc/letsencrypt/live/carptech.vn/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/carptech.vn/privkey.pem;
# Modern TLS config
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers on;
# HSTS (force HTTPS)
add_header Strict-Transport-Security "max-age=31536000" always;
location / {
proxy_pass http://app_servers;
}
}
# Redirect HTTP to HTTPS
server {
listen 80;
server_name carptech.vn;
return 301 https://$server_name$request_uri;
}
2. Database connections
# PostgreSQL: Require SSL
connection = psycopg2.connect(
host="db.example.com",
database="prod",
user="app",
password="secret",
sslmode="require", # Require SSL connection
sslrootcert="/path/to/ca.crt" # Verify server certificate
)
3. API calls
import requests
# Bad: HTTP (unencrypted)
response = requests.get("http://api.example.com/data")
# Good: HTTPS (encrypted)
response = requests.get("https://api.example.com/data")
# Best: HTTPS + certificate pinning (prevent MITM)
response = requests.get(
"https://api.example.com/data",
verify="/path/to/expected-cert.pem" # Only trust specific certificate
)
5.3. Column-Level Encryption
Use case: Encrypt specific PII columns, not entire database
Why: Performance (only decrypt when needed) + compliance (PDPA)
Implementation (BigQuery):
-- Create encryption keys
-- (In real implementation, use Cloud KMS)
-- Create table with encrypted columns
CREATE TABLE customers (
customer_id INT64,
email STRING, -- Plaintext (searchable)
phone_encrypted BYTES, -- Encrypted
address_encrypted BYTES -- Encrypted
);
-- Insert with encryption (using AEAD functions)
INSERT INTO customers (customer_id, email, phone_encrypted, address_encrypted)
VALUES (
1,
'customer@example.com',
AEAD.ENCRYPT(
KEYS.KEYSET_CHAIN('gcp-kms://...'),
'0912345678',
''
),
AEAD.ENCRYPT(
KEYS.KEYSET_CHAIN('gcp-kms://...'),
'123 Nguyen Hue, HCMC',
''
)
);
-- Query: Email searchable, phone/address require decryption permission
SELECT
customer_id,
email,
AEAD.DECRYPT_STRING(
KEYS.KEYSET_CHAIN('gcp-kms://...'),
phone_encrypted,
''
) AS phone -- Only users with decrypt permission can see plaintext
FROM customers
WHERE email = 'customer@example.com';
Access control:
-- Grant decrypt permission only to authorized users
GRANT `cloudkms.cryptoKeyVersions.useToDecrypt`
ON crypto_key
TO "user:admin@company.com";
-- Marketing analysts: Can query but phone shows as encrypted bytes
-- Admins: Can decrypt and see plaintext
6. Defense Layer 4: Data Masking
6.1. Static Data Masking
Use case: Copy production data to dev/test, mask PII
Example:
# Production data
customers = [
{'id': 1, 'name': 'Nguyen Van A', 'email': 'nguyenvana@gmail.com', 'phone': '0912345678'},
{'id': 2, 'name': 'Tran Thi B', 'email': 'tranthib@yahoo.com', 'phone': '0987654321'}
]
# Masked data for the dev environment
import hashlib

def mask_data(customers):
    masked = []
    for customer in customers:
        masked.append({
            'id': customer['id'],  # Keep ID (for referential integrity)
            'name': f"User {customer['id']}",  # Generic name
            'email': f"user{customer['id']}@test.example.com",  # Fake email
            'phone': f"09{hashlib.md5(str(customer['id']).encode()).hexdigest()[:8]}"  # Fake phone
        })
    return masked
# Result
[
{'id': 1, 'name': 'User 1', 'email': 'user1@test.example.com', 'phone': '09c4ca4238'},
{'id': 2, 'name': 'User 2', 'email': 'user2@test.example.com', 'phone': '09c81e728d'}
]
SQL Example (PostgreSQL):
-- Create dev database from production (masked)
CREATE TABLE customers_dev AS
SELECT
customer_id,
'User ' || customer_id AS name, -- Masked name
'user' || customer_id || '@test.example.com' AS email, -- Masked email
'09' || substring(md5(customer_id::text), 1, 8) AS phone, -- Masked phone
created_at -- Keep timestamps
FROM customers_prod;
6.2. Dynamic Data Masking
Use case: Same database, different users see masked vs real data
Example (PostgreSQL Row-Level Security):
-- Enable row-level security
ALTER TABLE customers ENABLE ROW LEVEL SECURITY;
-- Policy: analysts may read all rows (SELECT policies take only USING;
-- writes are blocked by not granting INSERT/UPDATE/DELETE to analyst_role)
CREATE POLICY analysts_can_read ON customers
FOR SELECT
TO analyst_role
USING (TRUE);
-- Create a view with dynamic masking
CREATE VIEW customers_masked AS
SELECT
    customer_id,
    CASE
        WHEN pg_has_role(current_user, 'analyst_role', 'MEMBER')
        THEN 'User ' || customer_id
        ELSE name
    END AS name,
    CASE
        WHEN pg_has_role(current_user, 'analyst_role', 'MEMBER')
        THEN '***@***.com'
        ELSE email
    END AS email,
    order_total  -- Aggregated data OK
FROM customers;
-- Analysts query the view
GRANT SELECT ON customers_masked TO analyst_role;
BigQuery Example:
-- Create an authorized view with masking logic
CREATE VIEW `project.dataset.customers_masked` AS
SELECT
customer_id,
-- Mask email for non-admins
IF(
SESSION_USER() IN ('admin@company.com', 'dpo@company.com'),
email,
CONCAT('***@', SPLIT(email, '@')[OFFSET(1)])
) AS email,
-- Mask phone
IF(
SESSION_USER() IN ('admin@company.com'),
phone,
CONCAT('***', SUBSTR(phone, -4))
) AS phone,
order_total
FROM `project.dataset.customers`;
-- Grant access to the masked view
GRANT `roles/bigquery.dataViewer` ON TABLE `project.dataset.customers_masked`
TO "group:analysts@company.com";
7. Defense Layer 5: Audit Logging
7.1. What to Log
Critical events:
- Authentication: Login/logout, failed attempts, MFA
- Access: Who accessed which data, when, from where
- Changes: INSERT, UPDATE, DELETE operations
- Admin actions: Permission changes, user creation/deletion
- Exports: Data downloads, bulk exports
- Errors: Failed queries, permission denials
Log format (JSON for parsing):
{
  "timestamp": "2025-06-17T10:30:15Z",
  "event_type": "data_access",
  "user_id": "alice@company.com",
  "resource": "customers_table",
  "action": "SELECT",
  "query": "SELECT * FROM customers WHERE city = 'Hanoi'",
  "rows_returned": 1523,
  "ip_address": "203.162.4.191",
  "user_agent": "Mozilla/5.0...",
  "status": "success"
}
7.2. Implementation
Database Audit Logs (PostgreSQL):
-- Enable pgaudit extension
CREATE EXTENSION pgaudit;
-- Configure audit logging
ALTER SYSTEM SET pgaudit.log = 'read, write, ddl, role';
ALTER SYSTEM SET pgaudit.log_catalog = off;
ALTER SYSTEM SET pgaudit.log_parameter = on;
-- Reload config
SELECT pg_reload_conf();
-- Audit logs in PostgreSQL logs
-- 2025-06-17 10:30:15 UTC [12345]: AUDIT: SESSION,2,1,READ,SELECT,TABLE,public.customers,"SELECT * FROM customers WHERE city = 'Hanoi'",<not logged>
Application Logs:
import logging
import json
from datetime import datetime
from flask import request, jsonify

# Configure structured logging
logging.basicConfig(level=logging.INFO, format='%(message)s')

def log_data_access(user_id, resource, action, details):
    log_entry = {
        "timestamp": datetime.utcnow().isoformat(),
        "event_type": "data_access",
        "user_id": user_id,
        "resource": resource,
        "action": action,
        "details": details,
        "ip_address": request.remote_addr,
        "user_agent": request.headers.get('User-Agent')
    }
    logging.info(json.dumps(log_entry))

# Usage
@app.route('/api/customers/<int:customer_id>')
@login_required
def get_customer(customer_id):
    customer = db.query("SELECT * FROM customers WHERE id = %s", [customer_id])
    # Log the access
    log_data_access(
        user_id=current_user.email,
        resource=f"customer:{customer_id}",
        action="READ",
        details={"fields_accessed": ["name", "email", "phone"]}
    )
    return jsonify(customer)
7.3. Anomaly Detection
Use ML to detect suspicious patterns:
# Example: Detect unusual access patterns
# (assumes db, send_alert helpers and the `schedule` library)
def detect_anomalies():
    # Baseline: each user's typical access pattern
    baseline = db.query("""
        SELECT user_id, AVG(rows_accessed) AS avg_rows, STDDEV(rows_accessed) AS stddev_rows
        FROM audit_logs
        WHERE timestamp > NOW() - INTERVAL '30 days'
        GROUP BY user_id
    """)
    # Recent access
    recent = db.query("""
        SELECT user_id, SUM(rows_accessed) AS total_rows
        FROM audit_logs
        WHERE timestamp > NOW() - INTERVAL '1 hour'
        GROUP BY user_id
    """)
    # Detect anomalies (> 3 standard deviations above the mean)
    anomalies = []
    for user in recent:
        user_baseline = next((b for b in baseline if b['user_id'] == user['user_id']), None)
        if user_baseline:
            threshold = user_baseline['avg_rows'] + 3 * user_baseline['stddev_rows']
            if user['total_rows'] > threshold:
                anomalies.append({
                    'user_id': user['user_id'],
                    'rows_accessed': user['total_rows'],
                    'expected': user_baseline['avg_rows'],
                    'severity': 'HIGH' if user['total_rows'] > threshold * 2 else 'MEDIUM'
                })
    # Alert the security team
    if anomalies:
        send_alert(
            channel='#security-alerts',
            message=f"⚠️ Unusual data access detected: {len(anomalies)} users",
            details=anomalies
        )
    return anomalies

# Run every hour
schedule.every(1).hours.do(detect_anomalies)
7.4. Log Retention & Protection
Requirements:
- Retention: 1-2 years (compliance requirements)
- Immutability: Cannot be altered/deleted (prevent tampering)
- Access control: Only security team can view
Implementation:
# Store logs in write-once storage (Google Cloud Storage - Bucket Lock)
gsutil mb -c STANDARD -l asia-southeast1 gs://audit-logs-carptech/
# Enable versioning
gsutil versioning set on gs://audit-logs-carptech/
# Set retention policy (2 years)
gsutil retention set 730d gs://audit-logs-carptech/
# Lock retention policy (cannot be reduced)
gsutil retention lock gs://audit-logs-carptech/
# Upload logs
gsutil cp audit-log-2025-06-17.json gs://audit-logs-carptech/2025/06/17/
8. Defense Layer 6: Backup & Disaster Recovery
8.1. The 3-2-1 Rule
Rule:
- 3 copies of data (1 primary + 2 backups)
- 2 different storage types (disk + tape/cloud)
- 1 off-site (different location, protects against fire/flood)
Example architecture:
Production Database (Primary)
│
├── Daily Backup → Cloud Storage (same region)
│ └── Retention: 7 days
│
├── Weekly Backup → Cloud Storage (different region)
│ └── Retention: 4 weeks
│
└── Monthly Backup → Glacier/Archive Storage
└── Retention: 7 years (compliance)
8.2. Implementation
Automated Backups (PostgreSQL):
#!/bin/bash
# backup.sh - Run daily via cron
DATE=$(date +%Y%m%d)
BACKUP_DIR="/backups/daily"
BUCKET="gs://carptech-backups"
# Create backup
pg_dump -h localhost -U postgres -F c -b -v -f "$BACKUP_DIR/prod_$DATE.backup" production_db
# Encrypt backup
gpg --encrypt --recipient backup@carptech.vn "$BACKUP_DIR/prod_$DATE.backup"
# Upload to cloud
gsutil cp "$BACKUP_DIR/prod_$DATE.backup.gpg" "$BUCKET/daily/$DATE/"
# Verify upload
if gsutil ls "$BUCKET/daily/$DATE/prod_$DATE.backup.gpg"; then
echo "✅ Backup successful: $DATE"
# Delete local backup (keep cloud only)
rm "$BACKUP_DIR/prod_$DATE.backup" "$BACKUP_DIR/prod_$DATE.backup.gpg"
else
echo "❌ Backup failed: $DATE"
# Alert ops team
send_alert "Backup failed for $DATE"
fi
# Cleanup old backups (keep last 7 days locally)
find "$BACKUP_DIR" -name "*.backup*" -mtime +7 -delete
Cron schedule:
# Daily backup at 2 AM
0 2 * * * /scripts/backup.sh
# Weekly full backup (Sundays at 3 AM)
0 3 * * 0 /scripts/backup-weekly.sh
# Test restore monthly (1st of month at 4 AM)
0 4 1 * * /scripts/test-restore.sh
8.3. Test Restores (Critical!)
Problem: 40% of companies discover backups are corrupt when they try to restore (Acronis study)
Solution: Test restores monthly
#!/bin/bash
# test-restore.sh
# Get latest backup
LATEST=$(gsutil ls gs://carptech-backups/daily/ | tail -1)
# Download
gsutil cp "$LATEST" /tmp/test-restore.backup.gpg
# Decrypt
gpg --decrypt /tmp/test-restore.backup.gpg > /tmp/test-restore.backup
# Restore to test database
createdb test_restore_$(date +%Y%m%d)
pg_restore -d test_restore_$(date +%Y%m%d) /tmp/test-restore.backup
# Verify: Count tables, rows
TABLES=$(psql -d test_restore_$(date +%Y%m%d) -t -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public';")
CUSTOMERS=$(psql -d test_restore_$(date +%Y%m%d) -t -c "SELECT COUNT(*) FROM customers;")
# Expected values (update these)
EXPECTED_TABLES=25
EXPECTED_CUSTOMERS_MIN=50000
if [ "$TABLES" -eq "$EXPECTED_TABLES" ] && [ "$CUSTOMERS" -ge "$EXPECTED_CUSTOMERS_MIN" ]; then
echo "✅ Restore test PASSED"
echo "Tables: $TABLES, Customers: $CUSTOMERS"
else
echo "❌ Restore test FAILED"
echo "Expected $EXPECTED_TABLES tables, got $TABLES"
echo "Expected >= $EXPECTED_CUSTOMERS_MIN customers, got $CUSTOMERS"
# Alert ops
send_alert "⚠️ Backup restore test FAILED"
fi
# Cleanup
dropdb test_restore_$(date +%Y%m%d)
rm /tmp/test-restore.backup*
8.4. Disaster Recovery Plan
Recovery Time Objective (RTO): Maximum acceptable downtime
Recovery Point Objective (RPO): Maximum acceptable data loss
Example tiers:
| Tier | RTO | RPO | Strategy | Cost |
|---|---|---|---|---|
| Critical (Payment system) | < 1 hour | < 15 minutes | Hot standby, real-time replication | High |
| Important (Customer DB) | < 4 hours | < 1 hour | Warm standby, hourly backups | Medium |
| Normal (Analytics) | < 24 hours | < 1 day | Cold backups, daily | Low |
Implementation (Critical tier - Hot Standby):
Primary Region (asia-southeast1)
│
│ Synchronous Replication
│
▼
Standby Region (asia-east1)
│
│ Automatic Failover (< 1 minute)
```bash
# GCP: Cloud SQL High Availability
# --availability-type=REGIONAL enables automatic failover to a standby zone
gcloud sql instances create prod-db \
  --tier=db-n1-highmem-4 \
  --region=asia-southeast1 \
  --availability-type=REGIONAL \
  --backup-start-time=02:00 \
  --enable-bin-log \
  --retained-backups-count=7

# Failover test
gcloud sql operations list --instance=prod-db --filter="operationType=FAILOVER"
```
9. Cloud Security: Shared Responsibility Model
9.1. Who's Responsible for What?
┌─────────────────────────────────────────────────────┐
│ YOUR RESPONSIBILITY │
│ - Data classification & encryption │
│ - Access control (IAM, RBAC) │
│ - Application security (code vulnerabilities) │
│ - User management │
│ - Network configuration (firewalls, VPCs) │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ CLOUD PROVIDER RESPONSIBILITY │
│ - Physical security (data centers) │
│ - Hardware maintenance │
│ - Network infrastructure │
│ - Hypervisor security │
│ - Global compliance certifications │
└─────────────────────────────────────────────────────┘
Key takeaway: Cloud provider secures infrastructure, you secure data & access
10. Case Study: Vietnamese Fintech - Preventing Credential Stuffing
Context
Company: Lending platform
- 500K users
- $50M loans issued/month
Attack (March 2025):
- Hackers obtained 2M leaked credentials from dark web
- Launched credential stuffing attack: 500K login attempts in 24 hours
- Goal: Access accounts, transfer money
Defense (What Saved Them)
1. Rate Limiting
```python
# Flask-Limiter: per-IP rate limit on the login endpoint
from flask import Flask, request
from flask_limiter import Limiter

app = Flask(__name__)
limiter = Limiter(app, key_func=lambda: request.remote_addr)

@app.route('/login', methods=['POST'])
@limiter.limit("5 per minute")  # max 5 attempts per minute per IP
def login():
    ...  # authenticate as usual; throttled requests get HTTP 429
```
Result: Blocked 480K requests (96%) immediately
2. MFA (Multi-Factor Authentication)
- 85% of users had MFA enabled
- Even with correct password, hackers couldn't bypass MFA
Result: 99.8% of remaining attempts blocked
3. Anomaly Detection
```python
# Detect unusual login patterns (impossible travel)
if (
    user.last_login_ip != current_ip
    and geoip_distance_km(user.last_login_ip, current_ip) > 1000
    and time_since_last_login < timedelta(hours=1)
):
    # Suspicious: user in Hanoi 30 min ago, now logging in from Singapore?
    require_additional_verification()
    alert_security_team()
```
Result: Alerted security team within 5 minutes of attack start
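The distance check above needs a great-circle distance between the two login locations. A sketch using the haversine formula (the `haversine_km` name and the IP-to-coordinates lookup are assumptions; the lookup typically comes from a GeoIP database):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Hanoi (21.03, 105.85) to Singapore (1.35, 103.82): well over the 1000 km threshold
print(round(haversine_km(21.03, 105.85, 1.35, 103.82)))
```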
4. Account Lockout
After 5 failed attempts:
- Temporary lock (15 minutes)
- Email notification to user
- CAPTCHA required
Result: Prevented brute force
Outcome
- 0 accounts compromised
- $0 financial loss
- Attack detected and mitigated in < 30 minutes
- User trust maintained (proactive communication)
Cost of security measures: ~$30K/year
- MFA system: $10K
- Rate limiting infrastructure: $5K
- Anomaly detection (custom): $10K
- Monitoring tools: $5K
ROI: Prevented potential $5M+ loss (if accounts compromised)
CTO Quote:
"Security investment saved our company. Without MFA, we'd have lost millions and customer trust. It's not optional - it's survival."
Conclusion
Data Security is not a checkbox - it's continuous vigilance.
Key Takeaways:
- Defense-in-Depth: Multiple layers; don't rely on a single control
- Encryption is non-negotiable: At rest + in transit
- Access control is critical: Least privilege + RBAC + MFA
- Audit everything: Log all access, detect anomalies
- Backups are insurance: Test restores monthly
- Compliance follows security: ISO 27001, SOC 2 validate your practices
- Cost of security << Cost of breach: $30K investment prevents $2M+ loss
Security Checklist (40 items) - Available in next section
Next Steps:
- ✅ Assess current security posture (use checklist below)
- ✅ Read Data Governance for the foundation
- ✅ Read PDPA Compliance for legal requirements
- ✅ Schedule a security audit with your team
- ✅ Implement quick wins: MFA, encryption, audit logs
Need help? Carptech provides security assessments and implementation services. Book consultation to secure your data platform.
Related Posts:
- Data Governance 101: Framework cho Doanh Nghiệp
- PDPA Compliance: Bảo Vệ Dữ Liệu Cá Nhân
- Data Platform cho Fintech: Compliance & Real-time
- Coming: Data Catalog, Data Lineage (June)




