Machine Learning · Updated: May 27, 2025 · 22 min read

MLOps: Production ML at Enterprise Scale

A comprehensive guide to MLOps - from experiment tracking and feature stores to model deployment and monitoring. Learn how to run ML models at production scale with CI/CD, automation, and best practices.

Lê Hoàng Anh

Senior MLOps Engineer

[Figure: MLOps workflow - experiment tracking, feature store, training pipeline, model registry, deployment, and monitoring components in an automated production environment]
#MLOps · #Machine Learning · #DevOps · #Production ML · #Data Engineering · #Model Deployment · #Feature Store · #ML Monitoring

TL;DR

MLOps (Machine Learning Operations) is the discipline that combines ML, DevOps, and Data Engineering to bring ML models to production reliably and at scale. VentureBeat research found that 85% of ML models never make it to production - MLOps is the answer to that problem.

Core components of MLOps:

  • Experiment Tracking: MLflow, Weights & Biases to track experiments
  • Feature Store: Feast, Tecton to reuse features across models
  • Model Registry: Versioning, staging, metadata management
  • Training Pipeline: Automated, scheduled retraining
  • Deployment: REST APIs, batch predictions, real-time serving
  • Monitoring: Model performance, data drift, concept drift detection
  • CI/CD: Automated testing, deployment, rollback

Case study, Vietnamese fintech: rolling out MLOps for a fraud detection model:

  • Before: 2 weeks to deploy a model update, manual monthly retraining
  • After: Hourly deployments, daily auto-retraining based on new fraud patterns
  • Result: Fraud detected 70% faster, real-time adaptation to new attack patterns

This post walks you through the 4 MLOps maturity levels and shows how to build an MVP MLOps system with open-source tools.


1. Why MLOps? The Production Gap

1.1. Jupyter Notebook → Production: The Valley of Death

A familiar scenario at Vietnamese enterprises:

Week 1-4: A Data Scientist builds a model in a Jupyter notebook

  • Accuracy: 92% on the test set
  • Leadership is excited: "Deploy it right away!"

Week 5-8: The engineering team tries to productionize it

  • The notebook code won't run on the server
  • Hardcoded paths: /Users/datascientist/Downloads/data.csv
  • Library conflicts: the notebook uses pandas 1.5, the server has 1.3
  • A 2GB model file and no clear way to deploy it

Week 9-12: After a lot of debugging

  • The model finally ships... but accuracy drops to 75%
  • Because the training data is now 3 months old
  • No monitoring → nobody knows how the model is performing

Week 13+: The model is forgotten

  • No retraining schedule
  • Performance degradation goes undetected
  • 6 months later, the model's predictions are completely wrong

85% of ML projects end here (VentureBeat, 2019).

1.2. How is MLOps different from DevOps?

MLOps = DevOps + Data + Models

| Aspect | Traditional DevOps | MLOps |
|---|---|---|
| Code | Version control (Git) | ✅ Same |
| Data | N/A | ✅ Data versioning (DVC) |
| Models | N/A | ✅ Model versioning |
| Testing | Unit tests, integration tests | ✅ Same + data validation + model tests |
| Deployment | Blue-green, canary | ✅ Same + A/B testing models |
| Monitoring | Server metrics, logs | ✅ Same + model performance + drift |
| Dependencies | requirements.txt, Docker | ✅ Same + data pipelines |

Key difference: ML systems have three moving parts (code, data, model) instead of just code.

1.3. Business Impact của MLOps

1. Time to Production

  • Without MLOps: 3-6 months to deploy 1 model
  • With MLOps: 1-2 weeks

2. Model Performance

  • Without monitoring: Model degradation 10-30% per year
  • With MLOps: Detect degradation and retrain in time

3. Cost Efficiency

  • Manual operations: Data Scientists spend 60% of their time on deployment
  • Automated MLOps: 80% of time focused on model improvement

4. Scalability

  • Manual: Maximum 5-10 models in production
  • MLOps: 50-500+ models

Case study - Vietnamese E-commerce (500M GMV/month):

  • Deployed 12 ML models with MLOps:
    • Product recommendations (3 models)
    • Churn prediction
    • Demand forecasting (per category)
    • Fraud detection
    • Price optimization
    • Customer segmentation
  • Before MLOps: 1 model in production (recommendations), updated quarterly
  • After MLOps: 12 models, automated retraining weekly, hourly deployments
  • Impact: 25% increase in conversion rate, $2M annual revenue increase

2. MLOps Maturity Levels: Your Roadmap

Google defines 4 MLOps maturity levels:

Level 0: Manual Process (where 90% of Vietnamese enterprises are today)

Characteristics:

  • Data Scientists work in notebooks
  • Manual data collection and preprocessing
  • Train the model locally, save a pickle file
  • Deploy = copy the file to a server and hand-write a Flask API (see the sketch below)
  • No monitoring; manual retraining whenever someone remembers

Tools: Jupyter, pandas, scikit-learn, Flask

Pros: Fast for POCs and experiments. Cons: Not reproducible, not scalable, high failure rate.

When acceptable: Research, one-off analysis, POCs
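
A minimal sketch of what this "Level 0" manual deployment typically looks like - a hand-written Flask API loading a pickle file that was copied onto the server (file name and payload are illustrative, not from a real project):

# level0_flask_api.py - manually written serving script (Level 0)
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Model file copied to the server by hand - no registry, no versioning
with open('churn_model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    payload = request.get_json()  # e.g. {"recency": 45, "frequency": 8, "monetary": 2500}
    features = pd.DataFrame([payload])
    churn_probability = float(model.predict_proba(features)[0][1])
    return jsonify({'churn_probability': churn_probability})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)

It works, but everything listed above is missing: reproducibility, versioning, monitoring, and a retraining schedule.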

Level 1: ML Pipeline Automation

Characteristics:

  • Automated training pipeline: scheduled retraining
  • Data validation (check schema, distributions)
  • Feature engineering pipeline
  • Model versioning
  • Automated deployment of new model versions

Tools:

  • Pipeline orchestration: Apache Airflow, Prefect, Kubeflow
  • Model registry: MLflow, custom

Improvement: Reproducible training, scheduled updates

Example pipeline (Apache Airflow):

# airflow_ml_pipeline.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'data-science',
    'retries': 2,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'churn_prediction_training',
    default_args=default_args,
    schedule_interval='0 2 * * 0',  # Weekly at 2 AM Sunday
    start_date=datetime(2025, 5, 1),
    catchup=False
)

def extract_data():
    """Extract customer data from BigQuery"""
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT
            customer_id,
            -- RFM features
            DATE_DIFF(CURRENT_DATE(), MAX(order_date), DAY) as recency,
            COUNT(DISTINCT order_id) as frequency,
            SUM(order_total) as monetary,
            -- Engagement features
            COUNT(DISTINCT login_date) as login_days,
            AVG(session_duration) as avg_session,
            -- Target
            CASE
                WHEN MAX(order_date) < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
                THEN 1 ELSE 0
            END as churned
        FROM `project.dataset.customer_events`
        WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR)
        GROUP BY customer_id
    """

    df = client.query(query).to_dataframe()
    df.to_parquet('/data/raw/churn_data.parquet')
    print(f"Extracted {len(df)} customers")

def validate_data():
    """Validate data quality"""
    import pandas as pd
    import great_expectations as ge

    df = pd.read_parquet('/data/raw/churn_data.parquet')

    # Convert to Great Expectations dataset
    ge_df = ge.from_pandas(df)

    # Define expectations
    ge_df.expect_column_values_to_not_be_null('customer_id')
    ge_df.expect_column_values_to_be_between('recency', 0, 730)
    ge_df.expect_column_values_to_be_between('frequency', 1, 1000)
    ge_df.expect_column_mean_to_be_between('churned', 0.05, 0.30)

    # Validate
    results = ge_df.validate()

    if not results.success:
        raise ValueError(f"Data validation failed: {results}")

    print("✅ Data validation passed")

def train_model():
    """Train churn prediction model"""
    import pandas as pd
    import mlflow
    import mlflow.sklearn
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, precision_score, recall_score

    # Load data
    df = pd.read_parquet('/data/raw/churn_data.parquet')

    # Split features and target
    feature_cols = ['recency', 'frequency', 'monetary', 'login_days', 'avg_session']
    X = df[feature_cols]
    y = df['churned']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Start MLflow run
    mlflow.set_experiment('churn-prediction')

    with mlflow.start_run():
        # Train model
        model = RandomForestClassifier(
            n_estimators=100,
            max_depth=10,
            min_samples_split=100,
            random_state=42
        )
        model.fit(X_train, y_train)

        # Evaluate
        y_pred = model.predict(X_test)
        y_pred_proba = model.predict_proba(X_test)[:, 1]

        auc = roc_auc_score(y_test, y_pred_proba)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)

        # Log metrics
        mlflow.log_metric('auc', auc)
        mlflow.log_metric('precision', precision)
        mlflow.log_metric('recall', recall)

        # Log parameters
        mlflow.log_param('n_estimators', 100)
        mlflow.log_param('max_depth', 10)

        # Log model
        mlflow.sklearn.log_model(model, 'model')

        print(f"✅ Model trained - AUC: {auc:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}")

        # Save model version
        run_id = mlflow.active_run().info.run_id
        with open('/data/models/latest_run_id.txt', 'w') as f:
            f.write(run_id)

def deploy_model():
    """Deploy model to production"""
    import mlflow

    # Load latest run ID
    with open('/data/models/latest_run_id.txt', 'r') as f:
        run_id = f.read().strip()

    # Load model
    model_uri = f'runs:/{run_id}/model'

    # Register model (promotes to registry)
    mlflow.register_model(model_uri, 'churn-prediction')

    # Transition to Production
    from mlflow.tracking import MlflowClient
    client = MlflowClient()

    # Get latest version
    versions = client.search_model_versions(f"name='churn-prediction'")
    latest_version = max([int(v.version) for v in versions])

    # Promote to Production
    client.transition_model_version_stage(
        name='churn-prediction',
        version=latest_version,
        stage='Production'
    )

    print(f"✅ Model version {latest_version} deployed to Production")

# Define tasks
task_extract = PythonOperator(
    task_id='extract_data',
    python_callable=extract_data,
    dag=dag
)

task_validate = PythonOperator(
    task_id='validate_data',
    python_callable=validate_data,
    dag=dag
)

task_train = PythonOperator(
    task_id='train_model',
    python_callable=train_model,
    dag=dag
)

task_deploy = PythonOperator(
    task_id='deploy_model',
    python_callable=deploy_model,
    dag=dag
)

# Define dependencies
task_extract >> task_validate >> task_train >> task_deploy

Result: Automated weekly retraining and a reproducible pipeline.

Level 2: CI/CD Pipeline Automation

Characteristics:

  • Continuous Integration: Automated testing for code, data, models
  • Continuous Delivery: Automated deployment with approval gates
  • Feature store: Centralized, reusable features
  • Model versioning and staging (Dev → Staging → Production)
  • Automated rollback if model performance drops

Tools:

  • CI/CD: GitHub Actions, GitLab CI, Jenkins
  • Feature store: Feast, Tecton
  • Model registry: MLflow, Vertex AI Model Registry

Example CI/CD (GitHub Actions):

# .github/workflows/ml-pipeline.yml
name: ML Training Pipeline

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 0'  # Weekly at 2 AM Sunday
  workflow_dispatch:  # Manual trigger

jobs:
  data-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Validate data quality
        run: |
          python scripts/validate_data.py
        env:
          GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GCP_SA_KEY }}

  model-training:
    needs: data-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Train model
        run: |
          python scripts/train_model.py
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
          GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GCP_SA_KEY }}

      - name: Evaluate model
        id: evaluate
        run: |
          # evaluate_model.py is assumed to append metrics={"auc": ..., "precision": ...}
          # to $GITHUB_OUTPUT so the next step can read steps.evaluate.outputs.metrics
          python scripts/evaluate_model.py

      - name: Check performance threshold
        run: |
          python -c "
          import json
          import sys
          metrics = json.loads('${{ steps.evaluate.outputs.metrics }}')
          if metrics['auc'] < 0.75:
              print('❌ Model AUC below threshold')
              sys.exit(1)
          print('✅ Model meets performance threshold')
          "

  model-deployment:
    needs: model-training
    runs-on: ubuntu-latest
    environment: production  # Requires approval
    steps:
      - name: Deploy to Vertex AI
        run: |
          python scripts/deploy_vertex_ai.py
        env:
          GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GCP_SA_KEY }}

      - name: Run smoke tests
        run: |
          python scripts/test_deployed_model.py

  monitoring:
    needs: model-deployment
    runs-on: ubuntu-latest
    steps:
      - name: Setup monitoring
        run: |
          python scripts/setup_monitoring.py

Level 3: Automated MLOps (The Goal)

Characteristics:

  • Full automation: Training, deployment, monitoring, retraining
  • Monitoring-driven retraining: Trigger retraining when performance degradation is detected
  • Data drift detection: Automatic alerts
  • Concept drift detection: Model performance monitoring
  • Auto-rollback: Revert to the previous version if the new model underperforms (see the sketch below)
  • Multi-model management: Serve 10s-100s of models

Tools:

  • Integrated platforms: Databricks ML, AWS SageMaker, GCP Vertex AI
  • Monitoring: Evidently AI, WhyLabs, Arize AI
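
A minimal sketch of the auto-rollback idea flagged above, assuming models are promoted through the MLflow Model Registry (the model name 'churn-prediction' and the AUC threshold are illustrative):

# rollback_sketch.py - demote a bad Production version and restore the previous one
from mlflow.tracking import MlflowClient

MODEL_NAME = 'churn-prediction'
AUC_THRESHOLD = 0.75

def rollback_if_underperforming(current_auc: float):
    """If the live model underperforms, archive it and re-promote the previous version."""
    if current_auc >= AUC_THRESHOLD:
        return

    client = MlflowClient()
    versions = sorted(
        client.search_model_versions(f"name='{MODEL_NAME}'"),
        key=lambda v: int(v.version)
    )

    production = [v for v in versions if v.current_stage == 'Production']
    if not production:
        return
    current = production[-1]

    older = [v for v in versions if int(v.version) < int(current.version)]
    if not older:
        return

    # Archive the underperforming version, promote its predecessor back to Production
    client.transition_model_version_stage(MODEL_NAME, current.version, 'Archived')
    client.transition_model_version_stage(MODEL_NAME, older[-1].version, 'Production')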

Architecture diagram:

┌─────────────────┐
│  Data Sources   │
│  (BigQuery, S3) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Feature Store   │◄─────┤ Feature Eng  │
│ (Feast, Tecton) │      │  Pipeline    │
└────────┬────────┘      └──────────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Training        │◄─────┤ Trigger:     │
│ Pipeline        │      │ - Scheduled  │
│ (Kubeflow)      │      │ - Drift      │
└────────┬────────┘      │ - Manual     │
         │               └──────────────┘
         ▼
┌─────────────────┐
│ Model Registry  │
│ (MLflow)        │
│ - Dev           │
│ - Staging       │
│ - Production    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Model Serving   │─────►│ Predictions  │
│ (Vertex AI)     │      │ API          │
└────────┬────────┘      └──────────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Monitoring      │─────►│ Alerts &     │
│ - Performance   │      │ Retraining   │
│ - Data drift    │      │ Triggers     │
│ - Concept drift │      └──────────────┘
└─────────────────┘

3. Core Components of the MLOps Stack

3.1. Experiment Tracking (MLflow)

Problem: Data Scientists run 100+ experiments → "experiment 47 had the highest accuracy, but nobody remembers which hyperparameters it used"

Solution: MLflow Tracking

Setup MLflow:

# Install
pip install mlflow

# Start MLflow server
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlflow-artifacts \
    --host 0.0.0.0 \
    --port 5000

Example usage:

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Assumes X_train, X_test, y_train, y_test have already been prepared (e.g. via train_test_split)

# Set experiment
mlflow.set_experiment('churn-prediction-tuning')

# Hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20],
    'min_samples_split': [50, 100, 200]
}

# Grid search
for n_est in param_grid['n_estimators']:
    for depth in param_grid['max_depth']:
        for min_samples in param_grid['min_samples_split']:

            with mlflow.start_run():
                # Log parameters
                mlflow.log_param('n_estimators', n_est)
                mlflow.log_param('max_depth', depth)
                mlflow.log_param('min_samples_split', min_samples)

                # Train model
                model = RandomForestClassifier(
                    n_estimators=n_est,
                    max_depth=depth,
                    min_samples_split=min_samples,
                    random_state=42
                )
                model.fit(X_train, y_train)

                # Evaluate
                y_pred_proba = model.predict_proba(X_test)[:, 1]
                auc = roc_auc_score(y_test, y_pred_proba)

                # Log metrics
                mlflow.log_metric('auc', auc)

                # Log model
                mlflow.sklearn.log_model(model, 'model')

                print(f"n_est={n_est}, depth={depth}, min_samples={min_samples} → AUC={auc:.3f}")

# Find best run
best_run = mlflow.search_runs(
    experiment_ids=['1'],
    order_by=['metrics.auc DESC'],
    max_results=1
)

print(f"Best run: {best_run[['params.n_estimators', 'params.max_depth', 'metrics.auc']]}")

MLflow UI: http://localhost:5000
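
Once the best run is identified, its model can be loaded straight back from the tracking server (a small sketch; assumes the search above returned at least one run):

# load_best_model.py - reload the winning model from MLflow
best_run_id = best_run.iloc[0]['run_id']
best_model = mlflow.sklearn.load_model(f'runs:/{best_run_id}/model')

# Use it like any scikit-learn model
print(best_model.predict_proba(X_test)[:5, 1])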

3.2. Feature Store (Feast)

Problem:

  • 10 models all use the "customer lifetime value" feature → it gets computed 10 times
  • Training computes features one way, production computes them another → training-serving skew
  • Features aren't reusable across teams

Solution: Centralized Feature Store

Feast example:

# feature_repo/features.py
# Note: this uses the older Feast API (Feature/ValueType); newer Feast releases use Field/dtype instead
from feast import Entity, Feature, FeatureView, FileSource, ValueType
from datetime import timedelta

# Define entity
customer = Entity(
    name='customer_id',
    value_type=ValueType.INT64,
    description='Customer ID'
)

# Define data source
customer_features_source = FileSource(
    path='/data/customer_features.parquet',
    event_timestamp_column='event_timestamp'
)

# Define feature view
customer_features = FeatureView(
    name='customer_features',
    entities=['customer_id'],
    ttl=timedelta(days=7),
    features=[
        Feature(name='recency', dtype=ValueType.INT64),
        Feature(name='frequency', dtype=ValueType.INT64),
        Feature(name='monetary', dtype=ValueType.FLOAT),
        Feature(name='clv', dtype=ValueType.FLOAT),
        Feature(name='churn_score', dtype=ValueType.FLOAT)
    ],
    source=customer_features_source
)

Feature retrieval (training):

from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path='feature_repo/')

# Entity dataframe
entity_df = pd.DataFrame({
    'customer_id': [1001, 1002, 1003],
    'event_timestamp': pd.to_datetime(['2025-05-27'] * 3)
})

# Get historical features (point-in-time correct)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        'customer_features:recency',
        'customer_features:frequency',
        'customer_features:monetary',
        'customer_features:clv'
    ]
).to_df()

print(training_df)

Feature retrieval (production):

# Get online features (low latency)
online_features = store.get_online_features(
    features=[
        'customer_features:recency',
        'customer_features:frequency',
        'customer_features:monetary',
        'customer_features:clv'
    ],
    entity_rows=[{'customer_id': 1001}]
).to_dict()

print(online_features)
# Output: {'customer_id': [1001], 'recency': [15], 'frequency': [12], ...}

Benefits:

  • Reusability: Compute once, reuse across many models
  • Consistency: Training and serving use the same features
  • Point-in-time correctness: Avoid data leakage

3.3. Model Monitoring & Drift Detection

Hai loại drift:

1. Data Drift: Input data distribution thay đổi

  • Example: COVID-19 → customer behavior thay đổi hoàn toàn

2. Concept Drift: Relationship giữa X và y thay đổi

  • Example: Fraud patterns evolve → old fraud detection model không còn hiệu quả

Detecting drift with Evidently AI:

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
import pandas as pd

# Reference data (training data)
reference_data = pd.read_parquet('/data/training/churn_features.parquet')

# Current data (production data)
current_data = pd.read_parquet('/data/production/latest_week.parquet')

# Create drift report
report = Report(metrics=[
    DataDriftPreset(),
    DataQualityPreset()
])

report.run(reference_data=reference_data, current_data=current_data)

# Save report
report.save_html('/reports/drift_report.html')

# Get drift score
drift_metrics = report.as_dict()
drift_detected = drift_metrics['metrics'][0]['result']['dataset_drift']

if drift_detected:
    print("⚠️ Data drift detected! Consider retraining model")
else:
    print("✅ No significant drift detected")

Production monitoring dashboard:

# monitoring/dashboard.py
import streamlit as st
import pandas as pd
from sklearn.metrics import precision_score, recall_score, roc_auc_score

st.title("ML Model Monitoring Dashboard")

# Load production predictions
predictions_df = pd.read_gbq("SELECT * FROM `project.dataset.predictions` WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)")

# Model performance over time
st.header("Model Performance Over Time")

daily_metrics = predictions_df.groupby('date').apply(lambda x: {
    'precision': precision_score(x['actual'], x['predicted']),
    'recall': recall_score(x['actual'], x['predicted']),
    'auc': roc_auc_score(x['actual'], x['predicted_proba'])
}).apply(pd.Series)

st.line_chart(daily_metrics)

# Alert threshold
auc_threshold = 0.75
if daily_metrics['auc'].tail(7).mean() < auc_threshold:
    st.error(f"⚠️ ALERT: 7-day average AUC ({daily_metrics['auc'].tail(7).mean():.3f}) below threshold ({auc_threshold})")
    st.info("🔄 Triggering automated retraining pipeline...")

3.4. Model Deployment Options

Option 1: REST API (Flask/FastAPI)

# serve.py
from fastapi import FastAPI
import mlflow.pyfunc
import pandas as pd

app = FastAPI()

# Load model from MLflow
model = mlflow.pyfunc.load_model('models:/churn-prediction/Production')

@app.post('/predict')
def predict(customer_id: int):
    # Fetch features from Feature Store
    from feast import FeatureStore
    store = FeatureStore(repo_path='feature_repo/')

    features = store.get_online_features(
        features=[
            'customer_features:recency',
            'customer_features:frequency',
            'customer_features:monetary'
        ],
        entity_rows=[{'customer_id': customer_id}]
    ).to_dict()

    # Feast returns a dict of lists; build a single-row DataFrame and drop the entity key
    feature_df = pd.DataFrame(features).drop(columns=['customer_id'])

    # Predict
    churn_probability = model.predict(feature_df)[0]

    return {
        'customer_id': customer_id,
        'churn_probability': float(churn_probability),
        'churn_risk': 'HIGH' if churn_probability > 0.7 else 'MEDIUM' if churn_probability > 0.4 else 'LOW'
    }

# Run: uvicorn serve:app --host 0.0.0.0 --port 8000
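
A quick client-side check of the endpoint above (assumes the FastAPI app is running locally on port 8000; customer_id 1001 is illustrative):

# call_predict.py
import requests

response = requests.post('http://localhost:8000/predict', params={'customer_id': 1001})
print(response.json())
# e.g. {'customer_id': 1001, 'churn_probability': 0.63, 'churn_risk': 'MEDIUM'}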

Option 2: Batch Predictions (Cloud Storage)

# batch_predict.py
import mlflow.pyfunc
import pandas as pd
from google.cloud import bigquery

# Load model
model = mlflow.pyfunc.load_model('models:/churn-prediction/Production')

# Load customers to score
client = bigquery.Client()
query = """
    SELECT customer_id, recency, frequency, monetary
    FROM `project.dataset.customer_features`
    WHERE last_order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 180 DAY)
"""
customers_df = client.query(query).to_dataframe()

# Batch predict
customers_df['churn_probability'] = model.predict(customers_df[['recency', 'frequency', 'monetary']])

# Save results
customers_df[['customer_id', 'churn_probability']].to_gbq(
    'project.dataset.churn_predictions',
    if_exists='replace'
)

print(f"✅ Scored {len(customers_df)} customers")

Option 3: Real-time Serving (Vertex AI)

# deploy_vertex_ai.py
from google.cloud import aiplatform

aiplatform.init(project='my-project', location='us-central1')

# Upload model to Vertex AI Model Registry
model = aiplatform.Model.upload(
    display_name='churn-prediction',
    artifact_uri='gs://my-bucket/models/churn/v2',
    serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest'
)

# Deploy to endpoint
endpoint = model.deploy(
    machine_type='n1-standard-4',
    min_replica_count=1,
    max_replica_count=10,  # Auto-scaling
    traffic_percentage=100
)

print(f"✅ Model deployed to: {endpoint.resource_name}")

# Prediction
prediction = endpoint.predict(instances=[{
    'recency': 45,
    'frequency': 8,
    'monetary': 2500
}])

print(f"Churn probability: {prediction.predictions[0]}")

4. Case Study: Vietnamese Fintech - Fraud Detection MLOps

4.1. Context

Company: Vietnamese digital lending platform (3M customers, 500K loans/month)

Challenge:

  • Fraud patterns evolve rapidly (new attack methods weekly)
  • Manual model updates took 2 weeks from training to deployment
  • Fraud detection model accuracy degraded 15% over 3 months → went undetected
  • Data Scientists spent 60% of their time on deployment instead of improving models

Goal: Build an MLOps system to deploy fraud detection updates hourly and auto-retrain daily

4.2. Architecture

Before MLOps:

Data Scientist → Jupyter Notebook → pickle file
→ Email to Engineering → Manual deployment (2 weeks)
→ No monitoring

After MLOps:

BigQuery (transaction data)
    ↓
Feast Feature Store
    ↓
Kubeflow Pipeline (daily training)
    ↓
MLflow Model Registry
    ↓
Vertex AI Endpoint (auto-deploy)
    ↓
Evidently Monitoring → Alerts → Auto-retrain

4.3. Implementation

Step 1: Feature Store

Centralize 50+ fraud detection features:

# Features
- transaction_velocity_1h: Number of transactions in the last hour
- amount_deviation_30d: Deviation from the 30-day average
- device_fingerprint_new: Is this a new, unfamiliar device?
- ip_country_mismatch: IP country differs from the registered country
- merchant_risk_score: The merchant's historical fraud rate
- user_behavior_anomaly_score: ML-based anomaly score
...

Step 2: Automated Daily Training

A Kubeflow pipeline runs daily at 3 AM:

from kfp import dsl

# Component functions (extract_labeled_data, train_xgboost_model, evaluate_model,
# deploy_to_vertex) are assumed to be defined elsewhere as KFP components
@dsl.pipeline(name='fraud-detection-training')
def fraud_pipeline():
    # Extract yesterday's transactions + labels
    extract_op = extract_labeled_data(
        query="""
            SELECT * FROM transactions
            WHERE date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
            AND fraud_label IS NOT NULL
        """
    )

    # Train model
    train_op = train_xgboost_model(
        data=extract_op.output,
        params={'max_depth': 6, 'learning_rate': 0.1}
    )

    # Evaluate on holdout set
    eval_op = evaluate_model(train_op.output)

    # Deploy if AUC > 0.90 (threshold)
    with dsl.Condition(eval_op.outputs['auc'] > 0.90):
        deploy_op = deploy_to_vertex(train_op.output)
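
Compiling and submitting the pipeline above might look like this, assuming the KFP v1-style SDK and an illustrative in-cluster Kubeflow Pipelines endpoint:

# run_pipeline.py - compile and submit the pipeline (assumes fraud_pipeline defined above is in scope)
import kfp

# Compile the pipeline definition to a reusable spec
kfp.compiler.Compiler().compile(fraud_pipeline, 'fraud_pipeline.yaml')

# Or submit it directly against a Kubeflow Pipelines host (URL is illustrative)
client = kfp.Client(host='http://ml-pipeline.kubeflow.svc.cluster.local:8888')
client.create_run_from_pipeline_func(fraud_pipeline, arguments={})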

Step 3: Real-time Serving

The API endpoint responds in under 100ms:

@app.post('/score-transaction')
def score_transaction(transaction: Transaction):
    # Get features from Feast
    features = feature_store.get_online_features(
        entity_rows=[{'transaction_id': transaction.id}],
        features=[
            'fraud_features:transaction_velocity_1h',
            'fraud_features:amount_deviation_30d',
            ...
        ]
    ).to_dict()

    # Predict (in practice the feature dict is converted to the model's expected input format first)
    fraud_score = model.predict_proba(features)[0][1]

    # Decision rules
    if fraud_score > 0.95:
        return {'decision': 'BLOCK', 'score': fraud_score}
    elif fraud_score > 0.75:
        return {'decision': 'REVIEW', 'score': fraud_score}
    else:
        return {'decision': 'APPROVE', 'score': fraud_score}

Step 4: Monitoring & Auto-Retraining

Monitor performance hourly:

# Check performance every hour
if current_hour_precision < 0.85:
    # Trigger immediate retraining
    trigger_kubeflow_pipeline('fraud-detection-training')
    send_slack_alert("🚨 Fraud model performance dropped. Retraining triggered.")

4.4. Results

Deployment Speed:

  • Before: 2 weeks per update
  • After: Hourly deployments (automated)

Model Freshness:

  • Before: Quarterly updates
  • After: Daily retraining on the previous day's data

Detection Performance:

  • Before: 78% precision, 65% recall (degrading)
  • After: 92% precision, 88% recall (consistently)

Fraud Adaptation:

  • Before: New fraud patterns detected after 2-4 weeks
  • After: Detected within 24 hours (daily retraining catches new patterns)

Cost Savings:

  • Prevented fraud: $8M/year (an additional $3M from faster detection)
  • Reduced false positives: 40% fewer legitimate transactions blocked → better UX

Data Science Productivity:

  • Before: 60% time on deployment
  • After: 90% of time on model improvement → shipped 3 new fraud models in 6 months

5. Tools Landscape: Build vs Buy

5.1. All-in-One Platforms

Google Cloud Vertex AI

  • ✅ Fully managed: Training, serving, monitoring
  • ✅ AutoML for non-experts
  • ✅ Feature Store built-in
  • ✅ Tight integration with BigQuery, GCS
  • ❌ Vendor lock-in
  • ❌ Higher cost ($$$)
  • Best for: GCP-native companies, enterprises that need vendor support

AWS SageMaker

  • ✅ Comprehensive MLOps suite
  • ✅ SageMaker Pipelines for orchestration
  • ✅ Model Monitor for drift detection
  • ❌ Complex setup
  • ❌ AWS-only
  • Best for: AWS-heavy companies

Databricks ML

  • ✅ End-to-end: data prep → training → serving
  • ✅ Unity Catalog for governance
  • ✅ Great for Spark workloads
  • ❌ Expensive
  • Best for: Big data + ML workloads

5.2. Best-of-Breed (Open Source)

Recommended MVP Stack:

| Component | Tool | Why |
|---|---|---|
| Experiment Tracking | MLflow | Industry standard, easy setup |
| Pipeline Orchestration | Apache Airflow | Flexible, Python-based, proven |
| Feature Store | Feast | Open-source, cloud-agnostic |
| Model Serving | FastAPI + Docker | Lightweight, full control |
| Monitoring | Evidently AI | Free tier, drift detection |
| Infrastructure | Kubernetes | Scalable, portable |

Setup cost: $0 (open-source) + infrastructure cost
Time to MVP: 2-4 weeks
Best for: Startups, cost-conscious teams, anyone who needs flexibility

5.3. Decision Framework

Use an All-in-One Platform if:

  • ✅ Budget > $50K/year for MLOps tools
  • ✅ Need enterprise support
  • ✅ Already committed to cloud provider (GCP/AWS)
  • ✅ Prefer less maintenance

Use Best-of-Breed if:

  • ✅ Budget-constrained
  • ✅ Want flexibility and no vendor lock-in
  • ✅ Have engineering resources to maintain it
  • ✅ Multi-cloud strategy

The reality for Vietnamese startups: most should start with an open-source MVP and migrate to a managed platform as they scale.


6. Getting Started: Your MLOps MVP in 4 Weeks

Week 1: Experiment Tracking

Goal: Stop losing experiments

Tasks:

  1. Setup MLflow server (Docker)
  2. Migrate 1 model training script to log to MLflow
  3. Train 10 experiments, compare them in the UI

Deliverable: All team members track experiments in MLflow

Week 2: Model Registry & Versioning

Goal: Reproducible model deployments

Tasks:

  1. Setup MLflow Model Registry
  2. Register models with stages (Dev/Staging/Production)
  3. Deploy 1 model to production via registry

Deliverable: Production model served from registry, not local pickle files

Week 3: Automated Training Pipeline

Goal: Scheduled retraining

Tasks:

  1. Setup Apache Airflow (or Prefect)
  2. Convert training script to Airflow DAG
  3. Schedule weekly training

Deliverable: Model auto-retrains every week, auto-registers in MLflow

Week 4: Basic Monitoring

Goal: Detect when model degrades

Tasks:

  1. Setup Evidently AI monitoring
  2. Track predictions in BigQuery/database
  3. Daily drift report

Deliverable: A daily email report on model performance + drift

After 4 weeks: You now have Level 1 MLOps - enough to deploy models reliably.


7. Common Pitfalls & Best Practices

❌ Pitfall 1: Boil the Ocean

Trying to implement every component at once → overwhelmed → failure

Best Practice: Start small

  • Week 1-4: Experiment tracking
  • Week 5-8: Model registry
  • Week 9-12: Automated pipeline
  • Iterate from there

❌ Pitfall 2: Tools Before Process

Buying Databricks/SageMaker while the team still operates manually

Best Practice: Document process first

  • How do you train models?
  • How do you deploy?
  • How do you monitor?

Then automate that process.

❌ Pitfall 3: Over-Engineering

Building a Kubernetes cluster for 2 models in production

Best Practice: Match complexity to scale

  • 1-5 models: FastAPI + Docker on a single server
  • 5-20 models: Managed service (Vertex AI)
  • 20+ models: Kubernetes + full MLOps

❌ Pitfall 4: No Monitoring

Deploying a model and forgetting it → performance degrades 30% and no one notices

Best Practice: Monitoring is non-negotiable

  • Minimum: Track daily prediction accuracy
  • Better: Automated drift detection
  • Best: Real-time performance dashboards

❌ Pitfall 5: Training-Serving Skew

Training uses pandas, production uses SQL → the features end up different

Best Practice: Feature Store

  • Single source of truth for features
  • Same code for training & serving

8. ROI & Business Case for MLOps

8.1. The Cost of NOT Having MLOps

Scenario: A company with 5 ML models in production

Manual operations cost (per year):

  • Data Scientist time on deployment: 60% × $80K × 3 DS = $144K
  • Failed deployments: 2 per quarter × $20K impact = $160K
  • Model performance degradation: 15% revenue impact = $500K (if ML contributes $3M in revenue)
  • Total cost: $804K/year

With MLOps ($100K investment):

  • Data Scientist focus on models: +40% productivity = $200K value
  • Prevent failed deployments: $160K saved
  • Maintain model performance: $500K saved
  • Total benefit: $860K/year

ROI = ($860K - $100K) / $100K = 760%
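
The same arithmetic as a quick sanity check (all numbers taken from the figures above):

# roi_check.py - recompute the ROI figures above
manual_cost = 144_000 + 160_000 + 500_000       # DS time + failed deployments + degradation = $804K
mlops_investment = 100_000
mlops_benefit = 200_000 + 160_000 + 500_000     # productivity + prevented failures + stable performance = $860K

roi = (mlops_benefit - mlops_investment) / mlops_investment
print(f"Cost without MLOps: ${manual_cost:,}/year")   # $804,000/year
print(f"ROI with MLOps: {roi:.0%}")                    # 760%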

8.2. Metrics to Track

Business metrics:

  • Time to production: Weeks → Days → Hours
  • Model refresh rate: Quarterly → Weekly → Daily
  • Number of models in production: 1-5 → 10-50+
  • Data Scientist productivity: % time on model improvement (should increase)

Technical metrics:

  • Deployment success rate: Target > 95%
  • Model performance stability: AUC variance < 5%
  • Incident response time: Hours → Minutes
  • Training pipeline uptime: Target > 99%

9. The Future of MLOps: Trends

9.1. AutoML + MLOps

Automated model selection and hyperparameter tuning → MLOps pipelines that optimize themselves

Tools: H2O AutoML, Google Vertex AI AutoML, DataRobot

9.2. LLMOps

MLOps for Large Language Models:

  • Prompt versioning
  • Fine-tuning pipelines
  • LLM monitoring (hallucination detection, toxicity)

Tools: LangChain, PromptLayer, Weights & Biases for LLMs

9.3. Real-time ML

The shift from batch to streaming ML:

  • Online learning (the model updates in real time)
  • Feature computation in the stream (Kafka, Flink)

Use cases: Fraud detection, recommendation systems, dynamic pricing

9.4. ML Governance & Compliance

Especially relevant for Fintech and Healthcare:

  • Model explainability (SHAP, LIME)
  • Bias detection
  • Audit trails
  • Regulatory compliance (GDPR, SBV)

Tools: IBM AI Fairness 360, Microsoft Fairlearn


Conclusion

MLOps is not a luxury - it is a necessity for scaling ML in production.

Key takeaways:

  1. 85% of ML models fail not because the models are bad, but because of a lack of MLOps
  2. Start small: Experiment tracking (Week 1) → Registry (Week 2) → Pipeline (Week 3) → Monitoring (Week 4)
  3. Match tools to scale: Open-source MVP → managed platform as you grow
  4. Monitoring is critical: Model performance degrades over time - you need to detect it and retrain
  5. Automation is key: Manual operations don't scale beyond 5-10 models

Next steps:

  • ✅ Read Customer Churn Prediction to understand an end-to-end ML project
  • ✅ Read From BI to AI to assess your analytics maturity
  • ✅ Set up MLflow experiment tracking this week
  • ✅ Document your current deployment process → identify automation opportunities

Need help? Carptech has implemented MLOps for 10+ Vietnamese companies (Fintech, E-commerce, Logistics). Book a free consultation to discuss an MLOps roadmap for your company.

