TL;DR
- Agile for data: Adapt Scrum/Kanban for data work (exploratory, hard to estimate)
- Key rituals: Daily standup (15 min), Sprint planning (bi-weekly, 2h), Sprint review (1h), Retro (1h), Backlog grooming (weekly, 1h)
- Additional rituals: Office hours (weekly, 2h), Show & Tell (monthly, 1h), Documentation (runbooks, ADRs)
- Tools: Slack, Jira/Linear, Confluence, Figma (diagrams)
- Agile challenges for data: Estimating exploratory work, balancing planned vs ad-hoc
- Best practices: Timeboxing, hypothesis-driven analysis, blameless retros, async for remote teams
- Outcome: High-performing teams have clear communication, predictable delivery, continuous improvement
Introduction: Why Data Teams Need Rituals
A common scenario (a data team without structure):
Monday morning:
- Engineer 1: Working on pipeline X (stakeholder doesn't know)
- Engineer 2: Blocked on data access (nobody knows)
- Analyst: Working on ad-hoc request from Friday (forgot original context)
Friday:
- Manager: "What did team accomplish this week?"
- Team: "Uh... stuff?"
- No visibility, no accountability, chaos
Problems:
- ❌ No coordination (duplicated work, blocking each other)
- ❌ No visibility (stakeholders don't know progress)
- ❌ No learning (repeat same mistakes)
What rituals solve:
- ✅ Daily standup → Coordination, unblock
- ✅ Sprint planning → Prioritize, commit
- ✅ Sprint review → Demo work, get feedback
- ✅ Retrospective → Continuous improvement
- ✅ Documentation → Knowledge sharing
High-performing data teams have strong rituals.
Agile for Data Teams: Adaptations
Challenge: Data Work is Different
Software engineering:
- Predictable: "Build login feature" → 2 weeks (can estimate)
- Binary: Feature works or doesn't
- Iterative: Ship v1, then v2, v3
Data work:
- Exploratory: "Why did revenue drop?" → ??? (unknown unknowns)
- Continuous: Data quality, pipeline maintenance (never "done")
- Ad-hoc heavy: 50% planned work, 50% urgent requests
Agile Adaptations for Data
1. Timeboxing Exploratory Work
Instead of:
Task: "Analyze customer churn"
Estimate: ??? (could be 1 day or 1 month)
Timebox:
Task: "Churn analysis (timeboxed to 2 days)"
Day 1: Explore data, identify patterns
Day 2: Document findings, recommend next steps
If need more time → Create follow-up task
Benefit: Prevents analysis paralysis, forces prioritization.
2. Hypothesis-Driven Analysis
Instead of:
Task: "Analyze sales data"
→ Too broad, endless
Hypothesis:
Task: "Test hypothesis: Discount campaigns don't improve LTV"
Approach:
1. Cohort analysis: Discounted vs full-price customers
2. Measure 6-month LTV
3. Statistical test (t-test)
4. Recommend: Continue discounts or not
Estimated: 3 days
Benefit: Clear scope, measurable outcome.
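Step 3 (the statistical test) can be sketched in plain Python. This is a minimal illustration that assumes two lists of per-customer 6-month LTV values, one per cohort. The numbers below are made up for the example and are not results. It uses Welch's variant of the t-test, which does not assume equal variances between cohorts. The helper `welch_t` is hypothetical. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of cohort a's mean
    vb = variance(b) / len(b)  # squared standard error of cohort b's mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical 6-month LTV (USD) per customer, for illustration only.
discounted = [120.0, 95.0, 130.0, 110.0, 105.0, 98.0]
full_price = [140.0, 150.0, 128.0, 160.0, 135.0, 145.0]

t, df = welch_t(discounted, full_price)
print(f"t = {t:.2f}, df = {df:.1f}")  # negative t: discounted cohort has lower mean LTV
```

LTV is usually right-skewed, so it is worth checking the distributions before trusting a t-test. A log transform or a non-parametric test may fit the data better.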
3. Kanban + Scrum Hybrid
Pure Scrum: Fixed 2-week sprints, team commits to a sprint backlog
- Problem for data: Ad-hoc requests disrupt sprint
Pure Kanban: Continuous flow, no sprints
- Problem: No forcing function to demo work
Hybrid (Best for data teams):
Sprint = 2 weeks
Capacity allocation:
- 60% planned work (from backlog)
- 40% ad-hoc capacity (buffer for urgent requests)
Still have sprint planning, review, retro
But flexible to handle ad-hoc without breaking sprint
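Here is a minimal sketch of the 60/40 split. It assumes capacity is tracked in person-days, and that an ad-hoc request which no longer fits the buffer goes to the backlog rather than displacing planned work. `SprintCapacity` and `take_adhoc` are hypothetical names and are not part of Jira, Linear or any other tool.

```python
from dataclasses import dataclass, field

@dataclass
class SprintCapacity:
    """Hypothetical tracker for the hybrid model: planned share plus an ad-hoc buffer."""
    total_days: float            # team person-days in the sprint
    planned_share: float = 0.6   # 60% planned work; the remaining 40% is the ad-hoc buffer
    adhoc_used: float = 0.0
    adhoc_log: list = field(default_factory=list)

    @property
    def planned_days(self) -> float:
        return self.total_days * self.planned_share

    @property
    def adhoc_days(self) -> float:
        return self.total_days - self.planned_days

    def take_adhoc(self, request: str, days: float) -> bool:
        """Accept an ad-hoc request if it fits the remaining buffer; otherwise defer it."""
        if self.adhoc_used + days > self.adhoc_days:
            return False  # buffer spent: route to the backlog instead of breaking the sprint
        self.adhoc_used += days
        self.adhoc_log.append((request, days))
        return True

# Hypothetical sprint: 5 people x 10 working days.
sprint = SprintCapacity(total_days=50)
print(sprint.planned_days, sprint.adhoc_days)
print(sprint.take_adhoc("Exec asks why signups dipped", 2))
```

Refusing, rather than silently absorbing, an over-budget request makes the trade-off visible. Either the request waits for the next sprint, or someone explicitly drops planned work to make room.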
Key Rituals
1. Daily Standup (15 Minutes)
Format: Team stands (or video call), quick updates.
3 Questions:
- What I did yesterday
- What I'll do today
- Any blockers
Example:
Engineer 1:
- Yesterday: Finished Airflow DAG for customer events
- Today: Deploy to prod, monitor
- Blockers: Need approval from DevOps for IAM role
Engineer 2:
- Yesterday: Investigated slow BigQuery query
- Today: Implement partition pruning, test
- Blockers: None
Analyst:
- Yesterday: Started churn analysis
- Today: Finish cohort segmentation
- Blockers: Need clarification from Product on definition of "active user"
Rules:
- ✅ Keep it short: 15 minutes MAX (for 5-person team = 3 min/person)
- ✅ Standups are for coordination, not problem-solving
- If deep discussion needed: "Let's take this offline" (after standup)
- ✅ Same time, every day (e.g., 9:30 AM)
- ✅ Everyone attends (unless OOO)
Common mistakes:
- ❌ Too long (30-45 min) → People zone out
- ❌ Manager turns it into status report → Should be peer-to-peer
- ❌ Problem-solving during standup → Wastes everyone's time
Remote team adaptation:
- Async standup: Post updates in Slack channel before 10 AM
- Synchronous huddle: 10 AM video call (optional, for blockers)
2. Sprint Planning (2 Hours, Bi-Weekly)
Goal: Prioritize work, commit to sprint goals.
Agenda:
Part 1: Review Backlog (30 min)
- Product Manager / Stakeholders present priorities
- Data team asks clarifying questions
Part 2: Estimation (45 min)
- Team estimates effort (t-shirt sizes or story points)
- T-shirt sizes: S (1 day), M (2-3 days), L (1 week), XL (2 weeks)
- Discuss complexity, unknowns
Example:
Task: "Build pipeline for new payment data"
Engineer 1: "This is M - source is new API, need to learn it, but transformation straightforward"
Engineer 2: "Agree, M"
Estimate: M (2-3 days)
Part 3: Commit to Sprint (45 min)
- Calculate team capacity:
  Team: 5 people
  Sprint: 2 weeks = 10 days/person = 50 person-days total
  Minus:
  - Holidays: 2 days
  - Meetings: 5 days (10%)
  - Ad-hoc buffer: 15 days (30%)
  Available: 28 person-days
  Can commit to: 28 days of S/M/L tasks
- Pull top-priority tasks from backlog until capacity is full
- Sprint goal: "Migrate 20 critical pipelines to Prefect"
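The capacity arithmetic and the "pull until full" step can be sketched as follows. This assumes the T-shirt sizes from the estimation step map to person-days (M and L use their upper bounds: 3 days and 5 working days), and that the backlog is already sorted by priority. The helper names and the backlog titles are hypothetical.

```python
# Hypothetical mapping of T-shirt sizes to person-days (upper bounds for M and L).
SIZE_DAYS = {"S": 1, "M": 3, "L": 5, "XL": 10}

def available_capacity(people: int, sprint_days: int, holidays: float,
                       meeting_share: float, adhoc_share: float) -> float:
    """Person-days left for planned work after holidays, meetings and the ad-hoc buffer."""
    total = people * sprint_days
    return total - holidays - total * meeting_share - total * adhoc_share

def pull_tasks(backlog: list[tuple[str, str]], capacity: float) -> list[str]:
    """Walk the backlog in priority order and commit every task that still fits."""
    committed, used = [], 0.0
    for title, size in backlog:
        days = SIZE_DAYS[size]
        if used + days <= capacity:
            committed.append(title)
            used += days
    return committed

cap = available_capacity(people=5, sprint_days=10, holidays=2,
                         meeting_share=0.10, adhoc_share=0.30)
backlog = [  # hypothetical, already in priority order
    ("Payment pipeline", "M"), ("Prefect migration batch", "L"),
    ("Churn dashboard", "M"), ("Snowflake cutover", "XL"),
    ("Docs cleanup", "S"), ("Exec dashboard", "L"), ("Cost report", "M"),
]
committed = pull_tasks(backlog, cap)
print(cap, committed)
```

This version skips a task that does not fit and keeps looking for smaller ones further down. A stricter team might stop at the first task that does not fit, so lower-priority work never jumps the queue.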
Outcome: Clear sprint backlog, team aligned.
3. Sprint Review / Demo (1 Hour)
Goal: Show completed work, get feedback from stakeholders.
Attendees: Data team + stakeholders (Product, Marketing, Execs)
Format: Live demos (not slides!)
Example:
Analyst presents:
"This sprint, I analyzed churn for our premium tier.
[Shares Looker dashboard]
Key findings:
1. Churn rate: 5% monthly (higher than standard tier at 3%)
2. Main reason: Price sensitivity (survey data)
3. Hypothesis: Discount for 3-month commitment → Reduce churn
Recommendation: Run A/B test
Questions?"
Stakeholders ask questions, provide feedback
PM: "Great, let's prioritize A/B test next sprint"
Benefits:
- ✅ Visibility (stakeholders see progress)
- ✅ Feedback (catch misunderstandings early)
- ✅ Celebration (motivates team)
Anti-pattern:
- ❌ No demo (just status report) → Boring
- ❌ Only slides (not actual work) → Not convincing
4. Retrospective (1 Hour)
Goal: Reflect on sprint, identify improvements.
Format: Blameless discussion (no finger-pointing).
3 Questions:
- What went well? (keep doing)
- What didn't go well? (stop doing)
- What can we improve? (action items)
Example:
What went well:
- ✅ Migration to Prefect smooth
- ✅ Good collaboration with ML team
- ✅ All critical pipelines stable
What didn't go well:
- ❌ 3 ad-hoc requests took 15 person-days (30% of sprint capacity)
- ❌ BigQuery costs spiked (didn't notice until bill)
- ❌ Documentation lacking for new pipelines
Action items:
1. Create ad-hoc request triage process (only accept if urgent + high impact)
2. Set up BigQuery budget alert ($500/day)
3. Mandate documentation checklist for all new pipelines
- Owner: Engineer 1
- Due: Next sprint
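Action item 1 (the triage rule) is simple enough to encode, so the decision stays consistent no matter who picks up a request. This is a minimal sketch. `AdHocRequest` and `triage` are hypothetical, and what counts as "urgent" or "high impact" (the examples in the comments) is an assumption each team needs to define.

```python
from dataclasses import dataclass

@dataclass
class AdHocRequest:
    title: str
    urgent: bool       # assumption: e.g. blocks a launch or a production incident
    high_impact: bool  # assumption: e.g. affects revenue or an exec decision

def triage(request: AdHocRequest) -> str:
    """Hypothetical rule: only urgent AND high-impact requests enter the current sprint."""
    if request.urgent and request.high_impact:
        return "sprint"   # take it now, from the ad-hoc buffer
    return "backlog"      # everything else waits for backlog grooming

requests = [  # hypothetical examples
    AdHocRequest("Revenue dashboard broken before board meeting", urgent=True, high_impact=True),
    AdHocRequest("One-off export for a partner", urgent=True, high_impact=False),
    AdHocRequest("Explore new attribution model", urgent=False, high_impact=True),
]
for r in requests:
    print(f"{triage(r):8} {r.title}")
```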
Retro formats (rotate to keep fresh):
- Start/Stop/Continue
- Glad/Sad/Mad
- 4Ls: Liked, Learned, Lacked, Longed for
- Sailboat: Wind (helping), Anchor (blocking), Rocks (risks)
Psychological safety: Critical for honest retros.
- No blame ("Pipeline failed" vs "You broke pipeline")
- Manager participates as peer (not judge)
- Rotate facilitator
5. Backlog Grooming (1 Hour, Weekly)
Goal: Refine user stories, add details, prioritize.
Activities:
1. Refine vague requests:
Before:
"Need marketing data"
After grooming:
Title: "Build dashboard for email campaign performance"
Description:
- Metrics: Open rate, click rate, conversions, revenue
- Granularity: Daily, by campaign
- Tool: Looker
- Stakeholder: Marketing Manager
Acceptance criteria:
- [ ] Dashboard live in Looker
- [ ] Marketing team trained
- [ ] Documentation written
Estimate: M (3 days)
2. Break down large tasks:
Epic: "Migrate to Snowflake"
↓ Break into stories:
- Setup Snowflake account
- Migrate 10 critical tables
- Migrate dbt models
- Migrate BI dashboards
- Cutover & validation
- Decommission old warehouse
3. Prioritize:
- Use framework: Impact vs Effort matrix
High Impact, Low Effort → Do now
High Impact, High Effort → Plan carefully
Low Impact, Low Effort → Nice to have
Low Impact, High Effort → Don't do
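The matrix above can be turned into a sortable backlog. This sketch assumes impact and effort are scored 1 to 5 during grooming, with 3 or more counting as "high". The scale, the threshold and the item titles are assumptions, and `quadrant`/`prioritize` are hypothetical helpers.

```python
RANK = {"Do now": 0, "Plan carefully": 1, "Nice to have": 2, "Don't do": 3}

def quadrant(impact: int, effort: int, threshold: int = 3) -> str:
    """Place an item on the Impact vs Effort matrix (1-5 scores; threshold is an assumption)."""
    high_impact, high_effort = impact >= threshold, effort >= threshold
    if high_impact and not high_effort:
        return "Do now"
    if high_impact and high_effort:
        return "Plan carefully"
    if not high_effort:
        return "Nice to have"
    return "Don't do"

def prioritize(items: list[tuple[str, int, int]]) -> list[tuple[str, str]]:
    """Sort (title, impact, effort) items by quadrant, then by higher impact first."""
    labelled = [(title, quadrant(impact, effort), impact) for title, impact, effort in items]
    labelled.sort(key=lambda x: (RANK[x[1]], -x[2]))
    return [(title, label) for title, label, _ in labelled]

items = [  # hypothetical (title, impact, effort) scores
    ("Rebuild legacy report", 1, 4),
    ("Migrate to Snowflake", 5, 5),
    ("Rename columns", 2, 1),
    ("Email campaign dashboard", 4, 2),
]
for title, label in prioritize(items):
    print(f"{label:15} {title}")
```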
Outcome: Backlog ready for next sprint planning.
Additional Rituals
6. Office Hours (2 Hours, Weekly)
Goal: Open time for business users to ask data questions.
Format:
Every Friday 2-4 PM
Zoom room open
Anyone can join, ask questions
Topics:
- SQL help
- Dashboard debugging
- Metric definitions
- Data access requests
Benefits:
- ✅ Reduces ad-hoc Slack interruptions (batch questions)
- ✅ Just-in-time learning
- ✅ Builds relationship with stakeholders
Example questions:
- "How do I calculate churn rate?"
- "Why is my dashboard blank?"
- "Can I get access to customer data?"
7. Show & Tell (1 Hour, Monthly)
Goal: Team members share learnings.
Format: Casual presentation (15-20 min talk + Q&A)
Topics:
- New tool tried: "I tested Great Expectations for data quality"
- Technique learned: "Incremental models in dbt"
- Analysis insights: "How I identified $50K revenue leak"
- Conference recap: "Key takeaways from DataEngConf"
Benefits:
- ✅ Knowledge sharing
- ✅ Presentation practice
- ✅ Cross-pollination (analysts learn from engineers, vice versa)
8. Documentation Rituals
Problem: Documentation always outdated or nonexistent.
Solution: Make documentation mandatory part of "Done".
Definition of Done (checklist):
Task: "Build new pipeline"
Done when:
- [ ] Code written & tested
- [ ] Deployed to prod
- [ ] Monitoring setup (alerts)
- [ ] Runbook created (how to troubleshoot)
- [ ] dbt docs updated
- [ ] Team notified (Slack #data-team)
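A Definition of Done only helps if it is checked the same way every time. Here is a minimal sketch that treats the checklist above as data and reports what is still missing. How items get ticked (a Jira checklist, a PR template, a form) is left open, and the function names are hypothetical.

```python
# The checklist items from the Definition of Done above.
DEFINITION_OF_DONE = (
    "Code written & tested",
    "Deployed to prod",
    "Monitoring setup (alerts)",
    "Runbook created (how to troubleshoot)",
    "dbt docs updated",
    "Team notified (Slack #data-team)",
)

def missing_items(checked: set[str]) -> list[str]:
    """Hypothetical helper: DoD items not yet ticked, in checklist order."""
    return [item for item in DEFINITION_OF_DONE if item not in checked]

def is_done(checked: set[str]) -> bool:
    """A task counts as Done only when every checklist item is ticked."""
    return not missing_items(checked)

ticked = set(DEFINITION_OF_DONE[:4])  # code, deploy, monitoring and runbook finished
print(is_done(ticked), missing_items(ticked))
```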
Documentation types:
1. Runbooks:
# Runbook: Customer Events Pipeline
## Overview
Ingests customer events from Kafka → Snowflake
## Schedule
Runs every 5 minutes
## Monitoring
- Datadog: "customer_events_pipeline" dashboard
- Alert: If lag > 30 min
## Troubleshooting
### Pipeline failing
1. Check Kafka lag: ...
2. Check Snowflake connection: ...
3. Escalate to: @engineer-on-call
### Data looks wrong
1. Check source data quality: ...
2. Verify transformations: ...
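The runbook's alert rule ("lag > 30 min") can be expressed as a small freshness check. This sketch assumes `last_loaded_at` is the timestamp of the newest event that reached Snowflake, for example the maximum of an event-time column in the target table. That query and column are assumptions. Note that this measures data freshness, which differs from Kafka consumer lag measured in offsets (the runbook's "Check Kafka lag" step).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LAG_THRESHOLD = timedelta(minutes=30)  # "Alert: If lag > 30 min" from the runbook

def pipeline_lag(last_loaded_at: datetime, now: Optional[datetime] = None) -> timedelta:
    """Time between the newest loaded event and now (both timezone-aware)."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at

def should_alert(last_loaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """True when the pipeline has fallen further behind than the runbook tolerates."""
    return pipeline_lag(last_loaded_at, now) > LAG_THRESHOLD

# Hypothetical check: newest event is 42 minutes old.
now = datetime(2025, 8, 26, 10, 0, tzinfo=timezone.utc)
print(should_alert(now - timedelta(minutes=42), now))
```

Since the pipeline runs every 5 minutes, lag should normally stay well under the threshold. Crossing it means several consecutive runs failed or stalled.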
2. ADRs (Architecture Decision Records):
# ADR: Migrate from Airflow to Prefect
## Status: Accepted
## Context
Airflow maintenance overhead high, observability poor
## Decision
Migrate to Prefect
## Consequences
- Pros: Better UI, cloud-native, less ops
- Cons: Team needs retraining, migration effort
- Alternatives considered: Dagster (too complex), keep Airflow (too painful)
3. Weekly Snippets:
Each engineer posts weekly update in Slack #data-snippets
Example:
Week of Aug 19:
- ✅ Completed: Churn analysis
- 🚧 In Progress: Email pipeline migration
- 📚 Learned: dbt snapshots for SCD Type 2
- 🎯 Next week: Dashboard for exec team
Communication Tools
1. Slack Channels
Structure:
#data-team (internal team chat)
#data-requests (stakeholders submit requests)
#data-office-hours (Q&A)
#data-incidents (production issues)
#data-wins (celebrate successes)
Best practices:
- ✅ Use threads (keep conversations organized)
- ✅ Tag relevant people (@alice for SQL questions)
- ✅ React with emojis (✅ = acknowledged, 🚀 = shipped)
2. Jira / Linear (Task Management)
Workflow:
Backlog → To Do → In Progress → Review → Done
Columns:
- Backlog: All requests
- To Do: Sprint committed
- In Progress: Currently working (limit: 2/person)
- Review: Code review / QA
- Done: Shipped
Labels:
Type: bug, feature, tech-debt, ad-hoc
Priority: P0 (urgent), P1 (high), P2 (medium), P3 (low)
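The workflow and the "In Progress" limit of 2 per person can be modelled in a few lines. This makes clear what the limit actually blocks: starting new work, not finishing existing work. This is a toy sketch. `Board` and the ticket keys are hypothetical, and real Jira or Linear boards enforce (or merely display) WIP limits through their own settings.

```python
COLUMNS = ["Backlog", "To Do", "In Progress", "Review", "Done"]
WIP_LIMIT = 2  # max "In Progress" tickets per person

class Board:
    """Hypothetical in-memory board: ticket key -> (column, assignee)."""

    def __init__(self) -> None:
        self.tickets: dict[str, tuple[str, str]] = {}

    def add(self, key: str, assignee: str) -> None:
        self.tickets[key] = ("Backlog", assignee)

    def in_progress_count(self, assignee: str) -> int:
        return sum(1 for col, who in self.tickets.values()
                   if col == "In Progress" and who == assignee)

    def advance(self, key: str) -> str:
        """Move a ticket one column right, refusing moves that break the WIP limit."""
        column, assignee = self.tickets[key]
        if column == COLUMNS[-1]:
            raise ValueError(f"{key} is already Done")
        nxt = COLUMNS[COLUMNS.index(column) + 1]
        if nxt == "In Progress" and self.in_progress_count(assignee) >= WIP_LIMIT:
            raise ValueError(f"{assignee} already has {WIP_LIMIT} tickets in progress")
        self.tickets[key] = (nxt, assignee)
        return nxt

board = Board()
board.add("DATA-1", "alice")  # hypothetical ticket key and assignee
print(board.advance("DATA-1"), board.advance("DATA-1"))
```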
3. Confluence / Notion (Documentation)
Structure:
Data Team Wiki
├── Onboarding
│ ├── New Hire Guide
│ └── Access Requests
├── Runbooks
│ ├── Pipeline X
│ └── Pipeline Y
├── Architecture
│ ├── Data Platform Overview
│ └── ADRs
├── Processes
│ ├── How to Submit Data Request
│ └── On-Call Rotation
└── Metrics Definitions
├── Revenue
└── Churn Rate
Meeting Hygiene
Best Practices
1. Always have agenda:
Meeting: Sprint Planning
Date: Aug 26, 2025
Agenda:
1. Review last sprint (10 min)
2. Backlog priorities from PM (20 min)
3. Estimation (40 min)
4. Sprint commitment (30 min)
5. Parking lot (20 min)
Total: 2 hours
2. Take notes:
- Designate note-taker (rotate)
- Action items: Who, What, When
- Share notes in Slack after meeting
3. Start/end on time:
- Respect people's calendars
- If need more time → Schedule follow-up
4. No laptop rule (for some meetings):
- Retros, brainstorming: Full attention
- Planning: OK to have laptop (need to estimate)
Remote Team Considerations
Challenges
Timezone differences:
Team:
- Engineer 1: HCMC (UTC+7)
- Engineer 2: Hanoi (UTC+7)
- Engineer 3: US West Coast (UTC-8) → 15 hours behind (14 hours during daylight saving time, UTC-7)
→ Finding a shared meeting time is hard
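To see why, compare working hours in UTC. This sketch assumes a 9:00 to 18:00 local working day counted in whole hours, and fixed UTC offsets, so daylight saving time has to be reflected by passing -7 instead of -8. `utc_hours` and `shared_hours` are hypothetical helpers.

```python
def utc_hours(utc_offset: int, start_local: int = 9, end_local: int = 18) -> set[int]:
    """UTC hours-of-day covered by a local working day [start_local, end_local)."""
    return {(h - utc_offset) % 24 for h in range(start_local, end_local)}

def shared_hours(offsets: list[int], start_local: int = 9, end_local: int = 18) -> list[int]:
    """UTC hours when every team member is inside their local working day."""
    common = set(range(24))
    for offset in offsets:
        common &= utc_hours(offset, start_local, end_local)
    return sorted(common)

print(shared_hours([7, 7, -8]))         # → []  (HCMC, Hanoi, US West Coast in winter)
print(shared_hours([7, -8], 8, 20))     # → [1, 2, 3]  (if both sides stretch to 8:00-20:00)
```

Even the stretched window (UTC 01:00 to 04:00) means 8 to 11 AM in Vietnam and 5 to 8 PM the previous day in California, so a Monday-morning sync is a Sunday-evening call for Engineer 3.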
Solution: Async-first culture
1. Async Standups:
Instead of daily 9 AM standup:
Post updates in Slack #data-standup by 10 AM local time
Template:
Yesterday: Finished X
Today: Working on Y
Blockers: Z
Engineer 3 (US) posts before sleeping, team reads next morning
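Because everyone posts the same template, blockers can be pulled out automatically so nobody has to scan the whole channel. This sketch assumes the posts have already been fetched as plain text keyed by author. Fetching them (for example with Slack's `conversations.history` Web API method) is left out, and `parse_standup`/`blockers` are hypothetical helpers.

```python
import re

# One template field per line: "Yesterday: ...", "Today: ...", "Blockers: ..."
FIELD = re.compile(r"^(Yesterday|Today|Blockers):[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE)
NO_BLOCKER = {"", "none", "no", "n/a", "-"}

def parse_standup(post: str) -> dict[str, str]:
    """Extract the three template fields from one standup post."""
    return {m.group(1).capitalize(): m.group(2).strip() for m in FIELD.finditer(post)}

def blockers(posts: dict[str, str]) -> dict[str, str]:
    """Map each author to their blocker, skipping 'None' and empty answers."""
    result = {}
    for author, post in posts.items():
        blocker = parse_standup(post).get("Blockers", "")
        if blocker.lower() not in NO_BLOCKER:
            result[author] = blocker
    return result

posts = {  # hypothetical posts following the template above
    "Engineer 1": "Yesterday: Finished Airflow DAG\nToday: Deploy to prod\n"
                  "Blockers: Need approval from DevOps for IAM role",
    "Engineer 3": "Yesterday: Investigated slow query\nToday: Partition pruning\nBlockers: None",
}
print(blockers(posts))
```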
2. Recorded Demos:
Sprint review:
- Record demo video (Loom)
- Post in Slack with written summary
- Stakeholders watch async, comment
Follow-up: 30-min sync Q&A (if needed)
3. Written RFCs:
Instead of architecture discussion meeting:
Write RFC (Request for Comments) doc
Team reviews, comments inline (Google Docs)
Discuss async in comments
Final decision: Async vote or short sync meeting
Zoom Fatigue
Problem: 5+ hours of video calls/day → Exhausting
Solutions:
- No-meeting Wednesdays: Deep work day
- 25-min meetings (not 30): 5-min break between
- Walking 1-on-1s: Voice call while walking (no video)
- Async default: Meeting only if truly necessary
Case Study: High-Performing Data Team
Background
Company: SaaS startup, 200 employees
Data team: 6 people (3 engineers, 2 analysts, 1 manager)
Before rituals (6 months ago):
- Chaos: No coordination
- Backlog: 50+ untracked requests
- Delivery: Unpredictable
- Morale: Low (team felt firefighting constantly)
Implemented Rituals (3 Months Ago)
Week 1: Setup Jira, migrate all requests to backlog
Week 2-4: Started core rituals
- Daily standup (9:30 AM, 15 min)
- Bi-weekly sprint planning (Monday, 2h)
- Bi-weekly retro (Friday, 1h)
Month 2: Added supporting rituals
- Weekly backlog grooming (Wednesday, 1h)
- Office hours (Friday 2-4 PM)
Month 3: Refined processes
- Created Definition of Done checklist
- Runbook template
- Async standup option for remote days
Results (After 3 Months)
Delivery:
- Sprint commitment: 90% completion rate (vs 50% before)
- Lead time: 5 days average (vs 15 days before)
- Stakeholder satisfaction: 4.2/5 (vs 2.8/5)
Team:
- Morale: Much improved (retros show positive sentiment)
- Coordination: No lingering blockers (blockers spotted & resolved in standups)
- Learning: 3 process improvements implemented from retros
Visibility:
- Stakeholders know what team is working on
- Clear backlog (50 requests → Prioritized into sprints)
Manager: "Rituals transformed our team from reactive firefighting to proactive delivery."
Common Pitfalls
1. Rituals Become Meetings
Symptom: Team dreads standups, retros feel like waste of time.
Cause: Lost focus, no action items, too long.
Fix:
- Timebox strictly (15 min standup, not 30)
- Parking lot for deep discussions (don't derail)
- Retro action items → Jira tickets (actually do them)
2. Cargo Cult Agile
Symptom: Following rituals mechanically without understanding why.
Example:
Team: "We do standups because Scrum book says so"
Manager: "Why?"
Team: "Uh... not sure"
→ Standup becomes status report to manager, no value
Fix: Understand purpose of each ritual, adapt to your needs.
3. No Flexibility
Symptom: Strict Scrum → Can't handle ad-hoc requests → Stakeholders frustrated.
Fix: Hybrid model (60% planned, 40% ad-hoc buffer).
4. Documentation Neglected
Symptom: Rituals running, but knowledge lost when people leave.
Fix: Make documentation part of Definition of Done.
Conclusion
Key Takeaways
- ✅ Core rituals: Daily standup (15 min), Sprint planning (2h bi-weekly), Sprint review (1h), Retro (1h), Backlog grooming (1h weekly)
- ✅ Supporting rituals: Office hours, Show & Tell, Documentation (runbooks, ADRs)
- ✅ Agile for data: Timeboxing, hypothesis-driven, Kanban+Scrum hybrid
- ✅ Tools: Slack, Jira/Linear, Confluence, async for remote
- ✅ Psychological safety: Blameless retros critical for honest feedback
- ✅ Outcome: Predictable delivery, team alignment, continuous improvement
Recommendations
For New Teams (0-3 months):
- Start simple: Daily standup + Bi-weekly retro
- Use Jira/Linear for backlog
- Document as you go (don't wait)
For Growing Teams (3-12 months):
- Add sprint planning, review
- Backlog grooming
- Office hours for stakeholders
For Mature Teams (12+ months):
- Optimize rituals (what works, what doesn't)
- Strong documentation culture
- Async rituals for remote teams
Universal: Rituals are means, not end. Adapt to your team's needs.
For Hiring Managers
Scaling your data team and need a standard process? Good rituals start with the right team structure. See Building a Data Team: Roles, Hiring, and Org Structure to organize your team effectively, or consider the Outsourcing vs In-house model if you are short on staff.
Next Steps
Want to set up a high-performing data team?
Carptech can help you with:
- ✅ Agile coaching for data teams
- ✅ Setup rituals & tools (Jira, runbooks, ADRs)
- ✅ Facilitation training (how to run great retros)
- ✅ Process optimization (find bottlenecks, improve)
📞 Contact Carptech: carptech.vn
Related Posts:
- Building a Data Team: Roles, Hiring, and Org Structure
- Data Team Career Ladders: From Junior to Principal
- Data-Driven Culture: From Intuition-Based to Data-Informed Decisions
Take the Next Step
- Take the Data Maturity Assessment: evaluate your data maturity across 6 dimensions
- Calculate your Data Platform ROI: estimate the costs and benefits of investing in a data platform
- Book a free consultation: 60 minutes with a Carptech expert




