TL;DR
- Agile for data: Adapt Scrum/Kanban for data work (exploratory, hard to estimate)
- Key rituals: Daily standup (15 min), Sprint planning (bi-weekly, 2h), Sprint review (1h), Retro (1h), Backlog grooming (weekly, 1h)
- Additional rituals: Office hours (weekly, 2h), Show & Tell (monthly, 1h), Documentation (runbooks, ADRs)
- Tools: Slack, Jira/Linear, Confluence, Figma (diagrams)
- Agile challenges for data: Estimating exploratory work, balancing planned vs ad-hoc
- Best practices: Timeboxing, hypothesis-driven analysis, blameless retros, async for remote teams
- Outcome: High-performing teams have clear communication, predictable delivery, continuous improvement
Introduction: Why Data Teams Need Rituals
A common scenario (a data team with no structure):
Monday morning:
- Engineer 1: Working on pipeline X (stakeholder doesn't know)
- Engineer 2: Blocked on data access (nobody knows)
- Analyst: Working on ad-hoc request from Friday (forgot original context)
Friday:
- Manager: "What did team accomplish this week?"
- Team: "Uh... stuff?"
- No visibility, no accountability, chaos
The problems:
- ❌ No coordination (duplicated work, blocking each other)
- ❌ No visibility (stakeholders don't know progress)
- ❌ No learning (repeat same mistakes)
How rituals address them:
- ✅ Daily standup → Coordination, unblock
- ✅ Sprint planning → Prioritize, commit
- ✅ Sprint review → Demo work, get feedback
- ✅ Retrospective → Continuous improvement
- ✅ Documentation → Knowledge sharing
High-performing data teams have strong rituals.
Agile for Data Teams: Adaptations
Challenge: Data Work is Different
Software engineering:
- Predictable: "Build login feature" → 2 weeks (can estimate)
- Binary: Feature works or doesn't
- Iterative: Ship v1, then v2, v3
Data work:
- Exploratory: "Why did revenue drop?" → ??? (unknown unknowns)
- Continuous: Data quality, pipeline maintenance (never "done")
- Ad-hoc heavy: 50% planned work, 50% urgent requests
Agile Adaptations for Data
1. Timeboxing Exploratory Work
Instead of:
Task: "Analyze customer churn"
Estimate: ??? (could be 1 day or 1 month)
Timebox:
Task: "Churn analysis (timeboxed to 2 days)"
Day 1: Explore data, identify patterns
Day 2: Document findings, recommend next steps
If more time is needed → Create a follow-up task
Benefit: Prevents analysis paralysis, forces prioritization.
2. Hypothesis-Driven Analysis
Instead of:
Task: "Analyze sales data"
→ Too broad, endless
Hypothesis:
Task: "Test hypothesis: Discount campaigns don't improve LTV"
Approach:
1. Cohort analysis: Discounted vs full-price customers
2. Measure 6-month LTV
3. Statistical test (t-test)
4. Recommend: Continue discounts or not
Estimated: 3 days
Benefit: Clear scope, measurable outcome.
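Step 3 of the approach above can be sketched with the Python standard library alone. This is a minimal illustration, not a full analysis: the two LTV lists are invented values, and the function computes Welch's t-statistic, one common variant of the t-test that does not assume equal variances. It returns only the statistic and the approximate degrees of freedom; turning that into a p-value would need a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variance (n - 1 denominator) divided by sample size.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
    return t_stat, dof


# Hypothetical 6-month LTV per customer; illustrative values only.
ltv_discounted = [110.0, 95.0, 130.0, 88.0, 102.0, 97.0]
ltv_full_price = [150.0, 142.0, 128.0, 160.0, 135.0, 149.0]

t_stat, dof = welch_t(ltv_discounted, ltv_full_price)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

A negative statistic here means the discounted cohort has the lower mean LTV. Whether the gap is large enough to act on depends on the significance threshold the team agrees on before running the test.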
3. Kanban + Scrum Hybrid
Pure Scrum: Fixed 2-week sprints, commit to backlog
- Problem for data: Ad-hoc requests disrupt sprint
Pure Kanban: Continuous flow, no sprints
- Problem: No forcing function to demo work
Hybrid (Best for data teams):
Sprint = 2 weeks
Capacity allocation:
- 60% planned work (from backlog)
- 40% ad-hoc capacity (buffer for urgent requests)
Still have sprint planning, review, retro
But flexible to handle ad-hoc without breaking sprint
Key Rituals
1. Daily Standup (15 Minutes)
Format: Team stands (or video call), quick updates.
3 Questions:
- What I did yesterday
- What I'll do today
- Any blockers
Example:
Engineer 1:
- Yesterday: Finished Airflow DAG for customer events
- Today: Deploy to prod, monitor
- Blockers: Need approval from DevOps for IAM role
Engineer 2:
- Yesterday: Investigated slow BigQuery query
- Today: Implement partition pruning, test
- Blockers: None
Analyst:
- Yesterday: Started churn analysis
- Today: Finish cohort segmentation
- Blockers: Need clarification from Product on definition of "active user"
Rules:
- ✅ Keep it short: 15 minutes MAX (for 5-person team = 3 min/person)
- ✅ Standups are for coordination, not problem-solving
- If deep discussion needed: "Let's take this offline" (after standup)
- ✅ Same time, every day (e.g., 9:30 AM)
- ✅ Everyone attends (unless OOO)
Common mistakes:
- ❌ Too long (30-45 min) → People zone out
- ❌ Manager turns it into status report → Should be peer-to-peer
- ❌ Problem-solving during standup → Wastes everyone's time
Remote team adaptation:
- Async standup: Post updates in Slack channel before 10 AM
- Synchronous huddle: 10 AM video call (optional, for blockers)
2. Sprint Planning (2 Hours, Bi-Weekly)
Goal: Prioritize work, commit to sprint goals.
Agenda:
Part 1: Review Backlog (30 min)
- Product Manager / Stakeholders present priorities
- Data team asks clarifying questions
Part 2: Estimation (45 min)
- Team estimates effort (t-shirt sizes or story points)
- T-shirt sizes: S (1 day), M (2-3 days), L (1 week), XL (2 weeks)
- Discuss complexity, unknowns
Example:
Task: "Build pipeline for new payment data"
Engineer 1: "This is M - source is new API, need to learn it, but transformation straightforward"
Engineer 2: "Agree, M"
Estimate: M (2-3 days)
Part 3: Commit to Sprint (45 min)
- Calculate team capacity:
Team: 5 people
Sprint: 2 weeks = 10 days/person = 50 person-days total
Minus:
- Holidays: 2 days
- Meetings: 5 days (10%)
- Ad-hoc buffer: 15 days (30%)
Available: 28 person-days
Can commit to: 28 days of S/M/L tasks
- Pull top-priority tasks from backlog until capacity full
- Sprint goal: "Migrate 20 critical pipelines to Prefect"
Outcome: Clear sprint backlog, team aligned.
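The capacity arithmetic and the "pull until full" step in Part 3 can be sketched as follows. This is an illustration under stated assumptions: the size-to-days mapping takes the upper end of each t-shirt range, the backlog entries are hypothetical, and the function stops at the first task that does not fit instead of skipping ahead to smaller ones. The meeting and ad-hoc percentages are parameters, so a team using a different buffer can change them.

```python
# Assumed effort per t-shirt size, in days (upper end of each range listed above).
SIZE_TO_DAYS = {"S": 1, "M": 3, "L": 5, "XL": 10}


def available_capacity(people, sprint_days, holiday_days, meeting_pct, adhoc_pct):
    """Person-days left for planned work after holidays, meetings and ad-hoc buffer."""
    total = people * sprint_days
    meetings = total * meeting_pct / 100
    adhoc = total * adhoc_pct / 100
    return total - holiday_days - meetings - adhoc


def fill_sprint(backlog, capacity_days):
    """Pull tasks in priority order; stop at the first task that does not fit."""
    committed, remaining = [], capacity_days
    for title, size in backlog:
        cost = SIZE_TO_DAYS[size]
        if cost > remaining:
            break
        committed.append(title)
        remaining -= cost
    return committed, remaining


capacity = available_capacity(
    people=5, sprint_days=10, holiday_days=2, meeting_pct=10, adhoc_pct=30
)

# Hypothetical backlog, already sorted by priority.
backlog = [
    ("Payment data pipeline", "M"),
    ("Migrate pipelines to Prefect", "XL"),
    ("Churn dashboard", "L"),
    ("Fix flaky dbt test", "S"),
    ("Warehouse cost audit", "XL"),
    ("Metric definitions page", "S"),
]

committed, remaining = fill_sprint(backlog, capacity)
print(f"Capacity: {capacity:g} person-days")
print(f"Committed: {committed}")
print(f"Unallocated: {remaining:g} person-days")
```

With the numbers from the example above, the capacity comes out at 28 person-days. The first four hypothetical tasks fit and 9 person-days stay unallocated, which the team could fill with smaller items during planning.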
3. Sprint Review / Demo (1 Hour)
Goal: Show completed work, get feedback from stakeholders.
Attendees: Data team + stakeholders (Product, Marketing, Execs)
Format: Live demos (not slides!)
Example:
Analyst presents:
"This sprint, I analyzed churn for our premium tier.
[Shares Looker dashboard]
Key findings:
1. Churn rate: 5% monthly (higher than standard tier at 3%)
2. Main reason: Price sensitivity (survey data)
3. Hypothesis: Discount for 3-month commitment → Reduce churn
Recommendation: Run A/B test
Questions?"
Stakeholders ask questions, provide feedback
PM: "Great, let's prioritize A/B test next sprint"
Benefits:
- ✅ Visibility (stakeholders see progress)
- ✅ Feedback (catch misunderstandings early)
- ✅ Celebration (motivates team)
Anti-pattern:
- ❌ No demo (just status report) → Boring
- ❌ Only slides (not actual work) → Not convincing
4. Retrospective (1 Hour)
Goal: Reflect on sprint, identify improvements.
Format: Blameless discussion (no finger-pointing).
3 Questions:
- What went well? (keep doing)
- What didn't go well? (stop doing)
- What can we improve? (action items)
Example:
What went well:
- ✅ Migration to Prefect smooth
- ✅ Good collaboration with ML team
- ✅ All critical pipelines stable
What didn't go well:
- ❌ 3 ad-hoc requests took 15 hours (30% of sprint)
- ❌ BigQuery costs spiked (didn't notice until bill)
- ❌ Documentation lacking for new pipelines
Action items:
1. Create ad-hoc request triage process (only accept if urgent + high impact)
2. Set up BigQuery budget alert ($500/day)
3. Mandate documentation checklist for all new pipelines
- Owner: Engineer 1
- Due: Next sprint
Retro formats (rotate to keep fresh):
- Start/Stop/Continue
- Glad/Sad/Mad
- 4Ls: Liked, Learned, Lacked, Longed for
- Sailboat: Wind (helping), Anchor (blocking), Rocks (risks)
Psychological safety: Critical for honest retros.
- No blame ("Pipeline failed" vs "You broke pipeline")
- Manager participates as peer (not judge)
- Rotate facilitator
5. Backlog Grooming (1 Hour, Weekly)
Goal: Refine user stories, add details, prioritize.
Activities:
1. Refine vague requests:
Before:
"Need marketing data"
After grooming:
Title: "Build dashboard for email campaign performance"
Description:
- Metrics: Open rate, click rate, conversions, revenue
- Granularity: Daily, by campaign
- Tool: Looker
- Stakeholder: Marketing Manager
Acceptance criteria:
- [ ] Dashboard live in Looker
- [ ] Marketing team trained
- [ ] Documentation written
Estimate: M (3 days)
2. Break down large tasks:
Epic: "Migrate to Snowflake"
↓ Break into stories:
- Set up Snowflake account
- Migrate 10 critical tables
- Migrate dbt models
- Migrate BI dashboards
- Cutover & validation
- Decommission old warehouse
3. Prioritize:
- Use framework: Impact vs Effort matrix
High Impact, Low Effort → Do now
High Impact, High Effort → Plan carefully
Low Impact, Low Effort → Nice to have
Low Impact, High Effort → Don't do
Outcome: Backlog ready for next sprint planning.
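The Impact vs Effort matrix above is a plain lookup, sketched below in case the team wants to tag backlog items automatically. The request titles and their impact/effort ratings are hypothetical; in practice the ratings come out of the grooming discussion, not from code.

```python
# Quadrant actions, taken from the Impact vs Effort matrix above.
QUADRANTS = {
    ("high", "low"): "Do now",
    ("high", "high"): "Plan carefully",
    ("low", "low"): "Nice to have",
    ("low", "high"): "Don't do",
}


def prioritize(impact, effort):
    """Map an (impact, effort) rating of 'high' or 'low' to its quadrant action."""
    return QUADRANTS[(impact, effort)]


# Hypothetical backlog items with ratings agreed during grooming.
requests = [
    ("Email campaign dashboard", "high", "low"),
    ("Migrate to Snowflake", "high", "high"),
    ("Rename legacy columns", "low", "low"),
    ("Rebuild archive from raw logs", "low", "high"),
]

for title, impact, effort in requests:
    print(f"{title}: {prioritize(impact, effort)}")
```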
Additional Rituals
6. Office Hours (2 Hours, Weekly)
Goal: Open time for business users to ask data questions.
Format:
Every Friday 2-4 PM
Zoom room open
Anyone can join, ask questions
Topics:
- SQL help
- Dashboard debugging
- Metric definitions
- Data access requests
Benefits:
- ✅ Reduces ad-hoc Slack interruptions (batch questions)
- ✅ Just-in-time learning
- ✅ Builds relationship with stakeholders
Example questions:
- "How do I calculate churn rate?"
- "Why is my dashboard blank?"
- "Can I get access to customer data?"
7. Show & Tell (1 Hour, Monthly)
Goal: Team members share learnings.
Format: Casual presentation (15-20 min talk + Q&A)
Topics:
- New tool tried: "I tested Great Expectations for data quality"
- Technique learned: "Incremental models in dbt"
- Analysis insights: "How I identified $50K revenue leak"
- Conference recap: "Key takeaways from DataEngConf"
Benefits:
- ✅ Knowledge sharing
- ✅ Presentation practice
- ✅ Cross-pollination (analysts learn from engineers, vice versa)
8. Documentation Rituals
Problem: Documentation always outdated or nonexistent.
Solution: Make documentation mandatory part of "Done".
Definition of Done (checklist):
Task: "Build new pipeline"
Done when:
- [ ] Code written & tested
- [ ] Deployed to prod
- [ ] Monitoring setup (alerts)
- [ ] Runbook created (how to troubleshoot)
- [ ] dbt docs updated
- [ ] Team notified (Slack #data-team)
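A checklist like this is easy to check mechanically, for example before a ticket moves to Done. The sketch below is one possible shape, assuming the completed items are available as a list of strings; how they are read from the tracker is outside its scope.

```python
# Checklist items from the Definition of Done above.
DEFINITION_OF_DONE = [
    "Code written & tested",
    "Deployed to prod",
    "Monitoring setup (alerts)",
    "Runbook created",
    "dbt docs updated",
    "Team notified",
]


def missing_items(completed):
    """Return the checklist items not yet completed, in checklist order."""
    done = set(completed)
    return [item for item in DEFINITION_OF_DONE if item not in done]


def is_done(completed):
    """A task is done only when every checklist item is completed."""
    return not missing_items(completed)


# Hypothetical task state: code shipped, documentation still pending.
completed = ["Code written & tested", "Deployed to prod", "Monitoring setup (alerts)"]
print("Done:", is_done(completed))
print("Missing:", missing_items(completed))
```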
Documentation types:
1. Runbooks:
# Runbook: Customer Events Pipeline
## Overview
Ingests customer events from Kafka → Snowflake
## Schedule
Runs every 5 minutes
## Monitoring
- Datadog: "customer_events_pipeline" dashboard
- Alert: If lag > 30 min
## Troubleshooting
### Pipeline failing
1. Check Kafka lag: ...
2. Check Snowflake connection: ...
3. Escalate to: @engineer-on-call
### Data looks wrong
1. Check source data quality: ...
2. Verify transformations: ...
2. ADRs (Architecture Decision Records):
# ADR: Migrate from Airflow to Prefect
## Status: Accepted
## Context
Airflow maintenance overhead high, observability poor
## Decision
Migrate to Prefect
## Consequences
- Pros: Better UI, cloud-native, less ops
- Cons: Team needs retraining, migration effort
- Alternatives considered: Dagster (too complex), keep Airflow (too painful)
3. Weekly Snippets:
Each engineer posts weekly update in Slack #data-snippets
Example:
Week of Aug 19:
- ✅ Completed: Churn analysis
- 🚧 In Progress: Email pipeline migration
- 📚 Learned: dbt snapshots for SCD Type 2
- 🎯 Next week: Dashboard for exec team
Communication Tools
1. Slack Channels
Structure:
#data-team (internal team chat)
#data-requests (stakeholders submit requests)
#data-office-hours (Q&A)
#data-incidents (production issues)
#data-wins (celebrate successes)
Best practices:
- ✅ Use threads (keep conversations organized)
- ✅ Tag relevant people (@alice for SQL questions)
- ✅ React with emojis (✅ = acknowledged, 🚀 = shipped)
2. Jira / Linear (Task Management)
Workflow:
Backlog → To Do → In Progress → Review → Done
Columns:
- Backlog: All requests
- To Do: Sprint committed
- In Progress: Currently working (limit: 2/person)
- Review: Code review / QA
- Done: Shipped
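The "limit: 2/person" rule for In Progress can be checked with a few lines of code. The sketch below works on hypothetical ticket dictionaries with `assignee` and `status` keys; it does not call the Jira or Linear API, and the field names are assumptions for illustration.

```python
from collections import Counter

WIP_LIMIT = 2  # In Progress limit per person, from the column description above


def over_wip_limit(tickets, limit=WIP_LIMIT):
    """Return {assignee: count} for people with more in-progress tickets than the limit."""
    counts = Counter(t["assignee"] for t in tickets if t["status"] == "In Progress")
    return {person: n for person, n in counts.items() if n > limit}


# Hypothetical board snapshot.
tickets = [
    {"assignee": "Engineer 1", "status": "In Progress"},
    {"assignee": "Engineer 1", "status": "In Progress"},
    {"assignee": "Engineer 1", "status": "In Progress"},
    {"assignee": "Engineer 2", "status": "In Progress"},
    {"assignee": "Engineer 2", "status": "Review"},
    {"assignee": "Analyst", "status": "To Do"},
]

print(over_wip_limit(tickets))
```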
Labels:
- Type: bug, feature, tech-debt, ad-hoc
- Priority: P0 (urgent), P1 (high), P2 (medium), P3 (low)
3. Confluence / Notion (Documentation)
Structure:
Data Team Wiki
├── Onboarding
│ ├── New Hire Guide
│ └── Access Requests
├── Runbooks
│ ├── Pipeline X
│ └── Pipeline Y
├── Architecture
│ ├── Data Platform Overview
│ └── ADRs
├── Processes
│ ├── How to Submit Data Request
│ └── On-Call Rotation
└── Metrics Definitions
├── Revenue
└── Churn Rate
Meeting Hygiene
Best Practices
1. Always have agenda:
Meeting: Sprint Planning
Date: Aug 26, 2025
Agenda:
1. Review last sprint (10 min)
2. Backlog priorities from PM (20 min)
3. Estimation (40 min)
4. Sprint commitment (30 min)
5. Parking lot (20 min)
Total: 2 hours
2. Take notes:
- Designate note-taker (rotate)
- Action items: Who, What, When
- Share notes in Slack after meeting
3. Start/end on time:
- Respect people's calendars
- If more time is needed → Schedule a follow-up
4. No laptop rule (for some meetings):
- Retros, brainstorming: Full attention
- Planning: OK to have laptop (need to estimate)
Remote Team Considerations
Challenges
Timezone differences:
Team:
- Engineer 1: HCMC (UTC+7)
- Engineer 2: Hanoi (UTC+7)
- Engineer 3: US West Coast (UTC-8) → 15 hours behind
→ Finding a meeting time is hard
Solution: Async-first culture
1. Async Standups:
Instead of daily 9 AM standup:
Post updates in Slack #data-standup by 10 AM local time
Template:
Yesterday: Finished X
Today: Working on Y
Blockers: Z
Engineer 3 (US) posts before sleeping, team reads next morning
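To see when an async update actually reaches the rest of the team, the sketch below formats the template and converts the posting time between the two offsets listed above. It is a simplified illustration: it uses fixed UTC offsets and ignores daylight saving time, the posting time (6 PM at the end of the US working day) and the update text are made-up examples, and nothing is sent to Slack.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets as listed in the team example; daylight saving time is ignored.
HCMC = timezone(timedelta(hours=7), "UTC+7")
US_WEST = timezone(timedelta(hours=-8), "UTC-8")


def format_standup(name, yesterday, today, blockers="None"):
    """Render an update using the Yesterday/Today/Blockers template."""
    return f"{name}\nYesterday: {yesterday}\nToday: {today}\nBlockers: {blockers}"


# Hypothetical update posted at 6 PM US West Coast time.
posted = datetime(2025, 1, 13, 18, 0, tzinfo=US_WEST)
seen_in_hcmc = posted.astimezone(HCMC)

print(format_standup("Engineer 3", "Finished X", "Working on Y", "Z"))
print("Posted (US West):", posted.strftime("%Y-%m-%d %H:%M"))
print("Read in HCMC:    ", seen_in_hcmc.strftime("%Y-%m-%d %H:%M"))
```

An update posted at 18:00 on the US West Coast lands at 09:00 the next day in HCMC, so the Vietnam-based engineers see it at the start of their morning.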
2. Recorded Demos:
Sprint review:
- Record demo video (Loom)
- Post in Slack with written summary
- Stakeholders watch async, comment
Follow-up: 30-min sync Q&A (if needed)
3. Written RFCs:
Instead of architecture discussion meeting:
Write RFC (Request for Comments) doc
Team reviews, comments inline (Google Docs)
Discuss async in comments
Final decision: Async vote or short sync meeting
Zoom Fatigue
Problem: 5+ hours of video calls/day → Exhausting
Solutions:
- No-meeting Wednesdays: Deep work day
- 25-min meetings (not 30): 5-min break between
- Walking 1-on-1s: Voice call while walking (no video)
- Async default: Meeting only if truly necessary
Case Study: High-Performing Data Team
Background
Company: SaaS startup, 200 employees
Data team: 6 people (3 engineers, 2 analysts, 1 manager)
Before rituals (6 months ago):
- Chaos: No coordination
- Backlog: 50+ untracked requests
- Delivery: Unpredictable
- Morale: Low (team felt firefighting constantly)
Implemented Rituals (3 Months Ago)
Week 1: Set up Jira, migrate all requests to backlog
Week 2-4: Started core rituals
- Daily standup (9:30 AM, 15 min)
- Bi-weekly sprint planning (Monday, 2h)
- Bi-weekly retro (Friday, 1h)
Month 2: Added supporting rituals
- Weekly backlog grooming (Wednesday, 1h)
- Office hours (Friday 2-4 PM)
Month 3: Refined processes
- Created Definition of Done checklist
- Runbook template
- Async standup option for remote days
Results (After 3 Months)
Delivery:
- Sprint commitment: 90% completion rate (vs 50% before)
- Lead time: 5 days average (vs 15 days before)
- Stakeholder satisfaction: 4.2/5 (vs 2.8/5)
Team:
- Morale: Much improved (retros show positive sentiment)
- Coordination: Zero blocking issues (spotted & resolved in standups)
- Learning: 3 process improvements implemented from retros
Visibility:
- Stakeholders know what team is working on
- Clear backlog (50 requests → Prioritized into sprints)
Manager: "Rituals transformed our team from reactive firefighting to proactive delivery."
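The two delivery metrics quoted in the results, sprint completion rate and average lead time, can be computed from basic ticket data. The sketch below shows one way to define them. The ticket counts and dates are hypothetical inputs chosen for illustration, not the case study's actual data.

```python
from datetime import date
from statistics import mean


def completion_rate(committed, completed):
    """Percentage of committed sprint tasks that were completed."""
    return 100 * completed / committed


def average_lead_time(tickets):
    """Mean number of days between a request being opened and delivered."""
    return mean((delivered - opened).days for opened, delivered in tickets)


# Hypothetical sprint: 20 tasks committed, 18 completed.
rate = completion_rate(committed=20, completed=18)

# Hypothetical (opened, delivered) dates for three requests.
tickets = [
    (date(2025, 8, 4), date(2025, 8, 7)),
    (date(2025, 8, 5), date(2025, 8, 10)),
    (date(2025, 8, 6), date(2025, 8, 13)),
]
lead_time = average_lead_time(tickets)

print(f"Completion rate: {rate:.0f}%")
print(f"Average lead time: {lead_time} days")
```

Tracking the same two numbers every sprint makes the before/after comparison above repeatable and keeps it from becoming anecdotal.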
Common Pitfalls
1. Rituals Become Meetings
Symptom: Team dreads standups, retros feel like waste of time.
Cause: Lost focus, no action items, too long.
Fix:
- Timebox strictly (15 min standup, not 30)
- Parking lot for deep discussions (don't derail)
- Retro action items → Jira tickets (actually do them)
2. Cargo Cult Agile
Symptom: Following rituals mechanically without understanding why.
Example:
Team: "We do standups because Scrum book says so"
Manager: "Why?"
Team: "Uh... not sure"
→ Standup becomes status report to manager, no value
Fix: Understand purpose of each ritual, adapt to your needs.
3. No Flexibility
Symptom: Strict Scrum → Can't handle ad-hoc requests → Stakeholders frustrated.
Fix: Hybrid model (60% planned, 40% ad-hoc buffer).
4. Documentation Neglected
Symptom: Rituals running, but knowledge lost when people leave.
Fix: Make documentation part of Definition of Done.
Conclusion
Key Takeaways
- ✅ Core rituals: Daily standup (15 min), Sprint planning (2h bi-weekly), Sprint review (1h), Retro (1h), Backlog grooming (1h weekly)
- ✅ Supporting rituals: Office hours, Show & Tell, Documentation (runbooks, ADRs)
- ✅ Agile for data: Timeboxing, hypothesis-driven, Kanban+Scrum hybrid
- ✅ Tools: Slack, Jira/Linear, Confluence, async for remote
- ✅ Psychological safety: Blameless retros critical for honest feedback
- ✅ Outcome: Predictable delivery, team alignment, continuous improvement
Recommendations
For New Teams (0-3 months):
- Start simple: Daily standup + Bi-weekly retro
- Use Jira/Linear for backlog
- Document as you go (don't wait)
For Growing Teams (3-12 months):
- Add sprint planning, review
- Backlog grooming
- Office hours for stakeholders
For Mature Teams (12+ months):
- Optimize rituals (what works, what doesn't)
- Strong documentation culture
- Async rituals for remote teams
Universal: Rituals are a means, not an end. Adapt them to your team's needs.
Next Steps
Want to set up a high-performing Data Team?
Carptech can help you with:
- ✅ Agile coaching for data teams
- ✅ Setup rituals & tools (Jira, runbooks, ADRs)
- ✅ Facilitation training (how to run great retros)
- ✅ Process optimization (find bottlenecks, improve)
📞 Contact Carptech: carptech.vn