DocsHub
Schema Design

Schema Patterns

Learn practical MongoDB schema design patterns — bucket, subset, and computed — with real school system examples.

Schema Patterns

Good schema design is not just about embedding vs referencing. As your application grows, you run into specific problems — documents getting too large, queries getting too slow, repeated expensive calculations. Schema patterns are proven solutions to these problems.

We will cover three patterns that come up most often in real applications:

  • Bucket pattern — group time-based data into chunks instead of one document per event
  • Subset pattern — store only the most used portion of data in the main document
  • Computed pattern — pre-calculate expensive values and store the result

Bucket Pattern

The Problem

In our school system, we track attendance every day for every student. The naive approach is one document per attendance record:

// One document per day per student — naive approach
{ studentId: ObjectId("..."), date: new Date("2024-09-01"), present: true }
{ studentId: ObjectId("..."), date: new Date("2024-09-02"), present: false }
{ studentId: ObjectId("..."), date: new Date("2024-09-03"), present: true }
// ... 200 more documents per student per year
// With 500 students — 100,000+ documents per year

This creates a massive collection with millions of tiny documents. Querying a student's monthly attendance requires scanning hundreds of documents. The index on studentId has millions of entries.

The Solution — Bucket by Month

Instead of one document per day, group attendance records into buckets — one document per student per month:

// One document per student per month — bucket pattern
{
  studentId: ObjectId("64a1f2c3e4b0a1b2c3d4e5f6"),
  month: "2024-09",
  year: 2024,
  totalDays: 22,
  presentDays: 19,
  absences: 3,
  records: [
    { date: new Date("2024-09-01"), present: true },
    { date: new Date("2024-09-02"), present: false },
    { date: new Date("2024-09-03"), present: true },
    { date: new Date("2024-09-04"), present: true },
    // ... rest of the month
  ]
}

Now one query gets the entire month's attendance — one document, not 22. The collection has 12 documents per student per year instead of 220.

Inserting into a Bucket

When a new attendance entry comes in, update the existing bucket for that month:

const today = new Date();
const monthKey = `${today.getFullYear()}-${String(today.getMonth() + 1).padStart(2, '0')}`;

db.attendance.updateOne(
  {
    studentId: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f6"),
    month: monthKey
  },
  {
    $push: {
      records: { date: today, present: true }
    },
    $inc: {
      totalDays: 1,
      presentDays: 1
    },
    $setOnInsert: {
      year: today.getFullYear(),
      absences: 0
    }
  },
  { upsert: true } // create the bucket if it doesn't exist yet
)

upsert: true creates the bucket document on the first day of the month. After that, each new attendance record pushes into the existing bucket.

Querying a Bucket

// Get Ali's attendance for September 2024
db.attendance.findOne({
  studentId: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f6"),
  month: "2024-09"
})

// Get all months where Ali had more than 3 absences
db.attendance.find({
  studentId: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f6"),
  absences: { $gt: 3 }
})

// Attendance summary for the whole year
db.attendance.aggregate([
  {
    $match: {
      studentId: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f6"),
      year: 2024
    }
  },
  {
    $group: {
      _id: "$studentId",
      totalDays: { $sum: "$totalDays" },
      totalPresent: { $sum: "$presentDays" },
      totalAbsences: { $sum: "$absences" }
    }
  }
])

When to Use Bucket Pattern

Use the bucket pattern when:

  • You have time-series data — events, readings, logs that happen regularly
  • The data grows continuously and you query it in time ranges
  • Individual records are tiny but there are millions of them

Subset Pattern

The Problem

Our student document is getting large. It has everything — address, guardian, all grades from all semesters, all subjects, notes, activity history. But when the school app shows the student list page, it only needs name, grade, and enrolled status. Loading full documents for a list of 100 students is wasteful.

// Full student document — 5KB+
{
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  enrolled: true,
  address: { ... },
  guardian: { ... },
  grades: [ /* 50+ grade entries across all semesters */ ],
  subjects: [ ... ],
  activityHistory: [ /* long list */ ],
  notes: "long text...",
  // ... many more fields
}

The Solution — Split into Main and Detail

Store the most-used fields in the main document and move the rarely-used or large data into a separate detail document:

// Main student document — lean, fast
db.students.insertOne({
  _id: ObjectId("student1"),
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  enrolled: true,
  enrollmentDate: new Date("2024-09-01"),
  address: { city: "Lahore", country: "Pakistan" },

  // Only current semester grades — not all history
  currentGrades: [
    { subject: "Math",    score: 88 },
    { subject: "Physics", score: 92 }
  ],

  // Reference to full detail document
  detailId: ObjectId("detail1")
})

// Student detail document — full history, loaded only when needed
db.studentDetails.insertOne({
  _id: ObjectId("detail1"),
  studentId: ObjectId("student1"),

  // Full grade history across all semesters
  gradeHistory: [
    { subject: "Math",    score: 88, semester: 1, year: 2024 },
    { subject: "Math",    score: 82, semester: 2, year: 2023 },
    // ... many more
  ],

  // Guardian full details
  guardian: {
    name: "Mr. Hassan Senior",
    phone: "0300-1234567",
    email: "hassan@gmail.com",
    cnic: "35201-1234567-1",
    relation: "Father"
  },

  // Activity history
  activityHistory: [
    { activity: "Science Fair", year: 2023, result: "1st Place" },
    { activity: "Debate Club",  year: 2024, result: "Participant" }
  ],

  // Medical info
  medicalInfo: {
    bloodGroup: "O+",
    allergies: ["Peanuts"],
    emergencyContact: "0300-9876543"
  }
})

Querying with Subset Pattern

// Student list page — fast, small documents
db.students.find(
  { enrolled: true },
  { name: 1, grade: 1, enrolled: 1, "address.city": 1 }
)

// Student profile page — load full details when needed
const student = await db.students.findOne({ name: "Ali Hassan" })
const details = await db.studentDetails.findOne({ studentId: student._id })

The list page is fast — small documents, only the fields needed. The profile page loads full details only when the user actually views that student.

When to Use Subset Pattern

Use the subset pattern when:

  • Documents have a mix of frequently accessed and rarely accessed data
  • The rarely accessed data is large — long arrays, big text fields
  • You have a list view that shows summary data and a detail view that shows everything

The subset pattern is common in apps with list and detail views — product listings, student lists, employee directories. The list shows the subset. The detail page loads the full document.


Computed Pattern

The Problem

Every time a teacher opens the grade report for a class, the app runs an aggregation to calculate the average score, pass rate, and top student for each grade. This aggregation runs across thousands of student documents. It is slow, and it runs dozens of times per day.

// This expensive aggregation runs every time the report is opened
db.students.aggregate([
  { $match: { enrolled: true } },
  {
    $group: {
      _id: "$grade",
      averageScore: { $avg: "$examScore" },
      highestScore: { $max: "$examScore" },
      lowestScore:  { $min: "$examScore" },
      totalStudents: { $sum: 1 },
      passCount: {
        $sum: { $cond: [{ $gte: ["$examScore", 50] }, 1, 0] }
      }
    }
  }
])

If the underlying scores do not change between report views, you are doing the same expensive calculation over and over.

The Solution — Pre-calculate and Store the Result

Calculate the result once and store it in a dedicated collection. Update it only when the underlying data changes:

// gradeStats collection — pre-computed results
db.gradeStats.insertMany([
  {
    grade: "10th",
    lastUpdated: new Date(),
    totalStudents: 45,
    averageScore: 81.2,
    highestScore: 98,
    lowestScore: 42,
    passCount: 41,
    passRate: 91.1,
    topStudent: "Ali Hassan"
  },
  {
    grade: "11th",
    lastUpdated: new Date(),
    totalStudents: 38,
    averageScore: 84.5,
    highestScore: 97,
    lowestScore: 51,
    passCount: 37,
    passRate: 97.4,
    topStudent: "Ayesha Khan"
  }
])

Reading the Computed Result

// Instant — no aggregation needed
db.gradeStats.find().sort({ averageScore: -1 })

One simple find query instead of a heavy aggregation. Milliseconds instead of seconds.

Updating the Computed Values

Update the stats when exam scores change — after each exam is graded:

async function updateGradeStats(grade) {
  // Run the aggregation once
  const result = await db.students.aggregate([
    { $match: { grade: grade, enrolled: true } },
    {
      $group: {
        _id: "$grade",
        totalStudents: { $sum: 1 },
        averageScore:  { $avg: "$examScore" },
        highestScore:  { $max: "$examScore" },
        lowestScore:   { $min: "$examScore" },
        passCount: {
          $sum: { $cond: [{ $gte: ["$examScore", 50] }, 1, 0] }
        }
      }
    }
  ]).toArray();

  const stats = result[0];

  // Store the result
  await db.gradeStats.updateOne(
    { grade: grade },
    {
      $set: {
        totalStudents: stats.totalStudents,
        averageScore:  Math.round(stats.averageScore * 10) / 10,
        highestScore:  stats.highestScore,
        lowestScore:   stats.lowestScore,
        passCount:     stats.passCount,
        passRate:      Math.round((stats.passCount / stats.totalStudents) * 1000) / 10,
        lastUpdated:   new Date()
      }
    },
    { upsert: true }
  );
}

// Call this after each batch of exam scores is entered
await updateGradeStats("10th");
await updateGradeStats("11th");

When to Use Computed Pattern

Use the computed pattern when:

  • The same expensive calculation runs frequently
  • The underlying data does not change constantly — you can afford to update the computed value periodically
  • Query speed matters more than having perfectly real-time data

The computed pattern is a deliberate trade-off — you accept slightly stale data in exchange for much faster reads. For a grade report, data that is a few minutes old is perfectly fine. For a live stock price, it is not. Know your tolerance for staleness before applying this pattern.


Combining Patterns

Patterns are not mutually exclusive. Our school system uses all three together:

Bucket pattern    → attendance records grouped by month
Subset pattern    → student list (lean) + student detail (full)
Computed pattern  → pre-calculated grade stats per grade

Each pattern solves a different problem. As your app grows, you will find yourself reaching for different patterns for different collections.


Quick Reference

PatternProblem it solvesHow it works
BucketToo many tiny documentsGroup time-based records into chunks
SubsetDocuments are too large for common queriesSplit into main (lean) and detail documents
ComputedSame expensive calculation runs repeatedlyPre-calculate and store the result, update when data changes

Do not apply patterns prematurely. Start with a simple schema. When you notice a real performance problem — list pages loading slowly, the same aggregation running constantly, a collection with millions of tiny documents — then reach for the right pattern. Premature optimization creates complexity without solving a real problem.

On this page