Document Model

Learn when to embed data inside a document and when to reference another collection — the most important decision in MongoDB schema design.

The most important decision you make when designing a MongoDB schema is this — should this data live inside the document, or in a separate collection?

Get this right and your app is fast, simple, and easy to maintain. Get it wrong and you end up with slow queries, complicated code, and painful rewrites later.

There are two approaches:

  • Embedding — store related data inside the same document
  • Referencing — store related data in a separate collection and link them by ID

Embedding

Embedding means putting related data directly inside a document as a nested object or array.

[Figure: Embedded, one document. A single document in the students collection holds name, grade, a nested address (city, country), and a subjects array.]

Everything about Ali — his address, his subjects, his guardian — lives in one document. One query gets everything.

Example — Student with embedded address and guardian

db.students.insertOne({
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  address: {
    street: "12 Garden Road",
    city: "Lahore",
    country: "Pakistan"
  },
  guardian: {
    name: "Mr. Hassan Senior",
    phone: "0300-1234567",
    relation: "Father"
  },
  subjects: ["Math", "Physics", "English"]
})

To get Ali's city:

db.students.findOne({ name: "Ali Hassan" }, { "address.city": 1 })

One query. No joins. Fast.


Referencing

Referencing means storing related data in a separate collection and linking the two with an ID — similar to a foreign key in SQL.

[Figure: Referenced, two collections. A document in the students collection holds name, grade, and a courseIds array. Each ID references a document in the courses collection: Mathematics (MATH-10) and Physics (PHY-10).]

The student document stores course IDs. The full course data lives in the courses collection. To get a student's course details, you need two queries or a $lookup.

Example — Student referencing courses

// Student document
db.students.insertOne({
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  courseIds: [
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2")
  ]
})

// Course documents — in a separate collection
db.courses.insertMany([
  {
    _id: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    title: "Mathematics",
    code: "MATH-10",
    grade: "10th"
  },
  {
    _id: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2"),
    title: "Physics",
    code: "PHY-10",
    grade: "10th"
  }
])

To get Ali's course details:

db.students.aggregate([
  { $match: { name: "Ali Hassan" } },
  {
    $lookup: {
      from: "courses",
      localField: "courseIds",
      foreignField: "_id",
      as: "courses"
    }
  }
])

Two collections, one $lookup. More work — but sometimes the right choice.
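
The two-query alternative to $lookup can be sketched in plain JavaScript. This is a minimal illustration rather than MongoDB API: attachCourses is a hypothetical helper, the IDs are placeholder strings instead of real ObjectIds, and the two literals stand in for the results of the queries shown in the comments.

```javascript
// Stand-ins for query results (placeholder string IDs, not real ObjectIds).
// In mongosh the inputs would come from something like:
//   const student = db.students.findOne({ name: "Ali Hassan" })
//   const courses = db.courses.find({ _id: { $in: student.courseIds } }).toArray()
const student = {
  name: "Ali Hassan",
  grade: "10th",
  courseIds: ["course1", "course2"]
};

const courses = [
  { _id: "course1", title: "Mathematics", code: "MATH-10" },
  { _id: "course2", title: "Physics", code: "PHY-10" }
];

// Hypothetical helper: resolve each referenced ID to its course document.
function attachCourses(studentDoc, courseDocs) {
  const byId = new Map(courseDocs.map((course) => [course._id, course]));
  return {
    ...studentDoc,
    // Skip IDs with no matching course instead of inserting undefined.
    courses: studentDoc.courseIds
      .map((id) => byId.get(id))
      .filter((course) => course !== undefined)
  };
}

const studentWithCourses = attachCourses(student, courses);
console.log(studentWithCourses.courses.map((course) => course.title));
```

The result has the same shape the $lookup above produces: the student fields plus a courses array. With real ObjectId values the map keys would need to be converted to strings first, because two ObjectId instances holding the same value are still different objects to a Map.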


Embedding vs Referencing — The Core Tradeoffs

Embedding ✅

  • One query gets everything
  • No joins needed
  • Fast reads
  • Simple code

Embedding ❌

  • Document can get very large
  • Duplicate data if shared
  • Hard to update shared data

Referencing ✅

  • Data stays small and clean
  • Shared data in one place
  • Easy to update

Referencing ❌

  • Needs $lookup to join
  • Multiple queries
  • Slower reads

When to Embed

Embed when the data belongs to only one document.

A student's address belongs to that student only. No other document needs it. Embed it.

// Good embed — address belongs to this student only
{
  name: "Ali Hassan",
  address: { city: "Lahore", country: "Pakistan" }
}

Embed when you always read the data together.

You always show a student's guardian info on their profile. You never fetch guardian info separately. Embed it.

// Good embed — always shown with student
{
  name: "Ali Hassan",
  guardian: { name: "Mr. Hassan", phone: "0300-1234567" }
}

Embed when the data is small and does not grow unboundedly.

A student takes maybe 5 to 8 subjects. That is a small, bounded array. Embed it.

// Good embed — small, bounded array
{
  name: "Ali Hassan",
  subjects: ["Math", "Physics", "English"]
}

Embed when the data does not need to be updated independently.

A student's address rarely changes. When it does change, you update just that one student. Embedding is fine.


When to Reference

Reference when data is shared between multiple documents.

A course is taken by 30 students. If you embed course details inside every student, you have 30 copies of the same data. When the course name changes, you have to update 30 documents. Reference instead, so the course lives in one place.

// Bad — duplicating course data in every student
{ name: "Ali Hassan", course: { title: "Mathematics", code: "MATH-10", credits: 4 } }
{ name: "Sara Ahmed", course: { title: "Mathematics", code: "MATH-10", credits: 4 } }

// Good — reference the course by ID
{ name: "Ali Hassan", courseIds: [ObjectId("...")] }
{ name: "Sara Ahmed", courseIds: [ObjectId("...")] }
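
The update cost described above can be made concrete with a small in-memory sketch. It uses plain JavaScript arrays in place of collections, so nothing here is MongoDB API. The helper names, the placeholder ID "course1", and the new title "Advanced Mathematics" are assumptions made up for the illustration.

```javascript
// In-memory stand-ins for the two designs (illustration only, no database).
const mathCourse = { title: "Mathematics", code: "MATH-10", credits: 4 };

// Embedded design: every student carries its own copy of the course.
const embeddedStudents = ["Ali Hassan", "Sara Ahmed"].map((name) => ({
  name,
  course: { ...mathCourse }
}));

// Referenced design: one course document, students store only its ID.
const courseCollection = [{ _id: "course1", ...mathCourse }]; // placeholder ID
const referencingStudents = ["Ali Hassan", "Sara Ahmed"].map((name) => ({
  name,
  courseIds: ["course1"]
}));

// Hypothetical helper: rename a course in every embedded copy.
// Returns how many documents had to change.
function renameEmbedded(students, code, newTitle) {
  let changed = 0;
  for (const student of students) {
    if (student.course.code === code) {
      student.course.title = newTitle;
      changed += 1;
    }
  }
  return changed;
}

// Hypothetical helper: rename the single shared course document.
function renameReferenced(courseList, code, newTitle) {
  let changed = 0;
  for (const course of courseList) {
    if (course.code === code) {
      course.title = newTitle;
      changed += 1;
    }
  }
  return changed;
}

const embeddedWrites = renameEmbedded(embeddedStudents, "MATH-10", "Advanced Mathematics");
const referencedWrites = renameReferenced(courseCollection, "MATH-10", "Advanced Mathematics");
console.log({ embeddedWrites, referencedWrites });
```

With embedding, the number of writes grows with the number of students taking the course. With referencing, it stays at one, and the student documents are not touched at all.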

Reference when the embedded data grows unboundedly.

A student's attendance record grows every single school day — 200+ entries per year. Embedding all attendance records inside the student document will make it huge. Reference instead — attendance lives in its own collection.

// Bad — attendance array grows forever inside student
{
  name: "Ali Hassan",
  attendance: [
    { date: "2024-09-01", present: true },
    { date: "2024-09-02", present: false },
    // ... 200 more entries per year
  ]
}

// Good — attendance in its own collection
db.attendance.insertOne({
  studentId: ObjectId("..."),
  date: new Date("2024-09-01"),
  present: true
})

Reference when data needs to be queried independently.

Teachers need their own collection because you query them directly: find all teachers, find teachers by subject, update a teacher's info. If teachers were embedded inside courses, every teacher query would have to dig through course documents, and a teacher who teaches several courses would be duplicated in each one.


The 16MB Limit

MongoDB documents have a hard size limit of 16MB. This is rarely a problem for normal data — a typical student document with all their info is maybe a few KB. But if you embed an array that grows without limit, you can hit this ceiling.

This is another reason to reference data that grows over time — like attendance records, activity logs, or messages.

If you find yourself thinking "this array could grow to thousands of entries", that is a signal to reference, not embed. A write that would push a document past 16MB is rejected with an error, so the insert or update fails.


Our School System — What We Embed and What We Reference

Here is how we apply these rules to our school management system:

Data | Approach | Reason
Student address | Embed | Belongs to one student, always read together
Student guardian | Embed | Belongs to one student, small, rarely changes
Student subjects (list of names) | Embed | Small, bounded, belongs to student
Student grades (array of objects) | Embed | Bounded per semester, always read with student
Course details | Reference | Shared by many students, changes independently
Teacher details | Reference | Shared by many courses, queried independently
Attendance records | Reference | Grows unboundedly over time
Exam results (detailed) | Reference | Large, queried independently

// Final student document shape: embedded fields plus course references
{
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  enrolled: true,
  enrollmentDate: new Date("2024-09-01"),

  // Embedded — belongs to student only
  address: {
    city: "Lahore",
    country: "Pakistan"
  },

  // Embedded — belongs to student only
  guardian: {
    name: "Mr. Hassan Senior",
    phone: "0300-1234567",
    relation: "Father"
  },

  // Embedded — small, bounded list
  subjects: ["Math", "Physics", "English"],

  // Embedded — bounded per semester
  grades: [
    { subject: "Math",    score: 88, semester: 1 },
    { subject: "Physics", score: 92, semester: 1 },
    { subject: "English", score: 79, semester: 1 }
  ],

  // Referenced — courses are shared across students
  courseIds: [
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2")
  ]
}

The One Rule That Covers Most Cases

If you only remember one thing from this page, remember this:

Embed data that you always read together. Reference data that is shared, grows unboundedly, or needs to be updated independently.

Most schema design decisions in MongoDB come back to this single principle.


When you are unsure whether to embed or reference, start by asking — "how will I query this data most often?" If the answer is "always together with the parent document" — embed. If the answer is "sometimes on its own, sometimes with the parent" — reference. Design your schema around your query patterns, not around how the data looks in the real world.
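
The rule of thumb above can be written down as a small checklist function. This is only a sketch of the heuristic, not a tool that ships with MongoDB: chooseApproach and its three flags are hypothetical names chosen for the illustration, and real schemas often need more judgment than three yes/no answers.

```javascript
// Hypothetical helper encoding the rule of thumb from this page.
// Each flag is a yes/no answer about the related data.
function chooseApproach({
  shared = false,               // used by more than one parent document?
  growsUnbounded = false,       // keeps growing with no natural cap?
  queriedIndependently = false  // read or updated without the parent?
} = {}) {
  const reasons = [];
  if (shared) reasons.push("shared by multiple documents");
  if (growsUnbounded) reasons.push("grows without bound");
  if (queriedIndependently) reasons.push("queried or updated on its own");

  // Any single reason is enough to prefer a separate collection.
  return reasons.length > 0
    ? { approach: "reference", reasons }
    : { approach: "embed", reasons: ["owned by one document and read with it"] };
}

// Rows from the school system table, expressed as flags.
const address = chooseApproach({});
const courseDetails = chooseApproach({ shared: true, queriedIndependently: true });
const attendance = chooseApproach({ growsUnbounded: true });

console.log(address.approach, courseDetails.approach, attendance.approach);
```

Any one "yes" is enough to tip the decision toward referencing, which matches how the table treats courses, teachers, and attendance.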
