Learn when to embed data inside a document and when to reference another collection — the most important decision in MongoDB schema design.

Document Model

The most important decision you make when designing a MongoDB schema is this — should this data live inside the document, or in a separate collection?

Get this right and your app is fast, simple, and easy to maintain. Get it wrong and you end up with slow queries, complicated code, and painful rewrites later.

There are two approaches:

Embedding — store related data inside the same document
Referencing — store related data in a separate collection and link them by ID

Referencing

Referencing means storing related data in a separate collection and linking the two with an ID — similar to a foreign key in SQL.

The student document stores course IDs. The full course data lives in the courses collection. To get a student's course details, you need two queries or a $lookup.

Example — Student referencing courses

// Student document
db.students.insertOne({
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  courseIds: [
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2")
  ]
})

// Course documents — in a separate collection
db.courses.insertMany([
  {
    _id: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    title: "Mathematics",
    code: "MATH-10",
    grade: "10th"
  },
  {
    _id: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2"),
    title: "Physics",
    code: "PHY-10",
    grade: "10th"
  }
])

To get Ali's course details:

db.students.aggregate([
  { $match: { name: "Ali Hassan" } },
  {
    $lookup: {
      from: "courses",
      localField: "courseIds",
      foreignField: "_id",
      as: "courses"
    }
  }
])

Two collections, one $lookup. More work — but sometimes the right choice.
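The two-query alternative mentioned above can be sketched as follows. This is a minimal illustration, not the only way to do it: `getStudentCourses` is a hypothetical helper name, and it assumes the `students` and `courses` collections shown earlier, with the mongosh `db` handle passed in as a parameter.

```javascript
// Hypothetical helper: fetch a student's courses with two queries instead of $lookup.
// `db` is the mongosh database handle (or any object exposing the same methods).
function getStudentCourses(db, studentName) {
  // Query 1: load the student to get the referenced course IDs.
  const student = db.students.findOne({ name: studentName });
  if (!student) return null;

  // Query 2: load every course whose _id appears in the student's courseIds.
  const courses = db.courses
    .find({ _id: { $in: student.courseIds || [] } })
    .toArray();

  // Return the student with the full course documents attached.
  return { ...student, courses };
}
```

In mongosh this would be called as getStudentCourses(db, "Ali Hassan"). It costs two round trips instead of one, but each query stays simple and the second one can use the default _id index.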

Embedding vs Referencing — The Core Tradeoffs

When to Embed

Embed when the data belongs to only one document.

A student's address belongs to that student only. No other document needs it. Embed it.

// Good embed — address belongs to this student only
{
  name: "Ali Hassan",
  address: { city: "Lahore", country: "Pakistan" }
}

Embed when you always read the data together.

You always show a student's guardian info on their profile. You never fetch guardian info separately. Embed it.

// Good embed — always shown with student
{
  name: "Ali Hassan",
  guardian: { name: "Mr. Hassan", phone: "0300-1234567" }
}

Embed when the data is small and does not grow unboundedly.

A student takes maybe 5–8 subjects. That is a small, bounded array. Embed it.

// Good embed — small, bounded array
{
  name: "Ali Hassan",
  subjects: ["Math", "Physics", "English"]
}

Embed when the data does not need to be updated independently.

A student's address rarely changes. When it does change, you update just that one student. Embedding is fine.
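Updating an embedded field touches only the one parent document. A minimal sketch, assuming the student shape shown above; `updateStudentCity` is a hypothetical helper name and `db` is the mongosh database handle:

```javascript
// Hypothetical helper: change one field of a student's embedded address.
function updateStudentCity(db, studentName, newCity) {
  // Dot notation targets a single field inside the embedded document,
  // leaving address.country and the rest of the student untouched.
  return db.students.updateOne(
    { name: studentName },
    { $set: { "address.city": newCity } }
  );
}
```

In mongosh: updateStudentCity(db, "Ali Hassan", "Karachi"). No other collection or document is involved, which is why embedding works well for data owned by a single parent.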

When to Reference

Reference when data is shared between multiple documents.

A course is taken by 30 students. If you embed course details inside every student, you have 30 copies of the same data. When the course name changes, you have to update all 30 documents. Reference instead, so the course lives in one place.

// Bad — duplicating course data in every student
{ name: "Ali Hassan", course: { title: "Mathematics", code: "MATH-10", credits: 4 } }
{ name: "Sara Ahmed", course: { title: "Mathematics", code: "MATH-10", credits: 4 } }

// Good — reference the course by ID
{ name: "Ali Hassan", courseIds: [ObjectId("...")] }
{ name: "Sara Ahmed", courseIds: [ObjectId("...")] }
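The difference shows up when a course is renamed. The sketch below contrasts the two writes; both function names are hypothetical, `db` is the mongosh database handle, and the embedded variant assumes the duplicated `course` sub-document from the "Bad" example above.

```javascript
// Referenced design: the course exists once, so one document is updated.
function renameCourseReferenced(db, code, newTitle) {
  return db.courses.updateOne({ code: code }, { $set: { title: newTitle } });
}

// Embedded (duplicated) design: every student holding a copy must be updated.
function renameCourseEmbedded(db, code, newTitle) {
  return db.students.updateMany(
    { "course.code": code },
    { $set: { "course.title": newTitle } }
  );
}
```

With references, students keep storing only the course _id, so they pick up the new title on their next read. With duplicated data, the write grows with the number of students enrolled.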

Reference when the embedded data grows unboundedly.

A student's attendance record grows every single school day: 200+ entries per year. If you embed every attendance record inside the student document, the document never stops growing, and every read of the student drags the full attendance history along with it. Reference instead, so attendance lives in its own collection.

// Bad — attendance array grows forever inside student
{
  name: "Ali Hassan",
  attendance: [
    { date: "2024-09-01", present: true },
    { date: "2024-09-02", present: false },
    // ... 200 more entries per year
  ]
}

// Good — attendance in its own collection
db.attendance.insertOne({
  studentId: ObjectId("..."),
  date: new Date("2024-09-01"),
  present: true
})
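Reading attendance back from its own collection might look like the sketch below. `getAttendance` is a hypothetical helper name, `db` is the mongosh database handle, and the compound index in the comment is a suggestion based on this query shape, not something the schema above requires.

```javascript
// Hypothetical helper: one student's attendance for a date range, oldest first.
function getAttendance(db, studentId, from, to) {
  return db.attendance
    .find({ studentId: studentId, date: { $gte: from, $lt: to } })
    .sort({ date: 1 })
    .toArray();
}

// Assumed one-time setup so the query does not scan the whole collection:
// db.attendance.createIndex({ studentId: 1, date: 1 })
```

Because each record is its own small document, a query can fetch one month or one term without loading the rest of the history.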

Reference when data needs to be queried independently.

Teachers need their own collection because you query them directly: find all teachers, find teachers by subject, update a teacher's info. If teachers were embedded inside courses, every teacher query would have to dig through course documents, and a teacher who teaches several courses would be stored once per course.

The 16MB Limit

MongoDB documents have a hard size limit of 16MB. This is rarely a problem for normal data — a typical student document with all their info is maybe a few KB. But if you embed an array that grows without limit, you can hit this ceiling.

This is another reason to reference data that grows over time — like attendance records, activity logs, or messages.

If you find yourself thinking "this array could grow to thousands of entries", that is a signal to reference, not embed. An insert or update that would push a document past 16MB is rejected with an error.
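To see how close existing documents are to the limit, one option is the $bsonSize aggregation operator (available in MongoDB 4.4 and later). The pipeline below is a sketch; the variable name `largestStudentsPipeline` and the `sizeBytes` field name are made up for illustration.

```javascript
// Pipeline that reports the five largest student documents by BSON size.
const largestStudentsPipeline = [
  // $$ROOT is the whole input document; $bsonSize returns its size in bytes.
  { $project: { name: 1, sizeBytes: { $bsonSize: "$$ROOT" } } },
  { $sort: { sizeBytes: -1 } },
  { $limit: 5 }
];

// In mongosh: db.students.aggregate(largestStudentsPipeline)
```

If the largest documents keep climbing over time, that usually points at an embedded array that should move to its own collection.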

Our School System — What We Embed and What We Reference

Here is how we apply these rules to our school management system:

Data	Approach	Reason
Student address	Embed	Belongs to one student, always read together
Student guardian	Embed	Belongs to one student, small, rarely changes
Student subjects (list of names)	Embed	Small, bounded, belongs to student
Student grades (array of objects)	Embed	Bounded per semester, always read with student
Course details	Reference	Shared by many students, changes independently
Teacher details	Reference	Shared by many courses, queried independently
Attendance records	Reference	Grows unboundedly over time
Exam results (detailed)	Reference	Large, queried independently

// Final student document shape — what we embed
{
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  enrolled: true,
  enrollmentDate: new Date("2024-09-01"),

  // Embedded — belongs to student only
  address: {
    city: "Lahore",
    country: "Pakistan"
  },

  // Embedded — belongs to student only
  guardian: {
    name: "Mr. Hassan Senior",
    phone: "0300-1234567",
    relation: "Father"
  },

  // Embedded — small, bounded list
  subjects: ["Math", "Physics", "English"],

  // Embedded — bounded per semester
  grades: [
    { subject: "Math",    score: 88, semester: 1 },
    { subject: "Physics", score: 92, semester: 1 },
    { subject: "English", score: 79, semester: 1 }
  ],

  // Referenced — courses are shared across students
  courseIds: [
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2")
  ]
}

The One Rule That Covers Most Cases

If you only remember one thing from this page, remember this:

Embed data that you always read together. Reference data that is shared, grows unboundedly, or needs to be updated independently.

Most schema design decisions in MongoDB come back to this single principle.

When you are unsure whether to embed or reference, start by asking — "how will I query this data most often?" If the answer is "always together with the parent document" — embed. If the answer is "sometimes on its own, sometimes with the parent" — reference. Design your schema around your query patterns, not around how the data looks in the real world.
