Document Model
Learn when to embed data inside a document and when to reference another collection — the most important decision in MongoDB schema design.
The most important decision you make when designing a MongoDB schema is this — should this data live inside the document, or in a separate collection?
Get this right and your app is fast, simple, and easy to maintain. Get it wrong and you end up with slow queries, complicated code, and painful rewrites later.
There are two approaches:
- Embedding — store related data inside the same document
- Referencing — store related data in a separate collection and link them by ID
Embedding
Embedding means putting related data directly inside a document as a nested object or array.
In the example below, everything about Ali Hassan (his address, his guardian, his subjects) lives in one document. One query gets everything.
Example — Student with embedded address and guardian
```javascript
db.students.insertOne({
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  address: {
    street: "12 Garden Road",
    city: "Lahore",
    country: "Pakistan"
  },
  guardian: {
    name: "Mr. Hassan Senior",
    phone: "0300-1234567",
    relation: "Father"
  },
  subjects: ["Math", "Physics", "English"]
})
```
To get Ali's city:
```javascript
db.students.findOne({ name: "Ali Hassan" }, { "address.city": 1 })
```
One query. No joins. Fast.
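The projection `{ "address.city": 1 }` does not return the bare string `"Lahore"`. It returns a document that keeps the embedded shape and also includes `_id`, because `_id` is returned unless the projection excludes it. The sketch below shows what a single dot-notation inclusion projection does to one embedded document, written in plain JavaScript. `projectPath` is a hypothetical helper for illustration and is not part of MongoDB or mongosh. It follows one path through nested objects and ignores arrays, which the real projection also handles.

```javascript
// Hypothetical helper: mimics an inclusion projection such as
// { "address.city": 1 } for ONE dot-notation path through nested objects.
// It does not handle arrays or multiple paths like the real server does.
function projectPath(doc, path) {
  const keys = path.split(".");
  const result = {};

  // MongoDB returns _id unless the projection excludes it explicitly.
  if ("_id" in doc) result._id = doc._id;

  let src = doc;
  let dst = result;
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i];
    // Stop if the path does not exist in the source document.
    if (src === null || typeof src !== "object" || !(key in src)) break;

    if (i === keys.length - 1) {
      dst[key] = src[key]; // copy the leaf value
    } else {
      dst[key] = {}; // rebuild the parent object around the leaf
      dst = dst[key];
      src = src[key];
    }
  }
  return result;
}

// Sample document; "student-1" stands in for a real ObjectId.
const ali = {
  _id: "student-1",
  name: "Ali Hassan",
  address: { street: "12 Garden Road", city: "Lahore", country: "Pakistan" },
  subjects: ["Math", "Physics", "English"]
};

console.log(projectPath(ali, "address.city"));
// → { _id: 'student-1', address: { city: 'Lahore' } }
```

In application code, this means you read the city as `result.address.city`.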
Referencing
Referencing means storing related data in a separate collection and linking the two with an ID — similar to a foreign key in SQL.
The student document stores course IDs. The full course data lives in the courses collection. To get a student's course details, you need two queries or a $lookup.
Example — Student referencing courses
```javascript
// Student document
db.students.insertOne({
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  courseIds: [
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2")
  ]
})

// Course documents: in a separate collection
db.courses.insertMany([
  {
    _id: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    title: "Mathematics",
    code: "MATH-10",
    grade: "10th"
  },
  {
    _id: new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2"),
    title: "Physics",
    code: "PHY-10",
    grade: "10th"
  }
])
```
To get Ali's course details:
```javascript
db.students.aggregate([
  { $match: { name: "Ali Hassan" } },
  {
    $lookup: {
      from: "courses",
      localField: "courseIds",
      foreignField: "_id",
      as: "courses"
    }
  }
])
```
Two collections, one `$lookup`. More work, but sometimes the right choice.
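The matching rule behind this join can be spelled out without a database. You might run two queries: fetch the student, then fetch the courses with `{ _id: { $in: student.courseIds } }`. Or you might run one `$lookup` whose `localField` is an array. Either way, the rule is the same: keep every course whose `_id` appears in `courseIds`, and attach the matches as a new `courses` array. The sketch below is a hypothetical in-memory version. It uses plain strings instead of `ObjectId` values so that JavaScript equality works, and `lookupCourses` is an illustrative name, not a MongoDB API.

```javascript
// Hypothetical in-memory version of the $lookup stage above.
// Plain strings stand in for ObjectId values so === comparison works.
function lookupCourses(students, courses) {
  return students.map((student) => {
    const wanted = new Set(student.courseIds || []);
    return {
      ...student,
      // Like localField: "courseIds", foreignField: "_id", as: "courses":
      // keep every course whose _id matches any element of the array.
      courses: courses.filter((course) => wanted.has(course._id))
    };
  });
}

// Sample data; CHEM-10 is made up to show that unmatched courses are skipped.
const students = [
  { _id: "s1", name: "Ali Hassan", courseIds: ["c1", "c2"] }
];
const courses = [
  { _id: "c1", title: "Mathematics", code: "MATH-10" },
  { _id: "c2", title: "Physics", code: "PHY-10" },
  { _id: "c3", title: "Chemistry", code: "CHEM-10" }
];

const [aliWithCourses] = lookupCourses(students, courses);
console.log(aliWithCourses.courses.map((c) => c.code));
// → [ 'MATH-10', 'PHY-10' ]
```

The student's own fields, including `courseIds`, are kept. Only the new `courses` array is added.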
Embedding vs Referencing — The Core Tradeoffs
When to Embed
Embed when the data belongs to only one document.
A student's address belongs to that student only. No other document needs it. Embed it.
```javascript
// Good embed: address belongs to this student only
{
  name: "Ali Hassan",
  address: { city: "Lahore", country: "Pakistan" }
}
```
Embed when you always read the data together.
You always show a student's guardian info on their profile. You never fetch guardian info separately. Embed it.
```javascript
// Good embed: always shown with student
{
  name: "Ali Hassan",
  guardian: { name: "Mr. Hassan", phone: "0300-1234567" }
}
```
Embed when the data is small and does not grow unboundedly.
A student takes perhaps 5 to 8 subjects. That is a small, bounded array. Embed it.
```javascript
// Good embed: small, bounded array
{
  name: "Ali Hassan",
  subjects: ["Math", "Physics", "English"]
}
```
Embed when the data does not need to be updated independently.
A student's address rarely changes. When it does change, you update just that one student. Embedding is fine.
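In mongosh that change is a single update on one document. For example: `db.students.updateOne({ name: "Ali Hassan" }, { $set: { "address.city": "Karachi" } })`, where the new city is just a sample value. The sketch below mimics, in plain JavaScript, what that `$set` does to the matched document. It follows the dot path and replaces only the last field, leaving the rest of the address and every other student alone. `setPath` is a hypothetical helper, not a MongoDB function.

```javascript
// Hypothetical helper mimicking $set with a dot-notation path on ONE document.
// Like $set, it creates missing parent objects along the path; unlike the
// server, it does no validation (e.g. a parent that is a string or null).
function setPath(doc, path, value) {
  const keys = path.split(".");
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    if (target[key] === undefined) target[key] = {};
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
  return doc;
}

// Sample roster; only the document matching the filter is touched.
const roster = [
  { name: "Ali Hassan", address: { street: "12 Garden Road", city: "Lahore", country: "Pakistan" } },
  { name: "Sara Ahmed", address: { city: "Lahore", country: "Pakistan" } }
];

const aliDoc = roster.find((s) => s.name === "Ali Hassan");
setPath(aliDoc, "address.city", "Karachi");

console.log(roster[0].address);      // city changed, street and country kept
console.log(roster[1].address.city); // → Lahore
```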
When to Reference
Reference when data is shared between multiple documents.
A course is taken by 30 students. If you embed course details inside every student, you have 30 copies of the same data. When the course name changes, you have to update 30 documents. Reference instead — course lives in one place.
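The cost difference is easy to count. The hypothetical sketch below renames a course in both designs and reports how many documents each rename had to modify. It uses the two students from the example that follows rather than 30. In a real database, the embedded case would be an `updateMany` that has to find and rewrite every copy.

```javascript
// Hypothetical sample data. Embedded design: every student has its own copy.
const embeddedStudents = [
  { name: "Ali Hassan", course: { title: "Mathematics", code: "MATH-10", credits: 4 } },
  { name: "Sara Ahmed", course: { title: "Mathematics", code: "MATH-10", credits: 4 } }
];

// Referenced design: one course document; students would hold only its _id.
const courseDocs = [{ _id: "c1", title: "Mathematics", code: "MATH-10", credits: 4 }];

// Rename every copy of a course and return how many documents changed.
// getCourse pulls the course object out of each document.
function renameCourse(docs, getCourse, code, newTitle) {
  let modified = 0;
  for (const doc of docs) {
    const course = getCourse(doc);
    if (course && course.code === code) {
      course.title = newTitle;
      modified++;
    }
  }
  return modified;
}

const embeddedWrites = renameCourse(embeddedStudents, (s) => s.course, "MATH-10", "Mathematics I");
const referencedWrites = renameCourse(courseDocs, (c) => c, "MATH-10", "Mathematics I");

console.log(embeddedWrites);   // → 2 (one per student; 30 students would mean 30)
console.log(referencedWrites); // → 1 (the single course document)
```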
```javascript
// Bad: duplicating course data in every student
{ name: "Ali Hassan", course: { title: "Mathematics", code: "MATH-10", credits: 4 } }
{ name: "Sara Ahmed", course: { title: "Mathematics", code: "MATH-10", credits: 4 } }

// Good: reference the course by ID
{ name: "Ali Hassan", courseIds: [ObjectId("...")] }
{ name: "Sara Ahmed", courseIds: [ObjectId("...")] }
```
Reference when the embedded data grows unboundedly.
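A student's attendance record grows every school day, adding 200+ entries per year for as long as the student is enrolled. Embedded, that array keeps growing inside the student document, and every read of the student pulls in the whole history. Reference instead: attendance lives in its own collection, one small document per student per day.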
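With a separate collection, each school day adds one small document instead of growing the student document. Questions about attendance become queries on that collection, for example `db.attendance.countDocuments({ studentId, present: true, date: { $gte: from, $lt: to } })`. The plain-JavaScript sketch below imitates that with an in-memory array. `markAttendance` and `countPresent` are hypothetical helpers, and the string IDs stand in for `ObjectId` values.

```javascript
// Hypothetical in-memory stand-in for the attendance collection:
// one small record per student per day; the student document never grows.
const attendanceLog = [];

function markAttendance(studentId, isoDate, present) {
  attendanceLog.push({ studentId, date: new Date(isoDate), present });
}

// Counts days present in [from, to), roughly what this mongosh query returns:
// db.attendance.countDocuments({ studentId, present: true,
//   date: { $gte: new Date(from), $lt: new Date(to) } })
function countPresent(studentId, from, to) {
  const start = new Date(from);
  const end = new Date(to);
  return attendanceLog.filter(
    (r) => r.studentId === studentId && r.present && r.date >= start && r.date < end
  ).length;
}

markAttendance("s1", "2024-09-01", true);
markAttendance("s1", "2024-09-02", false);
markAttendance("s1", "2024-09-03", true);
markAttendance("s2", "2024-09-01", true);

console.log(countPresent("s1", "2024-09-01", "2024-10-01")); // → 2
```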
```javascript
// Bad: attendance array grows forever inside student
{
  name: "Ali Hassan",
  attendance: [
    { date: "2024-09-01", present: true },
    { date: "2024-09-02", present: false },
    // ... 200 more entries per year
  ]
}

// Good: attendance in its own collection
db.attendance.insertOne({
  studentId: ObjectId("..."),
  date: new Date("2024-09-01"),
  present: true
})
```
Reference when data needs to be queried independently.
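Teachers need their own collection because you query them directly: find all teachers, find teachers by subject, update a teacher's info. If teachers were embedded inside courses, every one of those queries would have to go through the courses collection. A teacher who teaches several courses would also be copied into each one, so updating their phone number would mean updating every copy.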
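A tiny in-memory comparison makes this concrete. In the hypothetical sketch below, the teacher names and course codes are made-up sample data. In the embedded design, finding the Math teachers means scanning every course and dropping duplicate copies, with the teacher's name as the only identity available. In the referenced design it is one filter on the teachers themselves, the equivalent of `db.teachers.find({ subject: "Math" })`.

```javascript
// Hypothetical sample data. Embedded design: a teacher is copied into
// every course they teach.
const coursesWithTeachers = [
  { code: "MATH-10", teacher: { name: "Ms. Ayesha", subject: "Math" } },
  { code: "MATH-9", teacher: { name: "Ms. Ayesha", subject: "Math" } },
  { code: "PHY-10", teacher: { name: "Mr. Bilal", subject: "Physics" } }
];

// Referenced design: each teacher is stored once in a teachers collection.
const teacherDocs = [
  { _id: "t1", name: "Ms. Ayesha", subject: "Math" },
  { _id: "t2", name: "Mr. Bilal", subject: "Physics" }
];

// Embedded: walk every course, pull out teachers, drop duplicate copies.
// Deduplicating by name is fragile; two teachers could share a name.
function teachersBySubjectEmbedded(courses, subject) {
  const byName = new Map();
  for (const course of courses) {
    const t = course.teacher;
    if (t && t.subject === subject && !byName.has(t.name)) byName.set(t.name, t);
  }
  return [...byName.values()];
}

// Referenced: a single filter on the teachers themselves.
function teachersBySubject(teachers, subject) {
  return teachers.filter((t) => t.subject === subject);
}

console.log(teachersBySubjectEmbedded(coursesWithTeachers, "Math").length); // → 1
console.log(teachersBySubject(teacherDocs, "Math").length);                 // → 1
```

Both return one teacher. The embedded version has to scan and deduplicate to get there, and it still leaves two copies of Ms. Ayesha to keep in sync.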
The 16MB Limit
MongoDB documents have a hard size limit of 16MB. This is rarely a problem for normal data — a typical student document with all their info is maybe a few KB. But if you embed an array that grows without limit, you can hit this ceiling.
This is another reason to reference data that grows over time — like attendance records, activity logs, or messages.
If you find yourself thinking "this array could grow to thousands of entries", that is a signal to reference, not embed. MongoDB rejects, with an error, any insert or update that would make a document larger than 16MB.
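A rough way to check whether an array is heading toward the limit is to estimate the size of one entry and divide. The sketch below is an approximation that measures UTF-8 JSON bytes. That only ballparks BSON size, because BSON adds type markers and length prefixes and stores dates and numbers differently. For the exact size of stored documents, the `$bsonSize` aggregation operator (MongoDB 4.4+) reports it. `approxBytes` and `entriesUntilLimit` are hypothetical helpers, and the sample message is made-up data.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Approximate size of a value as UTF-8 JSON. This is a proxy, not BSON size.
function approxBytes(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Roughly how many copies of sampleEntry fit in one array before the
// document (ignoring its other fields) would reach the limit.
function entriesUntilLimit(sampleEntry) {
  const perEntry = approxBytes(sampleEntry) + 1; // +1 for the separating comma
  return Math.floor(MAX_DOC_BYTES / perEntry);
}

// Hypothetical chat message carrying about 1 KB of text.
const message = { from: "s1", sentAt: "2024-09-01T08:00:00Z", text: "x".repeat(1000) };
console.log(entriesUntilLimit(message)); // → 15887 (roughly 16 thousand messages)
```

Small entries such as a single attendance flag fit many more times, which is why size is only one of the reasons to reference. Unbounded growth is a warning sign even when the ceiling is far away.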
Our School System — What We Embed and What We Reference
Here is how we apply these rules to our school management system:
| Data | Approach | Reason |
|---|---|---|
| Student address | Embed | Belongs to one student, always read together |
| Student guardian | Embed | Belongs to one student, small, rarely changes |
| Student subjects (list of names) | Embed | Small, bounded, belongs to student |
| Student grades (array of objects) | Embed | Bounded per semester, always read with student |
| Course details | Reference | Shared by many students, changes independently |
| Teacher details | Reference | Shared by many courses, queried independently |
| Attendance records | Reference | Grows unboundedly over time |
| Exam results (detailed) | Reference | Large, queried independently |
```javascript
// Final student document shape: what we embed and what we reference
{
  name: "Ali Hassan",
  age: 16,
  grade: "10th",
  enrolled: true,
  enrollmentDate: new Date("2024-09-01"),

  // Embedded: belongs to student only
  address: {
    city: "Lahore",
    country: "Pakistan"
  },

  // Embedded: belongs to student only
  guardian: {
    name: "Mr. Hassan Senior",
    phone: "0300-1234567",
    relation: "Father"
  },

  // Embedded: small, bounded list
  subjects: ["Math", "Physics", "English"],

  // Embedded: bounded per semester
  grades: [
    { subject: "Math", score: 88, semester: 1 },
    { subject: "Physics", score: 92, semester: 1 },
    { subject: "English", score: 79, semester: 1 }
  ],

  // Referenced: courses are shared across students
  courseIds: [
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f1"),
    new ObjectId("64a1f2c3e4b0a1b2c3d4e5f2")
  ]
}
```
The One Rule That Covers Most Cases
If you only remember one thing from this file, remember this:
Embed data that you always read together. Reference data that is shared, grows unboundedly, or needs to be updated independently.
Most schema design decisions in MongoDB come back to this single principle.
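The rule can be written down as a small decision helper. This is a hypothetical sketch, not a library function. The flags are answers you supply after looking at your own query patterns, and nothing here inspects a database. The calls at the end reproduce a few rows of the table above.

```javascript
// Hypothetical helper encoding the rule: reference data that is shared,
// grows without bound, or is updated independently; otherwise embed it
// only if it is always read together with the parent document.
function embedOrReference({ readTogether, shared, unbounded, updatedIndependently }) {
  if (shared || unbounded || updatedIndependently) return "reference";
  return readTogether ? "embed" : "reference";
}

console.log(embedOrReference({ readTogether: true }));                   // → embed (student address)
console.log(embedOrReference({ readTogether: true, shared: true }));     // → reference (course details)
console.log(embedOrReference({ readTogether: false, unbounded: true })); // → reference (attendance)
console.log(embedOrReference({ readTogether: false }));                  // → reference (sometimes read alone)
```

Flags you leave out default to `undefined`, which the helper treats as "no".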
When you are unsure whether to embed or reference, start by asking — "how will I query this data most often?" If the answer is "always together with the parent document" — embed. If the answer is "sometimes on its own, sometimes with the parent" — reference. Design your schema around your query patterns, not around how the data looks in the real world.