Aggregation Pipeline
Learn what aggregation is, how the pipeline works, and why it is the most powerful tool in MongoDB for analyzing and transforming data.
Aggregation Pipeline
So far, every query we wrote either fetched documents as they are or filtered which ones to return. But sometimes you need more — you want to calculate something, group documents together, transform their shape, or combine data from multiple collections.
That is what aggregation is for.
What is Aggregation?
Aggregation is the process of transforming and analyzing data inside MongoDB itself — without pulling all the data into your application first.
Some real examples from our school system:
- How many students are in each grade?
- What is the average exam score per subject?
- Which teacher teaches the most students?
- What is the total fee collected this month?
You could fetch all documents and calculate these in JavaScript. But that means transferring thousands of documents over the network and doing the work in your app. Aggregation does the work inside MongoDB — only the final result comes back to your app.
What is a Pipeline?
An aggregation pipeline is a series of stages. Each stage takes the documents from the previous stage, transforms them in some way, and passes the results to the next stage.
Think of it like an assembly line. Raw documents go in one end. Each station on the line does something to them. The final result comes out the other end.
Each stage receives the output of the previous stage — not the original collection. This means you can chain as many stages as you need, and each one builds on the last.
Basic Syntax
db.collection.aggregate([
{ $stage1: { ... } },
{ $stage2: { ... } },
{ $stage3: { ... } }
])You pass an array of stage objects. Each object has one key — the stage name starting with $ — and its configuration as the value.
A First Pipeline
Let's write our first aggregation pipeline. We want to know how many students are in each grade — only counting enrolled students.
db.students.aggregate([
// Stage 1 — filter to enrolled students only
{
$match: { enrolled: true }
},
// Stage 2 — group by grade and count
{
$group: {
_id: "$grade",
totalStudents: { $sum: 1 }
}
},
// Stage 3 — sort by total students descending
{
$sort: { totalStudents: -1 }
}
])Result:
[
{ _id: "10th", totalStudents: 45 },
{ _id: "11th", totalStudents: 38 },
{ _id: "9th", totalStudents: 41 }
]Let's trace what happened:
Stage 1 — $match: Out of all students, only enrolled ones pass through. Unenrolled students are filtered out here and never reach the next stage.
Stage 2 — $group: The remaining documents are grouped by grade. For each group, we count the documents using $sum: 1 — adding 1 for each document in the group.
Stage 3 — $sort: The grouped results are sorted by totalStudents in descending order — highest count first.
How Stages See Data
This is the most important concept to understand about pipelines — each stage only sees the output of the previous stage, not the original collection.
After $group, the pipeline no longer has individual student documents — it has one summary document per grade. The $sort stage that comes after works on those summary documents, not the original students.
This means the order of your stages matters a lot. A $match early in the pipeline reduces the number of documents, making later stages faster. Always put $match as early as possible.
Pipeline vs find()
You might wonder — when should I use aggregation instead of find()?
Use find() when:
- You want to fetch documents as they are
- You just need to filter, sort, or limit results
- You do not need to calculate or transform data
Use aggregation when:
- You need to group documents and calculate totals, averages, counts
- You need to reshape documents — add computed fields, rename fields, restructure
- You need to join data from multiple collections
- You need to work with arrays in complex ways
// This is a find() job — just fetch enrolled students
db.students.find({ enrolled: true })
// This is an aggregation job — calculate average age per grade
db.students.aggregate([
{ $match: { enrolled: true } },
{ $group: { _id: "$grade", avgAge: { $avg: "$age" } } }
])Field References in Aggregation
Inside aggregation stages, you reference field values by prefixing the field name with $. This tells MongoDB to use the value of that field — not treat it as a literal string.
// "$grade" means "the value of the grade field"
{ $group: { _id: "$grade" } }
// "$age" means "the value of the age field"
{ $group: { _id: "$grade", avgAge: { $avg: "$age" } } }Without the $ prefix, MongoDB treats it as a literal string:
// Wrong — groups everything under the literal string "grade"
{ $group: { _id: "grade" } }
// Correct — groups by the actual value of the grade field
{ $group: { _id: "$grade" } }This $fieldName syntax appears everywhere in aggregation. Get comfortable with it — you will write it hundreds of times.
A More Complex Pipeline
Let's build a more complete pipeline for our school system. We want the average exam score per grade, only for enrolled students, sorted from highest to lowest average, showing only the top 3 grades:
db.students.aggregate([
// Stage 1 — only enrolled students
{
$match: { enrolled: true }
},
// Stage 2 — group by grade, calculate average score
{
$group: {
_id: "$grade",
averageScore: { $avg: "$examScore" },
totalStudents: { $sum: 1 }
}
},
// Stage 3 — sort by average score descending
{
$sort: { averageScore: -1 }
},
// Stage 4 — only top 3 grades
{
$limit: 3
},
// Stage 5 — rename _id to grade for cleaner output
{
$project: {
_id: 0,
grade: "$_id",
averageScore: 1,
totalStudents: 1
}
}
])Result:
[
{ grade: "11th", averageScore: 84.5, totalStudents: 38 },
{ grade: "10th", averageScore: 81.2, totalStudents: 45 },
{ grade: "9th", averageScore: 78.9, totalStudents: 41 }
]Five stages, each doing one thing, building on the last. This is the power of the aggregation pipeline.
Performance Tips
Put $match first — filter documents as early as possible. Every document that gets filtered out in $match does not need to be processed by any later stage.
// Good — filter early
db.students.aggregate([
{ $match: { enrolled: true } }, // filter first
{ $group: { ... } }
])
// Bad — filter late
db.students.aggregate([
{ $group: { ... } }, // processes all documents
{ $match: { enrolled: true } } // filters after grouping
])Put $limit early when possible — if you only need the top 5 results, limit early so later stages process fewer documents.
Use indexes with $match — if the $match stage filters on an indexed field, MongoDB uses the index and the pipeline runs much faster.
MongoDB can use indexes in aggregation pipelines — but only for $match and $sort stages that appear at the beginning of the pipeline. Once any other stage runs, the index can no longer be used. This is another reason to put $match first.
Quick Summary
| Concept | What it means |
|---|---|
| Aggregation | Processing and transforming data inside MongoDB |
| Pipeline | A series of stages applied in order |
| Stage | One transformation step — filter, group, sort, reshape |
$fieldName | Reference the value of a field inside a stage |
| Stage order | Matters — each stage sees only the previous stage's output |
When building a complex pipeline, build it one stage at a time. Start with just $match, run it, check the output. Add $group, run it, check the output. Keep adding stages and verifying at each step. This makes it much easier to catch mistakes than writing all stages at once.
Using explain()
Learn how to use explain() to read query plans, understand collection scans vs index scans, and confirm your indexes are working.
$match and $group
Learn how to filter documents with $match and group them with $group to calculate counts, sums, and averages in MongoDB aggregation pipelines.