Count Word Frequency
Learn how to count how many times each word appears in a string in Python.
Problem
Given a string, count how many times each word appears. Return the results sorted by frequency — most common first.
Input: "the cat sat on the mat the cat"
Output:
the → 3
cat → 2
sat → 1
on → 1
mat → 1
Input: "apple banana apple cherry banana apple"
Output:
apple → 3
banana → 2
cherry → 1
Logic
- Clean the text — lowercase, remove punctuation
- Split into individual words
- Loop through each word
- Count how many times each word appears
- Sort by count — most frequent first
- Display the results
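The cleaning and splitting steps above can be sketched on their own. This version uses str.translate with string.punctuation, which is a swapped-in technique rather than the replace loop used in the solutions; the helper name clean_and_split is hypothetical.

```python
import string

def clean_and_split(text):
    """Lowercase the text, delete ASCII punctuation, and split on whitespace."""
    # str.maketrans("", "", chars) builds a table that deletes every char in chars
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

print(clean_and_split("The cat sat. The CAT, again!"))
# ['the', 'cat', 'sat', 'the', 'cat', 'again']
```

Note that string.punctuation also contains the apostrophe and the hyphen, so "don't" becomes "dont" and "well-known" becomes "wellknown". Whether that is acceptable depends on the text being counted.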
Solution 1 — using a dictionary
Build a dictionary where each key is a word and each value is its count.
def word_frequency(text):
    # step 1: clean the text
    # lowercase so "The" and "the" are the same word
    text = text.lower()
    # remove common punctuation
    for char in ".,!?;:\"'()[]{}":
        text = text.replace(char, "")
    # step 2: split into words
    words = text.split()
    # step 3: count each word
    frequency = {}
    for word in words:
        if word in frequency:
            frequency[word] += 1  # seen before, so increment
        else:
            frequency[word] = 1  # first time, so set to 1
    # step 4: sort by count descending
    sorted_freq = sorted(frequency.items(), key=lambda x: x[1], reverse=True)
    return sorted_freq

results = word_frequency("the cat sat on the mat the cat")
for word, count in results:
    print(f" {word:<10} → {count}")

Code Execution — Solution 1
Trace through word_frequency("the cat sat on the mat the cat"):
After cleaning and splitting:
words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
Building the frequency dictionary:
| Step | word | in frequency? | frequency |
|---|---|---|---|
| 1st | "the" | No | {"the": 1} |
| 2nd | "cat" | No | {"the": 1, "cat": 1} |
| 3rd | "sat" | No | {"the": 1, "cat": 1, "sat": 1} |
| 4th | "on" | No | {"the": 1, "cat": 1, "sat": 1, "on": 1} |
| 5th | "the" | Yes | {"the": 2, "cat": 1, "sat": 1, "on": 1} |
| 6th | "mat" | No | {"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1} |
| 7th | "the" | Yes | {"the": 3, "cat": 1, "sat": 1, "on": 1, "mat": 1} |
| 8th | "cat" | Yes | {"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1} |
Sorting by value descending:
[("the", 3), ("cat", 2), ("sat", 1), ("on", 1), ("mat", 1)]
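Words with the same count (sat, on, mat) stay in first-seen order, because sorted() is stable and dictionaries preserve insertion order (Python 3.7+). If alphabetical order among ties is preferred, one possible variation is to sort on a tuple key. The function name sort_by_count_then_word is hypothetical.

```python
def sort_by_count_then_word(frequency):
    """Sort (word, count) pairs by count descending, then by word ascending."""
    # negating the count puts larger counts first; ties fall back to the word
    return sorted(frequency.items(), key=lambda item: (-item[1], item[0]))

frequency = {"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1}
print(sort_by_count_then_word(frequency))
# [('the', 3), ('cat', 2), ('mat', 1), ('on', 1), ('sat', 1)]
```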
Solution 2 — using dict.get()
A cleaner way to update counts: frequency.get(word, 0) returns the current count, or 0 if the key does not exist yet.
def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()
    frequency = {}
    for word in words:
        # get current count (default 0) and add 1
        frequency[word] = frequency.get(word, 0) + 1
    # sort by count descending
    return sorted(frequency.items(), key=lambda x: x[1], reverse=True)

results = word_frequency("apple banana apple cherry banana apple")
for word, count in results:
    print(f" {word:<10} → {count}")

Code Execution — Solution 2
Trace through word_frequency("apple banana apple cherry banana apple"):
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
| Step | word | frequency.get(word, 0) | + 1 | frequency |
|---|---|---|---|---|
| 1st | "apple" | 0 | 1 | {"apple": 1} |
| 2nd | "banana" | 0 | 1 | {"apple": 1, "banana": 1} |
| 3rd | "apple" | 1 | 2 | {"apple": 2, "banana": 1} |
| 4th | "cherry" | 0 | 1 | {"apple": 2, "banana": 1, "cherry": 1} |
| 5th | "banana" | 1 | 2 | {"apple": 2, "banana": 2, "cherry": 1} |
| 6th | "apple" | 2 | 3 | {"apple": 3, "banana": 2, "cherry": 1} |
Sorted: [("apple", 3), ("banana", 2), ("cherry", 1)]
frequency.get(word, 0) + 1 is cleaner than checking if word in frequency. It reads as: get the current count for this word, defaulting to 0 if the word has not been seen yet, then add 1.
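A small sketch of what get() does and does not do may help here. It only reads from the dictionary; the assignment is what stores the new count. The sample dictionary is made up for illustration.

```python
frequency = {"apple": 2}

# get() only reads: a missing key yields the default and is not inserted
print(frequency.get("banana", 0))   # 0
print("banana" in frequency)        # False

# the assignment is what actually stores the new count
frequency["banana"] = frequency.get("banana", 0) + 1
print(frequency)                    # {'apple': 2, 'banana': 1}
```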
Solution 3 — using collections.Counter
Python's Counter class is built exactly for this purpose. It counts the hashable items in any iterable automatically.
from collections import Counter

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()
    # Counter counts everything automatically
    frequency = Counter(words)
    # most_common() returns items sorted by count, highest first
    return frequency.most_common()

results = word_frequency("the cat sat on the mat the cat")
for word, count in results:
    print(f" {word:<10} → {count}")

# bonus: top 3 most common words
print("\nTop 3:")
for word, count in word_frequency("the cat sat on the mat the cat")[:3]:
    print(f" {word} ({count})")

Code Execution — Solution 3
Trace through Counter(["the", "cat", "sat", "on", "the", "mat", "the", "cat"]):
Counter does the same counting as Solutions 1 and 2; in CPython the counting loop runs in a helper written in C.
Result: Counter({"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1})
most_common() — returns all items sorted by count:
[("the", 3), ("cat", 2), ("sat", 1), ("on", 1), ("mat", 1)]
most_common(3) — returns only the top 3:
[("the", 3), ("cat", 2), ("sat", 1)]
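Beyond most_common(), a few other Counter behaviors are handy when counting words. The sketch below uses the same example sentence; the word "dog" is added only for illustration.

```python
from collections import Counter

frequency = Counter("the cat sat on the mat the cat".split())

print(frequency.most_common(2))  # [('the', 3), ('cat', 2)]
print(frequency["dog"])          # 0, a missing key reads as zero
print("dog" in frequency)        # False, the lookup did not insert it

# update() adds counts from another iterable of words
frequency.update(["cat", "dog"])
print(frequency["cat"], frequency["dog"])  # 3 1
```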
Solution 4 — using defaultdict
defaultdict(int) automatically creates a value of 0 for any new key — no need for get() or if checks.
from collections import defaultdict

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()
    # defaultdict(int) starts every new key at 0 automatically
    frequency = defaultdict(int)
    for word in words:
        frequency[word] += 1  # no KeyError, new keys start at 0
    # sort by count descending
    return sorted(frequency.items(), key=lambda x: x[1], reverse=True)

results = word_frequency("the cat sat on the mat the cat")
for word, count in results:
    print(f" {word:<10} → {count}")

Code Execution — Solution 4
Trace through a shortened word list, ["the", "cat", "the"]:
| Step | word | frequency[word] before | += 1 | frequency |
|---|---|---|---|---|
| 1st | "the" | 0 (auto) | 1 | {"the": 1} |
| 2nd | "cat" | 0 (auto) | 1 | {"the": 1, "cat": 1} |
| 3rd | "the" | 1 | 2 | {"the": 2, "cat": 1} |
With defaultdict(int), looking up a missing key with frequency[key] calls int(), which returns 0, and stores that value under the key. So frequency["new_word"] += 1 works without any if check or .get().
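One side effect is worth seeing in a short sketch: a plain lookup with square brackets stores the default, while get() and the in operator do not. The words used here are arbitrary examples.

```python
from collections import defaultdict

frequency = defaultdict(int)
frequency["the"] += 1

# a plain lookup with [] on a missing key stores the default as a side effect
print(frequency["cat"])    # 0
print(dict(frequency))     # {'the': 1, 'cat': 0}

# get() and "in" do not trigger the default factory
print(frequency.get("dog"))  # None
print("dog" in frequency)    # False
```

If zero-count entries would be a problem in the final report, checking membership with in before reading avoids creating them.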
Bonus — frequency with percentage
Show each word as a percentage of total words.
from collections import Counter

def word_frequency_with_percent(text):
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()
    total = len(words)
    frequency = Counter(words)
    print(f"{'Word':<12} {'Count':>6} {'Percent':>8}")
    print("-" * 28)
    for word, count in frequency.most_common():
        percent = (count / total) * 100
        print(f"{word:<12} {count:>6} {percent:>7.1f}%")

word_frequency_with_percent("the cat sat on the mat the cat")

Output:
Word          Count  Percent
----------------------------
the               3    37.5%
cat               2    25.0%
sat               1    12.5%
on                1    12.5%
mat               1    12.5%

Which solution to use?
| Solution | How | Best when |
|---|---|---|
| Solution 1 | Plain dictionary + if/else | Understanding the logic |
| Solution 2 | dict.get() | Cleaner version of Solution 1 |
| Solution 3 | Counter | Real projects — cleanest and fastest |
| Solution 4 | defaultdict | When building frequency maps for complex data |
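Whether Counter is actually the fastest option depends on the Python version and the input, so it may be worth measuring. Below is a minimal timing sketch using timeit. The sample data (the example sentence repeated 1,000 times) and the function names are assumptions made for illustration, and no timings are shown because they vary by machine.

```python
import timeit
from collections import Counter, defaultdict

# hypothetical sample data: the example sentence repeated many times
WORDS = "the cat sat on the mat the cat".split() * 1_000

def count_with_get(words):
    frequency = {}
    for word in words:
        frequency[word] = frequency.get(word, 0) + 1
    return frequency

def count_with_defaultdict(words):
    frequency = defaultdict(int)
    for word in words:
        frequency[word] += 1
    return frequency

def count_with_counter(words):
    return Counter(words)

# all three approaches must agree before timing them
assert count_with_get(WORDS) == count_with_defaultdict(WORDS) == count_with_counter(WORDS)

for func in (count_with_get, count_with_defaultdict, count_with_counter):
    seconds = timeit.timeit(lambda: func(WORDS), number=20)
    print(f"{func.__name__:<24} {seconds:.3f} s")
```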