Count Word Frequency

Problem

Given a string, count how many times each word appears. Return the results sorted by frequency — most common first.

Input:  "the cat sat on the mat the cat"
Output:
  the  → 3
  cat  → 2
  sat  → 1
  on   → 1
  mat  → 1

Input:  "apple banana apple cherry banana apple"
Output:
  apple  → 3
  banana → 2
  cherry → 1

Logic

Clean the text — lowercase, remove punctuation
Split into individual words
Loop through each word
Count how many times each word appears
Sort by count — most frequent first
Display the results
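
The cleaning and splitting steps above can also be sketched with str.translate, which removes every ASCII punctuation character in a single pass. This is an alternative to the character-by-character replace loop used in the solutions, not the method the solutions use; the helper name clean_and_split is hypothetical and exists only for illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def clean_and_split(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    return text.lower().translate(_STRIP_PUNCT).split()

print(clean_and_split("The cat, the mat!"))  # ['the', 'cat', 'the', 'mat']
```

Note that string.punctuation also contains the hyphen and the apostrophe, so "well-known" would become "wellknown" under this sketch.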

Solution 1 — using a dictionary

Build a dictionary where each key is a word and each value is its count.

def word_frequency(text):
    # step 1 — clean the text
    # lowercase so "The" and "the" are the same word
    text = text.lower()

    # remove common punctuation
    for char in ".,!?;:\"'()[]{}":
        text = text.replace(char, "")

    # step 2 — split into words
    words = text.split()

    # step 3 — count each word
    frequency = {}

    for word in words:
        if word in frequency:
            frequency[word] += 1    # seen before — increment
        else:
            frequency[word] = 1     # first time — set to 1

    # step 4 — sort by count descending
    sorted_freq = sorted(frequency.items(), key=lambda x: x[1], reverse=True)

    return sorted_freq


results = word_frequency("the cat sat on the mat the cat")

for word, count in results:
    print(f"  {word:<10} → {count}")

Code Execution — Solution 1

Trace through word_frequency("the cat sat on the mat the cat"):

After cleaning and splitting: words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]

Building the frequency dictionary:

Step	`word`	`in frequency?`	`frequency`
1st	`"the"`	No	`{"the": 1}`
2nd	`"cat"`	No	`{"the": 1, "cat": 1}`
3rd	`"sat"`	No	`{"the": 1, "cat": 1, "sat": 1}`
4th	`"on"`	No	`{"the": 1, "cat": 1, "sat": 1, "on": 1}`
5th	`"the"`	Yes	`{"the": 2, "cat": 1, "sat": 1, "on": 1}`
6th	`"mat"`	No	`{"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}`
7th	`"the"`	Yes	`{"the": 3, "cat": 1, "sat": 1, "on": 1, "mat": 1}`
8th	`"cat"`	Yes	`{"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1}`

Sorting by value descending: [("the", 3), ("cat", 2), ("sat", 1), ("on", 1), ("mat", 1)]
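
Python's sorted() is stable, so words with equal counts ("sat", "on", "mat") keep the order in which they were first added to the dictionary. If an alphabetical tie-break is preferred instead, one possible sketch sorts on a (negative count, word) tuple. The helper name sort_by_count_then_word is hypothetical.

```python
def sort_by_count_then_word(frequency):
    """Sort (word, count) pairs by count descending, then word ascending."""
    # Negating the count gives descending order without reverse=True,
    # which would also reverse the alphabetical tie-break.
    return sorted(frequency.items(), key=lambda item: (-item[1], item[0]))

frequency = {"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1}
print(sort_by_count_then_word(frequency))
# [('the', 3), ('cat', 2), ('mat', 1), ('on', 1), ('sat', 1)]
```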

Solution 2 — using dict.get()

A more compact way to update counts: get(word, 0) returns the current count, or the default 0 if the key does not exist yet.

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    frequency = {}

    for word in words:
        # get current count (default 0) and add 1
        frequency[word] = frequency.get(word, 0) + 1

    # sort by count descending
    return sorted(frequency.items(), key=lambda x: x[1], reverse=True)


results = word_frequency("apple banana apple cherry banana apple")

for word, count in results:
    print(f"  {word:<10} → {count}")

Code Execution — Solution 2

Trace through word_frequency("apple banana apple cherry banana apple"):

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

Step	`word`	`frequency.get(word, 0)`	`+ 1`	`frequency`
1st	`"apple"`	`0`	`1`	`{"apple": 1}`
2nd	`"banana"`	`0`	`1`	`{"apple": 1, "banana": 1}`
3rd	`"apple"`	`1`	`2`	`{"apple": 2, "banana": 1}`
4th	`"cherry"`	`0`	`1`	`{"apple": 2, "banana": 1, "cherry": 1}`
5th	`"banana"`	`1`	`2`	`{"apple": 2, "banana": 2, "cherry": 1}`
6th	`"apple"`	`2`	`3`	`{"apple": 3, "banana": 2, "cherry": 1}`

Sorted: [("apple", 3), ("banana", 2), ("cherry", 1)]

frequency.get(word, 0) + 1 replaces the four-line if/else check from Solution 1 with a single statement. It reads as: get the current count for this word, default to 0 if it does not exist yet, then add 1.
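
A short sketch of how get() behaves on its own may help here. It only reads from the dictionary and never inserts the key, which is why the loop has to assign the result back.

```python
frequency = {"apple": 2}

# Existing key: get() returns the stored count.
print(frequency.get("apple", 0))   # 2

# Missing key: get() returns the default and does not add the key.
print(frequency.get("cherry", 0))  # 0
print("cherry" in frequency)       # False

# Assigning the result back is what actually records the new count.
frequency["cherry"] = frequency.get("cherry", 0) + 1
print(frequency)                   # {'apple': 2, 'cherry': 1}
```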

Solution 3 — using collections.Counter

Python's Counter class is built for exactly this purpose. It counts the hashable items in any iterable.

from collections import Counter

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    # Counter counts everything automatically
    frequency = Counter(words)

    # most_common() returns items sorted by count — highest first
    return frequency.most_common()


results = word_frequency("the cat sat on the mat the cat")

for word, count in results:
    print(f"  {word:<10} → {count}")

# bonus — top 3 most common words
print("\nTop 3:")
for word, count in word_frequency("the cat sat on the mat the cat")[:3]:
    print(f"  {word} ({count})")

Code Execution — Solution 3

Trace through Counter(["the", "cat", "sat", "on", "the", "mat", "the", "cat"]):

Counter runs the same counting loop as Solutions 1 and 2. In CPython that loop is handled by a helper written in C, so it is typically faster than the pure-Python versions.

Result: Counter({"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1})

most_common() returns all items sorted by count, highest first: [("the", 3), ("cat", 2), ("sat", 1), ("on", 1), ("mat", 1)]

most_common(3) returns only the top 3, the same pairs as the [:3] slice used in the code above: [("the", 3), ("cat", 2), ("sat", 1)]
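
The following sketch checks two Counter behaviours that are easy to miss: most_common(3) matches slicing the full list, and looking up a word that was never counted returns 0 rather than raising KeyError.

```python
from collections import Counter

words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
frequency = Counter(words)

top_three = frequency.most_common(3)
print(top_three)                                 # [('the', 3), ('cat', 2), ('sat', 1)]

# Slicing the full list gives the same three pairs.
print(frequency.most_common()[:3] == top_three)  # True

# A missing word reports 0 and is not added to the Counter.
print(frequency["dog"])                          # 0
print("dog" in frequency)                        # False
```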

Solution 4 — using defaultdict

defaultdict(int) automatically creates a value of 0 for any new key — no need for get() or if checks.

from collections import defaultdict

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    # defaultdict(int) starts every new key at 0 automatically
    frequency = defaultdict(int)

    for word in words:
        frequency[word] += 1   # no KeyError — new keys start at 0

    # sort by count descending
    return sorted(frequency.items(), key=lambda x: x[1], reverse=True)


results = word_frequency("the cat sat on the mat the cat")

for word, count in results:
    print(f"  {word:<10} → {count}")

Code Execution — Solution 4

Trace through a shortened word list ["the", "cat", "the"]:

Step	`word`	`frequency[word]` before	`+= 1`	`frequency`
1st	`"the"`	`0` (auto)	`1`	`{"the": 1}`
2nd	`"cat"`	`0` (auto)	`1`	`{"the": 1, "cat": 1}`
3rd	`"the"`	`1`	`2`	`{"the": 2, "cat": 1}`

With defaultdict(int), accessing a key that does not exist automatically creates it with value 0. So frequency["new_word"] += 1 works without any if check or .get().
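
One side effect is worth knowing: with defaultdict, even reading a missing key with square brackets inserts it. The sketch below shows this, along with the fact that get() and the in operator do not trigger the default.

```python
from collections import defaultdict

frequency = defaultdict(int)
frequency["the"] += 1

# Reading a missing key with [] inserts it with the default value 0.
print(frequency["dog"])        # 0
print(dict(frequency))         # {'the': 1, 'dog': 0}

# get() and `in` do not trigger the default, so they leave the dict unchanged.
print(frequency.get("bird"))   # None
print("bird" in frequency)     # False
```

When checking whether a word was counted, "word in frequency" or frequency.get(word, 0) avoids adding zero-count entries by accident.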

Bonus — frequency with percentage

Show each word's count as a percentage of the total number of words.

from collections import Counter

def word_frequency_with_percent(text):
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    total = len(words)
    frequency = Counter(words)

    print(f"{'Word':<12} {'Count':>6} {'Percent':>8}")
    print("-" * 28)

    for word, count in frequency.most_common():
        percent = (count / total) * 100
        print(f"{word:<12} {count:>6} {percent:>7.1f}%")


word_frequency_with_percent("the cat sat on the mat the cat")

Output:

Word           Count  Percent
----------------------------
the                3    37.5%
cat                2    25.0%
sat                1    12.5%
on                 1    12.5%
mat                1    12.5%
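
The function above prints its table directly. If the numbers are needed elsewhere, a possible variant returns the rows instead. This is a sketch only; the name word_percentages and the rounding to one decimal place are assumptions made for illustration.

```python
from collections import Counter

def word_percentages(text):
    """Return (word, count, percent) tuples, most frequent first."""
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    total = len(words)
    # For empty input most_common() is empty, so the division never runs with total == 0.
    return [
        (word, count, round(count / total * 100, 1))
        for word, count in Counter(words).most_common()
    ]

for row in word_percentages("the cat sat on the mat the cat"):
    print(row)
```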

Which solution to use?

Solution	How	Best when
Solution 1	Plain dictionary + if/else	Understanding the logic
Solution 2	`dict.get()`	Cleaner version of Solution 1
Solution 3	`Counter`	Real projects: the most concise, and typically the fastest
Solution 4	`defaultdict`	When building frequency maps for complex data
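
All four approaches should produce the same counts and differ only in how the update is written. A quick way to confirm that is sketched below; the function names are hypothetical, and the cleaning step is omitted to keep the comparison focused on counting.

```python
from collections import Counter, defaultdict

def count_if_else(words):
    frequency = {}
    for word in words:
        if word in frequency:
            frequency[word] += 1
        else:
            frequency[word] = 1
    return frequency

def count_get(words):
    frequency = {}
    for word in words:
        frequency[word] = frequency.get(word, 0) + 1
    return frequency

def count_defaultdict(words):
    frequency = defaultdict(int)
    for word in words:
        frequency[word] += 1
    return frequency

words = "the cat sat on the mat the cat".split()
expected = dict(Counter(words))

# Convert each result to a plain dict so only keys and values are compared.
all_agree = all(
    dict(count(words)) == expected
    for count in (count_if_else, count_get, count_defaultdict)
)
print(all_agree)  # True
```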

Output

  the        → 3
  cat        → 2
  sat        → 1
  on         → 1
  mat        → 1

  apple      → 3
  banana     → 2
  cherry     → 1
