Count Word Frequency

Learn how to count how many times each word appears in a string in Python.

Problem

Given a string, count how many times each word appears. Return the results sorted by frequency — most common first.

Input:  "the cat sat on the mat the cat"
Output:
  the  → 3
  cat  → 2
  sat  → 1
  on   → 1
  mat  → 1

Input:  "apple banana apple cherry banana apple"
Output:
  apple  → 3
  banana → 2
  cherry → 1

Logic

  1. Clean the text — lowercase, remove punctuation
  2. Split into individual words
  3. Loop through each word
  4. Count how many times each word appears
  5. Sort by count — most frequent first
  6. Display the results
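
Steps 1 and 2 can be sketched with the standard library's string.punctuation and str.translate, which delete every listed character in one pass instead of one replace() call per character. The helper name clean_and_split is hypothetical. One assumption to note: this strips all ASCII punctuation, including hyphens and apostrophes, which is broader than the character lists used in the solutions on this page.

```python
import string

# Translation table that maps every ASCII punctuation character to None (deleted).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_and_split(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()

print(clean_and_split("The cat, the CAT!"))
# ['the', 'cat', 'the', 'cat']
```

Because apostrophes are removed, "don't" becomes "dont"; keep that in mind if contractions matter for your text.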

Flow

  Start
  1. Clean and lowercase the text
  2. Split into words
  3. frequency = empty dict
  4. Take each word
  5. word in frequency?
       Yes: frequency[word] += 1
       No:  frequency[word] = 1
  6. More words?
       Yes: go back to step 4
       No:  continue to step 7
  7. Sort by count descending
  8. Return sorted result

Solution 1 — using a dictionary

Build a dictionary where each key is a word and each value is its count.

def word_frequency(text):
    # step 1 — clean the text
    # lowercase so "The" and "the" are the same word
    text = text.lower()

    # remove common punctuation
    for char in ".,!?;:\"'()[]{}":
        text = text.replace(char, "")

    # step 2 — split into words
    words = text.split()

    # step 3 — count each word
    frequency = {}

    for word in words:
        if word in frequency:
            frequency[word] += 1    # seen before — increment
        else:
            frequency[word] = 1     # first time — set to 1

    # step 4 — sort by count descending
    sorted_freq = sorted(frequency.items(), key=lambda x: x[1], reverse=True)

    return sorted_freq


results = word_frequency("the cat sat on the mat the cat")

for word, count in results:
    print(f"  {word:<10}{count}")

Code Execution — Solution 1

Trace through word_frequency("the cat sat on the mat the cat"):

After cleaning and splitting: words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]

Building the frequency dictionary:

Step  word    in frequency?  frequency
--------------------------------------------------------------------------------
1st   "the"   No             {"the": 1}
2nd   "cat"   No             {"the": 1, "cat": 1}
3rd   "sat"   No             {"the": 1, "cat": 1, "sat": 1}
4th   "on"    No             {"the": 1, "cat": 1, "sat": 1, "on": 1}
5th   "the"   Yes            {"the": 2, "cat": 1, "sat": 1, "on": 1}
6th   "mat"   No             {"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}
7th   "the"   Yes            {"the": 3, "cat": 1, "sat": 1, "on": 1, "mat": 1}
8th   "cat"   Yes            {"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1}

Sorting by value descending: [("the", 3), ("cat", 2), ("sat", 1), ("on", 1), ("mat", 1)]
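
The three words with a count of 1 come out in the order they first appeared. Dictionaries preserve insertion order (Python 3.7+) and sorted() is stable, so tied items keep their original relative order. If an alphabetical tie-break is preferred instead, one option is a two-part sort key. The sketch below assumes that preference and uses a hypothetical helper name, sort_by_count_then_word.

```python
def sort_by_count_then_word(frequency):
    """Sort (word, count) pairs by count descending, then word ascending."""
    # Negating the count puts high counts first while words stay A to Z.
    return sorted(frequency.items(), key=lambda item: (-item[1], item[0]))

frequency = {"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1}
result = sort_by_count_then_word(frequency)
print(result)
# [('the', 3), ('cat', 2), ('mat', 1), ('on', 1), ('sat', 1)]
```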


Solution 2 — using dict.get()

A cleaner way to update counts: get(word, 0) returns the current count, or the default 0 if the key does not exist yet.

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    frequency = {}

    for word in words:
        # get current count (default 0) and add 1
        frequency[word] = frequency.get(word, 0) + 1

    # sort by count descending
    return sorted(frequency.items(), key=lambda x: x[1], reverse=True)


results = word_frequency("apple banana apple cherry banana apple")

for word, count in results:
    print(f"  {word:<10}{count}")

Code Execution — Solution 2

Trace through word_frequency("apple banana apple cherry banana apple"):

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

Step  word      frequency.get(word, 0)  + 1  frequency
-------------------------------------------------------------------------------------
1st   "apple"   0                       1    {"apple": 1}
2nd   "banana"  0                       1    {"apple": 1, "banana": 1}
3rd   "apple"   1                       2    {"apple": 2, "banana": 1}
4th   "cherry"  0                       1    {"apple": 2, "banana": 1, "cherry": 1}
5th   "banana"  1                       2    {"apple": 2, "banana": 2, "cherry": 1}
6th   "apple"   2                       3    {"apple": 3, "banana": 2, "cherry": 1}

Sorted: [("apple", 3), ("banana", 2), ("cherry", 1)]

frequency.get(word, 0) + 1 replaces the four-line if/else from Solution 1 with a single expression. Read it as: get the current count for this word, default to 0 if it does not exist yet, then add 1.
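
A small standalone sketch of how get() behaves, using a made-up starting dictionary: it returns the default for a missing key without adding that key, so the count is only stored once the result is assigned back.

```python
frequency = {"apple": 2}

# Existing key: get() returns the stored count.
print(frequency.get("apple", 0))    # 2

# Missing key: get() returns the default and does NOT add the key.
print(frequency.get("cherry", 0))   # 0
print("cherry" in frequency)        # False

# The key only appears once the result is assigned back.
frequency["cherry"] = frequency.get("cherry", 0) + 1
print(frequency)                    # {'apple': 2, 'cherry': 1}
```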


Solution 3 — using collections.Counter

Python's Counter class is built exactly for this purpose. It counts items in any iterable automatically.

from collections import Counter

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    # Counter counts everything automatically
    frequency = Counter(words)

    # most_common() returns items sorted by count — highest first
    return frequency.most_common()


results = word_frequency("the cat sat on the mat the cat")

for word, count in results:
    print(f"  {word:<10}{count}")

# bonus — top 3 most common words
print("\nTop 3:")
for word, count in word_frequency("the cat sat on the mat the cat")[:3]:
    print(f"  {word} ({count})")

Code Execution — Solution 3

Trace through Counter(["the", "cat", "sat", "on", "the", "mat", "the", "cat"]):

Counter does the same counting as Solutions 1 and 2. In CPython the counting loop normally runs in a C helper, which is typically faster than an equivalent loop written in Python.

Result: Counter({"the": 3, "cat": 2, "sat": 1, "on": 1, "mat": 1})

most_common() — returns all items sorted by count: [("the", 3), ("cat", 2), ("sat", 1), ("on", 1), ("mat", 1)]

most_common(3) — returns only the top 3: [("the", 3), ("cat", 2), ("sat", 1)]
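
The function above returns a plain list, so the bonus code takes the top 3 by slicing. When you hold the Counter itself, most_common(3) can be called on it directly. The sketch below also shows that a Counter reports 0 for a word it has never seen; the word "dog" is just an illustrative example.

```python
from collections import Counter

words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
frequency = Counter(words)

# Ask Counter for the top 3 directly instead of slicing the full list.
top_three = frequency.most_common(3)
print(top_three)            # [('the', 3), ('cat', 2), ('sat', 1)]

# A word that never appeared reports 0 rather than raising KeyError.
print(frequency["dog"])     # 0
```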


Solution 4 — using defaultdict

defaultdict(int) automatically creates a value of 0 for any new key — no need for get() or if checks.

from collections import defaultdict

def word_frequency(text):
    # clean and split
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    # defaultdict(int) starts every new key at 0 automatically
    frequency = defaultdict(int)

    for word in words:
        frequency[word] += 1   # no KeyError — new keys start at 0

    # sort by count descending
    return sorted(frequency.items(), key=lambda x: x[1], reverse=True)


results = word_frequency("the cat sat on the mat the cat")

for word, count in results:
    print(f"  {word:<10}{count}")

Code Execution — Solution 4

Trace through a shortened input, ["the", "cat", "the"]:

Step  word    frequency[word] before  += 1  frequency
---------------------------------------------------------------
1st   "the"   0 (auto)                1     {"the": 1}
2nd   "cat"   0 (auto)                1     {"the": 1, "cat": 1}
3rd   "the"   1                       2     {"the": 2, "cat": 1}

With defaultdict(int), accessing a key that does not exist automatically creates it with value 0. So frequency["new_word"] += 1 works without any if check or .get().
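
One side effect worth knowing: because a lookup with [] creates the key, simply reading a missing word adds it to the dictionary with a count of 0. A short sketch, with "dog" and "bird" as illustrative words:

```python
from collections import defaultdict

frequency = defaultdict(int)
frequency["the"] += 1

# A membership test does not create the key.
print("dog" in frequency)        # False

# Reading a missing key with [] inserts it with the default value 0.
print(frequency["dog"])          # 0
print("dog" in frequency)        # True
print(dict(frequency))           # {'the': 1, 'dog': 0}

# get() reads without inserting.
print(frequency.get("bird", 0))  # 0
print("bird" in frequency)       # False
```

If stray zero-count entries would be a problem, check with in or get() when reading.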


Bonus — frequency with percentage

Show each word as a percentage of total words.

from collections import Counter

def word_frequency_with_percent(text):
    text = text.lower()
    for char in ".,!?;:\"'":
        text = text.replace(char, "")
    words = text.split()

    total = len(words)
    frequency = Counter(words)

    print(f"{'Word':<12} {'Count':>6} {'Percent':>8}")
    print("-" * 28)

    for word, count in frequency.most_common():
        percent = (count / total) * 100
        print(f"{word:<12} {count:>6} {percent:>7.1f}%")


word_frequency_with_percent("the cat sat on the mat the cat")

Output:

Word           Count  Percent
----------------------------
the                3    37.5%
cat                2    25.0%
sat                1    12.5%
on                 1    12.5%
mat                1    12.5%
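
The function above prints its table directly. If the numbers are needed elsewhere (for tests or further processing), a variant that returns the rows may be easier to work with. This is a sketch with a hypothetical name, word_percentages; punctuation stripping is left out to keep it short.

```python
from collections import Counter

def word_percentages(text):
    """Return (word, count, percent) tuples, most common first."""
    words = text.lower().split()   # punctuation stripping omitted for brevity
    total = len(words)
    # For empty input the loop body never runs, so there is no division by zero.
    return [
        (word, count, round(count / total * 100, 1))
        for word, count in Counter(words).most_common()
    ]

rows = word_percentages("the cat sat on the mat the cat")
print(rows[0])                 # ('the', 3, 37.5)
print(word_percentages(""))    # []
```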

Which solution to use?

Solution    How                         Best when
----------------------------------------------------------------------------------
Solution 1  Plain dictionary + if/else  Understanding the logic
Solution 2  dict.get()                  Cleaner version of Solution 1
Solution 3  Counter                     Real projects: shortest code and usually the fastest
Solution 4  defaultdict                 When building frequency maps for complex data
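
To check the speed difference on your own machine, the four counting approaches can be timed with timeit. The sketch below first confirms they agree on the counts; the function names, the repeated sample text, and the repetition count are arbitrary choices for illustration. Timings depend on the machine and Python version, so no numbers are shown.

```python
import timeit
from collections import Counter, defaultdict

# Sample input: the page's example sentence repeated to get a measurable workload.
words = "the cat sat on the mat the cat".split() * 1000

def count_if_else(words):
    frequency = {}
    for word in words:
        if word in frequency:
            frequency[word] += 1
        else:
            frequency[word] = 1
    return frequency

def count_get(words):
    frequency = {}
    for word in words:
        frequency[word] = frequency.get(word, 0) + 1
    return frequency

def count_defaultdict(words):
    frequency = defaultdict(int)
    for word in words:
        frequency[word] += 1
    return frequency

# All approaches should agree on the counts.
expected = count_if_else(words)
assert count_get(words) == expected
assert dict(count_defaultdict(words)) == expected
assert dict(Counter(words)) == expected

# Time each approach over the same input.
for func in (count_if_else, count_get, count_defaultdict, Counter):
    seconds = timeit.timeit(lambda: func(words), number=20)
    print(f"{func.__name__:<18}{seconds:.4f}s")
```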

Output

  the       3
  cat       2
  sat       1
  on        1
  mat       1

  apple     3
  banana    2
  cherry    1
