Remove Duplicates

Learn how to remove duplicate values from a list in Python while preserving order.

Problem

Given a list, remove all duplicate values and return only unique values. The order of first appearance must be preserved.

Input:  [1, 2, 3, 2, 4, 3, 5]
Output: [1, 2, 3, 4, 5]

Input:  ["apple", "banana", "apple", "cherry", "banana"]
Output: ["apple", "banana", "cherry"]

Input:  [1, 1, 1, 1]
Output: [1]

Input:  [3, 1, 4, 1, 5, 9, 2, 6, 5]
Output: [3, 1, 4, 5, 9, 2, 6]

Logic

  1. Create an empty result list
  2. Keep track of items already seen
  3. Loop through every item
  4. If the item has not been seen — add it to result and mark it as seen
  5. If the item has been seen — skip it
  6. Return the result

Flow

[Flowchart] Start: result = empty list, seen = empty set. Take each item and ask "item in seen?". Yes: skip, it is already in result. No: add it to result and add it to seen. Then ask "More items?". Yes: take the next item. No: return result.

Solution 1 — using a set to track seen items

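A set checks membership in O(1) time on average, which makes it the fastest practical way to check whether an item has been seen before. Items must be hashable (numbers, strings, tuples), so this approach does not work directly on a list of lists or a list of dictionaries.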

def remove_duplicates(items):
    seen = set()     # tracks items we have already added
    result = []      # final list with no duplicates

    for item in items:
        if item not in seen:
            result.append(item)   # first time seeing this — add it
            seen.add(item)        # mark it as seen

    return result


print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]

print(remove_duplicates(["apple", "banana", "apple", "cherry", "banana"]))
# ['apple', 'banana', 'cherry']

print(remove_duplicates([1, 1, 1, 1]))
# [1]

print(remove_duplicates([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]
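
Because the seen set in Solution 1 only accepts hashable items, passing a list of lists raises a TypeError. The sketch below shows that failure and one possible workaround, converting each inner list to a tuple; the workaround and the sample data are illustrative assumptions, not part of the original solution.

```python
def remove_duplicates(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result


nested = [[1, 2], [1, 2], [3]]   # hypothetical sample data

try:
    remove_duplicates(nested)
except TypeError:
    # lists are mutable and therefore unhashable, so the set lookup fails
    print("items must be hashable")

# Assumed workaround: tuples are hashable, so convert before deduplicating
deduplicated = remove_duplicates([tuple(inner) for inner in nested])
print(deduplicated)
# [(1, 2), (3,)]
```

Note that the result then contains tuples rather than lists; convert back with list() if the caller needs lists.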

Code Execution — Solution 1

Trace through remove_duplicates([1, 2, 3, 2, 4, 3, 5]):

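| Step | item | item in seen? | result | seen |
|---|---|---|---|---|
| Start | | | [] | set() |
| 1st | 1 | No | [1] | {1} |
| 2nd | 2 | No | [1, 2] | {1, 2} |
| 3rd | 3 | No | [1, 2, 3] | {1, 2, 3} |
| 4th | 2 | Yes | [1, 2, 3] | {1, 2, 3} |
| 5th | 4 | No | [1, 2, 3, 4] | {1, 2, 3, 4} |
| 6th | 3 | Yes | [1, 2, 3, 4] | {1, 2, 3, 4} |
| 7th | 5 | No | [1, 2, 3, 4, 5] | {1, 2, 3, 4, 5} |
| Done | | | [1, 2, 3, 4, 5] | |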

We use both a result list and a seen set. The list preserves order. The set gives us fast lookup: item in list has to scan the elements one by one, so it gets slower as the list grows, while item in set takes roughly constant time on average regardless of size.
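
The difference in lookup cost can be measured with a rough timing sketch. The snippet below uses the standard-library timeit module; the helper name time_membership and the list size are illustrative choices, and the absolute numbers will vary from machine to machine.

```python
import timeit


def time_membership(container, target, repeats=100):
    """Return the total seconds spent on `repeats` membership checks."""
    return timeit.timeit(lambda: target in container, number=repeats)


size = 50_000                 # illustrative size, not from the original text
as_list = list(range(size))
as_set = set(as_list)
target = size - 1             # last element: the worst case for a list scan

list_seconds = time_membership(as_list, target)
set_seconds = time_membership(as_set, target)

print(f"list: {list_seconds:.6f} s for 100 checks")
print(f"set:  {set_seconds:.6f} s for 100 checks")
```

On typical hardware the list figure should be far larger than the set figure, and the gap should widen as size grows.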


Solution 2 — using dict.fromkeys()

dict.fromkeys() creates a dictionary from a list where each item is a key. Since dictionary keys are unique, duplicates are removed. In Python 3.7+, dictionaries preserve insertion order.

def remove_duplicates(items):
    # dict.fromkeys() removes duplicates and preserves order
    # then convert back to a list
    return list(dict.fromkeys(items))


print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]

print(remove_duplicates(["apple", "banana", "apple", "cherry", "banana"]))
# ['apple', 'banana', 'cherry']

print(remove_duplicates([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]

Code Execution — Solution 2

Trace through remove_duplicates([1, 2, 3, 2, 4, 3, 5]):

| Step | Code | Result |
|---|---|---|
| Input | items | [1, 2, 3, 2, 4, 3, 5] |
| dict.fromkeys() | keys: 1, 2, 3, 2, 4, 3, 5 | {1: None, 2: None, 3: None, 4: None, 5: None} |
| Duplicate 2 | already a key | ignored |
| Duplicate 3 | already a key | ignored |
| list() | convert back | [1, 2, 3, 4, 5] |

The dictionary stores each item as a key with None as the value. Keys are unique — so duplicates are silently ignored. The order of first insertion is preserved.
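
To make the intermediate step visible, the one-liner can be split in two. This is only a small sketch; the variable names as_dict and unique are chosen for illustration.

```python
items = [1, 2, 3, 2, 4, 3, 5]

as_dict = dict.fromkeys(items)   # every item becomes a key with the value None
print(as_dict)
# {1: None, 2: None, 3: None, 4: None, 5: None}

unique = list(as_dict)           # iterating a dict yields its keys in insertion order
print(unique)
# [1, 2, 3, 4, 5]
```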


Solution 3 — using a loop without a set

Check if the item is already in the result list before adding. Simpler but slower for large lists.

def remove_duplicates(items):
    result = []

    for item in items:
        # only add if not already in result
        if item not in result:
            result.append(item)

    return result


print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]

print(remove_duplicates(["apple", "banana", "apple", "cherry"]))
# ['apple', 'banana', 'cherry']

Code Execution — Solution 3

Trace through remove_duplicates([3, 1, 4, 1, 5]):

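| Step | item | item in result? | result |
|---|---|---|---|
| Start | | | [] |
| 1st | 3 | No | [3] |
| 2nd | 1 | No | [3, 1] |
| 3rd | 4 | No | [3, 1, 4] |
| 4th | 1 | Yes, skip | [3, 1, 4] |
| 5th | 5 | No | [3, 1, 4, 5] |
| Done | | | [3, 1, 4, 5] |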

item not in result scans the result list on every check, which is O(n) per check and O(n²) for the whole loop. For a list of 1,000 unique items, that adds up to roughly 500,000 comparisons in the worst case. Solution 1 with a set is O(1) per check on average. Use Solution 3 only for small lists, when you want to keep the code simple, or when the items are not hashable.
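
The cost of the list scan can be checked by counting comparisons directly. The sketch below replaces item not in result with an explicit inner loop so each comparison can be tallied; the function name count_comparisons is hypothetical, and the count reflects this hand-written scan rather than CPython's internal implementation.

```python
def count_comparisons(items):
    """Deduplicate like Solution 3, but also count equality checks."""
    result = []
    comparisons = 0

    for item in items:
        found = False
        for existing in result:      # explicit version of `item in result`
            comparisons += 1
            if existing == item:
                found = True
                break
        if not found:
            result.append(item)

    return result, comparisons


# Worst case: every item is unique, so each scan covers the whole result list
_, worst = count_comparisons(list(range(1000)))
print(worst)
# 499500

# Best case: every item equals the first, so each scan stops after one check
_, best = count_comparisons([1] * 1000)
print(best)
# 999
```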


Solution 4 — using list comprehension with enumerate

Keep an item only if it is the first time it appears — check if its first index matches the current position.

def remove_duplicates(items):
    # keep item only if its first occurrence index matches current index
    # items.index(item) returns the FIRST position of item
    # if i == items.index(item) — this is the first time we see it
    return [item for i, item in enumerate(items) if items.index(item) == i]


print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]

print(remove_duplicates(["apple", "banana", "apple", "cherry"]))
# ['apple', 'banana', 'cherry']

Code Execution — Solution 4

Trace through remove_duplicates([1, 2, 3, 2, 4]):

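| i | item | items.index(item) | i == index? | Kept? |
|---|---|---|---|---|
| 0 | 1 | 0 | 0 == 0? Yes | Yes |
| 1 | 2 | 1 | 1 == 1? Yes | Yes |
| 2 | 3 | 2 | 2 == 2? Yes | Yes |
| 3 | 2 | 1 | 3 == 1? No | No |
| 4 | 4 | 4 | 4 == 4? Yes | Yes |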

Result: [1, 2, 3, 4]

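When i=3 and item=2, items.index(2) returns 1 (the position where 2 first appeared). Since 3 != 1, this is a duplicate and gets skipped. Because items.index() scans the list from the start on every call, this approach is O(n²) overall, like Solution 3.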


Bonus — remove duplicates from a list of dictionaries

A common real-world problem — deduplicate a list of records by a specific key.

def remove_duplicate_dicts(items, key):
    seen = set()
    result = []

    for item in items:
        # use the value of the key as the identifier
        identifier = item[key]

        if identifier not in seen:
            result.append(item)
            seen.add(identifier)

    return result


users = [
    {"id": 1, "name": "Ahmad"},
    {"id": 2, "name": "Sara"},
    {"id": 1, "name": "Ahmad"},   # duplicate id
    {"id": 3, "name": "Omar"},
    {"id": 2, "name": "Sara"},    # duplicate id
]

unique_users = remove_duplicate_dicts(users, key="id")
for user in unique_users:
    print(user)

Output:

{'id': 1, 'name': 'Ahmad'}
{'id': 2, 'name': 'Sara'}
{'id': 3, 'name': 'Omar'}
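
The same idea extends to records that are only duplicates when several fields match together. The sketch below is one possible variation, not part of the original solution: the function name remove_duplicate_dicts_by_fields and the orders data are hypothetical, and it assumes every record contains all the listed fields with hashable values.

```python
def remove_duplicate_dicts_by_fields(items, keys):
    """Keep the first record for each distinct combination of `keys`."""
    seen = set()
    result = []

    for item in items:
        # a tuple of field values is hashable, so it can go in the set
        identifier = tuple(item[k] for k in keys)

        if identifier not in seen:
            result.append(item)
            seen.add(identifier)

    return result


orders = [
    {"user": "Ahmad", "product": "pen", "qty": 1},
    {"user": "Ahmad", "product": "book", "qty": 2},
    {"user": "Ahmad", "product": "pen", "qty": 5},   # same user and product
    {"user": "Sara", "product": "pen", "qty": 1},
]

unique_orders = remove_duplicate_dicts_by_fields(orders, keys=("user", "product"))
for order in unique_orders:
    print(order)
# {'user': 'Ahmad', 'product': 'pen', 'qty': 1}
# {'user': 'Ahmad', 'product': 'book', 'qty': 2}
# {'user': 'Sara', 'product': 'pen', 'qty': 1}
```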

Which solution to use?

| Solution | How | Best when |
|---|---|---|
| Solution 1 | Set for seen tracking | Best performance; use this by default |
| Solution 2 | dict.fromkeys() | Cleanest one-liner |
| Solution 3 | item not in result | Small lists, simplest to read |
| Solution 4 | enumerate + index | Clever but avoid for large lists |
| Bonus | Set with key | Deduplicating dictionaries by field |

Output

[1, 2, 3, 4, 5]
['apple', 'banana', 'cherry']
[1]
[3, 1, 4, 5, 9, 2, 6]
