Remove Duplicates
Learn how to remove duplicate values from a list in Python while preserving order.
Remove Duplicates
Problem
Given a list, remove all duplicate values and return only unique values. The order of first appearance must be preserved.
Input: [1, 2, 3, 2, 4, 3, 5]
Output: [1, 2, 3, 4, 5]
Input: ["apple", "banana", "apple", "cherry", "banana"]
Output: ["apple", "banana", "cherry"]
Input: [1, 1, 1, 1]
Output: [1]
Input: [3, 1, 4, 1, 5, 9, 2, 6, 5]
Output: [3, 1, 4, 5, 9, 2, 6]Logic
- Create an empty result list
- Keep track of items already seen
- Loop through every item
- If the item has not been seen — add it to result and mark it as seen
- If the item has been seen — skip it
- Return the result
Flow
Solution 1 — using a set to track seen items
A set checks membership in O(1) — the fastest way to check if something has been seen before.
def remove_duplicates(items):
seen = set() # tracks items we have already added
result = [] # final list with no duplicates
for item in items:
if item not in seen:
result.append(item) # first time seeing this — add it
seen.add(item) # mark it as seen
return result
print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]
print(remove_duplicates(["apple", "banana", "apple", "cherry", "banana"]))
# ['apple', 'banana', 'cherry']
print(remove_duplicates([1, 1, 1, 1]))
# [1]
print(remove_duplicates([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]Code Execution — Solution 1
Trace through remove_duplicates([1, 2, 3, 2, 4, 3, 5]):
| Step | item | item in seen? | result | seen |
|---|---|---|---|---|
| Start | — | — | [] | {} |
| 1st | 1 | No | [1] | {1} |
| 2nd | 2 | No | [1, 2] | {1, 2} |
| 3rd | 3 | No | [1, 2, 3] | {1, 2, 3} |
| 4th | 2 | Yes | [1, 2, 3] | {1, 2, 3} |
| 5th | 4 | No | [1, 2, 3, 4] | {1, 2, 3, 4} |
| 6th | 3 | Yes | [1, 2, 3, 4] | {1, 2, 3, 4} |
| 7th | 5 | No | [1, 2, 3, 4, 5] | {1, 2, 3, 4, 5} |
| Done | [1, 2, 3, 4, 5] |
We use both a result list and a seen set. The list preserves order. The set gives us fast O(1) lookup — checking item in list gets slower as the list grows, but item in set is always instant regardless of size.
Solution 2 — using dict.fromkeys()
dict.fromkeys() creates a dictionary from a list where each item is a key. Since dictionary keys are unique, duplicates are removed. In Python 3.7+, dictionaries preserve insertion order.
def remove_duplicates(items):
# dict.fromkeys() removes duplicates and preserves order
# then convert back to a list
return list(dict.fromkeys(items))
print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]
print(remove_duplicates(["apple", "banana", "apple", "cherry", "banana"]))
# ['apple', 'banana', 'cherry']
print(remove_duplicates([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]Code Execution — Solution 2
Trace through remove_duplicates([1, 2, 3, 2, 4, 3, 5]):
| Step | Code | Result |
|---|---|---|
| Input | items | [1, 2, 3, 2, 4, 3, 5] |
dict.fromkeys() | keys: 1, 2, 3, 2, 4, 3, 5 | {1: None, 2: None, 3: None, 4: None, 5: None} |
Duplicate 2 | already a key | ignored |
Duplicate 3 | already a key | ignored |
list() | convert back | [1, 2, 3, 4, 5] |
The dictionary stores each item as a key with None as the value. Keys are unique — so duplicates are silently ignored. The order of first insertion is preserved.
Solution 3 — using a loop without a set
Check if the item is already in the result list before adding. Simpler but slower for large lists.
def remove_duplicates(items):
result = []
for item in items:
# only add if not already in result
if item not in result:
result.append(item)
return result
print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]
print(remove_duplicates(["apple", "banana", "apple", "cherry"]))
# ['apple', 'banana', 'cherry']Code Execution — Solution 3
Trace through remove_duplicates([3, 1, 4, 1, 5]):
| Step | item | item in result? | result |
|---|---|---|---|
| Start | — | — | [] |
| 1st | 3 | No | [3] |
| 2nd | 1 | No | [3, 1] |
| 3rd | 4 | No | [3, 1, 4] |
| 4th | 1 | Yes — skip | [3, 1, 4] |
| 5th | 5 | No | [3, 1, 4, 5] |
| Done | [3, 1, 4, 5] |
item not in result checks the entire list every time — this is O(n) per check. For a list of 1000 items, that is up to 1,000,000 comparisons. Solution 1 with a set is O(1) per check — always instant. Use Solution 3 only for small lists or when you want to keep the code simple.
Solution 4 — using list comprehension with enumerate
Keep an item only if it is the first time it appears — check if its first index matches the current position.
def remove_duplicates(items):
# keep item only if its first occurrence index matches current index
# items.index(item) returns the FIRST position of item
# if i == items.index(item) — this is the first time we see it
return [item for i, item in enumerate(items) if items.index(item) == i]
print(remove_duplicates([1, 2, 3, 2, 4, 3, 5]))
# [1, 2, 3, 4, 5]
print(remove_duplicates(["apple", "banana", "apple", "cherry"]))
# ['apple', 'banana', 'cherry']Code Execution — Solution 4
Trace through remove_duplicates([1, 2, 3, 2, 4]):
i | item | items.index(item) | i == index? | Kept? |
|---|---|---|---|---|
0 | 1 | 0 | Yes | ✅ |
1 | 2 | 1 | Yes | ✅ |
2 | 3 | 2 | Yes | ✅ |
3 | 2 | 1 | 3 == 1? No | ❌ |
4 | 4 | 4 | Yes | ✅ |
Result: [1, 2, 3, 4]
When i=3 and item=2 — items.index(2) returns 1 (the first time 2 appeared). Since 3 != 1, this is a duplicate and gets skipped.
Bonus — remove duplicates from a list of dictionaries
A common real-world problem — deduplicate a list of records by a specific key.
def remove_duplicate_dicts(items, key):
seen = set()
result = []
for item in items:
# use the value of the key as the identifier
identifier = item[key]
if identifier not in seen:
result.append(item)
seen.add(identifier)
return result
users = [
{"id": 1, "name": "Ahmad"},
{"id": 2, "name": "Sara"},
{"id": 1, "name": "Ahmad"}, # duplicate id
{"id": 3, "name": "Omar"},
{"id": 2, "name": "Sara"}, # duplicate id
]
unique_users = remove_duplicate_dicts(users, key="id")
for user in unique_users:
print(user)Output:
{'id': 1, 'name': 'Ahmad'}
{'id': 2, 'name': 'Sara'}
{'id': 3, 'name': 'Omar'}Which solution to use?
| Solution | How | Best when |
|---|---|---|
| Solution 1 | Set for seen tracking | Best performance — use this by default |
| Solution 2 | dict.fromkeys() | Cleanest one-liner |
| Solution 3 | item not in result | Small lists, simplest to read |
| Solution 4 | enumerate + index | Clever but avoid for large lists |
| Bonus | Set with key | Deduplicating dictionaries by field |
Output
[1, 2, 3, 4, 5]
['apple', 'banana', 'cherry']
[1]
[3, 1, 4, 5, 9, 2, 6]