Introduction
The difference between deep and shallow copies is something we often learn early in our educational journey with Python. The concepts themselves aren't especially complex. However, mistakes relating to copies still wind up in code from time to time.
Generally, the mistakes I see aren't deliberate errors but rather the result of working quickly on lower-priority tasks against a deadline. Nonetheless, these mistakes can be challenging to detect and fix, so it's worth reviewing the concepts to avoid the errors before they happen.
Deep Copy vs. Shallow Copy
Here's what the docs have to say about deep and shallow copies as of Python 3.11.4:
- A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
- A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
In basic terms, a shallow copy of a dictionary has keys that point to the very same value objects as the original dictionary. Those values exist just once in memory. We can see this by comparing the memory addresses (via id()) of a value in each dictionary:
import copy
d1 = {'k1': [1, ]}
d2 = copy.copy(d1) # shallow copy
# also
# d2 = d1.copy()
# show that these values have the same memory address
print(id(d1['k1']))
print(id(d2['k1']))
print(id(d1['k1'])==id(d2['k1']))
# output (exact addresses will vary from run to run)
4766704192
4766704192
True
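Does that make d1 and d2 the same object? No. The dictionaries themselves are distinct; only the values inside them are shared. That holds for every value, including immutable ones like integers and strings, but because immutable values can't be changed in place, sharing them is harmless. We can confirm that the two dictionaries live at different addresses: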
import copy
d1 = {'k1': [1, ]}
d2 = copy.copy(d1)
# show that these aren't the same object
print(id(d1))
print(id(d2))
print(id(d1)==id(d2))
# output
4503578560
4503653888
False
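Because the two dictionaries are separate containers that merely share their values, changing which object a key points to in one doesn't affect the other. A minimal sketch, using an extra 'n' key added only for illustration: every value is shared right after the copy, even an immutable int, yet rebinding keys in d2 leaves d1 untouched.

```python
import copy

d1 = {'k1': [1], 'n': 10}
d2 = copy.copy(d1)  # shallow copy

# Right after copying, every value is shared, including the immutable int
print(d2['n'] is d1['n'])    # True

# Rebinding keys in d2 changes only d2's entries, not the shared objects
d2['n'] = 20
d2['k1'] = [99]

print(d1)                    # {'k1': [1], 'n': 10}
print(d2)                    # {'k1': [99], 'n': 20}
```

The trouble only starts when a shared value is mutated in place rather than replaced.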
In contrast, a deep copy recursively copies each mutable value it contains, so the nested list is stored at a different address in memory:
import copy
d1 = {'k1': [1, ]}
d2 = copy.deepcopy(d1) # deep copy
# show that these values have different addresses in memory
print(id(d1['k1']))
print(id(d2['k1']))
print(id(d1['k1'])==id(d2['k1']))
# output
4766999040
4766647808
False
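The word "recursively" matters once the data is nested more than one level deep. A small sketch with an illustrative two-level dictionary: the shallow copy shares the inner dict (and therefore the list inside it), while the deep copy rebuilds every level.

```python
import copy

original = {'outer': {'inner': [1, 2]}}

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# The shallow copy shares the nested dict, and therefore the innermost list too
print(shallow['outer'] is original['outer'])                  # True

# The deep copy recreates every level of nesting
print(deep['outer'] is original['outer'])                     # False
print(deep['outer']['inner'] is original['outer']['inner'])   # False

# Mutating the deep copy's innermost list leaves the original alone
deep['outer']['inner'].append(3)
print(original['outer']['inner'])                             # [1, 2]
```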
Common Implications
The situation I often see is a second object created by "copying" the first with the intent of working with its values independently, especially in testing.
If that copy is shallow, the goal is missed for every mutable value (immutable values such as numbers and strings are safe, since they can't be changed in place). The issue looks relatively benign on the surface; however, unintentionally modified data is a big problem if you need to preserve the contents of the original object.
import copy
d1 = {'k1': [1, ]}
d2 = copy.copy(d1) # shallow copy
d2['k1'].append(2) # add another number to the list
print(d2['k1']) # the list in d2 was updated
print(d1['k1']) # d1 also changed because both dictionaries reference the same list
# output
[1, 2]
[1, 2]
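The same shared-list behavior shows up with any of the usual shortcuts for copying a dictionary, because they all make shallow copies. A small sketch, assuming a hypothetical base dict whose 'tags' value is a list:

```python
import copy

# A hypothetical base dict whose 'tags' value is a mutable list
base = {'tags': ['a']}

# Common idioms for copying a dict; every one of them is shallow
copies = [
    base.copy(),
    dict(base),
    {**base},
    copy.copy(base),
]

# Appending through each "copy" mutates the single list they all share
for c in copies:
    c['tags'].append('x')

print(base['tags'])  # ['a', 'x', 'x', 'x', 'x']
```

The list equivalents, such as slicing with lst[:] or calling list(lst), are shallow in exactly the same way.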
The Fix
The deepcopy() function from the standard library's copy module lets us avoid the issues described above. If our goal is to copy the values so we can work with them independently, this is the route to take. See below how the original values are preserved:
import copy
d1 = {'k1': [1, ]}
d2 = copy.deepcopy(d1) # deep copy
d2['k1'].append(2) # add another number to the list
print(d2['k1']) # the list in d2 was updated
print(d1['k1']) # d1 values have been preserved
# output
[1, 2]
[1]
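deepcopy() also keeps track of the objects it has already copied during a single call. A sketch of two consequences, using illustrative data: a list referenced twice in the original is copied once and stays shared inside the new copy, and a dictionary that refers to itself doesn't cause infinite recursion.

```python
import copy

shared = [1]
d1 = {'a': shared, 'b': shared}
d1['self'] = d1  # a reference cycle back to the dict itself

d2 = copy.deepcopy(d1)

print(d2['a'] is d1['a'])    # False: the list was copied
print(d2['a'] is d2['b'])    # True: copied once, still shared inside the copy
print(d2['self'] is d2)      # True: the cycle now points at the new dict
```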
Assignment
Another mistake I see with making copies is using assignment ("=") instead of copying. The assignment operator doesn't copy the object at all. Instead, it binds a second name to the same object, so d1 and d2 below are two references to one dictionary.
# made a "copy" through assignment
d1 = {'k1': [1, ]}
d2 = d1
# show that these are really the same object
print(id(d1))
print(id(d2))
print(id(d1)==id(d2))
# output
4503578240
4503578240
True
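Because d2 is just a second name for the dictionary d1 refers to, any change made through one name is visible through the other. Rebinding the name, on the other hand, only changes what that name refers to. A short sketch (the 'k2' key is illustrative):

```python
d1 = {'k1': [1]}
d2 = d1            # d2 is a second name for the same dict

d2['k2'] = 'new'   # mutating through either name changes the one object
print(d1)          # {'k1': [1], 'k2': 'new'}
print(d1 is d2)    # True

d2 = {}            # rebinding d2 only changes what the name d2 refers to
print(d1)          # {'k1': [1], 'k2': 'new'}
```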
Final Thoughts
When it comes to making copies, we need to be careful about how they are made if the object contains other mutable objects, such as lists or dictionaries. If the intention is to change the copied data while preserving the source values, then copy.deepcopy() is the way to go.