IAN WALDRON

Mistakes in Making Copies of Python Objects

The differences between deep and shallow copies in Python, mistakes I commonly make, and how I correct for these errors.
July 22, 2023

Introduction

The difference between deep and shallow copies is something we often learn early in our educational journey with Python. The concepts themselves aren't especially difficult to grasp (we'll explore them in the next section). However, I still encounter and make mistakes relating to copies frequently. Generally, the mistakes I see don't come from misunderstanding the concepts but from working quickly on lower-priority tasks against a deadline. Nonetheless, these mistakes can be costly to detect and fix, so it's worth a review to better understand the concepts and avoid the errors before they happen.

Deep Copy vs. Shallow Copy

Here's what the docs have to say about deep and shallow copies as of Python 3.11.4:

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

In basic terms, a shallow copy of a dictionary is a new dictionary whose keys point to the same value objects as the original. Each value exists just once in memory and is shared by both dictionaries. We can see this in the following example:


d1 = {'k1': [1]}
d2 = d1.copy()  # shallow copy
# another way:
#   d2 = copy.copy(d1)  # from the standard library 'copy' module
# note: d2 = d1 is NOT a copy; it only binds a second name to the same dict

# show that these values use the same memory
print(id(d1['k1']))
print(id(d2['k1']))
print(id(d1['k1'])==id(d2['k1']))

# output
4766704192
4766704192
True
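
Because plain assignment is easy to mistake for a shallow copy, here is a minimal sketch that separates the two using the `is` operator. The names `alias` and `shallow` are my own and only for illustration.

```python
d1 = {'k1': [1]}

alias = d1            # plain assignment: a second name for the same dict
shallow = d1.copy()   # a new dict whose values are the same objects as d1's

print(alias is d1)                # True: no copy was made
print(shallow is d1)              # False: the outer dict is new
print(shallow['k1'] is d1['k1'])  # True: the inner list is shared

# Adding a key to the shallow copy leaves d1 alone...
shallow['k2'] = 'new'
print('k2' in d1)                 # False

# ...but adding a key through the alias changes d1, because they are one object
alias['k3'] = 'new'
print('k3' in d1)                 # True
```

So a shallow copy does protect the top level of the container. It is only the objects stored inside it that remain shared.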

In contrast, a deep copy recursively copies each value of the object and stores the copies separately in memory:


import copy  # a standard library module

d1 = {'k1': [1, ]}
d2 = copy.deepcopy(d1) # deep copy

# show that these values use different memory
print(id(d1['k1']))
print(id(d2['k1']))
print(id(d1['k1'])==id(d2['k1']))

# output
4766999040
4766647808
False
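
The word "recursively" matters most once the data is nested more than one level. The following sketch, using a made-up `nested` dictionary, shows how far down each kind of copy reaches.

```python
import copy

nested = {'outer': {'inner': [1, 2]}}

shallow = copy.copy(nested)
deep = copy.deepcopy(nested)

# The shallow copy only duplicates the top-level dict
print(shallow is nested)                                      # False
print(shallow['outer'] is nested['outer'])                    # True
print(shallow['outer']['inner'] is nested['outer']['inner'])  # True

# The deep copy duplicates every mutable container it reaches
print(deep['outer'] is nested['outer'])                       # False
print(deep['outer']['inner'] is nested['outer']['inner'])     # False

# The copied data still compares equal to the original
print(deep == nested)                                         # True
```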

Common Implications

The situation I often run into is one where an object is copied with the intent of working with its attributes independently, especially in testing. For example, I "copy" an object with object_two = object_one and then change one or more attributes of object_two. If my intention was to leave object_one untouched, I've failed: plain assignment makes no copy at all, so both names refer to the same object and every change is visible through either name. Switching to a shallow copy only partly helps. Replacing an attribute on the copy leaves the original alone, but mutating a shared mutable attribute in place (appending to a list, say) still changes it for both. This looks relatively benign on the surface, but data being modified unintentionally is a big problem if you need the initial values downstream.


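d1 = {'k1': [1]}
d2 = d1  # plain assignment: not a copy, just a second name for the same dict
# d2 = d1.copy() would print the same result, because a shallow copy shares the inner list

d2['k1'].append(2)  # add another number to the list
print(d2['k1'])  # the list reached through d2 was updated
print(d1['k1'])  # d1 shows the change too because both names reach the same list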

# output
[1, 2]
[1, 2]
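
To make the mutable versus immutable distinction concrete for a shallow copy, here is a small sketch. The `settings` dictionary and its keys are hypothetical, chosen only to have one immutable value and one mutable value side by side.

```python
settings = {'retries': 3, 'hosts': ['a.example']}
snapshot = settings.copy()  # shallow copy

# Rebinding a key replaces the reference held by the copy only
snapshot['retries'] = 5
print(settings['retries'])  # 3

# Mutating a shared value in place is visible through both dicts
snapshot['hosts'].append('b.example')
print(settings['hosts'])    # ['a.example', 'b.example']
```

The integer was never "protected" by being immutable as such. It simply cannot be changed in place, so the only way to alter `snapshot['retries']` is to point the key at a different object, which leaves `settings` untouched.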


The Fix

With the deepcopy function from the standard library's copy module, we're able to avoid the issues described above. If our goal is to copy an object so we can work with its attributes independently, this is the route to follow. See below how the original values are preserved:


import copy

d1 = {'k1': [1, ]}
d2 = copy.deepcopy(d1) # deep copy

d2['k1'].append(2)  # add another number to the list
print(d2['k1'])  # the list in d2 was updated
print(d1['k1'])  # d1 values have been preserved

# output
[1, 2]
[1]
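
The same approach applies to instances of your own classes, which is the case I hit most often in tests. The sketch below assumes a hypothetical `Order` class with one immutable and one mutable attribute. It is not taken from any real project.

```python
import copy


class Order:
    """Hypothetical object used only for this illustration."""

    def __init__(self, customer, items):
        self.customer = customer  # immutable string
        self.items = items        # mutable list


baseline = Order('Ada', ['book'])

# Make an independent copy to modify inside a test
modified = copy.deepcopy(baseline)
modified.customer = 'Grace'
modified.items.append('pen')

print(baseline.customer, baseline.items)  # Ada ['book']
print(modified.customer, modified.items)  # Grace ['book', 'pen']
```

Had `modified` been created with plain assignment, both lines would print Grace ['book', 'pen']. With copy.copy(baseline), the customer would stay 'Ada' on the original but the appended item would still show up in both.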

Final Thoughts

When it comes to making copies, we need to be careful about how those copies are made if we're dealing with mutable values. Plain assignment makes no copy at all, and a shallow copy still shares any nested mutable values with the original. If the intention is to make changes to the copied data while preserving the source values, then copy.deepcopy() is the way to go.