Mistakes in Making Copies of Python Objects

The differences between deep and shallow copies in Python, common mistakes, and how to correct them.

Introduction

The difference between deep and shallow copies is something we often learn early in our educational journey with Python. The concepts themselves aren't especially complex. However, mistakes relating to copies still wind up in code from time to time.

Generally, the mistakes I see don't come from carelessness about the concepts but from working quickly on lower-priority tasks against a deadline. Nonetheless, these mistakes can be challenging to detect and fix, so it's worth a review to better understand the concepts and avoid the errors before they happen.

Deep Copy vs. Shallow Copy

Here's what the docs have to say about deep and shallow copies as of Python 3.11.4:

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

In basic terms, a shallow copy of a dictionary has keys that point to the same value objects as the original. Each of these values exists just once in memory. We can see this with the following example:

import copy

d1 = {'k1': [1, ]}
d2 = copy.copy(d1) # shallow copy
# also
#   d2 = d1.copy()

# show that these values have the same memory address
print(id(d1['k1']))
print(id(d2['k1']))
print(id(d1['k1'])==id(d2['k1']))

# output
4766704192
4766704192
True

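Then are these the same object? No. The two dictionaries are separate objects, so adding, removing, or reassigning a key in one doesn't affect the other. What they share are the values: every value in the copy, whether a list or an integer, is a reference to the same object as in the original. This only matters for mutable values such as lists and dictionaries, which can be changed in place. Immutable values such as integers and strings can only be replaced, and replacing one rebinds the key in a single dictionary.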

import copy

d1 = {'k1': [1, ]}
d2 = copy.copy(d1)

# show that these aren't the same object
print(id(d1))
print(id(d2))
print(id(d1)==id(d2))

# output
4503578560
4503653888
False
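To make the distinction between sharing a value and rebinding a key concrete, here is a minimal sketch. The extra key 'k2' and its string value are my own additions for illustration and aren't part of the examples above.

```python
import copy

d1 = {'k1': [1], 'k2': 'text'}
d2 = copy.copy(d1)  # shallow copy

# The immutable string is shared between the two dictionaries, not duplicated
shares_string = d1['k2'] is d2['k2']
print(shares_string)  # True

# Rebinding or adding a key only changes the dictionary it is applied to
d2['k2'] = 'changed'
d2['k3'] = 'new key'
print(d1)  # {'k1': [1], 'k2': 'text'}
print(d2)  # {'k1': [1], 'k2': 'changed', 'k3': 'new key'}
```

The string is shared just like the list, but since a string can't be changed in place, the sharing never becomes visible.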

In contrast, a deep copy recursively copies the nested values, so the list in the copy is a different object at a different memory address.

import copy

d1 = {'k1': [1, ]}
d2 = copy.deepcopy(d1) # deep copy

# show that these values have different addresses in memory
print(id(d1['k1']))
print(id(d2['k1']))
print(id(d1['k1'])==id(d2['k1']))

# output
4766999040
4766647808
False

Common Implications

A situation I often see is one where a second object is created by "copying" the first, with the intent of working with its attributes independently, especially in testing.

If my implementation used a shallow copy, I've failed for every mutable value, because changing one in place through the copy also changes it in the original. This looks relatively benign on the surface, but data being modified unintentionally is a big problem if you need to preserve the contents of the initial object.

import copy

d1 = {'k1': [1, ]}
d2 = copy.copy(d1) # shallow copy

d2['k1'].append(2)  # add another number to the list
print(d2['k1'])  # the list in d2 was updated
print(d1['k1'])  # d1 also updated because these dictionaries point to the same value

# output
[1, 2]
[1, 2]
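The same thing happens with the attributes of class instances, which is where I tend to run into it in tests. The sketch below uses a hypothetical Config class that I made up purely for illustration.

```python
import copy


class Config:
    """Hypothetical class, used only to illustrate the behavior."""

    def __init__(self, name, tags):
        self.name = name  # immutable string
        self.tags = tags  # mutable list


original = Config('base', ['a'])
clone = copy.copy(original)  # shallow copy of the instance

clone.name = 'clone'    # rebinding an attribute only affects the clone
clone.tags.append('b')  # in-place change to the shared list

print(original.name)                # base
print(original.tags)                # ['a', 'b']
print(clone.tags is original.tags)  # True
```

Reassigning name is safe, but appending to tags leaks back into the original because both instances hold the same list.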


The Fix

With the deepcopy function from the standard library's copy module, we can avoid the issue described above. If our goal is to copy the attributes so we can work with them independently, this is the route to follow. See below how the values are preserved:


import copy

d1 = {'k1': [1, ]}
d2 = copy.deepcopy(d1) # deep copy

d2['k1'].append(2)  # add another number to the list
print(d2['k1'])  # the list in d2 was updated
print(d1['k1'])  # d1 values have been preserved

# output
[1, 2]
[1]
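The recursive part matters most once structures are nested more than one level. As a rough sketch, with a nested dictionary I made up for illustration, compare how a shallow and a deep copy react to a change made deep inside the original:

```python
import copy

nested = {'outer': {'inner': [1, 2]}}
shallow = copy.copy(nested)   # copies only the outermost dictionary
deep = copy.deepcopy(nested)  # copies every level

# Change the innermost list through the original
nested['outer']['inner'].append(3)

print(shallow['outer']['inner'])  # [1, 2, 3]
print(deep['outer']['inner'])     # [1, 2]
```

Only the deep copy is isolated from the change, because the shallow copy still shares the inner dictionary and its list with the original.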

Assignment

Another mistake I see with making copies is using assignment ("=") instead of copy. Assignment doesn't copy the object at all. Instead, it binds a second name to the same object, so any change made through one name is visible through the other.

# made a "copy" through assignment
d1 = {'k1': [1, ]}
d2 = d1

# show that these are really the same object
print(id(d1))
print(id(d2))
print(id(d1)==id(d2))

# output
4503578240
4503578240
True
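A short sketch of the consequence: with assignment, even top-level changes such as adding a key show up under both names, which isn't the case for a real copy. The key names here are made up for illustration.

```python
d1 = {'k1': [1]}
d2 = d1  # assignment: d2 is just another name for the same dictionary

d2['k2'] = 'added through d2'
print(d1)        # {'k1': [1], 'k2': 'added through d2'}
print(d1 is d2)  # True

d3 = d1.copy()  # an actual (shallow) copy is a separate dictionary
d3['k3'] = 'only in d3'
print('k3' in d1)  # False
```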

Final Thoughts

When it comes to making copies, we need to be careful about how those copies are made if the object contains compound objects such as lists or dictionaries. If the intention is to make changes to the copied data while preserving the source values, then copy.deepcopy() is the way to go.

Published
July 22, 2023