Currently I’m working on a script to search for duplicate images on my hard drive. To do so, I need to search recursively and go down through all subdirectories.
Recursive searching can be done with the os module from the standard library. Its os.walk() function visits the supplied directory and every subdirectory below it, yielding the current directory path, the names of its subdirectories, and the names of its files at each step. There are many ways to identify file types, but in this case I’m using fnmatch to match on the file name. Here’s the official fnmatch description:
This module provides support for Unix shell-style wildcards, which are not the same as regular expressions (which are documented in the re module).
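To make the wildcard idea concrete, here is a minimal sketch of how a few shell-style patterns behave. The file names are made up for illustration, and I use fnmatch.fnmatchcase() so the comparison is case-sensitive on every platform:

```python
import fnmatch

# Made-up file names, purely for illustration.
names = ['cat.jpg', 'dog.JPG', 'notes.txt', 'img1.png', 'img2.png']

# '*' matches any run of characters; fnmatchcase() never ignores case.
jpgs = [n for n in names if fnmatch.fnmatchcase(n, '*.jpg')]

# '[12]' matches exactly one character from the set.
pngs = [n for n in names if fnmatch.fnmatchcase(n, 'img[12].png')]

# '?' matches exactly one character, so '???.jpg' needs a three-letter stem.
three_letter = [n for n in names if fnmatch.fnmatchcase(n, '???.jpg')]

print(jpgs, pngs, three_letter)
```

Note that 'dog.JPG' is not matched by '*.jpg' here, because the pattern is compared case-sensitively.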
Now that we have the modules, here’s the code:

    import os
    import fnmatch

    def find_images(search_directory):
        images = []
        for root, dirnames, filenames in os.walk(search_directory):
            for image in fnmatch.filter(filenames, '*.jpg'):
                images.append(os.path.join(root, image))
        return images

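This function, called find_images(), returns a list of full file paths for every file under the given directory whose name ends in '.jpg'. For each directory that os.walk() visits, fnmatch.filter() picks out the matching file names, and we join each one with root, the directory currently being searched. This gives us back a list of full paths instead of just file names. One thing to keep in mind: on Linux and macOS the match is case-sensitive, so a file named 'photo.JPG' will not be found by the '*.jpg' pattern.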
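If you need more than one extension, or want to match regardless of letter case, one possible variation is sketched below. The name find_images_any_case() and the default extension tuple are my own assumptions, not part of the original script, and it swaps fnmatch for a plain str.endswith() check:

```python
import os
import tempfile

def find_images_any_case(search_directory, extensions=('.jpg', '.jpeg', '.png')):
    """Hypothetical variant of find_images(): several extensions, any letter case."""
    images = []
    for root, dirnames, filenames in os.walk(search_directory):
        for name in filenames:
            # str.endswith() accepts a tuple; lower() makes the check case-insensitive.
            if name.lower().endswith(extensions):
                images.append(os.path.join(root, name))
    return images

# Small self-check against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.jpg', 'b.JPG', os.path.join('sub', 'c.png'), 'notes.txt'):
        open(os.path.join(tmp, rel), 'w').close()
    # Paths relative to the temporary directory, sorted for a stable order.
    found = sorted(os.path.relpath(p, tmp) for p in find_images_any_case(tmp))

print(found)
```

The temporary directory is only there so the sketch can be run without touching your real files.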
Example usage:

    search_path = os.path.expanduser('~/Pictures')

    image_paths = find_images(search_path)

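The example usage only collects the paths, so running it as a script shows nothing. A minimal sketch of a complete script that also prints what it found might look like this. It repeats find_images() so it can run on its own, and it assumes a ~/Pictures folder; if that folder does not exist, os.walk() simply yields nothing and the list is empty:

```python
import fnmatch
import os

def find_images(search_directory):
    # Same logic as the function in this post, repeated so the script is standalone.
    images = []
    for root, dirnames, filenames in os.walk(search_directory):
        for image in fnmatch.filter(filenames, '*.jpg'):
            images.append(os.path.join(root, image))
    return images

# '~/Pictures' is an assumption; point this at any directory you like.
search_path = os.path.expanduser('~/Pictures')
image_paths = find_images(search_path)

# Print one path per line, followed by a short summary.
for path in image_paths:
    print(path)
print('Found {} image(s)'.format(len(image_paths)))
```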
Hopefully I’ve explained the code clearly, but if not let me know. If you have any questions at all please feel free to ask in the comments.
Whenever you’re reading this, I hope you’re having a great day!
There is 1 comment on this post
This post is a few years old and one of the very few that ran perfectly for me on the first shot in Pythonista.
If you're like me and tried running the script directly, don't forget to append "print(image_paths)" at the bottom of the file, without the quotes.