A Pythonic way to iterate a set of files
May 09, 2008 at 10:05 p.m. by: Kevin Le
In this post, I present a Pythonic way to iterate over a set of files: either all the files in a folder or just a single file. You have probably written the same code over and over: check whether a given string names a file or a folder; if it is a folder, get the list of files in it; then loop through that list and process each file. You might also need to check that a file name has the right extension, or starts with a certain pattern, before processing it. If that sounds repetitive and inelegant, I think you'll find the following approach cleaner.
The heart of this approach is in the file FileDirWalker.py, and here's the code for it:
__license__ = "http://creativecommons.org/licenses/by-sa/2.5/"
__copyright__ = "Copyright (C) 2008, Kevin Hoang Le"
__author__ = "Kevin Hoang Le <http://pragmaticobjects.org/>"

import os

def walkFileDir(fileOrDir, filterFn=None):
    # Yield fileOrDir itself if it is a file, or each file directly
    # inside it if it is a folder. filterFn, when given, must accept a
    # path and return True for the paths to keep.
    if os.path.isdir(fileOrDir):
        for name in os.listdir(fileOrDir):
            filePath = os.path.join(fileOrDir, name)
            # Subfolders are skipped; only regular files are yielded
            if os.path.isfile(filePath):
                if not filterFn or filterFn(filePath):
                    yield filePath
    elif os.path.isfile(fileOrDir):
        if not filterFn or filterFn(fileOrDir):
            yield fileOrDir
    # Any other input (for example, a path that does not exist) yields nothing
I'll illustrate how to use it with an example. Let's write a program that accepts one input argument. The user can pass in a fully-qualified (FQ) file name or a FQ folder name. Depending on what is passed in, we need to iterate over that single file or over the files in the given folder, respectively. To make it more interesting, we only care about files that have a ".java" extension. So let's write a test stub for the above FileDirWalker.py.
The test stub file TestStub.py:
import FileDirWalker

def fx(file):
    # Accept only Java source files
    return file.endswith(".java")

def main():
    # input() is Python 3; on Python 2 use raw_input() instead
    fileOrDir = input("Enter fully-qualified input file name or folder: ")
    for inFile in FileDirWalker.walkFileDir(fileOrDir, fx):
        print(inFile)  # or do something more interesting here

if __name__ == "__main__":
    main()
The function fx(file) returns True if the file name ends with ".java" and False otherwise. The loop in main(), which is what we were after in the first place, is now short and easy to understand. It works whether the user enters a file name or a folder name: walkFileDir returns a generator that yields to the caller once for a single matching file, once per matching file for a folder, and not at all if nothing matches.
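To try both cases without typing a path, here is a small sketch that runs the walker against a temporary folder. It is written for Python 3 and repeats a condensed copy of walkFileDir so that it runs on its own; the file names (A.java, B.java, notes.txt), the subfolder, and the demo() helper are made up for illustration.

```python
import os
import tempfile

def walkFileDir(fileOrDir, filterFn=None):
    # Condensed copy of the function from FileDirWalker.py
    if os.path.isdir(fileOrDir):
        for name in os.listdir(fileOrDir):
            filePath = os.path.join(fileOrDir, name)
            if os.path.isfile(filePath) and (not filterFn or filterFn(filePath)):
                yield filePath
    elif os.path.isfile(fileOrDir):
        if not filterFn or filterFn(fileOrDir):
            yield fileOrDir

def fx(file):
    return file.endswith(".java")

def demo():
    # Hypothetical fixture: two .java files, one .txt file, one subfolder
    with tempfile.TemporaryDirectory() as folder:
        for name in ("A.java", "B.java", "notes.txt"):
            with open(os.path.join(folder, name), "w") as f:
                f.write("")
        os.mkdir(os.path.join(folder, "sub"))

        # Folder input: every matching file directly inside it.
        # os.listdir returns entries in arbitrary order, hence the sort.
        fromFolder = sorted(os.path.basename(p) for p in walkFileDir(folder, fx))

        # File input: the file itself, if it passes the filter
        javaFile = os.path.join(folder, "A.java")
        fromFile = [os.path.basename(p) for p in walkFileDir(javaFile, fx)]

        # A single file that fails the filter yields nothing
        rejected = list(walkFileDir(os.path.join(folder, "notes.txt"), fx))
    return fromFolder, fromFile, rejected

fromFolder, fromFile, rejected = demo()
print(fromFolder)  # ['A.java', 'B.java']
print(fromFile)    # ['A.java']
print(rejected)    # []
```

Note that the subfolder is skipped rather than descended into: the walker only looks at files directly inside the folder it is given.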
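Because walkFileDir is a Python generator, it never builds a list of the matching files; each path is handed to the caller as it is found. (os.listdir itself still returns the folder's entries as a list.) Also note that the filterFn argument is optional and defaults to None, in which case every file is yielded.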
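Both points can be checked with a short sketch, again for Python 3 and with a condensed copy of walkFileDir so that it runs standalone. The countingFilter function and the temporary file names are hypothetical; they exist only to show when the filter actually gets called.

```python
import os
import tempfile
import types

def walkFileDir(fileOrDir, filterFn=None):
    # Condensed copy of the function from FileDirWalker.py
    if os.path.isdir(fileOrDir):
        for name in os.listdir(fileOrDir):
            filePath = os.path.join(fileOrDir, name)
            if os.path.isfile(filePath) and (not filterFn or filterFn(filePath)):
                yield filePath
    elif os.path.isfile(fileOrDir):
        if not filterFn or filterFn(fileOrDir):
            yield fileOrDir

calls = []

def countingFilter(path):
    # Record every call so we can see when filtering happens
    calls.append(path)
    return True

with tempfile.TemporaryDirectory() as folder:
    for name in ("one.txt", "two.txt"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("")

    # Calling the function only creates the generator; no code in its body runs yet
    walker = walkFileDir(folder, countingFilter)
    isGenerator = isinstance(walker, types.GeneratorType)
    callsBeforeLoop = len(calls)

    # Asking for one item examines files only until the first match
    first = next(walker)
    callsAfterFirst = len(calls)

    # With no filter at all, every file in the folder is yielded
    unfiltered = sorted(os.path.basename(p) for p in walkFileDir(folder))

print(isGenerator, callsBeforeLoop, callsAfterFirst)  # True 0 1
print(unfiltered)  # ['one.txt', 'two.txt']
```

The filter is called only as the caller pulls items, so a loop that stops early never pays for the files it did not reach.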
Let me know if you find this to be useful.