Python Zone is brought to you in partnership with:

I work at Tangent Labs, a digital agency, writing applications in python and django. I spend most of my free time hacking. I run and am the author of the e-commerce framework django-oscar amongst other things. I used to be a mathematician; I have a PhD in Mathematics from the University of Nottingham and an associated interest in cryptic crosswords, chess and playing devil's advocate. David is a DZone MVB and is not an employee of DZone and has posted 27 posts at DZone. You can read more from them at their website. View Full User Profile

csvfilter - A Python Command-line Tool for Manipulating CSV Data

  • submit to reddit

You want a unix-like tool for manipulating CSV data from the command-line.

The standard tools cut and awk aren't always suitable as they don't handle quoting and escaping which are common in CSVs.


Use the CSV manipulation function csvfilter, a simple Python library I've put together.

Install with:

pip install csvfilter

Sample usage:

# Pluck columns 2, 5 and 6
cat in.csv | csvfilter -f 2,5,6 > out.csv

# Pluck all columns except 4
cat in.csv | csvfilter -f 4 -i > out.csv

# Skip header row
cat in.csv | csvfilter -s 1 > out.csv

# Work with pipe-separated data
cat in.csv | csvfilter -s 1,3 --delimiter="|" > out.csv

The above examples show csvfilter processing sys.STDIN but it can also act directly on a file:

csvfilter -f 2,5,6 in.csv  > out.csv


$ csvfilter --help
Usage: csvfilter [options]

-h, --help            show this help message and exit
-f FIELDS, --fields=FIELDS
                        Specify which fields to pluck
-s SKIP, --skip=SKIP  Number of rows to skip
                        Delimiter of incoming CSV data
-i, --inverse         Invert the filter - ie drop the selected fields
                        Delimiter to use for output
                        Quote character to use for output

There is also a simple python API that allows you to add validators to determine which rows are filtered out:

from csvfilter import Processor

def contains_cheese(row):
    return 'cheese' in row

processor = Processor(fields=[1,2,3])
generator = processor.process(sys.stdin)

for cheesy_row in generator:

It's possible to do basic CSV manipulation from the command-line using cut or awk - for example:

cat in.csv | cut -d "," -f 0,1,2 > out.csv

or :

cat in.csv | awk 'BEGIN {FS=","} {print $1,$2,$3}' > out.csv

However neither cut or awk make it easy to handle CSVs with escaped characters - hence the motivation for this tool.

I'm not the first to write such a utility - there are several others out there (although none had quite the API I was looking for):

Source available on Github.


Published at DZone with permission of David Winterbottom, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)