Python 2.7 CSV Files with Unicode Characters
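The csv module in Python 2.7 is more or less hard-wired to work with byte strings (str), not unicode objects. In practice that means ASCII and only ASCII unless we decode the data ourselves.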
Published at DZone with permission of Steven Lott, author and DZone MVB.
Sadly, we're often confronted with CSV files that include Unicode characters. There are numerous Stack Overflow questions on this topic: http://stackoverflow.com/search?q=python+csv+unicode
What to do? Since csv is married to seeing ASCII/bytes, we must explicitly decode the column values.
One solution is to wrap csv.DictReader, something like the following. We need to decode each individual column before attempting to do anything with the value.
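import codecs
import csv

class UnicodeDictReader(object):
    """Wrap csv.DictReader and decode each column value to unicode."""
    def __init__(self, *args, **kw):
        # The default encoding is mac_roman; pass encoding=... to override it.
        self.encoding = kw.pop('encoding', 'mac_roman')
        self.reader = csv.DictReader(*args, **kw)
    def __iter__(self):
        decode = codecs.getdecoder(self.encoding)
        for row in self.reader:
            # decode() returns (text, bytes_consumed); keep only the text.
            t = dict((k, decode(row[k])[0]) for k in row)
            yield t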
This new object is an iterable which contains a DictReader. We could subclass DictReader, also.
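The decoding step inside __iter__ can be tried in isolation. What follows is a minimal sketch, not part of the original class: decode_row is a hypothetical helper name, and the sample row is invented for illustration. It shows that the function returned by codecs.getdecoder yields a (text, length) pair, which is why the code above takes element [0]. It also passes through values that are not byte strings, on the assumption that DictReader may fill a short row with None.

```python
import codecs

def decode_row(row, encoding="utf-8"):
    """Return a copy of row with every byte-string value decoded.

    decode_row is a hypothetical helper, not part of the csv module.
    """
    decode = codecs.getdecoder(encoding)
    result = {}
    for key, value in row.items():
        if isinstance(value, bytes):
            # decode() returns (text, bytes_consumed); keep the text only.
            result[key] = decode(value)[0]
        else:
            # For example None, which DictReader uses for missing columns.
            result[key] = value
    return result

# Invented sample row: byte strings such as Python 2.7's csv would produce.
sample = {"name": b"Caf\xc3\xa9", "city": b"Z\xc3\xbcrich", "note": None}
decoded = decode_row(sample, encoding="utf-8")
```

Leaving non-byte values untouched is a design choice for this sketch; the wrapper as written would raise an exception on a None value instead.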
The use case, then, becomes something simple like this.
with open("some.csv", "rU") as source:
    rdr = UnicodeDictReader(source)
    for row in rdr:
        pass  # process the row
We can now get Unicode characters from a CSV file.
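Because the wrapper defaults to mac_roman, a file in another encoding needs the encoding keyword, for example UnicodeDictReader(source, encoding="utf-8"). Here is a small sketch of why the choice matters. The sample bytes are invented for illustration, and the same byte string decodes to different text under different codecs.

```python
import codecs

# Invented sample: the UTF-8 encoding of u"caf\xe9" (four characters).
raw = b"caf\xc3\xa9"

as_utf8 = codecs.getdecoder("utf-8")(raw)[0]
as_mac_roman = codecs.getdecoder("mac_roman")(raw)[0]

# The matching codec yields four characters. The wrong one yields five,
# because each byte of the two-byte sequence becomes its own character.
print("%r has %d characters" % (as_utf8, len(as_utf8)))
print("%r has %d characters" % (as_mac_roman, len(as_mac_roman)))
```

No exception is raised in the mismatched case, so a wrong encoding tends to show up as garbled text rather than as an error.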
Source: http://slott-softwarearchitect.blogspot.com/2012/01/python-27-csv-files-with-unicode.html


