Wednesday, October 28, 2009

Programming :: Mac to Unix :: Excel Exports have New Line Problems for Perl on OSX

Why won't my Perl script parse the CSV or tab-delimited table I just exported from Excel?
Note that when exporting a tab-delimited or CSV file from Excel in MacOSX you may find you can't get your Perl scripts to work on the text file. This is likely due to problems with the newline characters. Excel tab delimited files are exported with carriage returns (\r) rather than the unix linefeeds (\n) at the end of each line.  This can cause a lot of frustration and here are some solutions:
1.  Download the software Tex_Edit Plus.  Open the tab delimited file in Tex_Edit. Go to Tools; Quick Cleanup; Mac to Unix conversion.  Save the file and now run your script on this converted file.
2.  Alternatively, there should be a mac2unix tool in your usr/bin/ that could work in your perl script if you have it.
3.  With Perl you can do a mac to unix conversion using the regular expression $line =~ s/\r/\n/g, but if you are reading in your file through a while loop it is already screwed...so it is easier to do the conversion from the Unix command line by typing the following at the command line:
perl -p -e 's/\r/\n/g' (less than sign)excelexport_infile.txt (greater than sign)unixconverted_outfile.txt
* this regular expression globally (g) replaces (s) returns (\r) with newlines (\n).  Note that html requires that I write out (less than sign) and (greater than sign), but I mean for you to use the actual signs here.

These will switch the newline characters to Unix (\n) and your perl script should now work on the unixconverted_outfile.txt.