Spaces, tabs, commas, etc.: Working with text files

I mentioned that one of the first datasets I needed to work with was ICESat data in HDF5 format. I’d never heard of that… (Now I know that it is a file format.) So I opted for a custom download in tab-delimited ASCII text files.

I knew that I could load the individual ASCII file into a text editor or Excel and manually convert the file type. But I’m dealing with a whole bunch of files for analysis. There’s no way I’m going to sit here manually converting everything… which meant that I had to learn some Python. 

Now, earlier this year when I was dicking around with Codecademy and freeCodeCamp, I was mostly using HTML, CSS, and beginning JavaScript. I knew Python would be the most useful for me if I wanted to do scripting for geology, or for a potential new job using GIS. I had made a simple calculator, but I didn’t know how to do anything really useful with it outside of my Python shell.

I vaguely remembered using several text editors from earlier this year. PyCharm was my editor of choice for this project. My work computer is a Windows machine. (More on that later.)

After several iterations, a lot of Googling, and probably a headache or two, my script looks like this:

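import os
import re

# Convert every .ascii file in the 'ascii' folder, writing results to 'tab'
# (the 'tab' folder must already exist).
files = os.listdir('ascii')
for f in files:
    if f.endswith('.ascii'):
        with open('ascii/' + f) as fin, open('tab/' + f + '.asc', 'w+') as fout:
            for line in fin:
                # Replace each run of spaces with a single tab
                fout.write(re.sub(' +', '\t', line))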


The image at the top of this entry is the ICESat data in ASCII format. There is a header that is somewhat complicated (even Excel wasn’t loading it properly), and the data itself is laid out with combinations of tabs and spaces. I didn’t know what to do with that, so my first script iteration skipped the header and turned all the spaces into commas. That worked for CSVs, so there was no real need to move on to turning those spaces into tabs; I just wanted to know I could do it. And if I really needed to, I figured it would be easier to extract or skip predetermined rows.

import re

with open('file.ascii') as fin, open('file.csv', 'w') as fout:
    next(fin)  # skip the header (first line only)
    for line in fin:
        # Replace each run of spaces with a single comma
        fout.write(re.sub(' +', ',', line))
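For the "skip predetermined rows" idea and the mix of tabs and spaces, here is a minimal sketch of one way it could go. It is not the script from this post: the function name `to_delimited` and its `delimiter` and `skip_rows` parameters are made up for illustration, and it swaps the regular expression for `str.split()`, which splits on any run of whitespace (tabs included) and ignores leading spaces. Note that it also drops blank lines, which may or may not be what you want.

```python
from itertools import islice


def to_delimited(lines, delimiter=",", skip_rows=1):
    """Yield data lines with each run of spaces/tabs replaced by one delimiter.

    `to_delimited`, `delimiter`, and `skip_rows` are hypothetical names
    used only for this sketch.
    """
    # islice skips the first `skip_rows` lines (the header) without loading
    # the whole file into memory.
    for line in islice(lines, skip_rows, None):
        fields = line.split()  # split on any run of whitespace
        if fields:             # blank lines are dropped
            yield delimiter.join(fields) + "\n"


# Small in-memory example; a real run would pass an open file instead.
sample = ["HEADER line\n", "  1.0 \t 2.0   3.0\n", "\n", "4\t5 6\n"]
converted = list(to_delimited(sample))
```

With a real file, the same function could be used as `fout.writelines(to_delimited(fin, skip_rows=1))` inside a `with open(...)` block like the one above, assuming the header really is a fixed number of lines.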

It’s worth noting that once I was able to cobble together this script, I didn’t know how to make it run and do the thing. Do the thing!

At some point in my button mashing and willy-nilly clicking, I realized that double-clicking my Python file (e.g. script.py) was the way to make it go. Remember, I’m really new to this and I’m trying to cheat my way into automating tasks. Don’t worry, I’ve been reading up on how these things work. There’s not as much clickity clicking going on lately.
