Playing with Python

For something I was playing around with at work, I wanted to be able to retrieve an RSS feed, parse it, and post the title/description fields to another website at timed intervals. These days I only really write in Java and JavaScript, but Java seemed like such a longhand way to achieve this. I probably could have written a shell script, but it’s such a long time since I wrote shell scripts that I’d have been starting from scratch, so I decided to take a look at Python… and so far I’m impressed. Very impressed.

From start to finish this probably only took a couple of hours, and that includes referring back to the API docs for almost every line I wrote. The code below fetches a feed and extracts the title field from the items. Each time it finds a new item, it adds its guid to a text file so that items that have already been processed can be ignored. I’m sure this can be tidied up lots, but for a first attempt I’m pretty happy (for simplicity I’ve removed the code that posts the items).
[sourcecode language="python"]
import urllib
from threading import Timer
from xml.dom import minidom

def retrieveXml(url):
    # get the feed
    f = urllib.urlopen(url)
    xmldoc = minidom.parse(f)

    # read the history (assumes the file already exists)
    historyFile = open('./history.dat', 'r')
    history = historyFile.read()
    historyFile.close()
    found = False
    # iterate through each item in the feed
    items = xmldoc.getElementsByTagName('item')
    for item in items:
        title = item.getElementsByTagName('title')[0].firstChild.toxml()
        guid = item.getElementsByTagName('guid')[0].firstChild.toxml()
        # if the current item isn't in the history, then use that
        if history.find(guid) < 0:
            found = True
            break
    # if we found a new entry while iterating over the feed...
    if found:
        historyFile = open('./history.dat', 'a')
        historyFile.write(guid + "\n")
        historyFile.close()
        # for now just print the title to screen
        print title
    # schedule the next fetch in 10 seconds
    t = Timer(10.0, retrieveXml, [url])
    t.start()
[/sourcecode]
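For anyone trying this on a current interpreter, here’s a minimal Python 3 sketch of the same parse-and-dedupe step (the `extract_new_items` name and the inline sample feed are my own for illustration; on Python 3 the fetch itself would go through `urllib.request.urlopen` rather than `urllib.urlopen`):

```python
from xml.dom import minidom

def extract_new_items(xml_text, history):
    """Return (guid, title) pairs for feed items whose guid isn't in history."""
    doc = minidom.parseString(xml_text)
    new_items = []
    for item in doc.getElementsByTagName('item'):
        title = item.getElementsByTagName('title')[0].firstChild.toxml()
        guid = item.getElementsByTagName('guid')[0].firstChild.toxml()
        if guid not in history:
            new_items.append((guid, title))
    return new_items

# a tiny hand-rolled feed standing in for the real RSS document
feed = """<rss><channel>
<item><title>First post</title><guid>id-1</guid></item>
<item><title>Second post</title><guid>id-2</guid></item>
</channel></rss>"""

# pretend id-1 is already in history.dat: only the unseen item comes back
print(extract_new_items(feed, 'id-1\n'))  # -> [('id-2', 'Second post')]
```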

