Python Soup is Tasty
April 30, 2009, 5:50 pm
This is why I love Python. I wanted a list of countries for my Timeline Project, so I went to Wikipedia and found a pretty decent one. With Python and Beautiful Soup, writing a tool to scrape the data was faster than copy-pasting and editing the text by hand.
import urllib2
import httplib
import codecs
import sys
from BeautifulSoup import BeautifulSoup

opener = urllib2.build_opener()
try:
    url = "http://en.wikipedia.org/wiki/List_of_countries"
    # Passing None as the data argument keeps this a GET request.
    req = urllib2.Request(url, None, {"User-Agent": "Souper"})
    response = opener.open(req)
    data = response.read()
except urllib2.URLError, err:
    print "HTTP error:", err.reason
    sys.exit()
except httplib.HTTPException, err:
    print "HTTP error:", err
    sys.exit()

# Wrap stdout so that output is written as UTF-8.
streamWriter = codecs.lookup('utf-8')[-1]
sys.stdout = streamWriter(sys.stdout)

soup = BeautifulSoup(data)
print "$countries = array ("
countries = []
# Each country name is the link that follows its flag icon span.
image_spans = soup.findAll('span', {"class": "flagicon"})
for span in image_spans:
    href = span.findNextSibling('a')
    if href:
        countries.append(unicode(href.contents[0]).encode('ascii', 'ignore'))
# Emit one quoted name per line; the last line closes the PHP array.
for i in range(0, len(countries)):
    print "\"" + countries[i] + ("\"," if (i < len(countries) - 1) else "\");")
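
The final loop builds the PHP array by hand, checking the index to decide between a trailing comma and the closing parenthesis. Here is a minimal sketch of the same step using str.join, written for Python 3 rather than the Python 2 of the script above. The helper name php_array is hypothetical and not part of the original script, and the sketch assumes the names contain no double quotes or backslashes that would need escaping in PHP.

```python
def php_array(name, items):
    """Return a PHP array literal assigning `items` to `$name`.

    Hypothetical helper for illustration only; assumes the items contain
    no double quotes or backslashes that would need escaping in PHP.
    """
    header = "$%s = array (" % name
    quoted = ['"%s"' % item for item in items]
    # Joining with ",\n" puts a comma after every entry except the last,
    # so no index bookkeeping is needed.
    body = ",\n".join(quoted)
    return header + "\n" + body + ");"


if __name__ == "__main__":
    # Sample names stand in for the scraped list.
    print(php_array("countries", ["Afghanistan", "Albania", "Algeria"]))
```

One side effect of building the string in one place is that an empty list still produces a closed array, whereas the loop version would print only the opening line.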