Wednesday, January 28, 2009

Python: Simple URL extractor

import re

def url_finder(data):
    # Match http:// or https:// followed by URL characters or %XX escapes
    urls = re.findall(r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+", data)

    for url in urls:
        # Drop any surrounding quotes before printing
        outpt = url.strip('"').strip("'") + "\n"
        print(outpt)


inpt = "aaaaaaaaaaaaaa http://www.google.com bbbbbbbbb http://example010.blogspot.com ccccccccc http://google.com dddd http://a.b/a/a/a/index.html"

url_finder(inpt)

This code finds every URL in a string using a regular expression and prints each one on its own line.
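
If you want to use the URLs later instead of only printing them, a variant that returns a list may be handier. The sketch below reuses the same pattern; the names URL_PATTERN and extract_urls and the sample string are my own for illustration and are not part of the original snippet.

```python
import re

# Same pattern as url_finder, compiled once and written as a raw string.
URL_PATTERN = re.compile(
    r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)

def extract_urls(text):
    """Return the URLs found in text, in order of appearance."""
    # All groups in the pattern are non-capturing, so findall yields whole matches.
    return [match.strip('"').strip("'") for match in URL_PATTERN.findall(text)]

# Hypothetical sample input, similar to the one used with url_finder.
sample = "aaa http://www.google.com bbb https://example.com/a/index.html ccc"
urls = extract_urls(sample)
print(urls)
```

One thing to keep in mind: inside the character class, `$-_` is a range running from `$` to `_`, which is why characters such as `/`, `:` and `?` match. The same range also covers `.`, `,` and `)`, so punctuation that directly follows a URL in running text (for example a sentence-ending period) ends up in the match and may need trimming.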
