I use a small Python script to automate a user login on a site and retrieve, as CSV, a list of the other users linked to me.
I searched for a long time for a way to do this in Python with a quick automated script.
Before starting, you need the mechanize library (here I use a manual copy placed in a ./mechanize folder next to the script, without running setup.py). You can download it here :
# -*- coding: utf-8 -*-
import sys
# Make the local copy of mechanize importable
sys.path.append("./mechanize")
from mechanize import Browser
Now that the import is ready, we can request pages:
def login(br, url):
    """Log in before retrieving user information."""
    br.open(url)
    # Select the form a normal user fills in to log on
    br.select_form(name="connexion")
    # Fill in the login information
    br.form["login"] = "mylogin"
    br.form["password"] = "mypassword"
    # Submit the form
    br.submit()
Pretty simple. You can now reuse br to reach the private part of the site and retrieve information like this (this site passes an id and a PHPSESSID in the URL to move between pages):
def browse(br, baseurl, url):
    """Browse a page after passing through login."""
    # The base address comes first, then the page-specific part
    page = br.open(baseurl + url)
    print(page.read().decode("UTF-8"))
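The query string given to browse can be built instead of typed by hand. This is a minimal sketch, assuming the site takes the id and PHPSESSID parameters shown in this post; build_query is a hypothetical helper and not part of mechanize.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_query(page_id, session_id):
    """Return a query string such as '?id=10&PHPSESSID=abc' (hypothetical helper)."""
    # A list of pairs keeps the parameter order stable
    params = [("id", page_id), ("PHPSESSID", session_id)]
    return "?" + urlencode(params)
```

The result can be passed to browse as the page-specific part of the address.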
Just call browse with the same br you passed to login; it prints each page you request, like this:
br = Browser()
login(br, "http://www.example.com")
# Keyword arguments make it explicit which part is the base address
browse(br, baseurl="http://www.example.com", url="?id=10&PHPSESSID=98bzdqd3")
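Since the goal is a CSV list of linked users, the decoded page text could be parsed instead of printed. This is a rough sketch, assuming Python 3 and assuming the page body is comma-separated text with a header row. The column names name and email are made up for illustration, and parse_users is a hypothetical helper.

```python
import csv
import io


def parse_users(csv_text):
    """Parse decoded CSV text into a list of dicts, one per user row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Sample data invented for the example, not taken from a real site
sample = "name,email\nalice,alice@example.com\nbob,bob@example.com\n"
users = parse_users(sample)
print(users[0]["name"])
```

In real use, csv_text would be the string returned by page.read().decode("UTF-8").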
That's all there is to it: you can now use a website like a normal user. Personally, I don't use threads here, to avoid overloading the server (I don't need it to be fast), but you could improve this code with threads to fetch many pages in parallel.
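A threaded variant might look like the sketch below, assuming Python 3 and its concurrent.futures module. fetch_all and fake_fetch are hypothetical names; in real use, fetch would wrap a page download. I have not checked whether one Browser object can be shared safely between threads, so giving each worker its own logged-in Browser is the cautious assumption. A small max_workers keeps the load on the server low.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, urls, max_workers=2):
    """Call fetch(url) for every URL using a small thread pool.

    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real page download, so the sketch runs offline
    return "content of " + url


pages = fetch_all(fake_fetch, ["?id=10", "?id=11", "?id=12"])
print(pages)
```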