Facebook Graph API.

April 28, 2010

So it turns out that Facebook, that site you can use to view pictures of attractive people who you’ll never have the courage to talk to, has a nice Graph API you can use to get info about the people you are “friends” with.

My immediate thought upon hearing this, since all Facebook is known for is privacy violations, was "How long would it take to make a Python script that builds a list of all images of my friends?". The answer, it turns out, is "Not very long".

The first step, of course, is to define the server name, your own username, and your access_token (which seems to be how it verifies which user is making the request), and to write a couple of little functions that return the data of a web page:

import http.client
import json

server = "graph.facebook.com"
myID = "MY USERNAME"
accessToken = "NOPE, PROBABLY SHOULDN'T INCLUDE THE REAL ONE HERE"

def getDataURL(URL):
	"""Takes an absolute URL, strips off the "http://graph.facebook.com" prefix and passes the rest to getData().
	Not meant to be a general function, just a quick hack to deal with the next-page links the API returns."""

	return getData(server, URL[len("http://" + server):])

def getData(serv, URL):
	"""Gets the data from the site as a string.
	Assumes UTF-8 encoding."""
	conn = http.client.HTTPConnection(serv)
	conn.request("GET", URL)

	response = conn.getresponse()
	#I was going to have it keep trying if there was an error, but decided that would just cause infinite loops if there was a genuine reason for failing.
	if response.status != 200:
		return None
	else:
		data = response.read().decode('utf8')
		return data

This should be easy enough to follow, I assume: the first function is a nasty little hack that works only under the assumption that every page you need to visit begins with "http://graph.facebook.com/".
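As an aside, if you wanted to avoid hard-coding that prefix, the standard library's urllib.parse can split whatever absolute URL the API hands back into its parts. Here's a possible alternative to getDataURL(), purely as a sketch:

import urllib.parse

def getDataURLRobust(URL):
	"""Alternative to getDataURL(): split the absolute URL properly rather than slicing off a fixed-length prefix."""
	parts = urllib.parse.urlsplit(URL)
	path = parts.path
	if parts.query:
		path += "?" + parts.query
	return getData(parts.netloc, path)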

Looking at the API documentation linked above, it's fairly easy to see that we need three calls:

  • graph.facebook.com/ID/friends, to get the list of a user's friends.
  • graph.facebook.com/ID/albums, to get the list of a user's photo albums.
  • graph.facebook.com/ID/photos, to get the photos belonging to an ID.

One nice thing about the API is that in the third one the ID can be either a user ID or an album ID, since both contain photos. You can also specify which fields you need by appending "?fields=" and whatever fields you want; in this script we will use the "id" field to get user and album IDs and the "source" field to get the URLs of the images.
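To make that concrete, here's roughly the shape of the JSON the photos call sends back once parsed; the IDs and URLs below are invented, but the "data"/"paging" layout is exactly what the rest of the code in this post relies on:

#Roughly what http://graph.facebook.com/SOME_ID/photos?fields=source&access_token=... returns,
#after running it through json.loads() (IDs and URLs invented for illustration):
example = {
	"data": [
		{"id": "12345", "source": "http://example.invalid/photo1.jpg"},
		{"id": "67890", "source": "http://example.invalid/photo2.jpg"},
	],
	"paging": {
		"previous": "http://graph.facebook.com/SOME_ID/photos?fields=source&until=...",
		"next": "http://graph.facebook.com/SOME_ID/photos?fields=source&since=...",
	},
}

print([photo["source"] for photo in example["data"]])	#The image URLs we're after.
print(example["paging"]["next"])	#Where the next page of results lives.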

The next step is of course to get a list of your friends. The following function will do that. Note that you can only look up your own friends, not anyone else’s (the error message states that you can look it up for pairs of users, not sure what that means or how to do it though).

def getFriends(UID):
	"""Returns the IDs of all friends of the given user (only works for your own account)."""
	data = getData(server, UID + "/friends?fields=id&access_token=" + accessToken)

	if data is None:
		return []

	friends = json.loads(data)['data']
	IDs = [friend['id'] for friend in friends]

	return IDs

Once we have done this we need to do three things:

  • Get the IDs of every photo album belonging to this list of friends.
  • Get all the photos of each user in the list.
  • Get all the photos in each album in the list of albums.

As I've noted above, the second and third can be done using the same API call, since an ID can refer to either a user or an album (or pretty much anything else, such as an app, a wall post, a link, etc., but for now we'll stick to photos :P). And thanks to the awesomeness of JSON, we can handle the first one with the exact same code too, simply by varying what we're looking for.
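Concretely, that means the single function defined below gets called in three different ways, which is exactly what the driver code at the end of the post does. Assuming friendID and albumID are IDs you've already collected, the calls look like this:

albumIDs = getAll(friendID, "albums", "id")	#Step 1: album IDs belonging to a friend.
userPhotos = getAll(friendID, "photos", "source")	#Step 2: URLs of photos uploaded by that friend.
albumPhotos = getAll(albumID, "photos", "source")	#Step 3: URLs of photos inside one of those albums.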

I decided to actually comment this one as it’s a bit longer than the above, but the comments are probably still useless and unhelpful, maybe not even accurate.

def getAll(ID, toFind, field):
	URLs = []
	pagesVisited = []
	#In this function we use absolute URLs, as this is what Facebook gives us to indicate next/previous pages of items.
	nextPage = "http://" + server + "/" + ID + "/" + toFind + "?fields=" + field + "&access_token=" + accessToken
	data = getDataURL(nextPage)

	#Lists of images/albums are split across many pages.
	while True:
		while data is None:
			#I said in the previous function that I'd avoid these infinite loops, but I figure since we'll
			#be getting ridiculously large amounts of pages it'd be best to avoid having to start again or skipping
			#any if possible.
			print("Error getting " + nextPage + ", trying again")
			data = getDataURL(nextPage)

		info = json.loads(data)
		items = info['data']

		#Build up the list of URLs from this page.
		#Duplicates are checked as they do tend to happen in testing, especially if there are fewer than 25 images in a gallery.
		for i in items:
			if i[field] not in URLs:
				URLs.append(i[field])

		#Pages useful for this function normally carry 'paging' info with next/previous links;
		#if there's no 'next' link, we've reached the last page.
		if 'paging' not in info or 'next' not in info['paging']:
			break
		nextPage = info['paging']['next']

		#It keeps a list of pages visited and stops when the "next" page given has already been visited.
		#It actually would probably work just by comparing current to previous.
		if nextPage in pagesVisited:
			break
		pagesVisited.append(nextPage)
		data = getDataURL(nextPage)	#Fetch the next page for the next iteration.

	return URLs

Finally, the code that actually USES these functions to stalk people. Given the massive amount of output, I decided it'd be best not to actually try to download all these photos, just to print out the URLs :P.

friendIDs = [myID] + getFriends(myID)
print(friendIDs, len(friendIDs))

albumIDs = []
for i in friendIDs:
	albumIDs.append(getAll(i, "albums", "id"))	#One list of album IDs per friend.
print(albumIDs, len(albumIDs))

for i in friendIDs:
	print("\n\n\n\nSTART: " + i + ":")
	print(getAll(i, "photos", "source"))

for i in albumIDs:
	for j in i:
		print("\n\n\n\nSTART ALBUM: " + j + ":")
		print(getAll(j, "photos", "source"))

This will print them all out in a kinda weird format, but one that should be easy enough to understand. It took quite a long time to run for me, since there's a huge number of pages it has to download (though only a fraction of what it would take to actually download every image).
One issue I've noticed is that it only finds a small fraction of the photos a person is tagged in: it only returns those uploaded by the person themselves, not photos others have tagged them in, and it doesn't seem to go back past a certain date (the next/previous URLs it returns include a "since" and/or "until" date). I haven't checked whether it has similar problems for albums; it does seem to return all album and friend IDs successfully.

Tons of room for improvement here, but since I have no reason to actually use this for anything (I was merely curious about the API), I saw no need to spend extra time on it. You may also have noticed it's quite badly written and probably has tons of bugs; I'll use the previous excuse for that too. If you do want to work on it and improve it (or indeed make your own), here are some ways you could improve it:

  • Actually download the images (the obvious one).
  • Nice folder structure.
    Rather than requesting just IDs when getting friend and album lists, you could request both the name and the ID ("?fields=id,name"). Then you could download the images into a nice folder structure ("PersonName/AlbumName/image.jpg"); a rough sketch of this idea follows the list.
  • More interactivity
    This could actually be used to make a nice GUI image viewer. Simply give it your access_token and it gives you a nice way to view/save albums or images. If you look through the documentation linked at the top of this post you should be able to find a way to get thumbnail versions of the photos, which would make this less network-intensive (only download when needed, or grab the thumbnails first to quickly show the folder and then fetch the full pictures in the background).
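For the first two of those, here's a minimal sketch of what downloading into folders might look like. It assumes you've collected (name, ID) pairs for friends and albums as suggested above, and it reuses the getAll() function from earlier; the safeName() and downloadAlbum() helpers are made up purely for this sketch:

import os
import urllib.request

def safeName(name):
	"""Strip characters that aren't safe in folder names. (Hypothetical helper, just for this sketch.)"""
	return "".join(c for c in name if c.isalnum() or c in " -_")

def downloadAlbum(personName, albumName, albumID):
	"""Download every photo in one album into PersonName/AlbumName/ using getAll() from above."""
	folder = os.path.join(safeName(personName), safeName(albumName))
	if not os.path.isdir(folder):
		os.makedirs(folder)

	for n, url in enumerate(getAll(albumID, "photos", "source")):
		#Just number the files; the source URL doesn't always contain a friendly file name.
		path = os.path.join(folder, str(n) + ".jpg")
		try:
			urllib.request.urlretrieve(url, path)
		except Exception:
			print("Failed to download " + url)

You'd then call downloadAlbum() once per (friend, album) pair instead of just printing the URLs as the driver code above does.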

Well, that appears to be it: probably a hideously boring blog post with horribly messy code, and I'm not even sure who my target audience is :P. Feel free to comment if you like; be generous with your praise and not too harsh with your criticisms, please.
