Google Chat History Downloader

Update 2011-11-09:
Gmail now officially supports downloading chat history via IMAP. Thanks to Steve for pointing it out. It can be enabled in the “Labels” section of Gmail settings.
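
For anyone taking the IMAP route, a minimal sketch using Python's standard imaplib might look like the following. It is written for Python 3 rather than the Python 2 used in the rest of this post. The host `imap.gmail.com` and the mailbox name `[Gmail]/Chats` are assumptions that may differ by account or language, and `export_mailbox` and `main` are hypothetical names used only for illustration.

```python
import imaplib
import os


def export_mailbox(conn, mailbox, out_dir):
    """Save every message in `mailbox` as <number>.eml inside `out_dir`.

    `conn` is an already logged-in imaplib connection (or any object
    offering the same select/search/fetch methods).
    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("could not open mailbox %r" % mailbox)
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    written = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        raw = parts[0][1]  # the raw RFC 822 message bytes
        path = os.path.join(out_dir, num.decode("ascii") + ".eml")
        with open(path, "wb") as f:
            f.write(raw)
        written.append(path)
    return written


def main():
    # Assumed host and mailbox name; replace the credentials with your own.
    imap = imaplib.IMAP4_SSL("imap.gmail.com")
    imap.login("username@gmail.com", "password")
    export_mailbox(imap, "[Gmail]/Chats", "chats")
    imap.logout()
```

Calling `main()` would write one .eml file per chat into a `chats` directory, the same layout the script further down produces.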

Update 2011-08-30:

Based on the comments, this doesn’t work anymore. I’d recommend checking out this thread for solutions: http://www.google.com/support/forum/p/gmail/thread?tid=7a7d2d6da5be047f

I personally have been using a JavaScript-based solution for exporting recent chat data, which still doesn’t solve the problem of violating the TOS and getting blocked. If there is enough interest, I’ll post my code.

A couple of weeks ago, I decided to migrate from one Google Account to another. I was able to transfer all of my emails without too much difficulty, but after looking around for a while I could not find any way to export my Google Talk chat history. I don’t think saved chats can be accessed over either IMAP or POP. I did notice, though, that the Gmail web interface lets you view a saved chat as a raw message. There happens to be an old Python library for interacting with the Gmail web interface called libgmail. I found, however, that it does not scale well to large numbers of messages, so I had to write my own method that processes search results one page at a time. I also found that I was easily blocked when running it for a long time, so I added 13-second delays after every request so as not to get my account suspended. It took me a day and a half to export all of the messages. I’m not sure whether this is overkill, but I am tired of getting my account blocked.

Anyway, this program goes through and saves each chat history message as an .eml file. Once they are in that format, it is not super hard to get them into a different Gmail account, but I’ll save that for another post.

import os
import time
import libgmail # http://libgmail.sourceforge.net/

def thread_search(ga, searchType, **kwargs):
    # Generator that yields one page of search results at a time instead
    # of loading every thread up front.
    index = 0
    # threadListSummary is only read once index > 0, i.e. after the first
    # page has assigned it.
    while (index == 0) or index < threadListSummary[libgmail.TS_TOTAL]:
        threadsInfo = []
        items = ga._parseSearchResult(searchType, index, **kwargs)
        try:
            threads = items[libgmail.D_THREAD]
        except KeyError:
            break
        else:
            for th in threads:
                if not type(th[0]) is libgmail.types.ListType:
                    th = [th]
                threadsInfo.append(th)
            threadListSummary = items[libgmail.D_THREADLIST_SUMMARY][0]
            threadsPerPage = threadListSummary[libgmail.TS_NUM]
            index += threadsPerPage
        yield libgmail.GmailSearchResult(ga, (searchType, kwargs), threadsInfo)

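ga = libgmail.GmailAccount("username@gmail.com", "password")
ga.login()

# The .eml files are written to ./chats, so make sure it exists.
if not os.path.isdir("chats"):
    os.mkdir("chats")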

for page in thread_search(ga, "query", q="is:chat"):
    print "New Page"
    time.sleep(13)
    for thread in page:
        if thread.info[0] == thread.info[10]:
            # Common case: Chats that only span one message
            filename = "chats/%s_%s.eml" % (thread.id, thread.id)
            #only download the message if we don't have it already
            if os.path.exists(filename):
                print "already have %s" % filename
                continue
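            print "Downloading raw message: %s" % filename,
            message = ga.getRawMessage(thread.id).decode('utf-8').lstrip()
            print "done."
            # Re-encode before writing: a unicode string written to a binary
            # file is implicitly encoded as ASCII in Python 2, which fails
            # on non-ASCII chat text.
            file(filename, 'wb').write(message.encode('utf-8'))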
            time.sleep(13)
            continue
        # Less common case: A thread that has multiple messages
        print "Looking up messages in thread %s" % thread.id
        time.sleep(13)
        for message in thread:
            filename = "chats/%s_%s.eml" % (thread.id, message.id)
            #only download the message if we don't have it already
            if os.path.exists(filename):
                print "already have %s" % filename
                continue
            print "Downloading raw message: %s" % filename,
            file(filename, 'wb').write(message.source.lstrip())
            print "done."
            time.sleep(13)

This one checks to make sure my Gmail Contacts’ names are spelt how the contact spells them.

It looks at emails sent from the contact, and compares the sender name with the name on file for that contact.

import libgmail # http://libgmail.sourceforge.net/

ga = libgmail.GmailAccount("email@gmail.com", "password")
ga.login()
all_contacts = ga.getContacts().getAllContacts()

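def test_thread(thread, contact):
    # Returns True as soon as one message from the contact carries a real
    # display name (a mismatch is printed first). Returns None if the
    # thread has no such message, so the caller tries the next thread.
    for message in thread:
        if message.sender.lower() != contact.email.lower():
            continue
        # Skip messages whose "name" is just the email address.
        if '@' in message.author_fullname:
            continue
        if message.author_fullname != contact.name:
            print "\t", message.author_fullname, "->", contact.name
        return True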

for contact in all_contacts:
    results = ga.getMessagesByQuery("from: %s -is:chat" % contact.email)
    if not results:
        print "%s -------No Email" % contact.email
        continue
    for thread in results:
        result = test_thread(thread, contact)
        if result:
            break
    else:
        print "%s -------No Good Email" % contact.email

XMPP Jabber Photo Module

I was looking around for an XMPP vCard photo module in May of 2007 and could not find one, so I wrote my own. It uses the xmpppy Python module for communicating with the Jabber server. Once the connection to the server is made, call register_handler(session). It will then download the avatars of people on the roster list. To get the filename of a person’s avatar, call get_photo(photo_hash).

import base64
import os
import sha
import xmpp

PHOTO_DIR = "./photos/"

PHOTO_TYPES = {
    'image/png': '.png',
    'image/jpeg': '.jpg',
    'image/gif': '.gif',
    'image/bmp': '.bmp',
    }

def append_directory(filename):
    return os.path.join(PHOTO_DIR, filename)

def register_handler(session):
    session.RegisterHandler('presence', photo_update_handler)

def photo_update_handler(session, stanza):
    JID = stanza['from'].getStripped()
    vupdate = stanza.getTag('x', namespace='vcard-temp:x:update')
    if not vupdate:
        return
    photo = vupdate.getTag('photo')
    if not photo:
        return
    photo = photo.getData()
    if not photo:
        return
    #request the photo only if we don't have it already
    if not get_photo(photo):
        request_vcard(session, JID)

def get_photo(photo_hash):
    for ext in PHOTO_TYPES.values():
        filepath = append_directory(photo_hash + ext)
        if os.path.exists(filepath):
            return filepath

def request_vcard(session, JID):
    n = xmpp.Node('vCard', attrs={'xmlns': xmpp.NS_VCARD})
    iq = xmpp.Protocol('iq', JID, 'get', payload=[n])
    return session.SendAndCallForResponse(iq, recieve_vcard)

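def recieve_vcard(session, stanza):
    # Error replies carry no vCard element, so check before using it.
    vcard = stanza.getTag('vCard')
    if not vcard:
        return
    photo = vcard.getTag('PHOTO')
    if not photo:
        return
    photo_type = photo.getTag('TYPE').getData()
    photo_bin = photo.getTag('BINVAL').getData()
    photo_bin = base64.b64decode(photo_bin)
    # Skip image types we have no extension for instead of raising
    # KeyError inside the callback.
    ext = PHOTO_TYPES.get(photo_type)
    if ext is None:
        print "Unknown file type: %s" % photo_type
        return
    # Name the file after the SHA-1 of the image data, so that
    # get_photo(photo_hash) can find it again.
    photo_hash = sha.new(photo_bin).hexdigest()
    filename = append_directory(photo_hash + ext)
    f = open(filename, 'wb')
    f.write(photo_bin)
    f.close()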

UPDATE 7/20/09
Here is roughly how I would solve the use case in the comment. I realize that my code doesn’t solve that case very nicely. If I made this more object oriented, I could solve that problem. Until then, here is some rough code that I think does what it needs to. Save the above code as photo.py and be sure to create a “photos” directory.

import xmpp
import photo

jid_photo_map = {}

def receive_presence(session, stanza):
    jid = stanza['from'].getStripped()
    vupdate = stanza.getTag('x', namespace='vcard-temp:x:update')
    if not vupdate:
        return
    photo = vupdate.getTag('photo')
    if not photo:
        return
    photo = photo.getData()
    if not photo:
        return
    jid_photo_map[jid] = photo

def login(username, password):
    jabber = xmpp.Client('gmail.com')
    jabber.connect(server=('talk.google.com', 5223))
    jabber.auth(username, password, 'test_client')
    jabber.sendInitPresence()
    jabber.RegisterHandler('presence', receive_presence)
    photo.register_handler(jabber)
    return jabber

def display_filenames(j):
    roster = j.getRoster()
    for jid in roster.getItems():
        if jid in jid_photo_map:
            photo_filename = photo.get_photo(jid_photo_map[jid])
        else:
            photo_filename = ""
        print jid, photo_filename

if __name__ == '__main__':
    j = login('username', 'password')
    for x in range(30):
        j.Process(1) # process for 1 second
    display_filenames(j)
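
The photo module names each avatar file after the SHA-1 hash of the decoded image plus an extension chosen from the MIME type. A small Python 3 sketch of just that naming step, using hashlib and no XMPP connection, might look like this. `avatar_filename` is a hypothetical helper, and the extension table is copied from the module above.

```python
import base64
import hashlib

PHOTO_TYPES = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/bmp": ".bmp",
}


def avatar_filename(binval, mime_type):
    """Return '<sha1 hex digest><extension>' for a base64-encoded photo.

    `binval` is the base64 text from the vCard's BINVAL element.
    Returns None for MIME types without a known extension.
    """
    ext = PHOTO_TYPES.get(mime_type)
    if ext is None:
        return None
    image = base64.b64decode(binval)
    return hashlib.sha1(image).hexdigest() + ext
```

Because the name is derived from the image bytes, two contacts with the same picture share one file, and a changed picture gets a new name.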

kml

This one generates KML files, geocoding the address of each placemark with the Google Maps API.

#based on http://code.google.com/support/bin/answer.py?answer=82711

import urllib
import xml.dom.minidom

# Sign up for one here: http://code.google.com/apis/maps/signup.html
MAPS_KEY = 'ABQIAAAAEeSCthnR8JRVr1KT6BaG_RQMZpOArjdEOtW7EKQiKtof-X3NFhSVjqCZ7dXYYmHgopatyvyTIp_r7w'

def geocode(address):
    # This function queries the Google Maps API geocoder with an
    # address. It gets back a CSV response (status, accuracy, latitude,
    # longitude), parses it, and returns a "longitude,latitude" string.
    import time

    mapsKey = MAPS_KEY
    mapsUrl = 'http://maps.google.com/maps/geo?q='

    url = ''.join([mapsUrl,urllib.quote(address),'&output=csv&key=',mapsKey])

    # Try at most twice: a '0,0' result means the lookup failed.
    for attempt in range(2):
        print "Looking up %s..." % address,
        coordinates = urllib.urlopen(url).read().split(',')
        print coordinates
        coorText = '%s,%s' % (coordinates[3],coordinates[2])
        if coorText != '0,0':
            break
        if attempt == 0:
            print "trying again in 2 seconds."
            time.sleep(2)
    return coorText

class KML():
    def __init__(self, name=None, description=None):
        self.kmlDoc = xml.dom.minidom.Document()

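        kmlElement = self.kmlDoc.createElementNS('http://earth.google.com/kml/2.2','kml')
        # minidom does not write the namespace declaration on its own, so
        # set the xmlns attribute explicitly.
        kmlElement.setAttribute('xmlns', 'http://earth.google.com/kml/2.2')
        kmlElement = self.kmlDoc.appendChild(kmlElement)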

        documentElement = self.kmlDoc.createElement('Document')
        self.documentElement = kmlElement.appendChild(documentElement)

        if name:
            nameElement = self.kmlDoc.createElement('name')
            nameElement.appendChild(self.kmlDoc.createTextNode(name))   
            self.documentElement.appendChild(nameElement)

        if description:
            descriptionElement = self.kmlDoc.createElement('description')
            descriptionElement.appendChild(self.kmlDoc.createTextNode(description))   
            self.documentElement.appendChild(descriptionElement)

    def add_placemark(self, name, description, address):
        placemarkElement = self.kmlDoc.createElement('Placemark')

        nameElement = self.kmlDoc.createElement('name')
        nameElement.appendChild(self.kmlDoc.createTextNode(name))   
        placemarkElement.appendChild(nameElement)

        descriptionElement = self.kmlDoc.createElement('description')
        descriptionElement.appendChild(self.kmlDoc.createTextNode(description))   
        placemarkElement.appendChild(descriptionElement)

        pointElement = self.kmlDoc.createElement('Point')
        placemarkElement.appendChild(pointElement)

        coordinates = geocode(address)
        coorElement = self.kmlDoc.createElement('coordinates')
        coorElement.appendChild(self.kmlDoc.createTextNode(coordinates))
        pointElement.appendChild(coorElement)

        self.documentElement.appendChild(placemarkElement)

    def __str__(self):
        return self.kmlDoc.toprettyxml(' ')

def email(address):
    return '<a href="mailto:%s">%s</a>' % (address, address)

if __name__ == '__main__':
    k = KML()
    k.add_placemark('Google Headquarters', "1600 Amphitheatre Pkwy<br/>Mountain View, CA 94043<br/>", "1600 Amphitheatre Pkwy Mountain View, CA 94043")

    f = file('google.kml', 'w')
    f.write(str(k))
    f.close()
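
To try the KML side without a Maps key, here is a rough Python 3 sketch that builds the same Document/Placemark/Point structure with the standard library's ElementTree and takes the coordinates as input instead of geocoding. The function names are hypothetical, and the CSV field order is the one the geocode function above relies on.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"


def csv_to_kml_coordinates(csv_line):
    """Turn a 'status,accuracy,lat,lng' line into KML's 'lng,lat' order."""
    fields = csv_line.strip().split(",")
    return "%s,%s" % (fields[3], fields[2])


def placemark_kml(name, description, coordinates):
    """Build a one-placemark KML document and return it as a string."""
    # Serialize the KML namespace as the default xmlns, not as a prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    document = ET.SubElement(kml, "{%s}Document" % KML_NS)
    placemark = ET.SubElement(document, "{%s}Placemark" % KML_NS)
    ET.SubElement(placemark, "{%s}name" % KML_NS).text = name
    ET.SubElement(placemark, "{%s}description" % KML_NS).text = description
    point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
    ET.SubElement(point, "{%s}coordinates" % KML_NS).text = coordinates
    return ET.tostring(kml, encoding="unicode")
```

Note that KML wants longitude first, which is why the last two CSV fields are swapped.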

xmpp (jabber) photo module

I made a Google Talk and Google Maps mashup a while back. I could not find an XMPP module that gave access to a person’s picture, so I had to write one that extracts it from the vCard.

import base64, hashlib
from os import path
import xmpp

def photo_update_handler(session, stanza):
    JID = stanza['from'].getStripped()
    vupdate = stanza.getTag('x',namespace='vcard-temp:x:update')
    if not vupdate:return
    photo = vupdate.getTag('photo')
    if not photo:return
    photo = photo.getPayload()
    if not photo:return
    photo = photo[0]
    session._owner.Roster._data[JID]['photo'] = photo
    #download the photo only if we don't have a photo with that hash
    if not get_photo(photo) and 'summona' not in JID:
        #a quick fix to an unknown bug with andy's picture
        request_vcard(session, JID)

def register_handler(session):
    session.RegisterHandler(
        'presence',
        photo_update_handler,
        '',
        'jabber:client',
        )

def recieve_vcard(session, stanza):
    vcard = stanza.getTag('vCard')
    #name = vcard.getTags('FN')[0].getPayload()[0]
    photo = vcard.getTag('PHOTO')
    if not photo: return
    photo_type = photo.getTag('TYPE').getPayload()[0]
    photo_bin = photo.getTag('BINVAL').getPayload()[0]
    photo_bin = base64.b64decode(photo_bin)
    if photo_type == 'image/png':
        ext = '.png'
    elif photo_type == 'image/jpeg':
        ext = '.jpg'
    elif photo_type == 'image/gif':
        ext = '.gif'
    elif photo_type == 'image/bmp':
        ext = '.bmp'
    else:
        print "Unknown file type: %s" % photo_type
        ext = ''
    sha1sh = hashlib.sha1()
    sha1sh.update(photo_bin)
    sha1sh = sha1sh.hexdigest()
    f = file(sha1sh + ext,'wb')
    f.write(photo_bin)
    f.close()
    #JID = stanza['from'].getStripped()

def request_vcard(session, JID):
    n = xmpp.Node('vCard', attrs={'xmlns':xmpp.NS_VCARD})
    i = xmpp.Protocol('iq', JID, 'get', payload=[n])
    return session.SendAndCallForResponse(i, recieve_vcard)

def get_photo(sha1sh):
    ext = ['.jpg','.png','.gif','.bmp']
    for x in ext:
        if path.exists(sha1sh + x):
            return sha1sh + x
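
For reference, the presence handler looks for an x element in the vcard-temp:x:update namespace and reads the photo hash inside it. The following Python 3 sketch does the same lookup on a raw stanza string with ElementTree, without xmpppy. The function name `photo_hash_from_presence` is made up for this example.

```python
import xml.etree.ElementTree as ET

VCARD_UPDATE_NS = "vcard-temp:x:update"


def photo_hash_from_presence(presence_xml):
    """Return the avatar hash advertised in a presence stanza, or None."""
    presence = ET.fromstring(presence_xml)
    update = presence.find("{%s}x" % VCARD_UPDATE_NS)
    if update is None:
        return None
    photo = update.find("{%s}photo" % VCARD_UPDATE_NS)
    # An empty photo element means the contact has no avatar set.
    if photo is None or not photo.text:
        return None
    return photo.text.strip()
```

The returned hash is what get_photo expects, so a missing file for that hash is the signal to request the contact's vCard.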