IMAP Import

I used this to upload all of my old chats saved as .eml files directly into my new Gmail account. It uses IMAP’s append to do this. I had tried Thunderbird and the ImportExportTools plugin, but ran into trouble.

This puts all of the messages into a label called “oldchats”. This label needs to exist before this program is run. It also deletes each message after it is uploaded. I made a backup copy of my messages.

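import email
import email.utils  # parsedate() lives here; import it explicitly
import os
import imaplib

imap = imaplib.IMAP4_SSL('imap.gmail.com', 993)
imap.login('email@gmail.com', 'password')

os.chdir("chats")
for filename in os.listdir("."):
    print filename,
    # Read in binary mode so the message is uploaded byte for byte.
    raw_eml = open(filename, 'rb').read()
    msg = email.message_from_string(raw_eml)
    # Reuse the original Date header as the IMAP internal date.
    date = email.utils.parsedate(msg['Date'])
    print date,
    typ, data = imap.append('oldchats', None, date, raw_eml)
    if typ != 'OK':
        # A "NO" reply (e.g. missing label) does not raise; keep the file.
        print "failed:", data
        break
    print "done"
    os.remove(filename)

imap.logout()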
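
For readers on Python 3, the same idea might be sketched as follows. This is only a sketch under a few assumptions: the helper names `internal_date` and `upload_folder` are made up for illustration, the "oldchats" label is assumed to exist already, and a file is removed only after the server answers OK.

```python
import email
import email.utils
import imaplib
import os


def internal_date(raw_eml):
    """Return the message's Date header as an IMAP internal-date string.

    Returns None when the header is missing or unusable; IMAP4.append()
    then sends no date and the server falls back to the current time.
    """
    msg = email.message_from_bytes(raw_eml)
    try:
        parsed = email.utils.parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Time2Internaldate() rejects naive datetimes.
        return None
    return imaplib.Time2Internaldate(parsed)


def upload_folder(imap, folder, label="oldchats"):
    """Append every file in `folder` to `label`, deleting each on success."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        with open(path, "rb") as handle:
            raw_eml = handle.read()
        status, detail = imap.append(label, None, internal_date(raw_eml), raw_eml)
        if status != "OK":
            # For example, the label does not exist: keep the file and stop.
            raise RuntimeError("append failed for %s: %r" % (name, detail))
        os.remove(path)


# Hypothetical usage (needs real credentials, so it is left commented out):
# imap = imaplib.IMAP4_SSL("imap.gmail.com", 993)
# imap.login("email@gmail.com", "password")
# upload_folder(imap, "chats")
# imap.logout()
```

Raising on the first failed append keeps the remaining files on disk, which matters because the upload is destructive.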

Google Chat History Downloader

Update 2011-11-09:
Gmail now officially supports downloading chat history via IMAP. Thanks to Steve for pointing it out. It can be enabled in the “Labels” section of Gmail settings.
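
With that setting enabled, saved chats can be read like any other IMAP mailbox. Here is a minimal Python 3 sketch, assuming the chats show up in a mailbox named `[Gmail]/Chats` (this name is an assumption; `imap.list()` shows the real one for an account). The helper name `iter_chat_messages` is hypothetical.

```python
import imaplib


def iter_chat_messages(imap, mailbox='"[Gmail]/Chats"'):
    """Yield (message number, raw message bytes) for each message in mailbox.

    The default mailbox name is an assumption, not something confirmed here.
    """
    status, _ = imap.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot open mailbox %s" % mailbox)
    status, data = imap.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    for num in data[0].split():
        status, parts = imap.fetch(num, "(RFC822)")
        if status == "OK":
            # parts[0] is an (envelope, body) pair; the body is the raw message.
            yield num.decode("ascii"), parts[0][1]


# Hypothetical usage (needs real credentials, so it is left commented out):
# imap = imaplib.IMAP4_SSL("imap.gmail.com", 993)
# imap.login("username@gmail.com", "password")
# for num, raw in iter_chat_messages(imap):
#     with open("chats/%s.eml" % num, "wb") as out:
#         out.write(raw)
# imap.logout()
```

Opening the mailbox read-only avoids marking the chats as read while they are copied.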

Update 2011-08-30:

Based on the comments, this doesn’t work anymore. I’d recommend checking out this thread for solutions: http://www.google.com/support/forum/p/gmail/thread?tid=7a7d2d6da5be047f

I personally have been using a JavaScript-based solution for exporting recent chat data, which still doesn’t solve the TOS / getting blocked problem. If there is enough interest, I’ll post my code.

A couple of weeks ago, I decided to migrate from one Google Account to another. I was able to transfer all of my emails without too much difficulty. However, after looking around for a while, I could not find any way to export my Google Talk chat history. I don’t think there is any way to access saved chats through either IMAP or POP. I did notice, though, that the Gmail web interface lets you view a saved chat as a raw message.

There happens to be an old Python library for interacting with the Gmail web interface called libgmail. I found, however, that it does not scale well to large numbers of messages, so I had to write my own method that processes results one page at a time. I also found that I was easily blocked when using this method over a long period, so I added 13-second delays after every request so as not to get my account suspended. It took me a day and a half to export all of the messages. I’m not sure whether this is overkill, but I am tired of getting my account blocked.

Anyway, this program goes through and saves each chat history message as an .eml file. Once they are in that format, it is not super hard to get them into a different Gmail account, but I’ll save that for another post.

import os
import time
import libgmail # http://libgmail.sourceforge.net/

def thread_search(ga, searchType, **kwargs):
    """Yield search results one page at a time instead of loading them all."""
    index = 0
    # threadListSummary is only assigned after the first page is fetched,
    # so the index == 0 test has to come first.
    while (index == 0) or index < threadListSummary[libgmail.TS_TOTAL]:
        threadsInfo = []
        items = ga._parseSearchResult(searchType, index, **kwargs)
        try:
            threads = items[libgmail.D_THREAD]
        except KeyError:
            # No thread data on this page: stop paging.
            break
        else:
            for th in threads:
                if not type(th[0]) is libgmail.types.ListType:
                    th = [th]
                threadsInfo.append(th)
            threadListSummary = items[libgmail.D_THREADLIST_SUMMARY][0]
            threadsPerPage = threadListSummary[libgmail.TS_NUM]
            # Advance to the first thread of the next page.
            index += threadsPerPage
        yield libgmail.GmailSearchResult(ga, (searchType, kwargs), threadsInfo)

ga = libgmail.GmailAccount("username@gmail.com", "password")
ga.login()

for page in thread_search(ga, "query", q="is:chat"):
    print "New Page"
    time.sleep(13)
    for thread in page:
        if thread.info[0] == thread.info[10]:
            # Common case: Chats that only span one message
            filename = "chats/%s_%s.eml" % (thread.id, thread.id)
            #only download the message if we don't have it already
            if os.path.exists(filename):
                print "already have %s" % filename
                continue
            print "Downloading raw message: %s" % filename,
            message = ga.getRawMessage(thread.id).decode('utf-8').lstrip()
            print "done."
            file(filename, 'wb').write(message)
            time.sleep(13)
            continue
        # Less common case: A thread that has multiple messages
        print "Looking up messages in thread %s" % thread.id
        time.sleep(13)
        for message in thread:
            filename = "chats/%s_%s.eml" % (thread.id, message.id)
            #only download the message if we don't have it already
            if os.path.exists(filename):
                print "already have %s" % filename
                continue
            print "Downloading raw message: %s" % filename,
            file(filename, 'wb').write(message.source.lstrip())
            print "done."
            time.sleep(13)
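
Stripped of the libgmail specifics, the two ideas in this script are lazy paging and a fixed pause between requests. A rough Python 3 sketch of that pattern follows; `fetch_page` is a hypothetical callable standing in for the real request, and the 13-second default simply mirrors the delay used above.

```python
import time


def paged(fetch_page, delay=13.0, sleep=time.sleep):
    """Yield one page of results at a time, pausing between requests.

    fetch_page(offset) is a hypothetical callable that returns a tuple
    (items, total): the items starting at `offset` and the overall count.
    """
    offset = 0
    while True:
        items, total = fetch_page(offset)
        if not items:
            break
        yield items
        offset += len(items)
        if offset >= total:
            break
        # Throttle before asking for the next page.
        sleep(delay)


# Example with an in-memory stand-in for the remote service.
DATA = list(range(5))


def fake_fetch(offset, page_size=2):
    return DATA[offset:offset + page_size], len(DATA)


pages = list(paged(fake_fetch, delay=0))
```

Because `paged` is a generator, only one page is held in memory at a time, and the pause happens only when the caller asks for another page.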

Comet Chat Server

Here is a demo I wrote that shows how to use the Comet method of HTTP streaming. Of course, this was before it was named Comet.
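
The core of the technique is that each client holds one long-lived response open while the server blocks on a per-client queue, writing a harmless keep-alive chunk whenever nothing arrives in time. A minimal Python 3 sketch of just that part follows (the original listing after it targets Python 2). The names `subscribers`, `broadcast`, `client_stream` and the `max_chunks` parameter are illustrative assumptions.

```python
import queue

# One queue per connected client (hypothetical module-level registry).
subscribers = set()


def broadcast(message):
    """Put a message on every connected client's queue."""
    for inbox in list(subscribers):
        inbox.put(message)


def client_stream(keep_alive=" ", timeout=12.0, max_chunks=None):
    """Yield chunks for one long-lived response.

    Blocks until a message arrives; on timeout it yields `keep_alive`
    instead, so the connection stays open and a dead client is noticed
    when the write fails. `max_chunks` exists only to make the sketch
    testable; a real stream would run until the client disconnects.
    """
    inbox = queue.Queue()
    subscribers.add(inbox)
    sent = 0
    try:
        while max_chunks is None or sent < max_chunks:
            try:
                chunk = inbox.get(True, timeout)
            except queue.Empty:
                chunk = keep_alive
            yield chunk
            sent += 1
    finally:
        # Runs when the generator finishes or is closed by the server.
        subscribers.discard(inbox)
```

A WSGI application could return such a generator as its response body; servers normally call `close()` on it when the client goes away, which triggers the `finally` clause.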

from cgi import escape
from random import uniform
from Queue import Queue,Empty
from sets import Set
from socket import error
from threading import Thread
from urllib import unquote_plus
from wsgiref.simple_server import make_server

class Connection(Queue):
    """Handles the persistent connection between the client and server"""
    #This set could get messed up by multi-threading.
    objects=Set() #set of live connections

    def __init__(self,obj_up_hook=None):
        self.name=""
        self.obj_up_hook=obj_up_hook
        Queue.__init__(self)

    def __str__(self):
        return self.name

    def __repr__(self):
        return "Connection object: " + str(self)

    def online(self):
        self.objects.add(self)
        print '"' + str(self) + '" has joined'
        if self.obj_up_hook:
            self.obj_up_hook(self)

    def offline(self):
        self.objects.discard(self)
        print '"' + str(self) + '" has left'
        if self.obj_up_hook:
            self.obj_up_hook(self)

    def send_to_all(msg):
        """Sends a message to all online objects"""
        for x in Connection.objects:
            x.put(msg)
            
    send_to_all = staticmethod(send_to_all)

    def run(self,write,keep_alive=" "):
        """Waits for messages and outputs them until window is closed"""
        self.online()
        while 1:
            try:
                #Wait for a new message.
                m=self.get(True,uniform(10,15))
            except Empty:
                #The waiting timed out.
                m=keep_alive
            try:
                write(m)
            except error:
                #most likely the client closed the window
                self.offline()
                return

class ChatApp():
    """Handles a Request"""
    def __init__(self, environ, start_response):
        self.environ=environ
        self.start_response=start_response

    def index(self):
        """login page"""
        return """<script>
function submit(e){
 if(!e)e=window.event;
 if(e.keyCode==13){
  url="/main/"+input.value;
  location.href=url
 }
}
</script>
<body>
<table width=100% height=60%>
<td width=100% height=100%><center>Enter your name:<br><input id=input style="width:50%" onkeypress="submit(event)">"""

    def main(self,user):
        """main page"""
        return '''<script>
function submit(e){
 if(!e)e=window.event;
 if(e.keyCode==13){
  url="/ajax/'''+str(user)+'''?"+input.value;
  input.value="";
  if(window.ActiveXObject){ajax=new ActiveXObject("Microsoft.XMLHTTP")};
  if(window.XMLHttpRequest){ajax=new XMLHttpRequest()};
  ajax.open("GET",url,true);
  ajax.send(null);
 }
}
</script>
<body topmargin=0 bottommargin=0 leftmargin=0 rightmargin=0>
<table width=100% height=100% cellspacing=0 cellpadding=0>
<td width=80% height=100%>
<iframe id=thebox style="border-right:0;border-left:0;border-top:0;border-bottom:0" width=100% height=100% src="/top/'''+str(user)+'''"></iframe>
<td width=20% height=100%>
<iframe name=thelist style="border-right:0;border-left:0;border-top:0;border-bottom:0" width=100% height=100% src="/list/'''+str(user)+'''"></iframe>
<tr><td><input id=input style="width:100%" value="Type your message here" onkeypress="submit(event)">'''

    def refresh_online_list(self,connection):
        Connection.send_to_all('<script>u()</script>')
        
    def top(self,write,usern):
        """actual chat window"""
        #create another thread to serve new requests
        Server()

        write("""<body><script>
function u(){parent.frames["thelist"].location.reload();}
function s(str){document.write(str);window.scrollBy(0,100);}
</script>""")
        u=Connection(self.refresh_online_list)
        u.name=usern
        u.run(write)

    def onlinelist(self,user):
        """online list"""
        string =  "<b>" + str(len(Connection.objects)) + " Online:</b><br>"
        for u in Connection.objects:
            string += str(u)+"<br>\n"
        return string

    def ajax(self,user,message):
        """page that accepts messages"""
        #this escape function escapes all html and quotes
        print "received message: " + message
        print "sending message: " + self.esc(message)
        Connection.send_to_all("<script>s('" + "<b>" + self.esc(str(user)) + 
                ":</b> " + self.esc(message) + "<br>" + "')</script>")
        print "done"

    def esc(string):
        #for html:
        string = escape(string,True)
        #for javascript (order is important: backslashes first, then quotes)
        string = string.replace("\\","\\\\")
        string = string.replace("'","\\'")
        return string

    esc = staticmethod(esc)

    def __iter__(self):
        print "received request"
        write = self.start_response('200 OK', [('Content-type', 'text/html')])
        patharray = self.environ['PATH_INFO'].split('/')
        if patharray==["",""]:
            yield self.index()
            return
        if patharray[1]=="favicon.ico":
            return
        command=patharray[1]
        user=patharray[2]
        if command=="main":
            yield self.main(user)
        elif command=="top":
            self.top(write,user)
            yield ""
        elif command=="list":
            yield self.onlinelist(user)
        elif command=="ajax":
            self.ajax(user,unquote_plus(self.environ['QUERY_STRING']))
            yield ""
        else:
            yield "unknown command: "+str(command)


class Server(Thread):
    """A thread that serves requests"""
    def __init__(self):
        Thread.__init__(self)
        self.setDaemon(1)
        self.start()
    def run(self):
        self.httpd.serve_forever()

def start():
    httpd = make_server('0.0.0.0', 9081, ChatApp)
    Server.httpd=httpd
    print "Serving HTTP on port 9081..."
    s=Server()
    s.join()#don't exit

if __name__ == '__main__':
    start()
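
The `esc` helper above escapes a message for HTML first and then for the JavaScript string literal it is embedded in. An alternative way to get the same two layers in Python 3 is sketched below. Note that it swaps in `json.dumps` for the manual `replace` calls, so it is not the method used in the listing, and the function name `js_call` is made up for illustration.

```python
import html
import json


def js_call(user, message):
    """Build a <script> chunk that calls s(...) with an escaped chat line."""
    # Layer 1: HTML-escape user-supplied text (&, <, >, and both quote types).
    fragment = "<b>%s:</b> %s<br>" % (html.escape(user), html.escape(message))
    # Layer 2: turn the fragment into a JavaScript string literal.
    # json.dumps escapes backslashes, double quotes and control characters.
    literal = json.dumps(fragment)
    # Defensive step: avoid a literal "</" inside the script block.
    literal = literal.replace("</", "<\\/")
    return "<script>s(%s)</script>" % literal


chunk = js_call("bob", "a\\b <i>'hi'</i>")
```

Doing the HTML step before the JavaScript step matters for the same reason as in `esc`: the second layer must see the final text that will sit inside the string literal.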