I really want to turn off the catch-all option in my Google Apps email service. Before I do that, however, I would like to get an idea of which usernames actually receive mail on a semi-regular basis. That way I can decide which ones I would like to keep (by making an alias) or lose (by just letting the mail bounce).
Is there a way to extract this information from a Gmail account, preferably without having to copy a significant chunk of my inbox over IMAP?
Google has a dedicated site, called Data Liberation, where you can find the information you need on how to extract your data from their services.
The only way to solve your problem is to somehow get the data onto your local machine, because there is nothing you can do through the UI that they offer. For Gmail, that means using IMAP or POP.
You could use the Advanced Search options and search for all mail that does not have your primary email address in the to or
cc fields. (You can't filter on
That way you will get a list of emails within Gmail that you were either:
Depending on the number of emails received this may help you.
Use the following search term:
!to:me !cc:me !from:me
This is what I ended up doing.
The suggestion to use the IMAP service is absolutely correct. The trick is not to pull in all your mail, but to use the features of IMAP to retrieve only the information you are interested in. In our case, that is the "Delivered-To:" header. The entire process only takes a few minutes on a full Gmail account.
    import imaplib  # Python 3

    # Log in to IMAP and get ALL message IDs.
    mail = imaplib.IMAP4_SSL("imap.gmail.com")
    mail.login('xxxxxx', 'yyyyyy')
    # The mailbox name contains a space, so it is quoted explicitly
    # (Python 3's imaplib does not add the quotes itself).
    mail.select('"[Gmail]/All Mail"')
    result, data = mail.search(None, "ALL")
    id_list = data[0].decode().split()

    # Retrieve the Delivered-To headers in chunks of 100 and output them.
    while id_list:
        nowlist, id_list = id_list[:100], id_list[100:]
        result, data = mail.fetch(",".join(nowlist),
                                  '(BODY.PEEK[HEADER.FIELDS (DELIVERED-TO)])')
        for dt in data:
            if isinstance(dt, tuple) and len(dt) == 2:  # We actually have such a header
                for line in dt[1].decode(errors="replace").splitlines():
                    if ":" in line:  # And it's not empty
                        print(line.split(":", 1)[1].strip())  # print the address

    mail.logout()
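Once the Delivered-To headers have been fetched, deciding which usernames deserve an alias is easier with a tally than with a raw list. The following is only a rough sketch of that counting step, not part of the original script: the helper name `count_recipients` and the sample addresses are made up for illustration, and it assumes that lowercasing addresses is acceptable for grouping (Gmail treats them case-insensitively).

```python
from collections import Counter


def count_recipients(raw_headers):
    """Tally Delivered-To addresses from raw header blocks (bytes or str)."""
    counts = Counter()
    for raw in raw_headers:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        for line in raw.splitlines():
            name, sep, value = line.partition(":")
            # Keep only non-empty Delivered-To lines.
            if sep and name.strip().lower() == "delivered-to" and value.strip():
                # Assumption: case differences do not matter for grouping.
                counts[value.strip().lower()] += 1
    return counts


# Hypothetical sample data, shaped like the second element of each
# tuple that imaplib returns from a header fetch.
sample = [
    b"Delivered-To: sales@example.com\r\n\r\n",
    b"Delivered-To: Sales@example.com\r\n\r\n",
    b"Delivered-To: info@example.com\r\n\r\n",
    b"\r\n",  # message without a Delivered-To header
]

for address, n in count_recipients(sample).most_common():
    print(f"{n:5d}  {address}")
```

In the fetch loop, you could collect each header block into a list instead of printing it, then pass that list to the helper. Addresses near the top of the output are candidates for an alias; the rest can be left to bounce.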