1 Email content
Suppose the email is saved in a file named "1.txt" with the following content:
From: Justin-Bieber@entertain.org on behalf of BieberLeader [leader@hello.org]
Sent: 2017-07-01 12:48
To: 'staff@hello.org'; custom@hello.org;Willim Johnson; John Snow
Subject: The battlefield in Winterfell
I have just met then. More details as soon as possible. So far, so good.
Sent via iPhone 7 plus
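2 Extraction approach
- The goal is to extract the header information from the email. The fields to extract are:
- sender (From:), send time (Sent:), recipients (To:), and subject (Subject:).
- As a first pass, it is enough to extract the content of the lines that hold these fields.
- A single extraction function is used: the four keywords are placed in a list and matched with a regular expression.
- Each of the four fields has a global counter. When a field is matched, its counter is incremented by 1 to mark the field as seen.
- If one field has already been matched and the next field has not been matched yet, the current line is treated as a continuation of the matched field and is returned as well.
- If the extraction function returns None, the line is skipped.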
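The keyword regex described in the steps above can be tried on a single line first. This is a minimal sketch: `extract_from_keyword` is a hypothetical helper introduced only for illustration, and it shows what the pattern `.*(keyword.*)` captures when the keyword is present and when it is absent.

```python
import re


# Hypothetical helper for illustration; it is not part of the original script.
def extract_from_keyword(line, keyword):
    """Return the part of `line` that starts at `keyword`, or None if absent."""
    match_obj = re.match(".*({0}.*)".format(re.escape(keyword)), line)
    return match_obj.group(1) if match_obj else None


sample = "Sent: 2017-07-01 12:48"
print(extract_from_keyword(sample, "Sent:"))     # Sent: 2017-07-01 12:48
print(extract_from_keyword(sample, "Subject:"))  # None
```

The leading `.*` lets the keyword appear anywhere in the line, so any text before the keyword is dropped from the result.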
# coding: utf-8
import re

# Global counters: each one becomes non-zero once its header has been seen.
from_count = 0
sent_count = 0
to_count = 0
subject_count = 0

KEYWORD_LIST = ['From:', 'Sent:', 'To:', 'Subject:']


def inspect_string(string):
    """Return the header text found in string, or None if the line is skipped."""
    global from_count, sent_count, to_count, subject_count

    # Record which headers appear on this line.
    if 'From:' in string:
        from_count += 1
    if 'Sent:' in string:
        sent_count += 1
    if 'To:' in string:
        to_count += 1
    if 'Subject:' in string:
        subject_count += 1

    # A line that contains a keyword is returned from the keyword onwards.
    for keyword in KEYWORD_LIST:
        match_obj = re.match(".*({0}.*)".format(re.escape(keyword)), string)
        if match_obj:
            return match_obj.group(1)

    # No keyword on this line: keep it only if it continues a header
    # whose successor has not been seen yet.
    if from_count > 0 and sent_count < 1:
        return string
    if sent_count > 0 and to_count < 1:
        return string
    if to_count > 0 and subject_count < 1:
        return string
    return None


# Open in text mode so each line is a str rather than bytes (Python 3).
with open('1.txt', encoding='utf-8') as f:
    for line in f:
        result = inspect_string(line.rstrip('\r\n'))
        if result is None:
            continue
        print(result)
3 Output
From: Justin-Bieber@entertain.org on behalf of BieberLeader [leader@hello.org]
Sent: 2017-07-01 12:48
To: 'staff@hello.org'; custom@hello.org;Willim Johnson; John Snow
Subject: The battlefield in Winterfell
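The same idea can also be written without global counters. The following is only a sketch under stated assumptions, not the original author's method: `parse_headers`, `HEADERS`, and `HEADER_RE` are hypothetical names, each header keyword is assumed to sit at the start of its line, and the sample assumes the To: list wraps onto a second line. A single `current` variable remembers which header is still being read, so continuation lines are appended to it.

```python
import re

HEADERS = ("From", "Sent", "To", "Subject")
# A header line starts with one of the keywords followed by a colon.
HEADER_RE = re.compile(r"^({0}):\s*(.*)$".format("|".join(HEADERS)))


def parse_headers(lines):
    """Collect the four headers into a dict, joining wrapped lines."""
    headers = {}
    current = None  # name of the header whose value is still being read
    for raw in lines:
        line = raw.strip()
        match_obj = HEADER_RE.match(line)
        if match_obj:
            current = match_obj.group(1)
            headers[current] = match_obj.group(2)
            if current == "Subject":
                break  # Subject is the last header; the body follows
        elif current is not None and line:
            # Continuation of the previous header (e.g. a wrapped To: list).
            headers[current] += " " + line
    return headers


# Sample lines; the wrapped To: line is an assumption made for illustration.
sample = [
    "From: Justin-Bieber@entertain.org on behalf of BieberLeader [leader@hello.org]",
    "Sent: 2017-07-01 12:48",
    "To: 'staff@hello.org'; custom@hello.org;",
    "Willim Johnson; John Snow",
    "Subject: The battlefield in Winterfell",
    "I have just met then. More details as soon as possible.",
]
for name, value in parse_headers(sample).items():
    print("{0}: {1}".format(name, value))
```

Returning a dict keeps each field separate, which may be more convenient than printed lines if the values are processed further. As in the script above, reading stops at the Subject line, so a subject that wraps onto a second line is not handled.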
Reposted from: https://www.jianshu.com/p/11de9fc6a74d