sql_analysis
This is an old revision of the document!
Table of Contents
Presence and count of specific mail headers
This query extracts, for each of the header fields from a fixed list, the number of occurrences of all distinct values of the field.
SELECT split_part(substr(LINES, 1+POSITION(E'\n'||FIELD IN LINES), 200), E'\n', 1) AS ct, COUNT(*) FROM header, (VALUES ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) AS h(FIELD) WHERE POSITION(E'\n'||FIELD IN LINES)>0 GROUP BY split_part(substr(LINES, 1+POSITION(E'\n'||FIELD IN LINES), 200), E'\n', 1)
Sample output:
ct | count ---------------------------+------- Importance: high | 32 Importance: High | 130 Importance: low | 1 Importance: Medium | 3 Importance: normal | 44 Importance: Normal | 802 Precedence: bulk | 19987 Precedence: bulk | 1 Precedence: fm-user | 3 Precedence: junk | 3 Precedence: list | 4026 Priority: non-urgent | 11 Priority: Non-Urgent | 1 Priority: normal | 464 Priority: urgent | 603 Priority: Urgent | 12 X-MSMail-Priority: High | 121 X-MSMail-Priority: Low | 1 X-MSMail-Priority: Lowest | 11 X-MSMail-Priority: Medium | 3 X-MSMail-Priority: Middle | 8 X-MSMail-Priority: Normal | 923 X-Priority: 0 | 1 X-Priority: 1 | 102 X-Priority: 1 (High) | 2 X-Priority: 1 (Highest) | 120 X-Priority: 2 | 6 X-Priority: 2 (High) | 1 X-Priority: 3 | 2892 X-Priority: 3 (Normal) | 635 X-Priority: 5 | 14 X-Priority: Normal | 5
Duplicate messages
This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate.
SELECT h1.mail_id FROM header h1, header h2 WHERE h1.lines=h2.lines AND h1.mail_id > h2.mail_id
Messages with specific attachment types
To retrieve all messages containing pdf files or any image file:
SELECT DISTINCT mail_id FROM attachments WHERE content_type='application/pdf' OR content_type LIKE 'image/%';
sql_analysis.1322655072.txt.gz · Last modified: 2011/11/30 12:11 by daniel