====== Presence and count of specific mail headers ======
This query extracts, for each of the header fields from a fixed list, the number of occurrences of all distinct values of the field.

<code sql>
SELECT
 split_part(substr(lines, 1+position(E'\n'||field in lines), 200), E'\n', 1) as ct,
 count(*)
FROM header, (values ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) as h(field)
WHERE position(E'\n'||field IN lines)>0
group by 1
</code>

Sample output:
<code>
            ct             | count 
---------------------------+-------
 Importance: high          |    32
 Importance: High          |   130
 Importance: low           |     1
 Importance: Medium        |     3
 Importance: normal        |    44
 Importance: Normal        |   802
 Precedence: bulk          | 19987
 Precedence: bulk          |     1
 Precedence: fm-user       |     3
 Precedence: junk          |     3
 Precedence: list          |  4026
 Priority: non-urgent      |    11
 Priority: Non-Urgent      |     1
 Priority: normal          |   464
 Priority: urgent          |   603
 Priority: Urgent          |    12
 X-MSMail-Priority: High   |   121
 X-MSMail-Priority: Low    |     1
 X-MSMail-Priority: Lowest |    11
 X-MSMail-Priority: Medium |     3
 X-MSMail-Priority: Middle |     8
 X-MSMail-Priority: Normal |   923
 X-Priority: 0             |     1
 X-Priority: 1             |   102
 X-Priority: 1 (High)      |     2
 X-Priority: 1 (Highest)   |   120
 X-Priority: 2             |     6
 X-Priority: 2 (High)      |     1
 X-Priority: 3             |  2892
 X-Priority: 3 (Normal)    |   635
 X-Priority: 5             |    14
 X-Priority: Normal        |     5
</code>

This alternative implementation uses regular expressions and differs in that it doesn't limit header values to 200 characters or any other fixed length.
<code sql>
select FIELD||':'||arr[1], count(*)
 from (select FIELD, regexp_matches(lines, '(?:^|\n)' || FIELD || ':\s*([^\n]*)', 'gi') as arr
       from header, 
         (VALUES ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'),
             ('X-MSMail-Priority'), ('X-MS-Priority')) AS h(FIELD)
             where strpos(lines,FIELD)>0) l
GROUP BY 1 ORDER BY 1
</code>

In this version, the '' strpos(lines,FIELD)>0 '' condition is not essential: it's introduced only as a first-pass filter to eliminate the headers that don't contain anywhere any of the searched fields.
====== Duplicate messages ======
This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate.
<code sql>
select h1.mail_id from header h1, header h2 where h1.lines=h2.lines and h1.mail_id > h2.mail_id
</code>

A stricter version, comparing the md5 hashes of bodies (text and html parts) in addition to the headers:
<code sql>
select b2.mail_id from body b1, body b2, header h1, header h2
 where b1.mail_id < b2.mail_id
 and h1.mail_id = b1.mail_id
 and h2.mail_id = b2.mail_id
 and md5(h1.lines) = md5(h2.lines)
 and md5(b1.bodytext) is not distinct from md5(b2.bodytext)
 and md5(b1.bodyhtml) is not distinct from md5(b2.bodyhtml);
</code>

The ''IS NOT DISTINCT'' comparator behaves as expected when ''bodytext'' or ''bodyhtml'' is NULL, as opposed
to the simple equality operator, for which ''NULL=NULL'' is false

====== Hierarchical view of tags  ======

Output a list of tags sorted by hierarchy level and names,
with an indentation to represent the hierarchy.

<code sql>
WITH RECURSIVE tagr(a,_tag_id,name,level) as (
 select
  array[row_number() over (order by name)] as a,
  tag_id,
  name,
  1 as level
 from tags where parent_id is null
UNION ALL
 select
   tagr.a || row_number() over (order by tags.name),
   tag_id,
   tags.name,
   tagr.level+1
 FROM tags JOIN tagr ON tagr._tag_id=tags.parent_id
)
select repeat('   ', level-1) || name from tagr order by a;

</code>