Postfix causing server to hang after 1-2 days of use

by zsheppard   Last Updated January 12, 2018 23:00 PM

We use Postfix on Debian 9 as an outgoing-only SMTP server for our student information system, which serves parents, teachers, and students in an education setting. The system sends between 5,000 and 10,000 mails a day, which, going by Postfix's reputation, should be no problem. We did not want to run our own SMTP server, but cloud e-mail restrictions (looking at you, G Suite) forced us to so that we could send e-mail reliably.

However, during periods of heavy mail queues, the entire server seems to hang. During this heavy queue time (usually around midnight or 1 AM), PRTG notifies me that the network interface is down, and after this "freeze" I can neither SSH into the server nor reach it through the console. After a reboot, the syslog shows no errors (I have removed some recipient addresses and the machine's hostname):

Jan 12 16:14:25  postfix/smtp[842]: 44753360A0F: to=<>, relay=msn-com.olc.protection.outlook.com[104.47.6.33]:25, delay=2.4, delays=0.15/0/0.37/1.9, dsn=2.6.0, status=sent (250 2.6.0 <> [InternalId=3448858767205, Hostname=VE1EUR02HT049.eop-EUR02.prod.protection.outlook.com] 10100 bytes in 0.194, 50.752 KB/sec Queued mail for delivery)
Jan 12 16:14:25  postfix/qmgr[814]: 44753360A0F: removed
Jan 12 16:14:32  postfix/smtp[826]: B41F9360A22: to=<>, relay=msn-com.olc.protection.outlook.com[104.47.46.33]:25, delay=16, delays=0.16/0/0.23/15, dsn=2.6.0, status=sent (250 2.6.0 <> [InternalId=67813238657136, Hostname=BN3NAM04HT027.eop-NAM04.prod.protection.outlook.com] 9577 bytes in 0.316, 29.592 KB/sec Queued mail for delivery)
Jan 12 16:14:32  postfix/qmgr[814]: B41F9360A22: removed
-- Reboot --
Jan 12 16:27:49  systemd-journald[343]: Missed 48 kernel messages
Jan 12 16:27:49  kernel: e820: last_pfn = 0xc0000 max_arch_pfn = 0x400000000
Jan 12 16:27:49  kernel: found SMP MP-table at [mem 0x000f6a80-0x000f6a8f] mapped at [ffff97d8c00f6a80]
Jan 12 16:27:49  kernel: Base memory trampoline at [ffff97d8c0099000] 99000 size 24576
Jan 12 16:27:49  kernel: Using GB pages for direct mapping
Jan 12 16:27:49  kernel: BRK [0x28330000, 0x28330fff] PGTABLE
Jan 12 16:27:49  kernel: BRK [0x28331000, 0x28331fff] PGTABLE
Jan 12 16:27:49  kernel: BRK [0x28332000, 0x28332fff] PGTABLE
Jan 12 16:27:49  kernel: BRK [0x28333000, 0x28333fff] PGTABLE

The Postfix mail log just shows ^@ symbols before the restart and during the freeze:

Jan 12 16:14:32 postfix/qmgr[814]: B41F9360A22: removed
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Jan 12 16:27:56 opendkim[697]: OpenDKIM Filter v2.11.0 starting (args: -x /etc/opendkim.conf)
Jan 12 16:27:58 postfix/postfix-script[807]: starting the Postfix mail system
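
The ^@ symbols are how pagers and editors such as less and vi display NUL bytes (0x00). A minimal Python sketch for locating such runs in a log file and printing the last entry written before each one, which narrows down when logging stopped. The function names and the 16-byte threshold are hypothetical choices made for illustration, not part of any Postfix or syslog tooling:

```python
import re
from typing import List, Tuple

# One or more consecutive NUL bytes.
NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(data: bytes, min_len: int = 16) -> List[Tuple[int, int]]:
    """Return (offset, length) for every run of NUL bytes of at least min_len."""
    runs = []
    for match in NUL_RUN.finditer(data):
        length = match.end() - match.start()
        if length >= min_len:
            runs.append((match.start(), length))
    return runs


def last_line_before(data: bytes, offset: int) -> str:
    """Return the last log line that ends before the given byte offset."""
    head = data[:offset].rstrip(b"\n")
    line = head[head.rfind(b"\n") + 1:]
    return line.decode("utf-8", errors="replace")


def report(path: str) -> None:
    """Print each NUL run in the file together with the entry preceding it."""
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, length in find_nul_runs(data):
        print(f"{length} NUL bytes at offset {offset}")
        print(f"  last entry before: {last_line_before(data, offset)}")
```

Calling report("/var/log/mail.log") would scan the mail log; that path is an assumption and depends on the local syslog configuration.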

I have no way to log on while the system is frozen, so I cannot debug it further. A few times I have also seen a kernel panic after rebooting the server from its "frozen" state, but I find no logs referencing it.

I have set the Postfix configuration parameter default_process_limit=300 to allow more processes; the server has 4 CPU cores and 4 GB of RAM.
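
For reference, a sketch of how that override would appear in the configuration file. The path /etc/postfix/main.cf is an assumption (it is the usual Debian location); the value is the one stated above, and the Postfix default for this parameter is 100:

```
# /etc/postfix/main.cf (assumed path)
# Upper bound on child processes per Postfix service; the stock default is 100.
default_process_limit = 300
```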

I am really at a loss here and not sure what to try next, other than rebuilding the server from a fresh ISO. Any ideas for further troubleshooting?


