Skip Navigation

How lemmy.ca took on finding why PostgreSQL was at 100% CPU and crashing the Lemmy website this past weekend. PostgreSQL auto_explain

IIRC, it was lemmy.ca full copy of live data that was used (copy made on development system, not live server - if I'm following). Shared Saturday July 22 on GitHub was this procedure:

...

Notable is the AUTO_EXPLAIN SQL activation statements:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = true;
SET auto_explain.log_nested_statements = true;

This technique would be of great use for developers doing changes and study of PostgreSQL activity. Thank you!

6

You're viewing a single thread.

6 comments
  • Yeah this was the only way I could get visibility into the queries running inside the triggers.

    • Thank you for sharing, I had not come across the auto_explain technique and was manually pulling out SQL from pg_stat_statements and trying to match up to runtime parameters for $1 $2 etc with EXPLAIN

      • Oof yeah, that's not gonna be a fun way to troubleshoot!

        Thanks for taking the ball and running with this, looks like it's getting a lot of attention now and hopefully next release will have some juicy sql improvements!

        • I've really been bothered by the server crashes, especially on lemmy.ml since even before I created my first Lemmy account. so far, I haven't seen anyone able to reproduce the overloads/crashes outside of production servers.

          If we can get pg_stat_statatements with on 0.18.3 with pg_stat_statements.track = all out of a real Lemmy database run for 12 or more hours, that is having crashes, I think we might find something that's being overlooked.

You've viewed 6 comments.