
Performance issue when refreshing continuous aggregate


Hello!

I’m not quite sure if I’m doing something wrong here or if this is expected. I’m currently running TimescaleDB 2.19.1 on a self-hosted PostgreSQL (16.8-1) instance in the Azure cloud, and I’m facing performance issues when refreshing my continuous aggregate.
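
For reference, this is how I checked the versions:

SELECT version();                                                   -- PostgreSQL 16.8
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb'; -- 2.19.1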

Currently my raw data is stored in the following table:

CREATE TABLE IF NOT EXISTS public.signals
(
    "timestamp" timestamp with time zone NOT NULL,
    value real NOT NULL,
    equipment_key smallint NOT NULL,
    signal_definition_key smallint NOT NULL
);

CREATE INDEX IF NOT EXISTS ix_signals_equipment_key_signal_definition_key_timestamp
    ON public.signals USING btree
    (equipment_key ASC NULLS LAST, signal_definition_key ASC NULLS LAST, "timestamp" DESC NULLS FIRST)
    TABLESPACE pg_default;

The table itself is compressed (columnstore) after 3 days. The continuous aggregate (CAgg) refreshes with a start_offset of 90 days (raw data is dropped from the table after 100 days) and an end_offset of 3 days, the same cutoff as compression. The CAgg has the following definition:

CREATE MATERIALIZED VIEW signals_secondly             
WITH (timescaledb.continuous, timescaledb.materialized_only = false) AS
SELECT
	time_bucket(INTERVAL '1 second', "timestamp") AS timestamp_secondly,
	equipment_key,
	signal_definition_key,
	avg(value) as average,
	min(value) as minimum,
	max(value) as maximum
FROM signals
GROUP BY equipment_key, signal_definition_key, timestamp_secondly
WITH NO DATA;

SELECT add_continuous_aggregate_policy('signals_secondly',
  start_offset => INTERVAL '90 day',
  end_offset => INTERVAL '3 day',
  schedule_interval => INTERVAL '1 min');
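
For completeness, the compression and retention policies are set up roughly like this (a sketch from memory; the segmentby columns are illustrative, the intervals are as described above):

ALTER TABLE signals SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'equipment_key, signal_definition_key'  -- illustrative
);

SELECT add_compression_policy('signals', compress_after => INTERVAL '3 days');
SELECT add_retention_policy('signals', drop_after => INTERVAL '100 days');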

Currently about 1 billion rows are added to the raw data table per day (~10+20 pieces of equipment with ~150-200 signals each at a 20-1000 ms sampling rate), and the CAgg is refreshed every minute to minimize the selection needed on the raw data table. Unfortunately, the refresh takes far longer than one minute, so unprocessed data stacks up more and more until the refresh gets stuck after a few hours. (If I refresh manually, it takes about 1 to 2 hours for one day of data, but most of the time it fails; it even crashes if I try that via the policy.) The problem is also that the system is not yet utilized to its fullest: much more data will be inserted in the future, so I expect the problem to get worse and worse, and I’m not sure whether simply resizing the VM helps here.
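
For reference, a manual refresh of one day looks like this (illustrative window; the exact timestamps vary):

CALL refresh_continuous_aggregate(
    'signals_secondly',
    now() - INTERVAL '4 days',  -- window start
    now() - INTERVAL '3 days'   -- window end, matching the 3-day end_offset
);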

When I monitor the refresh, I see that the policy locks a lot of chunks that have nothing to do with the refresh window itself. My current chunk time interval is 1 day for the raw data table, and the same for the CAgg.
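
This is roughly how I watch the chunk locks during a refresh (an approximation of my monitoring query):

SELECT c.relname, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE c.relname LIKE '_hyper_%'  -- chunk tables are named _hyper_<ht>_<chunk>_chunk
ORDER BY c.relname;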

Am I doing something wrong here? Is there anything I can do to improve this, or is this expected with this amount of data? I also don’t see any VM limits being reached (CPU, memory, or disk speed), but maybe I’m wrong there.


