Channel: TimescaleDB and PostgreSQL - Timescale Community Forum

300m row ingestion into a hypertable per day


For context, I am building a database that has to ingest a large volume of crawled data, roughly 300 million rows per day. My schema is similar to the following:

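-- MODE_ENUM and CATEGORY_ENUM are custom enum types (definitions omitted here)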
CREATE TABLE public.data (
    timestamp TIMESTAMPTZ,
    id INT REFERENCES public.user(id),
    rank INT,
    score INT,
    mode MODE_ENUM,
    category CATEGORY_ENUM
);

SELECT create_hypertable('public.data', 'timestamp');

ALTER TABLE public.data SET (timescaledb.compress);

The queries I need to be able to do are:

  1. Query records by user ID within some time range (day granularity)
  2. Query records by mode and category, within a certain rank interval, within some time range (ordering by score doesn't work because ties are broken by other means that aren't captured in the data, hence the separate rank column). Rank values are unique within a category per mode and run from 1 to N. Rough SQL sketches of both queries follow this list.
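
For concreteness, the two access patterns look roughly like the sketches below; the literal values (user ID, mode, category, rank bounds, dates) are just placeholders.

-- (1) All records for one user over a time range (day granularity)
SELECT *
FROM public.data
WHERE id = 12345
  AND timestamp >= '2024-01-01'
  AND timestamp <  '2024-01-08';

-- (2) Records for one mode and category within a rank interval over a time range
SELECT *
FROM public.data
WHERE mode = 'some_mode'
  AND category = 'some_category'
  AND rank BETWEEN 1 AND 100
  AND timestamp >= '2024-01-01'
  AND timestamp <  '2024-01-08';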

For the sake of this example, assume there are 10 modes and 100 categories per mode (every mode has the same set of categories). I tried normalizing the table out both ways (by mode and by category), but that involved plenty of joins and indexes, and queries of type (1) were still very slow (30-60 s or more). I am still experimenting with compression, segmenting, and ordering on this table, together with different indexes.
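
One variant of the compression configuration I have been experimenting with looks like this; the segmentby/orderby choices are just one of the combinations under test, not a settled design, and the policy call assumes TimescaleDB 2.x:

ALTER TABLE public.data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'id',
    timescaledb.compress_orderby = 'timestamp DESC'
);

-- compress chunks older than a week (placeholder interval)
SELECT add_compression_policy('public.data', INTERVAL '7 days');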

Insert timing is much more flexible: inserts happen in bulk with COPY once a day and can take several hours without issue.
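
The load itself is essentially one bulk COPY per day, along these lines (file name and format are placeholders):

\copy public.data (timestamp, id, rank, score, mode, category) FROM 'daily_crawl.csv' WITH (FORMAT csv)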

Given all of the above, is there an indexing/compression scheme that would speed these queries up? On my current hardware (HDD), I am hoping to get either of the queries above down to around 5 s. The queried data goes through aggressive caching further upstream, and inserted data is never updated.
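
For reference, the straightforward composite indexes matching the two query shapes would be something like the following (index names are arbitrary); whether these, the compression segmentby/orderby layout, or something else entirely is the right lever here is essentially what I am asking:

CREATE INDEX IF NOT EXISTS data_id_ts_idx
    ON public.data (id, timestamp DESC);

CREATE INDEX IF NOT EXISTS data_mode_cat_ts_rank_idx
    ON public.data (mode, category, timestamp DESC, rank);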
