Issue
I have a table with around 47millions entries. I need to update this table for around 3mil entries very fast, for this I use type_id
and an int inside a range to select records and then update the value on another field, while excluding another record.
UPDATE "table_name" SET "changes" = 12 WHERE ("table_name".
"type_id" = 78 AND "table_name"."rank" BETWEEN 2 AND 2238079 AND NOT ("table_name"."id" = 55423921))
This query was automatically generated by Django using the update
method on a QuerySet
, but it follows the bahavior described above and can be changed to what ever is more efficient.
The reason I store all those records in the same table is to dynamically add new type_id
inside without having to create a new table for each (assuming doing so would improve the speed).
Currently the table has multiple indexes, but let's list the fields first:
id (PK);
type (foreign key containing ID + type name);
score (BigInt);
score_data (jsonb fr metadatas);
changes (IntField);
rank (IntField, used for range selection);
user (foreign key for the user related to this record);
There are different indexes :
- BTree for field
score
andrank
- type for it's PK, probably a BTree (it's the default index)
- user, same as type
As you probably have guessed I'm making a leaderboard system which stores its rank and changes directly inside the scores.
My questions are:
- Should I split the tables into each different type to improve speed
- Should I do the above and make the
rank
a PK - Is there a better index for like an ordered table where I could just order by
score
and don't bother with rank and changes ? - Is there a simple method to improve the update query speed with rawsql ?
Timings for the included query was 4 minutes to complete which is extremely long.
I'm all open for sugestions and this table and whole structure can be changed a bit (as long as we keep the same kind of stored data).
Solution
I think, the best way for you is to make a view-table with this query.
WITH RankedTable AS (
SELECT id as user,
ROW_NUMBER() OVER (ORDER BY score DESC, some_field DESC) AS rank
FROM table_name WHERE score > 0
)
SELECT rank::integer, user::integer
FROM RankedTable
And then you will have a view-table which will automatically update data in case of changing in the main table, it's much faster and you don't have to update the rank field everytime.
Answered By - strd0x
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.