I want to "update" 31 columns of table A (~22 million rows) from table B (~31 million rows).
I created a temporary table containing only the 31 columns I am interested in, which also lets me drop a second join condition (the row count is reduced to ~22 million as well). Next, I created an index on "column1" of the temporary table and of table A (the column used for the join). Then I created a multicolumn index on all the columns of the temporary table except "column1". Unfortunately, the update is still very slow (I cancelled the query after 6 hours).
I don't know much about indexes. Is there any bad practice in what I have done? Are there solutions that would considerably reduce the duration of the update?
Thanks in advance.
Part of the SQL script and the explain plan of the update are below:
CREATE TEMP TABLE IF NOT EXISTS table_TMP AS (
  SELECT
    "column1",
    "column2",
    "column3",
    all other columns...
  FROM table_B
  WHERE one_column_of_table_B = TRUE
);
CREATE INDEX if not exists idx_table_TMP
ON table_TMP USING btree
("column1")
TABLESPACE pg_default;
-- multicolumn index on 31 columns
CREATE INDEX if not exists idx2_table_TMP ON table_TMP (
"column2",
"column3",
all other columns...);
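As far as I understand, this wide index can only help queries that filter or sort on those columns; the UPDATE below joins only on "column1", so idx2_table_TMP would just add build time and disk usage. If that reading is right, removing it would look like:

```sql
-- the join uses only "column1", so the 30-column index is likely dead weight
DROP INDEX IF EXISTS idx2_table_TMP;
```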
UPDATE table_A SET
  "table_A_column2" = tmp."column2",
  "table_A_column3" = tmp."column3",
  all other columns...
FROM table_TMP tmp
WHERE table_A.column1 = tmp.column1;
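A variant I have seen suggested for very large updates is to split the work into key ranges so that each transaction touches fewer rows (the range bounds below are placeholders, not values from my data):

```sql
-- hypothetical batch: repeat with successive ranges of "column1"
UPDATE table_A SET
  "table_A_column2" = tmp."column2",
  "table_A_column3" = tmp."column3",
  all other columns...
FROM table_TMP tmp
WHERE table_A.column1 = tmp.column1
  AND table_A.column1 BETWEEN 'range_start' AND 'range_end';
```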
-- explain of the update:
"Update on table_A (cost=5798151.63..24891890.93 rows=21716146 width=4104)"
" -> Hash Join (cost=5798151.63..24891890.93 rows=21716146 width=4104)"
" Hash Cond: ((tmp.column1)::text = (table_A.column1)::text)"
" -> Seq Scan on table_TMP tmp (cost=0.00..770061.46 rows=21716146 width=2342)"
" -> Hash (cost=622658.39..622658.39 rows=22008739 width=1798)"
" -> Seq Scan on table_A (cost=0.00..622658.39 rows=22008739 width=1798)"
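The plan above only shows cost estimates; to see where the time is actually spent, the same statement could be timed on a small slice first (again with placeholder bounds):

```sql
-- hypothetical: measure actual row counts and buffer usage on a small range
EXPLAIN (ANALYZE, BUFFERS)
UPDATE table_A SET
  "table_A_column2" = tmp."column2",
  all other columns...
FROM table_TMP tmp
WHERE table_A.column1 = tmp.column1
  AND table_A.column1 BETWEEN 'range_start' AND 'range_end';
```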
question from:https://stackoverflow.com/questions/65904042/updating-a-large-table-from-another-takes-a-long-time-more-than-6-hours-in-pos