This test is designed to show the great improvements in scan performance achieved in 8.3 (currently in beta). Many other areas of performance have also been improved, but these tests are specifically designed to measure the gains in scan performance.
The features in 8.3 that make this possible do not even take effect unless the table is at least 25% of the size of shared_buffers.
This script first tests the performance of a single (no concurrent queries) scan. Then it forks num_procs processes, and each open a connection and wait until it's time to start a scan. The waiting is to stagger the scans, so that they do not start all at once. When it's time, each process starts a scan (which will be at the same time other scans are already in progress, unless it's the first one), and reports how long that scan took.
#!/usr/bin/ruby
require 'postgres'
num_procs = ARGV[0].to_i
iterations = ARGV[1].to_i
offset = ARGV[2].to_i
pids = []
db = PGconn.connect('/tmp',nil,nil,nil,'test','test',nil)
t1 = Time.now
db.exec "SELECT COUNT(*) FROM test"
t2 = Time.now
db.close
puts "single process time: #{t2-t1} seconds"
(0...num_procs).each { |i|
pid = fork
if(pid)
pids << pid
else
db = PGconn.connect('/tmp',nil,nil,nil,'test','test',nil)
sleep i*offset
puts [ Time.now.strftime("[%Y-%m-%d %H:%M:%S]"),
Process.pid.to_s, "starting" ].join("\t")
t1 = Time.now
(0...iterations).each{ |i|
res = db.exec "SELECT COUNT(*) FROM test"
}
t2 = Time.now
puts [ Time.now.strftime("[%Y-%m-%d %H:%M:%S]"),
Process.pid.to_s,
"finished: #{t2-t1}\tseconds"
].join("\t")
db.close
exit 0
end
}
pids.each{ |pid| Process::waitpid(pid) }
All settings in postgresql.conf are default, except autovacuum is disabled. Locale is en_US.UTF-8. No special compile-time options were given. Compiled for x86-64.
The test table: 150M tuples. Each tuple is a simple integer followed by 100 characters (=100 bytes) in a text attribute. To generate the table do:
=> create table test(i int, t text); => insert into test select generate_series, 'abcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxy' from generate_series(1,150000000);
The test hardware:
All tests are on linux with the anticipatory I/O scheduler. The default I/O scheduler for Linux is CFQ (Completely Fair Queue), which does not work well for PostgreSQL in my tests.
Table size in 8.2.5: approximately 20.8 GB
10 scans, staggered to start 30 seconds apart: single process time: 305.897248 seconds [2007-10-06 19:23:45] 16967 starting [2007-10-06 19:24:15] 16968 starting [2007-10-06 19:24:45] 16969 starting [2007-10-06 19:25:15] 16970 starting [2007-10-06 19:25:45] 16971 starting [2007-10-06 19:26:15] 16972 starting [2007-10-06 19:26:45] 16973 starting [2007-10-06 19:27:15] 16974 starting [2007-10-06 19:27:45] 16975 starting [2007-10-06 19:28:15] 16976 starting [2007-10-06 19:32:14] 16967 finished: 508.988534 seconds [2007-10-06 19:32:14] 16968 finished: 479.013674 seconds [2007-10-06 19:37:47] 16976 finished: 571.819937 seconds [2007-10-06 19:37:47] 16971 finished: 721.831606 seconds [2007-10-06 19:37:47] 16972 finished: 681.512183 seconds [2007-10-06 19:37:47] 16974 finished: 631.828676 seconds [2007-10-06 19:37:47] 16973 finished: 661.834094 seconds [2007-10-06 19:37:47] 16975 finished: 601.826697 seconds [2007-10-06 19:37:47] 16969 finished: 781.843272 seconds [2007-10-06 19:37:47] 16970 finished: 751.83741 seconds
20 scans, staggered to start 15 seconds apart: $ ./bench.rb 20 1 15 single process time: 306.405574 seconds [2007-10-06 19:50:50] 17672 starting [2007-10-06 19:51:05] 17673 starting [2007-10-06 19:51:20] 17674 starting [2007-10-06 19:51:35] 17675 starting [2007-10-06 19:51:50] 17676 starting [2007-10-06 19:52:05] 17677 starting [2007-10-06 19:52:20] 17678 starting [2007-10-06 19:52:35] 17679 starting [2007-10-06 19:52:50] 17680 starting [2007-10-06 19:53:05] 17681 starting [2007-10-06 19:53:20] 17682 starting [2007-10-06 19:53:35] 17683 starting [2007-10-06 19:53:50] 17684 starting [2007-10-06 19:54:05] 17685 starting [2007-10-06 19:54:20] 17686 starting [2007-10-06 19:54:35] 17687 starting [2007-10-06 19:54:50] 17688 starting [2007-10-06 19:55:05] 17689 starting [2007-10-06 19:55:20] 17690 starting [2007-10-06 19:55:35] 17691 starting [2007-10-06 20:28:31] 17685 finished: 2065.646385 seconds [2007-10-06 20:28:31] 17684 finished: 2080.654577 seconds [2007-10-06 20:28:31] 17683 finished: 2095.656818 seconds [2007-10-06 20:28:31] 17675 finished: 2215.665656 seconds [2007-10-06 20:28:31] 17676 finished: 2200.662957 seconds [2007-10-06 20:28:31] 17672 finished: 2260.648018 seconds [2007-10-06 20:28:31] 17682 finished: 2110.633246 seconds [2007-10-06 20:28:31] 17677 finished: 2185.664061 seconds [2007-10-06 20:28:31] 17687 finished: 2035.629152 seconds [2007-10-06 20:28:31] 17679 finished: 2155.663292 seconds [2007-10-06 20:28:31] 17674 finished: 2230.665842 seconds [2007-10-06 20:28:31] 17681 finished: 2125.663343 seconds [2007-10-06 20:28:31] 17673 finished: 2245.674556 seconds [2007-10-06 20:28:31] 17680 finished: 2140.665867 seconds [2007-10-06 20:28:31] 17678 finished: 2170.67249 seconds [2007-10-06 20:28:31] 17686 finished: 2050.664088 seconds [2007-10-06 20:28:34] 17690 finished: 1994.382316 seconds [2007-10-06 20:28:34] 17691 finished: 1979.391717 seconds [2007-10-06 20:28:34] 17688 finished: 2024.400597 seconds [2007-10-06 20:28:34] 17689 finished: 2009.437021 seconds
Table size in 8.3beta1: approximately 19.7 GB
10 scans, staggered to start 30 seconds apart: $ ./bench.rb 10 1 30 single process time: 310.992567 seconds [2007-10-07 00:13:39] 24327 starting [2007-10-07 00:14:09] 24328 starting [2007-10-07 00:14:39] 24329 starting [2007-10-07 00:15:09] 24330 starting [2007-10-07 00:15:39] 24331 starting [2007-10-07 00:16:09] 24332 starting [2007-10-07 00:16:39] 24333 starting [2007-10-07 00:17:09] 24334 starting [2007-10-07 00:17:39] 24335 starting [2007-10-07 00:18:09] 24336 starting [2007-10-07 00:18:57] 24327 finished: 318.092037 seconds [2007-10-07 00:19:27] 24328 finished: 317.486874 seconds [2007-10-07 00:19:57] 24329 finished: 317.56133 seconds [2007-10-07 00:20:26] 24330 finished: 316.905676 seconds [2007-10-07 00:20:56] 24331 finished: 316.63102 seconds [2007-10-07 00:21:26] 24332 finished: 316.313321 seconds [2007-10-07 00:21:56] 24333 finished: 316.234245 seconds [2007-10-07 00:22:24] 24334 finished: 314.901694 seconds [2007-10-07 00:22:54] 24335 finished: 314.846691 seconds [2007-10-07 00:23:23] 24336 finished: 314.135473 seconds
20 scans, staggered to start 15 seconds apart: $ ./bench.rb 20 1 15 single process time: 309.949542 seconds [2007-10-07 00:40:44] 25097 starting [2007-10-07 00:40:59] 25098 starting [2007-10-07 00:41:14] 25099 starting [2007-10-07 00:41:29] 25100 starting [2007-10-07 00:41:44] 25101 starting [2007-10-07 00:41:59] 25102 starting [2007-10-07 00:42:14] 25103 starting [2007-10-07 00:42:29] 25104 starting [2007-10-07 00:42:44] 25105 starting [2007-10-07 00:42:59] 25106 starting [2007-10-07 00:43:14] 25107 starting [2007-10-07 00:43:29] 25108 starting [2007-10-07 00:43:44] 25109 starting [2007-10-07 00:43:59] 25110 starting [2007-10-07 00:44:14] 25111 starting [2007-10-07 00:44:29] 25112 starting [2007-10-07 00:44:44] 25113 starting [2007-10-07 00:44:59] 25114 starting [2007-10-07 00:45:14] 25115 starting [2007-10-07 00:45:29] 25116 starting [2007-10-07 00:46:03] 25097 finished: 318.567256 seconds [2007-10-07 00:46:19] 25098 finished: 319.490271 seconds [2007-10-07 00:46:34] 25099 finished: 319.897278 seconds [2007-10-07 00:46:49] 25100 finished: 320.0251 seconds [2007-10-07 00:47:04] 25101 finished: 319.868858 seconds [2007-10-07 00:47:19] 25102 finished: 319.48718 seconds [2007-10-07 00:47:34] 25103 finished: 319.670489 seconds [2007-10-07 00:47:49] 25104 finished: 319.932719 seconds [2007-10-07 00:48:04] 25105 finished: 319.788767 seconds [2007-10-07 00:48:19] 25106 finished: 319.877593 seconds [2007-10-07 00:48:34] 25107 finished: 319.523455 seconds [2007-10-07 00:48:49] 25108 finished: 319.252667 seconds [2007-10-07 00:49:04] 25109 finished: 319.314306 seconds [2007-10-07 00:49:18] 25110 finished: 318.703788 seconds [2007-10-07 00:49:32] 25111 finished: 317.934656 seconds [2007-10-07 00:49:47] 25112 finished: 317.769151 seconds [2007-10-07 00:50:02] 25113 finished: 317.707941 seconds [2007-10-07 00:50:17] 25114 finished: 317.642323 seconds [2007-10-07 00:50:31] 25115 finished: 316.932149 seconds [2007-10-07 00:50:46] 25116 finished: 316.351529 seconds
8.2 degrades quickly when multiple large sequential scans are executing at once, and in 8.3 the scans execute almost as fast as if no other scans were in progress. This is a major win for these types of queries. The performance advantage comes from the Synchronized Scan feature of 8.3.
There is also an unseen benefit: even though 8.3 is much faster, these large sequential scans actually use much less space in the shared buffers. This is due to the buffer recycling feature of 8.3, which avoids "cache pollution" that could happen with large sequential scans.
Notice that the single scans go slightly faster in 8.2 versus 8.3 in this test. When I saw this result, I decided to find out the cause, so I tried reading the files directly.
Reading 8.2.5 data files: $ time cat data82-big/base/16384/16386* > /dev/null real 5m5.673s user 0m0.876s sys 0m15.697s Reading 8.3beta1 data files: $ time cat data83-big/base/16387/16395* > /dev/null real 5m10.196s user 0m0.908s sys 0m15.089sSo, it must be the placement on disk that explains the slight difference. Even though the 8.3 data files are slightly smaller, they take longer to read from disk, regardless of PostgreSQL. So, straight
COUNT(*) on a large table appears to go at disk speed in both 8.2 and 8.3.
This is actually a win for 8.3 also, because 8.3 has reduced row header size (saving either 4 or 8 bytes per tuple, depending on platform), and also varvarlena, which saves space in length headers for variable-length attributes. These two optimizations equal about 1GB of total space savings in this test (or about 5%), meaning that there is less to read from disk. If both databases happened to be placed identically on disk by the filesystem, I would expect these two 8.3 optimizations to show about a 5% advantage over 8.2.
One thing these benchmarks didn't entirely explain is why, in 8.2, all of the scans finished at almost exactly the same time. I suspect that this is due in part to the characteristics of the linux anticipatory I/O scheduler. What I think happened in this test is that the scans very slowly synchronized over the course of the scan, and then eventually all finished at once. It would take more testing to determine exactly what's going on, but it's surely much slower than 8.3.