View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update
---|---|---|---|---|---
0000421 | XMB1 | Research Tasks | public | 2011-01-14 16:52 | 2011-01-15 14:53

Field | Value
---|---
Reporter | miqrogroove
Assigned To |
Priority | low
Severity | minor
Reproducibility | N/A
Status | new
Resolution | open
Summary | 0000421: LIMIT start, ppp Does Not Scale Well
Description | During optimization testing on Exhale, I found that the LIMIT clause runs about 10 times faster against an index than against the table data. Thus, we now have two queries instead of one when we need to query paged information. The remaining problem is that this won't be helpful for a forum that grows 10 times larger than Exhale. If a forum had on the order of 10^6 threads, or if a thread had 10^6 replies, then even the index scan would become rather onerous just for the sake of locating page boundaries. It may be worth researching alternative strategies for paging information at greater magnitudes. They may not be compatible with XMB 1.9.11, but it would be nice to know what direction these optimizations will take in the future. Partitioning? Stats tables? Views?
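The two-query approach mentioned in the description (an index-only scan to find the page's post IDs, then a fetch of the full rows by primary key) can be sketched as follows. This is a minimal illustration, not XMB's actual code: SQLite stands in for MySQL, and the `posts` schema, column names, and posts-per-page value are all assumptions.

```python
import sqlite3

# Hypothetical schema standing in for XMB's posts table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (pid INTEGER PRIMARY KEY, tid INTEGER, dateline INTEGER, message TEXT);
CREATE INDEX idx_tid_dateline ON posts (tid, dateline, pid);
""")
conn.executemany(
    "INSERT INTO posts (pid, tid, dateline, message) VALUES (?, ?, ?, ?)",
    [(i, 1, 1000 + i, "post %d" % i) for i in range(1, 101)],
)

PPP = 10   # posts per page (assumed)
page = 3   # zero-based page index

# Query 1: walk only the covering index to find the pids on this page.
pids = [row[0] for row in conn.execute(
    "SELECT pid FROM posts WHERE tid = ? "
    "ORDER BY dateline ASC, pid ASC LIMIT ? OFFSET ?",
    (1, PPP, page * PPP),
)]

# Query 2: fetch the full rows for just those pids.
placeholders = ",".join("?" * len(pids))
rows = conn.execute(
    "SELECT pid, message FROM posts WHERE pid IN (%s) "
    "ORDER BY dateline ASC, pid ASC" % placeholders,
    pids,
).fetchall()
```

The first query can be satisfied entirely from the `(tid, dateline, pid)` index; only the handful of rows actually displayed ever touch the table data, which is where the reported ~10x speedup comes from.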
Tags | No tags attached.
MySQL Version |
PHP Version |
Web Server |
Browser |
Flags |
Original Reporter |
SVN Revision |
Git Commit |
related to | 0000416 | closed | miqrogroove | Threads Table Optimization |
(0000283) miqrogroove (2011-01-15 13:06)

Here is a post confirming the behavior, but it does not really offer an alternative strategy for locating an offset: http://forums.mysql.com/read.php?24,112440,112461
(0000284) miqrogroove (2011-01-15 13:22)

This post has a nice discussion, basically concluding that every situation requires a different strategy: http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/
(0000285) miqrogroove (2011-01-15 13:45)

This one, by Yahoo, basically says "don't do what we are trying to do" :) http://www.scribd.com/doc/14683263/Efficient-Pagination-Using-MySQL
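The alternative those slides advocate is keyset ("seek") pagination: instead of an offset, the next-page link carries the sort key of the last row already shown, and the query seeks directly past it. A minimal sketch, again with SQLite and an invented schema; the row-value comparison `(dateline, pid) > (?, ?)` keeps the ordering stable even when datelines collide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (pid INTEGER PRIMARY KEY, tid INTEGER, dateline INTEGER)")
# Duplicate datelines on purpose, to show the tie-break on pid.
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, 1, 1000 + i // 2) for i in range(1, 21)])

PPP = 5  # posts per page (assumed)

def next_page(tid, last_dateline, last_pid):
    # Seek past the last row seen; no offset scan, so cost is
    # independent of how deep into the thread we are.
    return conn.execute(
        "SELECT dateline, pid FROM posts "
        "WHERE tid = ? AND (dateline, pid) > (?, ?) "
        "ORDER BY dateline, pid LIMIT ?",
        (tid, last_dateline, last_pid, PPP),
    ).fetchall()

page1 = next_page(1, 0, 0)        # first page: seek past (0, 0)
page2 = next_page(1, *page1[-1])  # pass the last row's keys forward
```

The catch, and presumably why this issue exists, is that keyset pagination only gives you "next page", not a direct jump to page N, which is what the checkpoint idea below the links tries to recover.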
(0000286) miqrogroove (2011-01-15 14:53)

My conclusion so far is that we would have to maintain stats tables for key distribution data. It seems like that shouldn't be necessary, but here's how it would work.

Without a specific WHERE condition, the best index scan rate I've seen is 300,000 rows in 0.1 sec. Let's assume a fixed cost of 0.0000003 sec per row:

SELECT dateline, pid FROM xmb_posts WHERE tid=288108 ORDER BY dateline ASC, pid ASC LIMIT 300000, 1

If we had a table identifying a key value (dateline) for every 1,000th row, then we could query that table to find any record within a range of 1,000 records. In theory, MySQL would be able to run the resulting WHERE condition in just 0.0003 sec regardless of how many rows are in the table. This is essentially an index on an index.

The above query returned dateline = 1238176252, so what would happen if...

SELECT dateline, pid FROM xmb_posts WHERE tid=288108 AND dateline > 1238176252 ORDER BY dateline ASC, pid ASC LIMIT 1000, 1

Actual run time was 0.0008 sec :D
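The "index on an index" idea in that last note can be sketched end to end. Everything below is invented for illustration: SQLite stands in for MySQL, the checkpoint interval is 5 instead of the proposed 1,000, and datelines are assumed distinct within a thread (the note's `dateline > ...` seek makes the same assumption; ties would need a `(dateline, pid)` seek instead).

```python
import sqlite3

STEP = 5  # checkpoint every STEP rows; the note proposes every 1,000

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (pid INTEGER PRIMARY KEY, tid INTEGER, dateline INTEGER);
CREATE INDEX idx_posts ON posts (tid, dateline, pid);
CREATE TABLE checkpoints (tid INTEGER, ordinal INTEGER, dateline INTEGER);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, 1, 1000 + i) for i in range(1, 101)])

# Build the stats table: record the dateline of every STEP-th post in thread order.
ordered = conn.execute(
    "SELECT dateline FROM posts WHERE tid = 1 ORDER BY dateline, pid").fetchall()
conn.executemany("INSERT INTO checkpoints VALUES (1, ?, ?)",
                 [(n, ordered[n - 1][0]) for n in range(STEP, len(ordered) + 1, STEP)])

def nth_post(tid, n):
    """Locate the n-th post (1-based) by seeking from the nearest checkpoint
    strictly before it, so at most STEP rows are scanned instead of n."""
    cp = conn.execute(
        "SELECT ordinal, dateline FROM checkpoints "
        "WHERE tid = ? AND ordinal < ? ORDER BY ordinal DESC LIMIT 1",
        (tid, n)).fetchone()
    ordinal, dateline = cp if cp else (0, 0)
    return conn.execute(
        "SELECT dateline, pid FROM posts WHERE tid = ? AND dateline > ? "
        "ORDER BY dateline, pid LIMIT 1 OFFSET ?",
        (tid, dateline, n - ordinal - 1)).fetchone()
```

The remaining cost of the scheme, not shown here, is keeping `checkpoints` current as posts are inserted and deleted mid-thread, which is likely why the note frames this as a research task rather than a 1.9.11 change.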
Date Modified | Username | Field | Change |
---|---|---|---|
2011-01-14 16:52 | miqrogroove | New Issue | |
2011-01-14 16:53 | miqrogroove | Relationship added | related to 0000416 |
2011-01-14 16:55 | miqrogroove | Description Updated | |
2011-01-15 13:06 | miqrogroove | Note Added: 0000283 | |
2011-01-15 13:22 | miqrogroove | Note Added: 0000284 | |
2011-01-15 13:45 | miqrogroove | Note Added: 0000285 | |
2011-01-15 14:53 | miqrogroove | Note Added: 0000286 |