View Issue Details

IDProjectCategoryView StatusLast Update
0000421XMB1Research Taskspublic2011-01-15 14:53
Reportermiqrogroove Assigned To 
PrioritylowSeverityminorReproducibilityN/A
Status newResolutionopen 
Summary0000421: LIMIT start, ppp Does Not Scale Well
DescriptionDuring optimization testing on Exhale, I found that the LIMIT command runs about 10 times faster using an index instead of the table data. Thus, we now have two queries instead of one when we need to query paged information.

The remaining problem is that this wont be helpful for a forum that grows 10 times larger than Exhale. If a forum had on the order of 10^6 threads or if a thread had 10^6 replies, then even the index scan would become rather onerous for the sake of locating page boundaries.

It may be worth researching alternative strategies for paging information with greater magnitude. It may not be compatible with XMB 1.9.11, but it would be nice to know what direction these optimizations will take in the future. Partitioning? Stats tables? Views?
TagsNo tags attached.
MySQL Version
PHP Version
Web Server
Browser
Flags
Original Reporter
SVN Revision

Relationships

related to 0000416 closedmiqrogroove Threads Table Optimization 

Activities

miqrogroove

2011-01-15 13:06

administrator   ~0000283

Here is a post confirming the behavior, but does not really offer an alternative strategy for locating an offset.

http://forums.mysql.com/read.php?24,112440,112461

miqrogroove

2011-01-15 13:22

administrator   ~0000284

This post has a nice discussion, basically concluding that every situation requires a different strategy.

http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/

miqrogroove

2011-01-15 13:45

administrator   ~0000285

This one by Yahoo basically says don't do what we are trying to do :)

http://www.scribd.com/doc/14683263/Efficient-Pagination-Using-MySQL

miqrogroove

2011-01-15 14:53

administrator   ~0000286

My conclusion so far is that we would have to maintain stats tables for key distribution data. It seems like that shouldn't be necessary, but here's how it would work:

Without a specific WHERE condition, the best index scan rate I've seen is 300,000 rows in 0.1 sec. Let's assume a fixed time of 0.0000003 per row.

SELECT dateline, pid
FROM xmb_posts
WHERE tid=288108
ORDER BY dateline ASC, pid ASC
LIMIT 300000, 1

If we have a table identifying a key value (date) for each 1,000 rows in the table, then we could query that table to find any record within a range of 1,000 records. In theory, MySQL would be able run the resulting WHERE condition in just 0.0003 sec regardless of how many rows are in the table. This is essentially an index on an index.

The above query returned dateline = 1238176252 so what would happen if...

SELECT dateline, pid
FROM xmb_posts
WHERE tid=288108 AND dateline > 1238176252
ORDER BY dateline ASC, pid ASC
LIMIT 1000, 1

Actual run time was 0.0008 :D

Issue History

Date Modified Username Field Change
2011-01-14 16:52 miqrogroove New Issue
2011-01-14 16:53 miqrogroove Relationship added related to 0000416
2011-01-14 16:55 miqrogroove Description Updated
2011-01-15 13:06 miqrogroove Note Added: 0000283
2011-01-15 13:22 miqrogroove Note Added: 0000284
2011-01-15 13:45 miqrogroove Note Added: 0000285
2011-01-15 14:53 miqrogroove Note Added: 0000286