In my blog post on my new sp_helpindex proc (sp_helpindex2), I mentioned that the indexes in my sample were not necessarily a recommended set of indexes – just a test set of indexes. So… in this post, I thought I'd start a series on indexes: limitations, best practices, and uses… Especially, why/how to best choose when to use INCLUDE vs. having columns in your key. To start, I thought I'd give some background, as well as limitations that exist in various releases from 2005 to 2008 CTP6 (Feb CTP), plus what's expected in the SQL Server 2008 RTM (ah… I did say "expected" so don't come back and yell at me if/when I'm wrong ;-))
First, let's go through a few rules and limitations and background:
SQL Server 2005
- 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (no, this is not a goal!)
- The index key can be up to 16 columns OR 900 bytes – whichever comes first…
- The leaf level is limited just as a table is limited to 1,024 columns (and all column types are acceptable in the leaf level of an index – even LOB columns)
- Statistics are kept for every index (so, up to 250 index-related stats) and there can also be statistics on columns or sets of columns in addition to the index-related stats. In earlier releases, statistics used index IDs and, as a result, the number of statistics was limited to 250 total minus the statistics used by indexes… In SQL Server 2005, statistics are kept/managed separately (use sys.stats to see them). As a result, you can now have 2,000 statistics on a table, in addition to the 250 (total) indexes and their statistics. If you want to test this out (and check it on various versions of SQL Server), use this script to set up a test database and a test table, and then use dynamic string execution to loop through creating nonclustered indexes and statistics (until it errors): IndexMax.sql (853.00 bytes).
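For a rough idea of what that kind of limit test looks like, here's a minimal sketch (this is not the linked IndexMax.sql – the table and statistics names here are hypothetical) that uses dynamic string execution to create statistics in a loop until SQL Server raises an error:

```sql
-- Hypothetical test table (unlike indexes, SQL Server allows multiple
-- statistics objects on the same column, so one column is enough here).
CREATE TABLE dbo.StatsMaxTest (c1 int NOT NULL, c2 int NULL);
GO

DECLARE @i int, @sql nvarchar(200);
SET @i = 1;
WHILE (1 = 1)
BEGIN
    -- build the command first; function calls aren't allowed inside EXEC ()
    SET @sql = N'CREATE STATISTICS StatsTest'
             + CAST(@i AS nvarchar(10))
             + N' ON dbo.StatsMaxTest (c2)';
    BEGIN TRY
        EXEC (@sql);
    END TRY
    BEGIN CATCH
        PRINT 'Statistics creation failed at #' + CAST(@i AS varchar(10))
            + ': ' + ERROR_MESSAGE();
        BREAK;
    END CATCH;
    SET @i = @i + 1;
END;
```

Run it on each version you want to check and note the number at which the CATCH block fires.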
SQL Server 2008 CTP6
- So far, it seems as though most of the maximums have not yet been lifted…
- 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (and this number – for CTP6 – includes filtered indexes AND spatial indexes too!)
- The index key limit hasn't changed (it can be up to 16 columns OR 900 bytes – whichever comes first)
- The leaf level is still limited just as a table is limited to 1,024 columns (and all column types are still acceptable in the leaf level of an index)
- Statistics in CTP6 seem to be limited to only 2,000 stats per table…
SQL Server 2008 RTM (expected/target… no guarantees on this one :)
- 30,000 columns per table (mostly to allow sparse columns)
- 1,000 total indexes per table: 1 clustered index and up to 999 nonclustered indexes. This is also not a goal BUT it makes sense because of both sparse columns and filtered indexes. Both Paul and I will try to post some entries about sparse columns and filtered indexes in the coming days…
- The index key limit won't change
- The leaf level will be limited just as a table is limited to 30,000 columns (and all column types are still acceptable in the leaf level of an index)
- Statistics are also said to be increasing – likely to 30,000… And, as for having extra statistics just sitting around and possibly not being used – well, outside of a minimal amount of disk space taken by the stats blob (which does start to get interesting in the 1,000s, I suppose), even stats that don't get used don't really create much of a problem. So, I'm OK with this one increasing – even significantly – but I have to admit I'm somewhat nervous about the significant increase in indexes…
So… you can have A LOT more indexes in SQL Server 2008 but just because you can – DOES it mean that you should?!
And on that – I'll leave you hanging for my next post where I start to talk about WHY they're increasing this (hint: sparse columns and filtered indexes = more columns/more indexes)….
OK, I first posted on some of the limitations to indexes in SQL Server 2005 and 2008 in part one here. Now, I want to dive into index internals for a post (or two). And, I often get the question "who is the best audience for your blog – or, for this post?" and, well, that's a bit hard to answer. At SQL Connections in Orlando, I delivered a session titled Index Internals & Usage and while we (fyi – Paul and I co-chair the SQL Connections portion of "DevConnections") put it in the "developer-focused track," it was more of a Dev/DBA "hybrid" session with the emphasis on database development and best practices in creating and managing indexes (rather than management/maintenance/operational tuning – which is more for DBAs). Here at TechEd this week, I'm going to focus more on the management/maintenance/operational tuning side with a session called Are Your Indexing Strategies Working? I'll also do a complementary blog post for that as well…
Having said that though, indexes are definitely in a group of topics – very much related to performance and scalability (index internals, indexing strategies, log maintenance, general database maintenance) – which really needs to cross almost all database-related disciplines (dev, admin, ops, etc.). If you work with SQL Server in almost any capacity, you need to get a feel for at least some aspect of indexing for performance.
So, for this post, I'm continuing with some internals. In the first post (in this series), I wrote about limits. Limits/boundaries are interesting to discuss but it's also important to remember that good performance takes a lot more than just staying within the bounds of what's possible. Creating indexes solely because you can – without reason and only with upper limits in mind – can be even worse than under-indexing. So, if you find that you want more about indexes (I have many blog posts that are solely Q&A posts), check out my Indexing category here. Now that you know how many indexes you can create, a better question is: when is it appropriate to create indexes at all?
So, what is “finding the right balance” in indexing? In my opinion, there are three requirements/pre-requisites:
- knowing the data
- knowing how the users use the data
- knowing how the underlying structures work and how the database stores, manipulates, and uses indexes
Bringing all of these things together is what I try to do in my workshops, seminars and lectures – in this post, I'll start with a smaller, more digestible piece: internals.
Indexes have 2 components: a leaf level and a non-leaf level (or b-tree). The non-leaf level is interesting to understand and discuss (in terms of internals) but, simply put, it's used for navigation to the leaf level (more than anything else). So, we'll start with the leaf level (as does SQL Server – the leaf level is always built first). The leaf level of an index contains something (I'll explain more coming up) for every row of the table in indexed order (note: I am focusing on traditional indexes in every release from SQL Server 2000 up to and including SQL Server 2008 – with the exception of filtered indexes, which I will write about in a later post). Once the leaf level is built, non-leaf level(s) can be built to help navigate to the leaf level, but the architecture is rather straightforward. The non-leaf level stores something for every page of the level below – and levels are added (each smaller than the previous because each level contains only one entry – the first – for every page of the level below) until the index gets to a root of one page. While it sounds like this could result in a lot of levels (i.e. a tall tree), the limitation on the size of the key (which has a maximum of 900 bytes or 16 columns) helps to keep index trees relatively small. In fact, in the example I'll show coming up – which has a fairly large (large meaning WIDE) index and has a key definition which is at the maximum size – even the tree of this example index (at the time the index is created) is only 8 levels high/deep…
To see this tree (and the math used to create it – which is the same thing that SQL Server would go through to create it), we'll use an example where the leaf level of the index contains 1,000,000 "rows." I put quotes around "rows" because I don't want to imply that these have to be data rows – these are really just leaf-level rows, and I'll explain more on what leaf-level rows can be… The leaf-level rows are 4,000 bytes per row (therefore only 2 rows per page) or 500,000 pages. This is not ideal but at least the pages are almost full and we're not wasting a lot of space – if we had two 3,000-byte rows we'd still only fit 2 per page and then we'd have 2,000 bytes of wasted space. Now, as for why these are just "rows" and not specifically data rows: this leaf level could be the leaf level of a clustered index (therefore data rows) OR these leaf-level rows could be rows in a nonclustered index that uses INCLUDE (which was new to SQL Server 2005) to add non-key columns to the leaf level of the index (which therefore creates leaf rows wider than the 900-byte or 16-column maximum). Again, while this doesn't currently sound interesting, I'll explain why this can be beneficial coming up (possibly in another post depending on how long this particular post becomes… :)).
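As a quick illustration of the INCLUDE syntax – a hedged sketch where the Sales table and its columns are hypothetical, not from any sample database – the key stays narrow (and seekable, within the 900-byte/16-column limit) while the included column rides along in the leaf level only:

```sql
-- Hypothetical table for illustration only
CREATE TABLE dbo.Sales
(
    SalesID    int           NOT NULL PRIMARY KEY,
    CustomerID int           NOT NULL,
    SaleDate   datetime      NOT NULL,
    Notes      nvarchar(400) NULL
);
GO

-- CustomerID/SaleDate form the index key (usable for seeks and present at
-- every level); Notes is stored in the leaf level only, so queries that
-- select it can be covered without a lookup into the base table.
CREATE NONCLUSTERED INDEX IX_Sales_CustomerID_SaleDate
    ON dbo.Sales (CustomerID, SaleDate)
    INCLUDE (Notes);
```

Because Notes appears only in the leaf level, it widens the leaf rows without counting against the key-size limit or fattening the non-leaf levels.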
The leaf level of this index would result in a 4 GB structure (and this is only at the time it's created – if a lot of rows are added and the key is not ever-increasing then this structure could become heavily fragmented and therefore much larger/taller). In this case, it's relatively large (again, because of "row" width) and with an index key of 900 bytes you can see that even in this case the tree would be relatively small and only result in 8 levels – as shown below.
Root page of non-leaf level (Level 7) = 2 rows = 1 page
Intermediate non-leaf level (Level 6) = 16 rows = 2 pages (8 rows per page at 900 bytes)
Intermediate non-leaf level (Level 5) = 123 rows = 16 pages (8 rows per page at 900 bytes)
Intermediate non-leaf level (Level 4) = 977 rows = 123 pages (8 rows per page at 900 bytes)
Intermediate non-leaf level (Level 3) = 7,813 rows = 977 pages (8 rows per page at 900 bytes)
Intermediate non-leaf level (Level 2) = 62,500 rows = 7,813 pages (8 rows per page at 900 bytes)
Intermediate non-leaf level (Level 1) = 500,000 rows = 62,500 pages (8 rows per page at 900 bytes)
Leaf level (Level 0) = 1,000,000 rows = 500,000 pages (2 rows per page)
Having said that though, this is NOT a goal. :) In more realistic scenarios – where the key is much smaller – even when there are more rows, there are fewer levels (3-4 is quite normal). Most importantly, the size of an index (and the number of levels) depends on two things: the width of the key (in terms of bytes) and the number of pages in the leaf level of the index. The number of pages in the leaf level, in turn, depends on the number of rows and the size (again, in bytes) of the rows in the leaf level.
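The arithmetic behind a table like the one above can be sketched in a few lines: divide the leaf-level row count by the rows-per-page fan-out (rounding up, since a partially full page is still a page), then keep dividing until a one-page root is reached. This is a simplified model, not SQL Server's actual allocation code:

```python
import math

def index_levels(leaf_rows, leaf_rows_per_page, nonleaf_rows_per_page):
    """Return pages per level, leaf (level 0) first, up to a 1-page root."""
    pages = [math.ceil(leaf_rows / leaf_rows_per_page)]
    while pages[-1] > 1:
        # each non-leaf row points at one page of the level below
        pages.append(math.ceil(pages[-1] / nonleaf_rows_per_page))
    return pages

# 1,000,000 leaf rows at 4,000 bytes each (2 per page);
# 900-byte keys in the non-leaf levels (~8 rows per page)
levels = index_levels(1_000_000, 2, 8)
print(levels)       # [500000, 62500, 7813, 977, 123, 16, 2, 1]
print(len(levels))  # 8 levels deep
```

With a more typical narrow key (say, a 4-byte int, hundreds of rows per non-leaf page), the same loop collapses to the 3-4 levels mentioned above.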
You can see the size of your index by using one of the following commands:
In SQL Server 2000: DBCC SHOWCONTIG … WITH ALL_LEVELS
In SQL Server 2005/2008: querying the DMV sys.dm_db_index_physical_stats
To see the syntax of these commands and their output, we’ll use some structures created in the credit sample database. Using credit, you can see exactly how these commands work and how they return the details about every level.
NOTE: you can download a zip of a SQL Server 2000 backup of this database here – and since this is a SQL Server 2000 backup, you can restore it to SQL Server 2000, SQL Server 2005 or SQL Server 2008.
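As a hedged preview of the syntax (the table name below assumes the member table from the credit sample database – adjust the names for whatever index you want to inspect), both commands return one row per index level:

```sql
-- SQL Server 2000: ALL_LEVELS walks every level of the index;
-- TABLERESULTS returns the output as a result set
DBCC SHOWCONTIG ('dbo.member') WITH TABLERESULTS, ALL_LEVELS;

-- SQL Server 2005/2008: DETAILED mode is what produces per-level rows
-- (index_level 0 is the leaf; higher numbers walk up toward the root)
SELECT index_id, index_level, page_count, record_count,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
     (DB_ID('credit'), OBJECT_ID('dbo.member'), NULL, NULL, 'DETAILED');
```

Note that with the default LIMITED mode (or the lighter SAMPLED mode), the DMV only reports on the leaf level – DETAILED is required to see the whole tree.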