Each and every time a SELECT is sent to SQL Server, the engine determines whether the information is already available in RAM or whether it has to be retrieved from disk.
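If you want to see that distinction for a specific query, SET STATISTICS IO separates buffer-pool hits from disk reads. A minimal sketch, assuming a placeholder table name:

    -- Logical reads come from the buffer pool (RAM); physical reads go to disk.
    SET STATISTICS IO ON;

    SELECT COUNT(*)
    FROM dbo.SomeLargeTable;   -- placeholder table name

    SET STATISTICS IO OFF;
    -- The Messages output reports logical reads (buffer pool) and
    -- physical reads (disk) per table touched by the query.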
Rares — this reads less like a comment and more like a blog post. Great post! In it you say that the Rebuild Indexes task also updates statistics with full scans. Our plan at the moment runs the Reorganize Indexes, Rebuild Indexes, and Update Statistics (with full scans) tasks, in that order.
Yes, huge transaction log, but why does the update statistics job take longer? If the rebuild indexes task does the same thing and also rebuilds the indexes, I thought that would take longer. Take longer than what? If the rebuild indexes task also updates statistics with full scans, I would have expected the rebuild index task to take longer than the update statistics task.
Kevin — yep, something seems odd there. Thanks Brent. This was set up by someone else, but I believe it was using the built-in maintenance plan wizard.
The update statistics task updates all stats, which includes stats on columns that are not indexed. A specific case I have seen before: if you have BLOB columns with statistics on them, they can take on the order of hours to update in a table with a few million rows. If you want to update statistics as part of your regular maintenance plan, there is a catch you should be aware of. Index rebuilds automatically update statistics with a full scan.
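You can verify that for yourself after a rebuild by looking at the statistics metadata. A minimal sketch, with placeholder table and index names:

    -- Rebuild the index; this also updates its statistics with a full scan.
    ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD;

    -- Check when the statistics were last updated and how many rows were sampled.
    SELECT s.name AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled      -- equals sp.rows after a full scan
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.SomeTable');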
The catch is that a sampled scan from the manual update can overwrite the full-scan statistics generated by the index rebuild. Reorganizing an index, on the other hand, does not update statistics at all.

Just priced out an HP DL for a client. Even if you have gobs of memory, it can still be wasteful to cache horribly fragmented indexes there. On the other hand, pulling an index out of the buffer pool just to defragment it is quite wasteful too. Not all fragmentation processes are created equal. On top of that, we have rules based on how much of an index is in memory.
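For the "how much of an index is in memory" rule, something along these lines against the standard DMVs gives an approximate per-index picture in the current database (a sketch, not the commenter's actual script):

    -- Approximate buffer-pool usage per index in the current database.
    -- Requires VIEW SERVER STATE permission.
    SELECT OBJECT_NAME(p.object_id)   AS table_name,
           i.name                     AS index_name,
           COUNT(*) * 8 / 1024        AS buffered_mb   -- pages are 8 KB
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au
          ON au.allocation_unit_id = bd.allocation_unit_id
    JOIN sys.partitions AS p
          ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
          OR (au.type = 2       AND au.container_id = p.partition_id)
    JOIN sys.indexes AS i
          ON i.object_id = p.object_id AND i.index_id = p.index_id
    WHERE bd.database_id = DB_ID()
    GROUP BY p.object_id, i.name
    ORDER BY buffered_mb DESC;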
Finally, we allow concurrent operations, so if you have a small window for index maintenance, you can hammer the crap out of multiple indexes at once instead of doing one at a time and extending your window.
All of this beats stock maintenance plans hands down, and allows you to only care about the process as much as you want to. I also do a lot of checks and apply some business rules to decide what objects should be maintained.
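A minimal sketch of that kind of rule-based decision, using sys.dm_db_index_physical_stats with the commonly quoted 5%/30% thresholds (the thresholds and the page-count cutoff here are illustrative, not anything from the original poster's tooling):

    -- Suggest a maintenance command per index based on fragmentation.
    SELECT OBJECT_NAME(ips.object_id)          AS table_name,
           i.name                              AS index_name,
           ips.avg_fragmentation_in_percent,
           CASE
               WHEN ips.avg_fragmentation_in_percent >= 30 THEN
                   'ALTER INDEX ' + QUOTENAME(i.name) + ' ON '
                   + QUOTENAME(OBJECT_SCHEMA_NAME(ips.object_id)) + '.'
                   + QUOTENAME(OBJECT_NAME(ips.object_id)) + ' REBUILD;'
               WHEN ips.avg_fragmentation_in_percent >= 5 THEN
                   'ALTER INDEX ' + QUOTENAME(i.name) + ' ON '
                   + QUOTENAME(OBJECT_SCHEMA_NAME(ips.object_id)) + '.'
                   + QUOTENAME(OBJECT_NAME(ips.object_id)) + ' REORGANIZE;'
               ELSE NULL                        -- leave it alone
           END AS suggested_command
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
          ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.index_id > 0          -- skip heaps
      AND ips.page_count > 1000;    -- ignore tiny indexes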
This seems like a promising addition to my maintenance solution. Sorry Michael, I spoke out of turn there. You could also factor in index usage stats — if you have snapshots of these over time, you could see whether there are patterns for a particular index, like being used heavily on weekday mornings, at month end, or on Saturdays; then you can make more educated decisions about when is the optimal time to rebuild or reorg.
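If you want to collect those usage snapshots over time, one simple approach is a scheduled insert from sys.dm_db_index_usage_stats into your own table. A sketch, with a made-up table name; note the DMV resets when the instance restarts, which is exactly why snapshots help:

    -- One-time setup: a table to hold periodic snapshots (hypothetical name).
    CREATE TABLE dbo.IndexUsageSnapshot
    (
        captured_at   datetime2 NOT NULL DEFAULT SYSDATETIME(),
        database_id   int       NOT NULL,
        object_id     int       NOT NULL,
        index_id      int       NOT NULL,
        user_seeks    bigint    NOT NULL,
        user_scans    bigint    NOT NULL,
        user_lookups  bigint    NOT NULL,
        user_updates  bigint    NOT NULL
    );

    -- Scheduled job step: capture the current counters for this database.
    INSERT INTO dbo.IndexUsageSnapshot
        (database_id, object_id, index_id, user_seeks, user_scans, user_lookups, user_updates)
    SELECT database_id, object_id, index_id, user_seeks, user_scans, user_lookups, user_updates
    FROM sys.dm_db_index_usage_stats
    WHERE database_id = DB_ID();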
We had talked about that feature enough that I thought it had already been implemented. Our Data Warehouse DBs are a different story…

Hi Brent, thanks for your post. It will go for a table scan, and that table now has to be read into memory. Regarding the logical fragmentation: a lot of the time, page splits are a root cause. Page splits by themselves cause a magnitude more transaction log entries, and the cause of that logical fragmentation is by itself something to worry about.

Edward — thanks for the note. That actually makes page splits worse due to updates. An exception would be highly used NC (nonclustered) indexes.
See my previous comment. You say that the overhead of page splits is small compared to a full-blown rebuild. It depends on your load and on your schema design, but let's say it is: even when the complete DB is in memory, all that extra log overhead still has to be written to disk. And depending on the load, that can be a very large overhead.
The rebuilds could be minimally logged. And what about when your DB is closer to 1 TB on disk and still working really well, with sub-second queries, on only 1… And a MySQL defrag?
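If page splits are the suspected root cause of the fragmentation and the extra log volume, the instance-level counter is a cheap first check. A sketch; per-index detail would need Extended Events, which is beyond this comment thread:

    -- Cumulative page splits since the instance started.
    -- Despite the '/sec' in the name, this is a raw counter: sample it
    -- twice and take the difference to get an actual rate.
    SELECT cntr_value AS page_splits_since_startup
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Access Methods%'
      AND counter_name = 'Page Splits/sec';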
Have you blogged about your database setup? Do you have an article on the subject, or any proof that what you say is actually in place and running the way you say? And no… not trying to be a smarty here. I agree that index fragmentation may not deserve as much attention as it sometimes gets. Improved query elapsed time after rebuilding indexes may come from the defragmentation itself, but is more likely due to the updated stats or the subsequent recompile of query plans dependent on those stats.
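One way to test that theory on a specific table is to refresh its statistics with a full scan without touching the index layout at all, and then re-run the query. A sketch with a placeholder table name:

    -- Refresh all statistics on the table with a full scan, but leave the
    -- physical index layout (and its fragmentation) exactly as it is.
    UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;

    -- Optionally force dependent plans to recompile on their next execution.
    EXEC sp_recompile N'dbo.SomeTable';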
However, it's simply not true that the SAN is performing random hard-drive reads to retrieve data that the database and filesystem consider sequential. There are efficiencies in retrieving as much data as possible in as few transfers as possible. But the SAN admin will notice if the system is busy enough. And other tenants of the SAN could notice as well. On SSDs, the benefits of contiguity only really extend to the per-drive stripe unit size, because each stripe unit is placed randomly within the drive regardless of its position within the filesystem or database.
There is still a benefit in retrieving as few stripe units as possible — less of a chance of saturation between the storage controller and the disks. On a very busy SQL Server, reasonably defragged large high-use indexes, and possibly the -E startup option's adjustment of proportional fill, can make I/O performance more consistent for SQL Server, and make SQL Server a better neighbor.
Saying that sequential access is all basically random anyway — that completely disregards what SAN administrators spend a lot of their time planning for and correcting. Performance disks spin at 15k or 10k RPM, but the heads move comparatively slowly. So go ahead and switch from disk to disk quickly, but incur as little head movement as possible.
The disk head weaves from the location of one sequential read track to the location of the other sequential read track, on each and every disk that the data is striped over. In cases of I/O weaving, sequential access performance can be WORSE than random read access, depending on the distance the head must travel between the two sequential read locations. But I/O weaving results in trouble for them, too. And the read-cache accelerator cards (they used to call them PAM cards) still benefit greatly from contiguity of access across the disks.
Although there is a conceptual 4k stripe unit, writes occur in 64k chunks per disk. It's an under-the-covers optimization. Assume that an index rebuild in a database with a single data file is the only current activity. Eighteen 64k writes will take place. Each 64k write is made up of sixteen 4k theoretical stripe units. If you take the parity stripe units out of the mental picture, each write would contain 4k from each of the 16 database extents. The more fragmentation in the files presented from WAFL, the more inode updates have to take place.
VMAX virtual pools use fixed-size extents, and that extent size is the minimum unit for promotion in Symmetrix auto-tiering. The rules for optimizing database performance on each platform are slightly different.
But in no case can one simply assume that sequential reads can be randomized and still always deliver the benefits of sequential I/O. This is a good source for VNX implementation details. Thin provisioning can cause even more unpredictable behavior under the circumstances outlined above, but that is something to be aware of when using thin provisioning in general.
Even then, if it grows considerably and the underlying disks are shared, the disk head movement when thin LUN contents from other applications are interleaved can be a performance killer. In that type of environment, buying RAM is a lot cheaper than hiring more SAN admins and sending them to training, and dedicating their time to managing random vs sequential access.
Enjoy, and thanks for the comment. You are misunderstanding me slightly. Not sure what they are like now, but 4 years ago I suffered through migrating a 1… Worst 4 months ever. So when a file is stored in a non-contiguous manner on a disk subsystem, the file is considered to be physically fragmented.
This is because the heads on the disk drive have to jump around randomly to different physical locations on the disk to find all of the data, instead of reading and writing the data sequentially, as they do when the data is contiguous. Physical file fragmentation can be slight, say when a single file is divided into only a handful of locations on a subsystem. In other cases, a single physical file might be divided into thousands of fragments, resulting in heavy physical file fragmentation.
The greater the physical file fragmentation, the harder the disks have to work, and the greater the overhead that is incurred. Another factor that can affect how SQL Server and the disk subsystem interact is whether there is a large cache sitting between SQL Server and the subsystem, or whether the disk subsystem is made up of SSDs or similar drives.
Both of these can significantly mask the negative effects of physical file fragmentation, although they do not eliminate it. So what does all of this mean to DBAs? As a DBA, I always assume the worst and plan for it. First, I check whether there is any existing physical file fragmentation; if there is, then I defragment the disks. Then I pre-size my database files to a size that I think will suffice for the next year or so.
For example, if I estimate how large my MDF file will be in the next 12 months or so, then that is the size at which I create the database now. The same goes for the other database files. In other words, by creating a large file on a disk array that is defragmented, the file is created contiguously, and I can prevent physical disk fragmentation from occurring in the first place.
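A sketch of that pre-sizing step; the database name, logical file names, and sizes are all hypothetical and should be replaced with your own estimates:

    -- Grow the data and log files once, up front, to their estimated 12-month
    -- size, instead of letting many small autogrowths fragment them on disk.
    ALTER DATABASE MyDatabase
        MODIFY FILE (NAME = MyDatabase_Data, SIZE = 200GB);

    ALTER DATABASE MyDatabase
        MODIFY FILE (NAME = MyDatabase_Log, SIZE = 50GB);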
Now, you may have two questions. First, you may be wondering about how you go about defragging a disk array that is currently being used with production databases.
I do not know which option is the best or if it will even work. Thanks for your help.