Based on what Joel from Microsoft presented at TechEd 2005, I'm revising my general estimate that your index will be 40% of the size of your corpus. Microsoft has found that your index will be closer to 1-10% of your corpus. This is because of the following factors:
1. Only the first 16 MB of a file (by default) is indexed
2. Similar document types require less index space than dissimilar document types (and content)
3. In many cases, much of your corpus will be non-indexable content, such as graphics, video and/or audio files
4. This matches the experience of most people who use search and indexing extensively.
So, I would like to hear from you. Please take a moment, estimate the size of information you're indexing, then give us the size of your indexes. Let's see if this 1-10% figure is more accurate than the 40% figure.
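If you want a quick way to compare your own numbers against the two estimates, here is a minimal sketch in Python. The function names and the sample sizes (a 200 GB corpus with a 12 GB index) are made up for illustration and do not come from Microsoft or from any real deployment. Measure both sizes in the same unit.

```python
def index_ratio(corpus_size: float, index_size: float) -> float:
    """Return the index size as a percentage of the corpus size.

    Both sizes must use the same unit (for example, GB).
    """
    if corpus_size <= 0:
        raise ValueError("corpus size must be positive")
    return 100.0 * index_size / corpus_size


def classify(ratio_percent: float) -> str:
    """Say where a measured ratio falls relative to the 1-10% figure."""
    if ratio_percent < 1.0:
        return "below the 1-10% range"
    if ratio_percent <= 10.0:
        return "within the 1-10% range"
    return "above the 1-10% range"


if __name__ == "__main__":
    # Hypothetical numbers: 200 GB of content, 12 GB of index files.
    ratio = index_ratio(200, 12)
    print(f"Index is {ratio:.1f}% of the corpus: {classify(ratio)}")
```

A ratio near 40% would land in the "above" bucket, which would support the old estimate rather than the new one.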
Bill English, Mindsharp