SharePoint Cafe

Site Path Rule Anomaly?

I've been building up the number and complexity of the content sources for our internal portal. For at least half of the web sites we crawl, I need to create one or more site path rules. That's fine. Site path rules give us a way to ensure that we crawl only the portion of the source content that we actually need. But I noticed an anomaly and wanted to see whether anyone else has experienced this or has any advice or ideas.

Here's the scenario:

I crawl a regular web site, say FOO, at www.foo.com. But I don't want the entire web site; I just want one subsection of it, which we'll call products. Hence, all I want is www.foo.com/products and none of the other parts of the site. Now, there are links on the products page that lead to other parts of the site, so it is best that I use site path rules to limit the crawler to just that section, as follows:

www.foo.com                            include
www.foo.com/products/product.html      include
www.foo.com/products/product.html/*    include
www.foo.com/*                          exclude

Now, this works, as far as I can tell. When I enter queries, the only results I get back from the foo web site come from the products portion.
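One way to reason about a rule list like this is to model it in a few lines of Python. This is only a sketch under stated assumptions, not how the crawler is documented to behave: it assumes rules are checked top to bottom, that the first matching rule wins, and that `*` matches any run of characters. The names `RULES`, `normalize`, and `decide` are hypothetical and exist only for this illustration.

```python
from fnmatch import fnmatchcase

# Assumed model: rules are checked in order and the first match decides.
RULES = [
    ("www.foo.com", "include"),
    ("www.foo.com/products/product.html", "include"),
    ("www.foo.com/products/product.html/*", "include"),
    ("www.foo.com/*", "exclude"),
]


def normalize(url):
    """Lowercase the URL, then drop the scheme and any trailing slash."""
    url = url.lower()
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
    return url.rstrip("/")


def decide(url, rules=RULES, default="include"):
    """Return the action of the first rule whose pattern matches the URL."""
    target = normalize(url)
    for pattern, action in rules:
        if fnmatchcase(target, pattern):
            return action
    # No rule matched; the default here is an assumption of this sketch.
    return default


if __name__ == "__main__":
    for url in (
        "http://www.foo.com/",
        "www.foo.com/products/product.html",
        "www.foo.com/products/other.html",
        "www.foo.com/about/index.html",
    ):
        print(url, "->", decide(url))
```

Under this model, the start address and product.html are included, while every other page on the site, including other pages under /products, falls through to the final exclude rule. Running URLs of interest through a model like this can help separate "the rules don't say what I meant" from "the crawler isn't applying the rules".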

Here's the part I don't get: when I did the exact same thing for a site that had .aspx pages, the site path rules didn't seem to “kick in” or “work” as I expected.

I need to test this some more, but has anyone else seen anything like this?

 

posted on Sunday, March 19, 2006 7:17 PM
