Features / Usability


Adding <meta name="robots" content="noindex,nofollow"> to History pages

posts: 3

We're using TikiWiki 1.8.1 (Polaris) on an intranet that is indexed by a third-party search appliance. If we have the search engine index the wiki (so that its results show up in searches of the whole intranet), a huge number of pages get added, because every version of every page, as well as the diffs between versions, is included. To prevent this, we would like to add <meta name="robots" content="noindex,nofollow"> to the <head> of certain pages, especially the "History of" pages. Is there an easy way to do that, or could someone provide some guidance on where to start hacking?

Thanks,
David

posts: 2881 United Kingdom

In header.tpl you can check the page URL for tiki-pagehistory.php and output that meta line only on those pages.
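The real check would be Smarty template code in header.tpl. Purely to illustrate the matching logic, here is a minimal Python sketch, assuming the decision is made on the script name at the end of the request path. The function name `robots_meta_for` and the example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Scripts whose output should stay out of search indexes. Only
# tiki-pagehistory.php comes from this thread; extend the set as needed.
NOINDEX_SCRIPTS = {"tiki-pagehistory.php"}

ROBOTS_META = '<meta name="robots" content="noindex,nofollow">'


def robots_meta_for(url: str) -> str:
    """Return the robots meta line to emit for url, or '' if none is needed."""
    # Keep only the path (drop ?page=... etc.) and take its last component.
    script = posixpath.basename(urlparse(url).path)
    return ROBOTS_META if script in NOINDEX_SCRIPTS else ""


if __name__ == "__main__":
    print(robots_meta_for("http://intranet.example/tiki/tiki-pagehistory.php?page=HomePage"))
    print(repr(robots_meta_for("http://intranet.example/tiki/tiki-index.php?page=HomePage")))
```

Comparing the script name, rather than searching the whole URL for the text, avoids a false match when a query string such as ?page=... happens to contain "tiki-pagehistory.php".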

You should really upgrade to 1.8.4 to get the security fixes before your Tiki is hacked or killed. Also, newer Tikis use a robots.txt file for better control of search engines.
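To see how a robots.txt exclusion applies to individual URLs, Python's standard `urllib.robotparser` can evaluate the rules locally. This is a minimal sketch: the robots.txt text is a hypothetical excerpt, and the path assumes Tiki is installed at the site root (a subdirectory install would need that prefix in the rule).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt that excludes the page history script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tiki-pagehistory.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://intranet.example/tiki-pagehistory.php?page=HomePage",
    "http://intranet.example/tiki-index.php?page=HomePage",
):
    # Disallow rules are prefix matches against the URL path (and query).
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)
```

Keep in mind that a Disallow rule only asks compliant crawlers not to fetch a URL. It does not by itself keep that URL out of a search index, and a crawler that never fetches a page also never sees a noindex meta tag on it.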


--
Damian
TikiWiki Development/Support/Hosting Services



posts: 3

Well, we're still having problems with this. It looks like Google (and our Google appliance) aren't respecting the robots.txt file. For example, you exclude tiki-pagehistory.php in http://tikiwiki.org/robots.txt, but page history pages still appear in http://www.google.com/search?q=site:tikiwiki.org&hl=xx-elmer&lr=&start=130&sa=N

We also hacked <meta name="robots" content="noindex,nofollow"> into the history pages, but there are still a lot of other small pages that bog down our Google appliance, so the IT guy won't let us add our Tiki to the index, even though there is only a modest number of pages of real content.

One solution would be to put <meta name="robots" content="index,nofollow"> on EVERY page and then have a single page listing links to all pages for the spider to hit. That page, the complete list of all pages that should be indexed (i.e. all pages with content), would have <meta name="robots" content="noindex,follow">.
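Combined with the noindex,nofollow already hacked into the history pages, the scheme above comes down to a three-way choice per page. A minimal Python sketch of that choice, where the list page's script name `tiki-spider-list.php` is purely hypothetical and the excluded set is assumed to hold only the history script named in this thread:

```python
# Hypothetical name of the single page that links to every content page.
LINK_LIST_SCRIPT = "tiki-spider-list.php"

# Scripts that should be neither indexed nor followed (history pages, per this thread).
NOINDEX_SCRIPTS = {"tiki-pagehistory.php"}


def robots_content(script: str) -> str:
    """Return the robots meta content value for a page served by script."""
    if script == LINK_LIST_SCRIPT:
        # Not indexed itself, but the spider follows its links to every page.
        return "noindex,follow"
    if script in NOINDEX_SCRIPTS:
        return "noindex,nofollow"
    # Every other page: index it, but do not wander off through its links.
    return "index,nofollow"


if __name__ == "__main__":
    for name in ("tiki-index.php", "tiki-pagehistory.php", LINK_LIST_SCRIPT):
        print(name, robots_content(name))
```

With nofollow everywhere except the list page, the spider reaches each content page in a single hop and never descends into history, diff, or other auxiliary links.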

I think it would be easy enough for us to hack that into all pages. Is there an easy way to create a list of pages that we could point our spider at?
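One way to build such a list is to generate a static HTML page from the wiki's page names. A minimal Python sketch, assuming the names are already available (for example from TikiWiki's `tiki_pages` table; the table and column names are an assumption) and that pages are reachable at tiki-index.php?page=<name>. `build_spider_page` is a hypothetical helper, not part of TikiWiki.

```python
import html
from urllib.parse import quote


def build_spider_page(page_names, base_url="tiki-index.php?page="):
    """Return an HTML page linking to every wiki page, marked noindex,follow."""
    items = "\n".join(
        # URL-encode the page name for the link, HTML-escape it for display.
        f'<li><a href="{html.escape(base_url + quote(name))}">{html.escape(name)}</a></li>'
        for name in sorted(page_names)
    )
    return (
        "<html><head>\n"
        '<meta name="robots" content="noindex,follow">\n'
        "<title>All pages</title>\n"
        "</head><body>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # In practice the names might come from something like
    # SELECT pageName FROM tiki_pages (assumed table/column names).
    print(build_spider_page(["HomePage", "Meeting Notes"]))
```

Saved where the appliance can reach it and used as the spider's start URL, this page lets the crawl stop one hop away, provided every real page carries index,nofollow as proposed.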

Thanks,
David

