Subject: Re: Tech-talk archives excluded from indexing?
From: Ned Arnold <[email protected]>
To: EPICS tech-talk <[email protected]>
Date: Wed, 28 Jul 2010 12:45:19 -0500
Does that mean I have to start censoring my replies? Or even go back and edit some of my unguarded remarks?
Ned
On Jul 28, 2010, at 12:24 PM, Andrew Johnson wrote:
> Hi Angus,
>
> On Tuesday 27 July 2010 23:18:49 Angus Gratton wrote:
>> When I looked in the /robots.txt file I saw that tech-talk is explicitly
>> excluded from crawler indexing:
>>
>> Disallow: /epics/core-talk
>> Disallow: /epics/mantis
>> Disallow: /epics/tech-talk
>> Disallow: /epics/wiki
>>
>> I'm wondering why this is, and if it can possibly be undone?
>
> It dates back to the days when spam filters were much less effective than they
> are nowadays; I was trying to be kind to the users of tech-talk by keeping
> their email addresses from being scraped off the website. I admit an email
> harvester might ignore the robots.txt file if they are doing their own
> crawling, but at least this prevents them from finding addresses via regular
> search engines. Maybe this doesn't matter as much nowadays, would anyone like
> to comment?
>
> The Mhonarc software that I use to maintain the archive generates mailto: URLs
> in its HTML output wherever it sees an email address in the message header or
> body. Elsewhere on the EPICS site I replace the '@' sign in mailto: URLs with
> '_at_' and I've worked out how to get Mhonarc to do that too, but it doesn't
> provide an obvious way to rewrite the message text that the URL decorates,
> which will still contain the complete email address. Hopefully just changing
> the mailto: is sufficient to deter the spammers, or this is not necessary any
> more.
>
> I have reprocessed all the tech-talk and core-talk archive files to rewrite
> the mailto: URLs as described above; if there are no complaints I will open up
> the tech-talk and core-talk archives to the web crawlers in a day or two.
>
> Mantis has been replaced by the bug tracker at Launchpad.net; I'm trying to
> get the Wiki replaced with a newer version that I don't have to manage, so
> that is likely to move at some point and I'd rather not have Google hit it
> until then.
>
> HTH,
>
> - Andrew
> --
> The best FOSS code is written to be read by other humans -- Harald Welte
>
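The robots.txt rules quoted in the thread can be checked with Python's standard urllib.robotparser module. This is a minimal sketch that assumes the four quoted Disallow lines sit under a "User-agent: *" record (the thread does not show the rest of the file). The is_crawlable helper and the OPENED_RULES set, which drops the tech-talk and core-talk lines as Andrew proposes, are hypothetical illustrations.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the thread. ASSUMPTION: they belong to a
# "User-agent: *" record; the real file may contain more than this.
CURRENT_RULES = """\
User-agent: *
Disallow: /epics/core-talk
Disallow: /epics/mantis
Disallow: /epics/tech-talk
Disallow: /epics/wiki
"""

# HYPOTHETICAL rule set after opening tech-talk and core-talk to crawlers,
# while still keeping the Mantis and Wiki areas out of search engines.
OPENED_RULES = """\
User-agent: *
Disallow: /epics/mantis
Disallow: /epics/wiki
"""


def is_crawlable(path: str, rules: str = CURRENT_RULES, agent: str = "*") -> bool:
    """Return True if a well-behaved crawler may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(is_crawlable("/epics/tech-talk/2010/msg00001.php"))                # False
    print(is_crawlable("/epics/tech-talk/2010/msg00001.php", OPENED_RULES))  # True
```

Disallow rules are plain path-prefix matches, so a single line covers every archive page beneath it. As Andrew notes, robots.txt is only advisory: a harvester doing its own crawling can simply ignore it.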
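The mailto: rewrite Andrew describes can be illustrated with a short Python sketch. It does not use MHonArc itself, whose configuration is not shown in the thread. obfuscate_mailto is a hypothetical helper that assumes links of the form <a href="mailto:...">. Only the URL is changed, so the visible link text still contains the complete address, which is the limitation mentioned in the message.

```python
import re

# Matches a double-quoted href="mailto:..." attribute, case-insensitively.
# ASSUMPTION: archive links look like <a href="mailto:user@host">...</a>;
# real MHonArc output may be formatted differently.
_MAILTO_HREF = re.compile(r'(href="mailto:)([^"]*)(")', re.IGNORECASE)


def obfuscate_mailto(html: str) -> str:
    """Replace '@' with '_at_' inside mailto: URLs, leaving link text alone."""
    return _MAILTO_HREF.sub(
        lambda m: m.group(1) + m.group(2).replace("@", "_at_") + m.group(3),
        html,
    )


if __name__ == "__main__":
    link = '<a href="mailto:someone@example.org">someone@example.org</a>'
    print(obfuscate_mailto(link))
    # <a href="mailto:someone_at_example.org">someone@example.org</a>
```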