EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Tech-talk archives excluded from indexing?
From: Ned Arnold <[email protected]>
To: EPICS tech-talk <[email protected]>
Date: Wed, 28 Jul 2010 12:45:19 -0500
Does that mean I have to start censoring my replies? Or even go back and edit some of my unguarded remarks?

   Ned



On Jul 28, 2010, at 12:24 PM, Andrew Johnson wrote:

> Hi Angus,
> 
> On Tuesday 27 July 2010 23:18:49 Angus Gratton wrote:
>> When I looked in the /robots.txt file I saw that tech-talk is explicitly
>> excluded from crawler indexing:
>> 
>> Disallow: /epics/core-talk
>> Disallow: /epics/mantis
>> Disallow: /epics/tech-talk
>> Disallow: /epics/wiki
>> 
>> I'm wondering why this is, and if it can possibly be undone?
> 
> It dates back to the days when spam filters were much less effective than they 
> are nowadays; I was trying to be kind to the users of tech-talk by keeping 
> their email addresses from being scraped off the website.  I admit an email 
> harvester might ignore the robots.txt file if they are doing their own 
> crawling, but at least this prevents them from finding addresses via regular 
> search engines.  Maybe this doesn't matter as much nowadays, would anyone like 
> to comment?
> 
> The Mhonarc software that I use to maintain the archive generates mailto: URLs 
> in its HTML output wherever it sees an email address in the message header or 
> body.  Elsewhere on the EPICS site I replace the '@' sign in mailto: URLs with 
> '_at_' and I've worked out how to get Mhonarc to do that too, but it doesn't 
> provide an obvious way to rewrite the message text that the URL decorates, 
> which will still contain the complete email address.  Hopefully just changing 
> the mailto: URL is sufficient to deter the spammers, or perhaps this is no 
> longer necessary.
> 
> I have reprocessed all the tech-talk and core-talk archive files to rewrite 
> the mailto: URLs as described above; if there are no complaints I will open up 
> the tech-talk and core-talk archives to the web crawlers in a day or two.
> 
> Mantis has been replaced by the bug tracker at Launchpad.net; I'm trying to 
> get the Wiki replaced with a newer version that I don't have to manage, so 
> that is likely to move at some point and I'd rather not have Google hit it 
> until then.
> 
> HTH,
> 
> - Andrew
> -- 
> The best FOSS code is written to be read by other humans -- Harald Welte
> 
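For anyone wanting to check how a well-behaved crawler would treat the Disallow rules quoted above, here is a minimal sketch using Python's standard urllib.robotparser. The "User-agent: *" line, the "ExampleBot" agent name, and the two test paths are assumptions for illustration; the quoted excerpt shows only the Disallow lines. The second rule set reflects the plan described above to open tech-talk and core-talk while keeping mantis and the wiki excluded.

```python
from urllib.robotparser import RobotFileParser

# Disallow lines as quoted in the message. The "User-agent: *" line is an
# assumption; the excerpt does not show which agents the rules apply to.
CURRENT_RULES = """\
User-agent: *
Disallow: /epics/core-talk
Disallow: /epics/mantis
Disallow: /epics/tech-talk
Disallow: /epics/wiki
"""

# Rules after opening the two mail archives, as proposed in the message.
OPENED_RULES = """\
User-agent: *
Disallow: /epics/mantis
Disallow: /epics/wiki
"""


def build_parser(robots_text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(robots_text, path, agent="ExampleBot"):
    """Return True if a compliant crawler may fetch the given path."""
    return build_parser(robots_text).can_fetch(agent, path)


# Hypothetical paths, chosen only to exercise the rules.
ARCHIVE_PAGE = "/epics/tech-talk/2010/msg00001.php"
OTHER_PAGE = "/epics/base/index.php"

for label, rules in (("current", CURRENT_RULES), ("opened", OPENED_RULES)):
    for path in (ARCHIVE_PAGE, OTHER_PAGE):
        print(label, path, allowed(rules, path))
```

As noted in the message, this only describes crawlers that honor robots.txt; an address harvester doing its own crawling is free to ignore it.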

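The mailto: rewrite described above can be sketched as a post-processing step over generated HTML. This is not the Mhonarc configuration actually used, which the message does not show; it is a hypothetical Python illustration, and the regular expression, function name, and sample address are all assumptions. It also shows the limitation mentioned in the message: only the URL is changed, so the visible link text still carries the full address.

```python
import re

# Matches a mailto: URL inside a double-quoted href attribute and splits it
# around the '@'. Assumes one '@' per address and double-quoted attributes.
MAILTO_HREF = re.compile(r'(href="mailto:)([^"@]+)@([^"]+)(")', re.IGNORECASE)


def obfuscate_mailto(html):
    """Replace '@' with '_at_' in mailto: URLs; leave all other text alone."""
    return MAILTO_HREF.sub(r'\1\2_at_\3\4', html)


# Hypothetical archive fragment with a placeholder address.
sample = '<a href="mailto:user@example.org">user@example.org</a>'
result = obfuscate_mailto(sample)
print(result)
```

Rewriting the link text as well would need a second pass over the message body, which is the part the message says Mhonarc offers no obvious way to do.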


Replies:
Re: Tech-talk archives excluded from indexing? Angus Gratton
References:
Tech-talk archives excluded from indexing? Angus Gratton
Re: Tech-talk archives excluded from indexing? Andrew Johnson

ANJ, 02 Sep 2010