Lucene
Lucene is an open-source search engine library from the Apache Software Foundation. It is written in Java and released under the Apache License.
Lucene is just the core of a search engine. As such, it does not include components such as a web spider or parsers for different document formats. Instead, a developer who uses Lucene must add these components.
Lucene does not care about the source of the data, its format, or even its language, as long as you can convert it to text. This means you can use Lucene to index and search web pages on remote web servers, documents stored in local file systems, simple text files, Microsoft Word documents, HTML or PDF files, or any other format from which you can extract textual information.
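The division of labour described above, where the application extracts text and the library indexes and searches it, can be illustrated with a toy inverted index. This is a minimal sketch and not Lucene's API: the class name TinyIndex, its add and search methods, and the sample document ids are all hypothetical, and real Lucene adds analysis, scoring, and on-disk storage that are omitted here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for illustration only; this is NOT Lucene's API.
class TinyIndex {
    // Maps each lowercased term to the ids of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one document. The caller has already extracted plain text
    // from whatever the original source was (HTML, PDF, Word, ...).
    void add(String docId, String text) {
        // Split on anything that is not a letter or digit.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of documents containing the term (case-insensitive).
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.emptySet() : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        // Hypothetical documents: only their extracted text reaches the index.
        index.add("page.html", "Lucene is a search library");
        index.add("notes.txt", "Text extracted from a PDF about search");
        System.out.println(index.search("search")); // [notes.txt, page.html]
        System.out.println(index.search("Lucene")); // [page.html]
    }
}
```

The index never sees the original file format, only the extracted text and an identifier, which is why the source and format of the data do not matter to the indexing step.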
Software using Lucene
- Beagle uses DotLucene (http://www.dotlucene.net/), a port of Lucene to C#, as its indexer.
- Nutch is a complete search engine implementation that utilises Lucene.
- Red-Piranha (http://red-piranha.sourceforge.net) is another Lucene-based search engine. It is ready to use, deployable as a GUI, command-line, or Tomcat web application, and has the ability to 'learn' what the user wants.
- Wikipedia uses Lucene for full-text search.
A more extensive list of software that uses Lucene is in the PoweredBy (http://wiki.apache.org/jakarta-lucene/PoweredBy) page of Lucene's wiki.
Ports
Lucene has been ported or is in the process of being ported to various programming languages other than Java:
- CLucene (http://sourceforge.net/projects/clucene/) - C++
- NLucene (http://sourceforge.net/projects/nlucene/) - .NET
- DotLucene (http://www.dotlucene.net/) - .NET
- PyLucene (http://pylucene.osafoundation.org/) - Python
- Plucene (http://search.cpan.org/dist/Plucene/) - Perl
- RubyLucene (http://rubyforge.org/projects/rubylucene/) - Ruby
External links
- Lucene homepage (http://lucene.apache.org)