Lucene Overview: Architecture, Building an Index, Searching, and Indexing a Database


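Overall Diagram

Compared with earlier versions (before 2.9), the only structural change visible in the 3.0 code base is one additional package, message, dedicated to internationalization. As the diagram above shows, 3.0, like earlier versions, is made up of three parts (the public API, the indexing core, and the infrastructure wrappers), which together comprise eight modules (that is, Java packages); Appendix 1 describes each of them. The diagram also shows the call chain of a Lucene search. When we query a term, the search module (search) first calls the query parser (queryParser) to parse the query string. The query parser calls the lexical analyzer (analysis), which tokenizes and filters the search keywords, and the analyzer in turn calls the internationalization module (message) where needed. Only once this preparatory work is done does the search core proper begin: it first calls the index module (index), which reads the data in the index files through the underlying storage classes (store) and returns it to the search module. The remaining modules act as shared utilities throughout the search.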
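To make the call chain concrete, here is a toy model in plain Java, without Lucene. It is only a sketch under simplifying assumptions: the class ToySearchFlow and its methods are hypothetical, the "analyzer" just splits on whitespace and lowercases, and a HashMap from term to document ids stands in for what the index and store modules provide. A multi-term query is treated as an OR of its terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of the search call chain; ToySearchFlow and its methods are hypothetical, not Lucene API.
final class ToySearchFlow {

    // Stand-in for "analysis": split on whitespace and lowercase every token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Stand-in for "index"/"store": an inverted index from term to sorted document ids.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Stand-in for "queryParser" + "search": analyze the query, then OR the postings of its terms.
    static SortedSet<Integer> search(Map<String, SortedSet<Integer>> index, String query) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySortedSet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Lucene in Action", "Search with Lucene", "Hello World");
        Map<String, SortedSet<Integer>> index = buildIndex(docs);
        System.out.println(search(index, "lucene hello")); // [0, 1, 2]
    }
}
```

Running main prints [0, 1, 2]: "lucene" matches documents 0 and 1, and "hello" matches document 2. Real Lucene also scores and ranks the matches, which this sketch leaves out.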

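Appendix 1: Lucene 3.0 Packages in Detail

1. analysis: contains the built-in analyzers, such as WhitespaceAnalyzer, which splits text on whitespace; StopAnalyzer, which adds stop-word filtering; and StandardAnalyzer, the most commonly used one.
2. document: contains the document data structures. The Document class defines how a document is stored, and the Field class defines a single field of a Document.
3. index: contains the classes that read and write the index, such as IndexWriter, which writes, merges, and optimizes index segments, and IndexReader, which reads the index and deletes documents from it. Don't be misled by the name IndexReader into thinking it only reads index files: deletions are also done through it (in 3.0, IndexWriter offers deleteDocuments as well). IndexWriter is mainly concerned with writing the index into segments and merging and optimizing them, while IndexReader is concerned with how the individual documents are organized within the index files.
4. queryParser: contains the classes that parse query strings. Lucene's query syntax is somewhat like SQL: it has reserved words and can be combined according to a grammar into many kinds of queries. Lucene has many Query classes, all subclasses of Query, each implementing a particular kind of query. QueryParser parses the query string and builds the corresponding tree of Query objects, which the searcher then executes to find the results.
5. search: contains the classes that search the index, including the Query classes just mentioned, such as TermQuery and BooleanQuery.
6. store: contains the index storage classes. Directory defines the storage structure of the index files; FSDirectory holds an index stored in the file system, RAMDirectory an index held in memory, and MMapDirectory an index accessed through memory mapping.
7. util: contains general utility classes, such as converters between dates and strings.
8. message: classes that handle internationalization.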
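The difference between the analyzers in the analysis package can be sketched in plain Java. The class below is hypothetical and only approximates two chains: whitespace() splits on whitespace and keeps everything else, as WhitespaceAnalyzer does, while stop() splits on runs of non-letters, lowercases, and drops stop words, which is roughly the StopAnalyzer pipeline. The stop list is a small illustrative subset, not Lucene's actual default list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical approximation of two analyzer chains; these are not Lucene's classes.
final class ToyAnalyzers {

    // Small illustrative stop-word subset (an assumption, not Lucene's default list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "is", "of", "the");

    // Roughly what WhitespaceAnalyzer does: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Roughly the StopAnalyzer chain: runs of letters, lowercased, stop words removed.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick fox, in 2010";
        System.out.println(whitespace(text)); // [The, Quick, fox,, in, 2010]
        System.out.println(stop(text));       // [quick, fox]
    }
}
```

Note how "fox," keeps its comma under whitespace splitting, and how the number disappears in stop(), because a letter-based tokenizer never emits digits.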

Lucene 3.0 Study Notes 1 (Building an Index)
Default category, 2010-04-25 19:45:52

We first placed several .txt files under d:\lucene\s as the source to be indexed, and created d:\lucene\index as the location for the index files. The Lucene 3.0 jar also needs to be on the classpath. The steps are briefly as follows:

1. Create a Directory object. Its argument is a File pointing to where the index is stored; choose the Directory implementation according to where the index lives.
2. Create an IndexWriter object. Its arguments are the Directory, the analyzer, whether to create a new index, and the maximum number of terms indexed per field.
3. Get the array of source files (File[]).
4. Loop over the files and add each one to the index: create a Document, create Field objects (file name, contents, and so on), add the Fields to the Document, and write it to the index with IndexWriter.addDocument(Document).
5. Close the IndexWriter.

Source code:

```java
package com.hector.firstlucene;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

/**
 * Build an index (Lucene 3.0).
 *
 * @author Hector
 */
public class TextFileIndexer {

    /**
     * @param args unused
     * @throws IOException if the index or a source file cannot be read or written
     */
    public static void main(String[] args) throws IOException {
        // Where the index files are stored
        String indexDir = "d:\\index";
        // Where the .txt files to be indexed are located
        String dateDir = "d:\\s";
        // Create the Directory object; SimpleFSDirectory (an FSDirectory) keeps the index on disk
        Directory dir = new SimpleFSDirectory(new File(indexDir));
        // Create the IndexWriter. Arguments: the Directory; the analyzer; true to create a new
        // index (false to modify an existing one); and the maximum number of terms indexed per
        // field: new IndexWriter.MaxFieldLength(2) would keep only the first two terms of each
        // field, LIMITED keeps 10,000, and UNLIMITED imposes no limit
        IndexWriter indexWriter = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            File[] files = new File(dateDir).listFiles();
            if (files == null) {
                throw new IOException("Not a readable directory: " + dateDir);
            }
            for (int i = 0; i < files.length; i++) {
                Document doc = new Document();
                // The Reader is consumed during addDocument, so close it only afterwards
                FileReader reader = new FileReader(files[i]);
                try {
                    // Create the Field objects and add them to doc
                    doc.add(new Field("contents", reader));
                    doc.add(new Field("filename", files[i].getName(),
                            Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.add(new Field("indexDate",
                            DateTools.dateToString(new Date(), DateTools.Resolution.DAY),
                            Field.Store.YES, Field.Index.NOT_ANALYZED));
                    // Write the document through the IndexWriter
                    indexWriter.addDocument(doc);
                } finally {
                    reader.close();
                }
            }
            // Show how many documents the IndexWriter now holds
            System.out.println("numDocs: " + indexWriter.numDocs());
        } finally {
            indexWriter.close();
        }
    }
}
```
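The indexer above mixes two kinds of fields: contents goes through the analyzer, while filename and indexDate are indexed with Field.Index.NOT_ANALYZED, that is, as one untouched term. The hypothetical FieldTerms class below sketches what that means for the terms that end up in the index. Its tokenizer is a simple split on non-letter, non-digit characters plus lowercasing, an assumption for illustration and not StandardAnalyzer's actual grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the terms produced by an analyzed vs. a NOT_ANALYZED field value.
final class FieldTerms {

    enum Mode { ANALYZED, NOT_ANALYZED }

    static List<String> terms(String value, Mode mode) {
        if (mode == Mode.NOT_ANALYZED) {
            // The whole value becomes one term, exactly as given.
            return List.of(value);
        }
        // Simplified tokenizer: split on anything that is not a letter or digit, then lowercase.
        List<String> out = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String fileName = "Lucene Notes.txt";
        System.out.println(terms(fileName, Mode.NOT_ANALYZED)); // [Lucene Notes.txt]
        System.out.println(terms(fileName, Mode.ANALYZED));     // [lucene, notes, txt]
    }
}
```

A term query for "lucene" would therefore find the analyzed value but not the NOT_ANALYZED one, which only matches the exact string "Lucene Notes.txt". This suits values that should be matched exactly, such as file names and dates.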

Lucene 3.0 Study Notes 2 (Searching the Index)
Default category, 2010-04-25 19:46:47
Reposted from: http:/

d:\lucene\index is where the previous note, Lucene 3.0 Study Notes 1 (Building an Index), stored the generated index files. The steps are briefly as follows:

1. Create a Directory object for the index folder.
2. Create an IndexSearcher object for running queries (its argument is the Directory).
3. Create a QueryParser object (arguments: the Lucene version, the field to search, and the analyzer).
4. Build a Query object with the QueryParser's parse method (its argument is the search keyword).
5. Get a TopDocs object from IndexSearcher's search method (arguments: the Query and the maximum number of hits to return).
6. The scoreDocs array of the TopDocs object holds the hits.
7. Close the IndexSearcher.

Source code:

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

/**
 * Search the index (Lucene 3.0+).
 *
 * @author Hector
 */
public class Searcher {
    public static void main(String[] args) throws IOException, ParseException {
        // Where the index files are stored
        String indexDir = "d:\\index";
        Directory dir = new SimpleFSDirectory(new File(indexDir));
        // Create the IndexSearcher; unlike IndexWriter, it only needs the index directory
        IndexSearcher indexSearch = new IndexSearcher(dir);
        try {
            // Create the QueryParser. Arguments: the Lucene version, the field to search,
            // and the analyzer (the same one that was used at indexing time)
            QueryParser queryParser = new QueryParser(Version.LUCENE_30, "contents",
                    new StandardAnalyzer(Version.LUCENE_30));
            // Build the Query object; "好" ("good") is the Chinese keyword being searched for
            Query query = queryParser.parse("好");
            // Ask for at most 10 hits; TopDocs.scoreDocs holds their document numbers and scores
            TopDocs hits = indexSearch.search(query, 10);
            // hits.totalHits is the total number of matching documents
            System.out.println("Found " + hits.totalHits + " hits");
            // Loop over hits.scoreDocs, restore each Document with indexSearch.doc,
            // and print the stored field we need
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                ScoreDoc sdoc = hits.scoreDocs[i];
                Document doc = indexSearch.doc(sdoc.doc);
                System.out.println(doc.get("filename"));
            }
        } finally {
            indexSearch.close();
        }
    }
}
```
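The searcher asks for 10 hits, yet hits.totalHits can be larger than 10: TopDocs reports the total number of matches but keeps only the best-scoring documents in scoreDocs. The hypothetical TopN class below sketches that idea in plain Java with a size-bounded min-heap that counts every match along the way. It assumes a score of 0 means "no match" and breaks ties in favour of the lower document number; it illustrates the idea and is not Lucene's collector code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-ins for TopDocs/ScoreDoc; not Lucene's classes.
final class TopN {

    record Hit(int doc, float score) {}

    record Result(int totalHits, List<Hit> scoreDocs) {}

    // "Weaker" hit first: lower score, or on equal scores the higher document number.
    static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> a.score() != b.score()
            ? Float.compare(a.score(), b.score())
            : Integer.compare(b.doc(), a.doc());

    // Counts every match but keeps only the n best hits in a size-bounded min-heap.
    static Result topN(float[] scores, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(WEAKEST_FIRST);
        int total = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) {
                continue; // assumption: a score of 0 means the document did not match
            }
            total++;
            heap.offer(new Hit(doc, scores[doc]));
            if (heap.size() > n) {
                heap.poll(); // evict the weakest hit seen so far
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort(WEAKEST_FIRST.reversed()); // best hit first
        return new Result(total, best);
    }

    public static void main(String[] args) {
        Result r = topN(new float[] {0.5f, 0f, 2.0f, 1.2f, 0.9f}, 2);
        System.out.println("totalHits = " + r.totalHits()); // totalHits = 4
        for (Hit h : r.scoreDocs()) {
            System.out.println(h.doc() + " " + h.score());
        }
    }
}
```

In the example four documents match, so totalHits is 4, but only documents 2 and 3 (scores 2.0 and 1.2) fit into the top two.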

Lucene 3.0 Study Notes 3 (Building an Index for a Database)
Posted by Hector | Posted in Vertical search | Posted on 22-03-2010 | Tags: Lucene

Indexing database fields works much like indexing files (see Lucene 3.0 Study Notes 1 (Building an Index)); the only change is that the source to be indexed becomes the field values read from the database. The database helper class used in the code is described in the post "A general-purpose Java database operations class". Please read this note alongside Lucene 3.0 Study Notes 1 (Building an Index). The code is as follows:

```java
package com.hector.firstlucene;

import java.io.File;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

/**
 * Build an index from a database (Lucene 3.0+).
 *
 * @author Hector
 */
public class DataBaseIndexer {
    public static void main(String[] args) throws IOException, SQLException {
        String indexDir = "d:\\index";
        // DBConn is the author's own database helper class mentioned above (not part of Lucene or JDBC)
        DBConn conn = new DBConn();
        conn.OpenConnection();
        ResultSet rs = conn.ExecuteQuery("select * from Article");
        // Index the table's fields
        Directory dir = new SimpleFSDirectory(new File(indexDir));
        IndexWriter indexWriter = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            while (rs.next()) {
                System.out.println(rs.getString("Article_Title"));
                Document doc = new Document();
                // Title and content are tokenized for full-text search and stored for display
                doc.add(new Field("Article_Title", rs.getString("Article_Title"),
                        Field.Store.YES, Field.Index.ANALYZED));
                doc.add(new Field("Article_Content", rs.getString("Article_Content"),
                        Field.Store.YES, Field.Index.ANALYZED));
                // The indexing date is stored as a single, untokenized term
                doc.add(new Field("indexDate",
                        DateTools.dateToString(new Date(), DateTools.Resolution.DAY),
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
                indexWriter.addDocument(doc);
            }
            System.out.println("numDocs: " + indexWriter.numDocs());
        } finally {
            indexWriter.close();
        }
    }
}
```
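Both indexers store indexDate as a NOT_ANALYZED string produced by DateTools.dateToString with Resolution.DAY, which, as far as we know, yields a yyyyMMdd string computed in GMT. The hypothetical DayTerms class below reproduces that format with java.time (an assumption, not Lucene code) and shows why it is convenient: because the digits run from year down to day, comparing the strings gives the same order as comparing the dates, so a plain string range such as 20100301 to 20100331 selects one month.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper mimicking a day-resolution date term; not Lucene's DateTools.
final class DayTerms {

    // yyyyMMdd evaluated in UTC (assumed to match what DAY resolution produces).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    static String toDayTerm(Instant instant) {
        return DAY.format(instant);
    }

    // Inclusive range test on the strings alone; valid because yyyyMMdd sorts chronologically.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String term = toDayTerm(Instant.parse("2010-03-22T08:30:00Z"));
        System.out.println(term);                                  // 20100322
        System.out.println(inRange(term, "20100301", "20100331")); // true
    }
}
```

The other resolutions (for example MONTH) use shorter or longer prefixes of the same pattern, so the same ordering argument applies to them.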
