1. INTRODUCTION
2. RETRIEVAL STRATEGIES
  2.1 Vector Space Model
  2.2 Probabilistic Retrieval Strategies
  2.3 Language Models
  2.4 Inference Networks
  2.5 Extended Boolean Retrieval
  2.6 Latent Semantic Indexing
  2.7 Neural Networks
  2.8 Genetic Algorithms
  2.9 Fuzzy Set Retrieval
  2.10 Summary
  2.11 Exercises
3. RETRIEVAL UTILITIES
  3.1 Relevance Feedback
  3.2 Clustering
  3.3 Passage-based Retrieval
  3.4 N-grams
  3.5 Regression Analysis
  3.6 Thesauri
  3.7 Semantic Networks
  3.8 Parsing
  3.9 Summary
  3.10 Exercises
4. CROSS-LANGUAGE INFORMATION RETRIEVAL
  4.1 Introduction
  4.2 Crossing the Language Barrier
  4.3 Cross-Language Retrieval Strategies
  4.4 Cross-Language Utilities
  4.5 Summary
  4.6 Exercises
5. EFFICIENCY
  5.1 Inverted Index
  5.2 Query Processing
  5.3 Signature Files
  5.4 Duplicate Document Detection
  5.5 Summary
  5.6 Exercises
6. INTEGRATING STRUCTURED DATA AND TEXT
  6.1 Review of the Relational Model
  6.2 A Historical Progression
  6.3 Information Retrieval as a Relational Application
  6.4 Semi-Structured Search using a Relational Schema
  6.5 Multi-dimensional Data Model
  6.6 Mediators
  6.7 Summary
  6.8 Exercises
7. PARALLEL INFORMATION RETRIEVAL
  7.1 Parallel Text Scanning
  7.2 Parallel Indexing
  7.3 Clustering and Classification
  7.4 Large Parallel Systems
  7.5 Summary
  7.6 Exercises
8. DISTRIBUTED INFORMATION RETRIEVAL
  8.1 A Theoretical Model of Distributed Retrieval
  8.2 Web Search
  8.3 Result Fusion
  8.4 Peer-to-Peer Information Systems
  8.5 Other Architectures
  8.6 Summary
  8.7 Exercises
9. SUMMARY AND FUTURE DIRECTIONS
References
Index
3.4.1 D'Amore and Mah

Initial information retrieval research focused on n-grams as presented in [D'Amore and Mah, 1985]. The motivation behind their work was the fact that it is difficult to develop mathematical models for terms, since the potential for a term that has not been seen before is infinite. With n-grams, only a fixed number of n-grams can exist for a given value of n. A mathematical model was developed to estimate the noise in indexing and to determine appropriate document similarity measures.

D'Amore and Mah's method replaces terms with n-grams in the vector space model. The only remaining issue is computing the weights for each n-gram. Instead of simply using n-gram frequencies, a scaling method is used to normalize the length of the document. D'Amore and Mah's contention was that a large document contains more n-grams than a small document, so it should be scaled based on its length.

To compute the weights for a given n-gram, D'Amore and Mah estimated the number of occurrences of an n-gram in a document. The first simplifying assumption was that n-grams occur with equal likelihood and follow a binomial distribution; hence, the n-gram "ABC" was no more likely to occur than "DEE". The Zipfian distribution that is widely accepted for terms does not hold for n-grams. D'Amore and Mah noted that n-grams are not equally likely to occur, but the removal of frequently occurring terms from the document collection resulted in n-grams that follow a more nearly binomial distribution than the terms.

D'Amore and Mah computed the expected number of occurrences of an n-gram in a particular document. This is the product of the number of n-grams in the document (the document length) and the probability that the n-gram occurs. The n-gram's probability of occurrence is computed as the ratio of its number of occurrences to the total number of n-grams in the document. D'Amore and Mah continued their application of the bino …
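The steps above can be sketched in a few lines of Python. This is only an illustration of the general idea, not D'Amore and Mah's actual implementation: the helper names are invented, whitespace handling is an assumption, and the n-gram probability here is estimated over the whole collection (one plausible reading of the expected-occurrence computation).

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Overlapping character n-grams of a string.
    Assumption: lowercase the text and drop whitespace before slicing."""
    text = "".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_weights(doc, n=3):
    """Length-normalized weights: each n-gram's frequency divided by the
    total number of n-grams in the document, so long documents are not
    favored simply for containing more n-grams."""
    grams = char_ngrams(doc, n)
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}

def expected_occurrences(gram, doc, collection, n=3):
    """Expected count of `gram` in `doc` under a binomial model:
    (number of n-grams in the document) * P(gram), with P(gram)
    estimated from relative frequency across the collection."""
    all_grams = [g for d in collection for g in char_ngrams(d, n)]
    p = all_grams.count(gram) / len(all_grams)
    return len(char_ngrams(doc, n)) * p
```

For example, `ngram_weights("banana")` yields a weight of 0.5 for "ana" (two of the four trigrams), and `expected_occurrences` compares that observed rate against what the binomial model predicts from the collection-wide rate.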