asp:CaseStudy
Cybergroup Selects dtSearch
dtSearch’s Text Retrieval Engine Powers Web-based Business Intelligence Mining Library
Cybergroup’s client requested that Cybergroup develop a Web-based business intelligence mining library, including Web-based search that seamlessly combines its structured SQL database and its separate document collection.
Project Requirements and Background
Cybergroup’s client realized that database information, although critical to its business intelligence, represented only a small portion of all its corporate information. By the client’s estimate, its corporate database contained a mere 20% of business-decision information, while the remaining 80% resided in other sources, such as Web site pages, Microsoft Office documents, and PDFs. The client needed a single search to cover both the SQL database and the file repository and to return unified results from both sources.
To ensure that a search of the combined database and document repository would retrieve all relevant information, the client required not only basic search functionality, such as word and phrase searching, but also advanced search features: stemming, fuzzy searching to tolerate misspellings, and phonic searching. The client also wanted concept searching, including synonym expansion using both pre-defined thesaurus terms and a user-defined thesaurus/synonym list.
For sorting search results, the client wanted a variety of advanced relevancy ranking options. Finally, for ease of browsing search results, the client specified that the search must return retrieved SQL database entries and documents with highlighted hits, preferably with a WYSIWYG display of formats such as HTML, PDF, and XML alongside the highlighted hits.
For ongoing digital library management, the client needed Cybergroup to develop a solution that lets multiple contributors upload documents to the Web library. Upon document check-in, the client further needed a mechanism to add metadata about the document to the client’s main SQL database.
Solution Overview
To meet all of the above requirements for the project’s search functionality, Cybergroup chose the dtSearch Text Retrieval Engine for Win & .NET by dtSearch Corp. (http://www.dtsearch.com). A single dtSearch index could include both the SQL database and the separate document repository, and support all of the above advanced search features, ranking capabilities, and hit-highlighted display options.
To use these built-in capabilities, Cybergroup needed to write custom VB.NET code to “drag along” certain fields from the database that would be associated with each document and stored in the searchable index. Cybergroup also needed to write a custom ASP.NET-based server control using the dtSearch Engine APIs. Cybergroup called this application its “dtResults Control”; screenshots and a detailed description of Cybergroup’s dtResults Control follow.
Cybergroup’s Description of dtResults Control
As with any .NET control, a developer can drag and drop the dtResults Control right into a development environment. Cybergroup implemented the dtResults Control by inheriting from the datagrid control, leveraging the existing power of the datagrid. Cybergroup chose the datagrid as a foundation for its server control because it offers built-in paging and a robust programming model.
The following code is from Cybergroup’s sample application, and runs when the user enters a search term or phrase and clicks the Search button:
Private Sub GetResults()
    'Setting the location of the index
    SearchResultList1.IndexPath = "c:\dbconnectorindex"

    'Mapping virtual path of documents to physical path
    Dim rptd As New SearchResultList.SearchResultList.ResultPathTranslationDictionary
    rptd.Add("c:\testdocs", "./testdocs")

    'Setting various search settings
    SearchResultList1.RelativePathTranslations = rptd
    SearchResultList1.SortCaseInsensitive = cbCaseInsensitive.Checked
    SearchResultList1.SortAscending() = ddAscendingFlag.SelectedValue
    SearchResultList1.SearchType = ddSearchType.SelectedValue
    SearchResultList1.SortType = ddSort.SelectedValue
    SearchResultList1.Stemming = cbStemming.Checked

    If cbFuzzyness.Checked = True Then
        SearchResultList1.Fuzzy = True
        SearchResultList1.FuzzLevel = ddFuzzyness.SelectedValue
    Else
        SearchResultList1.Fuzzy = False
    End If

    SearchResultList1.Phonic = cbPhonic.Checked
    SearchResultList1.Synonyms = cbSynonyms.Checked

    'Defining dtSearch custom fields to be displayed
    Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}
    SearchResultList1.CustomFieldNames = cfn

    Dim cffn As String() = {"Supplier ID #", "Company Name"}
    SearchResultList1.CustomFieldFriendlyNames = cffn

    If chkSearchWithin.Checked = True Then
        SearchResultList1.SearchWithin = True
        SearchResultList1.PreviousSearchFilter = Session("psf")
    End If

    'Executing the search and binding the results
    SearchResultList1.GetResults(tbSearch.Text)

    'Storing the "previous search filter" to be used later if user clicks "Search Within Results"
    Session("psf") = SearchResultList1.PreviousSearchFilter

    Literal1.Text = "Search: <B><I>" & tbSearch.Text & "</I></B> returned: " & CType(SearchResultList1.DataSource, DataTable).Rows.Count & " results"
End Sub
The following gives a flavor of the design and functionality of Cybergroup’s dtResults Control.
Using the GetResults method of the dtResults Control, Cybergroup reduced the task of creating the search and results display to one line of code in the simplest case. We can execute a search and then display results by passing a search string input by the user on the search form, as in this example:
SearchResultList1.GetResults(tbSearch.Text) 'ONLY ONE LINE OF CODE
Of course, a developer can also leverage the power of the dtResults Control through its properties. Take, for example, the SortType property, which lets the developer choose how the results display is sorted. Let’s say the developer wants the most recently modified documents to appear first in the results display. The developer would set the SortType property to “date” and the Ascending property to “false”; for example:
SearchResultList1.SortType = "date"
SearchResultList1.Ascending = False
Internally, the control checks the SortType string against a fixed set of values such as “date”, “hits”, and “title”, and checks the Ascending setting. It then builds a single value that combines the corresponding dtSearch sort flags and passes it to the sort function. These bit-level manipulations are hidden from the developer, who can bind the properties to checkboxes or dropdown lists with a single line of code each.
Here’s the code in the dtResults Control for the SortType property:
Dim flags As New dtengine.SortType

'An explicit numeric flag value (sortf), when set, takes precedence over the string
If Not (sortf = 0) Then
    flags = sortf
ElseIf sortt = "hits" Then
    flags = dtengine.SortType.stSortByHits
ElseIf sortt = "index" Then
    flags = dtengine.SortType.stSortByIndex
ElseIf sortt = "date" Then
    flags = dtengine.SortType.stSortByDate
ElseIf sortt = "timeofday" Then
    flags = dtengine.SortType.stSortByTime
ElseIf sortt = "title" Then
    flags = dtengine.SortType.stSortByTitle
ElseIf sortt = "name" Then
    flags = dtengine.SortType.stSortByName
ElseIf sortt = "filetype" Then
    flags = dtengine.SortType.stSortByType
ElseIf sortt = "size" Then
    flags = dtengine.SortType.stSortBySize
Else
    'Any other string is treated as the name of a custom (user) field
    flags = dtengine.SortType.stSortByUserField
End If

'Adding the direction and case-sensitivity modifiers to the base sort flag
If sascend Then
    flags += dtengine.SortType.stSortAscending
End If

If cinsens Then
    flags += dtengine.SortType.stSortCaseInsensitive
End If

res.Sort(flags, sortt)
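The flag-building pattern above can be tried outside the control with a small console sketch. Everything in it is an assumption made for illustration: the `SketchSortFlags` enum, its numeric values, and the `BuildSortFlags` helper are hypothetical stand-ins, not dtSearch’s actual `SortType` values. The sketch combines modifiers with `Or` rather than `+=`, which stays correct even if a modifier bit is already set.

```vbnet
Imports System

Module SortFlagsSketch
    ' Hypothetical flag values for illustration only; dtSearch defines its own.
    <Flags()> _
    Enum SketchSortFlags
        ByHits = 1
        ByDate = 2
        ByName = 4
        BySize = 8
        ByUserField = 16
        Ascending = 256
        CaseInsensitive = 512
    End Enum

    ' Maps a sort-type string plus two Boolean options to one combined flags value.
    Function BuildSortFlags(ByVal sortType As String, _
                            ByVal sortAscending As Boolean, _
                            ByVal ignoreCase As Boolean) As SketchSortFlags
        Dim flags As SketchSortFlags
        Select Case sortType.ToLowerInvariant()
            Case "hits"
                flags = SketchSortFlags.ByHits
            Case "date"
                flags = SketchSortFlags.ByDate
            Case "name"
                flags = SketchSortFlags.ByName
            Case "size"
                flags = SketchSortFlags.BySize
            Case Else
                ' Unknown strings are treated as a custom field name.
                flags = SketchSortFlags.ByUserField
        End Select

        ' Or sets a bit without disturbing bits that are already set.
        If sortAscending Then flags = flags Or SketchSortFlags.Ascending
        If ignoreCase Then flags = flags Or SketchSortFlags.CaseInsensitive
        Return flags
    End Function

    Sub Main()
        ' ToString() is called explicitly so the flag names are printed.
        Console.WriteLine(BuildSortFlags("date", False, False).ToString())
        Console.WriteLine(BuildSortFlags("Region", True, True).ToString())
    End Sub
End Module
```

The first call models “most recently modified first”; the second models an ascending, case-insensitive sort on a custom field.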
Critically important to our project is the ability to extract “custom field” data from the dtSearch index. Custom fields are columns that we have extracted from the database during the indexing process and now wish to present in a search results display.
Through the use of the “CustomFieldNames” and the “CustomFieldFriendlyNames” properties, a developer can easily and attractively display database information in the results display.
The CustomFieldNames property is a string array of the names of custom fields (i.e., database columns) in the index that the developer wishes to include in the results. When defined, the strings in it should appear exactly as they do in the index. For example, {“SupplierID”, “CompanyName”, “Region”}.
The CustomFieldFriendlyNames property is a string array of the labels that the developer would like to have appear in the control. This provides for a high degree of customization in results presentation. Rather than display cryptic database column names, the developer can display understandable labels. Each label is matched to a custom field by its position in the array, relative to the CustomFieldNames property above. If the array is longer than CustomFieldNames, the extra entries are discarded. If it is shorter, the remaining custom fields are displayed under their actual names. For example, {“ID # of Supplier”, “Supplier Name”, “Supplier’s Region”}.
To return the custom field information in the results display, the developer would simply set the properties as in the following example:
Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}
SearchResultList1.CustomFieldNames = cfn
Dim cffn As String() = {"Supplier ID #", "Company Name"}
SearchResultList1.CustomFieldFriendlyNames = cffn
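The positional matching rule for friendly names can be illustrated with a short console sketch. The `ResolveLabels` helper is hypothetical and written only to show the rule as described; the control’s real implementation is not shown in this article.

```vbnet
Imports System

Module FriendlyNamesSketch
    ' Hypothetical helper: returns one display label per custom field.
    ' Extra friendly names are ignored; missing ones fall back to the field name.
    Function ResolveLabels(ByVal fieldNames As String(), _
                           ByVal friendlyNames As String()) As String()
        Dim labels(fieldNames.Length - 1) As String
        For i As Integer = 0 To fieldNames.Length - 1
            If friendlyNames IsNot Nothing AndAlso i < friendlyNames.Length Then
                labels(i) = friendlyNames(i)
            Else
                labels(i) = fieldNames(i)
            End If
        Next
        Return labels
    End Function

    Sub Main()
        Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}
        Dim cffn As String() = {"Supplier ID #", "Company Name"}
        ' Prints: Supplier ID #, Company Name, Region
        Console.WriteLine(String.Join(", ", ResolveLabels(cfn, cffn)))
    End Sub
End Module
```

With the two-element friendly-name array used in the sample application, the third field falls back to its index name, “Region”.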
Following is a complete list of the dtResults Control properties and methods:
Ascending: If true, the results will be sorted in ascending order by whatever criterion is specified in SortType. If false, results are sorted in descending order. Defaults to false.
CustomFieldNames: This string array represents the names of custom fields in the index that the developer chooses to include in the results. The strings in it should appear exactly as they do in the index; for example, {“SupplierID”, “CompanyName”, “Region”}.
CustomFieldFriendlyNames: This string array holds the labels that the developer wants to appear in the control. Each label is matched to a custom field by its position in the array, relative to CustomFieldNames. If the array is longer than CustomFieldNames, the extra entries are discarded. If it is shorter, the remaining custom fields are displayed under their actual names. For example, {“ID # of Supplier”, “Supplier Name”, “Supplier’s Region”}.
Fuzzy and FuzzLevel: These control the tolerance of the search; for example, searching for “alphabet” with Fuzzy = True and FuzzLevel = 1 would also search for “alphaqet” or “albhabet”. Searching for “alphabet” with Fuzzy on and FuzzLevel at 3 would also find “alpkaqet”.
IndexPath: This is the location of the dtSearch index files to use for searching. If it is not set, then SearchResults will look for an “IndexPath” key in Web.config.
Phonic: Controls phonic searching; for example, with Phonic = True, searching for “Smith” would also find “Smythe”.
PreviousSearchFilter: This allows the developer to create “Search Within Results” functionality, in conjunction with the SearchWithin property, described below. This property should be saved to a session variable after the initial search, and restored from it when the user triggers a “Search Within Results”.
RelativePathTranslations: A SearchResultList.ResultPathTranslationDictionary that maps the absolute paths of documents stored in the dtSearch index to relative paths. This allows a URL to be generated for the link to the document, given only an absolute path on the server. For example, one might include the following in an initialization method:
Dim rptd As New SearchResultList.SearchResultList.ResultPathTranslationDictionary
rptd.Add("c:/Inetpub/website/search/documents", "documents")
rptd.Add("c:/Inetpub/website/tutorials", "../tutorials")
SearchResultList1.RelativePathTranslations = rptd
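How such a translation table might turn an indexed absolute path into a link can be sketched as follows. This is only an illustration under stated assumptions: a generic `Dictionary(Of String, String)` stands in for the control’s `ResultPathTranslationDictionary`, and the `ToRelativeUrl` helper and the file name `intro.pdf` are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module PathTranslationSketch
    ' Hypothetical helper: rewrites an absolute server path as a relative URL
    ' using the first translation whose root folder contains the file.
    Function ToRelativeUrl(ByVal absolutePath As String, _
                           ByVal translations As Dictionary(Of String, String)) As String
        ' Normalize Windows separators so both path styles compare equal.
        Dim normalized As String = absolutePath.Replace("\", "/")
        For Each pair As KeyValuePair(Of String, String) In translations
            Dim root As String = pair.Key.Replace("\", "/").TrimEnd("/"c)
            If normalized.StartsWith(root & "/", StringComparison.OrdinalIgnoreCase) Then
                Return pair.Value.TrimEnd("/"c) & normalized.Substring(root.Length)
            End If
        Next
        ' No translation applies to this path.
        Return Nothing
    End Function

    Sub Main()
        Dim rptd As New Dictionary(Of String, String)
        rptd.Add("c:/Inetpub/website/search/documents", "documents")
        rptd.Add("c:/Inetpub/website/tutorials", "../tutorials")
        ' Prints: ../tutorials/intro.pdf
        Console.WriteLine(ToRelativeUrl("c:\Inetpub\website\tutorials\intro.pdf", rptd))
    End Sub
End Module
```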
SearchType: A string. Valid values are “allwords”, “anywords”, “phrase”, and “boolean”. In the “allwords” setting, dtSearch will search for any document containing each word in the search, in any order or proximity. In the “anywords” setting, dtSearch will search for any documents containing any of the words in the search query, not necessarily all of them in the same document. In the “phrase” setting, dtSearch will consider the entire search query like a single word, and search for documents containing the exact query. In the “boolean” setting, the user can use Boolean logic to specify a query. dtSearch provides the following guidance:
SearchWithin: If this property is set to True, and the PreviousSearchFilter property is set to a value obtained from it after a previous GetResults call, then the results of the current search will be a subset of the results of the previous search.
SortType: A string. Meaningful values are “hits”, “date”, “name”, and “size”. If set to “hits”, the documents containing the most occurrences of the search query, or the highest score, will appear on top. If set to “date”, the most recently modified documents will appear on top. If set to “name”, the documents will be sorted in alphabetical order of their title. If set to “size”, the documents with the largest file sizes will appear on top. If the field has a different value than any of these, it is assumed to be the name of a custom field in the index by which to sort.
Stemming: Controls the word stemming capability of dtSearch. For example, if Stemming = True, searches for “apply”, “applying”, “applier”, or “applies” are all equivalent.
Synonyms: Uses an English thesaurus to search for synonyms of the search query in addition to the search query itself.
GetResults(SearchText As String): This method runs a search on the query string passed, using the settings determined by the control’s properties, and displays the results in a human-readable format, with 10 results per page and a pager control. Until this method is called, the control is invisible to the user.
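The interplay of the PreviousSearchFilter and SearchWithin properties described in the list above amounts to intersecting two result sets. The sketch below models that idea only; it assumes, purely for illustration, that a filter can be represented as a set of integer document IDs, and the `NarrowResults` helper is hypothetical rather than part of the control or of dtSearch.

```vbnet
Imports System
Imports System.Collections.Generic

Module SearchWithinSketch
    ' Hypothetical model: a search filter is the set of matching document IDs.
    ' Narrowing keeps only current hits that were also in the previous results.
    Function NarrowResults(ByVal previousFilter As HashSet(Of Integer), _
                           ByVal currentHits As IEnumerable(Of Integer)) As HashSet(Of Integer)
        Dim narrowed As New HashSet(Of Integer)(currentHits)
        If previousFilter IsNot Nothing Then
            narrowed.IntersectWith(previousFilter)
        End If
        Return narrowed
    End Function

    Sub Main()
        ' Result of the first search, as it might be saved in a session variable.
        Dim firstSearch As New HashSet(Of Integer)(New Integer() {1, 2, 3, 5, 8})
        ' Raw hits of the second query before the filter is applied.
        Dim secondHits As Integer() = {2, 4, 8, 9}

        Dim ids As New List(Of Integer)(NarrowResults(firstSearch, secondHits))
        ids.Sort()
        ' Prints: 2, 8
        Console.WriteLine(String.Join(", ", ids))
    End Sub
End Module
```

Saving the narrowed set back to the session after each search lets the user refine results repeatedly, each time within the previous subset.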
Greg Bean is President of Cybergroup, Inc., a developer of advanced Internet and intranet developer search tools in Baltimore, MD. E-mail him at gbean@cybergroup.com.
dtSearch
dtSearch offers over a decade of experience in text search and retrieval. Large enterprises typically use dtSearch products for general information retrieval, Internet and Intranet site searching, access to technical documentation, and embedding in applications for distribution. dtSearch is also on the US Government’s GSA Schedule. The company has distributors worldwide, including coverage on six continents. For more information visit http://www.dtsearch.com.