Searching for artifacts using text criteria

Project Tracker allows you to search within a specific field where you enter the search string. For example, if you enter the search string in the Summary field of an artifact, the search will restrict itself to that field in all the artifacts.

For example, suppose you do the following:

  1. Log in to CollabNet.
  2. Select a project with Project Tracker as its tracking tool.
  3. Click the Query Artifacts link.
  4. Click the Advanced Query tab. The screen displays several sections with each field of the artifact represented as a section.
  5. Enter the search keywords "Defect Information" in the Summary field of one of the sections, for example: Duplicate Check Info.
  6. Click Submit at the bottom of the screen.

The results list all artifacts that contain the word "Defect" or the word "Information" in the Summary field of the Duplicate Check Info section.

Text queries

Simple text searches

A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases. A Single Term is a single word such as "test" or "hello". A Phrase is a group of words surrounded by double quotes such as "hello dolly".

The smallest unit that can be searched is a whole term (word). For example, searching for "hell" will not return the word "hello". Search is case insensitive.

Stopword filtering

The Search system filters out some commonly found words, such as a, at, are, and is, which would otherwise add noise to a search. This keeps the focus on the main search keyword. The stop words currently filtered are:
[a, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, s, such, t, that, the, their, then, there, these, they, this, to, was, will, with]

During indexing, stop words are filtered out and are not stored in the index. When indexing the text "print at", the search engine ignores the stop word "at" and stores only the word "print" in the index. Likewise, if you search for "print at", the "at" is ignored and the search is made only for the word "print". Searching for "print at" is the same as searching for "print".

Wildcard searches

Wildcard searches allow you to search on partial terms. The wildcard character can appear in the middle or end of a term but not as the first character. You can perform single or multiple character wildcard searches. To perform a single character wildcard search use the "?" symbol. Single character wildcard searches look for terms with the single character replaced. For example, to search for "text" or "test" you can use the search te?t. To perform a multiple character wildcard search use the "*" symbol.

Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests or tester, you can use the search: test*

Note: During indexing, each word is stemmed and only the stem is stored in the index. Stemming is not applied to wildcard queries, so this matters when you write them. For example, searching for playe* does not return "player": at index time "player" is stemmed to "play", so the index contains no word that starts with "playe". In a wildcard search, use no more of the word than its stem (for example, play*). If a wildcard search returns no results for a word you expected to find, try the search again using the exact word.
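To make the two wildcard characters concrete, here is a minimal Python sketch that translates a wildcard pattern into a regular expression and matches it against a single term. The function names are hypothetical, and the sketch does not model stemming, so it matches against the words exactly as given.

```python
import re


def wildcard_to_regex(pattern):
    """Translate "?" to exactly one character and "*" to zero or more."""
    if pattern[:1] in ("?", "*"):
        raise ValueError("a wildcard cannot be the first character")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    # Search is case insensitive, so the pattern is compiled that way too.
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("te?t", "text"))     # True
print(wildcard_match("test*", "tester"))  # True
print(wildcard_match("te?t", "tet"))      # False
```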

Fuzzy searches

Fuzzy searches are based on the Levenshtein Distance, or Edit Distance, algorithm. To do a fuzzy search, use the tilde symbol, "~", at the end of a Single Term. For example, to search for a term similar in spelling to "roam", use the fuzzy search: roam~. This search will find terms like foam and roams.
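For readers unfamiliar with edit distance, the following Python sketch computes the Levenshtein distance between two words using the standard dynamic-programming approach. It illustrates the measure only; how the search engine turns a distance into a match threshold is not covered here, and the function name is hypothetical.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("roam", "foam"))   # 1
print(levenshtein("roam", "roams"))  # 1
```

Both "foam" and "roams" are a single edit away from "roam", which is why a fuzzy search for roam~ can find them.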

Phrase queries

In Search, a phrase is defined as a group of words surrounded by double quotes. Enclosing words within quotes will search for artifacts with words adjacent to each other. For example searching for "takes prints" will return an artifact containing "He takes prints" but will not return an artifact that contains the text "takes quality prints."
Enclosing a search string within double quotes allows you to use special characters. For example, assume that an artifact contains the hyphenated word audio-video. If you search for this word without quotes, the search returns all the artifacts that contain "audio" but not "video". Enclosing the word within quotes returns the artifact with the whole name audio-video.
Wildcards are not supported within quotes. For example, assume that an artifact contains the text "San Francisco". Searching for francis* will return the artifact, but searching for "francis*" will not, because the wildcard is ignored and the search is equivalent to searching for francis.
Double quotes do not prevent stemming. Searching for "word1 word2" is equal to searching for "stem(word1) stem(word2)", where the stem is the word after stripping off the suffixes.

Note: After an artifact is created or modified, it takes approximately 10 minutes to be indexed. The artifact is available for search only after indexing completes.

Proximity searches

Proximity searches find words that are within a specific distance from each other. To do a proximity search, use the tilde symbol, "~", at the end of a Phrase. For example, to search for "apache" and "jakarta" within 10 words of each other in the text, use the search: "jakarta apache"~10.
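The idea of a proximity search can be illustrated with a simplified Python sketch. It assumes that "distance" means the difference between the two word positions after splitting on whitespace; the search engine's own distance measure may differ in detail, and the function name is hypothetical.

```python
def within_distance(text, first, second, distance):
    """Return True if first and second occur at most `distance` positions apart."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(
        abs(i - j) <= distance
        for i in first_positions
        for j in second_positions
    )


sample = "jakarta is a project hosted under apache"
print(within_distance(sample, "jakarta", "apache", 10))  # True
print(within_distance(sample, "jakarta", "apache", 2))   # False
```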

Range searches

You can search for documents where the field values are within a specified range. For example, [120 TO 125] will return all the artifacts whose fields contain values between 120 and 125, inclusive. Square brackets, [], mean that the lower and upper limits are included in the range; curly braces, {}, mean that the lower and upper limits are excluded. You can do a Range query with date values as well as string values. For example, [20020101 TO 20030101] will find documents whose fields have values between 2002/01/01 and 2003/01/01, both dates inclusive. Similarly, {Aida TO Carmen} will find all documents whose titles are alphabetically between Aida and Carmen, but not Aida and Carmen themselves.
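The difference between inclusive and exclusive ranges can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, both limits are single words without spaces, and values are compared as plain strings (which works for the fixed-width numbers and dates used in the examples above).

```python
import re

# Matches "[lower TO upper]" or "{lower TO upper}".
RANGE_PATTERN = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")


def parse_range(query):
    """Return (lower, upper, inclusive) for a range query string."""
    match = RANGE_PATTERN.match(query.strip())
    if match is None:
        raise ValueError("not a range query: " + query)
    opening, lower, upper, closing = match.groups()
    inclusive = opening == "["
    if inclusive != (closing == "]"):
        raise ValueError("mismatched brackets in: " + query)
    return lower, upper, inclusive


def in_range(value, query):
    """Return True if value falls inside the range described by query."""
    lower, upper, inclusive = parse_range(query)
    if inclusive:
        return lower <= value <= upper
    return lower < value < upper


print(in_range("125", "[120 TO 125]"))       # True
print(in_range("Aida", "{Aida TO Carmen}"))  # False
```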

Boolean operators

Boolean operators allow terms to be combined through logic operators: AND, "+", OR, NOT and "-". Boolean operators must be ALL CAPS. If two terms are entered with no Boolean operator, the OR operator is used by default.

OR The OR operator is used between two terms to search for text that contains either of the terms. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

AND The AND operator is used to find text that contains both terms anywhere in the text. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.

+ The "+", or required, operator requires that the term after the "+" symbol exist somewhere in the text. To search for results that must contain "jakarta" and may contain "lucene" use the query: +jakarta lucene.

NOT The NOT operator excludes results that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT. To search for text that contains "jakarta" but not "jakarta lucene" use the query: "jakarta" NOT "jakarta lucene". Note: The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT "jakarta apache"

- The "-", or prohibit, operator excludes results that contain the term after the "-" symbol. To search for text that contains "jakarta apache" but not "jakarta lucene" use the query: "jakarta apache" -"jakarta lucene".
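The set analogy used for OR, AND and NOT can be shown with a small Python sketch. The three sample texts and the helper name containing are made up for illustration; they are not Project Tracker data or API.

```python
# Hypothetical artifact texts, keyed by an artifact number.
ARTIFACTS = {
    1: "jakarta apache",
    2: "jakarta lucene",
    3: "apache website",
}


def containing(term):
    """Return the set of artifact numbers whose text contains the term."""
    return {number for number, text in ARTIFACTS.items() if term in text.split()}


jakarta = containing("jakarta")  # {1, 2}
apache = containing("apache")    # {1, 3}
lucene = containing("lucene")    # {2}

print(jakarta | apache)  # jakarta OR apache: union -> {1, 2, 3}
print(jakarta & apache)  # jakarta AND apache: intersection -> {1}
print(jakarta - lucene)  # jakarta NOT lucene: difference -> {1}
```

A lone NOT has nothing to subtract from, which is consistent with the note that NOT cannot be used with just one term.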

Grouping

Grouping is supported using parentheses to group clauses to form sub queries. This can be very useful if you want to control the Boolean logic for a query. To search for text that contains "website" and either "jakarta" or "apache", use the query: (jakarta OR apache) AND website. This removes any ambiguity: "website" must exist, and at least one of the terms "jakarta" or "apache" must exist.

Escaping Special Characters

Any of the previously discussed characters or words, which would normally be interpreted as operators, can be used within a term by escaping them. Note: some operators exist in the underlying Lucene search engine but are not exposed in Project Tracker. These operators, such as the colon ":", also need to be escaped. The current list of special characters is: && || ! ( ) { } [ ] ^ " ~ * ? : \

You can escape special characters by placing the term in quotes or by placing a "\" (backslash) before the operator. You should always use quotes for search strings containing numbers. For example, if you are searching for a database error code, such as ORA-00932, you will need to enter the term in quotes in the search input field: "ORA-00932". However, if you are searching for a normal hyphenated word such as twenty-five, entering the search text as twenty\-five or "twenty-five" will work.

Quotes also work if AND or OR are used as part of the query text, though these words are included in the set of common English words that are ignored in searches.
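If search strings are built by a script, backslash escaping can be automated. The Python sketch below is an assumption-laden illustration: the function name escape_term is hypothetical, and the character set is the special-character list from this section plus the "+" and "-" operators, with "&" and "|" escaped individually.

```python
# Characters treated as special: the list in this section, plus "+" and "-".
SPECIAL_CHARACTERS = set('+-&|!(){}[]^"~*?:\\')


def escape_term(term):
    """Put a backslash before every special character in the term."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARACTERS else ch
        for ch in term
    )


print(escape_term("twenty-five"))  # twenty\-five
```

For search strings that contain numbers, such as error codes, quoting the whole term remains the recommended approach.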

Boosting a term

The caret symbol, "^", allows you to assign a relevance level (boost factor) to a term, so that documents containing terms with a higher boost factor rank as more relevant. The default boost factor is 1. It can be greater or less than 1 (for example, .5) as long as it is a positive number. For example, you might want to find the following string:

collabnet environment

If you enter the following, documents that contain the term "collabnet" appear to be more relevant.

collabnet^4 environment

To assign a boost factor to a phrase, use quotes, for example:

"collabnet environment"^4

Field grouping

You can search for groups of words, for example, to find a title that contains both "collabnet" and "enterprise edition" you would enter the following:

title:(+collabnet +"enterprise edition")