The swishd cluster system is an application that allows swish-e to
scale out to multiple machines, so the number of indexes (or
collections) can become almost limitless. By scaling out to multiple
swishd nodes, the index sizes can remain small as the number of documents
increases. The number of documents is typically measured in the millions.
Download the latest source via Git:
git clone git://solar1.net/swishd.git
Architecture Overview
A client makes a TCP connection to cluster_mgr (default port 5500) and sends a query in XML format. cluster_mgr in turn connects to each swishd node listed in the configuration file (TCP port 5000) and submits the search query to each node for the collection specified in the client query. Each swishd node runs the search against the appropriate index and returns its results to cluster_mgr. cluster_mgr then assembles the results, sorts them by rank, and returns them to the client in XML format.
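The assemble-and-sort step described above can be sketched as follows. This is a minimal illustration, not the cluster_mgr implementation: the `query_node` callback, the node names, and the dictionary layout of a hit (a `rank` key plus a `path` key) are all assumptions made for the example, and the nodes are queried one after another for simplicity.

```python
def merge_by_rank(per_node_results):
    """Flatten the hit lists from every node and sort by rank, highest first."""
    merged = [hit for hits in per_node_results for hit in hits]
    return sorted(merged, key=lambda hit: hit["rank"], reverse=True)


def search_cluster(nodes, collection, query, query_node):
    """Submit the same query to every node, then merge the hits.

    query_node is a caller-supplied function (hypothetical) that talks to
    one swishd node and returns a list of dicts with at least a "rank" key.
    """
    per_node = [query_node(node, collection, query) for node in nodes]
    return merge_by_rank(per_node)


# Stand-in for a real network call to a swishd node on TCP port 5000.
# The node names and canned hits are made up for this example.
def fake_query_node(node, collection, query):
    canned = {
        "node-a": [{"path": "/documents/a1.xml", "rank": 1000},
                   {"path": "/documents/a2.xml", "rank": 310}],
        "node-b": [{"path": "/documents/b1.xml", "rank": 870},
                   {"path": "/documents/b2.xml", "rank": 640}],
    }
    return canned[node]


results = search_cluster(["node-a", "node-b"], "legal", "Jason Giambi",
                         fake_query_node)
ranks = [hit["rank"] for hit in results]
```

Because each node only holds part of the collection, the merge is what gives the client a single ranked list; a real manager might also query the nodes in parallel and cap the number of hits returned.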
Search Query Format
The swishd nodes can house several indexes, which can be grouped into
several "collections". For example, there can be one document collection
for sports and another for legal documents. You may want to search for
the phrase "Jason Giambi" and get news about his legal cases, but not
necessarily news about games he has played. To do this, you
would specify the collection for your legal documents in the search
query.
The client sends the original query in XML format. An example of the format is as follows:
sports
legal
Jason Giambi
This
would instruct the swishd nodes to search both the legal and the sports
collections for any documents containing the query phrase.
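As a rough sketch of how a client might assemble such a query, the snippet below builds an XML document with one element per collection and one for the phrase. The element names (`search`, `collection`, `query`) are placeholders chosen for illustration and may not match the schema swishd actually expects; check the swishd source before relying on them.

```python
import xml.etree.ElementTree as ET


def build_query(collections, phrase):
    """Build a query document naming the collections and the search phrase.

    The element names used here are hypothetical placeholders, not the
    confirmed swishd query schema.
    """
    root = ET.Element("search")
    for name in collections:
        ET.SubElement(root, "collection").text = name
    ET.SubElement(root, "query").text = phrase
    return ET.tostring(root, encoding="unicode")


xml_text = build_query(["sports", "legal"], "Jason Giambi")
```

The resulting string is what a client would write to its TCP connection to cluster_mgr (default port 5500).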
Results Format
Cluster_mgr will return the final results to the client in XML format. An example of the format is as follows:
/documents/LEGAL/7b0000003fbcda.xml
1000
92003
(null)
/index/legal_1.idx
2004-05-06 00:42:01 EDT
1
59971
- Path: The absolute path to the document.
- Size: Size in bytes of the document.
- Title: Title of the document (if applicable).
- Index: The index that contained the information about the document.
- Modified: The time stamp (mtime) of when the document was last modified.
- Record: Not used.
- File: Not used.
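On the client side, the fields listed above could be held in a small record type like the one below. This is only a sketch under assumptions: the `SearchHit` class and its helper functions are hypothetical, the value `1000` in the example is assumed to be the rank, and `(null)` is assumed to mean that the document has no title. The zone abbreviation is kept as a plain string rather than converted, since abbreviations such as EDT are not parsed portably.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SearchHit:
    path: str              # absolute path to the document
    rank: int              # assumption: the unlabeled value 1000 is the rank
    size: int              # size of the document in bytes
    title: Optional[str]   # None when the server reports "(null)" (assumption)
    index: str             # index that contained the document
    modified: datetime     # mtime, without time zone information
    modified_zone: str     # zone abbreviation as sent, e.g. "EDT"


def parse_modified(stamp):
    """Split 'YYYY-MM-DD HH:MM:SS ZONE' into a naive datetime and the zone."""
    date_part, time_part, zone = stamp.split()
    when = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    return when, zone


def make_hit(path, rank, size, title, index, modified):
    """Convert the raw string values of one result into a SearchHit."""
    when, zone = parse_modified(modified)
    return SearchHit(
        path=path,
        rank=int(rank),
        size=int(size),
        title=None if title == "(null)" else title,
        index=index,
        modified=when,
        modified_zone=zone,
    )


hit = make_hit("/documents/LEGAL/7b0000003fbcda.xml", "1000", "92003",
               "(null)", "/index/legal_1.idx", "2004-05-06 00:42:01 EDT")
```

The Record and File fields are left out because they are not used.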