Ok, so you've got Seeker installed, and you're itchin' to get it working. If you remember, we ignored the "demos" subfolder and the "docs" subfolder.
The docs subfolder contains two files: one with basic information on what's going on with Seeker, and one with basic information on how to install it. The install documentation is a "bit light", so hopefully you followed my instructions and you should be fine.
Here is how to understand the demos subfolder. Make sure you place the demos subfolder somewhere that is accessible via a URL on the server where Seeker was installed. For me that URL is http://dev01/temp/seeker/demos/. I'll be referencing everything off this URL.
The first file you want to hit is lucene_test.cfm. This file should spit something like this out to you: "Let's see if we can run Lucene, and what version we have... package org.apache.lucene, Lucene Search Engine: core, version 2.3.2". This means you have Seeker installed properly and that ColdFusion can interact with the JAR file properly. So far so good. If you hit an error, go over the install instructions again.
Next, let's test Lucene's indexing of static text and HTML files. Inside the demos folder, you'll note two subfolders: "files" and "filesIndex". The files folder is where the text files to be indexed are stored, and filesIndex is where Lucene creates its own index files for the files folder. If you look at the code Ray put together, you can see this is all customizable, and of course this is just for testing purposes at this point.
So let's do that indexing of the files. Run file_test.cfm. If this is the first time, you should see something similar to this at the top: "Doing index of D:\wwwroot\TEMP\seeker\demos\files to D:\wwwroot\TEMP\seeker\demos\filesindex". This means that Lucene has gone through and created the index files inside the filesIndex folder. If you need to reindex for any reason, simply add ?reindex to the URL (file_test.cfm?reindex) and it will force a reindex. You should also see a text input box that lets you search against the newly created index. Give it a whirl!
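If the "index on first run, or when ?reindex is passed" behavior feels a bit magical, here's a rough Python sketch of the same idea. To be clear, this is not Ray's code and it is not how Lucene stores its index on disk. The names build_index and ensure_index, and the index.json file, are made up purely for illustration, and the sketch does not strip HTML tags.

```python
import json
import re
from pathlib import Path

# Hypothetical file name; a real Lucene index uses its own segment files.
INDEX_FILE = "index.json"


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files_dir, index_dir):
    """Scan every text/HTML file and write a word -> [file names] map."""
    index = {}
    for path in sorted(Path(files_dir).iterdir()):
        if path.suffix.lower() not in {".txt", ".html", ".htm"}:
            continue
        for word in set(tokenize(path.read_text(encoding="utf-8"))):
            index.setdefault(word, []).append(path.name)
    Path(index_dir).mkdir(parents=True, exist_ok=True)
    (Path(index_dir) / INDEX_FILE).write_text(json.dumps(index), encoding="utf-8")
    return index


def ensure_index(files_dir, index_dir, reindex=False):
    """Build the index on first run or when reindex is forced; else load it.

    Returns a (index, rebuilt) tuple.
    """
    index_path = Path(index_dir) / INDEX_FILE
    if reindex or not index_path.exists():
        return build_index(files_dir, index_dir), True
    return json.loads(index_path.read_text(encoding="utf-8")), False
```

Calling ensure_index("files", "filesIndex") twice only scans the files the first time. Passing reindex=True plays the role of ?reindex in the URL, which is why new files don't show up in a search until you force it.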
If you want to test further, you can add words, or files, to the files folder, and force a reindex, and do a search again to see if Lucene picked it up. Feel free to try partials like "tes*" without the quotes.
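For the curious, here's a small Python sketch of why a partial like tes* is cheap to answer once you have an index. It assumes the index is just a map of words to the files containing them, and the toy_index contents are invented for this example. Lucene's real query parser and term dictionary are far more involved, so treat this as the general idea only.

```python
from bisect import bisect_left


def prefix_search(index, query):
    """Return the sorted file names matching a term or a trailing-* prefix.

    `index` maps each lowercase word to the list of files that contain it.
    """
    query = query.strip().lower()
    if not query.endswith("*"):
        return sorted(index.get(query, []))

    prefix = query[:-1]
    terms = sorted(index)               # sorted list of all indexed words
    start = bisect_left(terms, prefix)  # position of the first word >= prefix
    hits = set()
    for term in terms[start:]:
        if not term.startswith(prefix):
            break                       # we are past the run of matching words
        hits.update(index[term])
    return sorted(hits)


# Hypothetical index contents, for illustration only.
toy_index = {
    "test": ["a.txt", "b.txt"],
    "testing": ["c.txt"],
    "text": ["d.txt"],
    "lucene": ["a.txt"],
}
```

With that toy index, prefix_search(toy_index, "tes*") picks up the files for both "test" and "testing" but skips "text", while prefix_search(toy_index, "test") only matches the exact word.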
Next, let's take a look at indexing database entries. New ColdFusion installs come with a datasource called "cfartgallery", and that is what Ray uses for his example, but it should be pretty easy to change that and point the example at your own datasource. So browse to query_test.cfm. You should see a message similar to: "Doing index of db query to D:\wwwroot\TEMP\seeker\demos\dbindex". The query "select * from art" runs, and the results are stored in Lucene format in the "dbindex" subfolder. Again, you should also see a text input box that lets you search against the newly created index. Give it a whirl!
You can also force the same reindex after adding or editing records in the DB. And the same goes for partial searches.
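Here's a rough Python sketch of what "indexing a query" boils down to: every row becomes a little document, and each word in its text columns points back at the row's key. This is my own illustration, not Seeker's code. It uses an in-memory SQLite table instead of the cfartgallery datasource, and the column names (artid, artname, description) and sample rows are assumptions made up for the example.

```python
import re
import sqlite3


def index_query(conn, sql, key_column):
    """Run `sql` and map each word found in any text column to row keys."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    key_pos = columns.index(key_column)
    index = {}
    for row in cursor:
        key = row[key_pos]
        for value in row:
            if not isinstance(value, str):
                continue  # only text columns are tokenized in this sketch
            for word in re.findall(r"[a-z0-9]+", value.lower()):
                index.setdefault(word, set()).add(key)
    return index


def make_sample_db():
    """Create a tiny stand-in for the art table (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table art (artid integer primary key, artname text, description text)"
    )
    conn.executemany(
        "insert into art values (?, ?, ?)",
        [
            (1, "Sunset", "Oil on canvas"),
            (2, "Harbor", "Watercolor of a harbor at sunset"),
        ],
    )
    return conn
```

Running index_query(make_sample_db(), "select * from art", "artid") gives you a map where "sunset" points at both rows and "oil" only at the first. Since the index is a snapshot of the query results, a row added afterwards won't be searchable until you run the indexing again, which is the same reason you need the forced reindex after editing records.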
Next, I'll probably run a larger test and compare with Verity, and also look at the inside of Seeker to see how it's functioning. In the meantime, enjoy!


Jul 2, 2008 at 4:31 PM I got Lucene and Seeker installed no problem - thanks for the easy instructions.
One thing I think is missing is a context of the search result. I quickly mocked this up by using cffile to open each file in the search result and get the context manually, then add it to the result query, but that seems to be an inefficient way to do it. Isn't the context stored in the search result somewhere?
Spidering is the other thing that I'd like to see, so I took a look at Nutch, and my eyes quickly glazed over when I read about cygwin...
Overall, the engine seems fast and easy to use. I look forward to more updates.
Jul 2, 2008 at 5:00 PM @Nathan,
I will be looking a bit later at some of the advanced settings of Lucene. Some of the fuzzy searching in Ray's code doesn't seem to be working like it did for me pre-Seeker. Stay tuned! And yeah, spidering always gets crazy.
Jul 2, 2008 at 7:47 PM Sami, thanks for giving Seeker a good run through. Please be sure to email me your findings. I'm following your blog, but I want to be sure I don't miss anything.
Context isn't supported by Lucene... I think. But I keep getting surprised by what is in there.
Jul 2, 2008 at 8:00 PM @Ray,
Will do. I have some of my own custom code which I'll compare with Seeker. More to come!
Jul 6, 2008 at 7:09 AM You can do context pretty easily with Lucene (see http://swem.wm.edu/beta/flathat/?q=classes). That sample uses a SimpleHTMLFormatter and a Highlighter to generate the spans around the word. They've moved some things around, but check out the jars in the contrib folder for a lot of this additional functionality.
One of the other really cool things is that you can generate a spelling index, not of the words in a dictionary, but in your target pages (which significantly improves finding 'things').
Jul 6, 2008 at 2:15 PM Thanks Wayne. Will definitely be looking at that.