Sep 9
Previously, in Part 2 of this series, I blogged about some difficulties in working with Solr. I am following up with some more lessons learned.
This one deals with wildcards. If you look on page 359 of the 2nd WACK book, it states: "A search for ?ar?et would find both Carpet and Target, but not Learjet."
Thanks to Ray Camden for confirming that this actually does NOT work. I believe he said it would go in the errata. Why?
Well, a search term that starts with a wildcard, either * (star) or ? (question mark), will fail with Solr. You will get this nice error: "Error executing query : orgapachelucenequeryParserParseException_Cannot_parse_XXX__or__not_allowed_as_first_character_in_WildcardQuery".
So although it would be nice to search for "*ing", the Solr folks consider it impractical, since a leading wildcard forces Lucene to scan every term in the index. Is there a way around this? Well, theoretically yes.
Let's say that in column1 you want to search for ?ar?et, just like in the example. Do the following:
- When you build your SQL query, add a column that applies REVERSE. For example: SELECT column1, column2, REVERSE(column1) AS reverseColumn1 …
- Index the results, including reverseColumn1.
- Then, when searching, reverse the term if it starts with a wildcard, and search the reversed column with it. In this example: "te?ra?".
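The term-rewriting step above can be sketched as follows. This is a minimal illustration in Python rather than CFML. It assumes the reversed column is indexed in lowercase. fnmatchcase only stands in for Solr's wildcard matching, and the names prepare_term and matches are hypothetical.

```python
from fnmatch import fnmatchcase

WILDCARDS = ("*", "?")


def prepare_term(term: str) -> tuple[str, bool]:
    """Return (term_to_search, search_reversed_column).

    A term that starts with a wildcard is reversed so the wildcard moves
    to the end; it must then be matched against the reversed column.
    """
    if not term.startswith(WILDCARDS):
        return term, False
    reversed_term = term[::-1]
    if reversed_term.startswith(WILDCARDS):
        # Wildcards at both ends (e.g. "*ing*"): reversing does not help.
        raise ValueError(f"cannot rewrite {term!r}: wildcard at both ends")
    return reversed_term, True


def matches(term: str, value: str) -> bool:
    """Simulate a lowercased index; fnmatchcase stands in for Solr here."""
    search, use_reversed = prepare_term(term.lower())
    indexed = value.lower()[::-1] if use_reversed else value.lower()
    return fnmatchcase(indexed, search)


print(prepare_term("?ar?et"))  # ('te?ra?', True)
for word in ("Carpet", "Target", "Learjet"):
    print(word, matches("?ar?et", word))
```

With this approach, "Carpet" and "Target" match and "Learjet" does not, which is the behavior the book describes. A term such as "*ing" becomes "gni*", which no longer starts with a wildcard.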
Sep 9
Previously, in Part 1 of this series, I blogged about some difficulties in working with Solr. I am following up with some more lessons learned.
- In order to index with more than one category, Adobe suggested that instead of category="column1,column2" (which places the literal value "column1,column2" in the category instead of the respective column values), I try category="#queryName.column1#,#queryName.column2#". When I did this, it did substitute the values, but only those of the first record, so every record in the index got the first record's categories. My hack?
- Run the query as usual.
- Run a query of queries: SELECT *, column1 + ',' + column2 AS indexCategory FROM queryName.
- Instead of category="column1,column2", use category="indexCategory". This will put the appropriate comma-delimited category in place.
- Basically, since cfindex works correctly with a single category column, I use a query of queries to create a special column containing the values I want.
- Escape special characters. One of the categories I was using was a lookup of state codes, so it has values like CA, VA, NY, and OR. Notice something odd? That's right: Solr didn't like 'OR', since it is a reserved operator word. You can find the operators and special characters, plus a note on escaping them, in the Lucene query parser syntax docs (which Solr uses) @ http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters. NOTE: This escaping was needed only when searching with cfsearch, not when indexing. Maybe someone should make a UDF for this, but my simple fix was: replace(VALUE,'OR','\OR'). Note the backslash.
- More to come…
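A UDF for the escaping described above could escape every Lucene special character and prefix a backslash to any value that is exactly a reserved operator word. Here is a rough sketch of that logic in Python. The function name escape_solr_value is hypothetical, and the character set is taken from the Lucene query parser syntax page linked above.

```python
# Special characters listed on the Lucene query parser syntax page.
# "&&" and "||" are escaped character by character, which is harmless.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')
# Operators are case-sensitive, so only the uppercase forms are reserved.
RESERVED_WORDS = {"AND", "OR", "NOT"}


def escape_solr_value(value: str) -> str:
    """Escape a single category or keyword value for a cfsearch query."""
    escaped = "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)
    if escaped in RESERVED_WORDS:
        # Escaping the first letter makes the parser read it as a plain term.
        escaped = "\\" + escaped
    return escaped


states = ["CA", "VA", "NY", "OR"]
print(" OR ".join(escape_solr_value(s) for s in states))
# CA OR VA OR NY OR \OR
```

Checking for an exact match is a deliberate choice. It rewrites only the values the parser would otherwise read as operators and leaves ordinary codes such as CA untouched.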
Sep 7
More than anything, this series of posts is a collection of notes and tidbits I've learned as we move our large Verity collection over to Solr. These notes apply to CF 9.0.1.
- Rule #1 – Adobe docs suck. Be creative in your searches. I found answers to my questions in the following places:
- Blogs: including but not limited to Ray Camden's blog and various Adobe Engineer blogs.
- Adobe Press Releases, Release Notes, and Change Logs – details such as whether a feature is enabled have been hidden away in these.
- Apache Solr's docs
- Google – Sorry, I mean Scroogle.org.
- Rule #2 – Tune your Solr install. Just as you tune your CF instance, modify the solr.lax file under the Solr install root directory. Look for two lines:
- lax.nl.java.option.additional – this line contains the JVM arguments. We raised the maximum memory from 256 MB to 1024 MB.
- lax.nl.current.vm – we pointed this to the latest \bin\javaw.exe under a 64-bit JDK. 64-bit Solr? You bet!
- Rule #3 – Increase Buffer Size – In CF Administrator, go to Solr Server -> Show Advanced Settings. Change Solr Buffer Limit from 40 to 80. For the reasons behind this, use Scroogle.
- Rule #4 – Default Operator – When we used Verity, searching for 'fire water' would in effect search for 'fire and water'. With Solr, 'fire water' searches for 'fire OR water'. If you need to change the default operator between words in keyword searches, don't despair. Go to the root directory where your Solr data is located and open \conf\schema.xml. Around line 528 you should see <solrQueryParser defaultOperator="OR"/>; change it to <solrQueryParser defaultOperator="AND"/> if need be.
- Rule #5 – Support for Categories Seems Broken (as of 9/7/2010) – I am seeking more data on this. Let's take an example: if you index a query with columns keyA, columnB, and columnC and set category="columnB" in your cfindex, it works OK. But if you set category="columnB,columnC", it takes the literal value inside the quotes, without substituting the column values, and sets it as the category!
- Rule #6 – Support for Categories Sucks – Whoa, again? Yes, this time when searching. Let's say you indexed with columnB above, which can have two values: valueA and valueB. Per the docs, the category attribute of cfsearch takes a comma-delimited list of categories. Wrong! After much trial and error, I figured out that for a single category you can use category="valueA", but for multiple categories you have to use search operators instead of commas. So to match either category, use "valueA OR valueB"; to require both, use AND.
- Rule #7 – Operators are CASE-SENSITIVE! Be warned. We used to allow users to enter keywords such as 'fire and water'. Now they must enter 'fire AND water'; a lowercase 'and' is not treated as an operator! I had to build a custom UDF to handle this, as part of a larger "solrClean" UDF (a counterpart to the famous verityClean UDF). I will release this code soon. This is NOT user-friendly at all.
- Rule #8 – Custom fields are broken. Oh wait, they are not! With Verity, your cfsearch criteria might include " and CF_CUSTOM2 <MATCHES> xyz". With Solr, you must rewrite this as " AND custom2:xyz". Note the dropping of "CF_".
- Rule #9 – Don't use custom fields in your search criteria, as in Rule #8, when you also return suggestions. The suggestion will read "Did you mean: custom" instead of custom2. Ugh.
- To be continued…
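The solrClean UDF itself is not shown here. As a rough, hypothetical sketch in Python rather than CFML, here are two pieces such a helper might contain. One uppercases bare and/or/not so Solr treats them as operators (Rule #7), leaving quoted phrases alone. The other turns a comma-delimited category list into the OR expression that cfsearch needed (Rule #6). The names normalize_operators and category_expression are made up for illustration.

```python
import re

OPERATORS = {"and", "or", "not"}
# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def normalize_operators(keywords: str) -> str:
    """Uppercase bare and/or/not so Solr treats them as operators (Rule #7).

    Quoted phrases are kept intact; runs of whitespace collapse to one space.
    """
    tokens = TOKEN_RE.findall(keywords)
    return " ".join(t.upper() if t.lower() in OPERATORS else t for t in tokens)


def category_expression(categories: str) -> str:
    """Turn "valueA,valueB" into "valueA OR valueB" for cfsearch (Rule #6)."""
    parts = [c.strip() for c in categories.split(",") if c.strip()]
    return " OR ".join(parts)


print(normalize_operators("fire and water"))  # fire AND water
print(normalize_operators('"salt and pepper" or water'))
print(category_expression("valueA, valueB"))  # valueA OR valueB
```

Treating quoted phrases as single tokens is a judgment call. Inside a phrase, "and" is part of the text being matched, not an operator.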
Sep 2
Does anyone else have this annoyance? Whenever I open a large file in CFBuilder (built on Eclipse), I see a little notice at the bottom saying "Refresh Content", and it tends to freeze my IDE. It can occur several times per minute. I've searched and found no solution.