Browse Tag


Indexing files in Solr using Tika

After installing the latest Solr release I noticed that the schema.xml file (the XML-file that holds the information about how Solr should index and search your data) already was setup for something called Apache Tika that can be used to index all sorts of documents and other files that holds all kinds of metadata (audio-files for instance).

See this post to learn how to install Solr on Windows: Installing Apache Solr on Windows

The great thing about Tika is that you don’t need to do anything in order to make it work – Tika does everything for you – or almost everything. You can also set it up so that it suits your needs just by adding a config-XML file in the Solr directory and a reference in the Solr-config XML-file. I tried to get it to work, but found that quit difficult also because there wasn’t much help to get out on Google. Because of all my problems getting it to work I created this post on StackOverflow. Before I got a reply on the subject though, I found the solution myself burried deep inside some forum posts about SolrNet.

This how you do in order to get it to work inside Solr using SolrNet as client.

Install Solr using the link from my earlier post above or something similiar – the main thing is that you install it using the release from Solr that is already build.
Create a new folder called “lib” inside your Solr-install folder.
Copy the apache-solr-cell-3.4.0.jar file from the “dist”-folder from the Solr zip-file to the newly created “lib”-folder the folder where you installed Solr.
Copy the content of contrib\extraction\lib from the Solr-zip to the same newly created “lib”-folder the folder where you installed Solr.
Now Tika is installed in Solr! Remember to go to http://localhost:8080/solr and confirm that it is installed correctly.

To use it in a .NET client application you can use the newest release of SolrNet (currently the beta version release) and add the DLLs to your .NET project (all of them – seriously!). This is an example of how to use it in C#:

The response will hold all the metadata that has been extracted from the file using Apache Tika. The ExtractParameters is given a FileStream object and an ID for the Solr index (here just “doc1” – can be anything as long as it is unique). The ExtractOnly property can be set to true if you don’t wan’t to get Tika to index the data, but only wan’t it to extract the metadata from the file that is sent. The file is streamed to the Solr API using HTTP POST. You can read more about that here:

In the above code the data sent to Solr is indexed in the last line, where the data is committed to Solr. If you would like Solr to index and commit the files when sent to the service you can set the AutoCommit property to true inside the initiation of ExtractParameters:

Because the commit is done everytime you send a new file to the Solr-API you can search during the indexing and, of course, you don’t need to call the solr.Commit() method after indexing.

You need a request handler inside your solrconfig.xml (inside {your-solr-install-path}/conf) to make Solr understand the request from the client. Below is an example of how the solrconfig.xml looks when you haven’t changed anything after install of Solr. See this for further information about configuring Tika inside Solr:

Your Solr schema.xml file (inside {your-solr-install-path}/conf) needs some fields in order to index the metadata from the files you send to Solr. You can provide the fields you need and index/store/ the metadata as is required for the files you need to index. This is the fields that Solr is installed with:

The fields above is a fits-all scenario, so with this you can both index audio and document files.

Supported formats in Tika can be found here:

Share this blog post:

CSV parser

I’m currently working on a project where I needed a C# console application that was able to read through a Excel CSV (Comma Separated Values) file.

Basically the CSV file format is just a txt file with rows and each column is then separated by a comma (surprise!) or a semicolon. Besides a comma the data in each column can optionally be ”framed” by quotation marks.

Therefore i started out with the following code, just as I would read through a normal txt file:

This is, as you can see, really straight forward. First of all I declare an object of a StreamReader in a using statement. Using the object ”readFile” I am able then to navigate the file. The using statement is important as this will do the cleanup for me, by calling StreamReader.Dispose(), when the statement finishes. I always wrap this kind of code in a try…catch because when you work with files, errors just occasionally happen.

Now, to read the data from the CSV file I add the following lines of code inside the using statement:

It just declares a new List that can hold an array of strings and the line and row variables is needed when traversing through the file. I then use the readFile object to call the ReadLine() method of the StreamReader class in a while loop. When there is no more lines in the file the line variable will be null. Inside the while loop I use the string.Split() method to split the line into an array of strings (my columns) and I then add this array to my List object (parsedData).

The problem then was that I didn’t know exactly what encoding the file would be in. What to do then? I settled on a solution where I tell the StreamReader what encoding the file probably has and it will then open it in that encoding. This can be done by adding a parameter when calling the constructor on the StreamReader class like this:
using (StreamReader readFile = new StreamReader(path, encoding))

Finally all this can be wrapped in a nice method. I also added a check to be sure that the file I want to parse is actually available. But there you go:

Share this blog post:

WCF-services in a Windows service

Some time ago I ran into one of those problems with no obvious solution. I was on a project where I needed to use a WCF-service for a Silverlight solution. First I started out by making a service that was hosted on the IIS. That was working fine with a connection to a database, but the service was also going to open up some physical files in a specific folder on the Windows server where it was hosted. This could not be done as the service didn’t have access permissions to a folder outside of those under the service. So what to do then?

I searched for a solution to the problem and one that I found was to use Windows impersonation in the service. This “simulates” a user logged in on the server with the rights given to that user. For me this wasn’t an optimal solution for a number of reasons, first of all because it didn’t seem very secure. I quickly started to search for another way to cope with the problem.

The solution I came up with was this: I realized that as I had administrator rights to the server I could host my WCF-services in a Windows service and install it as such. In this way you can run the WCF-service outside of the IIS and run multiple services in the same Windows service as well. Another great thing about it is that if you install it right (as I will show below) you can get it to have access to the file system and it will run as a service under Windows. The example below shows you how the constructor in a Windows service can look like:

What happens here is that I provide the name the service has (in Windows this will be the name of the service) and if it should use the application log under Windows to log in. I also tell it that it should log automatically to the application event log when something happens with the service and I tell it that it’s okay to stop, shutdown and pause and continue among others.

The code example below shows the Main-method of the service. As with any other Windows application this has to be provided as the starting point of the application.

Here I make the service run. Remember the try-catch block because if something bad happens when the service is initialized this will not make everything crash. Also notice the inheritance from the ServiceBase-class. This is what makes our class a service and is needed when we’re going to install it later also and make it run. To make the service do something in certain situations before Windows service events is fired, when it starts, stops, continues, pauses or shuts down (if the server it resides under for instance shuts down), you can override the OnStart, OnStop, OnContinue, OnPause or OnShutdown methods respectively. Up until know I haven’t shown how I combine the Windows and WCF-services, but my next code examples shows just that. What you need first of all is to make your WCF-services start when the Windows service starts. This is done by overriding the OnStart method as mentioned above and then inside this hosting the WCF-services in some service hosts and open them. I learned that a good way to be able to control every service host with the WCF-service inside is to have it declared as variables inside the class. An example on this is provided below.

In my OnStart method I do the following:

First of all for every WCF-service I find out whether it has been initialized before. If it has, I close it so I don’t get an exception thrown when I try to open a service that is already open. Afterwards I declare the service host given the type of WCF-service that it should open and open it. As with everything else I pack it all inside a try-catch block to prevent the Windows service from crashing.

Finally you need to provide the normal configuration settings for your WCF-services inside an App.config file in the Windows service class library. The App.config file can look like the one below:

There you have it! The service works fine like this and if you have provided the correct base address the service can be called from there (remember that what you provide as address in the baseAddress property as the full address for the service – so when you call it you don’t need the .svc prefix). Tip: You can provide multiple endpoints if you need multiple bindings.

The service worked great until I started to test it with the Silverlight application and realized that the service, of course, needed a clientaccesspolicy.xml file to have the right access rights for the WCF-service, but where do you put this when you don’t have your service on the IIS and you can’t just put the access policy file in the root directory? The solution for this is to understand how the Silverlight application asks for the clientaccesspolicy.xml file when calling the WCF-service. When a Silverlight application calls the service in another domain it automatically assumes that the clientaccesspolicy file is located in the root of the domain where the service resides (the base address provided in the configuration file without the service name). If it isn’t there it gives you the standard 404 error (”File not found” or something like that) when you try to call the service from the application. So if you can ”broadcast” a clientaccesspolicy.xml file on the root address this is the solution, but how to do that? The solution is to add a new WCF-service that, by using the HTTP GET-protocol stream the xml-file as a message. The interface for the new service should look like below. The UriTemplate-property of the WebGetAttribute tells what the URI should be. In our example it is the name of the clientaccesspolicy file.

Then, in the implementation of the service interface, you just open and read the clientaccesspolicy.xml file as a stream, load it into a StringReader, add this to a XmlReader, and make a new instance System.ServiceModel.Channels.Message, add your Xmlreader object to the message and return it.

There you have it! When you now call your service from a Silverlight application, you will see that it gets the clientaccesspolicy.xml from the service.

After my service is done I can install it using a Windows installer or a basic console application.

Share this blog post:

Installing a Windows service

To install a service you need to use the ServiceProcessInstaller and the ServiceInstaller classes. These handle the installation of the Windows service as a process with the provided information. The Account-property on the ServiceProcessInstaller is used to tell what security context the service should run under when installed. I use LocalSystem here because I need my service to have access to my file system, but here you have to choose what best fits your situation of course. The username and password properties are used to tell what user the service should run as. Next for the properties of the ServiceInstaller you can specify how and when your service is started with the StartType-property and you can set the service name, display name, a description and then you need to assign the ServiceProcessInstaller instance as parent to the ServiceInstaller instance. Then you need to specify the context in which the service should be installed: the full path to the executable that holds the service as a commandline and a path to a log file if needed. Then you install it by calling the Install-method on the ServiceInstaller instance. Then, because we need to start up the service, we takes control of the installed service with the ServiceController-class and call the Start-method. That’s it! Your service is installed 🙂


If you need to uninstall your service to update it for instance (that’s right, the service executable cannot be altered as long as the service is installed), you can use the below code to do just that. It should be quite straight forward. You just take control of the service as in the code above, figure out if the service is running, if it’s running then it has to be stopped before we can uninstall it, wait for it to stop and then gives the need context-information and then we uninstall by calling Uninstall on our ServiceInstaller object.

Share this blog post: