Integrating NER with Solr.

What is Solr?

Solr is a search platform built on Lucene and supported by the Apache Software Foundation.
It is used mainly to provide very fast searches by indexing the content that needs to be searchable.
Solr's structure is highly modular, and its functionality can be extended by adding Solr plugins.
We used such a plugin for our problem, in which we had to identify person names (this is where NER comes in) in PDF documents.

What is NER?

Named Entity Recognition (NER) labels sequences of words in a text that are the names of things, such as person and company names, or gene and protein names. In our case we were interested only in finding the person names in a given text.
So we looked around for NER implementations that had already been created.
We discovered that the Stanford Natural Language Processing Group has developed a Java-based NLP library that includes a Named Entity Recognizer.

Our Approach:

We now had a clear picture of what was required to solve our problem:
– The Stanford NER library
– A way to get text out of the PDFs
– A Solr plugin that would allow us to process our NER request and give us the person names from the text.
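
To make the division of work concrete, here is a minimal sketch of how the three pieces fit together. It is not our actual plugin code: the class name PersonNamePipeline and both function parameters are hypothetical stand-ins, with the text extractor playing the role of the PDF-to-text step and the person finder playing the role of the NER library.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

// Hypothetical wiring of the three pieces: text extraction, NER,
// and one entry point that a search plugin could call.
class PersonNamePipeline {
    // Stand-in for the PDF-to-text step.
    private final Function<byte[], String> textExtractor;
    // Stand-in for the NER step that returns person names.
    private final Function<String, List<String>> personFinder;

    PersonNamePipeline(Function<byte[], String> textExtractor,
                       Function<String, List<String>> personFinder) {
        this.textExtractor = textExtractor;
        this.personFinder = personFinder;
    }

    // Extracts the text first, then runs name recognition on it.
    List<String> personsIn(byte[] document) {
        String text = textExtractor.apply(document);
        if (text == null || text.isBlank()) {
            return List.of(); // nothing to recognize
        }
        return personFinder.apply(text);
    }

    public static void main(String[] args) {
        // Toy stand-ins so the sketch runs without any external library.
        PersonNamePipeline pipeline = new PersonNamePipeline(
                bytes -> new String(bytes, StandardCharsets.UTF_8),
                text -> text.contains("Ada Lovelace")
                        ? List.of("Ada Lovelace")
                        : List.<String>of());
        byte[] doc = "Notes by Ada Lovelace".getBytes(StandardCharsets.UTF_8);
        System.out.println(pipeline.personsIn(doc));
    }
}
```

Keeping the two steps behind plain function interfaces means either one can be swapped for a real library call without touching the other.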

The Stanford NER library is freely available and can be downloaded from here:
http://nlp.stanford.edu/software/CRF-NER.shtml#Download

You can play around with it to get to know what it can do.
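
One output style you are likely to run into while experimenting is inline tagging, where each recognized entity is wrapped in a tag named after its type, for example <PERSON>...</PERSON>. The sketch below assumes the text has already been tagged that way and pulls out only the person names using plain Java. The class name PersonTagExtractor and the sample sentence are ours, made up for illustration, and the sample is hand-written rather than real classifier output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects PERSON entities from text that an NER
// tagger has already annotated with inline <TYPE>...</TYPE> tags.
class PersonTagExtractor {

    // Non-greedy match of everything between <PERSON> and </PERSON>.
    private static final Pattern PERSON =
            Pattern.compile("<PERSON>(.*?)</PERSON>", Pattern.DOTALL);

    static List<String> extractPersons(String taggedText) {
        List<String> names = new ArrayList<>();
        Matcher m = PERSON.matcher(taggedText);
        while (m.find()) {
            // Collapse line breaks and repeated spaces inside a name.
            names.add(m.group(1).trim().replaceAll("\\s+", " "));
        }
        return names;
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed inline-tag format.
        String tagged = "<PERSON>Ada Lovelace</PERSON> wrote to "
                + "<PERSON>Charles Babbage</PERSON> from "
                + "<LOCATION>London</LOCATION>.";
        System.out.println(extractPersons(tagged));
        // prints [Ada Lovelace, Charles Babbage]
    }
}
```

Other entity types, such as the LOCATION tag in the sample, are simply ignored, which matches our goal of keeping only person names.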

Getting text from the PDFs seemed pretty straightforward as well. Another piece of software (also provided by the Apache Software Foundation), Tika, does exactly the job we were looking for. We simply feed our PDFs to Tika and it returns their text content.
Next we needed a Solr plugin to do the name recognition. To build one, we headed over to this article:

http://www.searchbox.com/developing-a-solr-plugin/

This will give you a pretty good idea of what you need to do to create a Solr plugin.
They even had their own implementation of NER already built for integration with Solr.

This can be found here:
http://www.searchbox.com/named-entity-recognition-ner-in-solr/

We drew on their work to build a similar plugin of our own that returns the person names from a text document.

The Searchbox package used Maven, so it handled all the dependencies for us. We built the source, placed the generated JAR file in the bin folder of the required Solr collection, and updated the Solr config file to enable NER with Solr (this part is covered in the article).

So this is how we achieved NER using Solr and solved our problem of finding person names in a document.
