OpenNLP ships with several pre-trained NER models, but entity extraction doesn't end with them. We may need to find entities specific to a domain such as clinical, biological, sports or banking.
So should we restrict ourselves to the models already provided? No.
We can build our own Name Finder model. The steps are: get a sample training dataset, build the model, and test it.
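What kind of data do we need to train a model?
Each sentence goes on its own line, separated by a newline character (\n), with its tokens separated by spaces. Entities are marked with <START:type> and <END> tags (the :type part is optional), and each tag must be separated from the neighbouring words by a space, for example: <START:drug> Paracetamol <END> is used to treat fever .
You can refer to a sample dataset for an example. For good results, the training data should contain at least 15,000 sentences.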
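Because every tag must stand alone as a space-separated token, a single malformed line is easy to miss in a file of 15,000 sentences. Below is a minimal sketch of a line checker, assuming one sentence per line and the <START> / <START:type> ... <END> convention described above. The class NameSampleLineChecker is hypothetical and not part of OpenNLP.

```java
import java.util.regex.Pattern;

public final class NameSampleLineChecker {
    // Matches an opening tag: <START> or <START:type>, e.g. <START:drug>.
    private static final Pattern START_TAG = Pattern.compile("<START(:[^:>\\s]+)?>");
    private static final String END_TAG = "<END>";

    private NameSampleLineChecker() {}

    /** Returns true if tags stand alone as tokens, are properly paired, and no entity is empty. */
    public static boolean isWellFormed(String line) {
        String trimmed = line.trim();
        // Blank lines are accepted: OpenNLP's reader treats them as document boundaries.
        if (trimmed.isEmpty()) {
            return true;
        }
        boolean insideEntity = false;
        int tokensInEntity = 0;
        for (String token : trimmed.split("\\s+")) {
            if (START_TAG.matcher(token).matches()) {
                if (insideEntity) {
                    return false; // nested <START> tags are not allowed
                }
                insideEntity = true;
                tokensInEntity = 0;
            } else if (token.equals(END_TAG)) {
                if (!insideEntity || tokensInEntity == 0) {
                    return false; // stray <END> or empty entity
                }
                insideEntity = false;
            } else if (token.contains("<START") || token.contains(END_TAG)) {
                return false; // tag glued to a word, e.g. "<START:drug>Paracetamol"
            } else if (insideEntity) {
                tokensInEntity++;
            }
        }
        return !insideEntity; // every <START> must be closed on the same line
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<START:drug> Paracetamol <END> is used to treat fever ."));
        System.out.println(isWellFormed("<START:drug>Paracetamol <END> is used to treat fever ."));
    }
}
```

Running main prints true for the first line and false for the second, where the missing space glues the tag to the word.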
A model can be trained either with the command-line tool or with the Java training API.
Command-line tool:
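The trainer tool, TokenNameFinderTrainer, takes several arguments. The required ones are -lang (the language code of the data), -model (the output model file) and -data (the training data file); -encoding (the character set of the data file) is optional.
Now let's say we want to build a model “en-ner-drugs.bin” from the data file “drugsDetails.txt”, which is in English.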
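To keep every example in Java, the command for this case is assembled below by a small helper so that each flag lines up with its value. The class TrainerCommand is purely illustrative; the flags are the TokenNameFinderTrainer arguments, UTF-8 is an assumed encoding for drugsDetails.txt, and running the printed line assumes the opennlp launcher script from the OpenNLP distribution is on your PATH.

```java
import java.util.List;

public final class TrainerCommand {
    private TrainerCommand() {}

    /** Builds the TokenNameFinderTrainer command line as a list (usable with ProcessBuilder). */
    public static List<String> build(String lang, String modelFile, String dataFile, String encoding) {
        return List.of(
                "opennlp", "TokenNameFinderTrainer",
                "-model", modelFile,   // output model file to write
                "-lang", lang,         // language code of the training data
                "-data", dataFile,     // annotated training sentences
                "-encoding", encoding  // character set of the data file (assumed UTF-8 here)
        );
    }

    public static void main(String[] args) {
        List<String> cmd = build("en", "en-ner-drugs.bin", "drugsDetails.txt", "UTF-8");
        // Print a shell-ready command line.
        System.out.println(String.join(" ", cmd));
    }
}
```

Running main prints the command to use:
opennlp TokenNameFinderTrainer -model en-ner-drugs.bin -lang en -data drugsDetails.txt -encoding UTF-8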
Now let's see how to train the same model using the Java API. The steps are:
- Open a sample data stream
- Call the NameFinderME.train method
- Save the TokenNameFinderModel to a file
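A minimal sketch of these three steps, assuming a recent OpenNLP release is on the classpath, the training file is UTF-8, and the default training parameters are acceptable. The class name DrugModelTrainer is hypothetical; the OpenNLP classes and the NameFinderME.train call are the library's own.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.TrainingParameters;

public final class DrugModelTrainer {
    private DrugModelTrainer() {}

    public static TokenNameFinderModel train(Path data, Path modelOut, TrainingParameters params)
            throws IOException {
        TokenNameFinderModel model;
        // Step 1: open the sample data as a stream of NameSample objects, one per line.
        try (ObjectStream<NameSample> samples = new NameSampleDataStream(
                new PlainTextByLineStream(
                        new MarkableFileInputStreamFactory(data.toFile()), StandardCharsets.UTF_8))) {
            // Step 2: train. A null type means no override: the types in the <START:type> tags are kept.
            model = NameFinderME.train("en", null, samples, params, new TokenNameFinderFactory());
        }
        // Step 3: save the TokenNameFinderModel so it can be loaded later.
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(modelOut))) {
            model.serialize(out);
        }
        return model;
    }

    public static void main(String[] args) throws IOException {
        train(Paths.get("drugsDetails.txt"), Paths.get("en-ner-drugs.bin"),
                TrainingParameters.defaultParams());
    }
}
```

Passing the TrainingParameters in, rather than hard-coding them, makes it easy to tune values such as the iteration count or feature cutoff without touching the training code.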
Either approach generates the “en-ner-drugs.bin” model.
Now you are all set to use this model to find entities, just like the other NER models.
For more details, see the OpenNLP documentation.
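A minimal sketch of loading and applying the trained model, assuming en-ner-drugs.bin is in the working directory and OpenNLP is on the classpath. The sentence is made up, and SimpleTokenizer is only one possible choice; ideally the input is tokenized the same way as the training data. The class FindDrugs is hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public final class FindDrugs {
    private FindDrugs() {}

    /** Returns "type: entity text" for every entity the finder detects in the tokens. */
    static List<String> findEntities(NameFinderME finder, String[] tokens) {
        List<String> found = new ArrayList<>();
        for (Span span : finder.find(tokens)) {
            // A span covers token indices [start, end).
            String text = String.join(" ", Arrays.copyOfRange(tokens, span.getStart(), span.getEnd()));
            found.add(span.getType() + ": " + text);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        TokenNameFinderModel model;
        try (InputStream in = new FileInputStream("en-ner-drugs.bin")) {
            model = new TokenNameFinderModel(in);
        }
        NameFinderME finder = new NameFinderME(model);

        // The name finder works on tokens, so tokenize the sentence first.
        String[] tokens = SimpleTokenizer.INSTANCE.tokenize("The patient was given paracetamol for the fever.");
        findEntities(finder, tokens).forEach(System.out::println);

        // Clear document-level adaptive data before processing the next document.
        finder.clearAdaptiveData();
    }
}
```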