Channel: SCN : All Content - SAP HANA Cloud Platform Developer Center

Who is famous on the SAP NetWeaver Cloud community?


Initially, I had named this blog post "Extract person names from unstructured data using Apache UIMA running on NetWeaver Cloud", but after seeing all the interesting titles on the NetWeaver Cloud Developer Center community blog, I decided to be cooler. :-)

 

My previous blog post covered how to deploy a UIMA annotator on NetWeaver Cloud as a REST service. Since then I have been playing around with various UIMA annotators that do some wonderful things extracting information from unstructured data. I started with ConceptMapper, which finds concepts in the source text by comparing it against a concept dictionary loaded in memory. It helps identify and enrich the concepts you are interested in. It has been widely used in the medical field to analyze medical records and patient histories, where medical terms and their properties are well documented (National Library of Medicine).
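To make the dictionary-lookup idea concrete, here is a minimal JavaScript sketch. It is not the ConceptMapper implementation, which does considerably more (tokenization, configurable matching strategies); it only shows the basic principle of scanning text against an in-memory dictionary. The dictionary entries and the `findConcepts` function are made up for illustration.

```javascript
// Hypothetical in-memory concept dictionary: term -> properties.
// The entries are illustrative and not taken from any real dictionary file.
const conceptDictionary = new Map([
  ["connectivity service", { category: "Connectivity" }],
  ["document service", { category: "Document" }],
  ["persistence service", { category: "Persistence" }],
]);

// Return every dictionary term found in the text, with its offsets and properties.
// Matching is a plain case-insensitive substring search.
function findConcepts(text, dictionary) {
  const lowerText = text.toLowerCase();
  const matches = [];
  for (const [term, properties] of dictionary) {
    let index = lowerText.indexOf(term);
    while (index !== -1) {
      matches.push({ term, begin: index, end: index + term.length, ...properties });
      index = lowerText.indexOf(term, index + term.length);
    }
  }
  // Sort by position so the results read in document order.
  return matches.sort((a, b) => a.begin - b.begin);
}
```

Calling `findConcepts(blogText, conceptDictionary)` returns one entry per hit, which is roughly the kind of annotation a dictionary-based annotator attaches to the text.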

 

I was thinking hard about an application for ConceptMapper running on NetWeaver Cloud that could demonstrate its strength. I started with the SAP NetWeaver Cloud glossary as the source of dictionary terms and played with some blog text to find the named entities. The idea was to classify blogs based on glossary terms: for example, find all the blogs related to cloud connectivity services, all the blogs related to document services, and so on. However, I was not very happy with the dictionary based on glossary terms. The results from ConceptMapper with this dictionary had many problems, so I abandoned this idea for a newer one: finding person names in text.

 

The NameFinder service is implemented as a REST service running on NetWeaver Cloud. You can try it at:

https://namefinders0007950666trial.nwtrial.ondemand.com/uima-simple-server-concept/?mode=form

 

Enter some sample text with person names in it. Here is some content taken from a wiki.

namefinder 1.JPG

Submit the text to the NameFinder annotator. You will see the result with <NameAnnotation> XML tags.

 

namefinder 2.JPG
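If you only need the names and not the full XML, they can be pulled out of the annotated text with a few lines of JavaScript. This is a rough sketch that assumes each detected name is wrapped as <NameAnnotation ...>Some Name</NameAnnotation>; the exact markup returned by the service may differ, and `extractNames` is a made-up helper name.

```javascript
// Assumption: each detected name is wrapped in a <NameAnnotation> element,
// possibly with attributes. The real response markup may differ.
function extractNames(annotatedText) {
  const pattern = /<NameAnnotation\b[^>]*>([\s\S]*?)<\/NameAnnotation>/g;
  const names = [];
  let match;
  // exec() with the g flag walks through the text one match at a time.
  while ((match = pattern.exec(annotatedText)) !== null) {
    names.push(match[1].trim());
  }
  return names;
}
```

A regular expression is enough for a quick experiment; for anything more serious, an XML parser is the safer choice.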

The NameFinder annotator is based on the OpenNLP toolkit, which uses machine learning to analyze natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron-based machine learning. I will try to cover the technical details of the NameFinder annotator in subsequent blogs.

 

This REST service can be easily consumed with the following SAPUI5 JavaScript code, which shows the NameAnnotation results in a table:

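    // XML model that will hold the NameFinder response
    var oModel = new sap.ui.model.xml.XMLModel();

    $.ajax({
        url: 'http://localhost:8080/uima-simple-server-concept/',
        type: 'POST',
        // URL-encode the text so characters such as & and = do not break the form body
        data: 'text=' + encodeURIComponent(InputText) + '&mode=inline',
        dataType: 'xml',
        success: function (xml) {
            // Load the response into the model and bind each NameAnnotation to a table row
            oModel.setData(xml);
            var oTable = sap.ui.getCore().byId("cTable");
            oTable.setModel(oModel);
            oTable.bindRows("/NameAnnotation");
        }
    });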

 

The next challenge was to find a good source of unstructured data that could be fed programmatically to the NameFinder REST API. The ‘scnReader’ project by Tom Van Doorslaer, which is documented in this blog, came in handy. Thank you, Tom, for your wonderful work here. In no time I was able to import this SAPUI5 project into Eclipse and add my own code to read RSS feed items, pass them to the NameFinder REST API, and show the names in a table.
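The glue between the feed reader and the service can be sketched roughly as follows. The item shape ({ title, description }) and the function name `toRequestBody` are assumptions for illustration and are not taken from the scnReader project; only the `text` and `mode` parameters come from the NameFinder call shown earlier.

```javascript
// Hypothetical feed item shape: { title, description }, where the
// description may contain HTML markup from the blog post.
function toRequestBody(item) {
  // Crude tag stripping for illustration only; a real HTML parser is more robust.
  const plainText = item.description
    .replace(/<[^>]*>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  // URLSearchParams produces a form-encoded body and handles the escaping.
  const params = new URLSearchParams({ text: plainText, mode: "inline" });
  return params.toString();
}
```

Each feed item then becomes one POST body, so the request code can simply loop over the fetched items and send `toRequestBody(item)` as the data.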

 

I added a couple of input fields to the SCN reader to take any RSS feed link and the number of items to fetch.

scnReader 0.JPG

As a sample, I used the NetWeaver Cloud Developer Center RSS feed for blogs.

 

RSS link.JPG

On entering the RSS feed link and submitting, it fetches the 10 most recent items and shows them in the table.

scnReader 1.JPG

 

Now we have a list of blogs with their content that can be sent to the NameFinder REST service for analysis. On clicking ‘Get Names’, the following result is shown in the ‘Identified English Names’ table:

scnReader 2.JPG

On scrolling down:

names scroll1.jpg

Scrolling down further:

name scroll2.jpg

names scroll3.jpg

In total, 18 names are identified from the 10 most recent blog posts in the Cloud community. With some more JavaScript, the number of occurrences of each name can be calculated for comparison. With some effort, these results can be persisted into a database (HANA?) along with additional information such as blog category (tags), blog link, results from ConceptMapper, and date and time information, to build a real-world application.
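Counting occurrences takes only a few lines. The sketch below assumes the identified names are already available as an array of strings; `countNames` is a made-up helper name, not part of the scnReader code.

```javascript
// Count how often each name appears and return the result sorted by
// frequency (highest first), with ties broken alphabetically.
function countNames(names) {
  const counts = new Map();
  for (const name of names) {
    const key = name.trim();
    if (key === "") continue; // skip empty entries
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count || a.name.localeCompare(b.name));
}
```

The resulting array of { name, count } objects could be bound to a table in the same way as the NameAnnotation results. Note that this treats differently spelled variants of the same person as separate names.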

 

So now you know who is famous on the SAP NetWeaver Cloud community. Some of the NameAnnotations are not person names. These misses come from the machine learning algorithm and the name finder model. The models can be enriched and retrained using the OpenNLP library to improve accuracy.

