Friday, December 21, 2018

Selection Sort

Algorithm:
  • Find smallest element with linear scan
  • Move it to the front
  • Continue until all elements are in place
Time Complexity:
  • Best: O(n²) - the full scan of the unsorted part runs even on a sorted list
  • Worst: O(n²)
Space Complexity:
  • O(1) - usually no extra space is required
In-place: yes
Stable: no (the swap can move an element past equal elements)

Python Implementation:
def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        # find the index of the smallest element in arr[i:]
        min_index = i
        for j in range(i+1, n):
            if arr[min_index] > arr[j]:
                min_index = j
        # move it to the front of the unsorted part
        arr[i], arr[min_index] = arr[min_index], arr[i]

Bubble Sort

Algorithm:
  • Start at the beginning
  • Swap the first two elements if the first is greater than the second
  • Continue to next pair
  • Repeat until sorted
Time Complexity:
  • Best: O(n) - when the list is already sorted (with an early-exit check for a pass with no swaps)
  • Worst: O(n²) - when the list is in reverse order
Space Complexity:
  • O(1) - usually no extra space is required
In-place: yes
Stable: yes

Python Implementation

def bubble_sort(array):
    n = len(array)
    for j in range(n):
        swapped = False
        for i in range(n-1-j):
            if array[i] > array[i+1]:
                array[i+1], array[i] = array[i], array[i+1]
                swapped = True
        # no swaps in a full pass means the list is sorted: O(n) best case
        if not swapped:
            break

Thursday, December 20, 2018

Stable Sorting Algorithms

A sorting algorithm is called stable if elements with the same precedence (equal keys) retain their original relative order in the final output.

e.g. sorting the list of names [alex, bob, alice, judy, tom] by first letter gives the output [alex, alice, bob, judy, tom]. Both alex and alice share the key 'a', and the precedence of alex over alice is retained in the output.
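The same behavior can be demonstrated with Python's built-in sorted(), which is a stable sort (Timsort); sorting by first letter here is just for illustration:

```python
# Python's built-in sorted() is stable, so names that share a sort key
# keep their original relative order.
names = ["alex", "bob", "alice", "judy", "tom"]

# sort by first letter only: "alex" and "alice" both key to 'a'
result = sorted(names, key=lambda name: name[0])

print(result)  # ['alex', 'alice', 'bob', 'judy', 'tom']
```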

Sorting algorithms that are stable:
  • Bubble Sort
  • Insertion Sort
  • Merge Sort
Not stable sorting algorithms:
  • Selection Sort
  • Heap Sort
  • Quick Sort

In Place Algorithm

In place algorithms:
  • focus on space efficiency
  • use no auxiliary data structures, only a small amount of extra space
  • have O(1) space complexity (strict definition) to O(log n) space complexity (broad definition)
Side Effects:
  • usually overwrite the original data to create the modified output
  • may need more time in some cases
Examples:
Reversing a sorted list in-place.
a = [1, 2, 3, 4, 5]
n = len(a)
for i in range(n // 2):   # integer division; range(n/2) fails in Python 3
    # swap the i-th element from each end
    a[i], a[n-1-i] = a[n-1-i], a[i]
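For contrast, a non-in-place version of the same reversal (a quick sketch) builds a new list with slicing, trading O(n) extra space for leaving the input untouched:

```python
a = [1, 2, 3, 4, 5]

# slicing allocates a brand-new reversed list: O(n) extra space
b = a[::-1]

print(a)  # [1, 2, 3, 4, 5] - the original is unchanged
print(b)  # [5, 4, 3, 2, 1]
```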

In-place sorting algorithms:
  • Selection Sort
  • Bubble Sort
  • Insertion Sort
  • Heap Sort
  • Quick Sort (O(log n) stack space under the broad definition)

Sunday, October 21, 2018

Relation Triples Extraction with Stanford OpenIE


Setting up Stanford-OpenIE for Relation Triples:
For extracting relation triples from a sentence, we can use the unofficial cross-platform Python wrapper for the state-of-the-art information extraction library from Stanford University.

More details on: https://github.com/philipperemy/Stanford-OpenIE-Python.git

First, we can clone the project.

git clone https://github.com/philipperemy/Stanford-OpenIE-Python.git

Then, I had to remove the "-" characters from the folder name so that it could be imported as a module.

mv Stanford-OpenIE-Python/ StanfordOpenIEPython/

cd StanfordOpenIEPython/



To use the wrapper as a module, we create an __init__.py file.

touch __init__.py 

Example code:

from StanfordOpenIEPython.main import stanford_ie
import argparse
import os

FILE_PATH = 'StanfordOpenIEPython/sentences.txt'

def triples_extractor(sentence):
    # start from a fresh sentences file
    try:
        os.remove(FILE_PATH)
    except OSError:
        pass
    with open(FILE_PATH, 'a') as text:
        text.write(sentence)
    # the wrapper resolves the file path relative to its own folder
    triples_raw = stanford_ie('sentences.txt', verbose=True)
    triples = [[trip.lstrip() for trip in triple] for triple in triples_raw]
    return triples


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--sentence", default='Bill Gates is married to Melinda Gates.')
    args = parser.parse_args()
    sentence = args.sentence
    print(triples_extractor(sentence))



Output:
[['Bill Gates', 'is', 'married'], ['Bill Gates', 'is married to', 'Melinda Gates']]
 

More on: https://github.com/apogre/python_ner/blob/master/triples.py

Monday, October 8, 2018

Text Analysis (Named Entity Recognition) using Stanford NLP and Python NLTK

As a beginner, the first step towards Natural Language Processing (NLP) is to work on identifying Named Entities in a given natural language text. I came across this task in one of my projects and I will be writing about the steps I took to come up with my solution using references available on the internet.

I used Stanford NER tool for Named Entity Recognition.

Setting Up Stanford NER Tool with python nltk:

First, download the NER Package: (For Mac, please follow https://brew.sh/ before proceeding)

wget https://nlp.stanford.edu/software/stanford-ner-2018-02-27.zip

Unzip Them:

unzip stanford-ner-2018-02-27.zip

rm stanford-ner-2018-02-27.zip

Place these packages in your folder. I will put them in the home folder for convenience.

Next step is to set up the environment paths.

export STANFORDTOOLSDIR=$HOME

export CLASSPATH=$STANFORDTOOLSDIR/stanford-ner-2018-02-27/stanford-ner.jar

export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-ner-2018-02-27/classifiers

We need to have Java installed to run the NER package.
 
Install Java JDK: 


Let's get into writing the actual text analysis code.

Before that, let's install the python nltk package:

pip install nltk

>>> import nltk
>>> nltk.download('punkt')

Identifying Named Entities:

#importing packages

from nltk.tag import StanfordNERTagger
import argparse
from nltk import word_tokenize
from re import sub
from itertools import groupby

#Stanford NER comes with 3 (Location, Person, Organization), 4 (Location, Person, Organization, Misc) and 7 (Location, Person, Organization, Money, Percent, Date, Time)-class classifiers.

# st_ner = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz')
st_ner = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')

#tags each word to one of the NER labels
def sentence_tagger(sentence_list):
    named_entities = st_ner.tag_sents(sentence_list)
    return named_entities
#merge adjacent words with same tags
def get_nodes(tagged_words):
    ent = []
    for tag, chunk in groupby(tagged_words, lambda x:x[1]):
        if tag != "O":
            tuple1 = (sub(r'\s+([?.!,"])', r'\1', " ".join(w for w, t in chunk)), tag)
            ent.append(tuple1)
    return ent


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--sentence", default='Bill Gates is married to Melinda Gates.')
    args = parser.parse_args()
    sentence_lis = [args.sentence]
    sentence_list = [word_tokenize(sent) for sent in sentence_lis]
    print(sentence_list)
    named_tags = sentence_tagger(sentence_list)
    print(named_tags)
    for ne in named_tags:
        named_entities = get_nodes(ne)
        print(named_entities)

Full Project on Github: https://github.com/apogre/python_ner
 

Tuesday, June 26, 2018

Training and Test Data for Knowledge Graph Completion


Positive Examples:

Positive examples for a predicate P are any entity pairs (x, y) such that the triple (x, P, y) exists in DBpedia. For example, the positive examples for the spouse relation are the people who are married to each other; here both x and y are entities of type Person.

The example SPARQL query to generate the positive examples for spouse relation is given below:

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?subject ?object
FROM <http://dbpedia.org>
WHERE { ?object rdf:type dbpedia-owl:Person.
?subject rdf:type dbpedia-owl:Person. 
?subject ?targetRelation ?object.  
FILTER (?targetRelation = dbpedia-owl:spouse)
} ORDER BY RAND() LIMIT 10000
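To pull these pairs into Python, one possible sketch using only the standard library is below. The endpoint URL (the public one at https://dbpedia.org/sparql), the small LIMIT, and the helper names are my assumptions, not part of the original setup:

```python
import json
import urllib.parse
import urllib.request

# The spouse query from above, trimmed to a small LIMIT for a quick test.
QUERY = """
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?subject ?object
FROM <http://dbpedia.org>
WHERE { ?object rdf:type dbpedia-owl:Person.
?subject rdf:type dbpedia-owl:Person.
?subject dbpedia-owl:spouse ?object.
} ORDER BY RAND() LIMIT 10
"""

def build_request(endpoint="https://dbpedia.org/sparql"):
    # encode the query as a GET request asking for SPARQL JSON results
    params = urllib.parse.urlencode({
        "query": QUERY,
        "format": "application/sparql-results+json",
    })
    return urllib.request.Request(endpoint + "?" + params)

def fetch_positive_examples():
    # performs the actual HTTP call against the endpoint
    with urllib.request.urlopen(build_request()) as response:
        data = json.load(response)
    # each binding row becomes one (subject, object) positive pair
    return [(row["subject"]["value"], row["object"]["value"])
            for row in data["results"]["bindings"]]
```

Calling fetch_positive_examples() returns a list of (subject, object) URI pairs that can be written out as the positive training set.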