Large-scale Entity Extraction and Probabilistic Record Linkage


Tuesday, August 19, 2014

Large-scale entity extraction, disambiguation and linkage in Big Data can challenge the traditional methodologies developed over the last three decades. Entity linkage, in particular, is a cornerstone of a wide spectrum of applications, such as Master Data Management, Data Warehousing, Social Graph Analytics, Fraud Detection and Identity Management. Traditional rule-based heuristic methods usually don't scale well, are language-specific and require significant maintenance over time.

We will introduce the audience to probabilistic record linkage, also known as specificity-based linkage, and its use on Big Data to perform language-independent, large-scale entity extraction, resolution and linkage across diverse sources. We will also present a live demonstration that reviews the steps required during the data integration process (ingestion, profiling, parsing, cleansing, standardization and normalization) and shows the basic concepts behind probabilistic record linkage in a real-world application.
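To give a feel for the idea before the demonstration, here is a minimal sketch of match scoring in the style of the classic Fellegi-Sunter model. It is an illustration only and not necessarily the method used in the talk or in HPCC Systems. The class name `LinkageSketch`, the field names and every probability below are assumptions made up for the example. Each field gets an m-probability (chance of agreement when two records describe the same entity) and a u-probability (chance of agreement when they do not), so agreement on a highly specific field counts for more than agreement on a common one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Fellegi-Sunter style match scoring. Field names and probabilities are
// illustrative assumptions, not values from the talk.
class LinkageSketch {

    // m = P(field agrees | same entity), u = P(field agrees | different entities)
    static final class FieldModel {
        final double m;
        final double u;

        FieldModel(double m, double u) {
            this.m = m;
            this.u = u;
        }

        // Weight added when the field agrees: log2(m / u), positive when m > u.
        double agreementWeight() {
            return Math.log(m / u) / Math.log(2);
        }

        // Weight added when the field disagrees: log2((1 - m) / (1 - u)),
        // negative when m > u.
        double disagreementWeight() {
            return Math.log((1 - m) / (1 - u)) / Math.log(2);
        }
    }

    // Sums the per-field weights; a field missing from either record
    // contributes nothing to the total.
    static double score(Map<String, String> a, Map<String, String> b,
                        Map<String, FieldModel> models) {
        double total = 0.0;
        for (Map.Entry<String, FieldModel> entry : models.entrySet()) {
            String left = a.get(entry.getKey());
            String right = b.get(entry.getKey());
            if (left == null || right == null) {
                continue;
            }
            FieldModel model = entry.getValue();
            total += left.equalsIgnoreCase(right)
                    ? model.agreementWeight()
                    : model.disagreementWeight();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, FieldModel> models = new LinkedHashMap<>();
        models.put("surname", new FieldModel(0.95, 0.01)); // rarely agrees by chance
        models.put("city", new FieldModel(0.90, 0.20));    // often agrees by chance

        Map<String, String> first = new LinkedHashMap<>();
        first.put("surname", "Garcia");
        first.put("city", "Atlanta");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("surname", "GARCIA");
        second.put("city", "Atlanta");

        System.out.println("match score = " + score(first, second, models));
    }
}
```

A pair would then be treated as a link, a non-link or a candidate for review depending on where its score falls relative to thresholds chosen for the data set. In practice the comparison would be fuzzier than exact, case-insensitive equality.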

About the Author

Dr. Flavio Villanustre is the Vice President of Infrastructure and Products for HPCC Systems, the open source Big Data processing platform for LexisNexis. In this position, Flavio is responsible for Information and Physical Security, overall infrastructure strategy and new product development. Prior to 2001, Dr. Villanustre worked at several companies in a variety of infrastructure, information security and information technology roles. In addition, Dr. Villanustre has been involved with the open source community for over 15 years through multiple initiatives. These include founding the first Linux User Group in Buenos Aires (BALUG) in 1994, releasing several pieces of software under different open source licenses, and evangelizing open source to different audiences through conferences, training and education. Prior to his technology career, Dr. Villanustre was a neurosurgeon.
AJUG Meetup

Building and Deploying 12 Factor Apps in Scala and Java

April 18, 2017

The twelve-factor app is a modern methodology for building software-as-a-service apps that:

• Use declarative formats for setup automation, to minimise time and cost for new developers joining the project.

• Have a clean contract with the underlying operating system, offering maximum portability between execution environments.

• Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration.

• Minimise divergence between development and production, enabling continuous deployment for maximum agility.

• And can scale up without significant changes to tooling, architecture, or development practices.

We will build a RESTful web service in Java and deploy it to Cloud Foundry. We will cover how to write a cloud manifest, how to keep our database credentials and application configuration out of our code by using user-provided services, and what it takes to build a 12 Factor application in the cloud. This presentation will be heavy on code and light on slides!
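As a rough illustration of keeping credentials out of the code, the sketch below reads database settings from the environment and fails fast when one is missing. It is not the code from the presentation. The class name `EnvConfigSketch` and the variable names `DB_URL`, `DB_USER` and `DB_PASSWORD` are hypothetical. On Cloud Foundry, credentials for bound services, including user-provided ones, arrive as JSON in the `VCAP_SERVICES` environment variable; this sketch uses plain variables instead so that it needs no JSON library.

```java
import java.util.Map;

// Twelve-factor style configuration: settings come from the environment
// rather than from constants in the code. The variable names are
// hypothetical and chosen only for this example.
class EnvConfigSketch {

    static final class DbConfig {
        final String url;
        final String user;
        final String password;

        DbConfig(String url, String user, String password) {
            this.url = url;
            this.user = user;
            this.password = password;
        }
    }

    // Takes the environment as a map so the lookup can be exercised
    // without changing the real process environment.
    static DbConfig fromEnv(Map<String, String> env) {
        return new DbConfig(
                require(env, "DB_URL"),
                require(env, "DB_USER"),
                require(env, "DB_PASSWORD"));
    }

    // Rejects missing or empty values so a misconfigured deployment is
    // reported at startup instead of at the first database call.
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException(
                    "Missing required environment variable: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            DbConfig config = fromEnv(System.getenv());
            // The password is deliberately never printed.
            System.out.println("Database URL: " + config.url);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Because the same build reads different values in each environment, nothing in the artifact changes between a developer laptop and production.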

Location:


Roam Dunwoody

1155 Mount Vernon Highway NE
Atlanta, GA 30338

