Conference Training

Courses

Solr Unleashed, April 22-23
Introduction to Applied Natural Language Processing, April 26

Solr Unleashed

A Hands-On Workshop for Building Killer Search Apps

Is your Solr installation bullet-proof? Will it really scale the way you think it will? Is your relevancy not cutting it? Let the Solr experts show you the right way to implement all the platform capabilities you need to be using, but probably aren’t. You will walk away confident that your Solr installation is implemented in the best possible way—rock solid and scalable. This course is taught using Solr 4.x.

Who Should Attend?

This course is intended for developers. System administrators are welcome to attend, but the course is primarily designed for people with experience developing web applications in Java, PHP, Ruby, or similar languages.

Course Overview

Having consulted with clients on Lucene and Solr for the better part of a decade, we’ve seen the same mistakes made over and over again: applications built on shaky foundations, stretched to the breaking point. In this two-day class, learn from the experts how to do it right and make sure your apps are rock solid, scalable, and produce relevant results.

Course Outline

The Fundamentals

About Solr
Installing and running Solr
Adding content to Solr
Reading a Solr XML response
Changing parameters in the URL
Using the browse interface

Searching

Sorting results
Query parsers
More queries
Hardwiring request parameters
Adding fields to default search
Faceting
Result grouping

Indexing

Adding your own content to Solr
Deleting data from Solr
Building a bookstore search
Adding book data
Exploring the book data
Dedupe update processor

Updating your schema

Adding fields to the schema
Analyzing text

Relevance

Field weighting
Phrase queries
Function queries
Fuzzier search
Sounds-like

Extended features

More-like-this
Geospatial
Spell checking
Suggestions
Highlighting
Pseudo-fields
Pseudo-joins
Multilanguage

Multicore

Adding more kinds of data

SolrCloud

Introduction
How SolrCloud works
Commit strategies
ZooKeeper
Managing Solr config files

Learning Objectives

This class is all about best practices. The end goal is for students to walk away confident that their Solr installation is implemented in the best possible way.

Prerequisites

This is a technical class for technical people. Experience with Solr is not required, but you should at minimum be comfortable with a command line (console, shell) to execute basic commands.

Introduction to Applied Natural Language Processing (NLP)

The automated processing of text data is now being applied successfully to mission-critical tasks in industries as varied as medicine, finance, law, advertising, and engineering. The tutorial covers best practices in many of these areas from the perspective of proven applications, methods, tools, and resources.

Course Overview

Text Preprocessing, such as tokenization, lemmatization, and end-of-sentence detection.
Shallow Syntactic and Semantic Analysis, such as semantic role labeling and named entity recognition.
Text Classification & Clustering, such as spam detection and topic modeling.
Information Extraction, such as relation extraction in open and closed domains.
Word Sense Disambiguation, such as linking to an ontology.
Word Relatedness Functions, such as those derived from continuous word embeddings.
Text Summarization.

After attending this tutorial, participants will be able to build their own NLP systems for each of these topics and achieve good baseline results in a short time.

Bio

Gabor Melli is the Chief Scientist at VigLink.com, where he leads initiatives to automate mission-critical, semantically rich processes. This work largely involves training predictive models for classification, sequence labeling, and estimation, for tasks such as named entity recognition and disambiguation in user-generated text. The techniques and tools involved include CRFs, SVMs, HMMs, logistic regression, LDA, NLTK, Python, R, Scala, Java, Hive, Hadoop, Cassandra, RedShift, and AWS EC2/S3/EMR. He has led and delivered large-scale data-driven initiatives at organizations ranging from Microsoft, AT&T, T-Mobile, ICBC, Washington Mutual, and Wal*Mart to start-ups such as Datasage, Meals.com, PredictionWorks, and now VigLink.

Gabor holds a PhD in Computing Science from Simon Fraser University on the topic of document-to-ontology interlinking. He has been active in the data science community for over fifteen years and received ACM SIGKDD's Service Award in 2013. His current research interests include iterative semantic semi-supervised text analysis and automated business process optimization.
