I'm open to opportunities in Machine Learning and Data Science. In particular, I'm interested in using machine learning (especially deep learning) to solve real-world problems. I created this site as an expanded portfolio where I describe my previous work in detail. If you want just the highlights, please download my resume above.
DO NOT contact me about ANY data engineering positions! I'm NOT interested in roles that focus on building pipelines or supporting data science teams. The same goes for low-level analytics roles that revolve primarily around writing SQL queries or building dashboards.
My Work
Feel free to contact me regarding opportunities, networking, my projects/PaddleSoft, or just to discuss technology in general.
When the coronavirus pandemic struck, I started volunteering at CoronaWhy. At CoronaWhy I've worked on several projects, including extracting adverse drug events, clustering COVID-19 literature, and forecasting COVID-19 spread. I'm currently leading a team of data scientists, data engineers, and epidemiologists to build models that forecast COVID-19 spread and to develop causal models that gauge policy impacts. We are also investing significant time in continuing to develop Flow Forecast, an open-source deep learning framework for time series forecasting built in PyTorch.
I developed and fine-tuned PyTorch models (specifically, variations of the transformer) to add new triplets to the company knowledge graph, enhancing downstream applications such as search, job recommendation, and autocompletion. I also helped educate colleagues and set standards around writing PyTorch models, Python code quality, and data science best practices. Additionally, part of my time went to building out the company data lake on GCP with tools like Terraform, Cloud Functions, Dataflow, and BigQuery.
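To give a flavor of this work, here is a minimal sketch of a transformer-based relation classifier for proposing new triplets; the base model, label set, and `extract_triplet` helper are all illustrative, not the production setup:

```python
# Minimal sketch of transformer-based relation classification for
# knowledge-graph triplet extraction. Model name and labels are
# illustrative, not the production setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RELATIONS = ["skill_of", "located_in", "no_relation"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(RELATIONS)
)

def extract_triplet(sentence: str, head: str, tail: str):
    """Classify the relation between two entities into a (head, relation, tail) triplet."""
    # Mark the candidate entity pair so the encoder can attend to it.
    text = f"{sentence} [SEP] {head} [SEP] {tail}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    relation = RELATIONS[int(logits.argmax(dim=-1))]
    return (head, relation, tail)
```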
My role involved a mix of data engineering and machine learning tasks. On the data engineering side, I helped develop the company's unified data platform, built with Apache Airflow (running on Kubernetes), Spark (running on EMR clusters), and Hive tables stored on S3. Specifically, my tasks included developing Airflow DAGs to run pipelines, creating custom Airflow operators to launch EMR clusters with the proper dependencies, and translating old SQL jobs to SparkSQL. On the machine learning side, I refined models to forecast retail demand with Spark MLlib (later experimenting with deep learning architectures in PyTorch), trained and tested models to cluster products for better categorization with TensorFlow, and researched techniques to improve personalization (also with TensorFlow).
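As an illustration of the Airflow side, here is a stripped-down sketch of a DAG that launches an EMR cluster; the cluster spec and DAG names are hypothetical and far simpler than the real pipelines:

```python
# Sketch of an Airflow DAG that launches an EMR cluster for a Spark job;
# the job-flow config below is a placeholder, not the production pipeline.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

JOB_FLOW_OVERRIDES = {  # hypothetical cluster spec
    "Name": "nightly-demand-forecast",
    "ReleaseLabel": "emr-5.30.0",
    "Instances": {
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
}

with DAG(
    dag_id="demand_forecast_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    create_cluster = EmrCreateJobFlowOperator(
        task_id="create_emr_cluster",
        job_flow_overrides=JOB_FLOW_OVERRIDES,
        aws_conn_id="aws_default",
    )
```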
From July to January I worked on a variety of contracts and open-source projects. This included designing Peelout, a set of tools aimed at easing the process of deploying deep learning models to production; creating a chatbot with spaCy, TensorFlow, and Redis; and writing a series of articles for EyeOnAI, an online magazine focused on A.I.
I designed interactive charts with Bokeh in Python for hospital administrators and doctors, created ETL pipelines to pull data from the hospital's decentralized data sources (such as Cerner, 3M, "side" SQL databases, and manually maintained Excel notebooks), and employed data-driven approaches to improve hospital performance. Finally, I also worked on a "modern" data architecture built on Kafka and PostgreSQL to automate tedious manual processes and provide real-time analytics for clients.
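For a sense of the charting work, here is a minimal Bokeh sketch; the CSV file and column names are stand-ins for the real hospital data:

```python
# Minimal sketch of an interactive Bokeh chart of the kind used for
# hospital dashboards; the CSV path and columns are illustrative only.
import pandas as pd
from bokeh.plotting import figure, output_file, show

df = pd.read_csv("daily_admissions.csv")  # hypothetical extract from the ETL pipeline
df["date"] = pd.to_datetime(df["date"])

p = figure(x_axis_type="datetime", title="Daily Admissions",
           tools="pan,wheel_zoom,box_zoom,reset,hover")
p.line(df["date"], df["admissions"], line_width=2)
p.yaxis.axis_label = "Admissions"

output_file("admissions.html")  # self-contained HTML for administrators
show(p)
```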
I assisted the data analytics team at EMMC during the summer months (June-August) and over the winter break (December-January). I analyzed a variety of data, including both patient and financial data, and used tools such as SQL and Altova MapForce to extract, transform, and load (ETL) data before creating data visualizations.
I founded PaddleSoft to help paddlers plan their whitewater adventures. Over the course of the last two years I have created many paddling-related services. Some of my favorites included using D3.js, NodeJS, and Kafka to build a real-time river flow map, using Neo4j and Cypher (CQL) queries to recommend rivers and paddling partners to our users, and using MATLAB to create a time series neural network to predict the flow of the Kenduskeag Stream. More details can be found below in the projects section of this page and on the PaddleSoft blog. All blog entries are written by me unless otherwise noted.
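As a taste of the recommendation work, here is a sketch of the kind of collaborative-filtering Cypher query involved, run through the Python Neo4j driver; the node labels, relationship types, and credentials are all hypothetical:

```python
# Sketch of a collaborative-filtering style Cypher query for recommending
# rivers; labels, relationships, and credentials are made up for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

RECOMMEND_RIVERS = """
MATCH (me:Paddler {name: $name})-[:PADDLED]->(:River)<-[:PADDLED]-(peer:Paddler)
MATCH (peer)-[:PADDLED]->(rec:River)
WHERE NOT (me)-[:PADDLED]->(rec)
RETURN rec.name AS river, count(peer) AS score
ORDER BY score DESC LIMIT 5
"""

with driver.session() as session:
    # Rivers paddled by similar paddlers, ranked by how many peers overlap.
    for record in session.run(RECOMMEND_RIVERS, name="Isaac"):
        print(record["river"], record["score"])
```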
I collaborated with HR to develop and fine-tune HRIS systems. A few of my tasks included analyzing LAWSON reports in Excel, using SQL and Microsoft Access to automate Affordable Care Act and COBRA reporting, creating custom Excel functions and macros with VB, and collaborating with a larger team on the implementation of manager self-service.
I conducted research for the University of Maine Chemistry department, working mainly on computer modeling of tungsten-oxide molecules. Specifically, I wrote bash scripts to run jobs on the university supercomputer. We were the theoretical arm of a larger team working to convert forest byproducts to gasoline-grade fuel oil. The overall goal of my specific team was to simulate chemical reactions and the formation of chemical compounds using computational chemistry software. I also attended weekly meetings and collaborated with the larger research team.
Initially a repository for forecasting flash floods and stream flows, Flow Forecast has evolved into a general time series forecasting library. We are currently using Flow Forecast to forecast COVID-19 cases around the U.S. as well as to study transfer learning by training on stream flow, wind, and solar data.
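To give a rough idea of the underlying approach, here is a toy PyTorch forecaster; this is a hand-rolled sketch rather than Flow Forecast's actual API (see the repository for that):

```python
# A toy PyTorch forecaster in the spirit of Flow Forecast; a hand-rolled
# sketch, not the library's actual API.
import torch
import torch.nn as nn

class SimpleForecaster(nn.Module):
    """Map a window of past observations to the next time step."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict from the last hidden state

# One training step on synthetic data (a stand-in for flow/wind/solar series).
model = SimpleForecaster(n_features=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 30, 3), torch.randn(8, 1)  # 8 windows of 30 steps
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```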
I developed a Game of Thrones chatbot from scratch using Flask, Redis, ElasticSearch, PostgreSQL, spaCy, and TensorFlow. The bot operated through a Flask-based REST API where incoming user messages (from the Slack API) were combined with prior cached messages in Redis and fed to a combination of rule-based NLP methods and TensorFlow models. These methods in turn constructed queries to the appropriate data sources (ElasticSearch or PostgreSQL) and synthesized a response based on the returned information. Full conversation history for the bot was stored in PostgreSQL and periodically analyzed and reintegrated into the training data to improve performance. The app ran primarily on AWS (ElasticSearch Service, ECS, and EC2), while a few of the TensorFlow components ran on GCP instances. The bot integrated with the Slack API and used OAuth for authorization.
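Here is a minimal sketch of the message-caching step in that Flask REST API; the route, key scheme, and `generate_reply` helper are hypothetical stand-ins for the real pipeline:

```python
# Sketch of the message-caching step in the chatbot's Flask REST API;
# the route, key scheme, and generate_reply helper are hypothetical.
import json
import redis
from flask import Flask, request, jsonify

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379)

def generate_reply(history):
    """Placeholder for the rule-based NLP + TensorFlow response pipeline."""
    return "Winter is coming."

@app.route("/message", methods=["POST"])
def handle_message():
    payload = request.get_json()
    key = f"conv:{payload['user_id']}"
    cache.rpush(key, json.dumps(payload["text"]))                  # append new turn
    history = [json.loads(m) for m in cache.lrange(key, -10, -1)]  # last 10 turns
    return jsonify({"reply": generate_reply(history)})
```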
This project aims to ease the difficulties surrounding utilizing deep learning models in a production environment. It involves three parts: creating a set of model-agnostic tools to rapidly adapt models to a business use case, developing a set of scripts and extensions to existing frameworks to actually deploy the models, and designing a set of tools to monitor models and automatically adapt or continue to train them. The focus now is on the deployment phase. Specifically, this consists of automatically packaging deep learning models into a Docker container and creating a Kubernetes-based auto-scaling microservice that can integrate with other applications. There is also work to embed DL models directly in Flink (and other Java applications) with Java Embedded Python (JEP).
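To illustrate the packaging idea, here is a simplified sketch of generating a Dockerfile around a serialized model; Peelout's actual tooling differs, and the template and paths here are made up:

```python
# Illustrative sketch of the "auto-package a model into Docker" idea;
# Peelout's real tooling differs, and these paths/templates are made up.
from pathlib import Path

DOCKERFILE_TEMPLATE = """\
FROM python:3.8-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY {model_path} /app/model.pt
COPY serve.py /app/serve.py
CMD ["python", "/app/serve.py"]
"""

def package_model(model_path: str, out_dir: str = "build") -> Path:
    """Write a Dockerfile that wraps the serialized model in a serving image."""
    build = Path(out_dir)
    build.mkdir(exist_ok=True)
    dockerfile = build / "Dockerfile"
    dockerfile.write_text(DOCKERFILE_TEMPLATE.format(model_path=model_path))
    return dockerfile  # next: `docker build`, then a Kubernetes Deployment + HPA

package_model("model.pt")
```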
This research explored localizing and classifying a variety of conditions in lung X-rays given only a small dataset, through the use of transfer learning and meta-learning. I have for the most part set this project aside to focus on my NLP research and on developing Peelout.
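For context, the transfer learning baseline looked roughly like the following sketch: fine-tuning only the head of an ImageNet-pretrained CNN, which suits a small labeled dataset (the class count and freezing scheme here are illustrative):

```python
# Hedged sketch of a transfer-learning baseline: fine-tune only the head
# of a pretrained CNN on a small X-ray dataset; class count is made up.
import torch.nn as nn
from torchvision import models

NUM_CONDITIONS = 4  # hypothetical number of lung conditions

model = models.resnet18(pretrained=True)   # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                # freeze the feature extractor
model.fc = nn.Linear(model.fc.in_features, NUM_CONDITIONS)  # new trainable head
```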
This is a project to automate the scraping of Facebook data. The end goal is to create a continuous scraping and analysis engine for posts from Facebook groups and pages. For PaddleSoft, we want to use it to extract information about which rivers people are paddling, records of trips, and flow information for rivers that do not have gauges. However, we are working hard to make our solution generalizable to anyone who wishes to extract meaningful information from Facebook.
This was a project I originally undertook in 2016 to predict the flow of the Kenduskeag Stream in Bangor, ME. It involved many components, including collecting flow and weather data in a PostgreSQL database, training a time series neural network (NARX) in MATLAB to predict flow, and finally displaying the predictions with Chart.js. Unfortunately, as you may know, MATLAB is closed source and not easily deployable, so I do not have a working demo at the moment. However, I'm working on recreating our model in Python for the Kenduskeag and other streams as well. In the process, I hope to make meaningful contributions and provide valuable insights to Python frameworks like PyFlux and PyNeurGen.
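A rough sketch of what the Python recreation could look like, framing NARX as a nonlinear model over lagged flow (autoregressive) and lagged rainfall (exogenous) inputs; the column names and lag counts are illustrative:

```python
# Sketch of recreating the NARX idea in Python: a nonlinear model over
# lagged flow (AR terms) and lagged rainfall (exogenous inputs).
import pandas as pd
from sklearn.neural_network import MLPRegressor

df = pd.read_csv("kenduskeag.csv")  # hypothetical flow + weather extract

LAGS = 3
for i in range(1, LAGS + 1):
    df[f"flow_lag{i}"] = df["flow"].shift(i)      # past flow (autoregressive)
    df[f"rain_lag{i}"] = df["rainfall"].shift(i)  # past rainfall (exogenous)
df = df.dropna()

X = df[[c for c in df.columns if "lag" in c]].to_numpy()
y = df["flow"].to_numpy()

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:-30], y[:-30])            # hold out the last 30 days
print(model.predict(X[-30:])[:5])      # one-step-ahead flow predictions
```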
This is a Ruby on Rails application that I built at the request of a friend who coordinates the ACA nationals race. He wanted an application where racers in the competition could form teams and earn points for their respective teams. So I built a simple ROR application where users could sign up, browse teams, and join teams. The application used Devise for the authentication system and a PostgreSQL database for storing team data. Gradually, I added more features, such as Facebook login and ways for teams to filter and search for prospective members. It is still online at the link below (though some of the links are now stale). The code is also available on my GitHub.
A real-time map of river flows across America that renders a graph of a river's flow information when selected. It also queries our whitewater search engine for results on the selected river. The project uses ElasticSearch (for search results), NodeJS, D3.js, and SocketIO.
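The search-engine lookup fired when a river is selected looks roughly like this Python sketch (the index name and field are hypothetical, and the production version lives in NodeJS):

```python
# Sketch of the whitewater search-engine lookup fired when a river is
# selected on the map; index name and mapping are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_river(name: str):
    """Full-text search for results about the selected river."""
    resp = es.search(index="whitewater", query={"match": {"river_name": name}})
    return [hit["_source"] for hit in resp["hits"]["hits"]]

print(search_river("Kenduskeag Stream"))
```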