
Isaac McKillen-Godfried

Data and Machine Learning Engineer

Download Resume

About Me

I'm looking for opportunities in the areas of Machine Learning and Data Science. In particular, I'm interested in using machine learning and big data technologies to solve real-world problems. I created this site as an expanded portfolio where I describe my previous work in detail. If you just want the highlights, please download my resume above.

Primary Areas of Research/Specialities

  • Transfer and multitask learning
  • Deployment of deep learning models to production
  • Stream processing and real-time analytics
  • Natural Language Processing


Background

I have a wide variety of experience: mobile development with Xcode and Objective-C in high school, full-stack development with Ruby on Rails and NodeJS in my first few years of college, natural language processing and information retrieval with NLTK, Word2Vec, and ElasticSearch starting in my junior year of college, large-scale data processing with Spark and Hadoop beginning in 2017, and, most recently, data science with NumPy, XGBoost, Bokeh, Keras, and Tensorflow. However, no matter what language or part of the technology stack I work on, my talents and interests almost invariably come back to data and analytics. Whether it is learning a new theoretical data structure, designing an interactive data visualization with D3.js, creating a streaming data pipeline with Flink and Kafka, or building a predictive model with XGBoost and neural networks, I fundamentally enjoy both learning about and working with data. My computer science courses have given me a strong background in CS fundamentals, while my physics and math courses have provided the mathematical foundations necessary for machine learning. Additionally, my many liberal arts courses have enabled me to communicate clearly and convincingly in both English and Spanish.



My Work



The majority of my projects and contributions are open source and on GitHub. I founded and currently lead the PyData Orono Meetup group. I'm also a Kaggle competitions expert, although I have not competed in several years; my data science results are available on my Kaggle profile. Finally, I have written many articles on Medium for Towards Data Science, some of which are listed in the writing section.



Feel free to contact me regarding opportunities, networking, my projects/PaddleSoft, or just to discuss technology in general.

Experience

Hudson's Bay Company (HBC)

Data Engineer

My role has involved a mix of data engineering and machine learning tasks. On the data engineering side, I helped develop the company's unified data platform, which is built with Apache Airflow (running on Kubernetes), Spark (running on EMR clusters), and Hive tables stored on S3. Specifically, my tasks included developing Airflow DAGs to run pipelines, creating custom Airflow operators to launch EMR clusters with the proper dependencies, and translating old SQL jobs to SparkSQL. On the machine learning side, I refined retail demand forecasting models built with Spark MLlib (and later experimented with deep learning architectures in PyTorch), trained and tested Tensorflow models that cluster products for better product categorization, and researched techniques to improve personalization (also with Tensorflow).

Assorted Projects/Contracts

Mostly Machine Learning Engineering

From July to January I worked on a variety of contracts and open-source projects. This included designing Peelout, a set of tools aimed at easing the process of deploying deep learning models to production, creating a chatbot with Spacy, Tensorflow, and Redis, and writing a series of articles for EyeOnAI, an online magazine focused on A.I.

Eastern Maine Medical Center

Data Analyst (July 2017-June 2018)

I designed interactive charts with Bokeh in Python for hospital administrators and doctors, created ETL pipelines to pull data from the hospital's decentralized data sources (such as Cerner, 3M, "side" SQL databases, and manually maintained Excel workbooks), and used data-driven approaches to improve hospital performance. I also worked on a "modern" data architecture built on Kafka and PostgreSQL to automate tedious manual processes and provide real-time analytics for clients.



Data Analyst Intern (June 2016-July 2017)

I assisted the data analytics team at EMMC during the summer months (June-August) and over winter break (December-January). I analyzed a variety of data, including both patient and financial data, using tools such as SQL and Altova MapForce to extract, transform, and load (ETL) it, and then created data visualizations.

PaddleSoft

Founder

I founded PaddleSoft to help paddlers plan their whitewater adventures. Over the last two years I have created many paddling-related services. Some of my favorites include using D3.js, NodeJS, and Kafka to build a real-time river flow map, using Neo4j and CQL queries to recommend rivers and paddling partners to our users, and using MATLAB to create a time series neural network to predict the flow of the Kenduskeag Stream. More details can be found below in the project section of this page and on the PaddleSoft blog. All blog entries are written by me unless otherwise noted.

Eastern Maine Health Services

IT Intern

I collaborated with HR in order to develop and fine-tune HRIS systems. A few of my tasks included analyzing LAWSON reports with Excel, using SQL and Microsoft Access to automate Affordable Care Act and COBRA reporting, creating custom Excel functions and Macros with VB, and collaborating with a larger team in the implementation of manager self-service.

University of Maine

Research Assistant

I conducted research for the University of Maine Chemistry department, working mainly on computer modeling of tungsten-oxide molecules. Specifically, I wrote bash scripts to run jobs on the university supercomputer. We were the theoretical part of a larger team working to convert forest byproducts into gasoline-grade fuel oil. The overall goal of my specific team was to simulate chemical reactions and the formation of chemical compounds using computational chemistry software. I also attended weekly meetings and collaborated with the larger research team.

Education

Brandeis University

September 2013 - May 2017

Bachelor of Arts in Hispanic Studies

Minors in Computer Science and Near Eastern and Judaic Studies


Writing and Presentations


Presentations

  • Flink Forward San Francisco 2019: ONNX meets Flink
  • PyData Orono: Data Visualization Night
  • PyData Orono: Paying Attention in PyTorch

Projects

    Game of Thrones chatbot

    I developed a Game of Thrones chatbot from scratch using Flask, Redis, ElasticSearch, PostgreSQL, Spacy, and Tensorflow. The bot operated through a Flask-based REST API: incoming user messages (from the Slack API) were combined with prior messages cached in Redis and fed to a combination of rule-based NLP methods and Tensorflow models. These methods in turn constructed queries to the appropriate data source (ElasticSearch or PostgreSQL) and synthesized a response from the returned information. The full conversation history was stored in PostgreSQL and periodically analyzed and reintegrated into the training data to improve performance. The app ran primarily on AWS (ElasticSearch Service, ECS, and EC2), while a few of the Tensorflow components ran on GCP instances. The bot integrated with the Slack API and used OAuth for authorization.

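The message-handling step described above, where each incoming message is combined with a user's recent cached messages before it reaches the NLP layer, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the bot's actual code: a bounded in-memory `deque` per user stands in for Redis, and the `ConversationCache` class, its `build_context` method, and the default five-message window are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class ConversationCache:
    """Hypothetical in-memory stand-in for a per-user Redis message cache."""

    def __init__(self, max_turns: int = 5) -> None:
        # Each user gets a bounded deque; the oldest message is dropped
        # automatically once max_turns messages are stored.
        self._history: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_turns)
        )

    def build_context(self, user_id: str, message: str) -> List[str]:
        """Cache the new message and return the user's recent messages, oldest first."""
        history = self._history[user_id]
        history.append(message)
        return list(history)
```

A bounded window keeps the context handed to downstream models short and predictable; in a Redis-backed version the same behavior could be approximated with a list that is trimmed after each push.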

    Peelout

    This project aims to ease the difficulties of using deep learning models in a production environment. It involves three parts: creating a set of model-agnostic tools to rapidly adapt models to the business use case, developing a set of scripts and extensions to existing frameworks to actually deploy the models, and designing a set of tools to monitor models and automatically adapt or continue training them. The current focus is on the deployment phase, which consists of automatically packaging deep learning models into a Docker container and creating a Kubernetes-based, auto-scaling microservice that can integrate with other applications. There is also work to embed DL models directly in Flink (and other Java applications) with Java Embedded Python (JEP).


    Detecting and classifying conditions in medical imaging with limited annotated data

    This research explored localizing and classifying a variety of conditions in lung X-rays, given only a small dataset, through transfer learning and meta-learning. I have mostly abandoned this project to focus on my NLP research and on developing Peelout.


    FBLYZE: A Facebook Scraping and Analysis Engine

    This project automates the scraping of Facebook data. The end goal is to create a continuous scraping and analysis engine for posts from Facebook groups and pages. For PaddleSoft, we want to use it to extract information about which rivers people are paddling, trip reports, and flow information for rivers that do not have gauges. However, we are working hard to make the solution generalizable to anyone who wishes to extract meaningful information from Facebook.


    Kenduskeag Stream Flow Prediction

    I originally undertook this project in 2016 to predict the flow of the Kenduskeag Stream in Bangor, Maine. It involved several components: collecting flow and weather data in a PostgreSQL database, training a time series neural network (NARX) in MATLAB to predict flow, and displaying the predictions with ChartJS. Unfortunately, MATLAB is closed source and not easily deployable, so I do not have a working demo at the moment. However, I'm working on recreating the model in Python for the Kenduskeag and other streams. In the process, I hope to make meaningful contributions and provide valuable insights to Python frameworks like PyFlux and PyNeurGen.

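For readers unfamiliar with NARX (nonlinear autoregressive with exogenous inputs): in its common form, the model predicts the current flow y(t) as a learned nonlinear function of past flow values y(t-1), ..., y(t-d_y) and past exogenous inputs u(t-1), ..., u(t-d_u), such as weather. A Python port would start by building those lagged inputs. The sketch below assumes flow and rainfall series already aligned on the same time steps; the `narx_design` function and the use of rainfall as the only exogenous input are hypothetical, and the neural network that would learn the mapping is out of scope.

```python
from typing import List, Sequence, Tuple


def narx_design(
    flow: Sequence[float],
    rain: Sequence[float],
    flow_lags: int,
    rain_lags: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build NARX-style (X, y) pairs.

    Each target y is flow[t]; the matching row of X holds
    flow[t-1], ..., flow[t-flow_lags] followed by rain[t-1], ..., rain[t-rain_lags].
    """
    if len(flow) != len(rain):
        raise ValueError("flow and rain must be aligned on the same time steps")
    # Skip the first steps that do not have enough history for every lag.
    start = max(flow_lags, rain_lags)
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(start, len(flow)):
        past_flow = [flow[t - k] for k in range(1, flow_lags + 1)]
        past_rain = [rain[t - k] for k in range(1, rain_lags + 1)]
        X.append(past_flow + past_rain)
        y.append(flow[t])
    return X, y
```

Any regressor (a small feed-forward network, for example) could then be fit on X and y; at prediction time, feeding the model's own outputs back in as the lagged flow values gives the closed-loop, multi-step forecast that NARX networks are typically used for.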

    ACA Whitewater Nationals Team Competition

    This is a Ruby on Rails application that I built at the request of a friend who coordinated races at the ACA nationals. He wanted an application where racers in the competition could form teams and earn points for their respective teams. So I built a simple Rails application where users could sign up, browse teams, and join teams. The application used Devise for authentication and a PostgreSQL database for storing team data. Gradually, I added more features, such as Facebook login and ways for teams to filter and search for prospective members. It is still online at the link below (though some of its links are now stale). The code is also available on my GitHub.

    View Project

    U.S. river flow map

    A real-time map of river flows across the United States that renders a graph of a river's flow information when it is selected. It also queries our whitewater search engine for results on the selected river. The project uses ElasticSearch (for search results), NodeJS, D3.js, and SocketIO.

