I'm looking for opportunities in the areas of Machine Learning and Data Science. In particular, I'm interested in using machine learning and big data technologies to solve real-world problems. I have created this site as my expanded portfolio, where I describe my previous work in detail. If you want just the highlights, please download my resume above.
Primary Areas of Research/Specialties
Feel free to contact me regarding opportunities, networking, my projects/PaddleSoft, or just to discuss technology in general.
My role has involved a mix of data engineering and machine learning tasks. On the data engineering side, I helped develop the company's unified data platform, built with Apache Airflow (running on Kubernetes), Spark (running on EMR clusters), and Hive tables stored on S3. Specifically, my tasks included developing Airflow DAGs to run pipelines, creating custom Airflow operators to launch EMR clusters with the proper dependencies, and translating old SQL jobs to SparkSQL. On the machine learning side, I refined models to forecast retail demand with Spark MLlib (and later experimented with deep learning architectures in PyTorch), trained and tested models to cluster products for better product categorization with TensorFlow, and researched techniques to improve personalization (also with TensorFlow).
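At their core, Airflow DAGs express a pipeline as tasks plus dependencies, and the scheduler runs tasks only after everything they depend on has finished. A minimal stdlib sketch of that ordering idea (the task names here are hypothetical, loosely modeled on the EMR workflow described above):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps: each task maps to the set of tasks it
# depends on, just as an Airflow DAG wires operators together.
pipeline = {
    "launch_emr_cluster": set(),
    "run_sparksql_job": {"launch_emr_cluster"},
    "write_hive_table": {"run_sparksql_job"},
    "terminate_emr_cluster": {"write_hive_table"},
}

# Resolve a valid execution order, as a scheduler would.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

In a real DAG, independent branches (e.g., several SparkSQL jobs sharing one cluster) would run concurrently; the topological order only constrains dependent tasks.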
From July to January I worked on a variety of contracts and open-source projects. This included designing Peelout, a set of tools aimed at easing the process of deploying deep learning models to production; creating a chatbot with spaCy, TensorFlow, and Redis; and writing a series of articles for EyeOnAI, an online magazine focused on A.I.
I designed interactive charts with Bokeh in Python for hospital administrators and doctors, created ETL pipelines to pull data from the hospital's decentralized data sources (such as Cerner, 3M, "side" SQL databases, and manually maintained Excel workbooks), and employed data-driven approaches to improve hospital performance. Finally, I'm also working on a "modern" data architecture built on Kafka and PostgreSQL in order to automate tedious manual processes and provide real-time analytics for clients.
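The core of pulling from decentralized sources is reconciling partial records that describe the same entity. A toy sketch of that merge step, with invented source and field names (the real pipelines worked against Cerner, 3M, and side databases):

```python
# Combine per-patient records from several sources into one record,
# letting later sources fill in fields the earlier ones lack.
# Source names and fields below are illustrative only.
def merge_patient_records(*sources):
    merged = {}
    for source in sources:
        for record in source:
            patient = merged.setdefault(record["patient_id"], {})
            for key, value in record.items():
                patient.setdefault(key, value)  # first source wins on conflicts
    return merged

cerner = [{"patient_id": 1, "name": "A. Smith", "ward": "ICU"}]
side_db = [{"patient_id": 1, "los_days": 4}]
combined = merge_patient_records(cerner, side_db)
print(combined[1])
```

Ordering the sources by trustworthiness makes the "first source wins" rule a simple precedence policy; a production pipeline would also validate types and log conflicting values rather than silently dropping them.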
I assisted the data analytics team at EMMC during the summer months (June-August) and over the winter break (December-January). I analyzed a variety of data, including both patient and financial data, used tools such as SQL and Altova MapForce to extract, transform, and load (ETL) the data, and then created data visualizations.
I founded PaddleSoft to help paddlers plan their whitewater adventures. Over the last two years I have created many paddling-related services. Some of my favorites include using D3.js, NodeJS, and Kafka to build a real-time river flow map, using Neo4j and Cypher queries to recommend rivers and paddling partners to our users, and using MATLAB to create a time series neural network to predict the flow of the Kenduskeag Stream. More details can be found below in the projects section of this page and on the PaddleSoft blog. All blog entries are written by me unless otherwise noted.
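The recommendation idea behind the Neo4j queries is collaborative: suggest rivers that paddlers similar to you (those who have paddled rivers you have) also paddled. A toy, graph-free sketch of that shared-neighbors scoring, with made-up data:

```python
# Score candidate rivers by how many similar paddlers have run them.
# "Similar" here means sharing at least one river with the target user.
def recommend_rivers(paddled, user):
    mine = paddled[user]
    scores = {}
    for other, rivers in paddled.items():
        if other == user or not (mine & rivers):
            continue  # skip ourselves and paddlers with no overlap
        for river in rivers - mine:
            scores[river] = scores.get(river, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

paddled = {
    "alice": {"Kenduskeag", "Penobscot"},
    "bob": {"Kenduskeag", "Dead River"},
    "carol": {"Souadabscook"},
}
print(recommend_rivers(paddled, "alice"))
```

In Neo4j the same idea is one pattern-matching query over (paddler)-[:PADDLED]->(river) relationships, which scales this traversal to a full user graph.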
I collaborated with HR to develop and fine-tune HRIS systems. A few of my tasks included analyzing LAWSON reports with Excel, using SQL and Microsoft Access to automate Affordable Care Act and COBRA reporting, creating custom Excel functions and macros with VBA, and collaborating with a larger team on the implementation of manager self-service.
I conducted research for the University of Maine Chemistry Department, working mainly on computer modeling of tungsten-oxide molecules. Specifically, I wrote bash scripts to run jobs on the university supercomputer. We were the theoretical part of a larger team working to convert forest byproducts to gasoline-grade fuel oil. The goal of my specific team was to simulate chemical reactions and the formation of chemical compounds using computational chemistry software. I also attended weekly meetings and collaborated with the larger research team.
I developed a Game of Thrones chatbot from scratch using Flask, Redis, Elasticsearch, PostgreSQL, spaCy, and TensorFlow. The bot operated through a Flask-based REST API where incoming user messages (from the Slack API) were combined with prior cached messages in Redis and fed to a combination of rule-based NLP methods and TensorFlow models. These in turn constructed queries to the appropriate data sources (Elasticsearch or PostgreSQL) and then synthesized a response based on the returned information. Full conversation history for the bot was stored in PostgreSQL and periodically analyzed and reintegrated into the training data to improve performance. The app ran primarily on AWS (Elasticsearch Service, ECS, and EC2), while a few of the TensorFlow components ran on GCP instances. The bot fully integrated with the Slack API, including custom formatting for messages and OAuth support. View Project
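The routing layer described above can be sketched in miniature: a message is matched against rule-based intents that decide which backend to query, while entities are cached per user (a plain dict stands in for Redis here; the intent patterns and backend names are illustrative, not the bot's actual rules):

```python
import re

CACHE = {}  # stands in for Redis: user_id -> last entity mentioned

# Rule-based intents: pattern -> which data source answers this question.
RULES = [
    (re.compile(r"who is (?P<name>[\w ]+)\??", re.I), "elasticsearch"),
    (re.compile(r"how many episodes", re.I), "postgresql"),
]

def route(user_id, message):
    for pattern, backend in RULES:
        match = pattern.search(message)
        if match:
            if "name" in pattern.groupindex:
                # Remember the entity so follow-ups like "where is he from?"
                # can resolve against cached context.
                CACHE[user_id] = match.group("name").rstrip("?")
            return backend
    return "fallback"

print(route("u1", "Who is Jon Snow?"))
print(CACHE["u1"])
```

In the real system, messages that fail the rules would fall through to the TensorFlow intent models instead of a hard-coded fallback.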
This project aims to ease the difficulties surrounding the use of deep learning models in a production environment. It involves three parts: creating a set of model-agnostic tools to rapidly adapt models to the business use case, developing a set of scripts/extensions to existing frameworks to actually deploy the models, and designing a set of tools to monitor models and automatically adapt or continue to train them. The current focus is the deployment phase. Specifically, this consists of automatically packaging deep learning models into a Docker container and creating a Kubernetes-based auto-scaling microservice that can integrate with other applications. There is also work to embed DL models in Flink (and other Java applications) directly with Java Embedded Python (JEP). View Project
This research explored localizing and classifying a variety of conditions in lung X-rays given only a small dataset, through the use of transfer learning and meta-learning. I have for the most part abandoned this project to focus on my NLP research and on developing Peelout. View Project
This is a project to automate the scraping of Facebook data. The end goal is to create a continuous scraping and analysis engine for posts from Facebook groups and pages. For PaddleSoft, we want to use it to extract information about which rivers people are paddling, records of trips, and flow information for rivers that do not have gauges. However, we are working hard to make our solution generalizable to anyone who wishes to extract meaningful information from Facebook. View Project
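The extraction step described above can be sketched as simple text mining over a scraped post: match river names against a known list and pull reported flows out with a pattern. The river list and phrasing rules below are invented for illustration; a real engine would use a much larger gazetteer:

```python
import re

KNOWN_RIVERS = ["Kenduskeag", "Penobscot", "Dead River"]
# Flows are commonly reported in cubic feet per second, e.g. "950 cfs".
FLOW_RE = re.compile(r"(\d+(?:\.\d+)?)\s*cfs", re.I)

def extract_mentions(post):
    rivers = [r for r in KNOWN_RIVERS if r.lower() in post.lower()]
    flows = [float(f) for f in FLOW_RE.findall(post)]
    return rivers, flows

post = "Ran the Kenduskeag today at about 950 cfs, great level!"
print(extract_mentions(post))
```

Substring matching like this is only a first pass; ambiguous names ("Dead River" the river vs. the place) are where an NLP model earns its keep.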
This was a project I originally undertook in 2016 to predict the flow of the Kenduskeag Stream in Bangor, ME. It involved many components, including initially collecting flow and weather data in a PostgreSQL database, training a time series neural network (NARX) to predict flow in MATLAB, and finally displaying the predictions with ChartJS. Unfortunately, MATLAB is closed source and not easily deployable, so I do not have a working demo at the moment. However, I'm working on recreating our model in Python for both the Kenduskeag and other streams. In the process, I hope to make meaningful contributions and provide valuable insights to Python frameworks like PyFlux and PyNeurGen. View Project
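A NARX model frames the problem by pairing lagged flow values (the autoregressive inputs) and lagged exogenous inputs (here, rainfall) with the next flow value. A minimal sketch of that windowing step, with made-up data and an arbitrary lag depth (the original model was built with MATLAB's NARX tooling):

```python
# Build (features, target) training pairs for a NARX-style model:
# features = the previous `lags` flow readings plus the previous
# `lags` rainfall readings; target = the flow at the current step.
def make_narx_examples(flow, rain, lags=2):
    examples = []
    for t in range(lags, len(flow)):
        features = flow[t - lags:t] + rain[t - lags:t]
        examples.append((features, flow[t]))
    return examples

flow = [100, 120, 150, 200]   # hypothetical flow readings (cfs)
rain = [0.0, 0.5, 1.2, 0.1]   # hypothetical rainfall (inches)
examples = make_narx_examples(flow, rain)
print(examples[0])
```

Any regressor can then be trained on these pairs; what makes it "NARX" rather than plain autoregression is the inclusion of the exogenous rainfall lags alongside past flow.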
This is a Ruby on Rails application that I built at the request of a friend who coordinates the ACA nationals. He wanted an application where racers in the competition could form teams and earn points for their respective teams. So I built a simple Rails application where users could sign up, browse teams, and join teams. The application used Devise for authentication and a PostgreSQL database for storing team data. Gradually, I added more features, such as Facebook login and ways for teams to filter and search for prospective members. It is still online at the link below (though some of the links are now stale). The code is also available on my GitHub. View Project
A real-time map of river flows across America that renders a graph of a river's flow information when it is selected. It also queries our whitewater search engine for results on the selected river. The project uses Elasticsearch (for search results), NodeJS, D3.js, and SocketIO. View Project