# To Hell and Back

06 Jan 2020

This summer I learned a new card game called "To Hell and Back". Like "Oh Hell" and "Rats!", "To Hell and Back" is a trick-taking card game in which you bid the number of tricks you intend to take, and you must take exactly that number of tricks in a hand to score. Make your bid, and you earn your bid plus 10 extra points. Miss your bid and you get zilch. Unlike those variants, "To Hell and Back" starts with every player being dealt one card, and the number of cards dealt increases with each hand until we reach the maximum for the number of players. In one-card hands, the difference between success and failure comes down to the luck of the draw and careful bidding.

Figure 1: Total non sequitur, but I learned how to shuffle cards while playing "To Hell and Back". I've only become decent at shuffling this past year, after a considerable amount of practice. Now I can do a bridge in addition to a riffle shuffle!

I've become fascinated with this one-card case. To better understand the odds of winning it, I've built a hand simulator in which each player in a group is dealt one card and the winner of the trick is determined. Using this simulation, we can estimate how likely each card and each seat position is to win a hand.
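The simulator's code and the game's full rules aren't given here, so the following is only a minimal Python sketch under stated assumptions: a standard 52-card deck with aces high, seat 0 leads, the card turned up after the deal sets trump (an "Oh Hell" convention assumed to carry over), and the highest trump, or failing that the highest card of the led suit, takes the trick. The names `trick_winner` and `simulate` are hypothetical. It reports win rates by seat and by card rank.

```python
import random
from collections import Counter

# Assumed conventions: standard 52-card deck, aces high (11=J, 12=Q, 13=K, 14=A).
RANKS = list(range(2, 15))
SUITS = "CDHS"
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]

def trick_winner(cards, trump):
    """Index of the card that takes the trick; cards[0] is the card led."""
    led_suit = cards[0][1]

    def strength(card):
        rank, suit = card
        if suit == trump:
            return (2, rank)   # any trump beats any non-trump
        if suit == led_suit:
            return (1, rank)   # otherwise the highest card of the led suit wins
        return (0, rank)       # off-suit, non-trump cards can never win

    return max(range(len(cards)), key=lambda i: strength(cards[i]))

def simulate(n_players, n_hands, seed=None):
    """Estimate win rates by seat (seat 0 leads) and by card rank for one-card hands."""
    rng = random.Random(seed)
    wins_by_seat, wins_by_rank, dealt_by_rank = Counter(), Counter(), Counter()
    for _ in range(n_hands):
        deck = DECK[:]
        rng.shuffle(deck)
        hand = deck[:n_players]       # hand[i] is the single card held by seat i
        trump = deck[n_players][1]    # assumed: the next card is turned up for trump
        winner = trick_winner(hand, trump)
        wins_by_seat[winner] += 1
        wins_by_rank[hand[winner][0]] += 1
        dealt_by_rank.update(rank for rank, _ in hand)
    seat_rates = {seat: wins_by_seat[seat] / n_hands for seat in range(n_players)}
    rank_rates = {r: wins_by_rank[r] / dealt_by_rank[r] for r in RANKS if dealt_by_rank[r]}
    return seat_rates, rank_rates

seat_rates, rank_rates = simulate(n_players=4, n_hands=20_000, seed=42)
print(seat_rates)
print(rank_rates)
```

Scoring each card as a (category, rank) tuple keeps the trick logic to a single `max` call; changing the trump convention only requires changing how `trump` is chosen.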

# Python Buffalo: Web Scraping for Good

20 Sep 2017

The topic of the September joint Python Buffalo/Data Science meetup is data scraping. To wrap up our discussion of how we can use Python to scrape data from public sources, I presented a short slide deck on the ethics of web scraping. The general thesis is that whether scraping is ethical depends very much on what you are doing and how you do it, and that in the end we should all strive to be good members of the data community.
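One concrete habit that falls under "how you do it" is honoring a site's `robots.txt`. This illustrative sketch is not taken from the slides: it uses Python's standard-library `urllib.robotparser` on an inline, made-up `robots.txt` (the rules, the user agent, the URLs, and the `default_delay` fallback are all hypothetical) to decide which URLs a scraper may fetch and how long to wait between requests.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; a real scraper would fetch it from the site's /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

def build_parser(robots_txt):
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def polite_plan(parser, user_agent, urls, default_delay=1.0):
    """Keep only URLs the site allows and pick a delay (seconds) between requests.

    default_delay is a hypothetical fallback for sites that set no Crawl-delay.
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    return allowed, delay

parser = build_parser(ROBOTS_TXT)
allowed, delay = polite_plan(
    parser,
    "example-scraper",
    ["https://example.com/public/data.csv", "https://example.com/private/users"],
)
print(allowed, delay)  # prints: ['https://example.com/public/data.csv'] 5
```

A scraper would then sleep for `delay` seconds between requests to the allowed URLs.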

# Visual Guide to DBSCAN Clustering

12 Mar 2017

Clustering has become an everyday tool for grouping observations by shared characteristics. This is particularly true when working with spatial data. For my ongoing research into applying spatial statistics to fluorescence microscopy, I've been applying DBSCAN to binary images of fluorescence-tagged chromosomes to localize individual chromosomes. The scikit-learn Python library provides a blisteringly fast DBSCAN implementation that can cluster 78 million observations in 6 seconds.
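As a rough illustration of the binary-image workflow, here is a minimal sketch using scikit-learn's `sklearn.cluster.DBSCAN` on the coordinates of foreground pixels. The helper name `localize_objects`, the synthetic image, and the `eps`/`min_samples` values are assumptions chosen for the example, not settings from the research described above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def localize_objects(binary_image, eps=1.5, min_samples=3):
    """Cluster the foreground pixels of a 2D boolean image with DBSCAN.

    eps=1.5 links a pixel to its 8-connected neighbours (diagonal distance ~1.41);
    min_samples counts the pixel itself. Both values are illustrative.
    """
    coords = np.argwhere(binary_image)  # (row, col) for every True pixel
    if len(coords) == 0:
        return coords, np.array([], dtype=int)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords)
    return coords, labels  # label -1 marks noise

# Synthetic test image: two square "objects" and one stray pixel.
image = np.zeros((20, 20), dtype=bool)
image[2:6, 2:6] = True      # 4x4 object
image[12:17, 10:15] = True  # 5x5 object
image[19, 0] = True         # isolated pixel, expected to be labeled noise

coords, labels = localize_objects(image)
# One centroid per cluster gives a simple location estimate for each object.
centroids = {int(k): coords[labels == k].mean(axis=0) for k in set(labels) if k != -1}
print(centroids)
```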

Figure 1: Real-time DBSCAN clustering of two sets of normally distributed points in a field of noise. A JavaScript implementation of DBSCAN classifies each two-dimensional coordinate as either noise or a member of one of two (or more) clusters. As a general warning, the data for this example are randomly generated on page load, so it's possible for DBSCAN to identify more than two clusters in a given draw of the data.

As I continued working with the algorithm, I started to think it would be interesting to watch the process unfold step by step on a set of data. To that end, I've created an annotated, step-by-step guide to how DBSCAN clusters data.
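To complement a step-by-step view, here is a minimal pure-Python DBSCAN written to mirror the standard steps: find a point's eps-neighborhood, start a cluster at any core point, expand through neighboring core points, and mark points with too few neighbors as noise, relabeling them as border points if a cluster later reaches them. It is an illustrative O(n²) sketch, not the JavaScript implementation behind the figure nor scikit-learn's; the function names are hypothetical.

```python
import math

NOISE = -1
UNVISITED = None

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE          # may be relabeled later as a border point
            continue
        cluster_id += 1                # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not core
            if labels[j] is not UNVISITED:
                continue                # already assigned; do not expand from it
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))
```

Because points are visited in input order, a border point within reach of two clusters joins whichever cluster reaches it first.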

# The Great Photo Escape: Freeing Images from Kodak's Digital Prison

25 Feb 2017

Starting in 2011, Kodak brought the Kodak Pulse line of digital photo frames to market. In addition to SD card and USB support, these photo frames had an email address that could receive image attachments, store the images on Kodak's servers, and display them hassle-free on the frame. While this feature is a boon for people who like to receive photos from friends and family with minimal delay, one very important feature is missing: the ability to download these images in bulk.

Figure 1: Steve McQueen as Captain Virgil Hilts in the 1963 film _The Great Escape_. Hilts is one of many prisoners of war held in a high-security POW camp during World War II. While I acknowledge that this allusion comes dangerously close to invoking Godwin's Law, the movie is a masterpiece that deserves the occasional mention.

While it is possible to download each image manually using a web browser, this is not an acceptable way to back up images, particularly if your photo album contains thousands of photographs. Rather than download all of these images by hand, I decided to find a programmatic solution. To that end, I created a bulk image crawling script using the Python library Scrapy.
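The crawler described here was built with Scrapy, and the Kodak Pulse site's page structure isn't described, so this sketch deliberately avoids both. It uses only Python's standard-library `html.parser` and `urllib.parse` to show the core step of any bulk image crawler: collecting absolute, de-duplicated image URLs from a page. The class name, function name, and example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(urljoin(self.base_url, src))

def extract_image_urls(html, base_url):
    """Return absolute image URLs in page order, without duplicates."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.image_urls))

page = '<div><img src="/photos/1.jpg"><img src="2.jpg"/><img src="/photos/1.jpg"></div>'
print(extract_image_urls(page, "https://example.com/album/"))
```

In a Scrapy spider, the equivalent step would typically yield items with an `image_urls` field so that Scrapy's built-in `ImagesPipeline` downloads the files.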

# Rememberall: CLI Document Retrieval using Bayesian Inference

17 Sep 2016

A long-standing tradition in scientific research is to keep detailed notes on everything as it happens. This studious attention to detail not only makes analysis and paper writing much easier, but also serves as a record of exactly how an experiment was performed, should it need to be repeated in the future. By looking through a lab notebook, one can repeat an experiment exactly and verify its results.

Figure 1: A laboratory notebook used to record experiment setup, observations, ideas, data, and analysis results. Laboratory notebooks are permanent records of the events that transpired during an experiment, the experimenter's thoughts and observations along the way, and the experimental results. These records are an invaluable resource when communicating research, and are often a legally binding record of the research that was conducted.

Although I have since moved away from the laboratory, I still keep detailed records of my work in a series of markdown files detailing the steps I take as I perform data analyses and develop software. Over the last few months, however, I've found that the volume of my notes has grown too large to simply grep for keywords.

To make it easier for me to find project- or task-specific development notes, I developed a Rust-based CLI tool called Rememberall, which uses term frequency and Bayesian inference to retrieve documents relevant to a query of keywords.
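Rememberall's exact scoring model isn't spelled out here, and the tool itself is written in Rust. As a hedged illustration only, here is a Python sketch of one common way to combine term frequency with Bayes' rule: a smoothed multinomial "query likelihood" model with a uniform prior over documents, so that P(doc | query) is proportional to P(query | doc). The function names, the `alpha` smoothing parameter, the tokenizer, and the sample notes are assumptions, not Rememberall's API.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_documents(docs, query, alpha=1.0):
    """Return [(name, posterior)] sorted from most to least probable.

    Each document is a multinomial over terms, estimated from its term
    frequencies with add-alpha (Laplace) smoothing.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    query_terms = tokenize(query)
    vocab_size = len(set().union(*counts.values()) | set(query_terms))
    log_prior = -math.log(len(docs))  # uniform prior over documents

    log_scores = {}
    for name, tf in counts.items():
        total = sum(tf.values())
        score = log_prior
        for term in query_terms:
            score += math.log((tf[term] + alpha) / (total + alpha * vocab_size))
        log_scores[name] = score

    # Normalize with the log-sum-exp trick to get posterior probabilities.
    best = max(log_scores.values())
    weights = {name: math.exp(s - best) for name, s in log_scores.items()}
    z = sum(weights.values())
    return sorted(((name, w / z) for name, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)

notes = {
    "rust.md": "cargo build rust borrow checker rust",
    "python.md": "pip install pandas dataframe python",
    "meetings.md": "meeting notes rust python",
}
for name, p in rank_documents(notes, "rust borrow"):
    print(f"{name}: {p:.3f}")
```

Smoothing keeps a single missing keyword from zeroing out an otherwise relevant note, which matters for short queries against short markdown files.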