This summer I learned a new card game called "To Hell and Back". Like "Oh Hell" and "Rats!", "To Hell and Back" is a trick-taking card game where you bid the number of tricks you intend to take, and you must take exactly that many tricks in a hand to score. Bid correctly and you earn your bid plus 10 bonus points; miss your bid and you get zilch. Unlike those variants, "To Hell and Back" starts with each player being dealt a single card, and the number of cards dealt grows each hand until it reaches the maximum for the number of players. In the one-card hands, the difference between success and failure comes down to luck of the draw and careful bidding.
I've become fascinated with this one-card case. To better understand the odds of winning in this scenario, I've built a hand simulator, wherein a group of players is dealt one card each and the winner is determined. Using this simulation, we can see just how likely each card and seating position is to win a hand. Read more...
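A simulator along these lines can be sketched in a few dozen lines of Python. This is not the post's actual code; it assumes a simplified winner rule (the first seat leads, and the highest card of the led suit takes the trick, ignoring trump), with `simulate_hand` and `win_rates` as hypothetical names:

```python
import random
from collections import Counter

RANKS = list(range(2, 15))                 # 2..10, J=11, Q=12, K=13, A=14
SUITS = ["clubs", "diamonds", "hearts", "spades"]

def simulate_hand(num_players, rng):
    """Deal one card per player and return the winning seat index.

    Simplified rule (an assumption, not the full game): seat 0 leads,
    and the highest card of the led suit wins; trump is ignored.
    """
    deck = [(rank, suit) for rank in RANKS for suit in SUITS]
    rng.shuffle(deck)
    cards = deck[:num_players]
    led_suit = cards[0][1]
    # Only cards matching the led suit can win the trick.
    eligible = [i for i, (rank, suit) in enumerate(cards) if suit == led_suit]
    return max(eligible, key=lambda i: cards[i][0])

def win_rates(num_players, trials=10_000, seed=0):
    """Estimate each seat's probability of winning a one-card hand."""
    rng = random.Random(seed)
    wins = Counter(simulate_hand(num_players, rng) for _ in range(trials))
    return {seat: wins[seat] / trials for seat in range(num_players)}
```

Under this rule the leader has a large built-in edge: every other seat must happen to hold the led suit even to compete for the trick.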
The topic of the September joint Python Buffalo/Data Science meet-up was data scraping. To finish up our conversation about how we can use Python to scrape data from public sources, I presented a short slide deck on the ethics of web scraping. The general thesis is that the answer very much depends on what you are doing and how you do it, and that in the end we should all strive to be good members of the data community. Presentation.
Clustering has become an everyday process for grouping together observations based on shared features. This is particularly true when working with spatial data. For some of my ongoing research into applying spatial statistics to fluorescence microscopy, I've been applying DBSCAN to binary images of fluorescence-tagged chromosomes in order to localize them. The scikit-learn Python library provides a blisteringly fast DBSCAN implementation that can cluster 78 million observations in 6 seconds.
As I continued working with the algorithm, I started to think it would be interesting to see the process unfold step by step on a set of data. To that end, I've created an annotated step-by-step guide to how DBSCAN clusters data. Read more...
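To make the step-by-step process concrete, here is a minimal from-scratch DBSCAN in pure Python (a teaching sketch, not the scikit-learn implementation, which uses spatial indexing to reach the speeds quoted above). Each point is either unvisited, noise (`-1`), or assigned a cluster id; clusters grow outward from core points:

```python
import math
from collections import deque

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def _region_query(points, i, eps):
    """All indices within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if _dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label 2-D points with cluster ids 0, 1, ... or -1 for noise."""
    labels = [None] * len(points)       # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = _region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1              # noise (may later become a border point)
            continue
        labels[i] = cluster             # i is a core point: seed a new cluster
        seeds = deque(neighbors)
        while seeds:                    # grow the cluster outward
            j = seeds.popleft()
            if labels[j] == -1:
                labels[j] = cluster     # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = _region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels
```

This naive version scans every point for each neighborhood query, so it runs in O(n²); the production implementations get their speed from k-d trees or ball trees for the region queries.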
Starting in 2011, Kodak brought to market the Kodak Pulse line of digital photo frames. In addition to SD card and USB support, this line of photo frames had an email address which could receive image attachments, store the images on Kodak's servers, and display them hassle-free on the frame. While this feature is a boon for people who like to receive photos from friends and family with minimal latency, one very important feature is missing -- the ability to download these images in bulk.
While it is possible to manually download each image using a web browser, that is not an acceptable way to back up images, particularly if your photo album contains thousands of photographs. Rather than downloading them all by hand, I decided a programmatic solution must exist. To that end, I created a bulk image-crawling script using the Python library Scrapy. Read more...
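The core of any such backup script is the download loop. The sketch below is not the post's Scrapy spider; it is a hypothetical standard-library version of the same idea (the `image_filename` and `download_all` helpers are names invented here), assuming you have already collected the image URLs:

```python
import os
import urllib.request
from urllib.parse import urlsplit

def image_filename(url, index):
    """Derive a local filename from an image URL (hypothetical helper)."""
    name = os.path.basename(urlsplit(url).path)
    # Fall back to a numbered name when the URL path has no filename.
    return name or f"image_{index:04d}.jpg"

def download_all(urls, dest_dir):
    """Fetch every image URL into dest_dir, skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    for i, url in enumerate(urls):
        path = os.path.join(dest_dir, image_filename(url, i))
        if os.path.exists(path):
            continue  # makes the crawl resumable after an interruption
        urllib.request.urlretrieve(url, path)
```

A framework like Scrapy adds the pieces this sketch omits: crawling the album pages to discover the URLs in the first place, concurrency, throttling, and retries.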
A long-standing tradition in scientific research is to keep detailed notes on everything as it happens. This studious attention to detail not only makes analysis and paper writing much easier, but also serves as a record of exactly how an experiment was performed should it need to be repeated in the future. By looking through a lab notebook, an experiment can be repeated exactly, and its results can be verified.
Although I have since moved away from the laboratory, I still keep detailed records of my work in a series of markdown files detailing the steps taken as I perform data analyses and develop software. Over the last few months, however, I've found that the volume of my notes has grown too large to simply grep for keywords.
To make it easier to find project- or task-specific development notes, I developed a Rust-based CLI tool called Rememberall, which uses term frequency and Bayesian inference to retrieve documents relevant to a keyword query. Read more...
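The scoring idea behind this kind of retrieval can be sketched briefly. Rememberall itself is written in Rust and its exact model isn't shown here; the Python below is an illustrative naive-Bayes-style ranker (with a uniform prior and add-one smoothing, both assumptions of this sketch) built on per-document term frequencies:

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def rank(query, documents):
    """Return document indices ordered by a naive-Bayes-style score.

    Scores log P(doc) + sum(log P(term | doc)) for each query term,
    where P(term | doc) uses term frequency with add-one smoothing.
    """
    vocab = {t for doc in documents for t in tokenize(doc)}
    terms = tokenize(query)
    scores = []
    for idx, doc in enumerate(documents):
        counts = Counter(tokenize(doc))
        total = sum(counts.values())
        score = math.log(1 / len(documents))   # uniform prior over documents
        for t in terms:
            # Smoothing keeps unseen terms from zeroing out a document.
            score += math.log((counts[t] + 1) / (total + len(vocab)))
        scores.append((score, idx))
    return [idx for _, idx in sorted(scores, reverse=True)]
```

For a small corpus of markdown notes this brute-force scan is plenty fast; a larger corpus would want an inverted index so each query only touches documents containing the query terms.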