2020 was an odd one. There was jubilation, and there was grief. There was frustration, and there was pride. While I have always been one to reflect, 2020 is the first year that I've collected a full year of mood tracking data. In this case, the data can paint a very accurate portrait of the rollercoaster ride that was 2020.
Figure 1: Daily average mood heatmap for 2020. Mood data was collected multiple times per day on a 1 (Awful) to 5 (Rad) scale and aggregated at a daily grain. Significant days -- highlighted with red borders -- are the day my daughter was born (2/6/2020) and the day I was diagnosed with a bone tumor (10/13/2020).
Read more...
This summer I learned a new card game called "To Hell and Back". Similar to "Oh Hell" and "Rats!", "To Hell and Back" is a trick-taking card game where you bid the number of tricks you intend to take, and you must take exactly that number of tricks per hand in order to win. Bid correctly, and you earn your bid and 10 extra points. Miss your bid and you get zilch. Unlike those variations, "To Hell and Back" starts with all players being dealt one card, and the number of cards dealt increases each hand until we hit the maximum for all players. In the case of one-card hands, the differences between success and failure are luck of the draw and careful bidding.
Figure 1: Total non sequitur, but I learned how to shuffle cards while playing "To Hell and Back". I've only become decent at shuffling cards this last year after a considerable amount of practice. Now I can bridge in addition to a riffle shuffle!
I've become fascinated with this one-card case. To better understand the odds of winning this specific scenario, I've constructed a hand simulator, wherein a group of players are dealt one card, and the winner is determined. Using this simulation, we can understand just how likely each card and position is to win a hand.
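A minimal sketch of what such a one-card simulator could look like, in Python. The rules encoded here are assumptions borrowed from common "Oh Hell" conventions rather than taken from the post: the card turned up after the deal sets the trump suit, seat 0 leads, the highest trump wins, and otherwise the highest card of the led suit wins. The function names (`deal_one_card_hand`, `trick_winner`, `simulate`) are hypothetical.

```python
import random
from collections import Counter

RANKS = list(range(2, 15))  # 2..14, with 14 standing in for the ace (assumed ace-high)
SUITS = "CDHS"


def deal_one_card_hand(n_players, rng):
    """Shuffle a 52-card deck, deal one card per seat, and turn up a trump card."""
    deck = [(rank, suit) for suit in SUITS for rank in RANKS]
    rng.shuffle(deck)
    hands = deck[:n_players]
    trump_suit = deck[n_players][1]  # assumption: the next card sets trump
    return hands, trump_suit


def trick_winner(hands, trump_suit):
    """Return the seat index that takes the trick; seat 0 is assumed to lead."""
    led_suit = hands[0][1]

    def strength(card):
        rank, suit = card
        if suit == trump_suit:
            return (2, rank)  # any trump beats any non-trump
        if suit == led_suit:
            return (1, rank)  # otherwise the led suit is the only suit that can win
        return (0, rank)      # off-suit, non-trump cards never win

    return max(range(len(hands)), key=lambda seat: strength(hands[seat]))


def simulate(n_players=4, n_hands=20000, seed=0):
    """Estimate how often each seat wins a one-card hand."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_hands):
        hands, trump_suit = deal_one_card_hand(n_players, rng)
        wins[trick_winner(hands, trump_suit)] += 1
    return {seat: wins[seat] / n_hands for seat in range(n_players)}


if __name__ == "__main__":
    for seat, share in simulate().items():
        print(f"seat {seat}: {share:.3f}")
```

Under these assumed rules the leading seat has a built-in edge, because its suit is always the led suit. Tallying wins by card as well as by seat would only require keying the counter on `hands[winner]`.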
Read more...
The topic of the September joint Python Buffalo/Data Science meet-up is data scraping. To finish up our conversation about how we can use Python to scrape data from public sources, I presented a short slide deck on the ethics of web scraping. The general thesis is that it very much depends on what you are doing and how you do it, and that in the end we should all strive to be good members of the data community. Presentation.
Clustering has become an everyday process for grouping together observations based on similar factors. This is particularly true when working with spatial data. For some of my ongoing research into applying spatial statistics to fluorescence microscopy, I've been applying DBSCAN to binary images of fluorescence-tagged chromosomes to localize chromosomes. The scikit-learn Python library provides a blisteringly fast DBSCAN implementation that can cluster 78 million observations in 6 seconds.
Figure 1: Real-time DBSCAN clustering of two sets of normally distributed points in a field of noise. A JS implementation of DBSCAN classified sets of two-dimensional coordinates as being either noise or one of two (or more) clusters. As a general warning, the data used for this example are randomly generated on page load, so it's possible to identify more than two clusters in this dataset due to the non-deterministic nature of both the data and DBSCAN.
As I continued working with the algorithm, I started to think that it would be interesting to see the process unfold step by step for a set of data. To that end, I've created an annotated step-by-step guide to how DBSCAN clusters data.
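For a rough idea of the steps involved, here is a brute-force sketch of the textbook DBSCAN procedure in plain Python. It is neither the JS implementation behind the figure nor scikit-learn's optimized version. The names `region_query` and `dbscan` are hypothetical, and `min_samples` is assumed to count the point itself.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it outward
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core, so keep expanding
    return labels


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
    print(dbscan(points, eps=1.5, min_samples=3))
```

Each neighborhood lookup here scans every point, so the sketch is quadratic in the number of observations. Fast implementations typically lean on spatial indexing to avoid that.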
Read more...
Starting in 2011, Kodak brought to market the Kodak Pulse line of digital photo frames. In addition to SD card and USB support, this line of photo frames had an email address which could receive image attachments, store the images on Kodak's servers, and display the images hassle-free on the digital photo frame. While this feature is a boon for people who like to receive photos from friends and family with minimal latency, there is one very important feature missing -- the ability to download these images in bulk.
Figure 1: Steve McQueen as Captain Virgil Hilts in the 1963 film _The Great Escape_. Captain Virgil Hilts is one of many prisoners of war imprisoned in a high security POW camp during World War II. While I acknowledge that using this allusion is dangerously close to invoking Godwin's Law, the movie is a masterpiece that deserves the occasional mention.
While it was possible to manually download each image using a web browser, this is not an acceptable means of backing up images, particularly if your photo album contains thousands of photographs. Instead of trying to download all of these images manually, I decided that a programmatic solution must exist. To that end, I created a bulk image crawling script using the Python library Scrapy.
Read more...