Presenting at hackNY Masters: 3 statistical tricks
If you're in NYC today around NYU, I'm presenting at the HackNY Masters conference. I'll be presenting at 8:30---the topic being "3 Statistical Tricks Every Hacker should know."
Read more…Since moving to New York from Chicago, every time I mention to New Yorkers that Chicago is a much better city, they almost always reply, "But Chicago is so cold!" I have brushed this off as a parochial New York-ism for some time ("But isn't everywhere west of the Hudson cold and miserable?"), feeling that in my experience of the two cities, they're both about equally cold. New York is at 40.7N, Chicago is at 41.8N---a north-south gap of roughly 75 miles, or a little over an hour's drive. I'd hardly believe that amounts to a significant change in temperature.
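As a rough check on that latitude gap, here is a minimal Python sketch. It assumes one degree of latitude is about 69 statute miles (an approximation, and not a figure from the post), and the function name is made up for illustration. Under that assumption the gap comes out nearer 76 miles.

```python
# Assumption: one degree of latitude spans roughly 69 statute miles.
MILES_PER_DEGREE_LATITUDE = 69.0


def north_south_miles(lat_a, lat_b):
    """Approximate north-south distance in miles between two latitudes (degrees)."""
    return abs(lat_a - lat_b) * MILES_PER_DEGREE_LATITUDE


# Latitudes quoted in the post: New York 40.7N, Chicago 41.8N.
gap = north_south_miles(40.7, 41.8)
print(f"North-south gap: about {gap:.0f} miles")
```

This only measures how much farther north Chicago sits; it ignores the east-west distance between the cities entirely.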
Read more…I made this point with Venkatesh Rao in our Future of Data Project for the CSC, but found myself at a table at the MongoDB NYC conference talking through it again---I think it's a useful enough idea, and maybe even an original enough one, to devote a post to.
Read more…Just some quick news: I've recently taken a job as a Data Scientist at a future-of-education startup in New York. This means Design & Analytics is no longer taking new paid clients, but I will keep posting to the site blog on techy things related to design, data visualization, R, networks, and amateur cartography. Also, still feel welcome to shoot me an email if you have any questions about anything I've posted.
Read more…...and maps are great. As I learn mapping software, I've shown the following map to three smart people, who didn't notice anything out of the ordinary, except the sharp cut-off of Antarctica:
Read more…It's beginner's night in Chicago. Alongside Chase Carpenter, Paul Teetor, and Jeffrey Ryan, I'm teaching Data Munging 101 in R at Jak's Tap. The meetup information is available here.
Read more…Continuing on my recent project of making interactive network graphs (on projects where other brilliant people have already done the difficult querying), here's a visualization of the CRAN package dependencies. Sure, the philosophers graph is maybe more interesting, but this is a very real map of people power, too---and maybe as influential in these statistical times.
Read more…Resuming from last time, I've made some updates to the philosophers' social network, including publishing two interactive maps. Quick introduction: you know that sidebar on Wikipedia where it tells you someone was influenced by someone else, linking to them? These graphs are generated by asking Wikipedia for a comprehensive list of every philosopher's influence on every other. There are some sample-bias issues and data problems that I went over in the first part of the series, but overall it's both beautiful and interesting.
Read more…I was surprised I hadn't seen this graphic at Drunks and Lampposts, made with Gephi, until a friend posted it on Facebook last week. The original is here, and here's my version:
Read more…According to the iShares blog, the Chicago Fed National Activity Index (CFNAI) was recently named the most underrated index for measuring the US economy's health. I wholeheartedly agree. I co-ran this beauty when I worked at the Fed in 2008-2009 and automated its graphing in Matlab. If you look back in the archives from around then, you'll see...
Read more…I really liked this story on the 37signals blog yesterday, where Jason Fried explained the process of seemingly serendipitous events that led to his being asked to write an opinion piece in the New York Times. I've been working with network visualization lately and turned his story into the graphic below, which they've kindly posted back to the 37signals blog.
Read more…Mr. Tempo himself is working on a new project called Game of Pickaxes. You can sign up for his early-phase mailing list on the book's pre-launch site.
Read more…In Part 2, we showed how to add recession shading to a plot of American Beards over time, and did some diagnostics to check whether 19th-century Americans grew recession beards. (Spoiler alert: it appears they did not.) In Part 1, we showed how to plot the series in the first place. Today, we're going to look at the beardly trend over the period. We all know about the Gilded Age popularity of mutton chops and sideburns, but were full beards on the rise or on the decline between 1866 and 1911? And more importantly, what can this period tell us about beards of the future (in the past)?!
Read more…Checking my web analytics, I noticed that Design & Analytics is now in the coveted #4 Google hit position for "weird data sets." That means I'm probably pretty close to getting Nike sponsorship and my face on a Wheaties box in the data olympics category.
Read more…In Part 1, we showed how to plot a time series of the change in American beards over time, using a dataset from Robert Hyndman's time series data library. Today, we're going to look at whether the dramatic changes in American male beardfulness seem related to the economy. Did Americans grow recession beards in response to the Panic of 1873? Out of work, did they forgo their frequent trips to the barber (Gillette didn't invent the personal safety razor until 1904, so they couldn't do it themselves)? Did they go on their job interviews with a face full of mutton chops and just never get called back (by telegram)?
Read more…The short answer is that there were about 21 million American veterans as of 2012. That's about 6.8% of the US population.
Read more…If you use R for time series analysis, chances are you've used Robert J. Hyndman's excellent forecast tools. I recently stumbled on his time series data library where I found just the data set I've been looking for to show some R time series plotting tricks:
Read more…Just a minor note for the observant---I'm reprinting and reposting the ARIMA sector reports since March 12. I was using ggplot2 as my graphics package (in R), which changed the way it formatted dates in a recent upgrade. Since March, this meant the ARIMA forecast plots had the date format "2012-03-12" rather than simply listing full months by name, such as "March." Past data was rerun, so everything else is identical, but the format of the date axis has been updated to the original, monthly format for purely aesthetic reasons.
Read more…Mentioned near the end, Reuters picked up our ECN launch last week: http://www.reuters.com/article/2012/05/09/finance-crowdfunding-idUSL5E8G50RB20120509
Read more…Gather 'round, R-users, forecasters, algo traders, and financial analysts. The R in Finance conference is this Friday and Saturday, May 11 & 12, in Chicago. All your favorite buzzwords, from algo trading, to data scientist, to big data, will be there---with talks given by the top experts in the R landscape. Jeff Ryan's pre-conference session on processing full-market data looks particularly solid. Sign up now while there's still space. I'll be attending, and you're welcome to contact me before to arrange a meet-up. If you're an analyst or data professional, but finance isn't your field, I still recommend going. The R finance community has been a big source of innovation on large data set processing, real-time processing, and time series analysis.
Read more…This is a project I'm pleased to have worked on with the Loft Finance team: http://europecrowdfunding.org/ The European Crowdfunding Network launched last week. Its mission is to provide the groundwork for founding a Europe-wide Crowdfunding Association for funding startups and small businesses---using the power of P2P to get new ventures up and running. You can follow them on Twitter, join the mailing list to learn about events, or contact your country's ECN ambassador to learn more.
Read more…I spent the weekend playing with R's mapping capabilities. For most purposes, I think you can accomplish more by writing raw SVG wrapped in a Flash interface to allow easy web interactivity, like what we did for CAFf. However, there are some applications for which a good static choropleth map can be very useful---like when you want quick situational awareness of important numbers that change at slow frequency.
Read more…On the topic of Generativity and Creativity: entrepreneurs doing impulsive, crazy things, and their stories of making that leap. Take a look at the full event write-up on Ribbonfarm. I intend to speak there about a design philosophy---and how it impacts my approach to numbers, art, and risk. The event is fully booked, but I'm sure Venkat will post an excellent summary.
Read more…This is a repost from the R-bloggers mailing list, with a quick script showing credit rating on a global map. It displays sovereign credit ratings by S&P, Fitch, Moody's, and Chinese rating firm, Dagong, and demonstrates how easy it's become to create beautiful data visualizations.
Read more…Regular watchers of the ARIMA Sector Forecast report will notice that the release date has changed slightly, and a couple of recent market dates have been omitted. Apologies. Since this is a demo report, it's actually being run from a standard laptop on a cron schedule, rather than server-hosted. While this should show how easy it is to automate reports like this even with simple hardware, it's also been the case that my machine is occasionally not turned on at 5:15pm EST due to a hectic travel schedule. I can rerun historical reports on request, so please contact me if there's one you'd like to see. Also, please notice that the actual release time has been updated to reflect an erratic and unpredictable travel schedule. Thanks!
Read more…The sector forecasts will be down for maintenance tomorrow, Thursday, December 15. However, for historical purposes, the Thursday forecast will be posted afterwards on Friday, December 16.
Read more…There is a fantastic analysis, blogged over at Dataspora, that uses the Lending Club datastore to investigate and visualize the Lending Club marketplace. As I've mentioned before, I'm intrigued with P2P finance as an emerging industry, and if we're lucky, as an alternative asset class. As with any emerging industry, we're seeing a number of different competitors rapidly enter and exit, and regulations that fit it like a hand-me-down sweater: a little too big, and clearly made for the problems of a different industry, the financial older brother these firms most resemble but are distinct from.
Read more…For several months now, I've been putting together an automated econometric forecasting platform. Using a simple ARIMA model, I've created a forecast for stock sectors to inform my own short-term option trades. Despite being a statistical model, rather than a foundational one about informationally poor stocks, I found it useful for my purposes---though I make no promises about yours---and developed this infrastructure in order to share it online. Even if you're not interested in stock market movements, the further utility of the platform is as a demonstration of the capabilities of reports automation: this platform runs an analysis every day, makes forecasts, charts, typesets instructions and accompanying advertisements, and publishes them to PDF and in an animated gallery online. This is an analysis that, once automated, never again needs human intervention. That's a powerful thing, especially in how it frees this human to solve new problems.
Read more…Working on a project on asset allocation right now, and discovered two excellent references. The first is a solid guide and introduction to implementing the Black-Litterman model: http://www.blacklitterman.org/. This reference provides useful implementations in both Excel and Matlab, as well as a discussion of the inputs. The discussion of the controversial and elusive tau term is particularly helpful. The second item is a data source, a comprehensive ETF reference, http://etfdb.com/, which lets you search the ETF universe along several parameters. ETFs can provide great simplification in an asset allocation model, and this reference allows you to sort by issuer and by country exposure.
Read more…This is the story of a foreign language data mashup, and how thinking about study-time as an asset with returns can make your language-learning more efficient---in theory.
Read more…I've posted before about Google Correlate in "Google Correlate for fun and profit." It's a fantastic platform, but I have not yet discovered any practical use cases for it. This is still the case, but after experimenting with it more, I now have a better idea of what they would need to do (and why they can't do it) to make it more useful. First off, you might want to take a look at their white paper, available here, or attached at the end of this article. The relevant point is that according to their methodology:
Read more…I'm optimistic about the efficiency gains coming from direct P2P entering the finance market, and with some skeptical reservation believe that this is where finance is heading generally---disintermediation can be a beautiful thing for market efficiency and participatory transparency. As a survey: sites like Lending Club and Prosper have gone far already in establishing the market for P2P credit. Kiva and Grameen have done impressive things for P2P donation and microcredit models, and Kickstarter has been exceptional in artistic donation services. Rather than waiting for banking institutions to evaluate the profitability of their ideas or causes, individuals are gaining increasing access to funding the opportunities they support, on terms they define themselves.
Read more…There was a New York Times Science article yesterday on an offering from Stanford Engineering. In the spirit of the Khan Academy, which I greatly admire, an artificial intelligence course is being offered free and online, taught by Sebastian Thrun and Peter Norvig. There are currently about 63,000 registrants.
Read more…Searching for this report, I found more meta-coverage than links to the source. Below is the full text of the document itself, and the decision rendered by S&P in the downgrade, titled "Research Update: United States of America Long-Term Rating Lowered To 'AA+' On Political Risks And Rising Debt Burden; Outlook Negative." Reposted directly from S&P.
Read more…...and great google artists duplicate independently.
Read more…I was pleased to see this posted in the Alumni Notes of my Alma Mater recently---Justin Joque, a Data Librarian at the University of Michigan, put together a Data "Auralization" of the Dow from 1928 to 2011. He uses tick data to create two audio bands, one with closing price, and one with trade volume. Here's the Vimeo Link:
Read more…On several quick projects and tasks, Wrangler has saved me days of brute manipulation, and helped me explore new methods faster. It's a fantastic tool for getting your data into the format you need it, so you can start doing more interesting things with it: http://vis.stanford.edu/wrangler/
Read more…...well, if you find the "profit" application, let me know.
Read more…Great article at Design Festival on using prime numbers to make the period of repetitive patterns very large, making tiling less visible:
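The idea behind that trick can be sketched in a few lines of Python. When several repeating layers are stacked, the combined pattern only repeats at the least common multiple of the layer widths, so pairwise coprime widths (distinct primes, for example) push that period far out. The widths and the function name below are illustrative assumptions, not values from the article.

```python
from functools import reduce
from math import gcd


def combined_period(widths):
    """Least common multiple of the layer widths: the distance after which
    all layers line up again and the overall pattern repeats."""
    return reduce(lambda a, b: a * b // gcd(a, b), widths, 1)


# Hypothetical layer widths in pixels, chosen only for illustration.
round_widths = [100, 200, 400]  # share factors, so the pattern repeats quickly
prime_widths = [29, 37, 53]     # distinct primes, so the period is their product

print(combined_period(round_widths))
print(combined_period(prime_widths))
```

With the round widths the stack repeats every 400 pixels, while the three small primes stretch the period to 56,869 pixels even though each layer is narrower.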
Read more…One of the first things that drew my attention to combining math and art, and signaled to me that their combination might still be useful to both, was teaching myself ActionScript. With little Flash programming experience, I had to fall back on models that I knew---methods for financial series, simulation, and state-space models, including forecasting and credit risk. So while I was learning a new tool, I wondered whether there was any insight to be gained from making them beautiful, something you might see by visualizing the same data differently than before.
Read more…Thanks, everyone, and especially early clients, for helping Design & Analytics get off the ground. This page will feature stories, analyses, and perspectives on topics relevant to both the design and analytics fields. We'll aim to post here things that are either...
Read more…As an overall goal, Design & Analytics aims to make things faster, more efficient, and cleaner.
Read more…I'm Adam Hogan. I'm a technology generalist working with freelance consultants in several fields to solve problems, often for startups.
Read more…The fish-tank below showcases design aesthetics and mathematical modeling working together. Refresh the page to see it randomize differently, or go here for an in-depth explanation of the model, and how it relates to both art and finance.
Read more…Design and Analytics makes broken things work, puzzling things intelligible, slow things fast, and ugly things beautiful---with applications in time series finance, geographic information systems, and social network analysis.