{"id":2560,"date":"2015-08-29T23:52:06","date_gmt":"2015-08-29T23:52:06","guid":{"rendered":"https:\/\/ideasillustrated.com\/blog\/?p=2560"},"modified":"2018-04-18T18:18:10","modified_gmt":"2018-04-18T18:18:10","slug":"anatomy-of-an-analysis-part-1","status":"publish","type":"post","link":"https:\/\/ideasillustrated.com\/blog\/2015\/08\/29\/anatomy-of-an-analysis-part-1\/","title":{"rendered":"Anatomy of an Analysis (Part 1)"},"content":{"rendered":"<p>A few weeks ago, the BBC News produced a list of <a href=\"http:\/\/www.bbc.com\/culture\/story\/20150720-the-100-greatest-american-films\" target=\"_blank\" rel=\"nofollow noopener\">the top 100 greatest American films<\/a> based on input from critics from around the world.<\/p>\n<p>Here are the top ten films presented in rank order:<\/p>\n<ol>\n<li>Citizen Kane (Orson Welles, 1941)<\/li>\n<li>The Godfather (Francis Ford Coppola, 1972)<\/li>\n<li>Vertigo (Alfred Hitchcock, 1958)<\/li>\n<li>2001: A Space Odyssey (Stanley Kubrick, 1968)<\/li>\n<li>The Searchers (John Ford, 1956)<\/li>\n<li>Sunrise (FW Murnau, 1927)<\/li>\n<li>Singin\u2019 in the Rain (Stanley Donen and Gene Kelly, 1952)<\/li>\n<li>Psycho (Alfred Hitchcock, 1960)<\/li>\n<li>Casablanca (Michael Curtiz, 1942)<\/li>\n<li>The Godfather Part II (Francis Ford Coppola, 1974)<\/li>\n<\/ol>\n<p>There is really nothing too surprising here. Perennial favorite <em>Citizen Kane<\/em> tops the list followed by <em>The Godfather<\/em> and <em>Vertigo<\/em> &#8212; two of the most famous films (by two of the most famous directors) ever produced. Perusing the full list, you might recognize a few other titles and maybe think about adding some of them to your Netflix queue. But that&#8217;s about it. Aside from a handful of <a href=\"http:\/\/www.bbc.com\/culture\/story\/20150720-whats-so-good-about-citizen-kane\" target=\"_blank\" rel=\"nofollow noopener\">ancillary stories<\/a>, there was little additional commentary to draw you deeper into the story. 
Sensing an opportunity, I decided to use this list to demonstrate the steps involved in a quick and simple analysis of data found \u201cin the wild.\u201d<\/p>\n<p>Here follows a demonstration of my 5-step program for data analysis:<\/p>\n<p><strong>Source<\/strong><\/p>\n<p>The BBC asked each critic to submit a list of the ten films they felt were the greatest in American cinema (\u201c\u2026 not necessarily the most important, just the best.\u201d). For the project, an \u201cAmerican film\u201d was defined as any movie that received funding from a U.S. source. This definition included many films by foreign directors as well as films shot outside of the country. The highest-ranking film on each list received ten points, the next film down received nine points, and so on. The tenth pick received one point. All the points were then tallied to produce the final list.<\/p>\n<p><strong>Processing<\/strong><\/p>\n<p>Even though the resulting \u201clisticle\u201d is fairly simple, it contains a lot of interesting information just waiting to be freed from its icy confines. 
I pulled the list into Excel and used some very basic string (text) manipulation to create four basic fields from each row of information:<br \/>\n<a href=\"https:\/\/ideasillustrated.com\/blog\/wp-content\/uploads\/2015\/08\/List_String_Manipulation_1-e1440885417957.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2561\" src=\"https:\/\/ideasillustrated.com\/blog\/wp-content\/uploads\/2015\/08\/List_String_Manipulation_1-e1440885417957.png\" alt=\"List_String_Manipulation_1\" width=\"700\" height=\"228\" \/><\/a><\/p>\n<p>Additional manipulation of the \u201cYear\u201d field yields a useful grouping category:<br \/>\n<a href=\"https:\/\/ideasillustrated.com\/blog\/wp-content\/uploads\/2015\/08\/List_String_Manipulation_2-e1440885398194.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2562\" src=\"https:\/\/ideasillustrated.com\/blog\/wp-content\/uploads\/2015\/08\/List_String_Manipulation_2-e1440885398194.png\" alt=\"List_String_Manipulation_2\" width=\"700\" height=\"228\" \/><\/a><\/p>\n<p>With the creation of these five fields, I now have a flexible database instead of a rigid list. <\/p>\n<p><strong>Organization<\/strong><\/p>\n<p>The final data set was \u201cstored\u201d as a table in a simple spreadsheet. Although I have many problems using Excel for data storage (more on that in a future post), it is a quick and easy way to organize small sets of data.<\/p>\n<p><strong>Transformation<\/strong><\/p>\n<p>Once the data was in the format I wanted, I created a pivot table that allowed me to manipulate information in different ways. 
I was particularly interested in answering questions like &#8220;Who are the top directors?&#8221;, &#8220;When were most of these films made?&#8221;, and &#8220;Was there ever a &#8216;Golden Age&#8217; of modern cinema?&#8221; Most of these questions can be answered through simple grouping and summarization.<\/p>\n<p><strong>Serve<\/strong><\/p>\n<p>After all that work, it&#8217;s time to pull together the results and display them in some way. For this exercise, that means a few simple tables and charts:<\/p>\n<p>TOP 10 DIRECTORS IN AMERICAN FILM<\/p>\n<table id=\"hor-minimalist-b\" summary=\"Top Directors\">\n<thead>\n<tr>\n<th scope=\"col\">Director<\/th>\n<th scope=\"col\"># of Films in the Top 100<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Stanley Kubrick<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>Steven Spielberg<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>Alfred Hitchcock<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>Billy Wilder<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>Francis Ford Coppola<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>Howard Hawks<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>Martin Scorsese<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>John Ford<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>Orson Welles<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>Charlie Chaplin<\/td>\n<td>3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>TOP YEARS IN AMERICAN FILM<\/p>\n<table id=\"hor-minimalist-b\" summary=\"Top Years\">\n<thead>\n<tr>\n<th scope=\"col\">Year<\/th>\n<th scope=\"col\"># of Films in the Top 100<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>1975<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>1980<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>1974<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>1959<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>1939<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>1941<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>1977<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>1946<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>1994<\/td>\n<td>3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>TOP DECADES IN AMERICAN FILM<br \/>\n<a 
href=\"https:\/\/ideasillustrated.com\/blog\/wp-content\/uploads\/2015\/08\/Top_Film_Decades_1-e1440891216444.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/ideasillustrated.com\/blog\/wp-content\/uploads\/2015\/08\/Top_Film_Decades_1-e1440891216444.png\" alt=\"Top_Film_Decades_1\" width=\"700\" height=\"441\" class=\"aligncenter size-full wp-image-2572\" \/><\/a><\/p>\n<p>These simple presentation tools start to tell some interesting stories and &#8212; like all good analysis tools &#8212; start to hint at additional avenues of exploration. For example, while two of the directors with five films in the Top 100 (Kubrick, Hitchcock) also made it into the Top 10, the other two (Spielberg and Wilder) did not &#8230; why? The year with the most films on the list was 1975 &#8230; what were the films? The 1970s account for over 20% of the films on the list &#8230; what was going on in the culture that led to this flowering of expression?<\/p>\n<p>It would have been really great if the BBC article had included some sort of interactive tool that allowed readers to explore the database themselves. I will see what I can do to tackle this in an upcoming post.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few weeks ago, the BBC News produced a list of the top 100 greatest American films based on input from critics from around the world. 
Here are the top ten films&#8230;<\/p>\n","protected":false},"author":1,"featured_media":2775,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,7],"tags":[155,156,19,2,157,142,154,158],"class_list":["post-2560","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-information","category-programming","tag-american-film","tag-bbc","tag-charts","tag-data","tag-excel","tag-list","tag-string-manipulation","tag-tables"],"jetpack_featured_media_url":"https:\/\/ideasillustrated.com\/blog\/wp-content\/uploads\/2015\/08\/Movie_Audience_2.png","_links":{"self":[{"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/posts\/2560","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/comments?post=2560"}],"version-history":[{"count":10,"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/posts\/2560\/revisions"}],"predecessor-version":[{"id":2581,"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/posts\/2560\/revisions\/2581"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/media\/2775"}],"wp:attachment":[{"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/media?parent=2560"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/categories?post=2560"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ideasillustrated.com\/blog\/wp-json\/wp\/v2\/tags?post=2560"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}