@newsapps_jobs: A Meditation on Twitter Bots

[Image: the @newsapps_jobs Twitter account]

what is a twitter bot

It’s not a robot at all but an automated manipulation of Twitter facilitated by the Twitter API, and it’s all about communication. The Twitter API (or any API) is a set of commands so your code and Twitter’s code can interact with predictable results.

require 'twitter'   # the older Twitter gem API (Twitter.configure / Twitter.update)
require 'psych'

class Twitterbot
  attr_accessor :jobs, :twitter_config, :gdrive_session, :job_worksheet

  def initialize
    # Load the four credential strings from the YAML config file.
    @twitter_config = Psych.load_file('config/config.yml')

    Twitter.configure do |config|
      config.consumer_key       = @twitter_config['consumer_key']
      config.consumer_secret    = @twitter_config['consumer_secret']
      config.oauth_token        = @twitter_config['oauth_token']
      config.oauth_token_secret = @twitter_config['oauth_token_secret']
    end
  end

  # Send a tweet through the client configured above.
  def update(status)
    Twitter.update(status)
  end
end

Twitterbot.new.update("I'm tweeting!!!")

The key is to start small. First, you have to register your “application” with Twitter at apps.twitter.com. In the process of registering, you will get four crazy-looking strings: your consumer_key, consumer_secret, oauth_token and oauth_token_secret. These four strings let Twitter know that you are you and not some other bot with malicious intent.

(Registering your application and getting all the particulars just so can be confusing. I always google for examples to mimic or just to point me in the right direction. This blog post helped me out.)
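
If you want to double-check that your YAML file is wired up before you ever touch the Twitter API, a tiny sanity check like this can save some head-scratching. It assumes config/config.yml stores the four strings under the exact keys used in the code above; adjust to taste.

require 'psych'

creds = Psych.load_file('config/config.yml')
%w[consumer_key consumer_secret oauth_token oauth_token_secret].each do |key|
  # Bail out loudly if any of the four credential strings is missing or blank.
  abort "#{key} is missing from config/config.yml" if creds[key].to_s.strip.empty?
end
puts 'All four Twitter credentials loaded.'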

@newsapps_jobs was written in Ruby and so the above configuration follows the convention specified by the Ruby Twitter Gem. Whichever language you choose for your bot, there is almost assuredly a library that will help you interact with Twitter’s API.

As soon as you verify that your bot can do the basics, you’re on your way.

@newsapps_jobs uses a number of different Ruby libraries to get its job done: the bitly gem to shorten links, the google-drive gem to access the Google spreadsheet, the twitter gem for tweeting and the whenever gem to manage the cron job for timely tweeting. All of these gems have their own documentation online.
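
If you’re bundling those dependencies, a Gemfile along these lines would cover them (the gem names here are the standard ones on rubygems.org; I haven’t pinned versions):

source 'https://rubygems.org'

gem 'twitter'       # talking to the Twitter API
gem 'google_drive'  # reading the Google spreadsheet of jobs
gem 'bitly'         # shortening links
gem 'whenever'      # writing the cron schedule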

The complete code for the bot can be found here and, previously, I wrote about the bot here.

why a bot?

I wrote this bot in the summer of 2012. I felt the state of affairs regarding jobs in the newsapps/data journalism world was unruly at best, chaotic at worst. The best offering up to that point was this embedded Google spreadsheet. It was a decent solution, a start, but obviously not an endpoint. I thought that the bite-sized information delivery of Twitter was perfect for a jobs board. With a job ad, the information a reader needs to gauge their interest easily fits within 140 characters: job title, location and employer are really all a person needs to decide whether to click. Scrolling the tweets in your Twitter client would take seconds even if you checked every day. Easy! So I set to work. (I also knew that writing a Twitter bot was a great exercise! And fun!)

The tally so far: the bot has 170 followers, 252 tweets and I’ve shared the spreadsheet with 16 folks. I’m not sure, though, if the bot has actually connected employer with future employee.

Nevertheless, in the week before NICAR 2014, I did see this Twitter account appear.

[Image: the Source news jobs Twitter account]

Look familiar?

[Image: the @newsapps_jobs Twitter account, larger]

Just sayin….

I encourage you to write your own bot. It’s fun and great practice. If you have any questions, comments or concerns or are struggling with your bot, hit me up in the comments. I’m helpful.

Ruby Code Quiz for AlphaSights

Here’s a little ditty I found from AlphaSights. I give you the directions verbatim.

A local variable named log contains an array of hashes with timestamped events like so

log = [
  {time: 201201, x: 2},
  {time: 201201, y: 7},
  {time: 201201, z: 2},
  {time: 201202, a: 3},
  {time: 201202, b: 4},
  {time: 201202, c: 0}
]

Please collapse the log by date into an array of hashes containing one entry per day

[
  {time: 201201, x: 2, y: 7, z: 2},
  {time: 201202, a: 3, b: 4, c: 0},
]

Ordering in the input is not defined. Ordering in the result is not important. Do not rely on the names of the hash keys other than :time. The field below executes your code in a Sandbox with $SAFE=4, so you can’t define new classes, use global variables, etc. The field will turn green if your solution returns a correct object.

My solution:

final_arr = []

# Group the entries by :time, then copy every non-:time key into one hash per day.
log.group_by { |hsh| hsh[:time] }.each do |time, entries|
  hash = { :time => time }
  entries.each do |hsh|
    hsh.keys.each do |k|
      hash[k] = hsh[k] if k != :time
    end
  end
  final_arr << hash
end

final_arr

There was a cool little environment at the bottom of the page in which you could paste your solution, and it would turn green if the solution was correct. Nice.

Any thoughts? How would you do it?
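
For comparison, here’s one shorter take I’d consider (not what I submitted): group by :time, then merge each day’s hashes together. It assumes log is defined as above.

collapsed = log.group_by { |hsh| hsh[:time] }.values.map { |day| day.inject(:merge) }
# => [{time: 201201, x: 2, y: 7, z: 2}, {time: 201202, a: 3, b: 4, c: 0}]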

Nested Ternary

Nested ternary? Paste into an irb session. Who knew? Neat.

(1..100).each do |num|
  puts num % 3 == 0 && num % 5 == 0 ? "FizzBuzz" : (num % 3 == 0 ? "Fizz" : (num % 5 == 0 ? "Buzz" : num))
end

@newsapps_jobs a Twitterbot Retrospective

I’ve long had a fascination with TwitterBots. I mean…they’re robots, after all. So, I decided to hack together a Bot to auto-tweet jobs from this Google spreadsheet.

It proved to be my most satisfying coding exercise yet.

The attributes of the Class were straightforward: some jobs to tweet (:jobs), a configuration file (:twitter_config) to access the Twitter account of the bot (setting this up is a whole other post…), a Google drive session to access the spreadsheet (:gdrive_session), the jobs spreadsheet itself (:job_worksheet), and an instance of bitly (:bitly) to shorten the addresses that link back to the full job descriptions.

The class’s attributes are initialized when Twitterbot.new is called, and Twitter is configured (Twitter.configure) so the bot can tweet the tweet after it is written.

The jobs worksheet is parsed by the make_jobs method. This method accesses the rows of the spreadsheet and, using a regular expression, plucks out all the jobs for 2013.
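
The gist of it looks something like this (a sketch, not the bot’s exact code; it assumes the posting date sits in the first cell of each row):

def make_jobs
  @job_worksheet.rows.each do |row|
    next unless row[0] =~ /2013/   # skip anything not posted in 2013
    # ...format the row into a tweet string and push it onto @jobs
    # (see the shorten_link conditional below)...
  end
  @jobs
end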

The only really tricky part of the script was using the bitly gem to shorten the link provided with each job.

Sometimes folks had pasted an already shortened link into the Google spreadsheet. Calling bitly’s shorten method on an already shortened link raises an error rather than returning the link unchanged (which was the behavior I had anticipated).

So, I wrote this method:

def shorten_link(link)
  # Try to shorten the link; if bitly raises (e.g. the link is already
  # shortened), return the error object instead.
  begin
    @bitly.shorten(link).short_url
  rescue => e
    e
  end
end

Then, I called it in this conditional:

unless shorten_link(row[2]).class.to_s == 'BitlyError'
  @jobs << "#{row[0].strip.chomp}, #{row[5]}: #{row[4]} #{shorten_link(row[2])} #newapps #ddj #{short_time}"
else
  @jobs << "#{row[0].strip.chomp}, #{row[5]}: #{row[4]} #{row[2]} #newapps #ddj #{short_time}"
end

The shorten_link method returns a shortened link if the link hasn’t already been shortened. If the link has already been shortened, shorten_link returns the BitlyError instead. The unless clause checks whether shorten_link returned a BitlyError. If not, I go ahead and use the shortened link; if so, I do nothing and simply insert the already shortened link as-is. (N.B. I’m not sure what happens if the link was already shortened but by another service. Also, submitting a patch to the bitly gem repo to get around the begin/rescue block here would be a great idea…)

Constructing the actual tweet, then, is simply a matter of picking out the necessary array indices, arranging them in the correct order and adding some text for hashtags. I added a timestamp because I was originally tweeting every couple hours and Twitter won’t let identical tweets occur within a certain timeframe. That amount of tweeting got a little ridiculous, though. So, I changed the cronjob to once a day. I just left the timestamp in because I like it.
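
For what it’s worth, the whenever side of that is tiny. A schedule along these lines writes the once-a-day cron entry; the time, path and script name here are placeholders rather than the bot’s actual setup.

# config/schedule.rb
every 1.day, at: '9:00 am' do
  command 'cd /path/to/twitterbot && ruby run_bot.rb'
end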

Overall, after a couple of weeks to ponder it, the class is a bit messier than I would like. Ideally, I’d just initialize the bot and it would take care of everything. Also, it doesn’t feel right to configure an instance of Twitter when the bot is initialized but then call Twitter.update outside the class.

bot = Twitterbot.new
Twitter.update(bot.make_jobs)
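
One way I might tidy that up, sketched below rather than actually implemented, is to wrap the call inside the class so callers never touch Twitter directly:

class Twitterbot
  # Build the day's tweet and send it through the already-configured client.
  def tweet!
    Twitter.update(make_jobs)
  end
end

Twitterbot.new.tweet!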

The complete Twitterbot code:

Diary of a News App_3

Parsing the accident reports proved easier than I had anticipated.

There are eight sections of each report in which I’m interested: location (LOCATION), vehicles involved (UNIT), persons injured (PERSON_INJURED), the environment of the accident (ACCIDENT_ENVIR), the provided diagram of the accident (DIAGRAM) (which I’m not fetching quite yet), the narrative description of the accident (NARRATIVE), the details of the responding officer (OFFICER) and any properties damaged (PROPERTY_DAMAGE). These sections are represented by constants at the top of the script and by an HTML table or tables within the accident report.

The main method within the parsing script is the 'dispatcher' method at the bottom of the script. This method loads all the downloaded files and opens each with the HTML parsing library Nokogiri. Each HTML table is then analyzed to discover what type of table it is. If it’s a table with LOCATION data, the td elements of the table are passed to the parse_location method. If it’s a table with UNIT information, the tds are passed to the parse_unit method.
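
Reconstructed roughly, the dispatcher looks something like the following. The constants and parse_* method names come from the script; the text-matching test is my shorthand for however each table type is actually identified.

def dispatcher
  Dir.glob('data_hold/*.html').each do |file|
    doc = Nokogiri::HTML(File.read(file))
    doc.css('table').each do |table|
      tds = table.css('td')
      case table.text
      when /#{LOCATION}/ then parse_location(tds)
      when /#{UNIT}/     then parse_unit(tds)
      # ...and so on for the other six table types...
      end
    end
  end
end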

The main object of the script is the @rpt_hash instance variable. Each individual parsing method cycles through the td elements of the table for which it was designed and drills down into each td element to extract the relevant text. This text is then assigned as a value to a key within the @rpt_hash.

The script relies on the powerful HTML parsing capabilities of the Nokogiri library. For an excellent tutorial on this library, see The Bastard’s Book of Ruby.

In essence, the code tds.css('td')[1].children[2].text.strip.chomp first tells Nokogiri where to look: the second td element (css('td')[1]); then it grabs the third child node of that element (.children[2], counting from zero); next it gets the text (.text); then it trims any surrounding whitespace (.strip); and finally it drops any trailing newline (.chomp).
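
Here’s a self-contained toy version of that chain, using a made-up scrap of report-style HTML rather than a real accident report:

require 'nokogiri'

html = '<table><tr>' \
       '<td><small>County</small><br><span class="formdata">POLK</span></td>' \
       '<td><small>City</small><br><span class="formdata">DES MOINES </span></td>' \
       '</tr></table>'

doc = Nokogiri::HTML(html)
# Second td, third child node (the span), then its text with the whitespace trimmed.
puts doc.css('td')[1].children[2].text.strip.chomp   # => "DES MOINES"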

The whole script is really just a bunch of assigning text as values to hash keys: @rpt_hash[:time_of_accident] = tds.css('td')[2].children[2].text.strip.chomp. Because we’re going to use this data to do some analysis of the Iowa State Patrol’s response to accidents on Iowa highways, I wanted to take a look at all the data pieces as I went. It was more time consuming this way, but I was able to learn some things about the data. One, for example, was that we are going to have to do some serious work on the dates in each report, but more on that later….

Some tables, UNIT tables, for example, occurred in multiples, i.e., while a crash could involve only one LOCATION or one NARRATIVE, any crash could involve multiple vehicles and/or multiple persons injured.

For these tables, for every accident report, instead of assigning each individual data point to a key within the main @rpt_hash object, I assigned each data point to a key within a temporary hash: tmp_hsh[:unit] = tds.css('td')[0].text.rm_space_tab_nline. When all the unit tables or persons injured tables are parsed, I pushed all the assigned values into the main @rpt_hash object: @units << tmp_hsh. In this way, I preserved the nested structure that makes JSON so useful.
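
Compressed down, the pattern is roughly this (unit_tables stands in for the UNIT tables Nokogiri found, and I’m using plain strip where the real script uses its rm_space_tab_nline helper):

@units = []

unit_tables.each do |tds|
  tmp_hsh = {}
  tmp_hsh[:unit] = tds.css('td')[0].text.strip
  @units << tmp_hsh
end

@rpt_hash[:units] = @units   # a nested array of hashes, ready for JSON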

Finally, I had the initial idea to accrue all the data from the reports into one giant @rpt_hash, then dump this data into Mongo all at once with the mongoimport command. That approach didn’t work very well. Mongo literally choked.

So, instead, I saved every @rpt_hash for each report separately (File.open….), then called mongoimport on each .json file I just saved (system…). This approach will work much better in the long run because, eventually, I won’t run the parsing script on every accident report, but, instead, just the new ones I haven’t previously saved.

saves report JSON then imports to mongo
File.open("data/json/#{@rpt_hash[:url_id]}.json", 'w') {|f| f.write(@rpt_hash.to_json) }


system("mongoimport --db accidents --collection reports < /path/to/json/#{@rpt_hash[:url_id]}.json")

Finally, I reset all the little holder objects, @units, @injuries, @properties and @rpt_hash before beginning the next accident report.

resets objects for next report
 @units = []
 @injuries = []
 @properties = []
 @rpt_hash = {}

Next on my list is to clean the dates. The dates are somewhat irregular within each accident report and their format needs to be standardized or else any analysis involving Dates and Time will be difficult.

But, that’s for next time…

For the complete parsing code, click:

Diary of a News App_2

Having downloaded all the necessary pages from the Iowa State Patrol’s Crash Reports site, it was time to roll up my proverbial sleeves.

This day, I had two major issues to resolve.

  1. I needed to parse the html of each report into a format that is not crazy html.

  2. Once parsed, the data needed to be entered into a database.

For the database, I selected the NoSQL database mongoDB. A discourse on the differences between a NoSQL and a SQL database is beyond the scope of this post, but this Stack Overflow answer not only provides useful links comparing the NoSQL databases mongoDB and couchDB, but also gives a succinct summary of why I chose a NoSQL db over a SQL db.

Namely: choose a NoSQL db “for most things that you would do with MySQL or PostgreSQL, but [where] having predefined columns really holds you back.”

That’s me to a “T.” I need basic SQL functionality, but I can’t use predefined columns because each accident report is a distinct document with an unknown structure.

One document may have only a single vehicle (unit) involved in the crash whereas the next document may have ten units. One document could have a section for “Property Damage” or even multiple sections for damaged property, but, on the other hand, most vehicle crashes resulted in no property damage.

So, I needed a database that was flexible and could expand and contract with each report.
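
To make that concrete, here are two made-up reports; the same collection has to hold both shapes comfortably, which is exactly what a document store is for.

report_a = {
  location: { county: 'POLK' },
  units:    [{ driver: 'SMITH' }]                       # one vehicle, nothing else
}

report_b = {
  location: { county: 'STORY' },
  units:    [{ driver: 'JONES' }, { driver: 'LEE' }],   # two vehicles
  property_damage: [{ owner: 'CITY OF AMES' }]          # plus a damaged guardrail, say
}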

Having selected mongoDB, I next needed to install it.

This process was absolutely painless thanks to the Mac OS X package manager, Homebrew.

If you develop on Mac OS and don’t use Homebrew, you’re making your life needlessly difficult.

$ brew install mongodb

And, that’s that.

To test the install, at the shell prompt, type:

mongod

Then, open a new shell tab and type:

mongo 
db.test.save( { a:1 } )
db.test.find()

And, you’re good to go.

The next step is the processing of the raw HTML.

Diary of a News App_1

My mom has two sayings she’s fond of.

  1. Soonest begun, soonest done.

  2. It’s a “want to” kind of thing.

I thought these were dumb when I was growing up, but I thought of them again when I re-re-began to work on parsing the reports from http://accidentreports.iowa.gov.

I had never made much progress because I had never really wanted to make much progress, i.e. I had not decided, definitively, to “do or do not”.

So, I got up early last Saturday, brewed some coffee and started hacking.

Although the actual Iowa Accident Reports are difficult to parse, fetching the reports themselves is pretty straightforward.

I figured I’d start there. Fetch all the reports, store them locally and work from that base.

The URL for each report breaks down to a base URL and URL id number.

Getting all the report pages is simply a matter of cycling through all the id numbers and then downloading the corresponding report.

Easy right?

Not so fast.

For some reason, reports don’t actually start appearing until the id number 29734. Consequently, the reports only stretch back to July 12, 2005.

Now, this situation raises our first data journalism questions. Are there more reports? What happened to the reports prior to July 12, 2005? Can I get those reports? Why aren’t those reports online?

We might want to contact the government offices now and ask them these very questions.

But, having worked with this data previously, I know that I’ll have to contact the Iowa State Patrol at some point anyway because of a problem with the geographic data associated with each report. So, I’ve made note of these questions and put them in a safe, fire-proof place for a later date.

To deal with those empty reports, I decided to check for the presence of the Law Enforcement Case Number:

Check for Law Enforcement Case Number
   html.css('table')[0].css('td')[5].children[1].text.empty? == false


If .empty? returns false, a Law Enforcement Case Number is present, the whole expression evaluates to true, and I download the file; if the text is empty (.empty? returns true), there’s no report and we move on to the next URL id number.

While I might be able to reasonably assume that all URL id numbers prior to 29734 are empty (I spot checked quite a few), it’s easy for me to have the script start at zero and run through all possible crash reports, so that’s what I’ll do.

Eventually, I’d like to automate this script to start and stop on its own and to run every two weeks to check for new reports, but that’s for later. For now, I hard coded the script to start at 0 and stop at 63304, which, as of this writing, is the most recent full accident report.

Here’s the full script:

Full Script to Download Iowa Accident Reports
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'mechanize'

MAIN_URL = "http://accidentreports.iowa.gov/index.php?pgname=IDOT_IOR_MV_Accident_details&id="

@url_id = 0
@counter = 0

# Fetch the report page for the current @url_id and return its raw HTML.
def fetch_page
  puts "On page: #{MAIN_URL}#{@url_id}:: Counter: #{@counter}."

  uri = "#{MAIN_URL}#{@url_id}"
  open(uri).read
end

# Write the fetched HTML to a local file named after the report's id number.
def write_to_raw_file(html)
  fh = File.open("data_hold/#{@url_id}.html", "w")
  fh.write(html)
  fh.close
end

# Save the page only if a Law Enforcement Case Number is present;
# either way, move on to the next id.
def save_page
  html = Nokogiri::HTML(fetch_page)
  if html.css('table')[0].css('td')[5].children[1].text.empty? == false
    write_to_raw_file(html)
  end
  @url_id += 1
  @counter += 1
end

# Cycle through report ids until the counter reaches the hard-coded stopping point.
def loop_through_pages
  last = 63304
  until @counter == last do
    save_page
  end
  exit
end

loop_through_pages

The script is straightforward. A fetch_page method grabs the page; a save_page method checks for the presence of the Law Enforcement Case Number and, if present, calls write_to_raw_file with the html of the fetched page; and a loop_through_pages method cycles through the url_id numbers and calls save_page.

I ran the script and it took a while, ~10 hours, but it worked like a charm.

Next on my list is to get a list of the Law Enforcement Case Numbers and associated X and Y coordinate values.

My previous experience with this data taught me that the geo information is cut off for each of the reports. I contacted the government officials about this problem in 2011 and they said they’d be happy to update the geo information if I provided them a list of case numbers and X and Y values.

That’s pretty helpful and I wish I had been in a position to jump on that opportunity when it was presented. (Alas…le Sigh…)

They might not be so helpful this time. We’ll see.

Stay tuned.

Diary of a News App_0

My as yet unconsummated relationship with the Iowa State Patrol’s Crash Report Site began two years ago on a cold day in January, 2011.

I had just started my first day as a data journalist at the venerable Des Moines Register and my new boss and I, the equally venerable @jameswilkerson, were chatting about web scraping and he mentioned the Iowa State Patrol’s Crash site. James said that he and my predecessor at the Register, the also venerable @mikejcorey, had kicked the tires on the idea of scraping the site but had deemed it impractical/undoable/just not worth it.

Being the cocky young buck that I was two years ago and wanting to make a good impression, I said “It shouldn’t be too hard to scrape that.” James looked at me quizzically (or like I was nuts) and showed me the HTML.

Sample html from Iowa State Patrol’s Crash Site
<table style="text-align: left; width: 750px;" border="1" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td style="text-align: center; vertical-align: middle; width: 33px;" colspan="1" rowspan="18"><big style="font-weight: bold;">
<big>U<br>
N<br>
I<br>
T<br>
<br>
1</big></big>
</td>
<td colspan="3" rowspan="1" style="width: 83px;" align="left" valign="top"><small><small>Driver's Name - Last</small></small><b><br><span class="formdata">KROHN</span></td>
<td colspan="3" rowspan="1" style="width: 82px;" align="left" valign="top"><small><small>First</small></small><b><br><span class="formdata">JUDITH</span></td>
<td colspan="2" rowspan="1" style="width: 80px;" align="left" valign="top"><small><small>Middle</small></small><b><br><span class="formdata">DIANE</span></td>
<td colspan="2" rowspan="1" style="width: 64px;" align="left" valign="top"><small><small>Suffix</small></small><b><br><span class="formdata"></span></td>
</tr>
<tr>
<!-- <td colspan="3" rowspan="1" style="width: 83px;" align="left" valign="top"><small><small>Address</small></small><b><br><span class="formdata">************************</span></td> -->
<td colspan="3" rowspan="1" style="width: 87px;" align="left" valign="top"><small><small>City</small></small><b><br><span class="formdata">MAPLETON</span></td>
<td colspan="4" rowspan="1" style="width: 64px;" align="left" valign="top"><small><small>State</small></small><b><br><span class="formdata">IA - Iowa, US</span></td>
<td colspan="3" rowspan="1" style="width: 55px;" align="left" valign="top"><small><small>Zip</small></small><b><br><span class="formdata">51034</span></td>
</tr>

Gross for sure. But not impossible. But gross. And hard.

But, worth it.

So, I took a spin at processing the html, but I didn’t get far/never really tried and was soon distracted by my new job, vacations, college football…..etc.

Then, layoffs struck the Des Moines Register and folks were scattered to the winds.

I never forgot about those Iowa State Patrol Crash reports, though, and even mentioned them again to @mikejcorey at NICAR 2012. He gave me @jameswilkerson’s “You’re nuts” look, but agreed that that data would be great to get and mentioned some questions/analysis he’d like to run on the data.

So, I dusted off that Ruby script and set to work processing that cringe-worthy HTML. I made good progress but was distracted by my lack of a certain goal. I wasn’t sure what I wanted to do with the accident report data once it was processed, so I again set the script aside.

This time that Ruby script might have remained forever dusty and forgotten if not for a course of events set in motion by this talk by Ben Welsh (@palewire), a database producer at the LA Times, at NICAR 2013.

Ben’s talk was great and made an impression that I mentioned to Ben’s LA Times colleague, Ken Schwencke (@schwanksta), in the lobby of our hotel on the final day of the conference.

Ken told me that Ben had done the analysis for that project in Django. Ken didn’t expound on that point, as it didn’t really need any explanation. I knew exactly what he was talking about.

I had never thought about doing the analysis for a story or project within a Web framework, be it Django or Rails, but the obviousness of this idea and the fact that I didn’t do it made me feel like I’d been riding a bicycle with triangle wheels for the last few years.

A few weeks after NICAR 2013, this article on How The Data Sausage Gets Made by Jacob Harris (@harrisj) of the New York Times and a subsequent conversation with Troy Thibodeaux (@tthibo) of the Associated Press reinforced the idea of doing all the data work for a project within one’s framework of choice.

At this point, I was obsessed and walking around like Howard Hughes muttering “The way of the future…”. And, I knew my Iowa Crash Site reports project was a perfect test case with which to experiment. This project would have it all: scraping, a FOIA (or at least some back and forth with government officials), mapping, graphs and even a NoSQL database.

Perfect. (Plus, the college football season was far enough off that I could finish before the first kickoff….)

And so, I began: Diary of a News App_1

Re Vera Films on Rails

I deployed my first official Rails application this weekend. (By official, I mean it actually serves a purpose in its existence.)

ReVeraFilms.com started out as a static site for a production company based in Los Angeles, and I had no real desire to take it any further once I had finished it.

After I had finished Michael Hartl’s Rails Tutorial book, I began thinking about a good first project. I’ve worked extensively with data and databases before and even wrote some models and a Rake import for an aborted application in 2010. So, I felt confident with my backend skills in Rails, confident enough, at least, to get started without too much fear.

Where I knew I needed more practice in Rails was in the front-end matter, i.e. the asset pipeline, partials, layouts, views and the relationship of all and sundry. I was tired of nibbling around the edges of Rails, though, so I was reluctant to start another “tutorial/exercise” type project.

Enter ReVeraFilms.com.

[Image: the reverafilms.com home page]

I figured I’d just convert a static html site to Rails and that would give me some practice on the front-end portion of Rails.

It turned out to be a great idea and went pretty smoothly. (I even wrote a few tests.)

The only real problem was a niggling error related to the asset pipeline. The application.js file already requires jQuery, and I mistakenly required the library again along with the js files for the Anything Slider. So, I kept getting a mysterious error and the Anything Slider didn’t work at all.

These sorts of errors are exactly what makes learning a framework so difficult and exactly why I wanted to (and am glad I did) start small and manageable.

Deployment to Heroku went as easily as everyone says.

What wasn’t a breeze was getting the domain name of reverafilms.com to point to the Heroku app. Bluehost doesn’t allow configuration of cname records unless the person also hosts his or her site on Bluehost. So, first I had to transfer the domain name registration to GoDaddy.

Then, I had to muck around in the guts of my Rackspace account to get the proper settings, change them in the GoDaddy DNS zone editor, then wait an hour or so for the changes to take effect to see if I had it right. If not, I had to return to the GoDaddy zone editor and try it again. (I now have no problem admitting that I have a lot to learn when it comes to DNS blah, blah, blah.) Late Sunday afternoon, my apparent triumph was cut short when an email I had sent to the owner of the site (personx@reverafilms.com) was returned as undeliverable. I thought, hoped, and prayed that maybe I just needed to point the right record in GoDaddy to the mx server on Rackspace. Thankfully, I was right and there was only a little downtime for the email addresses associated with the site.

All in all, it was a great exercise. The lesson: never underestimate the value of fully implementing a seemingly trivial version of a project. You can always add refinements, refactor, build out, etc. But, taking a trivial version of a project from 0 to fully implemented forces you to deal with that last, most difficult 10 percent of a project, but on a more digestible scale.

D3: Husker Wins Over Time

I’ve been messing around with d3 as much as possible lately and I’m impressed. I’ve a backlog of posts about this great library but I thought I’d start off small and give a shout out to Scott Murray, @alignedleft and his great tutorials.

Scott also wrote a book and I’m working my way through the electronic version. But, if you’re interested in d3, just start with Scott’s tutorials. d3 is not necessarily easy, especially if you’re pretty new to Javascript, but it allows you to set clearly defined goals for projects to move your skills forward. Everyone can find some real numbers to make a simple graph.

My first real project was to graph the Nebraska football wins over the history of the program. (I got this data from Wikipedia.)

Scott’s tutorials gave me enough knowledge to work through this example project and helped clear up some confusion I had regarding scales.

I plan on expanding this example Husker graphic to include all Big Ten teams and to add some interactivity as well.

Also, if you’re just starting out, don’t worry so much about “data” just yet. After working through most of another d3 book, Mike Dewar’s Getting Started With d3, I came to realize that the proverbial “rest of the iceberg” with d3 is the data parsing. You can make spectacular visuals with d3, as long as you can parse your data into the format that you want. Enter the scripting language of your choice….

Nevertheless, for now, just check out Scott’s tutorials. They really are fantastic. As for data, follow Scott’s lead and hard code your values in an array. (Always, start small, I say.)

Regarding Scott’s book, I can attest that it’s on the same level as the tutorials. So, if you dig the tutorials, you’ll dig the book too.

Also, if you have any questions on the Husker graphic above, hit me up in the comments.