-
Prolific authors/submitters and other tidbits
06/09/2016 at 23:46
I'll continue to put up interesting things as I think of them. Here are a few tidbits.
Most often used post tags:
misc hacks                 2327
Arduino Hacks              1792
news                       1492
classic hacks              1291
robots hacks               1248
tool hacks                 1200
home hacks                 1024
led hacks                  1024
Microcontrollers            893
Hackaday Columns            813
peripherals hacks           778
Featured                    750
transportation hacks        742
slider                      711
3d Printer hacks            696
hardware                    661
security hacks              657
Raspberry Pi                634
digital cameras hacks       589
home entertainment hacks    587

Perhaps unsurprisingly, Arduino hacks are near the top of the list.
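A minimal sketch of how counts like these could be reproduced from the tab-delimited Hackaday.txt described in the scraping log below. I'm assuming the counts combine the "posted in" categories and the tags columns; the column order is my own labeling:

from collections import Counter

counts = Counter()
with open('Hackaday.txt') as fh:
    for line in fh:
        fields = line.rstrip('\n').split('\t')
        # columns: postID, date, title, author, comments, posted-ins, tags
        for col in fields[5:7]:
            counts.update(t for t in col.split(',') if t)

for name, n in counts.most_common(20):
    print name, n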
If you look at the most prolific authors you get:
Mike Szczys         5716
Brian Benchoff      3834
Caleb Kraft         1567
Eliot               1332
James Hobson        1063
Mike Nathan         1039
Will O'Brien         805
Adam Fabio           530
Elliot Williams      405
Al Williams          401
Kristina Panos       322
Rich Bremer          290
Jakob Griffith       269
Eric Evenchick       265
Rick Osgood          225
Gerrit Coetzee       215
Marsh                213
Jeremy Cook          199
Dan Maloney          198
Bryan Cockfield      187
Kevin Dady           187
Mathieu Stephan      180
Anool Mahidharia     160
Juan Aguilar         160
Vine Veneziani       137

Plotting the number of articles per week, segregated by the top ten authors, over time gives the following picture:
You can clearly see where submitters became active and when they stopped. Brian had an early submission somewhere in 2006 before he joined HAD. Mike Szczys was active early on and then started tailing off around 2013; other behind-the-scenes activities, I imagine.
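For reference, here is a sketch of how a per-week, per-author plot like this could be generated with pandas and matplotlib. The column names are my own labels for the tab-delimited file described in the scraping log below; the original plotting code isn't shown here:

import pandas as pd
import matplotlib.pyplot as plt

cols = ['postID', 'date', 'title', 'author', 'comments', 'posted_in', 'tags']
df = pd.read_csv('Hackaday.txt', sep='\t', names=cols, parse_dates=['date'])

# ten most prolific authors overall
top10 = df['author'].value_counts().index[:10]

# articles per week for each of the top ten authors
weekly = (df[df['author'].isin(top10)]
          .groupby([pd.Grouper(key='date', freq='W'), 'author'])
          .size().unstack(fill_value=0))
weekly.plot()
plt.ylabel('articles per week')
plt.show()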
-
Featured articles over time
06/09/2016 at 18:08
Here is the data requested: featured articles per week and the percentage featured.
The above was for articles with the "Featured" post marker. If you include "Featured", "Retrotechtacular", "Hackaday Columns", "The Hackaday Prize", "Ask Hackaday", "Hackaday Store", and "Interviews", that roughly triples the number of articles, but the overall shape looks the same.
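A sketch of how those two series could be computed, assuming the "Featured" marker shows up in the "posted in" column of the scraped Hackaday.txt (the column names are my own labels):

import pandas as pd
import matplotlib.pyplot as plt

cols = ['postID', 'date', 'title', 'author', 'comments', 'posted_in', 'tags']
df = pd.read_csv('Hackaday.txt', sep='\t', names=cols, parse_dates=['date'])

df['featured'] = df['posted_in'].fillna('').str.contains('Featured')
weekly = df.groupby(pd.Grouper(key='date', freq='W'))['featured'].agg(['sum', 'mean'])
weekly['sum'].plot()                              # featured articles per week
(100.0 * weekly['mean']).plot(secondary_y=True)   # percent featured
plt.show()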
-
First Plot
06/09/2016 at 08:19
OK, first plot of the data before I go to bed. I munged the data and plotted posts per day as a function of time. Not surprisingly, the number of posts per day has been going up since the early days. Somewhat surprisingly, the maximum was way back on Feb 28, 2011, when there were no fewer than 16 posts! Here you go:
Staying true to its name, most days early on had just one article. Now the mode appears to be 8 per day.
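A sketch of how the posts-per-day series, its maximum, and its recent mode could be pulled out of the scraped file (the column names and the cutoff date are my own choices):

import pandas as pd

cols = ['postID', 'date', 'title', 'author', 'comments', 'posted_in', 'tags']
df = pd.read_csv('Hackaday.txt', sep='\t', names=cols, parse_dates=['date'])

perDay = df.groupby('date').size()
print perDay.idxmax(), perDay.max()                 # busiest day and its post count
print perDay[perDay.index >= '2015-01-01'].mode()   # recent modal posts per day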
-
Scraping the HAD website
06/09/2016 at 06:48
I started off knowing nothing about web scraping. I found a good link that shows how to scrape using Python:
http://docs.python-guide.org/en/latest/scenarios/scrape/
Found a few websites that explain the XPath syntax and I was off to the races. So, a few baby steps first:
from lxml import html
import requests

page = requests.get('http://hackaday.com/blog/page/3000/')
tree = html.fromstring(page.content)

# get post titles
tree.xpath('//article/header/h1/a/text()')

# get post IDs
tree.xpath('//article/@id')

# get date of publication
tree.xpath('//article/header/div/span[@class="entry-date"]/a/text()')
Eventually I wrote a script to scrape the entire HAD archives. On Wednesday, June 8th at 11 PM Pacific time, the site had 3223 pages. I decided to include the article ID, date of publication, title, author, number of comments, "posted in" categories, and tags. Here is a quick and dirty Python script that writes all of the data to a tab-delimited file:
from lxml import html
import requests

fh = open("Hackaday.txt", 'w')
for pageNum in xrange(1, 3224, 1):
    page = requests.get('http://hackaday.com/blog/page/%d/' % pageNum)
    tree = html.fromstring(page.content)
    titles = tree.xpath('//article/header/h1/a/text()')
    postIDs = tree.xpath('//article/@id')
    dates = tree.xpath('//article/header/div/span[@class="entry-date"]/a/text()')
    authors = tree.xpath('//article/header/div/a[@rel="author"]/text()')
    commentCounts = tree.xpath('//article/header/div/a[@class="comments-counts comments-counts-top"]/text()')
    commentCounts = [i.strip() for i in commentCounts]
    posts = []
    tags = []
    for i in xrange(len(titles)):
        # "posted in" categories and tags for the (i+1)-th article on this page
        posts.append(tree.xpath('//article[%d]/footer/span/a[@rel="category tag"]/text()' % (i + 1)))
        tags.append(tree.xpath('//article[%d]/footer/span/a[@rel="tag"]/text()' % (i + 1)))
    for i in xrange(len(titles)):
        # one tab-delimited line per article
        fh.write(postIDs[i] + '\t' + dates[i] + '\t' + titles[i] + '\t' + authors[i] + '\t' +
                 commentCounts[i] + '\t' + ",".join(posts[i]) + '\t' + ",".join(tags[i]) + '\n')
fh.close()
I felt a bit guilty about scraping the entire website, but Brian said it was OK. The HTML for each page is ~60 KB; times 3223 pages, that's about 193 MB of data. This was distilled down to 3.5 MB, and the whole scrape took about 25 minutes.
The latest post is #207753 and the earliest is post #7. The numbers are not sequential, and there is a total of 22556 articles. The file looks like this:
post-207753    June 8, 2016    Hackaday Prize Entry: The Green Machine    Anool Mahidharia    1 Comment    The Hackaday Prize    2016 Hackaday Prize,arduino,Coating machine,grbl,Hackaday Prize,linear motion,motor,raspberry pi,Spraying machine,stepper driver,the hackaday prize
post-208524    June 8, 2016    Rainbow Cats Announce Engagement    Kristina Panos    1 Comment    ATtiny Hacks    attiny,because cats,blinkenlights,RGB LED,smd soldering,wedding announcements
post-208544    June 8, 2016    Talking Star Trek    Al Williams    8 Comments    linux hacks,software hacks    computer speech,natural language,speech recognition,star trek,text to speech,voice command,voice recognition
.....
post-11    September 9, 2004    hack the dakota disposable camera    Phillip Torrone    1 Comment    digital cameras hacks
post-10    September 8, 2004    mod the cuecat, and scan barcodes…    Phillip Torrone    1 Comment    misc hacks
post-9    September 7, 2004    make a nintendo controller in to a usb joystick    Phillip Torrone    22 Comments    computer hacks,macs hacks
post-8    September 6, 2004    change the voice of an aibo ers-7    Phillip Torrone    10 Comments    robots hacks
post-7    September 5, 2004    radioshack phone dialer – red box    Phillip Torrone    38 Comments    misc hacks
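If you want to sanity-check the file, here is a minimal sketch for loading it (the column names are my own labels, not part of the file):

import pandas as pd

cols = ['postID', 'date', 'title', 'author', 'comments', 'posted_in', 'tags']
df = pd.read_csv('Hackaday.txt', sep='\t', names=cols, parse_dates=['date'])

ids = df['postID'].str.replace('post-', '').astype(int)
print len(df), ids.min(), ids.max()   # article count and post ID range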
I'll upload a zipped version. Hopefully this will save HAD from being scraped over and over again. I'll start slicing and dicing the data soon.
Addendum: for whatever reason, two articles were missing the posts/tags fields. I fixed them manually and uploaded the corrected file.