-
Broken script
09/25/2016 at 15:02 • 0 comments
The layout of the site has changed and I didn't notice that my regular backups were no longer working properly.
Until today.
I'm updating the pattern matching, please stay tuned...
pages: check. Not sure how to deal with more pages yet, but the limit is not reached so far.
projects: a few things have been broken, let's investigate step by step.
- main project page: OK (though could still get some more cleanup)
- components: OK
- instructions: OK
- images/gallery: OK
TODO:
- ensmarten wget (inside a function/procedure) so it detects failures and errors that are not reported as 404 by the server ! (a sketch is in the 404 log below)
- clean up the HTML files to remove most of the HaD formatting and boilerplate. AT LEAST remove that huge HaD logo in ASCII art, which is easily compressed but still takes a lot of room anyway... (done with a grep oneliner)
- Test if a file has already been downloaded and remove identical versions from past backups... (meaningful for the big files !)
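For the last item, this is roughly what I have in mind. It's only a sketch: OLDDATECODE is a placeholder for the directory of a previous run, and the dated layout is the one produced by the backup script in the older logs below.

for f in $(find "$DATECODE" -type f); do
  old="$OLDDATECODE/${f#$DATECODE/}"   # same relative path in the previous backup
  if [ -f "$old" ] && [ "$(md5sum < "$f")" = "$(md5sum < "$old")" ]; then
    rm -v "$old"      # bit-identical to the fresh copy, the old one can go
  fi
done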
Mon Sep 26 00:50:45 CEST 2016 : New version online ! It took only 8h to polish it but it's well worth it. UPDATE YOUR SCRIPTS !
-
404
05/10/2016 at 04:14 • 2 comments
During my last run, I briefly saw 404 errors but couldn't make sense of them because the script output was scrambled between different commands.
Over the last few days/weeks, I've noticed more transient errors on hackaday.io, and I have to find a way to wait and retry if a page fails to load the first time...
Until then, I made a different version with all parallelising removed and the output is also saved to a log file, for easy grepping. The new file backup_serial.sh is slower but apparently safer.
Actually, 404 errors are becoming endemic. One script run can get a few or more and there is no provision yet to retry... I have to code this because several independent runs are required to get a good sampling of the data.
Some wget magic should be done ...
New twist !
No 404 error this time. The page might load but the contents will be "something is wrong. please reload the page." I should have taken a screenshot and saved the page to extract its textual signature...
I must find a way to restart the download when this error occurs too.
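Something like this little wrapper could do (just a sketch: the function name, retry count and delay are arbitrary). wget alone is happy with such a page because it comes back as a normal download, so the function has to check both the exit code and the textual signature:

get() {   # usage: get output_file URL
  local try=1
  while [ $try -le 5 ]; do
    if wget -O "$1" "$2" \
       && [ -s "$1" ] \
       && ! grep -qi 'something is wrong' "$1"; then
      return 0
    fi
    echo "attempt $try failed for $2, retrying..." >&2
    try=$((try+1))
    sleep 10
  done
  echo "GIVING UP on $2" >&2
  return 1
}

Then the plain wget calls of the backup script become things like: get main.html "https://hackaday.io/project/$1"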
-
Some more script fun
04/13/2016 at 12:51 • 9 comments
I just hacked this. Shame on me !
Let's say, it might be useful to those who test bash on W10...
echo 0 $( \
  wget -O - https://hackaday.io/api/misc/prizeLeaderboard |\
  sed 's/},/\n/g' |\
  grep \
    -e 'name of your project' \
    -e 'another project name' \
    -e 'A third project' \
    -e 'and a last one' |\
  sed 's/.*award\"\:/ + /' \
)|bc
20170321: Slightly updated version :
DATECODE=$(date '+%Y%m%d')
wget -O board$DATECODE https://hackaday.io/api/misc/prizeLeaderboard
echo 0 $( \
  sed 's/},/\n/g' board$DATECODE |\
  grep \
    -e 'FlappyScope' \
    -e 'HTTaP' \
    -e 'micro HTTP server in C' \
    -e 'Game of Life bit-parallel algorithm' \
    -e 'C GPIO library for Raspberry Pi' \
    -e 'Electronics Workshops Resources' \
    -e 'AMBAP: A Modest Bitslice Architecture Proposal' \
    -e 'dual-mode 16-segments LED display module' \
    -e 'YGREC16 - YG.s 16bits Relay Electric Computer' \
    -e 'C SPI library for Raspberry Pi' \
    -e 'Power supply power-on sequencer' \
    -e '4014 LED minimodule' \
  | sed 's/.*award\"\:/ + /' \
)|bc
What's your code ?
20170323: To get the total number, here's the script:
DATECODE=$(date '+%Y%m%d')
wget -O board$DATECODE https://hackaday.io/api/misc/prizeLeaderboard
echo 0 $( \
  sed 's/},/\n/g' board$DATECODE \
  | sed 's/.*award\"\:/ + /' \
  | sed -e 's/}//' -e 's/]//' \
)|bc
2017-03-23 17:21:05 : 763
-
Formatting guidelines
04/04/2016 at 05:08 • 0 comments
I'm lazy.
I'm too lazy to implement a proper scraper for the log pages, even though spending some effort there would save me effort later. I have even started to implement a suitable feature for the projects list pages. But the "quick and dirty" solution so far is to list all the project logs by hand in the "details" page. After all it has other advantages, including easier navigation.
The script uses grep and sed to recognise a specific pattern that indicates the start of the list. First, note that the entries are separated by a line break ("<br>" in HTML), so you have to hit "shift+enter" instead of just "enter" (which generates a paragraph, "<p>").
The list starts with a bold keyword, recognised in HTML as "<strong>Logs:</strong>" (click the bold B in the editor's menu).
Then the rest of the page should be the list of links. Each entry starts at the beginning of a line (remember: shift+enter) with a number (no ordering is checked) followed by a dot and a space, then a link ("<a ") and a line break. Yeah, these are absolute links, so be careful...
Overall, the script detects this:
Logs:
42. some link
43. another link
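In the page source, this should come out roughly like the fragment below (the href values are placeholders, only the overall shape matters):

<strong>Logs:</strong><br>
42. <a href="https://hackaday.io/project/NNNN/log/NNNNN-some-log">some link</a><br>
43. <a href="https://hackaday.io/project/NNNN/log/NNNNN-another-log">another link</a><br>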
There are some other minor gotchas, so don't hesitate to look at the scraped and sed'ed files named logs.url if something is weird. I told you it was dirty...
-
Some updates and enhancements
03/28/2016 at 19:53 • 0 comments
Time for an update !
- Fixed a parsing issue (the pages have changed a tag from <h2> to <h1>)
- Support more than one page of projects (I was wondering why not all my projects got saved... Now I follow the "next" link to build the list of projects, see the sketch below)
- Kinder to the server, to avoid triggering DOS/flood protection from the image server. It's slower but it's not critical...
My backups now take several minutes and around 17 MB.
It could be faster: a lot of log pages return "301 Moved Permanently", which should be fixed with better parsing that reads the logs pages directly (the ones served in chunks of 10 logs).
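The "next" link handling mentioned above boils down to a small loop. Here is a sketch only: the 'class="next"' pattern and the site-relative href are assumptions, check the real markup in projects.html before trusting it.

PAGEURL="https://hackaday.io/projects/hacker/$MYHACKERNUMBER"
PAGE=1
while [ -n "$PAGEURL" ]; do
  wget -O projects$PAGE.html "$PAGEURL"
  NEXT=$(grep 'class="next"' projects$PAGE.html |\
    sed -e 's/.*href="//' -e 's/".*//' | head -n 1)
  if [ -n "$NEXT" ]; then
    PAGEURL="https://hackaday.io$NEXT"   # assuming a site-relative href
  else
    PAGEURL=""
  fi
  PAGE=$((PAGE+1))
done
cat projects[0-9]*.html > projects.html   # the existing grep/sed then extracts the names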
-
Files are now supported
01/10/2016 at 12:51 • 0 comments
Hello HaD crowd !
The admins have now provided us with a 1GB storage area with a nice listing page, similar to the other resources. I have updated the script to fetch everything AND I've put the new script in the download area.
Fun fact: the next time I back up my projects, the script will download itself, if all goes well ;-)
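The new part follows the same pattern as the gallery code in the big script below. A minimal sketch, where the /files URL and the 'Download</a>' link pattern are assumptions to verify against the real listing page:

# inside fetchproject(), sketch only:
wget -O files.html "https://hackaday.io/project/$1/files"
grep 'Download</a>' files.html |\
  sed -e 's/.*href="//' \
      -e 's/".*//' |\
  tee files.url
[[ "$( < files.url )" ]] && ( \
  mkdir files
  pushd files
  wget -i ../files.url
  popd ) &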
-
better, faster, fatter
12/15/2015 at 01:24 • 0 comments
Today I have 19 projects on hackaday (even after I asked al1 to take ownership of #PICTIL) and I need to automate more !
So I added more features: the script is a bit parallelised, it scrapes more pages, and there is more conditional execution to adapt to each project (some have building instructions, others have logs, some have nothing...)
So here is the new version in its whole ugliness ! (remember kids, don't do this at home, yada yada)
#!/bin/bash
MYHACKERNUMBER=4012   # Change it !

fetchproject() {
  mkdir $1
  pushd $1
  # Get the main page:
  wget -O main.html "https://hackaday.io/project/$1"
  grep '<div class="section section-instructions">' main.html &&
    wget -O instructions.html "https://hackaday.io/project/$1/instructions/" &

  # Get the images from the gallery
  wget -O gallery.html "https://hackaday.io/project/$PRJNR/gallery"
  grep 'very-small-button">View Full Size</a>' gallery.html |\
    sed -e 's/.*href="//' \
        -e 's/".*//' |\
    tee images.url
  [[ "$( < images.url )" ]] && ( \
    mkdir images
    pushd images
    wget -i ../images.url
    popd ) &

  # Get the general description of the project
  detail=$(grep 'show">See all details</a' main.html|sed 's/.*href="/https:\/\/hackaday.io/; s/".*//')
  if [[ "$detail" ]]; then
    echo "getting $detail"
    wget -O detail.html "$detail"
    # list the logs:
    grep 'https://hackaday.io/project/.*/log/' detail.html|\
      sed -e 's/.*<strong>Logs:<\/strong>//' \
          -e 's/<br>/\n/g' \
          -e 's/<p>/\n/g'|\
      grep '^[0-9]*[.] <a ' |\
      tee index.txt
    sed 's/.*href="//' index.txt |\
      sed 's/".*//' |\
      tee logs.url
    if [[ "$( < logs.url )" ]]; then
      mkdir logs
      pushd logs
      wget -i ../logs.url &
      popd
    fi
  fi
  popd
}

######### Start here #########

DATECODE=$(date '+%Y%m%d')
mkdir $DATECODE
pushd $DATECODE
wget -O profile.html https://hackaday.io/hacker/$MYHACKERNUMBER

# List all the projects:
wget -O projects.html https://hackaday.io/projects/hacker/$MYHACKERNUMBER
#stop before the contributions:
sed '/contributes to<\/h2>/ q' projects.html |\
  grep 'class="item-link">' |\
  sed -e 's/.*href="\/project\///' -e 's/".*//' |\
  tee projects.names

ProjectList=$( < projects.names )
if [[ "$ProjectList" ]]; then
  for PRJNR in $ProjectList
  do
    ( fetchproject $PRJNR ) &
  done
else
  echo "No project found."
fi
popd
I still have to make a better system to save the logs, I have an idea but...
PS: it's another quick and dirty hack, so far I'm too lazy to look deeply into the API. It's also a problem of language, since bash is not ... adapted. Sue me.
OTOH the above script works and does not require you to get an API key.
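To use it: save it to a file, set MYHACKERNUMBER to your own number (the one in your https://hackaday.io/hacker/... profile URL), make it executable and run it; every run lands in a fresh dated directory. For example (the file name is whatever you like):

chmod +x backup.sh
./backup.sh        # creates e.g. ./20151215/ with one subdirectory per project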