Recently I made this visualization for contributors to https://l10n.org.al for the last two years (2012-2013). This visualization was done with gource. It was a bit of hacking, so I would like to describe here how I did it.
Getting The Data To Be Visualized
For each translation or vote on https://l10n.org.al/ the time and the name of the author is saved as well. So, we would like to get this information from the database, as well as the project to which this translation belonged, for all votes and translations.
First have a look at this diagram, just to get an idea of the structure of the database:
Then look at the script below which I used to extract the data.
#!/bin/bash
query="
SELECT v.time, u.name, (v.time=t.time) AS new, project, origin
FROM votes v
LEFT JOIN users u ON (u.ulng = v.ulng AND u.umail = v.umail)
LEFT JOIN translations t ON (t.tguid = v.tguid)
LEFT JOIN strings s ON (s.sguid = t.sguid)
LEFT JOIN locations l ON (l.sguid = s.sguid)
LEFT JOIN templates tpl ON (tpl.potid = l.potid)
LEFT JOIN projects p ON (p.pguid = tpl.pguid)
"
mysql -u root -D btranslator_data -B -e "$query" > contrib.txt
This is simply a join of the tables and extracting the fields that are needed. When the time of vote is the same as the time of translation, then this is a new translation and the field new
is true.
For each translation or vote on https://l10n.org.al/ the time and the name of the author is saved as well. So, we would like to get this information from the database, as well as the project to which this translation belonged, for all votes and translations.
First have a look at this diagram, just to get an idea of the structure of the database:
Then look at the script below which I used to extract the data.
First have a look at this diagram, just to get an idea of the structure of the database:
Then look at the script below which I used to extract the data.
#!/bin/bash query=" SELECT v.time, u.name, (v.time=t.time) AS new, project, origin FROM votes v LEFT JOIN users u ON (u.ulng = v.ulng AND u.umail = v.umail) LEFT JOIN translations t ON (t.tguid = v.tguid) LEFT JOIN strings s ON (s.sguid = t.sguid) LEFT JOIN locations l ON (l.sguid = s.sguid) LEFT JOIN templates tpl ON (tpl.potid = l.potid) LEFT JOIN projects p ON (p.pguid = tpl.pguid) " mysql -u root -D btranslator_data -B -e "$query" > contrib.txtThis is simply a join of the tables and extracting the fields that are needed. When the time of vote is the same as the time of translation, then this is a new translation and the field
new
is true.Transforming Data To The Right Format
In the result file contrib.txt
fields are separated by TAB, so first I replaced tabs with commas (,) for easy processing. I did it with find/replace on Emacs, but any other editor can do this.
Gource expects the input file (which is called log file) in the format timestamp|username|action|filename
. Gource was designed to work with version control systems (like git, subversion, etc.) in order to visualize the project activity, so theaction
is expected to be A, or M, or D etc. (respectively for adding, modifying, deleting a file) and filename
is the file that was touched.
I had to transform the data to this format and I did it with a script like this:
#!/bin/bash
while read l
do
time=$(echo $l | cut -d, -f1)
name=$(echo $l | cut -d, -f2)
new=$(echo $l | cut -d, -f3)
project=$(echo $l | cut -d, -f4)
origin=$(echo $l | cut -d, -f5)
timestamp=$(date -d "$time" +%s) # convert to timestamp
if [ "$new" = "1" ]; then action='A'; else action='M'; fi
path="/$origin/$project"
echo "$timestamp|$name|$action|$path"
done < contrib.txt > contrib.log
I also made sure that there are no lines containing NULL and that the file is sorted in increasing order by the timestamp:
sed -i contrib.log -e '/NULL/d'
sort -t'|' -k1 contrib.log > contrib-1.log
rm contrib.log
mv contrib-1.log contrib.log
In the result file
Gource expects the input file (which is called log file) in the format
I had to transform the data to this format and I did it with a script like this:
contrib.txt
fields are separated by TAB, so first I replaced tabs with commas (,) for easy processing. I did it with find/replace on Emacs, but any other editor can do this.Gource expects the input file (which is called log file) in the format
timestamp|username|action|filename
. Gource was designed to work with version control systems (like git, subversion, etc.) in order to visualize the project activity, so theaction
is expected to be A, or M, or D etc. (respectively for adding, modifying, deleting a file) and filename
is the file that was touched.I had to transform the data to this format and I did it with a script like this:
#!/bin/bash while read l do time=$(echo $l | cut -d, -f1) name=$(echo $l | cut -d, -f2) new=$(echo $l | cut -d, -f3) project=$(echo $l | cut -d, -f4) origin=$(echo $l | cut -d, -f5) timestamp=$(date -d "$time" +%s) # convert to timestamp if [ "$new" = "1" ]; then action='A'; else action='M'; fi path="/$origin/$project" echo "$timestamp|$name|$action|$path" done < contrib.txt > contrib.logI also made sure that there are no lines containing NULL and that the file is sorted in increasing order by the timestamp:
sed -i contrib.log -e '/NULL/d' sort -t'|' -k1 contrib.log > contrib-1.log rm contrib.log mv contrib-1.log contrib.log
Feeding The Data To Gource And Generating The Output
First make sure that the tools that we need are installed:
sudo apt-get install gource ffmpeg
Then generate the video with a script like this:
#!/bin/bash
gource contrib.log -s 0.1 -i 0 --max-files 0 \
--date-format "%B %Y" --hide dirnames,filenames \
-640x360 -o - \
| ffmpeg -y -r 25 -f image2pipe -vcodec ppm -i - \
-vcodec libvpx -b 10000K contrib.webm
How did I know that these are the right options? I didn't know (especially for ffmpeg
), I simply googled and found some examples, then I played with some options of gource
in order to get it right.
Date: 08 December 2013
HTML generated by org-mode 6.33x in emacs 23
First make sure that the tools that we need are installed:
sudo apt-get install gource ffmpegThen generate the video with a script like this:
#!/bin/bash gource contrib.log -s 0.1 -i 0 --max-files 0 \ --date-format "%B %Y" --hide dirnames,filenames \ -640x360 -o - \ | ffmpeg -y -r 25 -f image2pipe -vcodec ppm -i - \ -vcodec libvpx -b 10000K contrib.webmHow did I know that these are the right options? I didn't know (especially for
ffmpeg
), I simply googled and found some examples, then I played with some options of gource
in order to get it right.
Date: 08 December 2013
HTML generated by org-mode 6.33x in emacs 23
No comments:
Post a Comment