Saturday, November 7, 2009

defrag interface: one year later

In my first post to this blog last year, I showed the difference between the Windows Disk Defragmenter on Windows XP and the one on Windows Vista. The bottom line was that the Vista version lost the graphical interface, leaving the user with no idea of what is going on in the background.

With the release of Windows 7, one would hope that this annoyance had been taken care of. Sadly, this is not the case. The interface does give some information on what it is doing, but the user gets no sense of how much progress has been made, and the graphical display we came to love in XP is long gone.


Fortunately, the good folks at Piriform have created a defragmentation utility called Defraggler that shows the defragmentation operations on the hard disk in real-time. This is even better than XP's defrag utility. The visual display is, of course, only one of the great features of this utility. I only gave it a quick try (and loved the Quick Defrag option), but the interface was enough to amend the sorrows caused by Vista and Windows 7.

Wednesday, October 14, 2009

Using wget

With Geocities going down in less than 2 weeks' time, I found myself needing to archive a number of websites hosted there that would otherwise disappear. For this purpose, one can go through the frustrating experience of saving a webpage's files one by one, but that would be stupid when there exist tools that automate the whole process.

The tool for the job is GNU Wget. While I've used this tool before for similar purposes, Geocities has several quirks that forced me to learn the tool a bit better.

For starters, this is how to use the tool:

wget http://www.geocities.com/mitzenos/

Great, that downloaded index.html. But we want to download the whole site! So we use the -r option to make it recursive. This means that it will follow references to files used by each webpage, using attributes such as href and src. While this recursion could potentially go on forever, two things limit it: the default recursion depth (5 levels, adjustable with the -l option), and the fact that wget will, by default, not follow links that span different hosts (i.e. jump to different domains). Here's how the recursion works:

wget -r http://www.geocities.com/mitzenos/

OK, that downloads an entire site. In the case of Geocities, which hosts many different accounts, wget may end up downloading other sites on Geocities. If /mitzenos/ links to /xenerkes/, for example, both accounts are technically on the same host, so wget will just as well download them both. We can solve this problem by using the -np switch (short for --no-parent) [ref1] [ref2]. Note that combining -r and -np as -rnp does not work (at least not on Windows), since -np is itself a two-character option rather than a bundle of -n and -p.

wget -r -np http://www.geocities.com/mitzenos/

So that solved most of the problems. Now when we try downloading /xenerkes/ separately, Geocities ends up taking the site down for an hour because of bandwidth restrictions, and wget's output fills with 503 Service Temporarily Unavailable errors. This is because Geocities imposes a 4.2MB hourly bandwidth limit (bastards). Since the webspace limit for Geocities is 15MB, this makes it difficult to download any site between 4.2MB and 15MB in size within a single hour.

The solution to this problem is to force wget to download files at a slower rate, so that if the site is, say, 5MB, the bandwidth usage is spread over more than one hour. This is done using the -w switch [ref: download options], which takes an argument in seconds by default (you can also specify minutes, hours, etc.). For Geocities, waiting 40-60 seconds between files should be enough, provided the files aren't very large. Back when Geocities was popular, it wasn't really normal to have very large files on the web, so that isn't really an issue. This is the line that solves it:

wget -r -np -w 60 http://www.geocities.com/mitzenos/

This command will obviously take several hours to download a site if there are a lot of files, so choose the download interval wisely. If you're exceeding the bandwidth limit then use a large interval (around 60 seconds); if there are lots of files and the download is too damn slow, then use a smaller interval (30-40 seconds).

Saturday, April 25, 2009

Early Sierra games playable online

Some interesting stuff that surfaced between yesterday and today:

Friday, April 24, 2009

Oracle buys Sun; Geocities to die soon; Ubuntu 9.04 released

A lot of stuff has happened this week, and keeping up to date with Slashdot is a good idea. Some highlights:

Thursday, April 16, 2009

Google Android SDK 1.5 Early Look

A few days ago, a pre-release of the Google Android SDK 1.5 was released.

Google Android is an operating system for mobile phones. I had to write a program for it in April 2008 (as one of my University Assigned Practical Tasks), back when no mobile phone supported it, and when the SDK was so early that it didn't even have a version number; releases were identified by a milestone number and a release number.

Today, the SDK appears to have matured a lot, and so have the tools that come with it, including the emulator. Out of curiosity, I re-installed the Android SDK to see how the emulator changed over the past year. Below are a couple of screenshots.




Anyone wishing to install this pre-release version should follow the instructions on the pre-release page since there are a few differences from the procedure described by the current SDK documentation. Also, running the emulator has become slightly more complex, because of the extra step of having to create an AVD (Android Virtual Device). This tiny complication is for the better, however, as it allows you to create several different emulator configurations.

Sunday, April 5, 2009

A quick look at Ubuntu 9.04 Beta

Ubuntu 9.04 (Jaunty Jackalope) is currently in beta, and is due for a stable release on 23rd April 2009.

I'm mainly a Windows user, but for some tasks (especially programming) I like using Linux. I'm not extremely technical, so this little review is more about covering what the average user expects to get from an operating system, rather than exciting new features like the ext4 filesystem.

For about a year I've been using Kubuntu 7.04, and although support for it has long since stopped, I preferred not to upgrade. One of the main reasons was that I simply did not like the latest versions of KDE. I got my first taste of KDE when I tried Knoppix, and immediately loved it. When the time came to install Linux rather than using a live CD, Kubuntu was an obvious choice over Ubuntu. But today, this is not so obvious any more. Even Linus agrees that KDE 4 is a mess.

Since it is about time I upgraded my Linux distribution, I thought I'd try Ubuntu 9.04. I never liked GNOME (mostly for aesthetic reasons, but I also never felt comfortable with the menu bar on top), but since KDE has become far worse, I thought I'd give it a second chance. Ubuntu 9.04 comes with GNOME 2.26 [ref: Ars Technica article]. Now this version number means very little to me, but this Ubuntu feels more like Windows than Kubuntu ever did... at least the left mouse button simply selects items rather than trying to run them, and dragging an item always moves it rather than opening up a silly context menu every time.


Ubuntu 9.04 comes with a number of good pieces of software pre-installed. Among these are Firefox 3.0.7 (on Kubuntu I am still stuck with Firefox 2 because I never managed to install Firefox 3), OpenOffice.org 3.0.1, and Pidgin, which to me looks very much like Gaim, but has a fresher look, is easier to configure, and has far less grotesque conversation windows.


One of the features I really liked on Kubuntu was how screenshots are saved. You press Print Screen, and a dialogue comes up prompting you to save the screenshot, without you having to even paste the screenshot in some image editor. This feature is still there in Ubuntu 9.04.


For those people like me who work with multiple computers with different operating systems, it is important to be able to transfer files from one PC to another over the network. In Kubuntu I used to go to "Remote Places" and then to "Samba Shares", and proceed from there. It worked great, but was painful because I had to navigate through several virtual network folders every time I wanted to locate my shared folder.

In Ubuntu 9.04, there is something similar. You go to "Places" > "Network" and then find your network and host and shared folder. Ubuntu is nicer because it actually mounts the shared folder, so you can easily access it from your desktop next time.


Listening to music on Ubuntu 9.04, unfortunately, is not such a pleasant experience. Both pre-installed media players, "Movie Player" and "Rhythmbox Music Player", are incapable of playing MP3s without a plugin. Also, I was unable to find my usual 2 Linux media players (VLC and XMMS) using either apt-get or the Add/Remove Applications program, and Amarok and JuK failed to install. I am still lost as to how to play MP3s on this version of Ubuntu... something I had no problem doing on good old Kubuntu.

A couple of things I never managed to do on Linux are printing and watching videos on YouTube, since the Flash player plugin for Firefox is not compatible with x64 architectures. The Flash issue can't be blamed on Ubuntu, but I think anyone would expect to be able to print without much hassle on any decent operating system. With this new version of Ubuntu, I still had no luck in either area.

Other minor things I don't like include how the Terminal could be easier to reach ("Applications" > "Accessories" > "Terminal"), and how the shutdown options live in a counter-intuitive "Live session user" menu at the top-right.

On the whole, Ubuntu 9.04 seems to be very promising, and assuming that some issues get fixed, I may seriously consider using it as my next Linux operating system once my thesis is finished.

Saturday, March 7, 2009

HTTP Communication: A Closer Look

About four months ago, I wrote a very simple HTTP server in Python, since my thesis has a Python artifact and I wanted to integrate it with a server. I've known the basics of HTTP for a year and a half now, but actually writing a server is obviously another story.

For those who aren't familiar with HTTP (I mean the actual protocol itself... everyone knows what it does, but not everyone knows what it looks like), or who know the basics but need to see a few examples, "HTTP Made Really Easy" is a great place to start.

My very simple HTTP server worked nicely, but I soon ran into a problem. If the server was hosted on Windows and I accessed it from a browser on Linux, I wasn't getting the payload of POST requests (I wasn't getting all of the header either, although I didn't notice at first). I got the payload for all other Windows/Linux combinations (server on Linux, browser on Windows; and both server and browser on the same system).

Until today, I had no clue what could be causing Linux to (as I thought) send requests without the payload. Then I decided to use Wireshark to find out what exactly was happening to the packets. Wireshark is a great tool that lets you see the actual data in the packets you send and receive.

By using Wireshark, I noticed that the HTTP requests were being split into multiple packets when sent from Linux, while Windows would send the request as a single packet. This means the payload would arrive in a second or third packet, and since I had only one recv() call, I would never see it.

The solution is to keep a buffer associated with each connection (identified by ip:port), and append each packet to it. You know when you've reached the end from the Content-Length field in the HTTP header, which tells you the size of the payload. The payload starts after the first "\r\n\r\n" (the blank line separating the headers from the body), so you can start counting from there.
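To make that concrete, here's a minimal Python sketch of such a buffered receive. The function name and buffer size are my own choices for illustration, not the actual code from my server, and it assumes the request either carries a Content-Length header (as POSTs do) or has no payload at all:

```python
import socket

def recv_http_request(conn):
    """Keep reading from the socket until a complete HTTP request
    (headers plus payload) has arrived, and return it as bytes."""
    buf = b""
    # First, read until the blank line that ends the headers shows up.
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:
            return buf  # peer closed the connection early
        buf += chunk
    header, _, payload = buf.partition(b"\r\n\r\n")
    # Find the payload size from the Content-Length field, if present.
    length = 0
    for line in header.split(b"\r\n"):
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":", 1)[1].strip())
    # Then keep reading until the whole payload has arrived,
    # even if it was split across several packets.
    while len(payload) < length:
        chunk = conn.recv(4096)
        if not chunk:
            break
        payload += chunk
    return header + b"\r\n\r\n" + payload
```

With a loop like this it no longer matters whether the client sent the request as one packet or five; the server simply accumulates data until the byte count promised by Content-Length has been reached.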

Now, regarding connections... here's another thing I learned by poking around in Wireshark. I used to think that a browser keeps the same connection open for each website, until the browser is closed, so that it can use the same connection (for efficiency purposes) for the same website rather than opening new connections all the time. Well, that's not the case.

Apparently, each HTTP request opened a new connection, so if you're watching requests coming in on the server side, you'll see the client's port number increase each time. (HTTP/1.1 does support persistent connections, but the client will only reuse one if the server cooperates, which my simple server did not.) A connection is reused only to carry the multiple packets associated with the same request (as above). In other words, a Linux client would open a connection, split the request into a number of packets, send them over the same connection, and close the connection. Well, almost.

The browser actually doesn't close the connection. If your server just sends the data and keeps the socket open, the browser will keep waiting for more data to arrive. So your server must close the connection immediately after sending the response.
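That final step can be sketched in a few lines of Python. Again, this is illustrative rather than my server's actual code; the closed connection (together with the Content-Length header) is what tells the browser the response is complete:

```python
import socket

def send_response(conn, body):
    """Send a minimal HTTP response and then close the connection,
    so the browser knows the server is done sending."""
    response = (
        b"HTTP/1.0 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"\r\n" + body
    )
    conn.sendall(response)
    conn.close()  # signals end-of-response to the client
```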