Saturday, March 7, 2009

HTTP Communication: A Closer Look

About four months ago, I wrote a very simple HTTP server in Python, since my thesis has a Python artifact and I wanted to integrate it with a server. I've known the basics of HTTP for a year and a half now, but actually writing a server is obviously another story.

For those who aren't familiar with HTTP (I mean the actual protocol itself... everyone knows what it does, but not everyone knows what it looks like), or who know the basics but need to see a few examples, "HTTP Made Really Easy" is a great place to start.

My very simple HTTP server worked nicely, but I soon ran into a problem. If the server was hosted on Windows and I accessed it from a browser on Linux, I wasn't getting the payload of the POST requests (I wasn't getting all of the header either, although I didn't notice at first). I got the payload for all other Windows/Linux combinations (server on Linux, browser on Windows; and both server and browser on the same system).

Until today, I had no clue what could be causing Linux (as I saw it) to send requests without the payload. Then I decided to use Wireshark to find out exactly what was happening to the packets. Wireshark is a great tool that lets you see the actual data in the packets you send and receive.

Using Wireshark, I noticed that the HTTP requests were being split into multiple packets when sent from Linux, while Windows would send the request as one whole packet. This means the payload would arrive in a second or third packet, and since I had only one recv() call, I never received it.

The solution is to keep a buffer for each connection (identified by ip:port), and append the data from each packet to it. You know when you've reached the end from the Content-Length field in the HTTP header, which tells you the size of the payload in bytes. The payload starts right after the first "\r\n\r\n" (the blank line that ends the header), so you can start counting from there.
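Here is a minimal sketch of that buffering loop in Python. It is not my actual server code: the names read_request and content_length are made up for illustration, and it assumes a blocking socket where each connection has its own socket object, so the buffer can simply be a local variable instead of a table keyed by ip:port.

```python
import socket

HEADER_END = b"\r\n\r\n"


def content_length(header_bytes):
    """Return the Content-Length value from raw header bytes, or 0 if absent."""
    # Skip the request line, then look for the header by name (case-insensitive).
    for line in header_bytes.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip())
    return 0


def read_request(conn, chunk_size=4096):
    """Call recv() repeatedly until the header and the whole payload arrive."""
    buffer = b""
    # Keep reading until the blank line that ends the header shows up.
    while HEADER_END not in buffer:
        chunk = conn.recv(chunk_size)
        if not chunk:  # the client closed the connection early
            return buffer, b""
        buffer += chunk
    header, _, payload = buffer.partition(HEADER_END)
    expected = content_length(header)
    # Keep reading until the payload is as long as Content-Length says.
    while len(payload) < expected:
        chunk = conn.recv(chunk_size)
        if not chunk:
            break
        payload += chunk
    return header, payload[:expected]
```

With this, it no longer matters whether the request shows up in one recv() call or in three; the function returns only once the full payload is in, or the client has gone away.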

Now, regarding connections... here's another thing I learned by poking around in Wireshark. I used to think that a browser keeps the same connection open for each website, until the browser is closed, so that it can use the same connection (for efficiency purposes) for the same website rather than opening new connections all the time. Well, that's not the case.

Apparently, at least with my server, each HTTP request starts a new connection, so if you're watching requests coming in from the server side, you'll see the client's port number increase (typically by one) each time. Connections are reused only to send multiple packets associated with the same request (as above). In other words, a Linux client would open a connection, split the request into a number of packets, send them via the same connection, and close the connection. Well, almost.

The browser actually doesn't close the connection. If your server just sends the data, the browser will keep waiting for data to arrive. So your server must close the connection immediately after sending the response.
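A rough sketch of the "send, then close" step in Python follows. The function name send_response_and_close is hypothetical, and the Content-Type, Content-Length and Connection headers are my own additions for illustration rather than something my server necessarily sent.

```python
import socket


def send_response_and_close(conn, body, status="200 OK"):
    """Send a complete HTTP response, then close so the browser stops waiting."""
    header = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    # sendall() keeps sending until every byte has been handed to the OS.
    conn.sendall(header + body)
    # Closing the socket tells the client that nothing more is coming.
    conn.close()
```

Here body is expected to be bytes, e.g. send_response_and_close(conn, b"<p>hi</p>") right after handling a request on conn.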

Friday, March 6, 2009

DOSBox for Dummies

What is DOSBox?

DOSBox is a program that emulates DOS, allowing you to run most old games that might not run on modern operating systems.

[Screenshot: DOSBox as soon as you run it]

[Screenshot: DOSBox after running a game]

How do I use DOSBox?

DOSBox can be used like any command line interface. The commands are pretty standard: cd changes directory; typing the name of an executable runs that executable, etc.

But first, before you can access your files, you need to mount a drive. In DOSBox you start at drive Z:, which is virtual, so you need to map a drive in DOSBox (e.g. C:) to a particular drive or folder on your hard disk. You can mount drive C: as follows:

mount c c:\

DOSBox recommends that you don't map a DOSBox drive directly to a root directory, so you should use some folder instead of C:\.

Note that if a folder name is longer than 8 characters or contains spaces, you should use the short (tilde) version of that folder name (e.g. administrator -> admini~1, i.e. take the first 6 characters and add ~1). If the name contains spaces, the short version is built the same way with the spaces left out (e.g. Documents and Settings -> docume~1), but I believe DOSBox still has problems with folder names containing spaces.
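The tilde rule can be written down as a few lines of Python. This is only a guess at the usual case: the function name short_name is made up, it assumes spaces are simply dropped before truncating, and it always uses ~1, whereas Windows picks ~2, ~3 and so on when several folders share the same first 6 characters. Running dir /x in the Windows command prompt shows the short names actually in use.

```python
def short_name(folder_name):
    """Guess the short (8.3-style) version of a folder name."""
    # Names of up to 8 characters without spaces can be used as they are.
    if len(folder_name) <= 8 and " " not in folder_name:
        return folder_name
    # Otherwise drop the spaces, keep the first 6 characters and add ~1.
    compact = folder_name.replace(" ", "")
    return compact[:6] + "~1"


print(short_name("administrator"))
print(short_name("Documents and Settings"))
```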

Writing Batch Files for DOSBox

If you use DOSBox often and don't want to do some of the repetitive tasks (e.g. mounting a drive, or running your favourite game) every time, you can write a batch file to automate the process.

A batch file (on Windows) is a text file with a .bat extension (e.g. u5.bat). In the batch file you write a list of commands that you would otherwise type in the command line, one command per line.

The following is an example of a batch file I wrote to run Ultima 5 right away:

cd tools
cd dosbox
dosbox -c "mount c C:\docume~1\admini~1\Desktop" -c "C:" -c "cd U5" -c "ultima"

Each line is run as a separate command. In the first two lines, I'm going to the DOSBox directory, and in the last one I'm running DOSBox.

Now DOSBox is nice because you can give it certain parameters, one of which is "-c". "-c" means that the following parameter is a command to be run in DOSBox. This way, you can make DOSBox run several commands as soon as it starts, without you having to type them. The last line of the batch file passes four commands: first I'm mounting the C drive, then I'm switching to it, then I'm going to the U5 directory, then I'm running Ultima 5.
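If you'd rather start DOSBox from a script than from a batch file, the same "-c" list can be put together in Python. This is just a sketch: the helper name dosbox_command is made up, and it assumes the dosbox executable can be found from wherever the script is run.

```python
def dosbox_command(commands, executable="dosbox"):
    """Build an argument list that passes each command to DOSBox with -c."""
    args = [executable]
    for command in commands:
        # Each DOSBox command gets its own -c in front of it.
        args += ["-c", command]
    return args


args = dosbox_command([
    r"mount c C:\docume~1\admini~1\Desktop",
    "C:",
    "cd U5",
    "ultima",
])
print(args)
```

The resulting list can then be handed to subprocess.run(args) from the standard library to actually launch DOSBox.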


Some games may run too fast or too slow. Hit Ctrl+F11 to slow DOSBox, or Ctrl+F12 to speed it up.