
Download Http Thru Sockets (c)

Recently I started following this guide to learn how to download files from the internet. I read it and came up with the following code to download the HTTP body of a webs

Solution 1:

If you want to grab files using HTTP, then libcURL is probably your best bet in C. However, if you are using this as a way to learn network programming, then you are going to have to learn a bit more about HTTP before you can retrieve a file.

What you are seeing in your current program is that you need to send an explicit request for the file before you can retrieve it. I would start by reading through RFC2616. Don't try to understand it all - it is a lot to read for this example. Read the first section to get an understanding of how HTTP works, then read sections 4, 5, and 6 to understand the basic message format.

Here is an example of what an HTTP request for the stackoverflow Questions page looks like:

GET http://stackoverflow.com/questions HTTP/1.1\r\n
Host: stackoverflow.com:80\r\n
Connection: close\r\n
Accept-Encoding: identity, *;q=0\r\n
\r\n

I believe that is a minimal request. I added the CRLFs explicitly to show that a blank line terminates the request header block, as described in RFC2616. If you leave out the Accept-Encoding header, the resulting document will probably be transferred as a gzip-compressed stream, since HTTP explicitly allows this unless you tell the server that you do not want it.
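As a rough sketch of how that request could be assembled in C, something like the following should work. The function name build_get_request and its parameters are made up for illustration. Note that it sends only the path as the request target (the more common form when talking directly to a server) instead of the absolute URL shown above, and it leaves the port off the Host header, which is fine for the default port 80.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a minimal GET request into buf.
 * Returns the request length, or -1 if buf is too small. */
static int build_get_request(char *buf, size_t size,
                             const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "Accept-Encoding: identity, *;q=0\r\n"
                     "\r\n",              /* blank line ends the header block */
                     path, host);

    if (n < 0 || (size_t)n >= size)
        return -1;                        /* encoding error or truncated output */
    return n;
}
```

Calling build_get_request(buf, sizeof buf, "stackoverflow.com", "/questions") fills buf with the text that then has to be written to the connected socket before any response will arrive.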

The server response also contains HTTP headers for the meta-data describing the response. Here is an example of a response from the previous request:

HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Sun, 01 Aug 2010 13:54:56 GMT\r\n
Content-Type: text/html; charset=utf-8\r\n
Connection: close\r\n
Cache-Control: private\r\n
Content-Length: 49731\r\n
\r\n
\r\n
\r\n
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" ... 49,667 bytes follow

This simple example should give you an idea of what you are getting into if you want to grab files using HTTP. This is the best-case, simplest example. This isn't something that I would undertake lightly, but it is probably the best way to learn and appreciate HTTP.
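To make the response layout concrete, here is a minimal sketch of picking such a response apart in C. The helper names (find_body, status_code, content_length) are invented for this example, and it assumes the whole response is already in one NUL-terminated buffer. It also ignores chunked transfer encoding, header continuation lines, and everything else a real client has to cope with.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return a pointer to the first body byte, or NULL if the blank line
 * that ends the header block has not been received yet. */
static const char *find_body(const char *response)
{
    const char *end = strstr(response, "\r\n\r\n");
    return end ? end + 4 : NULL;
}

/* Return the status code from a status line such as "HTTP/1.1 200 OK",
 * or -1 if the line does not look like one. */
static int status_code(const char *response)
{
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Return the Content-Length value, or -1 if the header is absent or the
 * header block is incomplete. Header names are case-insensitive. */
static long content_length(const char *response)
{
    const char *body = find_body(response);
    const char *line = strstr(response, "\r\n");   /* end of status line */

    while (line && body && line + 2 < body) {
        line += 2;                                 /* start of next header */
        if (strncasecmp(line, "Content-Length:", 15) == 0)
            return strtol(line + 15, NULL, 10);
        line = strstr(line, "\r\n");
    }
    return -1;
}
```

With the sample response above, status_code gives 200, content_length gives 49731, and find_body points at the first byte after the blank line, which is where those 49731 bytes begin.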

If you are looking for a simple way to learn network programming, this is a decent way to start. I would recommend picking up a copy of TCP/IP Illustrated, Volume 1 and UNIX Network Programming, Volume 1. These are probably the best way to really learn how to write network-based applications. I would probably start by writing an FTP client since FTP is a much simpler protocol to start with.

If you are trying to learn the details associated with HTTP, then:

  1. Buy HTTP: the Definitive Guide and read it
  2. Read RFC2616 until you understand it
    • Try examples using telnet server 80 and typing in requests by hand
    • Download the cURL client and use the --verbose and --include command line options so that you can see what is happening
  3. Read Fielding's dissertation until HTTP really makes sense.

Just don't plan on writing your own HTTP client for enterprise use. You do not want to do that, trust me as one who has been maintaining such a mistake for a little while now...

Solution 2:

The problem is that you have to implement the HTTP protocol. Downloading a file is not just a matter of connecting to the server; you have to send an HTTP request (along with the proper HTTP headers) before you get a response. After that, you still need to parse the returned data to strip out the response's own HTTP headers.

If you're just trying to download files using C, I suggest the cURL library, which does the HTTP work for you.

Solution 3:

You have to send an HTTP request before expecting a response. Your code currently just waits for a response which never comes.

Also, don't write comments in all caps.
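A sketch of the send-then-receive order, assuming fd is an already connected stream socket (the connect() part from the guide is left out). The names send_all and http_exchange are hypothetical, and error handling is kept to the bare minimum; for instance, an interrupted call (EINTR) is simply treated as a failure.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send the whole buffer; send() may accept only part of it per call.
 * Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send the request FIRST, then read until the peer closes the connection
 * (which "Connection: close" asks for) or out is full.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t http_exchange(int fd, const char *request,
                             char *out, size_t out_size)
{
    size_t total = 0;

    if (send_all(fd, request, strlen(request)) != 0)
        return -1;

    while (total < out_size) {
        ssize_t n = recv(fd, out + total, out_size - total, 0);
        if (n < 0)
            return -1;
        if (n == 0)          /* peer closed: response is complete */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

The bytes collected in out start with the status line and headers, so they still need to be split from the body before the file contents can be saved.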
