Computer Networking - Notes on Ch.2

Principles of Network Applications

Network applications live in the application layer and run only on end systems, not on devices in the network core. The two architectures are client-server and P2P. Much of this chapter is pretty straightforward, so these notes are deliberately trimmed down.

Sockets!

  • Interface (an API) between the application layer and the transport layer within a host, i.e. between the application and the network
  • You need an IP address + port # to address the receiving process
    • Web servers at :80
    • SMTP servers at :25
    • others can be found here
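
A minimal sketch of the socket API in Python, assuming a toy one-shot echo service on the loopback address. The names `start_echo_server` and `echo_once` are made up for illustration. The point is the (IP address, port) pair that the client needs in order to reach the server process.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Start a one-shot TCP echo server; return the (host, port) it listens on."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))              # port 0 asks the OS for any free port
    srv.listen(1)

    def serve():
        conn, _addr = srv.accept()   # blocks until a client connects
        with conn:
            data = conn.recv(1024)
            conn.sendall(data)       # echo the bytes back unchanged
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()


def echo_once(address, payload):
    """Connect to (ip, port), send payload, and return the reply."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(payload)
        return sock.recv(1024)
```

Calling `echo_once(start_echo_server(), b"hello")` should hand back the same bytes. A real web or mail server would bind to a well-known port such as 80 or 25 instead of letting the OS choose.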

Reliable data transfer

  • Guarantee data delivery in the transport-layer
  • loss-tolerant applications can do without reliable data transfer (VoIP)

Timing & Throughput

  • Some applications have timing restrictions (RTB)
  • applications that have throughput requirements are said to be bandwidth-sensitive applications (also multimedia apps)
  • elastic applications use whatever is available (email, file transfer, etc.)

Transport services offered

  • Essentially two options: TCP and UDP
  • TCP
    • Connection oriented service and a reliable data transfer service
    • Connection-oriented service
      • “TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow.” (handshake)
      • For encryption there is an add-on to TCP called Secure Sockets Layer (SSL), since succeeded by TLS
    • Reliable data transfer service
      • Can rely on TCP to deliver all data without error and in the proper order
    • Congestion control included
  • UDP
    • No frills: no handshaking, no reliable delivery, no ordering guarantee, no congestion control
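
To contrast with TCP's handshake, here is a rough sketch of a UDP exchange over loopback. The helper name `udp_roundtrip` is hypothetical. Note that there is no `connect()`/`accept()` step: the datagram is just addressed and sent. On loopback it will normally arrive, but across a real network UDP gives no such promise.

```python
import socket


def udp_roundtrip(payload):
    """Send one datagram to a local UDP socket and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # OS picks a free port
    receiver.settimeout(5)               # give up instead of blocking forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake: the datagram is simply addressed and sent.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()
```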

HTTP

  • HyperText Transfer Protocol (HTTP) is client-server based; client and server talk through structured messages
  • Uses TCP, stateless protocol
  • Can use persistent or non-persistent connections (same TCP connection for multiple requests or just one)
  • Non-persistent connections can request a bunch of things in parallel, but there’s a lot of overhead to keeping track of all these connections
  • With persistent connections, successive requests are sent over the same connection, which is closed after it sits unused for a timeout interval
  • Message format example:
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr
  • The first line is the request line and the rest are header lines. After the headers comes an additional carriage return and line feed (a blank line), then the message body if there is one (e.g. a POST request).
  • Request Line
    • Method Field
      • GET, POST, HEAD, PUT, and DELETE
    • URL Field
    • HTTP Version field
  • Host header specifies the host on which the object resides
  • Connection header tells the server whether to keep the connection persistent or not
  • User-agent specifies the browser
  • Accept-language specifies the language version of the object that the user prefers
  • HTTP Response
HTTP/1.1 200 OK
Connection: close
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data data ...)
  • Has a status line with HTTP version, status code, and status phrase
  • Connection header says what the server is going to do with the TCP connection
  • Date line indicates when the response was sent
  • Server header indicates the server software and OS
  • Last-modified indicates when the object was last modified (used for object caching)
  • Content-Length indicates number of bytes in the message body…
  • Content-type is the type of content in the message body…
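
The message layout above (start line, header lines, blank line, optional body) can be sketched as a tiny parser. This is only an illustration: `parse_http_message` is a made-up helper, the `RESPONSE` sample is shortened with a 4-byte body, and real HTTP parsing has many more edge cases.

```python
def parse_http_message(raw):
    """Split a raw HTTP message into start line fields, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")   # blank line ends the headers
    lines = head.split("\r\n")
    start_line = lines[0].split(" ", 2)          # three space-separated fields
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")     # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


REQUEST = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "Connection: close\r\n"
    "User-agent: Mozilla/5.0\r\n"
    "Accept-language: fr\r\n"
    "\r\n"
)
RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: close\r\n"
    "Content-Length: 4\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "data"
)
```

For the request, the start line splits into method, URL, and version; for the response, into version, status code, and status phrase.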

Telnet!

telnet cis.poly.edu 80

This opens up a TCP connection to port 80 of cis.poly.edu and lets you send a message that you type yourself, such as:

GET /~ross/ HTTP/1.1
Host: cis.poly.edu
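
Roughly the same thing can be done from Python with a raw socket. This is a sketch, not a real HTTP client: `build_get` and `fetch` are hypothetical helpers, and the `Connection: close` header is an addition of mine so the server closes the connection when it is done. Whether the host above still answers is not something these notes can promise.

```python
import socket


def build_get(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",   # assumption: ask the server to close when done
        "",                    # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80):
    """Open a TCP connection, send the request, and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

`fetch("cis.poly.edu", "/~ross/")` would be the equivalent of the telnet session, returning the status line, headers, and body as one byte string.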

Cookies!

  • Cookies allow sites to keep track of users
  • 4 components
    • Cookie header line in the HTTP response message
    • Cookie header line in the HTTP request message
    • Cookie file kept on the user’s end system and managed by the user’s browser
    • Back-end database at the web site
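
The four components can be sketched with two toy classes. Everything here is hypothetical (`ToySite`, `ToyBrowser`, the bare numeric ID); real cookies are name=value pairs with attributes, which this skips.

```python
import itertools


class ToySite:
    """Server side: issues IDs and keeps per-user state in a back-end 'database'."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.database = {}                 # cookie id -> list of visited paths

    def handle(self, path, request_headers):
        response_headers = {}
        cookie = request_headers.get("Cookie")
        if cookie not in self.database:
            cookie = str(next(self._ids))            # first visit: mint a new id
            self.database[cookie] = []
            response_headers["Set-Cookie"] = cookie  # header in the HTTP response
        self.database[cookie].append(path)
        return response_headers


class ToyBrowser:
    """Client side: keeps a cookie file and replays cookies on later requests."""

    def __init__(self):
        self.cookie_file = {}              # host -> cookie id

    def get(self, host, site, path):
        headers = {}
        if host in self.cookie_file:
            headers["Cookie"] = self.cookie_file[host]   # header in the HTTP request
        response_headers = site.handle(path, headers)
        if "Set-Cookie" in response_headers:
            self.cookie_file[host] = response_headers["Set-Cookie"]
        return response_headers
```

The first request gets a `Set-Cookie` header back; later requests carry the ID in a `Cookie` header, so the site can tie them to the same database entry even though HTTP itself is stateless.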

Web caching

  • Used to reduce traffic and reduce response time
  • Users' requests are proxied through the web cache. If it has an up-to-date copy it serves that; otherwise it fetches the object from the origin server
  • CDNs are essentially distributed web caches
  • Conditional GETs
    • To check if page is up to date, the cache will issue a GET request with an If-modified-since: XYZ header. XYZ is the stored Last-Modified from the last HTTP response of that web page. If it has not been modified, the server sends something like:
HTTP/1.1 304 Not Modified
Date: Sat, 15 Oct 2011 15:39:29
Server: Apache/1.3.0 (Unix)

(empty entity body)
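
The server-side decision behind a conditional GET can be sketched like this. `conditional_get` is a made-up function that only compares the two dates; it assumes both are HTTP date strings ending in GMT.

```python
from email.utils import parsedate_to_datetime


def conditional_get(if_modified_since, last_modified):
    """Return the status line an origin server would send for a conditional GET.

    Both arguments are HTTP date strings such as
    'Tue, 09 Aug 2011 15:11:03 GMT'.
    """
    cached = parsedate_to_datetime(if_modified_since)
    current = parsedate_to_datetime(last_modified)
    if current <= cached:
        return "HTTP/1.1 304 Not Modified"   # cached copy is still current
    return "HTTP/1.1 200 OK"                 # object changed: send a new copy
```

When the dates match, the cache gets a 304 with no body and can keep serving its stored copy, which is what saves the bandwidth.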

File Transfer Protocol (FTP)

  • FTP uses two parallel TCP connections to transfer a file, a control connection and a data connection
    • Control connection is used for sending information such as user id, password, and user commands (get, put, cd). Because control information travels on a different connection than file data, FTP is said to send it out-of-band, while HTTP is in-band
    • Control connection remains open for the whole session, but a new data connection is made for every file transfer

Email

  • 3 major components
    • user agents
      • allow users to read, reply to, forward, save, and compose messages
    • mail servers
      • core email infrastructure, each user has a mailbox on a mail server
      • If a server can’t deliver a message to another server it holds it in a message queue and attempts to transfer it later (every 30 min or so)
    • Simple Mail Transfer Protocol (SMTP)
      • transfers mail between mail servers; the sending server acts as the client
      • also transfers mail from the sender's user agent to the sender's mail server
      • still requires the whole message, including binary files, to be encoded as 7-bit ASCII for transfer
      • direct connection between the two servers
      • Dialogue commands for client:
        • HELO
        • MAIL FROM
        • RCPT TO
        • DATA
        • QUIT
        • . => single period means end of data message
      • Server issues replies to each command with status code and English description
  • To get mail from the mail server, the recipient uses Post Office Protocol - Version 3 (POP3), Internet Message Access Protocol (IMAP), or HTTP
    • POP3
      • Open TCP connection on :110
      • authorization (send username/password)
      • retrieve messages (also mark messages for deletion, etc.)
      • Update (after client issues quit) where server deletes messages marked for deletion
      • commands
        • list
        • retr
        • dele
        • quit
        • user
        • pass
    • IMAP
      • Basically POP3 with extra features like folders; it also maintains user state info across sessions. Can also obtain certain bits of info about messages (like titles) to save bandwidth
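
Two small sketches for the mail protocols above, both hypothetical and offline: `smtp_client_lines` just lists what the SMTP client side sends, in order, and `ToyPop3Mailbox` mimics POP3's rule that `dele` only marks a message and the actual deletion happens in the update phase after `quit`. Server replies, authentication, and the escaping of body lines that start with a period are left out.

```python
def smtp_client_lines(client_host, sender, recipient, body_lines):
    """Return the lines an SMTP client sends, in order, for one message."""
    lines = [
        f"HELO {client_host}",
        f"MAIL FROM: <{sender}>",
        f"RCPT TO: <{recipient}>",
        "DATA",
    ]
    lines.extend(body_lines)
    lines.append(".")        # a line holding a single period ends the message
    lines.append("QUIT")
    return lines


class ToyPop3Mailbox:
    """In-memory mailbox mimicking POP3's transaction and update phases."""

    def __init__(self, messages):
        self.messages = dict(enumerate(messages, start=1))  # number -> text
        self.marked = set()

    def list(self):
        """Return (number, size) for every message not marked for deletion."""
        return [(n, len(m)) for n, m in self.messages.items() if n not in self.marked]

    def retr(self, number):
        return self.messages[number]

    def dele(self, number):
        self.marked.add(number)          # only marked; nothing is removed yet

    def quit(self):
        for number in self.marked:       # update phase: deletions happen now
            del self.messages[number]
        self.marked.clear()
```

The addresses you pass in are up to you; for a real session Python's standard `smtplib` and `poplib` modules speak these protocols over an actual connection.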

DNS

  • DNS servers typically run Berkeley Internet Name Domain software (BIND).
  • Runs on UDP and typically uses port 53
  • Host Aliasing
    • a host can have multiple alias hostnames
    • the main hostname is said to be the canonical hostname
  • Mail Server Aliasing
    • MX records let a mail server and a web server share the same aliased hostname, even though the canonical hostname of the mail server is different
  • Load distribution
    • DNS is used to distribute load among replicated servers: a set of IP addresses is associated with one hostname, and DNS rotates the order of the addresses in each reply
  • DNS servers are decentralized
    • Root DNS servers
    • top-level domain (TLD) DNS servers
    • authoritative DNS servers
  • Flow goes as follows
    • client contacts a root server for the IPs of the TLD servers for the top-level domain (.com, .co, .ru, etc.)
    • Then it contacts a TLD server, which returns the IP address of an authoritative server for the hostname; finally it contacts the authoritative server, which returns the IP address for the hostname
  • Only 13 (replicated) root servers as of 2012
  • Local DNS servers are located at each ISP; you usually query them for DNS, and they forward your request if need be (if the answer is not cached)
  • DNS Records & Messages
    • The DNS servers that together implement the DNS distributed database store resource records (RRs).
    • An RR is a four-tuple that contains
      • Name
      • Value
      • Type
        • A
          • Name is a hostname, Value is IP address
        • NS
          • Name is domain and Value is hostname
        • CNAME
          • Name is alias hostname and Value is canonical hostname
        • MX
          • Name is alias hostname, Value is canonical name of mail server
      • TTL (time to live of resource record until removal from cache)
    • Messages
      • First 12 bytes are the header section
        • first field is 16-bit number identifying the query so that the server can match the reply to the request
        • Flags
          • 1-bit query (0)/reply (1) flag
          • 1-bit authoritative flag, set in a reply when the DNS server is authoritative for the queried name
          • 1-bit recursion-desired flag is set when a client desires that the DNS server perform recursion when it doesn’t have the record
          • 1-bit recursion-available flag is set in a reply if the DNS server supports recursion
  • can use nslookup terminal command to look up DNS records
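
A toy sketch of two things from the DNS notes above: resource records as (Name, Value, Type, TTL) tuples walked root -> TLD -> authoritative, and the 12-byte query header packed with `struct`. The hostnames, the 192.0.2.x addresses, and the TTLs are all made up, and the "servers" are just Python lists. The header layout (ID, flags, then four 16-bit counts) follows the standard DNS message format, with the recursion-desired bit at 0x0100.

```python
import struct
from collections import namedtuple

RR = namedtuple("RR", "name value type ttl")

# Hypothetical records held at each level of the hierarchy, keyed by server.
SERVERS = {
    "root": [
        RR("com", "a.tld.example", "NS", 172800),
        RR("a.tld.example", "192.0.2.1", "A", 172800),
    ],
    "192.0.2.1": [
        RR("example.com", "ns.example.com", "NS", 86400),
        RR("ns.example.com", "192.0.2.2", "A", 86400),
    ],
    "192.0.2.2": [RR("www.example.com", "192.0.2.80", "A", 300)],
}


def lookup(records, name, rtype):
    """Return the value of the first record matching name and type, or None."""
    for rr in records:
        if rr.name == name and rr.type == rtype:
            return rr.value
    return None


def resolve(hostname):
    """Iteratively walk root -> TLD -> authoritative; return (ip, servers asked)."""
    labels = hostname.split(".")
    asked = ["root"]
    tld_ns = lookup(SERVERS["root"], labels[-1], "NS")      # who serves .com?
    tld_ip = lookup(SERVERS["root"], tld_ns, "A")
    asked.append(tld_ip)
    auth_ns = lookup(SERVERS[tld_ip], ".".join(labels[-2:]), "NS")
    auth_ip = lookup(SERVERS[tld_ip], auth_ns, "A")
    asked.append(auth_ip)
    return lookup(SERVERS[auth_ip], hostname, "A"), asked


def build_query_header(query_id, recursion_desired=True):
    """Pack the 12-byte header of a DNS query carrying one question."""
    flags = 0x0100 if recursion_desired else 0   # QR bit stays 0 for a query
    # Fields: ID, flags, then question/answer/authority/additional counts.
    return struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
```

In practice the local DNS server does this walk on the host's behalf and caches the answers until their TTLs run out.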

File distribution time

  • Client-server file distribution time increases linearly with the number of peers:
$$D_{CS} = \max \left\lbrace \frac{NF}{u_s} , \frac{F}{d_{min}}\right\rbrace$$
$D_{CS}$ is the distribution time, $N$ is the number of peers, $F$ is the size of the file, $d_{min}$ is the download rate of the slowest peer, and $u_s$ is the server upload rate.
  • For P2P there is a lower bound on the distribution time, and it levels off as $N$ grows because every peer adds upload capacity ($u_i$ is the upload rate of peer $i$):
$$ D_{P2P} \ge \max {\left\lbrace \frac{F}{u_s}, \frac{F}{d_{min}}, \frac{NF}{u_s + \sum_{i=1}^{N} u_i} \right\rbrace}$$
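
To get a feel for the two formulas, here is a small calculator. The function names and the sample numbers are made up; units just need to be consistent (say, bits for $F$ and bits per second for the rates).

```python
def client_server_time(n, f, u_s, d_min):
    """D_CS = max(N*F/u_s, F/d_min)."""
    return max(n * f / u_s, f / d_min)


def p2p_lower_bound(f, u_s, d_min, peer_uploads):
    """Lower bound on D_P2P; len(peer_uploads) plays the role of N."""
    n = len(peer_uploads)
    return max(f / u_s, f / d_min, n * f / (u_s + sum(peer_uploads)))
```

With $F = 1000$, $u_s = 10$, $d_{min} = 5$, and 10 peers each uploading at rate 1, `client_server_time(10, 1000, 10, 5)` gives 1000.0 while `p2p_lower_bound(1000, 10, 5, [1] * 10)` gives 500.0, since the peers' combined upload capacity doubles what the server alone can push.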

P2P and Bittorrent

  • All the other stuff is pretty basic, but an interesting part is the peer discovery system of BitTorrent
    • Tracker sends Alice 50 IPs
    • Alice randomly selects, say, 5 peers and asks for data via TCP connections
    • Every, say, 30 seconds, Alice drops the peer giving her the least data, randomly picks a new one, and re-ranks her peers. This way Alice keeps finding and rotating toward the peers that can give her the chunks she needs at the highest rate
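
The rotation step described in these notes can be sketched as below. `rotate_peers` is a hypothetical helper that follows the notes' simplified description (drop the slowest, add one random newcomer); the real BitTorrent choking algorithm has more to it.

```python
import random


def rotate_peers(active, candidates, rates, rng=random):
    """Drop the slowest active peer and swap in one random candidate.

    active:     peers Alice currently exchanges chunks with
    candidates: other known peers (for example from the tracker's list)
    rates:      peer -> measured rate at which that peer sends Alice data
    """
    ranked = sorted(active, key=lambda peer: rates.get(peer, 0), reverse=True)
    kept = ranked[:-1]                       # everyone except the slowest
    pool = [p for p in candidates if p not in active]
    if not pool:
        return ranked                        # nobody new to try
    return kept + [rng.choice(pool)]
```

Running this every interval means a slow peer is eventually replaced, and a newcomer that turns out to be fast will rank high enough to stay.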

Distributed Hash Table

  • Need to store key, value pairs in a distributed fashion and have peers update each other
  • Can’t have all peers knowing each other, too many links
  • Circular system
    • Peers only know their immediate neighbors in the circle and pass a message along until it hits the peer responsible for the key, which then passes the reply back
    • Can decrease time to get messages by adding shortcuts to the circle
  • Peer Churn
    • Peers can come and go without warning
    • to deal with this, each peer tracks both its first and second successors and periodically pings them to verify that they are alive
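
A sketch of lookup on the circular DHT with no shortcuts. It assumes integer peer IDs and the convention that a key belongs to the closest peer ID at or after it, wrapping around the circle. For brevity the code works out the responsible peer up front and then counts hops, whereas in a real DHT each peer would make that check locally.

```python
def successor(peer_ids, key):
    """Return the peer responsible for key: the closest id at or after it."""
    ring = sorted(peer_ids)
    for pid in ring:
        if pid >= key:
            return pid
    return ring[0]                # wrap around the circle


def lookup_hops(peer_ids, start, key):
    """Count hops when each peer only knows its immediate successor."""
    ring = sorted(peer_ids)
    target = successor(ring, key)
    index = ring.index(start)
    hops = 0
    while ring[index] != target:
        index = (index + 1) % len(ring)   # pass the query to the next peer
        hops += 1
    return target, hops
```

With peers 1, 3, 4, 5, 8, 10, 12, 15, a query for key 11 that starts at peer 3 takes 5 hops to reach peer 12. Shortcuts exist to cut that number down, since without them the hop count grows linearly with the number of peers.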

Skype

  • Supernodes set up the call and then traffic flows P2P, unless NATs prevent a direct connection, in which case a supernode relays the traffic
Written on October 13, 2014