HTML Overview
Bill Fugate, July, 2000

 

Web pages are written in a language called HTML (Hypertext Markup Language).  Most web pages are created with web publishing tools like Microsoft FrontPage, which let you compose a web page with design tools and then click a button to have the appropriate HTML code written for you.  Such tools are convenient but not necessary.  I think it is fun to see how HTML itself works.

This document provides a brief overview of HTML itself.  Reading this document and performing the exercises in it takes less than an hour and requires no special software other than what you already have on your PC.  When you are finished, you will have:

    • A feel for basic HTML features like tags, tables, images, and links
    • Several small HTML files that actually work

In other words, you will have just enough to get started. To create meaningful HTML files, you will need additional information, such as a book on HTML.
 
 

Preparation

Begin by opening Windows File Explorer and creating a temporary work directory to hold the files you will be creating.  (You do this so you don't accidentally interfere with anything else on your PC.)  This overview presumes that you are storing files in a directory called c:\temp, but you can call it anything you wish.

Next, you need an easy way to edit the HTML files.  The odds are high that you have a web browser on your PC called Internet Explorer.  If so, simply open Internet Explorer and click on Tools, Internet Options, Programs.  Under "HTML editor," choose Windows Notepad and click OK.  Now whenever you are in File Explorer, you can right-click on an HTML file and choose Edit, which will open the file in Notepad.  (Do you use a web publishing program like FrontPage?  If so, you will probably want to change the HTML Editor setting back to its original value when you are finished with this exercise.)  Note that using Internet Explorer to change this setting does not mean that you must use that browser for the remainder of this exercise; for that, you can use Netscape or any other browser.

(In the unlikely event that you don't have Internet Explorer, there is a more tedious way to accomplish the same thing.  First, make sure that file extensions are being displayed in Windows File Explorer.  If not, click on View, Options while within Windows File Explorer.  Go to the View tab and uncheck the "Hide MS-DOS file extensions" box.  Then you need to make sure that your PC is set up so that *.txt files are associated with the Notepad text editor and *.htm files are associated with your browser. This allows you to open these two types of files by double-clicking on them.  These associations are probably already in place on your PC. If not, you can create them by using File Explorer and clicking on View, Options.  Click the File Types tab and find "Text Document" and "Netscape Hypertext Document" [if you are using the Netscape browser; if you are using the Internet Explorer browser, just look for a similar name for it]. If necessary, you can open Notepad manually by clicking on Start, Programs, Accessories.)
 
 

HTML

HTML files are ASCII text files that contain special tags. You can create HTML files with any ASCII editor, such as Notepad. You can also create them with general editors such as Word; see that editor's Help system for details.

Your web browser can read any text file, even if it doesn’t contain HTML tags. To demonstrate this, create a file called C:\temp\temp1.txt and type this text into it:

This is a text file.

(An easy way to create a text file is to right-click on white space in the right pane of Windows File Explorer and then click on New, Text Document.)

After saving the file, you can display it in your browser by entering this URL and pressing Return: file:///C:/temp/temp1.txt
 
 

Tags

HTML tags tell the browser how to display the file contents. To create a simple HTML file, use Notepad to edit temp1.txt, which you created above, and change the text as follows:

This is an <b>HTML</b> file.

Save it and click the browser’s Refresh button to view that file’s contents. At this point, you should see exactly what is typed above, including the <b> and </b> tags.

Next, rename the file to temp1.htm. Change the URL in the browser to point to temp1.htm instead of temp1.txt, and press Return. Now the browser displays "This is an HTML file." with the word "HTML" in bold and the tags themselves hidden.

By renaming the file to *.htm, you told the browser to process HTML tags, which are delimited by "<" and ">". The <b> tag tells the browser to begin displaying text in bold font, and the </b> tag tells it to quit. Most HTML tags appear in pairs, with the ending tag identical to the beginning tag except that it is prefixed with a "/".

It doesn’t matter whether you use upper or lower case letters within HTML tags; <B> works the same as <b>. Note that the browser does not display anything between the "<" and ">" symbols. Or, more accurately, if the browser does display the contents of an HTML tag, that means there is a syntax error somewhere in the file that needs to be corrected.
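As a small sketch of how paired tags combine, the line below nests the <b> tag inside a second pair. The <i> (italic) tag is not used elsewhere in this overview and is included here only for illustration; the mixed upper and lower case is deliberate.

```html
This is <b>bold</b>, this is <I>italic</I>, and this is <b><i>both</i></b>.
```

Note that the inner pair (<i> and </i>) is closed before the outer pair (<b> and </b>). Keeping pairs properly nested like this avoids surprises in how browsers interpret the line.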

For something fancier, rename the file back to temp1.txt and edit it with Notepad. (The general procedure throughout this exercise is to rename a file to something.txt to edit it, and then rename it to something.htm to display it in the browser.) Change the contents to read as follows:

This is an <font color=red size=+3 face="arial"><b>HTML</b></font> file.

This adds a font tag that specifies that the word "HTML" is to be displayed in bold with a red Arial font that is three sizes bigger than usual.

Rename the file to temp1.htm, and display it in the browser. To display it in the browser, probably all you need to do is to double-click the temp1.htm file. If that doesn’t work, type the URL of the file into your browser.
 
 

Links

To illustrate an HTML link, use Notepad to create a second text file called temp2.txt. Type this text into the file:

There is a link to the temp1.htm file right <a href="temp1.htm">here</a>.

Save the file and rename it to temp2.htm. Open it in your browser by double-clicking on it. If you click on the word "here", the browser will display temp1.htm, as specified by the <a href="temp1.htm"> tag.

You have now created a web of two connected HTML files.
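The link above uses a bare file name, which the browser looks for in the same directory as the page containing the link. A link can also hold a complete URL. The sketch below shows both forms; www.example.com is only a placeholder address, not a site this overview depends on.

```html
Go back to <a href="temp2.htm">the page with the first link</a>,
or visit <a href="http://www.example.com/">another web site</a>.
```

If you add these lines to temp1.htm, the two test files will link to each other in both directions.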
 
 

Images

You can include an image in a web page by pointing to the image file with HTML code something like this:

An image appears right here <img src="whatever.gif">

To see what happens when this code is displayed in a browser, create a temporary HTML file named temp3.htm as explained in previous sections and type the line above into it, replacing "whatever.gif" with the name of the image file you have chosen.
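The img tag accepts further attributes. The sketch below adds three common ones: alt, which supplies text to show when the image cannot be displayed, and width and height, which give the display size in pixels. The description and the pixel values are made-up assumptions; substitute values that suit your own image.

```html
An image appears right here
<img src="whatever.gif" alt="Description of the image" width="100" height="80">
```

If the width and height do not match the image's real dimensions, the browser stretches or shrinks the image to fit.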

If you need an image file to use for testing, you can retrieve image files directly from the web as follows:

  • Find an image on the web
  • Right-click on the image
  • Choose "Save Image As…"
  • Save the image file in the directory you have been using to create HTML test files

"Boilerplate" HTML tags

The one-line HTML sample files above work in most browsers, and they are good enough for testing, but you wouldn’t want to put such bare-bones files into production. A complete HTML file should include these additional tags:

<html>
<head>
<title>Whatever you type here will be displayed at the top of the browser window.</title>
</head>
<body>

This is an <b>HTML</b> file.

</body>
</html>

If you type this code into a test file and run it in your browser, you will see exactly what you saw in an earlier section, except that you will see the title displayed at the top of the browser.

These additional tags look useless at this point only because the HTML code in this overview is so simple. They have a purpose in more realistic HTML files, so it is good to get into the habit of always including them.
 
 

Tables

Below is HTML code for a simple table. The indentations are for readability only. The <tr> ("Table Row") tag marks the beginning of a new row. The <td> ("Table Data") tag marks the beginning of a new cell (that is, a new column) in the current row.

<table border=1>
  <tr>
    <td> first row, first cell </td>
    <td> first row, second cell </td>
  </tr>
  <tr>
    <td> second row, first cell </td>
    <td> second row, second cell </td>
  </tr>
</table>

Create a test file with the code above and display it in your browser to see what happens. What happens if you modify the file and use "border=3"? What about "border=0"?

Tables are often used to control the placement of images by placing a pointer to the image file inside a table cell, like this:

<td><img src="whatever.gif"></td>

To see the results, modify the table test file you just created, inserting into one of the table cells a pointer to the same image file you used in the Images section above.

Tables can also be used to control the placement of text. Unlike a word processor, HTML does not let you format text output with tabs, because a tab character in an HTML file, or even a series of tab characters, will be displayed by the browser as a single blank space. One trick is to place text in a two-column table with invisible borders, placing text in the second column and leaving the first column blank. That makes the text look like it has been tabbed to the right. You can make a table’s borders invisible by specifying "<table border=0>".
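The two-column trick described above might look something like the sketch below. The width of 40 pixels for the blank column is an arbitrary assumption, and the &nbsp; in the first cell is a non-breaking space, placed there so the cell is not completely empty.

```html
<table border=0>
  <tr>
    <td width="40">&nbsp;</td>
    <td>This text appears indented, as if it had been tabbed to the right.</td>
  </tr>
</table>
```

Changing the width value moves the text further right or left. Temporarily changing border=0 to border=1 is an easy way to see where the cells actually are.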
 
 

Forms

Forms are used to pass information to other programs for further processing. It is beyond the scope of this overview to present a complete working form, but the example below gives the flavor of how they work.

<form action="geo_code.cfm" method="post" name="Geo_Code_Form">

Enter Geographic Code here: <input type="text" size="2" name="geo_code">

</form>

The ACTION attribute specifies what action to take when the user submits the form. In this case, it would run a ColdFusion program named geo_code.cfm, which might perform a database lookup, send email, etc. The form in this example is named "Geo_Code_Form" and it has a two-character data entry field named "geo_code".

To see what this form looks like, type the code above into a temporary HTML file and display the results in a browser.

An actual working form would be much more complex than this example, and it would have a SUBMIT button to call the application that performs the additional processing. That application could be written in ColdFusion, Perl, Visual Basic, etc.

If the form above did have a SUBMIT button, and if the form user clicked on it, an error message would appear unless there truly was a program named geo_code.cfm ready to take over at that point.
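For illustration only, here is roughly how the same form might look with a SUBMIT button added. The button label "Look Up" and the maxlength attribute, which limits the field to two typed characters, are assumptions that are not part of the original example; geo_code.cfm is still a hypothetical program.

```html
<form action="geo_code.cfm" method="post" name="Geo_Code_Form">

Enter Geographic Code here: <input type="text" size="2" maxlength="2" name="geo_code">
<input type="submit" value="Look Up">

</form>
```

Because no program named geo_code.cfm exists in your test directory, clicking the button in this sketch produces the kind of error described above.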
 
 

Spacing

HTML can be confusing at first when you are trying to format a page:

    • As explained earlier, the browser collapses a series of tabs in the HTML code into a single blank space.
    • Likewise, it collapses a series of blank spaces into a single blank space.
    • It treats a carriage return in the HTML code as just another blank space instead of starting a new line.

Fortunately, HTML provides ways to handle all these situations:

    • Tables can be used to simulate tabs.
    • To enter more than one blank space, use the special character entity &nbsp; (non-breaking space); each appearance of it in the HTML source code produces exactly one blank space.
    • Carriage returns are simulated by the <br> ("break") tag, which starts a new line (single spacing), and by the <p> ("paragraph") tag, which starts a new paragraph set off by a blank line (double spacing).
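A minimal sketch of these spacing techniques follows; the sentences are arbitrary sample text. Note that the non-breaking space is written as the character entity &nbsp; (an ampersand, the letters nbsp, and a semicolon) and not inside "<" and ">".

```html
First line<br>
Second line, directly below the first.
<p>A new paragraph, set off from the text above by a blank line.</p>
<p>Three&nbsp;&nbsp;&nbsp;spaces appear between the first two words.</p>
```

Try replacing the three &nbsp; entities with three ordinary spaces to see the browser collapse them into one.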

Publishing

Publishing a web (that is, a collection of linked web files) involves nothing more than copying those files to a machine that has web server software installed on it and that is available to the public.
 
 

HTML Isn’t a Programming Language

HTML is not a programming language; it is a markup language. That is, instead of working with IF-blocks, FOR-loops, etc., HTML is designed to process tags that specify how items are to be displayed. Note the following points:

  1. If a tag in your HTML code contains a misspelled tag name, you probably won’t see an error message, because browsers ignore any HTML tags they don’t understand. The only sure way to find errors is to inspect the output by eye.
  2. It is sometimes said that HTML tags don’t issue commands to browsers; they issue suggestions. There is nothing that requires a browser to do what you hoped it would do with your "command." A common problem is that a web page sometimes looks significantly different on two different browsers, such as Netscape and Internet Explorer. People who build commercial sites test their HTML code with all major browsers, and with all available versions of those browsers.
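To see the first point in action, try the line below in a temporary HTML file. The tag name "bold" is a deliberate mistake: it is not an HTML tag, so a browser should skip over it silently.

```html
This word is <b>bold</b>, but this one is <bold>not</bold>.
```

No error message appears. The second word is simply displayed in the normal font, which is why checking the output by eye matters.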

JavaScript

Unlike HTML, JavaScript is a programming language; books on it are available at major bookstores. (Despite the similarity of their names, JavaScript is not related to the Java programming language.)

JavaScript is different from HTML, yet JavaScript code can be embedded within an HTML file, where it can be used to handle form field validation, calculations, and other logic tasks.

Below is a contrived sample of JavaScript code. You can type it into a temporary HTML file, as explained above, and run it in a browser to see what it displays.

<script language="javascript">
// Show a message box only when the action is "exit"
var action = "exit";
if (action == "exit") {
  alert("Are you sure you want to exit?");
}
</script>

This code causes a small box to appear on your browser screen asking, "Are you sure you want to exit?" In this simple example, nothing happens when you click on OK, but you could use code like this inside a larger program that performs some action when the user clicks OK.
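As a hedged sketch of how a script might react to the user's choice, the variation below uses confirm() instead of alert(). The confirm() function is a standard browser function that displays both an OK and a Cancel button and reports which one was clicked; the follow-up messages are made up for illustration.

```html
<script language="javascript">
// confirm() returns true if the user clicks OK, false if Cancel
if (confirm("Are you sure you want to exit?")) {
  alert("You clicked OK.");
} else {
  alert("You clicked Cancel.");
}
</script>
```

In a larger program, the two alert() calls would be replaced by whatever the page should actually do in each case.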
 
 

HTML Editors

HTML files for web sites are usually built with special editors, like FrontPage or Dreamweaver. But, as the examples above show, these editors aren’t really necessary, especially for simple chores, although they certainly make life easier.

Learning the basics of raw HTML coding is beneficial even if you do use an HTML editor, because editors sometimes get confused, forcing you to manually correct the HTML code they produce.

You can view the HTML source code behind any web page, including this one. If you are using Netscape, for example, click on View, Source. Using copy-and-paste, you can "borrow" that code for your own use.
 
 

Index pages

Browsers can be used to display the contents of a directory. For example, you can display the contents of the root directory of your PC by entering this URL: file:///C:/

You probably don’t want users to view the raw contents of the directory that holds your web pages. To prevent this, place an index file in that directory. An index file is an ordinary HTML file with a special name, which is usually either index.html or index.htm, depending on how the web server is set up. If the user’s URL specifies only the directory name, the index file will be displayed by default, which means that a browser can’t be used to view the contents of that directory.

A good practice is to include an index file in every directory and subdirectory on your web site. Index files typically display a menu of available pages.
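A bare-bones index file for the test files in this overview might look like the sketch below, saved as index.htm (or index.html, depending on the server). The title and link descriptions are invented for illustration, and the file names assume you created temp1.htm, temp2.htm, and temp3.htm as described earlier.

```html
<html>
<head>
<title>Index of HTML test files</title>
</head>
<body>

Available pages:
<br><a href="temp1.htm">A first HTML file</a>
<br><a href="temp2.htm">A page with a link</a>
<br><a href="temp3.htm">A page with an image</a>

</body>
</html>
```

Keep in mind that it is the web server that substitutes the index file for the directory listing. When you browse your own PC with a file: URL, you will most likely still see the raw directory contents and will need to open index.htm yourself.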