16 February 2001
Into Java, Part 14
Streams are important data structures to know about. Not long after the first computers were powered on, directly controlled by switches, terminals cabled to the machines were introduced. Over time, keyboards of many different kinds surfaced and needed to be plugged into the computer, and devices like video screens also needed connections. Today we use files on disks or other media, networks, radio links and quite a few more techniques to receive or send information.
Consider a pipe providing one byte at a time, either now and then, as from a keyboard that is used infrequently, or at high speed, as when reading a locally stored file. In both cases only one byte is provided at a time, and you have to remove it to make room for the next one. That is what the most basic stream looks like, which leads us to the InputStream. The OutputStream is the exact opposite, a pipe that can take one byte at a time, but we will come to that class in a moment.
Since the byte streams (InputStream) and the character streams (Reader) provide almost the same functionality, I discuss them together. Later we will see how the two groups differ in usage. A basic input stream provides these methods (shown with the char arrays of Reader; InputStream uses byte arrays instead):

void close()                                    // abstract in Reader
void mark(int readAheadLimit)
int read()                                      // abstract in InputStream
int read(char[] buf)
int read(char[] buf, int offset, int length)    // abstract in Reader
void reset()
long skip(long n)
There are three read methods. The basic one returns an int (I presume an int is chosen since Reader may return a value in the range of 0 to 65535 (0x0000-0xffff), or -1 if the end of the stream is found), and the other ones fill a provided char array with characters. reset() is used to start over from the place where mark() was used to put a kind of "book-mark", if the stream provides "book-marks". We may also skip(long n) bytes or characters if we want to. When finished reading the stream we close() it so the system resources will be returned.
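As a minimal sketch of the methods just described, the following reads from an in-memory StringReader rather than from a real device. The class name ReaderBasics and the sample text are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderBasics {
    // Reads "abcdef" using read(), mark(), reset() and skip(),
    // and returns the characters seen, in order.
    static String demo() {
        StringBuffer seen = new StringBuffer();
        Reader in = new StringReader("abcdef");
        try {
            seen.append((char) in.read());   // 'a'
            if (in.markSupported()) {
                in.mark(10);                 // book-mark after 'a'
            }
            seen.append((char) in.read());   // 'b'
            in.reset();                      // back to the book-mark
            seen.append((char) in.read());   // 'b' again
            in.skip(2);                      // skip 'c' and 'd'
            int c;
            while ((c = in.read()) != -1) {  // -1 signals end of stream
                seen.append((char) c);       // 'e', then 'f'
            }
            in.close();                      // hand the resources back
        } catch (IOException e) {
            System.err.println("Read failed: " + e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that read() returns an int, so the value has to be cast to char before it is used as a character.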
These methods differ in InputStream and Reader only in the former processing bytes and the latter characters. The characters are represented internally within your Java application in Unicode format, but externally the encoding depends on the underlying system and the actual stream processed.
Now an observation: read is a rather dumb method. If no data is available it will simply sit and wait, until more data arrives, the end of the stream is reached or an IOException is thrown. Hence, if you read from a stream, your application will freeze if the stream stops for a while, as a stream over the Internet may do. Later we will see how to take care of that.
The basic output streams, OutputStream and Writer, provide the corresponding methods, where x stands for byte or char respectively:

void close()                                // abstract in Writer
void flush()                                // abstract in Writer
void write(int x)                           // abstract in OutputStream
void write(x[] buf)
void write(x[] buf, int off, int len)       // abstract in Writer
void write(String str)                      // Writer only
void write(String str, int off, int len)    // Writer only
The write(...) methods are self-explanatory, as is close(). flush() tells the system to write the data right away, if buffering is used.

Three pipes from the System class are always available: System.in, System.out and System.err. The first two we have used, especially System.out. System.in is an InputStream while the latter two are of the OutputStream type (PrintStream, to be precise).
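To see what flush() does, here is a small sketch that writes through a Writer into an in-memory byte sink. The class name FlushDemo is hypothetical, and the US-ASCII encoding is chosen only to make the byte count predictable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class FlushDemo {
    // Writes text through a Writer into an in-memory byte sink and
    // returns how many bytes have arrived after flush().
    static int bytesAfterFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            Writer out = new OutputStreamWriter(sink, "US-ASCII");
            out.write(text);       // may still sit in the writer's buffer
            out.flush();           // push buffered characters to the sink now
            int arrived = sink.size();
            out.close();
            return arrived;
        } catch (IOException e) {
            System.err.println("Write failed: " + e);
            return -1;
        }
    }
}
```

Calling close() also flushes, so an explicit flush() matters mostly for streams that stay open for a long time, such as a log or a network connection.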
However, not all of the methods mentioned so far are implemented in the base classes; see the ones marked abstract. The missing methods are implemented by the classes we wrap these basic streams in. Hence we may consider the basic stream classes plain pipes, which we have to wrap in convenience classes if we do not want to do a lot of tiresome coding ourselves.
A BufferedInputStream is a FilterInputStream, which is an InputStream, which is an Object. Hence you may, for example, use an InputStream reference obtained from somewhere. From that abstract object we may make ourselves an InputStreamReader, which is a Reader, so we can make a BufferedReader.

I have colored the base IO classes so it can be easily seen where they may be used in constructors.
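The wrapping described above might be sketched like this. WrapDemo and firstLine are names invented for the example, and a ByteArrayInputStream stands in for an InputStream "from somewhere".

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class WrapDemo {
    // Wraps any InputStream in an InputStreamReader and then in a
    // BufferedReader, and returns the first line (or null if empty).
    static String firstLine(InputStream raw) {
        try {
            BufferedReader in = new BufferedReader(new InputStreamReader(raw));
            String line = in.readLine();
            in.close();            // closing the wrapper closes the inner streams
            return line;
        } catch (IOException e) {
            System.err.println("Read failed: " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream("first\nsecond\n".getBytes());
        System.out.println(firstLine(raw));
    }
}
```

The same method would accept System.in or a FileInputStream, since both are InputStream objects.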
Let us start with File, a handy object that represents either a file path and file name, or only a path to a directory. Its implementation differs from one operating system to another, as the Unix path separator / is represented with \ on an OS/2 machine. To play safe, we use the File.separator static variable when writing paths.
A File is constructed with only a file name, a complete path, or a path as one string and the file name as another string. The instantiated object can then answer many questions, such as canRead(), canWrite(), exists(), isDirectory(), etc. Hence, if you have to create a file yourself, it may be a good idea to instantiate a File object first.
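A short sketch of a File handle answering questions. FileDemo, join and the file names are hypothetical; only the File methods themselves come from the Java API.

```java
import java.io.File;

class FileDemo {
    // Builds a path from a directory and a file name using the
    // platform's separator instead of a hard-coded "/" or "\".
    static String join(String dir, String name) {
        return dir + File.separator + name;
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("user.dir"));      // current directory
        File file = new File(dir.getPath(), "no-such-file.txt");  // path + name form
        System.out.println(dir.isDirectory());
        System.out.println(file.exists());
        System.out.println(file.canRead());
        System.out.println(join("docs", "notes.txt"));
    }
}
```

The two-string constructor inserts the separator for you, so it is usually the more convenient way to combine a path and a name.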
Since read() can only read one byte or character at a time, we would have to sit in a loop until, say, the end-of-line (EOL) character arrives. It is more convenient to use classes that have methods like readLine(). BufferedReader and LineNumberReader have such methods, so when reading text files one of these two classes is most often used, mainly BufferedReader. We used such a reader in Into Java, Parts 4 and 5, and we will use BufferedReader today too.
A speedy data structure that provides a way to tell the difference between objects is Hashtable. Two equal words will produce the same hash code, and we may use containsKey(Object key) to find out if a word is used more than once. But where do we store the hits, since a String cannot hold hits? It looks like we have to make ourselves a helper class that holds one word and a counter. Let us start with that one.
Since that helper class is to be used in a Hashtable it must implement hashCode and equals. On the other hand, since we work with String objects, we may use the methods of that class, and we will simply delegate to the String class's methods in each of the methods mentioned. (If we use Java 2 and would like to get a sorted output, we must implement the java.lang.Comparable interface, which has only one method to implement, compareTo(Object other). That is because the static Collections.sort() method demands that the objects to be sorted support that interface.)
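The original listing is not reproduced here, so the following is only a sketch of what such a helper class might look like. The method names increment, getCount and getWord and the toString format are assumptions, and Comparable<Word> uses generics, which arrived in Java long after this column was written.

```java
class Word implements Comparable<Word> {
    private final String word;
    private int count;

    Word(String word) {
        this.word = word;
        this.count = 1;          // the first sighting counts as one hit
    }

    void increment() { count++; }
    int getCount() { return count; }
    String getWord() { return word; }

    // hashCode, equals and compareTo all delegate to String.
    public int hashCode() { return word.hashCode(); }

    public boolean equals(Object other) {
        return other instanceof Word && word.equals(((Word) other).word);
    }

    public int compareTo(Word other) { return word.compareTo(other.word); }

    public String toString() { return word + ": " + count; }
}
```

Two Word objects holding the same text compare as equal regardless of their counters, which is what a hash table lookup needs.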
Having this helper class we may continue with the WordCounter class. We will settle for a tiny terminal window version, although it could be extended into a GUI application, using this class as an invisible engine. The class we are making shall have a Hashtable, hence we must import the java.util package. We must also import java.io to get the file readers.
I think I have mentioned that there are system dependent characters, and line.separator is one. A way to support system independence in Java is to use the properties available through the System.getProperty method. There is a list of such properties near that method in the Java API. The first line of main retrieves it.
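Fetching the property could look like the sketch below. SeparatorDemo, EOL and twoLines are made-up names; "line.separator" is the real property key.

```java
class SeparatorDemo {
    // The platform's line separator: "\n" on Unix, "\r\n" on OS/2 and Windows.
    static final String EOL = System.getProperty("line.separator");

    // Joins two lines with the platform's own separator.
    static String twoLines(String first, String second) {
        return first + EOL + second + EOL;
    }

    public static void main(String[] args) {
        System.out.print(twoLines("first line", "second line"));
    }
}
```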
The next task is to get the filename from the input argument. We must assure ourselves that there is a parameter to read, else we notify the user and quit. Once we have a valid input, we instantiate an object of the WordCount class type, passing the input argument as a parameter to the constructor. So far we know that we have an argument from the user, but wait, didn't we use throws IOException the last time we worked with file reading?
This time we instantiate a handle to the file we want to access even though we are not yet certain there is a file to read from. Fortunately File is an abstract handle to a file and does not need an actual file, hence we may create ourselves a file handle to use. The next step is to ask the handle if the file exists, and if not, tell the user and exit.
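The two checks, a missing argument and a missing file, might be sketched as follows. ArgCheck, problemWith and the wording of the messages are invented for this example and are not taken from the original listing.

```java
import java.io.File;

class ArgCheck {
    // Returns an error message, or null when the first argument
    // names something that exists on disk.
    static String problemWith(String[] args) {
        if (args.length < 1) {
            return "Usage: java WordCount <filename>";
        }
        File file = new File(args[0]);     // a handle only, no file is opened
        if (!file.exists()) {
            return "File not found: " + args[0];
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = problemWith(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(1);
        }
        System.out.println("Ready to read " + args[0]);
    }
}
```

Because creating a File never touches the disk, no IOException can occur here and no throws clause is needed.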
A small note: last time I mentioned briefly that using a prime as the initial capacity of a Hashtable gives better results. I will not argue further on that, but please note that I looked one up and am using that prime, 2671, which will do for a while. Naturally, if you would like to count a huge file, you will need to increase this to a much bigger prime.
So far you may compile without error and try the error messages. Nothing else will work since we have not done anything to the count method.
We chose to make the count method public, thus it may be used by GUI apps. (To do this, you would instantiate an object with the file name as parameter and call count. Unfortunately I will not make it quite that easy; you will have to change some lines to redirect the output to a text area.) Now we need a try/catch block since readLine may fail with an IOException, and if so, an error message needs to be printed. This time we use System.err, the standard pipe to print error messages to. At this point it still ends up in the same place as System.out, but it might very well be redirected to a log file or any other stream.
We use the File handle to make a FileReader, which is used to make a BufferedReader, a convenience class that has some useful methods like readLine, which are preferred to the low level methods of FileReader. Next, we start reading the file line by line as long as there are more lines to read. Please note the parentheses in the while clause.
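A sketch of that reading loop follows. LineLoop and countLines are hypothetical names, and the method takes any Reader so it can be tried without a file on disk; main shows the File and FileReader variant described above.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class LineLoop {
    // Counts the lines delivered by the given Reader.
    static int countLines(Reader source) {
        int lines = 0;
        try {
            BufferedReader in = new BufferedReader(source);
            String line;
            // The inner parentheses make the assignment happen first;
            // only then is the result compared with null (end of file).
            while ((line = in.readLine()) != null) {
                lines++;
            }
            in.close();
        } catch (IOException e) {
            System.err.println("Could not read: " + e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            File file = new File(args[0]);
            System.out.println(countLines(new FileReader(file)));
        }
    }
}
```

Without the inner parentheses the comparison would bind tighter than the assignment, and the code would not compile.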
Every line read is sent to processLine, which simply wraps a StringTokenizer around the line. This useful class is located in java.util and gives you tokens delimited by blanks, or any other whitespace, if you do not specify your own delimiters. As long as the line has more tokens (words) the while loop continues. Finally we close the file. This time it is not strictly needed since we are only playing with it, but in the future you might be working on networks where you should be more polite.
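How a StringTokenizer splits a line can be sketched like this. TokenDemo and wordsOf are made-up names, and the words are collected in a Vector only so the result is easy to inspect.

```java
import java.util.StringTokenizer;
import java.util.Vector;

class TokenDemo {
    // Splits a line on whitespace (the default delimiters) and
    // returns the tokens in the order they appear.
    static Vector<String> wordsOf(String line) {
        Vector<String> words = new Vector<String>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            words.addElement(tokens.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsOf("to be  or\tnot to be"));
    }
}
```

With the default delimiters punctuation stays attached to the words, so "be," and "be" would count as different tokens unless you pass your own delimiter string.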
Recall that a Hashtable needs two things, a key to map from and the value to store. They need not be the same thing, and in this case they are not. We use the words as keys to the table, but we store the Word instances made out of the actual tokens.

We use the token as a key and want to know if it is already stored in the table; if so we increment the word count. Else we make a new Word instance, use the token as key and put the object as value in the table. In the end, the file will have been read and all the words put into the table and counted.
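The counting step might be sketched as below. CountDemo and its nested Entry class are stand-ins invented for this example (Entry plays the role of the Word helper), and the generic Hashtable<String, Entry> is a later Java feature than the column assumes.

```java
import java.util.Hashtable;
import java.util.StringTokenizer;

class CountDemo {
    // Stand-in for the Word helper: one word plus its hit counter.
    static class Entry {
        final String word;
        int count = 1;
        Entry(String word) { this.word = word; }
    }

    // Counts every whitespace-delimited token in the text.
    static Hashtable<String, Entry> count(String text) {
        Hashtable<String, Entry> table = new Hashtable<String, Entry>(2671);
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (table.containsKey(token)) {
                table.get(token).count++;            // seen before: one more hit
            } else {
                table.put(token, new Entry(token));  // first sighting
            }
        }
        return table;
    }
}
```

The key is the plain String token, while the stored value is the object that carries the counter.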
Now we will continue with the "more code to come" part. What to do with the output? I have made two versions available, one for Java version 2 (that is Java 1.2 and above) and the one actually used here for the prior versions. If you use a later flavor of Java, please remove the appropriate lines and make a few changes to the code as explained, both in this class and in the Word class, where implements Comparable must be visible.
This time we get ourselves an Enumeration, that is an abstraction of any data structure that is enumerable. This interface has two methods, public boolean hasMoreElements() and public Object nextElement(), that operate on the underlying structure.

Using such interfaces hides the actual data structure and you may conveniently change from a Vector to an ArrayList (Java 1.2) without too much work. For example, there is not a single line to be changed in the while loop; as long as you can obtain an Enumeration (for an ArrayList, through Collections.enumeration), it still works.
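To illustrate how the loop stays the same across data structures, here is a sketch with a hypothetical EnumDemo class whose sum method knows nothing about where the Enumeration came from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

class EnumDemo {
    // Sums the values of any Enumeration of Integer, regardless of
    // which data structure produced it.
    static int sum(Enumeration<Integer> e) {
        int total = 0;
        while (e.hasMoreElements()) {
            total += e.nextElement().intValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<String, Integer>();
        table.put("one", Integer.valueOf(1));
        table.put("two", Integer.valueOf(2));

        Vector<Integer> vector = new Vector<Integer>();
        vector.addElement(Integer.valueOf(3));

        ArrayList<Integer> list = new ArrayList<Integer>(vector);

        System.out.println(sum(table.elements()));               // Hashtable values
        System.out.println(sum(vector.elements()));              // Vector
        System.out.println(sum(Collections.enumeration(list)));  // ArrayList
    }
}
```

A Hashtable hands out its elements in no particular order, which does not matter for a sum but does matter if you want sorted output.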
The very next thing is to get ourselves an output filename. We use a StringBuffer, which avoids the overhead of the temporary objects created when concatenating Strings, and it has many useful methods. Since we do not know if we can lengthen the filename (maybe you use 8+3 FAT) we alter the first two characters. Another option would be to change the file extension, if we know there is one.
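The text does not say which characters the original code puts in, so the sketch below assumes a made-up rule: overwrite the first two characters with "wc". OutputName and outputNameFor are hypothetical as well, and the input is assumed to be a bare file name without a directory part.

```java
class OutputName {
    // Hypothetical naming rule: overwrite the first two characters
    // with "wc" so the name keeps its original length.
    static String outputNameFor(String inputName) {
        StringBuffer name = new StringBuffer(inputName);
        if (name.length() >= 2) {
            name.setCharAt(0, 'w');
            name.setCharAt(1, 'c');
        } else {
            name.insert(0, "wc");  // too short to overwrite, so prepend instead
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(outputNameFor("report.txt"));
    }
}
```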
Another try/catch block encompasses a new FileWriter that is set to not append to an existing file if there is one. If, on the other hand, you have a log file that is added to once in a while, you instantiate a FileWriter object with the boolean true, and it will append to the end of the existing log file.
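The difference the boolean makes can be sketched with a temporary file. AppendDemo and writeTwice are invented names, and the temporary file is only there so the example does not touch any real log.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

class AppendDemo {
    // Writes "first" and then "second" to a temporary file, reopening it
    // with the given append flag, and returns the resulting file length.
    static long writeTwice(boolean append) {
        try {
            File file = File.createTempFile("append-demo", ".log");
            file.deleteOnExit();

            FileWriter out = new FileWriter(file.getPath(), false); // start fresh
            out.write("first");
            out.close();

            out = new FileWriter(file.getPath(), append);  // true keeps old content
            out.write("second");
            out.close();

            return file.length();
        } catch (IOException e) {
            System.err.println("Could not write: " + e);
            return -1;
        }
    }
}
```

With true the file ends up holding both words, with false only the second one, since the old content is overwritten.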
The while loop is mainly self-explanatory due to the descriptive method names. Please note how convenient the toString method may be from time to time. toString is not only a good debugging method, it can serve our purposes this time as well. It does not provide splendid output but it is speedily implemented <grin>.

Finally the output file is closed and the count method is finished. Compile and go for it. Optionally you may add a

System.out.println("The table size is: " + table.size());

to the method and you will see how close to the prime we got.
The put and lookup (containsKey) operations do not depend on the size of the table; on average the time stays constant as the table grows. Further, we found that it was not hard to get a list of the contents: we used an Enumeration and found a nice interface, having only two methods.
We have touched on the useful File class, a handle to a file, existing or not. Instances of File may be used with some other classes, as we did.
BufferedReader encapsulating a FileReader showed itself to contain many useful methods. Reading a stream always follows the same pattern: open the stream, read until the end is reached, and close it.

We used a FileWriter to print the results, and as with input streams, the output streams follow the same procedure: open the stream, write the data, and close it.

We have used the try/catch mechanism before but there is more to say about that. Exception handling is one of the bigger strengths of Java, and using it in a good, sensible way makes your apps reliable and robust.

I must also announce that I have to make this column much shorter in the future, mostly because my time is limited and I do not have unlimited strength. Still I will try to do my best, and I hope you enjoy future installments.
Copyright (C) 2001. All Rights Reserved.