Aim:
I aim to give a good overview, and insight, into making good flat
database structures in perl without too much hassle. I will be using a
telephone database as an example.
The following fields will be used:
- ID
- Name
- Address
- Telephone number
Note: You will require an intermediate perl skill level.
Tip: the correct database terminology for a 'data section is a field. 'Rows' in a database are considered 'records'.
What are flat databases?
A flat database is a database within a single file. A database is a set
of data within a structure (in memory or a file, for example). All data
stored in a database, is stored within validity constraints - that, in
simple terms, means the data stored, is validated, and only stored if it
passes a validation check.
Against popular belief, flat databases can be linked to another
database, or joined; but for this tutorial, I will cover flat databases,
without joining, in perl.
Validation:
Data must be checked before it can be stored in a 'good' database. If
you have an input string, and you require this input string to be
numerical, like a telephone number for example, you would use the
following expression:
print "the data is not a number" unless (/^\d+$/);
|
|
Perl: \d is the numerical regex operator in perl. If this returns false, then the unless condition will 'run' the print statement.
This can be extended further, to include spaces, and brackets, as most telephone numbers require an international dialing code.
print "the data is not a valid telephone number" if (/[^\d|\+| |\(|\)]/);
|
|
Perl: in regular expressions, [ ], delimit individual
comparisons, and the pipe character (|) separates these. The caret means
'not'. Here you could use the hex equivalent of 'space' if wanted.
To put this 'data validation' into action, the following script my help you:
#!/usr/bin/perl
# fdb_validation1.pl
#
# Simple data validation in perl
print "Simple data validation\n", '-' x 22 . "\n", "Hit CTRL-C to exit\n\n";
while(<STDIN>) {
unless (/[^\d|\+| |\(|\)]/) {
print "The input is not a valid telephone number\n\n";
} else {
chomp;
print "Great! $_, is a proper telephone number!\n\n"
}
}
|
|
To sum up: validation checks to see if data entered is of the correct type. Proper databases should run a validation check on all input.
Storing databases:
I'm going to cover the theory to storing a flat database first, because
it's the most important to understand, and it introduces you to storage
concepts.
We know that validation checks to see if data is ok, but we don't know
how to store information in a flat database. A structured (aka 'active'
or 'online') database will store data in memory, and to disk. We do not
have the option of storing to memory, because we are using a flat,
offline, database. Therefore, the entire database state, should be
stored to file after being modified.
When storing to file, we need a method of storing the data so it can be
read again. If the data cannot be read correctly, then there is no
reason in storing it. There are two used methods of storing data.
- Chunks:
Chunks of data are often the fastest method of saving data, because they
do not require any conditional formatting. A section of a 'chunk
database' may look like the following:
29
data
data
more data
even more data
Each field is on it's own line, and each line has a carriage return
directly after the end of the data. If there is no data, where data
should be stored, a carriage return is still used to 'hold' the database
structure. The records are stored in lines of 5. In code, a data chunk
is often referenced as an array, for example,
my @current_chunk = @db_lines[10..15];
|
|
Here, 5 lines from 10 to 15 are stored in the array, @current_chunk. Obviously you can then access fields (when knowing the format), with $current_chunk[3];
You may wish to use chunks in your database if: 1) you want speed, and
2) you wish to use less regular expressions when retrieving from the
database.
- Rows:
This is the most popular method of storing data. A line from a flat database might look like:
29:data:data:more data:even more data
This method is popular because, 1) it is easier to handle with loops,
and 2) it is easier to read by sight. Here, the fields are separated by
colons (:). Records are separated by carriage returns.
For this tutorial, we're going to use the latter - sounds a little more
complicated? Well, maybe it sounds so, but it's not really :)
Reading a database in code:
There are now two ways you can go about reading a database, you can: a)
read into an array and loop to use, or b) read into a hash, loop to use,
or require a key to use. We're going to cover the latter, as I believe
hashes are such a magnificent thing not to waste.
Perl: check perldoc's hashref before proceeding, as it's very
important one knows the difference between a hashref and a hash. Even I
slip up sometimes.
Opening, storing, and closing a file:
open(DATA, "filename") or die "Sorry, I could not open the specified file because: $!";
my @filelines = <DATA>;
close(DATA);
Ok, thats the easy bit to opening a file, but, lets make it a little more fool proof :)
my @db_lines;
if (-e $db_file) {
open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $";
@db_lines = <hDB>;
close(hDB);
}
|
|
The advantage of this, is that the script will not die on first use,
because of the '-e'xist condition. Should you wish, you can add your own
else statement and touch the file, but you'll not need to do this as
the write function later offers an alike process when it unconditionally
appends.
- Splitting delimited lines:
In the examples above, I used, :, as a line delimited. This is not a
good choice, as some input may use a colon and end up 'shifting the
data'. So for posterity, and 'because I can', I'm going to use the tab
character, \t. Lets have a look at one of those lines again.
29:data:data:more data:even more data
|
|
would now be,
29 data data more data even more data
|
|
- Parsing:
($data1, $data2, $data3, $data4, $data5) = split("\t", "29 data data more data even more data");
|
|
That was easy wasn't it? Lets add the last to sections of code together and see what we get :)
my @db_lines;
if (-e $db_file) {
open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $";
@db_lines = <hDB>;
close(hDB);
foreach $db_line (@db_lines) {
chomp;
my ($data1, $data2, $data3, $data4, $data5) = split("\t", $db_line);
# We deal with the values from the file here :)
}
}
|
|
- Hash'ed :)
I mentioned that we were going to use hashes earlier. Just to recap on
them, as I assume no one read the hashref perldoc, a hash is an
associative array. This means, a hash can old arrays within itself
access them by association. hashref talks about the simple issue
you need to understand: hashes can only hold scalars arrays! But this
isn't to limit our work in anyway, as a reference is also a scalar. A
reference, for those that don't understand pointer concepts is as
follows:
$ascalar = "this is a scalar"; # set
$another = $ascalar; # copied
$yetanother = \$ascalar; # referenced
print $ascalar; # ok
print $another; # ok
print $yetanother; # NNOOOOO!
print ${$yetanother}; # great!
print $$yetanother; # simplified
|
|
A reference is a pointer, much alike those used in C. (but references in
C start with an amperstand, [&] and pointers with an asterisk [*],
just to confuse you :) It points, or references, a memory location that
the compiler must be told to 'use' correctly. If it is not told that the
place is an area of memory, it won't be treated like one, and end up
printing the 'location' it points to, rather than the locations content.
When learning BBC basic, I remember variables and pointers much like a
mail sorting office, where they have rows and rows of boxes, each of
these boxes with a number - this is the 'key' to locating them. Within
the box is the data stored (an envelope).
When we use, ${ scalar }, we tell the interpreter that the data
is a reference, and it must then get the data from that reference in
memory, and use that when called on. It is also possible to to use
@$scalar, if $scalar is a reference to a memory space where an array is
stored.
That's a lot of recapping on hashes, now, how to use them.
my %hash; # this is a hash.
my $hashref = \%hash; # this is a reference to that hash
$hashref{'key'} = "value"; # this is setting a key, and a value pair.
print $hashref{'key'}; # this prints the value of key (which would be "value")
my @array = ('29', 'data', 'data', 'more data', 'even more data'); # makes an array called array
$hashref{'array'} = \@array; # sets the key 'array', to a REFERENCE of an array with our data
foreach $value (@{$hashref{'array'}}) { print "$value\n"; } # loops and prints each element of this new array stored in the hash.
|
|
That sounds easy enough, so, using a unique value in our database as a
key, we should be able to create a hash holding our details... lets see!
my @db_lines;
my %database;
my @validation = ('NUM', 'TEXT', 'TEXT', 'TEXT', 'TELEPHONE');
my $db_file = "database.txt";
my $sep = '|';
if (-e $db_file) {
open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $_";
@db_lines = <hDB>;
close(hDB);
foreach $db_line (@db_lines) {
chomp $db_line;
my @record = split($sep, $db_line);
if (&chk_validation(@record) == 1) {
die "Sorry, there was an error parsing the database input :(";
}
$database{$record[0]} = [@record[1..$#record]];
}
}
sub chk_validation {
my @arraytocheck = @_;
my $pos = 0;
foreach $value (@arraytocheck) {
if ($validation[$pos] eq 'NUM') {
return 1 if ($value !~ /^\d+$/); # return a non 0
}
if ($validation[$pos] eq 'TELEPHONE') {
return 1 if ($value =~ /[^\d|\+| |\(|\)]/);
}
# We don't care about text :)
$pos++;
}
return 0;
}
|
|
Here, I have added the subroutine, chk_validation. chk_validation will check to see if the elements in the parsed line fit with the conditions in the array @validation.
Perl does not have a switch statement, but it would be handy right now.
If you don't understand it, and know your biology, think of it as a
ribosome translating rDNA, but if it finds something that doesn't match,
it throws an error.
Writing to a flat database
We know how information is stored and read, so lets write ours in the
same format. This is rather easy, compared to most of the validation...
here we go.
sub write_entry {
my ($forename, $surname, $city, $telephone) = @_;
my $cid = scalar keys %database;
open(OUT, '>>' . $db_file) or die "Sorry, we could not open the database for writing, $!";
print OUT $cid . $sep . $forename . $sep . $surname . $sep . $city . $sep . $telephone . "\n";
close(OUT);
}
|
|
That was easy, but lets add some taint checking and replacing - I shall
explain. If I want to use either the tab character (\t) or the pipe
character, or even the colon character to delimit fields in the
database, I need to replace them in user input. Should someone use that
character in a text field, where it is allowed, then the data when
parsed will be split at the wrong point. To demonstrate this using
spaces, try the following:
#!/usr/bin/perl
# fdb_taint1.pl
#
# Breaking 'data' in perl
my @result;
my $input = '3 '; # note the extra space after the 3
my $line = "1, 2, $input, 4, 5"; # note no 3, rather $input
@result = split(' ', $line);
foreach (@result) { print "= $_\n"; }
|
|
In this example, the script should print = 1, .. = 5, but fails to do so
because there is an extra space after 3 on the 'input' variable. If
this was the telephone directory, and we tried to refer to
$result[3],
then you'd find the variable would not be there - instead, that extra
space would have nudged all sequential variables up a place... probably
causing havoc, and even data loss - especially if we used:
my ($data1, $data2, $data3, $data4, $data5) = @array; # there is no $data6, therefore it's been nudged, and lost
|
|
So how do I fix this kind of problem? By replacing! Since we know we
don't want extra septerators, we could can a hidden character no one
knows about, or we can use a 'word' replacement, and fix it back a
little later. Using a 'word' is preferred, as it leaves no room for
accidental errors... Lets have a look:
sub write_entry {
# my apologies if there is an easier way of doing this?
my $i = 0;
my $value;
foreach $value (@_) {
@_[$i] = $value if (s/$sep/__BAR__/ig);
$i++;
}
my ($forename, $surname, $city, $telephone) = @_;
my $cid = scalar keys %database;
open(OUT, '>>' . $db_file) or die "Sorry, we could not open the database for writing, $!";
print OUT $cid . $sep . $forename . $sep . $surname . $sep . $city . $sep . $telephone . "\n";
close(OUT);
}
|
|
We'll also need a modification to the read function to reverse the process:
if (-e $db_file) {
open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $_";
@db_lines = ;
close(hDB);
foreach $db_line (@db_lines) {
chomp $db_line;
my @record = split($sep, $db_line);
my $i = 0;
my $value;
foreach $value (@record) {
@record[$i] = $value if (s/__BAR__/$sep/ig);
$i++;
}
if (&chk_validation(@record) == 1) {
die "Sorry, there was an error parsing the database input :(";
}
$database{$record[0]} = [@record[1..$#record]];
}
}
|
|
Searching through your database
This is where our hash information comes into play, but not as evidently
as I demonstrated. To search our hash, we need to exact the key and
value pair. Because Perl is so wonderful, it has a lovely operator
called 'each' - lets have a look:
sub search {
my ($term) = @_;
while (my ($key, $value) = each(%database)) {
foreach (@{$value}) {
return $key if (/$term/i);
}
}
return undef;
}
|
|
This function will return undef if no match is found. It's also case
insensitive. Also, this is where our hash tuition comes into play - if
you remember, I said a hash holds an array of scalars (or pointers!).
Since the value of the key is
$value,
$value at any point would be exactly the same as
$database{$key}, therefore, we simply reference it as an array with
@{$value}, and do away with
@{$database{$key}}.
Finishing off
We've now covered everything we need to make the database backend to an interactive phonebook database system.
Some differences:
- I have used the tab character to delimit the database
- I have added a menu system
- I have added an addition system
So, without further ado, lets produce the final script...
#!/usr/bin/perl
# fdb_phonebook.pl
#
# DESC: A flatfile database phone book by Matt 'QX' Melton
# HTTP: http://blacksun.box.sk
# DATE: 25/10/01
# LNCE: You may not use this on your own site without pior permission
@validation = ('NUM', 'TEXT', 'TEXT', 'TEXT', 'TELEPHONE');
$db_file = "database.txt";
$sep = "\t";
&load_database;
&main_menu;
exit;
# ---------------------------------------------------------------------
##
## Display the main menu, and prompts for input
##
sub main_menu {
my $main_screen = <<END;
The Phone book - by Matt
--------------------------
What you you like to do:
1) Add a new entry
2) Display an entry
3) Search for an entry
x) Exit
END
# 1st timers...
print $main_screen;
print "\t=";
while ($choice = <STDIN>) {
chomp $choice;
exit if ($choice eq 'x');
&add_entry if ($choice eq '1');
&show_entry if ($choice eq '2');
&search_entry if ($choice eq '3');
# Returns WIN32 usually, but you can never be
# too sure with NT and 2K :)
if ($^O =~ /WIN/i) {
system('cls');
} else {
system('clear');
}
print $main_screen;
print "\t=";
}
}
#
# Prompts for new data input and validates, then runs write_entry
#
sub add_entry {
my @newrecord;
print "Forename: ";
my $forename = <STDIN>; chomp $forename;
print "Surname: ";
my $surname = <STDIN>; chomp $surname;
print "City: ";
my $city = <STDIN>; chomp $city;
print "Telephone number: ";
my $telephone = <STDIN>; chomp $telephone;
if (&chk_validation(0, $forename, $surname, $city, $telephone) == 1) {
print "Data entered was not valid. Please try again\n\n";
print "\n Entry NOT added.\n\nHit any key to continue...\n";
my $null = <STDIN>;
return;
}
&write_entry($forename, $surname, $city, $telephone);
print "\n Added entry.\n\nHit any key to continue...\n";
my $null = <STDIN>;
return;
}
#
# Prompts for key, then runs display_entry
#
sub show_entry {
print "Entry key number: ";
my $key = <STDIN>; chomp $key;
print "\n";
&display_entry($key);
print "\nHit any key to continue\n";
my $null = <STDIN>;
}
#
# Retrieves records, checks for existance, then displays
#
sub display_entry {
my ($key) = @_;
my $record = $database{$key};
if ($record == undef) {
print "That record does not exist\n";
return;
}
print "ID........... $key\n";
print "Name......... $$record[0]\n";
print "Surname...... $$record[1]\n";
print "City......... $$record[2]\n";
print "Telephone.... $$record[3]\n";
}
#
# Prompts for search term, runs the search sub, display entry if only 1, or displays
# entry keys if more
#
sub search_entry {
print "Please type the search phrase [Name, partial number]: ";
my $term = <STDIN>;
chomp $term;
print "\n";
my ($matches) = &search($term);
if (@$matches == undef) {
print "Sorry, no matches found\n\nHit any key to continue...\n";
my $null = <STDIN>;
return;
}
if ($#$matches == 1) {
print "Found one matching entry:\n";
&display_entry($$matches[1]);
print "\nHit any key to continue.\n";
my $null = <STDIN>;
return;
}
print "Found " . $#$matches . " matching entries: " . substr(join(', ', @$matches), 2) . "\n\nHit any key to continue...\n";
my $null = <STDIN>;
}
#
# If the db file exists, it will read it and split the lines into records, and then
# fields. The adds to $database hash
#
sub load_database {
if (-e $db_file) {
open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $_";
@db_lines = <hDB>;
close(hDB);
foreach $db_line (@db_lines) {
chomp $db_line;
next if $db_line eq "";
my @record = split(/$sep/, $db_line);
my $i = 0;
my $value;
foreach $value (@record) {
@record[$i] = $value if (s/__BAR__/$sep/ig);
$i++;
}
if (&chk_validation(@record) == 1) {
die "Sorry, there was an error parsing the database input :(";
}
$database{$record[0]} = [@record[1..$#record]];
}
}
}
#
# Concurrently parses the records with the array @validation
# returns 1 if there is a validation error
#
sub chk_validation {
my @arraytocheck = @_;
my $pos = 0;
foreach $value (@arraytocheck) {
if ($validation[$pos] eq 'NUM') {
return 1 if ($value !~ /^\d+$/); # returns the value 1
}
if ($validation[$pos] eq 'TELEPHONE') {
return 1 if ($value =~ /[^\d|\+| |\(|\)]/);
}
# We don't care about text :)
$pos++;
}
return 0;
}
#
# Parses, replaces, the $sep character in a string and prints to the end of the db file
#
sub write_entry {
my $i = 0;
my $value;
# my apologies if there is an easier way of doing this
foreach $value (@_) {
@_[$i] = $value if (s/$sep/__BAR__/ig);
$i++;
}
my ($forename, $surname, $city, $telephone) = @_;
my $cid = (scalar keys %database) + 1;
open(OUT, '>>' . $db_file) or die "Sorry, we could not open the database for writing, $!";
print OUT $cid . $sep . $forename . $sep . $surname . $sep . $city . $sep . $telephone . "\n";
close(OUT);
&load_database; # reload the db, but we could do it straight to the array, but it'd be
# an active memory db and not a flat file one :)
}
#
# Cycles each key/value pair and sees if they match the term, if so, adds to array and
# returns list of matches
#
sub search {
my ($term) = @_;
my @found = undef;
while (($key, $value) = each(%database)) {
foreach $field (@{$value}) {
push (@found, $key) if ($field =~ /$term/i);
}
}
return undef if ($#found == 0);
# else
return \@found;
}
|
|
Comments