PHP Web Page Security Generator Script Tutorial

Oct 4, 2012
PHP

This tutorial outlines the objective of Captcha and shows how to design and code your own security script using PHP. Although considered by the masses to be a very complicated maneuver, protecting web pages, as well as internal elements of those pages, is actually quite simple. So, why are these Captcha and ReCaptcha solutions so complicated and difficult to use? The only honest response I can give you is: marketing.

No matter what anyone tells you, at the end of the digital day, these beautifully designed, engaging, stimulating and appealing folds of information are just that: pages. If these pages were printed pages, then Captcha would be a child-proof seal on each one. And for those forward thinkers out there, you guessed it: Captcha is actually a type of password. And passwords are a hacker’s biggest challenge. Captcha, ReCaptcha and now Image Captcha are like child-proof seals that even adults cannot open – without wanting to smash that plastic container to pieces – and then, with seething teeth, watery eyes and narrowed brow, rip out the cotton and snatch that little purple pill, with a madman’s sort of satisfaction.

What Is CAPTCHA Actually?

First, let me break down the basics, for those who are novice programmers or frightened web site owners. Captcha is a term (not a trademark) coined at Carnegie Mellon University, where a group of elite programmers scripted a method for reducing, or eliminating, unwanted injections to any web page. It stands for Completely Automated Public Turing [test to tell] Computers [and] Humans Apart. CAPTCHA. That is quite an interesting thing. How does an automated machine tell the difference between an automatic keyboard stroke and a human-induced keyboard stroke? And here the fun begins. The keyboard. That poor, coffee-stained, crumb and eyelash laden attachment that permits us humans to transform our thoughts into digital fandango! What the algorithm did was create an automated password system or security portal, to capture potential hacks and stop them in their tracks. Think: a Raid commercial for data-driven web pages. As of late, we have seen Captcha go on steroids. But, no matter how bloated and vigorous the attempt at security, the solution comes back to one thing: the password.

How Does The Solution Work?

So, how does the Captcha thing actually work? When a person presses a key on the keyboard, a specific language command is executed and prints the keystroke into the input box. This box is designed to store the keys or characters entered until the return key is pressed and the information is put through several tests behind the page. These tests are specifically designed to determine if the keys entered match the security code of the script, whether or not these keys contain unreadable characters, or if there are excessive characters -like programming code or hyperlinks- within. There are a few more tests the system performs, which I shall not bother you with. Point being, these keys or characters are a set of variables passed through pages, using scripts called security checkpoints. In short, a password checker.
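To make those checkpoint tests concrete, here is a minimal sketch of the three checks named above (length, unreadable characters, match). The function name `passesCheckpoint` and the 20-character limit are my own assumptions for illustration, not part of any Captcha library.

```php
<?php
// Hypothetical helper: runs the three checks described above on a submitted code.
function passesCheckpoint(string $input, string $expected, int $maxLength = 20): bool
{
    // 1. Excessive characters (pasted code or hyperlinks tend to be long).
    if (strlen($input) > $maxLength) {
        return false;
    }
    // 2. Unreadable characters: anything outside printable ASCII 32-126.
    if (preg_match('/[^\x20-\x7E]/', $input) === 1) {
        return false;
    }
    // 3. Does the input match the security code exactly?
    return hash_equals($expected, $input);
}
```

The order is a design choice: the cheap rejections run first, so oversized or binary input never reaches the comparison.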

Sure, you can ask me –err, ASCII Anything

How does the algorithm/script determine what key is correct or incorrect? Ah, yes. Now that is the real question and what this article is destined to do: relieve some of that child-proof frustration. Your keyboards, actually all keyboards, essentially operate under the same system. This system of key-presses or key-strokes is called ASCII. ASCII (pronounced: ass-kee) is an acronym for American Standard Code [for] Information Interchange. The core language is based on American English, used for most devices that print text. In fact, the origin of this code is the telegraph. Yup, like Morse code. It then evolved from typewriters to these massive key-punch systems, and was finally reduced to keyboard size, as personal computers became increasingly popular. ASCII has also been revised for non-English speaking keyboards, but the code per stroke remains the same. In essence, each button on your keyboard has a digital command behind it, be it in Russian, Cantonese or Arabic. Some of these commands are non-print or non-character based, others are print or character based. Some, as you might have guessed, can be combined to form other commands or changes in characters. A good example is simply pressing the F1 key, which is a command to open the help desk of your machine’s operating system, or pressing CAPSLOCK, transforming the letter “a” into “A”.
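If you want to see the code behind a character for yourself, PHP's built-in `ord()` and `chr()` convert between a character and its ASCII number. A tiny sketch (the variable names are just for illustration):

```php
<?php
// Every character maps to a numeric ASCII code; ord() and chr() convert between them.
$lower = ord('a');      // 97
$upper = ord('A');      // 65
$back  = chr(65);       // "A"
// Upper- and lowercase letters sit exactly 32 codes apart.
$gap = $lower - $upper; // 32
echo "a=$lower A=$upper gap=$gap\n";
```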

The ASCII code is used in a variety of ways, like ASCII Art, which uses characters to form images, and has been extended by codes like Unicode, which have more characters in their system. The standard ASCII contains 128 units. Of those 128 units, 95 are considered printable characters. The remaining 33 are non-print characters, like BACKSPACE, RETURN, NULL, etc. So, it is in the ASCII 95 that Captcha is -or was- used. It is precisely within these 95 characters that all passwords are generated. In essence, a password just two characters long already has 95 × 95 = 9,025 possible variations using the ASCII 95, and every additional character multiplies that number by 95 again. Wow, that is a lot of combos! Indeed. But, machines are “smart”, meaning the people who designed ASCII, Captcha, passwords and web pages are the same kind of people who invented robots -and are the same kind of people known as hackers or phreaks. So, why all the fuss, if ASCII can produce 9,025 variations from only two characters? Wouldn’t this alone hinder security breaches in web pages? It can, and does. But, invention is one of those things. Especially for programmers, perfecting the invention is often more fun than the original. What more fun could there be than competing against other like-minded individuals, making the process challenging on higher levels? Captcha took on a new level by introducing a very strange kind of password system. What do I mean? Well, look at any web page. What is the first thing you notice about it, besides the striking images? Still guessing? Hmm. Give you a hint. You’re reading it right now. Ding! The text.
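The 95/33 split is easy to confirm. The sketch below walks all 128 standard codes and asks PHP's `ctype_print()` whether each one is printable; it assumes the bundled ctype extension is enabled and the default C locale is in use.

```php
<?php
// Count printable vs. non-printing codes in standard ASCII (0-127).
$printable = 0;
$control   = 0;
for ($code = 0; $code <= 127; $code++) {
    // ctype_print() is true for codes 32 (space) through 126 (~).
    if (ctype_print(chr($code))) {
        $printable++;
    } else {
        $control++;
    }
}
echo "$printable printable, $control non-printing\n";
```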

That Is Some Kind Of Fontastic News!

That text, printed ASCII, is based on a character output known as Fonts. Depending on your system [PC, Mac, Mobile] you will notice there are many different kinds of fonts. Some fonts work only on certain machines, while other fonts are universal. Still more fonts can be installed on machines and, of course, on web pages. This is what makes them so rich and fun to engage. Scribbled-loop-de-loop or collegiate-block-bold, they make the web a lot more interesting. However, the ASCII commands are the same, no matter what font is used. Programmers know this. They also know that they could program a script to run through every variation of printable key-strokes. So, to change it up, keep the robots confused -and make half the adult computer-user population go blind with frustration- Captcha decided to create a font system of their own. Fonts that look shaded, ultra-thin or resemble a really obscure Rorschach test. By combining various font types, the system prints these obscure letters or numbers to keep machines from reading them automatically. Or do they? Honestly, I have yet to see how. In fact, I know many web sites that are still getting spam and injections with the use of Captcha. Why? Because the key-strokes are still the same. But, now they have gone the route of Layered Fonts. Remember, fonts are actually images -called Glyphs- stored in a specific file type. These font types are commonly referred to as TrueType or OpenType. But, there are many, many font types, from strict Mac to Adobe PDF. What is being done now is layering multiple font types, with distorted background images, even background glyphs, to further hinder the automatic hacker. Even so, the letter & number key-strokes are still the same.

There are a great number of characters that can be applied to create passwords, because the ASCII system has 128 characters (256 extended). One of the strange things I noticed with Captcha and other programs was the use of only letters & numbers in the system. Which sounds reasonable, since most passwords are specifically letter-number combos. In fact, the typical password system utilizes only a portion of the keyboard-stroke commands: 26 lowercase letters, 26 uppercase letters and 10 digits, or 62 characters, providing 62 × 62 = 3,844 possibilities for any two-character letter-number combo. Out of the 95 × 95 = 9,025 possible with every printable character, that’s well under half of the available combos. Yeats & Keating!

The first group of characters, ASCII 0-31, are specifically non-character or non-print commands. So, they do nothing for creating passwords. Most password systems use letters and numbers, also known as alpha-numeric characters. Many assume these characters are stored somewhere, either in a database or flat-file. But, since I know that a security checkpoint does not need to be saved to the database or flat-file, I realized there was a lot more potential in creating a security portal than a blurred ASCII Art rabbit {although it was really cute}. Adding the “unused characters”, I created my own version of Captcha. I combined the units 32-47 + 58-64 + 91-96 + 123-126. Yes, I could have even applied the ASCII Extended units, up to 255. But because they are what HTML calls special characters, like the copyright symbol or accented Latin, and do not appear on the keyboard itself, I left them alone. Not that they cannot be used, they CAN! But it would make more work for the actual human and make me a Captcha monster -literally and figuratively. So, this added an additional 16 + 7 + 6 + 4 = 33 units to the 62 letters and numbers, for a grand total of 95 characters out of 128, within the range of 32 to 126. The new combos for a two-character code then are 95 × 95 = 9,025.
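A quick way to check how many characters those four ranges contribute is to build the pool in code. This is only a sketch: the variable names are mine, and the `[$from, $to]` syntax assumes PHP 7.1 or newer.

```php
<?php
// The four symbol ranges named in the text, as inclusive ASCII code pairs.
$symbolRanges = [[32, 47], [58, 64], [91, 96], [123, 126]];

$symbols = '';
foreach ($symbolRanges as [$from, $to]) {
    foreach (range($from, $to) as $code) {
        $symbols .= chr($code);
    }
}

// The usual letter-number set: digits, uppercase, lowercase.
$alphanumeric = implode('', array_merge(range('0', '9'), range('A', 'Z'), range('a', 'z')));

// Full pool: symbols plus alpha-numerics.
$pool = $symbols . $alphanumeric;
echo strlen($symbols), ' symbols + ', strlen($alphanumeric), ' alpha-numerics = ', strlen($pool), "\n";
```

Run as written, the ranges hold 33 symbols, which together with the 62 alpha-numerics cover every printable code from 32 to 126.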

Scripting Web Page Security Measures

For the most part, a n00b programmer can design and implement their own web security system, using several approaches. And since the newer versions of Captcha are causing the not-so-visually impaired to become visually impaired, staring at those glyphs, I have decided to think out of the box.

What are some of the ways to use the ASCII system to create security portals? Well, one way is to use the additional characters mentioned. This expands the range for a two-character code from 62 × 62 = 3,844 to 95 × 95 = 9,025 combos -more than double. Sweet! Achieving this basically requires selecting the number of characters for output. The code can be as short as one character or as long as you like. But something like 88 char is a bit excessive, not to mention most machines tend to stop reading an input-field after twenty characters, simply because it begins to eat up automated time-lines. Spammers do not want to get noticed. They want the fastest in-out possible, so they can move on to the next victim. I found that even lengths work best, like 6 or 8 characters. With 95 characters to draw from, a six-character code has 95^6 (about 735 billion) possible values per instance or display, and an eight-character code has 95^8 (about 6.6 quadrillion).
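The arithmetic behind these counts is simply pool size raised to the code length. A small sketch, assuming a 64-bit PHP build so the larger results stay exact integers (`countCombinations` is a name I made up for illustration):

```php
<?php
// Number of possible codes = (pool size) ^ (code length).
// Assumes 64-bit PHP; on 32-bit builds large results become floats.
function countCombinations(int $poolSize, int $length)
{
    return $poolSize ** $length;
}

echo countCombinations(62, 2), "\n"; // letters and numbers only, two characters
echo countCombinations(95, 2), "\n"; // all printable ASCII, two characters
echo countCombinations(95, 6), "\n"; // all printable ASCII, six characters
```

Notice that length matters far more than pool size: going from two characters to six does much more than adding the symbols did.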

One approach I used, which has been extremely effective, is to simply apply a static background color to the password display area. This is achieved by creating a SPAN next to the input-field and doing a little CSS magic. The output was quite clean, simple and readable by the average human. Because I wrapped the output in a span, it is unlikely a robot or automatic spammer would “look” for it. They usually look for “input type” names or pre-fill fields. The output script produced a combo of lowercase and uppercase letters, as well as numbers and symbols. This alone reduced spam on one of my clients’ websites to zero. I kid you not. Another way of using this same method I called forced highlight. The page background was white and the security output color was white. Just above the input-field, I made a notation for the visitor to highlight the area and read the code.

Brilliant! This made it even easier for the real human, because they could just copy-paste the security code in, while staying off the radar of the simpler robots and spammers -who only go hunting for form fields.

.comment span {
  background: rgb(50,100,150);
  color: rgb(255,255,255);
  padding: 10px;
  font-size: 14px;
}

The PHP Generator Script

With the aforementioned comes the need for the code, which you will use to create the output characters. It is very straight-up easy to do using PHP. What this means, in layman’s terms, is: the code is executed on the server, either before the web page loads or when a button is clicked or the Enter key is pressed on your blessed keyboard. This is extremely beneficial for creating “hidden” commands and functions in 1.43 seconds of load-time. The code works like this:

  1. Create a function to generate a random string of characters. The initial result is an empty string, because there are no characters yet. To build the character set -or string length- we use the FOR command.
  2. Calculate a RANDOM string of the ASCII characters between units 32 and 126. Again, you could include the glyphs and accented characters up to 255. The result now is relative to the selected number of random characters. We return the string result and give the generated characters a specific name (SNAP1). The key to making the entire script effective is the final line, which sets the number of characters to display at load-time. In this case, it is six (6).
  3. Place a hidden field with this code echoed in it, to compare to the text inputted by the visitor when the form is submitted. No bloating, no waiting; the script generates the code automatically (I have yet to see the same output twice).
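function generateRandomString($length)
{
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        // Pick a random printable ASCII code (32-126) and append its character.
        $result .= chr(rand(32, 126));
    }
    return $result;
}
$SNAP1 = generateRandomString(6); // six-character security code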
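Step 3 can be sketched as follows. The helper names (`renderSecurityField`, `securityCodeMatches`) and the field names (`snap_expected`, `snap_input`) are hypothetical, chosen only for this example. The sketch follows the hidden-field comparison described above and escapes the code with `htmlspecialchars()` before printing it.

```php
<?php
// Hypothetical helper: prints the visible code in a span, plus the hidden
// field holding the expected value and the text box the visitor types into.
function renderSecurityField(string $code): string
{
    // Escape the code so symbols like < or " cannot break the HTML.
    $safe = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');
    return '<span>' . $safe . '</span>'
         . '<input type="hidden" name="snap_expected" value="' . $safe . '">'
         . '<input type="text" name="snap_input" maxlength="20">';
}

// Hypothetical helper: compares what the visitor typed with the hidden field.
// Pass it $_POST when the form is submitted.
function securityCodeMatches(array $post): bool
{
    $expected = $post['snap_expected'] ?? '';
    $typed    = $post['snap_input'] ?? '';
    if (!is_string($expected) || !is_string($typed) || $expected === '') {
        return false;
    }
    return hash_equals($expected, $typed);
}
```

Escaping matters here because the generator can return characters such as <, & or a double quote, which would otherwise break the markup. On the page you would echo `renderSecurityField($SNAP1)` inside the form, then call `securityCodeMatches($_POST)` in the script that handles the submission.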

Applying User/Member IPA as a Web Page Security Code

Up until now, we have focused on just one parent method of creating security measures for web pages, using the ASCII or Key-Stroke approach. Which, by all accounts, is very effective for applying child-proof seals. The only drawback to this method is, of course, the character base itself. At some point, a machine or spammer can catch on to Captcha. That probably explains why they keep coming up with new -and maddening- methods to blind the seeing. Nevertheless, another powerful approach is to use the visitor’s IPA, Internet Protocol Address. Every time someone visits a web page, their IPA is available for capturing.

In fact, companies like Alexa, Bing, Quantcast and Yahoo use the IPA to deliver visitor details to web owners. It is also where spammers tend to hide and attempt to drop iFrame ads or harvest browser information. This is formally known as malware or spyware. So, a very nifty approach I used was to take the visitor’s IPA, manipulate the information to make it “unfriendly”, then save that IPA, along with browser details, to a database table. Yes, this is a much more complex method of securing web pages, but it is extremely useful to sites wanting stronger measures than just a simple Captcha code.

The following scripts are very fast and effective ways of using the visitor’s IPA to increase page security. The IP Address is stripped of its “.” leaving just the numbers. You could then manipulate the string by inserting a SPACE between each digit and, using the same Captcha generator, inserting special characters between them. In short, instead of the user inputting a password, the system generates the security code based on their IPA. You could even trick things up a bit by reversing the IPA, then adding a space and special characters, plus further truncated characters from the ASCII “Captcha” Generator.

Strip ‘.’ {original : 241.93.174.240 | new: 24193174240 }

$IPA=str_replace('.','',$IPA);

Reverse IPA {original : 24193174240 | new: 04247139142 }

$IPA=strrev($IPA);

Add Whitespace {original : 04247139142 | new: 0 4 2 4 7 1 3 9 1 4 2 }

$IPA=wordwrap($IPA,1," ",true);
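Chained together, the three steps above might look like the sketch below. The function name `ipToSecurityCode` is hypothetical, and reading the address from `$_SERVER['REMOTE_ADDR']` is my assumption about where `$IPA` comes from; the sample address is only a fallback for running the script outside a web server.

```php
<?php
// Hypothetical helper chaining the three steps: strip, reverse, add whitespace.
function ipToSecurityCode(string $ip): string
{
    $digits   = str_replace('.', '', $ip);     // strip the dots
    $reversed = strrev($digits);               // reverse the digits
    return wordwrap($reversed, 1, ' ', true);  // one space between every digit
}

// Assumption: the visitor's IPv4 address is read from REMOTE_ADDR.
// The fallback value is a sample address for command-line runs.
$visitorIp = $_SERVER['REMOTE_ADDR'] ?? '241.93.174.240';
echo ipToSecurityCode($visitorIp), "\n";
```

Keeping the steps in one function means the same transformation can be re-run on submit and compared with what was shown, without storing the intermediate strings.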

Test out methods, ideas and approaches using this concept and tell us how they worked. Hey, you never know. You could be the inventor of the Next Generation of Web Security. Happy Coding!

Author: Charles James
Applied Philosopher and Programmer, specializing in PHP, jQuery, CSS. His approach to Web Development is a modern, minimalistic hybrid of logical-creative. Full-time web solutions developer, as well as part-time writer. "I enjoy teaching, learning and exploring new and challenging aspects of digital fandango!"
  • http://www.facebook.com/kaseybonifacio Kasey Bonifacio

    I’ve never heard that PHP stands for Pre Hypertext Protocol. PHP originally stood for Personal Home Page but was later changed to PHP: Hypertext Preprocessor. It is a recursive acronym.

  • http://www.hirewebdevelopersindia.com/ Eric Lewis

    Great tutorial!!! I found it very interesting for generating security code using PHP. Thanks for sharing this…

  • http://www.cbil360.com/ Web Design Company

    Useful tutorial articles which guide you to generate a secure PHP web page with all details and required snap codes.

  • Pauline Taylor

    Your tutorial is really good and PHP programmers should follow these guidelines before developing any website. Thanks.

  • michael griddine

    I like the way you term things and i want learn more