Here, on the Smashing Editorial team, we always try to meet the
expectations of our readers. We do our best to avoid
misunderstandings, and we try to spread knowedge and present only
the best design practices and development techniques. However,
sometimes we do make mistakes. And when we do, we apologize and do
our best to correct what we’ve done.
In November 2008 we published the article
10 Advanced PHP Tips To Improve Your Programming. Apparently,
according to negative comments to the post, it contained some
errors and some statements that are just wrong. We
sincerely apologize for our mistake, and we are truly
sorry for any inconvenience we caused by it. However, this simple
apology is not good enough. To solve the problem, we asked Chris
Shiflett and Sean Coates, two PHP gurus, to take a closer look at
the article, explain its errors and make it perfectly clear what is
actually right and wrong in the theory and practice. This article
is a professional response to our article published a couple of
months ago.
[Offtopic: By the way, did you know that Smashing Magazine has a
mobile version? Try it out if you have an iPhone, Blackberry or
another capable device.]
10 Useful PHP Tips Revisited
by Chris Shiflett and Sean Coates
This article is a rebuttal to
10 Advanced PHP Tips To Improve Your Programming — henceforth
referred to as the previous article — published last November here
on Smashing Magazine. The introduction sounds intriguing:
Listed below are 10 excellent techniques that PHP developers
should learn and use every time they program.
Unfortunately, the intrigue devolves into disappointment. We
disagree with many of the tips, and even when we don’t, the
accompanying explanation is weak or misleading. In this article, we
go through each and every tip from the previous article and provide
our own commentary and evidence, either to validate and clarify the
tip, or to refute it. Our hope is that you don’t just accept our
opinion, but rather learn enough to form your own.
1. Use an SQL Injection Cheat Sheet
This particular tip is just a link to a useful resource with no
discussion on how to use it. Studying various permutations of one
specific attack can be useful, but your time is better spent
learning how to safeguard against it. Additionally, there is much
more to Web app security than SQL injection. XSS (Cross-Site
Scripting) and CSRF
(Cross-Site Request Forgeries), for example, are at least as
common and at least as dangerous.
We can provide some much-needed context, but because we don’t
want to focus too much on one attack, we’ll first take a step back.
Every developer should be familiar with good security practices,
and apps should be designed with these practices in mind. A
fundamental rule is to never trust data you receive from somewhere
else. Another rule is to escape data before you send it somewhere
else. Combined, these rules can be simplified to make up a basic
tenet of security: filter input, escape output (FIEO).
The root cause of SQL injection is a failure to escape output.
More specifically, it is when the distinction between the format of
an SQL query and the data used by the SQL query is not carefully
maintained. This is common in PHP apps that construct queries as
follows:
<?php $query = "SELECT * FROM users WHERE name = '{$_GET['name']}'"; ?>
In this case, the value of $_GET['name'] is provided by another
source, the user, but it is neither filtered nor escaped.
Escaping preserves data in a new context. The emphasis on
escaping output is a reminder that data used outside of your Web
app needs to be escaped, else it might be misinterpreted. By
contrast, filtering ensures that data is valid before it’s used.
The emphasis on filtering input is a reminder that data originating
outside of your Web app needs to be filtered, because it cannot be
trusted.
Assuming we’re using MySQL, the SQL injection vulnerability can
be mitigated by escaping the name with mysql_real_escape_string().
If the name is also filtered, there is an additional layer of
security. (Implementing multiple layers of security is called
“defense in depth” and is a very good security practice.) The
following example demonstrates filtering input and escaping output,
with naming conventions used for code clarity:
<?php // Initialize arrays for filtered and escaped data, respectively. $clean = array(); $sql = array(); // Filter the name. (For simplicity, we require alphabetic names.) if (ctype_alpha($_GET['name'])) { $clean['name'] = $_GET['name']; } else { // The name is invalid. Do something here. } // Escape the name. $sql['name'] = mysql_real_escape_string($clean['name']); // Construct the query. $query = "SELECT * FROM users WHERE name = '{$sql['name']}'"; ?>
Although the use of naming conventions can help you keep up with
what has and hasn’t been filtered, as well as what has and hasn’t
been escaped, a much better approach is to use prepared statements.
Luckily, with PDO, PHP
developers have a universal API for data access that
supports prepared statements, even if the underlying database does
not.
Remember, SQL injection vulnerabilities exist when the
distinction between the format of an SQL query and the data used by
the SQL query is not carefully maintained. With prepared
statements, you can push this responsibility to the database by
providing the query format and data in distinct steps:
<?php // Provide the query format. $query = $db->prepare('SELECT * FROM users WHERE name = :name'); // Provide the query data and execute the query. $query->execute(array('name' => $clean['name'])); ?>
The PDO manual page provides
more information and examples. Prepared statements offer the
strongest protection against SQL injection.
2. Know the Difference Between Comparison Operators
This is a good tip, but it is missing a practical example that
demonstrates when a non-strict comparison can cause problems.
If you use strpos() to determine whether a substring exists
within a string (it returns FALSE if the substring is not found),
the results can be misleading:
<?php $authors = 'Chris & Sean'; if (strpos($authors, 'Chris')) { echo 'Chris is an author.'; } else { echo 'Chris is not an author.'; } ?>
Because the substring Chris occurs at the very beginning of
Chris & Sean, strpos() correctly returns 0, indicating the
first position in the string. Because the conditional statement
treats this as a Boolean, it evaluates to FALSE, and the condition
fails. In other words, it looks like Chris is not an author, but he
is!
This can be corrected with a strict comparison:
<?php if (strpos($authors, 'Chris') !== FALSE) { echo 'Chris is an author.'; } else { echo 'Chris is not an author.'; } ?>
3. Shortcut the else
This tip accidentally stumbles upon a useful practice, which is
to always initialize variables before you use them. Consider a
conditional statement that determines whether a user is an
administrator based on the username:
<?php if (auth($username) == 'admin') { $admin = TRUE; } else { $admin = FALSE; } ?>
This seems safe enough, because it’s easy to comprehend at a
glance. Imagine a slightly more elaborate example that sets
variables for name and email as well, for convenience:
<?php if (auth($username) == 'admin') { $name = 'Administrator'; $email = 'admin@example.org'; $admin = TRUE; } else { /* Get the name and email from the database. */ $query = $db->prepare('SELECT name, email FROM users WHERE username = :username'); $query->execute(array('username' => $clean['username'])); $result = $query->fetch(PDO::FETCH_ASSOC); $name = $result['name']; $email = $result['email']; $admin = FALSE; } ?>
Because $admin is still always explicitly set to either TRUE or
FALSE, all is well, but if a developer later adds an elseif,
there’s an opportunity to forget:
<?php if (auth($username) == 'admin') { $name = 'Administrator'; $email = 'admin@example.org'; $admin = TRUE; } elseif (auth($username) == 'mod') { $name = 'Moderator'; $email = 'mod@example.org'; $moderator = TRUE; } else { /* Get the name and email. */ $query = $db->prepare('SELECT name, email FROM users WHERE username = :username'); $query->execute(array('username' => $clean['username'])); $result = $query->fetch(PDO::FETCH_ASSOC); $name = $result['name']; $email = $result['email']; $admin = FALSE; $moderator = FALSE; } ?>
If a user provides a username that triggers the elseif
condition, $admin is not initialized. This can lead to unwanted
behavior, or worse, a security vulnerability. Additionally, a
similar situation now exists for $moderator, which is not
initialized in the first condition.
By first initializing $admin and $moderator, it’s easy to avoid
this scenario altogether:
<?php $admin = FALSE; $moderator = FALSE; if (auth($username) == 'admin') { $name = 'Administrator'; $email = 'admin@example.org'; $admin = TRUE; } elseif (auth($username) == 'mod') { $name = 'Moderator'; $email = 'mod@example.org'; $moderator = TRUE; } else { /* Get the name and email. */ $query = $db->prepare('SELECT name, email FROM users WHERE username = :username'); $query->execute(array('username' => $clean['username'])); $result = $query->fetch(PDO::FETCH_ASSOC); $name = $result['name']; $email = $result['email']; } ?>
Regardless of what the rest of the code does, it’s now clear
that $admin is FALSE unless it is explicitly set to something else,
and the same is true for $moderator. This also hints at another
good security practice, which is to fail safely. The worst that can
happen as a result of not modifying $admin or $moderator in any of
the conditions is that someone who is an administrator or moderator
is not treated as one.
If you want to shortcut something, and you’re feeling a little
disappointed that our example includes an else, we have a bonus tip
that might interest you. We’re not certain it can be considered a
shortcut, but we hope it’s helpful nonetheless.
Consider a function that determines whether a user is authorized
to view a particular page:
<?php function authorized($username, $page) { if (!isBlacklisted($username)) { if (isAdmin($username)) { return TRUE; } elseif (isAllowed($username, $page)) { return TRUE; } else { return FALSE; } } else { return FALSE; } } ?>
This example is actually pretty simple, because there are only
three rules to consider: administrators are always allowed access;
those who are blacklisted are never allowed access; and isAllowed()
determines whether anyone else has access. (A special case exists
when an administrator is blacklisted, but that is an unlikely
possibility, so we’re ignoring it here.) We use functions for the
rules to keep the code simple and to focus on the logical
structure.
There are numerous ways this example can be improved. If you
want to reduce the number of lines, a compound conditional can
help:
<?php function authorized($username, $page) { if (!isBlacklisted($username)) { if (isAdmin($username) || isAllowed($username, $page)) { return TRUE; } else { return FALSE; } } else { return FALSE; } } ?>
In fact, you can reduce the entire function to a single compound
conditional:
<?php function authorized($username, $page) { if (!isBlacklisted($username) && (isAdmin($username) || isAllowed($username, $page)) { return TRUE; } else { return FALSE; } } ?>
Finally, this can be reduced to a single return:
<?php function authorized($username, $page) { return (!isBlacklisted($username) && (isAdmin($username) || isAllowed($username, $page)); } ?>
If your goal is to reduce the number of lines, you’re done.
However, note that we’re using isBlacklisted(), isAdmin(), and
isAllowed() as placeholders. Depending on what’s involved in making
these determinations, reducing everything to a compound conditional
may not be as attractive.
This brings us to our tip. A return immediately exits the
function, so if you return as soon as possible, you can express
these rules very simply:
<?php function authorized($username, $page) { if (isBlacklisted($username)) { return FALSE; } if (isAdmin($username)) { return TRUE; } return isAllowed($username, $page); } ?>
This uses more lines of code, but it’s very simple and
unimpressive (we’re proudest of our code when it’s the least
impressive). More importantly, this approach reduces the amount of
context you must keep up with. For example, as soon as you’ve
determined whether the user is blacklisted, you can safely forget
about it. This is particularly helpful when your logic is more
complicated.
4. Drop Those Brackets
Based on the content of this tip, we believe the author means
“braces,” not brackets. “Curly brackets” may mean braces to some,
but “brackets” universally means “square brackets.”
This tip should be unconditionally ignored. Without braces,
readability and maintainability are damaged. Consider a simple
example:
<?php if (date('d M') == '21 May') $birthdays = array('Al Franken', 'Chris Shiflett', 'Chris Wallace', 'Lawrence Tureaud'); ?>
If you’re good enough, smart enough, secure enough, notorious
enough, or pitied enough, you might want to party on the 21st of
May:
<?php if (date('d M') == '21 May') $birthdays = array('Al Franken', 'Chris Shiflett', 'Chris Wallace', 'Lawrence Tureaud'); party(TRUE); ?>
Without braces, this simple addition causes you to party every
day. Perhaps you have the stamina for it, so the mistake is a
welcome one. Hopefully, the silly example doesn’t detract from the
point, which is that the excessive partying is an unintended side
effect.
In order to promote the practice of dropping braces, the
previous article uses short examples such as the following:
<?php if ($gollum == 'halfling') $height --; else $height ++; ?>
Because each condition is constrained to a single line, such
mistakes might be less likely, but this leads to another problem:
inconsistencies are jarring and require more time to read and
comprehend. Consistency is such a valued quality that developers
often abide by a coding standard even if they dislike the coding
standard itself.
We recommend always using braces:
<?php if (date('d M') == '21 May') { $birthdays = array('Al Franken', 'Chris Shiflett', 'Chris Wallace', 'Lawrence Tureaud'); party(TRUE); } ?>
You’re welcome to party every day, but make sure it’s
deliberate, and please be sure to invite us!
5. Favor str_replace() Over ereg_replace() and
preg_replace()
We hate to sound disparaging, but this tip demonstrates the sort
of misunderstanding that leads to the same misuse it’s trying to
prevent. It’s an obvious truth that string functions are faster at
string matching than regular expression functions, but the author’s
attempt to draw a corollary from this fails miserably:
If you’re using regular expressions, then ereg_replace() and
preg_replace() will be much faster than str_replace().
Because str_replace() does not support pattern matching, this
statement makes no sense. The choice between string functions and
regular expression functions comes down to which is fit for
purpose, not which is faster. If you need to match a pattern, use a
regular expression function. If you need to match a string, use a
string function.
6. Use Ternary Operators
The benefit of the ternary operator is debatable (there’s only
one, by the way). Here is a line of code from an audit we performed
recently:
<?php $host = strlen($host) > 0 ? $host : htmlentities($host); ?>
Oops! The author actually means to escape $host if the string
length is greater than zero, but instead accidentally does the
opposite. Easy mistake to make? Maybe. Easy to miss during a code
audit? Certainly. Concision doesn’t necessarily make the code any
better.
The ternary operator may be fine for one-liners, prototypes, and
templates, but we strongly believe that an ordinary conditional
statement is almost always better. PHP is descriptive and verbose.
We think code should be, too.
7. Memcached
Disk access is slow. Network access is slow. Databases typically
use both.
Memory is fast. Using a local cache avoids the overhead of
network and disk access. Combine these truths and you get
memcached, a “distributed memory object caching system” originally
developed for the Perl-based blogging platform LiveJournal.
If your application isn’t distributed across multiple servers,
you probably don’t need memcached. Simpler caching
approaches — serializing data and storing it in a temporary file,
for example — can eliminate a lot of redundant work on each
request. In fact, this is the sort of low-hanging fruit we consider
when helping our clients tune their apps.
One of the easiest and most universal ways to cache data in
memory is to use the shared memory helpers in APC, a caching system originally
developed by our colleague George Schlossnagle. Consider the
following example:
<?php $feed = apc_fetch('news'); if ($feed === FALSE) { $feed = file_get_contents('http://example.org/news.xml'); // Store this data in shared memory for five minutes. apc_store('news', $feed, 300); } // Do something with $feed. ?>
With this type of caching, you don’t have to wait on a remote
server to send the feed data for every request. Some latency is
incurred — up to five minutes in this example — but this can be
adjusted to as close to real time as your app requires.
8. Use a Framework
All decisions have consequences. We appreciate frameworks — in
fact, the main developers behind CakePHP and Solar work with us at OmniTI — but using
one doesn’t magically make what you’re doing better.
In December, our colleague Paul Jones wrote an article for PHP
Advent called
The Framework as Franchise, in which he compares frameworks to
business franchises. He refers to a suggestion by Michael Gerber
from his book “The E-Myth Revisited”:
Gerber notes that to run a successful business, the entrepreneur
needs to act as if he is going to sell his business as a franchise
prototype. It is the only way the business owner can make the
business operate without him being personally involved in every
decision.
This is good advice. Whether you’re using a framework or
defining your own standards and conventions, it’s important to
consider the value from the perspective of future developers.
Although we would love to give you a universal truth, extending
this idea to suggest that a framework is always appropriate isn’t
something we’re willing to do. If you ask us whether you should use
a framework, the best answer we could give is, “It depends.”
9. Use the Suppression Operator Correctly
Always try to avoid using the error suppression operator. In the
previous article, the author states:
The @ operator is rather slow and can be costly if you need to
write code with performance in mind.
Error suppression is slow. This is because PHP dynamically
changes error_reporting to 0 before executing the suppressed
statement, then immediately changes it back. This is expensive.
Worse, using the error suppression operator makes it difficult
to track down the root cause of a problem.
The previous article uses the following example to support the
practice of assigning a variable by reference when it is unknown if
$albus is set:
<?php $albert =& $albus; ?>
Although this works — for now — relying on strange, undocumented
behavior without a very good understanding of why it works is a
good way to introduce bugs. Because $albert is assigned to $albus
by reference, future modifications to $albus will also modify
$albert.
A much better solution is to use isset(), with braces:
<?php if (!isset($albus)) { $albert = NULL; } ?>
Assigning $albert to NULL is the same as assigning it to a
nonexistent reference, but being explicit greatly improves the
clarity of the code and avoids the referential relationship between
the two variables.
If you inherit code that uses the error suppression operator
excessively, we’ve got a bonus tip for you. There is a new
PECL extension
called Scream that
disables error suppression.
10. Use isset() Instead of strlen()
This is actually a neat trick, although the previous article
completely fails to explain it. Here is the missing example:
<?php if (isset($username[5])) { // The username is at least six characters long. } ?>
When you treat strings as arrays, each character in the string
is an element in the array. By determining whether a particular
element exists, you can determine whether the string is at least
that many characters long. (Note that the first character is
element 0, so $username[5] is the sixth character in
$username.)
The reason this is slightly faster than strlen() is complicated.
The simple explanation is that strlen() is a function, and isset()
is a language construct. Generally speaking, calling a function is
more expensive than using a language construct.
About the Authors
Hi, we’re Chris Shiflett and
Sean Coates. We work together
at OmniTI (“the most important web
company you’ve never heard of”), blog about PHP and other stuff at
shiflett.org and seancoates.com, curate PHP Advent, and do the Twitter thing as
@shiflett and @coates.
(al)