The Differences Between SQL and SAS SQL

Contributed by:

Keith Brown
Senior Clinical SAS Programmer Analyst, DataCeutics

Many SAS programmers have some SQL knowledge from working with a particular database vendor’s software before they discover that they can use SQL in their SAS programming, by simply enclosing their SQL statements inside of a PROC SQL invocation. 

As an example of why this is useful, a single SQL create/select statement can produce the same results as the typical (sort, sort, merge, filter, sort) sequence of SAS data steps and procs. SAS programmers might wonder what the differences are between what they can do with the standard or vendor specific SQL that they already know, vs. what the corresponding capabilities are specific to SAS SQL. It’s a great topic to explore, and the news is good... very good! So let’s have a brief high-level look at those differences. First, we will note some standard SQL capabilities that are not supported, or are supported slightly differently, in SAS.

SAS SQL is basically ANSI Standard SQL (1992) with a very small number of exceptions and a wealth of useful extensions.

SAS SQL does not implement COMMIT, ROLLBACK, and GRANT, because SAS is not designed primarily as a database management system. Names for columns and tables follow SAS naming rules and so are limited to 32 characters starting with a letter or underscore. SAS treats NULL values according to SAS rules for missing values which differs slightly from the standard SQL NULL treatment. These differences and restrictions are helpful for making SQL fit more seamlessly into a SAS program.

The extensions are numerous, and there is no special extension package to install or additional products required. These SAS specific SQL extensions are already available and ready for use in a default base SAS installation. Here are eight good ones:

1.      Hundreds of SAS built-in functions can be used in SQL, many more functions than are available in the ANSI standard.

2.     You can even compile user-defined functions in SAS using PROC FCMP and then use your new function in SQL expressions.

3.     The syntax for SELECT INTO :variablename allows placing results directly into SAS macro variables. This construct is the SAS way of doing what standard SQL does for parameterized queries.

4.     SAS macro variables and even macro invocations can be freely intermixed into SQL syntax and will execute and expand into the final SQL that gets executed, so you can even write SQL macros. This is handy when you need to inject logic into SQL to things like computing or varying column name lists, table names, and where clause conditions. For example: “Select %dynamic_column_list from mydata where x=y order by key1, key2.”

5.     Summary function results are automatically remerged back into the tables’ data so that you can use them in expressions; e.g., to express what fraction of a total is represented for each row you could SELECT x/sum(x), or for a percentage 100* x/sum(x).

6.     ORDER BY and GROUP BY clauses can include expressions in addition to column names; e.g., “order by (new_value – baseline_value).”

7.     ORDER BY clauses are allowed in CREATE VIEW statements.

8.     SAS datasets function as tables, and they can be freely mixed in heterogeneous queries. You can join a SAS dataset to an MS Access table and an Oracle table, and then combine the result using a UNION operation with an SQL Server table.

Are There Reasons to Avoid SAS SQL Extensions?

Normally it is thought to be wise to avoid SQL extensions, so it is easier to transition to a different database management system if needed. Normally I agree with that. There are two reasons why I embrace SQL extensions in SAS, whereas not normally with extensions from Oracle, Sybase, Microsoft, Postgresql, etc.

1.       Most extensions are only available in a database running under the software of a particular DBMS vendor. SAS SQL extensions are language extensions, not DBMS vendor extensions, so they apply to SAS language processing of SQL for a database from ANY SAS supported (or ODBC) DBMS vendor; i.e., they are not implemented in the database engine, but rather in the host programming language.

2.      SAS programs rarely face the need to convert to a new DBMS vendor compliance, and changes are usually minimal if they do.

So fear not! Enjoy the extra power of SQL macros, heterogeneous multi-vendor queries, more functions, remerged aggregation results, and all the rest.