Tracing the Roots of Unicode and Non-Unicode Data Types
Nchar is short for "NATIONAL CHARACTER", nvarchar stands for "NATIONAL CHARACTER VARYING", and ntext is the ISO synonym for "NATIONAL TEXT". Originally intended for pre-Unicode multibyte encodings like JIS encoding for Asian characters. The idea was that VARCHAR would continue to be utilized for ASCII, with NVARCHAR being employed for non-ASCII characters.
This use-case was designed when the Internet was still in its infancy and before the Unicode project had taken off. In those days, Asian languages in particular, employed their own specific - and mutually incompatible - encodings, with GB for mainland Chinese, JIS/SJIS for Japanese, BIG5 in Hong Kong and Taiwan, CNS in Taiwan, etc. However, all of that changed with the emergence of the Unicode project encodings, as database vendors realized that it was easier to just allow VARCHAR itself to support multibyte character encodings, and use Character Sets and Collations to deal with specific encodings. For instance, you can use UTF-8 to encode any character you need in any language your applications need to support. Thus, the need for a whole group of character data types that were specific to "NATIONAL CHARACTER" soon faded away.
Today, in many modern DB engines, "NVARCHAR" and "NATIONAL CHARACTER VARYING" are really just aliases for VARCHAR, with the actual implementation being virtually (if not exactly) identical. Having said that, SQL Server does treat the two differently. As stated in the docs:
The key difference between varchar and nvarchar is the way they are stored, varchar is stored as regular 8-bit data(1 byte per character) and nvarchar stores data at 2 bytes per character. Due to this reason, nvarchar can hold up to 4000 characters and it takes double the space as SQL varchar.
Which to Use?
As specified above, the biggest concern when deciding between types is the amount of storage used. For example, nvarchar uses 2 bytes per character, whereas varchar uses 1. Thus, nvarchar(4000) uses the same amount of storage space as varchar(8000). Hence, if you have requirements to store UNICODE or multilingual data, nvarchar is the best choice. Varchar stores ASCII data and should be your data type of choice for normal use. Another consideration is that joining a VARCHAR column to a NVARCHAR (and vice-versa) in queries can lead to a considerable performance hit.
Conclusion
In today's blog, we compared SQL Server's Unicode and Non-Unicode String Data Types to decide when to use one over the other.