1. Primitive Types
Any data types the compiler directly supports are called primitive types.
Primitive types map directly to types existing in the Framework Class
Library (FCL). For the types that are compliant with the Common Language
Specification (CLS), other languages will offer similar primitive types.
However, languages aren’t required to offer any support for the
non-CLS-compliant types.
Primitives with Corresponding FCL Types
Another way to think of this is that the C# compiler automatically assumes
that you have the following using directives in all of your source code
files:
using sbyte = System.SByte;
using byte = System.Byte;
using short = System.Int16;
using ushort = System.UInt16;
using int = System.Int32;
using uint = System.UInt32;
...
About the compiler:
First, the compiler is able to perform implicit or explicit casts between
primitive types. C# allows implicit casts if the conversion is “safe,” that
is, no loss of data is possible. C# requires explicit casts if the
conversion is potentially unsafe. For numeric types, “unsafe” means that you
could lose precision or magnitude as a result of the conversion.
Be aware that different compilers can generate different code to
handle these cast operations. For example, when casting a Single with a
value of 6.8 to an Int32, some compilers could generate code to put a 6 in
the Int32, and others could perform the cast by rounding the result
up to 7. By the way, C# always truncates the result.
In addition to casting, primitive types can be written as literals. If you
have an expression consisting of literals, the compiler is able to evaluate
the expression at compile time, improving the application’s performance.
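The casting and literal rules above can be sketched as follows. This is a minimal illustration only; the CastDemo class and its member names are hypothetical and not part of the FCL.

```csharp
using System;

// Hypothetical helper class used only to illustrate the rules above.
public static class CastDemo {
    // Implicit cast: Int32 to Int64 can never lose data,
    // so no cast syntax is required.
    public static Int64 Widen(Int32 value) {
        Int64 wide = value;
        return wide;
    }

    // Explicit cast: Single to Int32 can lose precision, so C# requires
    // the cast syntax. C# truncates the fractional part (6.8 becomes 6).
    public static Int32 Truncate(Single value) {
        return (Int32) value;
    }

    // An expression made only of literals is evaluated by the compiler,
    // so no multiplication is performed at run time.
    public const Int32 SecondsPerDay = 60 * 60 * 24;
}
```

Calling CastDemo.Truncate(6.8f) yields 6, matching the truncation behavior described above.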
2. Checked and Unchecked Primitive Type Operations
The CLR offers IL instructions that allow the compiler to choose the desired
behavior. The CLR has an instruction called add that adds two values
together. The add instruction performs no overflow checking. The CLR also
has an instruction called add.ovf that also adds two values together.
However, add.ovf throws a System.OverflowException if an overflow occurs. In
addition to these two IL instructions for the add operation, the CLR also
has similar IL instructions for subtraction (sub/sub.ovf), multiplication
(mul/mul.ovf), and data conversions (conv/conv.ovf).
One way to get the C# compiler to control overflows is to use the /checked+
compiler switch. This switch tells the compiler to generate code that has
the overflow-checking versions of the add, subtract, multiply, and
conversion IL instructions. The code executes a little slower because the
CLR is checking these operations to determine whether an overflow occurred.
If an overflow occurs, the CLR throws an OverflowException.
In addition to having overflow checking turned on or off globally,
programmers can control overflow checking in specific regions of their code.
C# allows this flexibility by offering checked and unchecked operators.
For example:
UInt32 invalid = unchecked((UInt32) (-1)); // OK
Byte b = 100;
b = checked((Byte) (b + 200)); // OverflowException is thrown
b = (Byte) checked(b + 200);   // b contains 44; no OverflowException
In addition to the checked and unchecked operators, C# also offers checked
and unchecked statements. The statements cause all expressions within a
block to be checked or unchecked.
For example:
checked {                  // Start of checked block
   Byte b = 100;
   b = (Byte) (b + 200);   // This expression is checked for overflow.
}
In fact, if you use a checked statement block, you can now use the +=
operator with the Byte, which simplifies the code a bit:
For example:
checked {          // Start of checked block
   Byte b = 100;
   b += 200;       // This expression is checked for overflow.
}
Important: Because the only effect that the
checked operator and statement have is to determine which versions of the
add, subtract, multiply, and data conversion IL instructions are produced,
calling a method within a checked operator or statement has no impact on
that method, as the following code demonstrates:
checked {
   // Assume SomeMethod tries to load 400 into a Byte.
   SomeMethod(400);
   // SomeMethod might or might not throw an OverflowException.
   // It would if SomeMethod were compiled with checked instructions.
}
Some recommended rules for programmers:
(1) Use signed data types (such as Int32 and Int64) instead of unsigned
numeric types (such as UInt32 and UInt64) wherever possible.
(2) As you write your code, explicitly use checked around blocks where an
unwanted overflow might occur due to invalid input data.
(3) As you write your code, explicitly use unchecked around blocks where an
overflow is OK, such as calculating a checksum.
(4) For any code that doesn’t use checked or unchecked, the assumption is
that you do want an exception to occur on overflow.
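Rules (2) and (3) above might look like this in practice. This is a sketch under stated assumptions: OverflowDemo, TotalPrice, and Checksum are hypothetical names chosen for illustration, and the checksum is a simple additive one.

```csharp
using System;

// Hypothetical helper class illustrating rules (2) and (3).
public static class OverflowDemo {
    // Rule (2): an overflow here would indicate invalid input,
    // so the multiplication is explicitly checked.
    public static Int32 TotalPrice(Int32 quantity, Int32 unitPrice) {
        checked {
            return quantity * unitPrice;
        }
    }

    // Rule (3): wrap-around is intended when summing bytes for a
    // simple additive checksum, so the block is explicitly unchecked.
    public static Byte Checksum(Byte[] data) {
        Byte sum = 0;
        unchecked {
            foreach (Byte b in data) sum += b;
        }
        return sum;
    }
}
```

With these definitions, TotalPrice(Int32.MaxValue, 2) throws an OverflowException regardless of the compiler switch, while Checksum silently wraps (100 + 200 produces 44, as in the Byte example earlier).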
Important: The System.Decimal type is a very special type. Although many
programming languages (C# and Visual Basic included) consider Decimal a
primitive type, the CLR does not. This means that the CLR doesn’t have IL
instructions that know how to manipulate a Decimal value. If you look up the
Decimal type in the .NET Framework SDK documentation, you’ll see that it has
public static methods called Add, Subtract, Multiply, Divide, and so on. In
addition, the Decimal type provides operator overload methods for +, -, *,
/, and so on.
When you compile code that uses Decimal values, the compiler generates code
to call Decimal’s members to perform the actual operation. This means that
manipulating Decimal values is slower than manipulating CLR primitive
values. Also, because there are no IL instructions for manipulating Decimal
values, the checked and unchecked operators, statements, and compiler
switches have no effect. Operations on Decimal values always throw an
OverflowException if the operation can’t be performed safely.
Similarly, the System.Numerics.BigInteger type is also special in that it
internally uses an array of UInt32s to represent an arbitrarily large
integer whose value has no upper or lower bound. Therefore, operations on a
BigInteger never result in an OverflowException. However, a BigInteger
operation may throw an OutOfMemoryException if the value gets too large and
there is insufficient available memory to resize the array.
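The behavior of Decimal and BigInteger described above can be sketched as follows. SpecialNumericsDemo and its methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Numerics;

// Hypothetical helper class illustrating Decimal and BigInteger behavior.
public static class SpecialNumericsDemo {
    // The unchecked operator has no effect on Decimal: the + operator
    // compiles to a call to Decimal's operator overload method, which
    // throws an OverflowException when the result doesn't fit.
    public static Decimal AddUnchecked(Decimal a, Decimal b) {
        return unchecked(a + b);
    }

    // BigInteger has no upper bound, so going one past UInt64.MaxValue
    // simply produces a larger value instead of overflowing.
    public static BigInteger OnePastUInt64() {
        return new BigInteger(UInt64.MaxValue) + 1;
    }
}
```

Calling AddUnchecked(Decimal.MaxValue, 1m) throws an OverflowException even though the addition is wrapped in unchecked.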
3. Reference Types and Value Types
The CLR supports two kinds of types: reference types and value types. In C#,
types declared using struct are value types, and types declared using class
are reference types.
Value type instances are usually allocated on a thread’s stack (although
they can also be embedded as a field in a reference type object). The
variable representing the instance doesn’t contain a pointer to an instance;
the variable contains the fields of the instance itself.
Reference types are always allocated from the managed heap, and the C# new
operator returns the memory address of the object; the memory address
refers to the object’s bits.
All of the structures are immediately derived from the System.ValueType
abstract type. System.ValueType is itself immediately derived from the
System.Object type. By definition, all value types must be derived from
System.ValueType. All enumerations are derived from the System.Enum abstract
type, which is itself derived from System.ValueType. The CLR and all
programming languages give enumerations special treatment.
In addition, all value types are sealed, which prevents a value type from
being used as a base type for any other reference type or value type.
Important: For many developers (such as unmanaged C/C++ developers),
reference types and value types will seem strange at first. In unmanaged
C/C++, you declare a type, and then the code that uses the type gets to
decide if an instance of the type should be allocated on the thread’s stack
or in the application’s heap. In managed code, the developer defining the
type indicates where instances of the type are allocated; the developer
using the type has no control over this.
4. How the CLR Controls the Layout of a Type’s Fields
To improve performance, the CLR is capable of arranging the fields of a type
any way it chooses. You tell the CLR what to do by applying the
System.Runtime.InteropServices.StructLayoutAttribute attribute on the class
or structure you’re defining. To this attribute’s constructor, you can pass
LayoutKind.Auto to have the CLR arrange the fields, LayoutKind.Sequential to
have the CLR preserve your field layout, or LayoutKind.Explicit to
explicitly arrange the fields in memory by using offsets. If you don’t
explicitly specify the StructLayoutAttribute on a type that you’re defining,
your compiler selects whatever layout it determines is best.
You should be aware that Microsoft’s C# compiler selects LayoutKind.Auto for
reference types (classes) and LayoutKind.Sequential for value types
(structures).
The StructLayoutAttribute also allows you to explicitly indicate the offset
of each field by passing LayoutKind.Explicit to its constructor. Then you
apply an instance of the System.Runtime.InteropServices.FieldOffsetAttribute
attribute to each field, passing to this attribute’s constructor an Int32
indicating the offset (in bytes) of the field’s first byte from the
beginning of the instance. Explicit layout is typically used to simulate
what would be a union in unmanaged C/C++ because you can have multiple
fields starting at the same offset in memory.
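A union-style structure using explicit layout could be sketched like this. The IntOrBytes structure and its field names are hypothetical; which byte of Value overlaps FirstByte depends on the machine’s byte order, which is why the usage note below hedges on endianness.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structure: Value and FirstByte start at the same offset,
// so they share memory, much like members of an unmanaged C/C++ union.
[StructLayout(LayoutKind.Explicit)]
public struct IntOrBytes {
    [FieldOffset(0)] public UInt32 Value;     // occupies bytes 0 through 3
    [FieldOffset(0)] public Byte   FirstByte; // overlaps byte 0 of Value
    [FieldOffset(3)] public Byte   LastByte;  // overlaps byte 3 of Value
}
```

After setting Value to 0x11223344 on a little-endian machine (BitConverter.IsLittleEndian is true), FirstByte reads 0x44 and LastByte reads 0x11; on a big-endian machine the two are swapped.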
The Differences between Value Type and Reference Type:
(1) Value type objects have two representations: an unboxed form and a boxed
form. Reference types are always in a boxed form.
(2) Value types are derived from
System.ValueType. This type offers the same methods as defined by
System.Object. However, System.ValueType overrides the Equals method so
that it returns true if the values of the two objects’ fields match. In
addition, System.ValueType overrides the GetHashCode method to produce a
hash code value by using an algorithm that takes into account the values in
the object’s instance fields.
(3) Because you can’t define a new value type or a new reference type by
using a value type as a base class, you shouldn’t introduce any new virtual
methods into a value type. No methods can be abstract, and all methods are
implicitly sealed (can’t be overridden).
(4) Reference type variables contain the memory address of objects in the
heap. By default, when a reference type variable is created, it is
initialized to null, indicating that the reference type variable doesn’t
currently point to a valid object. Attempting to use a null reference type
variable causes a NullReferenceException to be thrown. By contrast, value
type variables always contain a value of the underlying type, and all
members of the value type are initialized to 0. Since a value type variable
isn’t a pointer, it’s not possible to generate a NullReferenceException when
accessing a value type. The CLR does offer a special feature that adds the
notion of nullability to a value type. This feature is called nullable
types.
(5) When you assign a value type variable to another value type variable, a
field-by-field copy is made. When you assign a reference type variable to
another reference type variable, only the memory address is copied.
(6) Two or more reference type variables can refer to a single object in the
heap, allowing operations on one variable to affect the object referenced by
the other variable. On the other hand, value type variables are distinct
objects, and it’s not possible for operations on one value type variable to
affect another.
(7) Because unboxed value types aren’t allocated on the heap, the storage
allocated for them is freed as soon as the method that defines an instance
of the type is no longer active. This means that a value type instance
doesn’t receive a notification (via a Finalize method) when its memory is
reclaimed.
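Differences (5) and (6) in the list above can be seen in a short sketch. The PointValue, PointRef, and CopyDemo types are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical value type and reference type with the same single field.
public struct PointValue { public Int32 X; }
public sealed class PointRef { public Int32 X; }

public static class CopyDemo {
    // Assigning a value type variable makes a field-by-field copy,
    // so changing the copy leaves the original untouched.
    public static Int32 CopyStruct() {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // field-by-field copy
        b.X = 2;
        return a.X;         // a is unaffected
    }

    // Assigning a reference type variable copies only the memory address,
    // so both variables refer to the same object in the heap.
    public static Int32 CopyClass() {
        PointRef a = new PointRef();
        a.X = 1;
        PointRef b = a;     // only the address is copied
        b.X = 2;
        return a.X;         // a sees the change made through b
    }
}
```

CopyStruct returns 1 and CopyClass returns 2.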
4. Boxing and Unboxing Value Types
It’s possible to convert a value type to a reference type by using a
mechanism called boxing. Internally, here’s what happens when an instance of
a value type is boxed:
1. Memory is allocated from the managed heap. The amount of memory allocated
is the size required by the value type’s fields plus the two additional
overhead members (the type object pointer and the sync block index) required
by all objects on the managed heap.
2. The value type’s fields are copied to the newly allocated heap memory.
3. The address of the object is returned. This address is now a reference to
an object; the value type is now a reference type.
Converting a boxed value type back to its unboxed form takes two steps:
First, the address of the value type’s fields in the boxed value type’s
object is obtained. This process is called unboxing.
Then, the values of these fields are copied from the heap to the stack-based
value type instance.
Unboxing is not the exact opposite of boxing. The unboxing operation is much
less costly than boxing. Unboxing is really just the operation of obtaining
a pointer to the raw value type (data fields) contained within an object. In
effect, the pointer refers to the unboxed portion in the boxed instance. So,
unlike boxing, unboxing doesn’t involve the copying of any bytes in memory.
Having made this important clarification, it is important to note that an
unboxing operation is typically followed by copying the fields.
Unboxed value types are lighter-weight types than reference types for two
reasons:
(1) They are not allocated on the managed heap.
(2) They don’t have the additional overhead members that every object on the
heap has: a type object pointer and a sync block index.
Because unboxed value types don’t have a sync block index, you can’t have
multiple threads synchronize their access to the instance by using the
methods of the System.Threading.Monitor type.
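The boxing and unboxing steps above can be observed from C# with a small sketch. BoxingDemo and its methods are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper class illustrating boxing and unboxing.
public static class BoxingDemo {
    public static Int32 RoundTrip() {
        Int32 v = 5;
        Object o = v;             // boxing: v's value is copied to the heap
        v = 123;                  // changes the stack copy only
        Int32 back = (Int32) o;   // unboxing, followed by a field copy
        return back;              // the boxed copy still holds 5
    }

    // Each boxing operation allocates a separate object on the managed
    // heap, so boxing the same value twice yields two distinct references.
    public static Boolean BoxesShareIdentity() {
        Int32 v = 5;
        Object first = v;
        Object second = v;
        return Object.ReferenceEquals(first, second);
    }
}
```

RoundTrip returns 5 because the boxed object received its own copy of the fields, and BoxesShareIdentity returns false because two separate heap objects were allocated.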
5. Changing Fields in a Boxed Value Type by Using Interfaces
6. Object Equality and Identity
The System.Object type offers a virtual method named Equals, whose purpose
is to return true if two objects contain the same value. The implementation
of Object’s Equals method looks like this:
public class Object {
   public virtual Boolean Equals(Object obj) {
      // If both references point to the same object,
      // they must have the same value.
      if (this == obj) return true;
      // Assume that the objects do not have the same value.
      return false;
   }
}
At first, this seems like a reasonable default implementation of
Equals: it returns true if the this and obj arguments refer to the same
exact object. This seems reasonable because Equals knows that an object
must have the same value as itself. However, if the arguments refer to
different objects, Equals can’t be certain if the objects contain the same
values, and therefore, false is returned. In
other words, the default implementation of Object’s Equals method really
implements identity, not value equality.
Here is how to properly implement an Equals method internally:
1. If the obj argument is null, return false because the current object
identified by this is obviously not null when the nonstatic Equals method is
called.
2. If the this and obj arguments refer to the same object, return true. This
step can improve performance when comparing objects with many fields.
3. If the this and obj arguments refer to objects of different types, return
false. Obviously, checking if a String object is equal to a FileStream
object should result in a false result.
4. For each instance field defined by the type, compare the value in the
this object with the value in the obj object. If any fields are not equal,
return false.
5. Call the base class’s Equals method so it can compare any fields defined
by it. If the base class’s Equals method returns false, return false;
otherwise, return true.
So Microsoft should have implemented Object’s Equals like this:
public class Object {
   public virtual Boolean Equals(Object obj) {
      // The given object to compare to can't be null.
      if (obj == null) return false;
      // If objects are different types, they can't be equal.
      if (this.GetType() != obj.GetType()) return false;
      // If objects are same type, return true if all of their fields match.
      // Since System.Object defines no fields, the fields match.
      return true;
   }
}
But, since Microsoft didn’t implement Equals this way, the rules for how to
implement Equals are significantly more complicated than you would think.
When a type overrides Equals, the override should call its base class’s
implementation of Equals unless it would be calling Object’s implementation.
This also means that since a type can override Object’s Equals method, this
Equals method can no longer be called to test for identity. To fix this,
Object offers a static ReferenceEquals method, which is implemented like
this:
public class Object {
   public static Boolean ReferenceEquals(Object objA, Object objB) {
      return (objA == objB);
   }
}
You should always
call ReferenceEquals if you want to
check for identity (if two references point to the same object). You
shouldn’t use the C# == operator (unless you cast both operands to Object
first) because one of the operands’ types could overload the == operator,
giving it semantics other than identity.
As you can see, the .NET Framework has a very confusing story when
it comes to object equality and identity. By the way, System.ValueType (the
base class of all value types) does override Object’s Equals method and is
correctly implemented to perform a value equality check (not an identity
check). Internally, ValueType’s Equals is implemented this way:
1. If the obj argument is null, return false.
2. If the this and obj arguments refer to objects of different types, return
false.
3. For each instance field defined by the type, compare the value in the
this object with the value in the obj object by calling the field’s Equals
method. If any fields are not equal, return false.
4. Return true. Object’s Equals method is not called by ValueType’s Equals
method.
Internally, ValueType’s Equals method uses reflection in step #3.
The four properties of equality:
- Equals must be reflexive; that is, x.Equals(x) must return true.
- Equals must be symmetric; that is, x.Equals(y) must return the same value
as y.Equals(x).
- Equals must be transitive; that is, if x.Equals(y) returns true and
y.Equals(z) returns true, then x.Equals(z) must also return true.
- Equals must be consistent. Provided that there are no changes in the two
values being compared, Equals should consistently return true or false.
When overriding the Equals method, there are a few more things that you’ll
probably want to do:
- Have the type implement the System.IEquatable<T> interface’s Equals
method. This generic interface allows you to define a type-safe Equals
method. Usually, you’ll implement the Equals method that takes an Object
parameter to internally call the type-safe Equals method.
- Overload the == and != operator methods. Usually, you’ll implement these
operator methods to internally call the type-safe Equals method.
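Putting the guidance above together, a value-equality implementation might look like the following sketch. The Money class and its fields are hypothetical. It is sealed so that the `as` conversion doubles as the exact type check, and because its base class is Object, it does not call the base class’s Equals.

```csharp
using System;

// Hypothetical sealed type with value-equality semantics.
public sealed class Money : IEquatable<Money> {
    private readonly Int64 m_cents;
    private readonly String m_currency;

    public Money(Int64 cents, String currency) {
        m_cents = cents;
        m_currency = currency;
    }

    // Type-safe Equals from IEquatable<Money>.
    public Boolean Equals(Money other) {
        // ReferenceEquals is used so the overloaded == is not invoked.
        if (Object.ReferenceEquals(other, null)) return false;
        if (Object.ReferenceEquals(this, other)) return true;
        // Compare each instance field.
        return m_cents == other.m_cents && m_currency == other.m_currency;
    }

    // The Object-based Equals forwards to the type-safe version;
    // "as" yields null for any object that is not a Money.
    public override Boolean Equals(Object obj) {
        return Equals(obj as Money);
    }

    // Equal objects must produce equal hash codes.
    public override Int32 GetHashCode() {
        Int32 currencyHash = (m_currency == null) ? 0 : m_currency.GetHashCode();
        return m_cents.GetHashCode() ^ currencyHash;
    }

    // The operators forward to the type-safe Equals method.
    public static Boolean operator ==(Money a, Money b) {
        if (Object.ReferenceEquals(a, null)) return Object.ReferenceEquals(b, null);
        return a.Equals(b);
    }

    public static Boolean operator !=(Money a, Money b) {
        return !(a == b);
    }
}
```

Two separately constructed Money objects with the same cents and currency compare equal through Equals and ==, while Object.ReferenceEquals still reports that they are different objects.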
7. Object Hash Codes
The designers of the FCL decided that it would be incredibly useful if any
instance of any object could be placed into a hash table collection. To this
end, System.Object provides a virtual GetHashCode method so that an Int32
hash code can be obtained for any and all objects.
If you define a type and override the Equals method, you should also
override the GetHashCode method. In fact, Microsoft’s C# compiler emits a
warning if you define a type that overrides Equals without also overriding
GetHashCode.
The reason why a type that defines Equals must also define GetHashCode is
that the implementation of the System.Collections.Hashtable type, the
System.Collections.Generic.Dictionary type, and some other collections
require that any two objects that are equal must have the same hash code
value. So if you override Equals, you should override GetHashCode to ensure
that the algorithm you use for calculating equality corresponds to the
algorithm you use for calculating the object’s hash code.
Defining a GetHashCode method can be easy and straightforward. But depending
on your data types and the distribution of data, it can be tricky to come up
with a hashing algorithm that returns a well-distributed range of values.
Here’s a simple example that will probably work just fine for Point objects:
internal sealed class Point {
   private readonly Int32 m_x, m_y;
   public override Int32 GetHashCode() {
      return m_x ^ m_y; // m_x XOR'd with m_y
   }
   ...
}
When selecting an algorithm for calculating hash codes for instances of your
type, try to follow these guidelines:
- Use an algorithm that gives a good random distribution for the best
performance of the hash table.
- Your algorithm can also call the base type’s GetHashCode method, including
its return value. However, you don’t generally want to call Object’s or
ValueType’s GetHashCode method, because the implementation in either method
doesn’t lend itself to high-performance hashing algorithms.
- Your algorithm should use at least one instance field.
- Ideally, the fields you use in your algorithm should be immutable; that
is, the fields should be initialized when the object is constructed, and
they should never again change during the object’s lifetime.
- Your algorithm should execute as quickly as possible.
- Objects with the same value should return the same code. For example, two
String objects with the same text should return the same hash code value.
System.Object’s implementation of the GetHashCode method doesn’t know
anything about its derived type and any fields that are in the type. For
this reason, Object’s GetHashCode method returns a number that is
guaranteed to uniquely identify the object within the AppDomain; this
number is guaranteed not to change for the lifetime of the object. After
the object is garbage collected, however, its unique number can be reused
as the hash code for a new object.
Note: If a type overrides Object’s GetHashCode method, you can no longer
call it to get a unique ID for the object. If you want to get a unique ID
(within an AppDomain) for an object, the FCL provides a method that you can
call. In the System.Runtime.CompilerServices namespace, see the
RuntimeHelpers class’s public, static GetHashCode method that takes a
reference to an Object as an argument. RuntimeHelpers’ GetHashCode method
returns a unique ID for an object even if the object’s type overrides
Object’s GetHashCode method. This method got its name because of its
heritage, but it would have been better if Microsoft had named it something
like GetUniqueObjectID.
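A minimal sketch of calling RuntimeHelpers.GetHashCode on a type that overrides GetHashCode follows. ConstantHash and IdentityHashDemo are hypothetical names; the override deliberately returns a constant only to make the difference visible, and is not a recommended hashing algorithm.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type whose GetHashCode override hides Object's version.
public sealed class ConstantHash {
    public override Int32 GetHashCode() {
        return 42; // deliberately poor hash, for illustration only
    }
}

public static class IdentityHashDemo {
    // RuntimeHelpers.GetHashCode bypasses any override and returns the
    // value that Object's own implementation would produce for this object.
    public static Int32 IdentityHash(Object o) {
        return RuntimeHelpers.GetHashCode(o);
    }
}
```

For a given ConstantHash instance, GetHashCode always returns 42, while IdentityHashDemo.IdentityHash returns an object-specific value that stays the same across repeated calls on that instance.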
System.ValueType’s implementation of GetHashCode uses reflection (which is
slow) and XORs some of the type’s instance fields together. This is a naive
implementation that might be good for some value types, but I still
recommend that you implement GetHashCode yourself because you’ll know
exactly what it does, and your implementation will be faster than
ValueType’s implementation.
8. Dynamic Primitive Type
Important: Do not confuse dynamic and var. Declaring a local variable using
var is just a syntactical shortcut that has the compiler infer the specific
data type from an expression. The var keyword can be used only for declaring
local variables inside a method while the dynamic keyword can be used for
local variables, fields, and arguments. You cannot cast an expression to var
but you can cast an expression to dynamic. You must explicitly initialize a
variable declared using var while you do not have to initialize a variable
declared with dynamic.
Important: A dynamic expression is really the
same type as System.Object. The compiler assumes that whatever
operation you attempt on the expression is legal, so the compiler will
not generate any warnings or errors. However, exceptions will be thrown at
runtime if you attempt to execute an invalid operation. In addition, Visual
Studio cannot offer any IntelliSense support to help you write code against
a dynamic expression. You cannot define an extension method that extends
dynamic, although you can define one that extends Object. And, you cannot
pass a lambda expression or anonymous method as an argument to a dynamic
method call since the compiler cannot infer the types being used.
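The contrast between var and dynamic can be sketched as follows. DynamicDemo and its methods are hypothetical names; the specific exception type caught here, Microsoft.CSharp.RuntimeBinder.RuntimeBinderException, is not named in the text above and is included as what the C# runtime binder throws for a member it cannot find.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical helper class contrasting var and dynamic.
public static class DynamicDemo {
    // var: the compiler infers Int32 at compile time, so n + 1 is
    // ordinary, statically checked integer addition.
    public static Int32 AddWithVar() {
        var n = 5;
        return n + 1;
    }

    // dynamic: the meaning of + is decided at run time from the
    // actual types of the arguments.
    public static Object AddDynamic(dynamic a, dynamic b) {
        return a + b;
    }

    // The compiler accepts any operation on a dynamic expression;
    // an invalid one fails only when it executes.
    public static Boolean InvalidCallThrows() {
        dynamic d = 5;
        try {
            d.NoSuchMethod();   // compiles, but Int32 has no such method
            return false;
        }
        catch (RuntimeBinderException) {
            return true;
        }
    }
}
```

AddDynamic(2, 3) performs integer addition while AddDynamic("a", "b") performs string concatenation, and InvalidCallThrows returns true because the invalid call is rejected only at run time.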