The last time Hackerfall tried to access this page, it returned a not found error. A cached version of the page is below, or clickhereto continue anyway

Exploring Swift Dictionary's Implementation – ankit.im

Swift Dictionary :

There are multiple ways to store these (key, value) records one of which is open addressing with linear probing which is used by swifts dictionary implementation.

Consider a dictionary of capacity = 8, it can store a max of 7 records of (key, value) because there should be atleast one empty space (called holes) in the dictionary buffer which acts as sentinel for stopping the search during retrivals/insertions.

This will look something like this in memory:

A continuous memory is allocated with

size = capacity * (sizeof(Bitmap) + sizeof(Keys) + sizeof(Values))

This can be rearranged logically to look something like this :

Each column is called a bucket which holds three things : a bitmap value, a key and a value

Bitmap value of a bucket tells if the key and value in that bucket is valid and initialized or not. If not, then that bucket is called a hole.

_HashedContainerStorageHeader (struct)

This struct acts as header to the memory buffer which will be created. It contains the capacity and count of the allocated storage where capacity is actual capacity of the allocated memory and count is number of valid elements which are held right now in the buffer.

maxLoadFactorInverse is factor used to grow the buffer when it needs to expand.

_NativeDictionaryStorageImpl<Key, Value> (class)

subclass of ManagedBuffer<_HashedContainerStorageHeader, UInt8>

The main responsibility of this class is to allocate the memory needed for the dictionary and give pointers to bitmap, keys and values array for mutating them for eg:

internal var _values: UnsafeMutablePointer<Value> {
  @warn_unused_result
  get {
    let start = UInt(Builtin.ptrtoint_Word(_keys._rawValue)) &+
      UInt(_capacity) &* UInt(strideof(Key.self))
    let alignment = UInt(alignof(Value))
    let alignMask = alignment &- UInt(1)
    return UnsafeMutablePointer<Value>(
        bitPattern:(start &+ alignMask) & ~alignMask)
  }
}

Since Bitmap, keys and values memory is allocated contiguously, values_pointer will be keys_pointer + capacity * keys_pointer.

internal class func create(capacity: Int) -> StorageImpl {
  let requiredCapacity = bytesForBitMap(capacity) + bytesForKeys(capacity)
      + bytesForValues(capacity)

  let r = super.create(requiredCapacity) { _ in
    return _HashedContainerStorageHeader(capacity: capacity)
  }
  let storage = r as! StorageImpl
  let initializedEntries = _BitMap(
      storage: storage._initializedHashtableEntriesBitMapStorage,
      bitCount: capacity)
  initializedEntries.initializeToZero()
  return storage
}

All the entries in bitmap are initalized to zero when buffer is created i.e all buckets are holes. It is also worth noting that the generic type Key of this class doesnt need to follow Hashable to use this storage implementation.

// Note: It is intended that Key, Value
// (without : Hashable) is used here - this storage must work
// with non-Hashable types.

_NativeDictionaryStorage<Key : Hashable, Value> (struct)

This struct keeps the above storage implementation in its buffer property and provides method which converts contiguous allocated memory into logical bucket array. Also note that the end of bucket array is connected to start of the bucket array forming a logical ring. This just means that if youre at end of the bucket array and you call next method, youll end up at begining of the array.

To do operation like insert or get an value using its key, we need to figure out which bucket the key belongs to. Since Key is Hashable, there is a hashValue method on it which will return an Int, but this hashValue can be greater than capacity of the dictionary buffer so it must be squeezed to be within bounds of 0 and capacity :

@warn_unused_result
internal func _bucket(k: Key) -> Int {
  return _squeezeHashValue(k.hashValue, 0..<capacity)
}

We can move between the buckets using _next and _prev methods. Capacity is always a number which can be represented in powers of 2. If were at end of the bucket array well just cycle back to 0th position by calling _next

internal var _bucketMask: Int {
  // The capacity is not negative, therefore subtracting 1 will not overflow.
  return capacity &- 1
}

@warn_unused_result
internal func _next(bucket: Int) -> Int {
  // Bucket is within 0 and capacity. Therefore adding 1 does not overflow.
  return (bucket &+ 1) & _bucketMask
}

@warn_unused_result
internal func _prev(bucket: Int) -> Int {
  // Bucket is not negative. Therefore subtracting 1 does not overflow.
  return (bucket &- 1) & _bucketMask
}

To add an element, we just need to figure out which bucket it belongs to and then call this method which will insert the key and value in the bucket and also set its bitmaps entry to true.

@_transparent
internal func initializeKey(k: Key, value v: Value, at i: Int) {
  _sanityCheck(!isInitializedEntry(i))

  (keys + i).initialize(k)
  (values + i).initialize(v)
  initializedEntries[i] = true
  _fixLifetime(self)
}

_find method :

/// Search for a given key starting from the specified bucket.
///
/// If the key is not present, returns the position where it could be
/// inserted.
@warn_unused_result
internal
func _find(key: Key, _ startBucket: Int) -> (pos: Index, found: Bool) {
  var bucket = startBucket

  // The invariant guarantees there's always a hole, so we just loop
  // until we find one
  while true {
    let isHole = !isInitializedEntry(bucket)
    if isHole {
      return (Index(nativeStorage: self, offset: bucket), false)
    }
    if keyAt(bucket) == key {
      return (Index(nativeStorage: self, offset: bucket), true)
    }
    bucket = _next(bucket)
  }
}

This is why the hashValue of Hashable should be extremely good function for good performance.

func _squeezeHashValue(hashValue: Int, _ resultRange: Range<UInt>) -> UInt {
  let mixedHashValue = UInt(bitPattern: _mixInt(hashValue))
  let resultCardinality: UInt = resultRange.endIndex - resultRange.startIndex
  if _isPowerOf2(resultCardinality) {
    return mixedHashValue & (resultCardinality - 1)
  }
  return resultRange.startIndex + (mixedHashValue % resultCardinality)
}

func _mixUInt64(value: UInt64) -> UInt64 {
  // Similar to hash_4to8_bytes but using a seed instead of length.
  let seed: UInt64 = _HashingDetail.getExecutionSeed()
  let low: UInt64 = value & 0xffff_ffff
  let high: UInt64 = value >> 32
  return _HashingDetail.hash16Bytes(seed &+ (low << 3), high)
}

static func getExecutionSeed() -> UInt64 {
  // FIXME: This needs to be a per-execution seed. This is just a placeholder
  // implementation.
  let seed: UInt64 = 0xff51afd7ed558ccd
  return _HashingDetail.fixedSeedOverride == 0 ? seed : fixedSeedOverride
}

static func hash16Bytes(low: UInt64, _ high: UInt64) -> UInt64 {
  // Murmur-inspired hashing.
  let mul: UInt64 = 0x9ddfea08eb382d69
  var a: UInt64 = (low ^ high) &* mul
  a ^= (a >> 47)
  var b: UInt64 = (high ^ a) &* mul
  b ^= (b >> 47)
  b = b &* mul
  return b
}

This can be summarized in this image :

_NativeDictionaryStorageOwner (class)

This is just becoming awkward now :

_VariantDictionaryStorage<Key : Hashable, Value> (enum)

This enum has two cases with storage for Native/Cocoa dictionary as stored properties inside those cases.

case Native(_NativeDictionaryStorageOwner<Key, Value>)
case Cocoa(_CocoaDictionaryStorage)

Major features :

  internal mutating func nativeUpdateValue(
    value: Value, forKey key: Key
  ) -> Value? {
    var (i, found) = native._find(key, native._bucket(key))
    
    let minCapacity = found
      ? native.capacity
      : NativeStorage.getMinCapacity(
          native.count + 1,
          native.maxLoadFactorInverse)

    let (_, capacityChanged) = ensureUniqueNativeStorage(minCapacity)
    if capacityChanged {
      i = native._find(key, native._bucket(key)).pos
    }

    let oldValue: Value? = found ? native.valueAt(i.offset) : nil
    if found {
      native.setKey(key, value: value, at: i.offset)
    } else {
      native.initializeKey(key, value: value, at: i.offset)
      native.count += 1
    }

    return oldValue
  }
/// - parameter idealBucket: The ideal bucket for the element being deleted.
/// - parameter offset: The offset of the element that will be deleted.
/// Requires an initialized entry at offset.
internal mutating func nativeDeleteImpl(
  nativeStorage: NativeStorage, idealBucket: Int, offset: Int
) {
  _sanityCheck(
      nativeStorage.isInitializedEntry(offset), "expected initialized entry")

  // remove the element
  nativeStorage.destroyEntryAt(offset)
  nativeStorage.count -= 1

  // If we've put a hole in a chain of contiguous elements, some
  // element after the hole may belong where the new hole is.
  var hole = offset

  // Find the first bucket in the contiguous chain
  var start = idealBucket
  while nativeStorage.isInitializedEntry(nativeStorage._prev(start)) {
    start = nativeStorage._prev(start)
  }

  // Find the last bucket in the contiguous chain
  var lastInChain = hole
  var b = nativeStorage._next(lastInChain)
  while nativeStorage.isInitializedEntry(b) {
    lastInChain = b
    b = nativeStorage._next(b)
  }

  // Relocate out-of-place elements in the chain, repeating until
  // none are found.
  while hole != lastInChain {
    // Walk backwards from the end of the chain looking for
    // something out-of-place.
    var b = lastInChain
    while b != hole {
      let idealBucket = nativeStorage._bucket(nativeStorage.keyAt(b))

      // Does this element belong between start and hole?  We need
      // two separate tests depending on whether [start,hole] wraps
      // around the end of the buffer
      let c0 = idealBucket >= start
      let c1 = idealBucket <= hole
      if start <= hole ? (c0 && c1) : (c0 || c1) {
        break // Found it
      }
      b = nativeStorage._prev(b)
    }

    if b == hole { // No out-of-place elements found; we're done adjusting
      break
    }

    // Move the found element into the hole
    nativeStorage.moveInitializeFrom(nativeStorage, at: b, toEntryAt: hole)
    hole = b
  }
}

And finally the dictionary looks like this

(just one single property of the above enum)

Continue reading on ankit.im