Skip to content

Latest commit

 

History

History
681 lines (506 loc) · 23 KB

File metadata and controls

681 lines (506 loc) · 23 KB

Map

A Map is a data structure where a key is mapped to a value. It’s used for a fast lookup of values based on the given key. Only one key can map to a value (no key duplicates are possible).

Note
Map has many terms depending on the programming language. Here are some other names: Hash Map, Hash Table, Associative Array, Unordered Map, Dictionary.

Map Applications

Maps are one of the most popular data structures because of their fast lookup time.

Holding key/value pairs have many applications like:
  • Caching: keys are URLs, and values are website content.

  • Indexing: keys are words, and values are the list of documents where they are found.

  • Spell checking: keys are English words.

  • Networks: the key is an IP address/port number, while the value is the corresponding application.

There are many other use cases. We will explore some techniques that we can use to speed up your code with it. But first, let’s get the fundamentals out of the way.

Map vs Array

A map shares some similarities with an array. In an array, the key/index is always a number, while the value in a Map can be anything!

Both an Array and Map are very fast for getting values by key in constant time O(1) on average.

A Map uses an array internally. It translates the key into an array’s index using a hash function. That’s why it is also called "Hash Map" or "Hash Table".

Map vs Objects

JavaScript has two ways to use Maps: one uses objects ({}), and the other is using the built-in Map.

Using Objects as a HashMap.
const objMap = {};
// mapping values to keys
objMap['str'] = 'foo'; // string as key
objMap[1] = 'bar'; // number as key
objMap[{}] = 'test1'; // object as key (not recommended)
const obj1 = {};
objMap[obj1] = 'test2'; // object as key (not recommended)

// searching values by key
console.log(objMap[1]); //↪️ bar
console.log(objMap['str']); //↪️ foo
console.log(objMap[{}]); //↪️ test2 👀
console.log(objMap[obj1]); //↪️ test2 👀

console.log(objMap); // {1: "bar", str: "foo", [object Object]: "test"}

Notice that the objMap[{}] and objMap[obj1] return the same value! They both were converted to [object Object] as a key.

Let’s now use the built-in Map

JavaScript Built-in Map Usage
const myMap = new Map();
// mapping values to keys
myMap.set('str', 'foo'); // string as key
myMap.set(1, 'bar'); // number as key
myMap.set({}, 'test1'); // object as key
const obj1 = {};
myMap.set(obj1, 'test2');

// searching values by key
console.log(myMap.get(1)); //↪️ bar
console.log(myMap.get('str')); //↪️ foo
console.log(myMap.get({})); //↪️ undefined 👀
console.log(myMap.get(obj1)); //↪️ test2

console.log(myMap);
// Map(4) {"str" => "foo", 1 => "bar", {…} => "test1", {…} => "test2"}

As you can see, Map handled other objects as a key much better.

Objects are one of the oldest data structures in JavaScript. Maps were introduced as part of the ES2015 enhancements to solve the shortcomings of using Object as a Hashmap. Having two methods can be confusing. We are going to make it clear when to use one or the other.

Map vs. Object main differences:
  • Object's keys should be strings, numbers, or symbols. Map's keys can be anything! Strings, numbers, symbols, arrays, objects, and even other maps!

  • Objects are not guaranteed to be in insertion order. Maps guarantee insertion order.

  • When using Objects as HashMaps, they might be polluted with other keys defined at the prototype chain. You need to use hasOwnProperty or Object.keys to avoid these issues. Maps doesn’t get polluted by the prototype chain.

  • Maps has been optimized for cases of frequent additions and removals. Objects are not optimized.

When do you use an Object over a Map then? When you need to use JSON since it doesn’t support Maps yet.

You can convert from Map to Object and vice-versa:

const objMap = Object.fromEntries(myMap.entries());   // map -> obj
const map = new Map(objMap.entries());                // obj -> map

For completeness, here are some of the most basic operations side-by-side.

Object vs Map Side-by-Side
//
// Initialization
//
const obj1 = {};                            // Empty
const obj2 = { adrian: 33, nathalie: 32 };  // w/values

const map1 = new Map();                                   // Empty
const map2 = new Map([['adrian', 33], ['nathalie', 32]]); // w/values

//
// Access
//
assert.equal(obj1.adrian, undefined);
assert.equal(obj2['adrian'], 33); // also "obj2.adrian"

assert.equal(map1.get('adrian'), undefined);
assert.equal(map2.get('adrian'), 33);

//
// Check if the key exists
//
assert.equal(obj1.adrian !== undefined, false);
assert.equal(obj2['adrian'] !== undefined, true);

assert.equal(map1.has('adrian'), false);
assert.equal(map2.has('adrian'), true);

//
// Adding new elements
//
obj2['Abi'] = 2;
obj2['Dudu'] = 2;

map2.set('Abi', 2).set('Dudu', 2);

//
// Deleting
//
delete obj2.Dudu;

map2.delete('Dudu');

//
// Iterating key/value pairs with for loops
//
for (var k in obj2){
   console.log(`key: ${k}, value: ${obj2[k]}`);
}

for (const [k, v] of map2){
  console.log(`key: ${k}, value: ${v}`);
}

//
// Iterating key/value pairs with
//
Object.keys(obj2)
  .forEach(k => console.log(`key: ${k}, value: ${obj2[k]}`));

map2
  .forEach((v, k) => console.log(`key: ${k}, value: ${v}`));

//
// Getting the size
//
assert.equal(Object.keys(obj2).length, 3);
assert.equal(map2.size, 3);

//
// Representation
//
console.log(obj2);
// { adrian: 33, nathalie: 32, Abi: 2 }
console.log(map2);
// Map { 'adrian' => 33, 'nathalie' => 32, 'Abi' => 2 }

From this point on, we will use built-in Maps (and not objects).

Key by Reference vs. by Value

There’s a catch when you use objects/arrays/classes as keys on a Map. JavaScript will match the key only if it has the same reference in memory.

Look at the following example:

Array as a Map’s key
const map = new Map();

map.set([1, 2, 3], 'value');
console.log(map.get([1, 2, 3])); // undefined 👀

Trying to access a Map’s value using a complex type is a common gotcha. If you want array as a key to work, you need to hold on to a reference, like the following example.

Array reference as a Map’s key
const map = new Map();
const arr = [1, 2, 3];

map.set(arr, 'value');
console.log(map.get(arr)); // 'value'

The same applies to any key that is not a number, string, or symbol.

Map Inner Workings
There are two popular ways to implement Maps, key/value pair data structures:
  • Array + Hash Function: Hash Map

  • Balanced Binary Search Tree: TreeMap.

In this chapter, we will focus on the Hash Map implementation, which is the one that JavaScript has built-in. In the next parts, we will cover TreeMap.

A map uses an array to store the values and a hash function that translate the key into an array index behind the scenes.

Let’s say we have the following key/value pairs.

const map = new Map();

map.set('cat', 2);
map.set('dog', 1);
map.set('rat', 7);
map.set('art', 8);
There are multiple algorithms for hashing keys. One of them is using modulo division:
  1. Convert the key into a number (a.k.a hash code or pre-hashing).

  2. Convert the number from step 1 into an array index using modulo. (hashCode % arrayLength).

image
Figure 1. Internal HashMap representation

No hash function is perfect, so it will map two different keys to the same value for some cases. That’s what we called a collision. When that happens, we chain the results on the same bucket. If we have too many collisions, it could degrade the lookup time from O(1) to O(n).

The Map doubles the size of its internal array to minimize collisions when it reaches a certain threshold. This restructuring is called a rehash. This rehash operation takes O(n), since we have to visit every old key/value pair and remap it to the new internal array. Rehash doesn’t happen very often, so statistically speaking, Maps can insert/read/search in constant time O(1).

Note
collisions and rehashes are handled automatically. But it’s still good to know the trade-offs. We will go into more detail when we compare it with TreeMaps.
HashMap time complexity

Hash Map is optimal for searching values by key in constant time O(1). However, searching by value is not any better than an array since we have to visit every value O(n).

Table 1. Time complexity for a Hash Map

Data Structure

Searching By

Insert

Delete

Space Complexity

Index/Key

Value

Hash Map

O(1)

O(n)

O(1)*

O(1)

O(n)

* = Amortized run time. E.g. rehashing might affect run time.

As you can notice, we have amortized times since it will take O(n) while it resizes in the unfortunate case of a rehash. After that, it will be O(1).

HashMap Patterns for Solving Interview Questions

HashMaps are one of the most versatile data structures. You can speed up many programs by using them correctly. In this section, we are going to explore some common patterns.

Smart Caching

One everyday use case for key/value data structures is caching. If you are working on a trendy website, you can save scale better if you cache the results instead of hitting the database and other expensive services every time. That way, you can server many more users requesting the same data.

A common issue with cache you want to expire data you don’t often use to make room for hot data. This next exercise is going to help you do that.

HM-A) Design a Least Recently Used (LRU) cache. This cache has a limit on the number of items it can store. Once the limit it’s reached, it will discard the least recently used item. Design a class that takes a limit value, and the methods put and get, to insert and get data.

Signature
/**
 * Least Recently Used (LRU) cache.
 * Key/Value storage with fixed max number of items.
 * Least recently used items are discarded once the limit is reached.
 * Reading and updating the values mark the items as recently used.
 */
class LRUCache {
  /**
   * @param {number} capacity - The max number of items on the cache
   */
  constructor(capacity) {

  }

  /**
   * Get the value associated with the key. Mark keys as recently used.
   * @param {number} key
   * @returns {number} value or if not found -1
   */
  get(key: number): number {

  }

  /**
   * Upsert key/value pair. Updates mark keys are recently used.
   * @param {number} key
   * @param {number} value
   * @returns {void}
   */
  put(key, value) {

  }
}
Examples:
const c = new LRUCache(2); // capacity: 2
c.put(2, 1); // Cache is [2:1]
c.put(1, 1); // Cache is [2:1, 1:1]
c.put(2, 3); // Cache is [1:1, 2:3]
c.put(4, 1); // Removed 1. Cache is [2:3, 4:1]
c.get(1); // Returns -1 (key 1 not found)
c.get(2); // Returns 3. Cache is [4:1, 2:3]
c.put(5, 5); // Removed key 4. Cache is [2:3, 5:5]
c.get(4); // Returns -1 (key 4 not found)
Tip
Try it on your own before reading the solution on the next page!

Solution

The LRU cache behavior is almost identical to the Map.

The differences are:
  • LRU cache has a limited size, while Map grows until you run out of memory.

  • LRU cache removes the least used items once the limit is reached.

We can extend the Map functionality. Also, the Map implementation on JavaScript already keeps the items by insertion order. Every time we read or update a value, we can remove it from where it was and add it back. That way, the oldest (least used) it’s the first element on the Map.

Solution: extending Map
class LRUCache extends Map {
  constructor(capacity) {
    super();
    this.capacity = capacity;
  }

  get(key) {
    if (!super.has(key)) return -1;
    const value = super.get(key);
    this.put(key, value); // re-insert at the top (most recent).
    return value;
  }

  put(key, value) {
    if (super.has(key)) super.delete(key);
    super.set(key, value);
    if (super.size > this.capacity) {
      const oldestKey = super.keys().next().value;
      super.delete(oldestKey);
    }
  }
}

Notice that we call put within get. This is to rotate the keys to the top (most recent place).

Complexity Analysis
  • Time Complexity: O(1). All operations read, write, update, and delete takes O(1).

  • Space complexity: O(k). In this case, k, is the capacity of the cache. Even if n has 1 million items, we would only hold to the k most recent ones.

Trading Speed for Space

Maps have a O(1) runtime for lookups and O(n) space complexity. It can improve the speed of programs in exchange for using a little bit more of memory. Let’s do an example.

Let’s say you are working on a webcrawler, and for each site, you want to find out the most common words used. How would you do it?

HM-B) Given a text, return the most common words in descending order. You should sanitize the input by removing punctuation !?',;. and converting all letters to lowercase. Return the most common words in descending order.

Signature
/**
 * Given text and banned words,
 * return the most common words in descending order.
 * @param {string} text - The text to parse.
 * @param {number} n - The number of results.
 * @return {string[]}
 */
function mostCommonWords(text, n = 1) {
  // you code goes here
}
Examples:
mostCommonWords(
  'The map, maps keys to values; Keys can be anything.',
  1); // ['keys']
mostCommonWords(
  'Look at it! What is it? It does look like my code from 1 year ago',
  2); // ['it', 'look']
mostCommonWords(
  'a; a,b, a\'s c A!; b,B,    c.',
  4); // ['a', 'b', 'c', 's']
Tip
Try it on your own before reading the solution on the next page!

Solutions

This is a problem that has multiple steps:
  1. Split the text into lowercased words and remove whitespaces and punctuation.

  2. Count the frequency of words.

  3. Sort words by frequency and return the top n words.

Possible implementations for each of the steps:
  1. We can use regex (regular expressions) and split on non-words \W+. The runtime of this will be O(n).

  2. Let’s discuss this later.

  3. We have an array of the word → frequency pairs. We can sort by the frequency using the built-in sort function and return the subarray with the top n results. The time complexity would be O(n log n).

For step 2, we can do it in multiple ways. A brute force solution is using 2 for loops to count the number of times each word appear:

Solution 1: Brute Force
link:../../interview-questions/most-common-words-ii.js[role=include]

Notice that we null out the counted words. That’s to avoid counting the phrase more than once.

Complexity Analysis:
  • Time complexity: O(n^2). We have three steps and each one has its time complexity: O(n) + O(n^2) + O(n log n). Remember that with Big O notation, we only care about the term with the highest order: n^2.

  • Space complexity: O(n). We use multiple O(n) auxiliary spaces for these variables: words, entries, and the solution is also n space.

Another alternative is to use a Map to count.

Solution 2: Map counter
link:../../interview-questions/most-common-words-ii.js[role=include]

With this solution, we iterate over the words only once. We first get the current count and add one. If the word didn’t exist, we would default to a count of 0. Steps 1 and 3 are almost identical to solution #1.

Complexity Analysis
  • Time Complexity: O(n log n). We have 3 steps: O(n) + O(n) + O(n log n). The most significant term is n log n.

  • Space complexity: O(n). We used the same O(n) auxiliary space as solution #1 for words, entries, and the solution. Additionally, we added one more O(n) space for the Map.

Sliding Window

We saw sliding windows with arrays before. We are now going to use them with strings. The idea is very similar, we still use the two pointers, and the solution is the "window" between the pointers. We can increase or decrease the window as long as it keeps the constraints of the problem. Let’s do an example for better understanding.

HM-C) Return the length of the longest substring without repeating characters.

Signature
/**
 * Return the length of the longest substring without repeating characters.
 * @param {string} s
 * @return {number}
 */
function longestSubstring(s) {
  // your code goes here!
};
Examples
longestSubstring('abcdaefg'); // 7 ('bcdaefg')
longestSubstring('abbaa'); // 2 ('ab')
longestSubstring('abbadvdf') // 4 ('badv')
Tip
Try it on your own before reading the solution on the next page!

Solutions

We are going to solve this problem by using a sliding window approach. We have two pointers called lo and hi. They start both at zero, and we increase the window as long as they don’t have duplicates. If we found a duplicate, we reopen a new window past the duplicated value.

Take a look at this illustration doing an example for the string abbadvdf:

sliding window for abbadvdf

As you can see, we calculate the length of the string on each iteration and keep track of the maximum value.

What would this look like in code? Let’s try a couple of solutions. Let’s go first with the brute force and then how we can improve it.

We can have two pointers, lo and hi, to define a window. We can use two for-loops for that. Later, within lo to hi window, we want to know if there’s a duplicate value. A simple and naive approach is to use another two for-loops to check for duplicates (4 nested for-loop)! We need labeled breaks to skip updating the max if there’s a duplicate.

Warning
The following code can hurt your eyes. Don’t try this in production; for better solutions, keep reading.
Solution 1: Super Brute Force
/**
 * Return the length of the longest substring without repeating characters.
 * @param {string} s
 * @return {number}
 */
function longestSubstring(s) {
  let max = 0;

  for (let lo = 0; lo < s.length; lo++) {
    repeatedFound:
    for (let hi = lo; hi < s.length; hi++) {
      // check if it's unique withing [lo,hi] range
      for (let i = lo; i < hi; i++) {
        for (let j = lo + 1; j <= hi; j++) {
          if (i !== j && s[i] === s[j]) break repeatedFound;
        }
      }
      // all are unique between [lo,hi] range
      max = Math.max(max, hi - lo + 1);
    }
  }

  return max;
};

This function gets the job done. But how efficient is it?

Complexity Analysis
  • Time Complexity: O(n^4). In the worst-case, when the string has all unique characters, we have n^4!

  • Space complexity: O(1). The only variable we are using is integers.

Solution 1 has a horrible runtime, but the space complexity is constant. Can we trade space for a better speed performance? Absolutely!

Instead of having two loops for finding duplicates, we can do one pass and use a Map to detect duplicates.

Solution 2: Brute force with Map
/**
 * Return the length of the longest substring without repeating characters.
 * @param {string} s
 * @return {number}
 */
function longestSubstring(s) {
  let max = 0;

  for (let lo = 0; lo < s.length; lo++) {
    repeatedFound:
    for (let hi = lo; hi < s.length; hi++) {
      // check if it's unique withing [lo,hi] range
      const map = new Map();
      for (let i = lo; i <= hi; i++) {
        if (map.has(s[i])) break repeatedFound;
        map.set(s[i], true);
      }
      // all are unique between [lo,hi] range
      max = Math.max(max, hi - lo + 1);
    }
  }

  return max;
}

We are using the Map to detect duplicates, where the characters are the keys.

Complexity Analysis
  • Time Complexity: O(n^3). We have three nested loops that, in the worst-case, each will visit n items.

  • Space complexity: O(n). We have a map that might grow as big as size n.

One optimization that we can do the solution 2 is to store the index where we last saw a character. We can map each character to its index. That way, when we find a duplicate, we can update the lo pointer with it, shrinking the window.

Solution 3: Optimized Sliding Window
/**
 * Return the length of the longest substring without repeating characters.
 * @param {string} s
 * @return {number}
 */
function longestSubstring(s) {
  const map = new Map();
  let max = 0;

  for (let hi = 0, lo = 0; hi < s.length; hi++) {
    if (map.has(s[hi])) lo = Math.max(lo, map.get(s[hi]) + 1);
    map.set(s[hi], hi);
    max = Math.max(max, hi - lo + 1);
  }

  return max;
};

This solution has the least amount of code, and it’s also the most efficient!

Something that might look unnecessary is the Math.max when updating the lo pointer. You can remove it and try running it with the string "abba", what would happen?

Complexity Analysis
  • Time Complexity: O(n). We do one pass and visit each character once.

  • Space complexity: O(n). We store everything on the Map so that the max size would be n.

Practice Questions

Fit two movies in a flight

HM-1) You are working in an entertainment recommendation system for an airline. Given a flight duration (target) and an array of movies length, you need to recommend two movies that fit exactly the length of the flight. Return an array with the indices of the two numbers that add up to the target. No duplicates are allowed. If it’s not possible to return empty [].

Common in interviews at: FAANG.

Examples:

twoSum([113, 248, 80, 200, 91, 201, 68], 316); // [1, 6] (248 + 68 = 316)
twoSum([150, 100, 200], 300); // [2, 3] (100 + 200 = 300)
twoSum([150, 100, 200], 150); // [] (No two numbers add up to 150)

Starter code:

link:../../interview-questions/two-sum.js[role=include]
Subarray Sum that Equals K

HM-2) Given an array of integers, find all the possible subarrays to add up to k. Return the count.

Common in interviews at: FAANG

Examples:

subarraySum([1], 1); // 1 (1 equals to 1 :)
subarraySum([1, 1, 1], 1); // 3 ([1], [1], [1] equals 1)
subarraySum([1, -1, 1], 0); // 2 (sum([1, -1]), sum([-1, 1]) equals 0)
subaraySum([1, 2, 3, 0, 1, 4, 0, 5], 5) // 8
// All of these 8 sub arrays add up to 5:
// [2, 30], [2,3,0], [0,1,4], [0,1,4,0], [1,4], [1,4,0], [0,5], [5]

Starter code:

link:../../interview-questions/subarray-sum-equals-k.js[role=include]