Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide consensus of naming "cardinality" constants. #221

Open
Tracked by #220
luckytyphlosion opened this issue Aug 18, 2023 · 31 comments
Open
Tracked by #220

Decide consensus of naming "cardinality" constants. #221

luckytyphlosion opened this issue Aug 18, 2023 · 31 comments

Comments

@luckytyphlosion
Copy link
Member

Cardinality is just number of elements for a set. So it could refer to the total length of an array, the number of values an enum could have, and other stuff.

For example:

u16 GetTotalMulchQuantity(Bag *bag, HeapID heapId) {
    s32 i;
    u16 total;
    for (total = 0, i = 0; i < NUM_MULCHES; i++) {
        total += Bag_GetQuantity(bag, ITEM_GROWTH_MULCH + i, heapId);
    }
    return total;
}

NUM_ to refer to the number of elements in a set is weird, because you could also have numMulches which stores the current amount of mulches. TODO investigate what naming we want.

We could say TOTAL_MULCHES, but in the above example, total is used to determine a dynamic cardinality.

It's probably good to determine what terminology should be used.

@adrienntindall
Copy link
Collaborator

Im ok with a _MAX suffix

@luckytyphlosion
Copy link
Member Author

_MAX is good, I didn't think of that one

@luckytyphlosion
Copy link
Member Author

the spirit of pikalax — Today at 8:15 PM
NUM_MULCH_KINDS
solved

@red031000
Copy link
Member

this is already not consistent in the code we have, but imo MULCH_MAX is the way to go here

@luckytyphlosion
Copy link
Member Author

I think MAX is the best. What makes stuff like total, num, and count ambiguous is that they can refer to both a noun and adjective. But max is much harder to interpret as a noun. e.g. "the max of mulches" doesn't make much sense, whereas "the total of mulches" does, leading to ambiguity. I'll leave this up a bit longer, but I think MAX is the best option.

The issue still remains unresolved for set-like structures, such as arrays. What terminology should we use to describe the cardinality of an array? Some options are length, size, and count.

@red031000
Copy link
Member

red031000 commented Aug 18, 2023

length imo, considering stuff like player name

sounds a lot better being PLAYER_NAME_LENGTH etc

@AnonymousRandomPerson
Copy link
Contributor

Between my day job and hobbies in Java and C#, they can't even agree and I have to use all three of length, size, and count. I don't have any preferences so any of those is fine.

@luckytyphlosion
Copy link
Member Author

length imo, considering stuff like player name

sounds a lot better being PLAYER_NAME_LENGTH etc

Fyi, this can be either LEN or LENGTH.

@red031000
Copy link
Member

LENGTH better imo, really dont like abbreviations when there's no reason to

@luckytyphlosion
Copy link
Member Author

Fyi, we should probably clarify between using Num and Total.

Total should be used when aggregating together a quantity-like value, so we say GetTotalMulchQuantity, not GetNumMulchQuantity
Num should be used when calculating a value involving the count of something, so we say GetNumVowelsInString, not GetTotalVowelsInString

@PikalaxALT
Copy link
Collaborator

I'd also debate the use of Get here. Get implies that the total is precomputing and you're just grabbing it from cache. Calc or Compute indicates a more complex procedure, even if it is just sum(x.quantity for x in mulches).

@luckytyphlosion
Copy link
Member Author

Get is another issue entirely that I didn't want to open just yet because of how intricate it is.

@AsparagusEduardo
Copy link
Contributor

AsparagusEduardo commented Aug 19, 2023

Gen 3 had the _COUNT suffix, so maybe that could work here?

@tgsm
Copy link
Collaborator

tgsm commented Aug 19, 2023

NUM_ to refer to the number of elements in a set is weird, because you could also have numMulches which stores the current amount of mulches.

I personally prefer NUM_XXX or TOTAL_XXX_KINDS. As for situations where we would have numXxx we could work around this with different terminology like kindsOfXxxs or xxxQuantity.

@luckytyphlosion
Copy link
Member Author

Gen 3 had the _COUNT suffix, so maybe that could work here?

Count runs into issues as you can use it to refer to both a calculated value and a cardinality. E.g. for the given shared code:

// take a guess what this array is
extern const u8 gVowels[];

#define VOWEL_A 'A'
#define VOWEL_E 'E'
#define VOWEL_I 'I'
#define VOWEL_O 'O'
#define VOWEL_U 'U'

If we use the COUNT to describe the cardinality of vowels, you get ambiguity between the two terms (count as the cardinality of vowels vs count as the calculated number of vowels in the string):

#define VOWEL_COUNT 5

u32 String_GetVowelCount(const u8 * data, u32 len) {
	u32 count = 0;
	int i, j;

	for (i = 0; i < len; i++) {
		u8 curChar = data[i];
		for (j = 0; j < VOWEL_COUNT; j++) {
			if (curChar == gVowels[j]) {
				count++;
				break;
			}
		}
	}

	return count;
}

But if we use MAX to describe the cardinality of vowels, we can avoid having two concepts share the same name:

#define MAX_VOWELS 5

u32 String_GetVowelCount(const u8 * data, u32 len) {
	u32 count = 0;
	int i, j;

	for (i = 0; i < len; i++) {
		u8 curChar = data[i];
		for (j = 0; j < MAX_VOWELS; j++) {
			if (curChar == gVowels[j]) {
				count++;
				break;
			}
		}
	}

	return count;
}

@AsparagusEduardo
Copy link
Contributor

Gotcha, I understand now.

@luckytyphlosion
Copy link
Member Author

This might be a problem though. AsparagusEduardo mentioned that in the gen 3 repos, MAX refers to a hard limit of the maximum cardinality a set could have, while NUM refers to the current cardinality of the set. This happens in the gen 4 repos as well.

#define NUM_APRICORN_TREE 31
#define MAX_APRICORN_TREE 128

typedef struct SAVE_MISC_DATA {
    APRICORN_TREE apricorn_trees[MAX_APRICORN_TREE];
	// etc...
};

static const u8 sTreeApricorns[NUM_APRICORN_TREE] = {
    APRICORN_BLK, // Route 1
    APRICORN_PNK, // Route 2 North
    APRICORN_YLW, // Route 8
	// etc...
};

void ApricornTrees_Init(APRICORN_TREE *trees) {
    int i;
    MI_CpuClear8(trees, sizeof(APRICORN_TREE) * MAX_APRICORN_TREE);
    for (i = 0; i < MAX_APRICORN_TREE; i++) {
        trees[i].unk_0 = 0;
    }
}

// ApricornTrees_IterateOverAndDoSomething
void sub_0202AE0C(APRICORN_TREE *trees) {
    int i;
    for (i = 0; i < MAX_APRICORN_TREE || i < NUM_APRICORN_TREE; i++) {
        sub_0202AE30(&trees[i]);
        trees[i].unk_0 = 1;
        trees[i].unk_1 = 1;
    }
}

Both MAX_APRICORN_TREE and NUM_APRICORN_TREE are needed. NUM_APRICORN_TREE refers to the cardinality of the defined apricorn trees, while MAX_APRICORN_TREE refers to the max possible allowed apricorn trees, as restricted by the save data. If we choose to use MAX for set cardinality, we need to determine what to use for the "hardware limit" cardinality.

@luckytyphlosion
Copy link
Member Author

One possibility: say MAX_APRICORN_TREES and MAX_SAVE_APRICORN_TREES

@luckytyphlosion
Copy link
Member Author

We could also say MAX for hardware limit, NUM for defined limit, and count for quantity. So when standardizing the below function:

BOOL Save_NumModifiedPCBoxesIsTooMany(SaveData *saveData) {
    return Save_CalcNumModifiedPCBoxes(saveData) >= 6;
}

We would rename the relevant terms to:

// actually, Save_HasTooManyModifiedPCBoxes is probably a better name here
BOOL Save_ModifiedPCBoxesCountIsTooMany(SaveData *saveData) {
    return Save_CalcModifiedPCBoxesCount(saveData) >= 6;
}

@luckytyphlosion
Copy link
Member Author

Fyi, count might have issues because it can be used as a verb?

static u32 Save_CalcNumModifiedPCBoxes(SaveData *saveData) {
    return PCModifiedFlags_CountModifiedBoxes(Save_CalcPCBoxModifiedFlags(saveData));
}

@luckytyphlosion
Copy link
Member Author

Idea: use LIMIT for "hardware" limit

// take a guess what this array is
extern const u8 gPokeBallItems[];

#define MAX_GROUND_ITEMS 10
#define LIMIT_GROUND_ITEMS 20

typedef struct GroundItem {
    u16 item;
    s16 x;
    s16 y;
    u16 quantity;
} GroundItem;

typedef struct GroundItems {
    GroundItem data[LIMIT_GROUND_ITEMS];
} GroundItems;

const u16 sGroundItemIds[MAX_GROUND_ITEMS] = {
    ITEM_POKE_BALL,
    ITEM_POTION,
    // etc...
};

extern GroundItems gGroundItems;

void GroundItems_Init(void) {
    int i;
    MI_CpuClear8(gGroundItems, sizeof(GroundItem) * LIMIT_GROUND_ITEMS);
    for (i = 0; i < LIMIT_GROUND_ITEMS && i < MAX_GROUND_ITEMS; i++) {
        gGroundItems.data[i].item = sGroundItemIds[i];
        gGroundItems.data[i].x = Random();
        gGroundItems.data[i].y = Random();
        gGroundItems.data[i].quantity = Random();
    }
}

// or GroundItems_GetNumOfSpecificItem
u32 GroundItems_FindNumInstancesOfItem(u16 item) {
    u32 numInstances;
    int i;

    for (i = 0; i < LIMIT_GROUND_ITEMS && i < MAX_GROUND_ITEMS; i++) {
        if (gGroundItems.data[i].item == item) {
            numInstances++;
        }
    }

    return numInstances;
}
        
u32 GroundItems_CalcTotalQuantityOfItem(u16 item) {
    u32 totalQuantity;
    int i;

    for (i = 0; i < LIMIT_GROUND_ITEMS && i < MAX_GROUND_ITEMS; i++) {
        if (gGroundItems.data[i].item == item) {
            totalQuantity += gGroundItems.data[i].quantity;
        }
    }

    return totalQuantity;
}

u32 GroundItems_GetNumGroundItems(void) {
    u32 count;
    int i;

    for (i = 0; i < LIMIT_GROUND_ITEMS && i < MAX_GROUND_ITEMS; i++) {
        if (gGroundItems.data[i].quantity != 0) {
            count++;
        }
    }
    return count;
}

@red031000
Copy link
Member

I don't mind NUM_MULCHES and MULCHES_MAX for the hardware limit

@luckytyphlosion
Copy link
Member Author

The only issue with NUM is that it can lead to two concepts sharing the same name, e.g.

u32 GroundItems_GetNumGroundItems(void) {
    u32 count;
    int i;

    for (i = 0; i < MAX_GROUND_ITEMS && i < NUM_GROUND_ITEMS; i++) {
        if (gGroundItems.data[i].quantity != 0) {
            count++;
        }
    }
    return count;
}

@luckytyphlosion
Copy link
Member Author

So it seems like we have two choices when referring to set cardinality. Refer to my above code samples when I mention specific functions.

Choice 1

NUM for the defined limit and MAX for the hardware limit.

Advantages:

  • More consistent and familiar with what gen 3 and gen 4 repos currently have

Disadvantages:

  • Functions which involve getting a count would have naming restrictions. Either we aren't allowed to refer to Num in the function at all, or we do allow Num and we deal with two concepts sharing the same name.
    • Not allowed/ambiguity: GroundItems_FindNumInstancesOfItem. Allowed/distinct: GroundItems_CountItemInstances or GroundItems_GetItemInstancesCount.
    • Not allowed/ambiguity: GroundItems_GetNumGroundItems. Allowed/distinct: GroundItems_CountGroundItems or GroundItems_GetNumGroundItems

Choice 2

MAX for the defined limit and LIMIT for the hardware limit.

Advantages:

  • Less instances of two concepts sharing the same name.
  • As a byproduct of the above point, allows us to refer to a count in functions as either a Count or a Num, which increases flexibility in how we name functions:
    • All three allowed: GroundItems_FindNumInstancesOfItem, GroundItems_CountItemInstances, GroundItems_GetItemInstancesCount.
    • All three allowed: GroundItems_GetNumGroundItems, GroundItems_CountGroundItems, GroundItems_GetNumGroundItems

Disadvantages:

  • Slightly clunky naming for constants: MAX_GROUND_ITEMS reads from left to right, but LIMIT_GROUND_ITEMS does not. And opting to name the limit constant as GROUND_ITEMS_LIMIT leads to naming asymmetry, which could be confusing, as in the below example.
#define MAX_GROUND_ITEMS 10
#define GROUND_ITEMS_LIMIT 20

An alternative

It's possible that the instances where the hardware limit needs to be a constant are so far and few that we can get away with needing to create a name for it. For example, in the apricorn example, we can call the cardinality referring to the fixed data as MAX_APRICORN_TREES, and the cardinality referring to the save data as MAX_SAVE_APRICORN_TREES. The same can be done for the fictional GroundItems example, i.e. MAX_GROUND_ITEMS and MAX_SAVE_GROUND_ITEMS respectively. Whether opting for this approach is better depends on the possible instances where a hardware limit constant needs to exist, as this SAVE naming scheme might not work for other instances we haven't seen yet.

@luckytyphlosion
Copy link
Member Author

Compilation of symbols containing the phrase "count" or "num" (case insensitive without word boundary): https://pastebin.com/4DGN38mU

@luckytyphlosion
Copy link
Member Author

luckytyphlosion commented Aug 21, 2023

Worth mentioning: use LENGTH for arrays, not NUM. Still some possible conflicts (see CountNumUniqueBagItems)

#define BAG_LENGTH 20
// BAD
// #define NUM_BAG_ITEMS 20
typedef struct Gen1ItemBagSlot {
	u8 itemId;
	u8 quantity;
} Gen1ItemBagSlot;

typedef struct Gen1ItemBag {
	u8 numBagItems;
	Gen1ItemBagSlot bagSlots[BAG_LENGTH];
} Gen1ItemBag;

u8 GetNumBagItems(Gen1ItemBag * bag) {
	return bag->numBagItems;
}

u8 CountNumPokeBallItems(Gen1ItemBag * bag) {
	u32 count;
	int i;

	for (i = 0; i < bag->numBagItems; i++) {
		if (IsPokeBallItem(bag->bagSlots[i].itemId)) {
			count++;
		}
	}
}

u8 SumBagItemsQuantity(Gen1ItemBag * bag) {
	u32 totalQuantity;
	int i;

	for (i = 0; i < bag->numBagItems; i++) {
		totalQuantity += bag->bagSlots[i].quantity;
	}

	return totalQuantity;
}

u8 CountNumUniqueBagItems(Gen1ItemBag * bag) {
	// almost becomes a problem? but NUM_ITEMS is different than numBagItems
	// still might be confusing
	bool8 uniqueBagItemsTracker[NUM_ITEMS] = {0};
	int i;
	u32 numUniqueBagItems = 0;

	for (i = 0; i < bag->numBagItems; i++) {
		// pretend Game Freak is bad and didn't write the code like this
		// bool8 bagItemFoundAlready = uniqueBagItemsTracker[bag->bagSlots[i].itemId];
		// if (!bagItemFoundAlready) {
		//	numUniqueBagItems++;
			uniqueBagItemsTracker[bag->bagSlots[i].itemId] = TRUE;
		//}
	}

	for (i = 0; i < NUM_ITEMS; i++) {
		if (uniqueBagItemsTracker[i]) {
			numUniqueBagItems++;
		}
	}
	return numUniqueBagItems;
}

Maybe you can name numBagItems something else (bagItemsCount?)

@luckytyphlosion
Copy link
Member Author

Currently I can think of two options for naming dynamic cardinalities:

Option 1:

typedef struct MapEvents {
    u32 numBgEvents;
    u32 numObjectEvents;
    u32 numWarpEvents;
    u32 numCoordEvents;
	// rest omitted
} MapEvents;

Option 2:

typedef struct MapEvents {
    u32 bgEventsCount;
    u32 objectEventsCount;
    u32 warpEventsCount;
    u32 coordEventsCount;
	// rest omitted
} MapEvents;

count conflicts less with NUM for cardinalities of enums, but you could argue that bgEventsCount is less pleasant to read than numBgEvents.

@luckytyphlosion
Copy link
Member Author

There's a lot to take in here, but based on the examples I listed, I think MAX is better because it conflicts less with other terms, the necessity of LIMIT constants is so far and few that the "jank" they add is minimal, and using MAX probably has better futureproofing (based on the past few days of me coming up with new examples where num could possibly be used elsewhere as a quantity.

@luckytyphlosion
Copy link
Member Author

One final thing. Capacity could be used to describe the maximum amount of elements allowed in a "real life" container, so you can use capacity to describe a bag pocket (KEY_ITEMS_POCKET_CAPACITY), your party (PARTY_CAPACITY), the number of mons that can fit in a box (PSS_BOX_CAPACITY), the number of boxes in the storage system (PSS_CAPACITY). Although, MAX is also another solid contender as it allows you to be more explicit in the constant name (MAX_BAG_KEY_ITEMS, PARTY_MAX_MONS, PSS_MAX_MONS_PER_BOX, PSS_MAX_BOXES). Size is probably a synonym with capacity, but capacity doesn't conflict with size referring to the raw byte size of an object.

@luckytyphlosion
Copy link
Member Author

Peanut Colada luckytyphlosion — Today at 3:26 PM

I wonder if you could use NUM if the container is known to be fixed
so like NUM_BOXES could be fine because the number of boxes is always the same, no matter what
but NUM_BAG_KEY_ITEMS is not, because the number of key items in the bag is dynamic

@luckytyphlosion
Copy link
Member Author

It's probably agreeable that NUM should not be used to describe the capacity of a container. The rest seems to be up in the air.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants