Need full configuration guide #623

Open
ZhangJiaQiao opened this issue Feb 23, 2023 · 12 comments
Labels
feature-request New feature request

Comments

@ZhangJiaQiao

Description

The configuration docs do not seem to cover some important HSE parameters, such as those for memory usage and compaction control. Many parameters in config/hse_gparms.c are also not mentioned in the docs, so I think the configuration docs are incomplete. I need a full description of the HSE configuration so that I can test HSE properly, as I mentioned in this YCSB issue.

@ZhangJiaQiao ZhangJiaQiao added the feature-request New feature request label Feb 23, 2023
@alexttx
Contributor

alexttx commented Feb 23, 2023

Many configuration parameters you see in source code are not meant to be changed except for debugging and development. We mark those parameters as "experimental". The non-experimental parameters are described here.

@alexttx
Contributor

alexttx commented Feb 23, 2023

If a parameter has the PARAM_EXPERIMENTAL flag, then we don't recommend adjusting it. We typically do not adjust those except for test and debug.

@ZhangJiaQiao
Author

How can I control the memory usage and compaction of HSE? Does HSE keep its memory usage under a limit? What is its memory limit on a specific machine, such as a server with 64G of memory? I often use RocksDB, and since HSE is also an LSM-based storage engine, the differences between them make me curious about how HSE solves some of RocksDB's problems in other ways.

@smoyergh

HSE does not provide a config parameter to control memory consumption. A large portion of the memory that HSE consumes is clean pages in the page cache, because we memory map immutable structures on media. The kernel can therefore quickly and easily discard those pages under memory pressure.

@ZhangJiaQiao
Author

When I ran YCSB-HSE and YCSB-RocksDB, I observed the following memory usage.

YCSB-HSE:

# Print free before run
              total        used        free      shared  buff/cache   available
Mem:       65970400     1608844    54659560       32916     9701996    63645376
Swap:       2097148      742192     1354956

LD_LIBRARY_PATH=/opt/hse/lib/x86_64-linux-gnu /opt/hse/bin/hse kvdb create ./ycsb_data/ycsb-hse-data

LD_LIBRARY_PATH=/opt/hse/lib/x86_64-linux-gnu ./bin/ycsb load hse -P workloads/my_workload -p hse.kvdb.home=./ycsb_data/ycsb-hse-data -threads 20 -p hse.kvdb.rparams="throttling.init_policy=light"

# Print top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM
  xxx xxx       20   0 1106.1g  20.9g  94092 S  52.8  33.2
# Print free after run
              total        used        free      shared  buff/cache   available
Mem:       65970400    23514652    32753288       32916     9702460    41739576
Swap:       2097148      742192     1354956

YCSB-RocksDB:

# Print free before run
              total        used        free      shared  buff/cache   available
Mem:       65970400     1610756    54656772       32916     9702872    63643460
Swap:       2097148      742192     1354956

./bin/ycsb load rocksdb -P workloads/my_workload -p rocksdb.dir=./ycsb_data/ycsb-rocksdb-data -threads 20

# Print top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM
  xxx xxx       20   0   23.8g   1.5g  35392 S 302.0   2.4
# Print free after run
              total        used        free      shared  buff/cache   available
Mem:       65970400     3197448    47180772       32916    15592180    62050260
Swap:       2097148      742192     1354956

From the test above, I can see that YCSB-HSE uses up to 20.9G of RES memory, while RocksDB uses only 1.5G because its write buffer is configured to that size. RES memory should be the memory allocated by the process, not including system page cache memory.
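For reference, the growth in the `used` and `buff/cache` columns between the two `free` snapshots of each run can be worked out with a short script. This is only arithmetic on the numbers posted above, assuming `free` reported its default unit of KiB; the dictionary layout and function name are made up for illustration.

```python
# Values (KiB) copied from the `free` output posted in this comment.
KIB_PER_GIB = 1024 * 1024

SNAPSHOTS = {
    "YCSB-HSE": {
        "before": {"used": 1608844, "buff_cache": 9701996},
        "after": {"used": 23514652, "buff_cache": 9702460},
    },
    "YCSB-RocksDB": {
        "before": {"used": 1610756, "buff_cache": 9702872},
        "after": {"used": 3197448, "buff_cache": 15592180},
    },
}


def deltas_gib(snapshot):
    """Return how much each `free` column grew during the run, in GiB."""
    return {
        column: (snapshot["after"][column] - snapshot["before"][column]) / KIB_PER_GIB
        for column in snapshot["before"]
    }


if __name__ == "__main__":
    for name, snapshot in SNAPSHOTS.items():
        growth = deltas_gib(snapshot)
        print(
            f"{name}: used +{growth['used']:.2f} GiB, "
            f"buff/cache +{growth['buff_cache']:.2f} GiB"
        )
```

With the posted numbers, `used` grows by about 20.89 GiB for HSE and about 1.51 GiB for RocksDB, while `buff/cache` grows by about 5.62 GiB for RocksDB and stays essentially flat for HSE.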

Why and where does HSE use so much memory? Is there anything like the RocksDB block cache or write buffer in HSE? It feels dangerous to use HSE without control over its memory consumption.

@smoyergh

smoyergh commented Feb 27, 2023

According to the top man page, RES includes mmap file data in the page cache. See the Linux Memory Types section of the top man page, which states that RES includes so-called "quadrant 3" pages, including mmap(PRIVATE, fd). As noted above, this page cache data is clean (immutable), so the kernel can discard it whenever needed (without write-back).
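One way to check how much of a process's RES is file-backed rather than anonymous is to read the `RssAnon`, `RssFile`, and `RssShmem` fields that recent Linux kernels expose in `/proc/<pid>/status`. The sketch below is a generic Linux check and not an HSE tool; the function names are hypothetical.

```python
import re


def rss_breakdown(status_text):
    """Parse the resident-set fields (in KiB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmRSS|RssAnon|RssFile|RssShmem):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_rss_breakdown(pid="self"):
    """Read and parse the status file of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return rss_breakdown(status_file.read())
```

Calling `read_rss_breakdown(<pid of the YCSB process>)` during a run would show whether the resident memory is mostly `RssFile` (mapped file pages) or `RssAnon` (heap and other anonymous memory).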

Newly ingested data is held in memory until the associated operation/transaction completes/commits and is flushed. This is very similar to RocksDB memtables.

Background flushing is controlled by the durability.interval_ms config parameter, which is time-based rather than size-based like similar parameters in RocksDB. We chose to expose a time-based parameter because it better aligns with the semantics found in most DBs. However, it would certainly be possible to also expose a size-based parameter (though again, that won't address the portion of RES that is page cache data).
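As an illustration, the YCSB load command shown earlier in the thread could carry this parameter in place of the throttling one. The sketch below only assembles the argument list; it assumes that the YCSB binding accepts `durability.interval_ms` through the same `hse.kvdb.rparams` property used above for `throttling.init_policy`, and the value passed in is a placeholder, not a recommendation. The function name is hypothetical.

```python
import shlex


def ycsb_load_command(kvdb_home, workload, durability_interval_ms, threads=20):
    """Build the argument list for a YCSB load run against HSE."""
    # Assumption: the binding forwards this string to HSE as a KVDB runtime parameter.
    rparams = f"durability.interval_ms={durability_interval_ms}"
    return [
        "./bin/ycsb", "load", "hse",
        "-P", workload,
        "-p", f"hse.kvdb.home={kvdb_home}",
        "-threads", str(threads),
        "-p", f"hse.kvdb.rparams={rparams}",
    ]


if __name__ == "__main__":
    # The interval value here is a placeholder chosen for illustration only.
    command = ycsb_load_command(
        "./ycsb_data/ycsb-hse-data", "workloads/my_workload", 100
    )
    print(shlex.join(command))
```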

@ZhangJiaQiao
Author

OK, my mistake about mmap memory. So HSE uses mmap exclusively to access data files in storage, with no read/write I/O at all? Can the memory used by mmap be controlled or limited in HSE?

@smoyergh

HSE uses mmap for all queries of immutable data on media (essentially the equivalent of RocksDB's SSTable files). HSE uses direct I/O for all writes (which includes ingest of new data and output of compactions), and all compaction reads (so as not to perturb the page cache).

HSE has no way to control the amount of data in the page cache, and even if it were possible, that job is best left to the kernel, since it has global visibility of memory pressure and hot/cold pages.
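To make the mmap read path concrete, here is a minimal sketch of reading a byte range from an immutable file through a read-only memory mapping. It is a generic illustration and not HSE's implementation; the file contents and function names are invented, and the direct I/O write path is not shown.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory mapping."""
    with open(path, "rb") as data_file:
        # Length 0 maps the whole file; pages are faulted in from the page cache on access.
        with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            return mapping[offset:offset + length]


def demo():
    """Write a small stand-in data file, then read part of it back via mmap."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"key1=value1;key2=value2;")
        path = tmp.name
    try:
        return read_via_mmap(path, 5, 6)
    finally:
        os.remove(path)
```

Because the mapping is read-only and the file is never modified, the pages it touches stay clean, which is why the kernel can drop them without write-back.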

@ZhangJiaQiao
Author

Are there any buffers for writes (the equivalent of write buffers in RocksDB) or caches for compaction reads in HSE? How can this memory usage be controlled?

@smoyergh

smoyergh commented Mar 1, 2023

There are no significant write buffers or caches other than what is needed to hold newly ingested (but not yet flushed) data, as described in my earlier response. Cursors do require some buffering to merge data, but this is allocated as needed so there's no associated control parameter.

@ZhangJiaQiao
Author

Will ingested data be cleared from memory once it is flushed? If a lot of data is being written, could the memory for newly ingested data grow much larger than we expect?

@smoyergh

smoyergh commented Mar 6, 2023

Buffered data is cleared from memory after it is flushed. Newly ingested data occupies space proportional to its size.


3 participants