When a process wants memory, the Kernel will allocate space for it in a unit called a page. On Linux systems this is generally 4096 bytes, or 4KiB (you can check for sure on your system with `getconf PAGESIZE`). If you're new to this concept and didn't understand what I meant when I used the term "page" earlier in the post, now you know what I mean. :)
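If you'd rather check from inside a Go program than shell out, the standard library exposes the same value; a tiny sketch (just an illustration, not from the original experiment):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Equivalent to running `getconf PAGESIZE`:
	// on most Linux systems this prints 4096.
	fmt.Println("page size:", os.Getpagesize(), "bytes")
}
```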
When the kernel allocates a page requested by the process, various things are taken into account such as whether the page is private or shared (in our case, all pages we are concerned with are private), whether the page is backed by some data readable from the disk, whether it is something special like page cache etc. All of this is to determine whether a page can be considered "reclaimable" by the kernel. User-space processes, drivers, or even kernel operations may be holding a lot of pages in RAM at a given time, and if a system is not under memory pressure, then the process might as well keep the pages in RAM because presumably these pages were useful for somebody at some point. However, once the system **is** under some manner of memory pressure, the kernel will do some work to reclaim the least important pages held in RAM at the moment to satisfy new requests.
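If you want to see how your own process's resident pages split into the shared/private and clean/dirty buckets the kernel weighs here, a rough sketch reading `/proc/self/smaps_rollup` (Linux-only, and assuming a kernel recent enough to provide that file) could look like this:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// smaps_rollup sums the per-mapping counters from /proc/self/smaps,
	// including the Shared/Private x Clean/Dirty breakdown.
	f, err := os.Open("/proc/self/smaps_rollup")
	if err != nil {
		panic(err) // requires a kernel with smaps_rollup support
	}
	defer f.Close()

	wanted := map[string]bool{
		"Rss:": true, "Shared_Clean:": true, "Shared_Dirty:": true,
		"Private_Clean:": true, "Private_Dirty:": true, "Anonymous:": true,
	}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && wanted[fields[0]] {
			fmt.Printf("%-15s %s kB\n", fields[0], fields[1])
		}
	}
}
```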
I bring all of this up to say that some of the first pages to go in a high-pressure reclaim scenario are often file-backed memory mappings. The first candidates are generally any inactive clean pages, but once the LRU lists have been accounted for, file-backed pages are preferred. An anonymous memory mapping is less preferred in part because if the page is dirty (aka has been written to) it needs to be written out to swap space on disk before the page can be reclaimed, otherwise the data would be lost. A read-only file-backed mapping is always clean and thus doesn't have this restriction; since the data is readily available on disk, it can simply be read again if it's needed.
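To make the distinction concrete, here's a small Linux-only sketch (not part of the original experiment; the file path is an arbitrary placeholder) that creates both kinds of private mapping:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Read-only, file-backed, private mapping: always clean, so the kernel can
	// drop these pages under memory pressure and re-read them from disk later.
	f, err := os.Open("/usr/bin/true") // any readable file works; placeholder choice
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}
	fileMap, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(fileMap)

	// Anonymous private mapping: once written to, its pages are dirty and can
	// only be reclaimed by first writing them out to swap.
	anonMap, err := syscall.Mmap(-1, 0, 1<<20,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(anonMap)
	anonMap[0] = 1 // dirty the first page

	fmt.Printf("file-backed mapping: %d bytes, anonymous mapping: %d bytes\n",
		len(fileMap), len(anonMap))
}
```

The read-only file-backed mapping can be dropped and re-read from disk at any time, while touching the anonymous mapping dirties its pages and ties them to swap.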
While file-backed mappings are preferred, the reclaim still happens in a least-recently-used manner. Pages that are heavily and actively referenced won't be reclaimed as readily as dead pages. The `otelcol(-contrib)` file-backed pages will be considerably active since they are constantly read for the process's operation, but that doesn't mean they aren't reclaimable when push comes to shove. So while at the time of checking we found that the first mapping of `otelcol-contrib` is taking 72576 bytes of RSS, that doesn't mean many of its pages won't be reclaimed in a memory pressure scenario.
That means that just looking at RSS doesn't always paint a full picture of what matters in our process's memory usage. The OOM (Out Of Memory) Killer is one of the biggest things you want to avoid, but a high RSS doesn't necessarily mean you need to fear the OOM Killer yet; the OOM Killer will kill a process specifically when the kernel is **unable to reclaim enough space to fill a memory request**. That is to say, the kernel will first do everything it can to reclaim enough pages of memory to satisfy the new allocation request before OOM Killing a process.
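If you're curious how strongly the OOM Killer currently favours killing a given process, the kernel exposes its badness heuristic per process; a quick sketch:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// oom_score is the kernel's current "badness" heuristic for this process
	// (roughly proportional to its share of usable memory); oom_score_adj is
	// the user-tunable bias applied on top of it.
	for _, name := range []string{"oom_score", "oom_score_adj"} {
		b, err := os.ReadFile("/proc/self/" + name)
		if err != nil {
			panic(err) // Linux-only
		}
		fmt.Printf("%s: %s\n", name, strings.TrimSpace(string(b)))
	}
}
```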
Since I imagine the primary audience here is observability-minded folks, let's tie this back to observability for those who haven't long since dropped off the article. Obviously the fresh VM I used for this experiment is experiencing essentially no memory pressure and can very easily satisfy memory requests for the foreseeable future, but if memory were getting tight, I might start worrying about my collector based on its higher RSS value. If I'm looking at the memory usage of my system, with RSS as my primary per-process memory metric, I might consider the Collector to be contributing to some significant portion of that memory pressure at a glance. But RSS is a cumulative measurement that often counts a lot of reclaimable pages. The RSS itself could still be bad; since it also measures dirty anonymous memory mappings, a consistently rising RSS can still indicate a memory leak in the actual program, and a process with a lot of RSS on a system under high memory pressure may not be at risk of being killed but still at risk of causing heavy [page thrashing](https://en.wikipedia.org/wiki/Thrashing_(computer_science)) for the system. However, we can't actually grasp the true nature of a given process's memory usage and what effect it has on our whole system just by its RSS value, and in the case of my contrived example the high RSS is not such a big risk because we know how much of the RSS is presently taken up by file-backed pages.
### Note on cgroups
The Collector is often not running as a standalone process like this. Typically it will be running under a `cgroup`, either as a `systemd` service or as a container. A `cgroup` itself can have a local memory limit. The page reclaim behaviour that I explained in the previous section for when the entire system is under memory pressure also applies when a cgroup hits its local memory limit. However, the page reclaim doesn't occur across the whole system, only locally on the pages owned by the cgroup.
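For a Collector running as a `systemd` service or in a container on a cgroup v2 system, the local limit and the usage charged against it live in the cgroup filesystem. A minimal sketch (assuming the unified hierarchy is mounted at `/sys/fs/cgroup`; the service name in the comment is just an example):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// /proc/self/cgroup reports the cgroup this process belongs to; on a pure
	// cgroup v2 system it is a single line like "0::/system.slice/otelcol.service".
	b, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		panic(err) // Linux-only
	}
	line := strings.TrimSpace(string(b))
	path := line[strings.LastIndex(line, ":")+1:]

	// memory.max is the local limit ("max" means unlimited); memory.current is
	// the usage the kernel charges against that limit.
	for _, name := range []string{"memory.max", "memory.current"} {
		v, err := os.ReadFile("/sys/fs/cgroup" + path + "/" + name)
		if err != nil {
			fmt.Printf("%s: unavailable (%v)\n", name, err)
			continue
		}
		fmt.Printf("%s: %s\n", name, strings.TrimSpace(string(v)))
	}
}
```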
## Conclusion
It still helps to keep Go binary sizes down. Using more RSS can still cause some problems. But the impact that the larger binary actually has on our practical system operation is not as bad as it looks.
# Feb 4, 2026
Starting my journal yippee. :)
I saw another AI slop PR in Collector Contrib today.
**Song of the day**: A.M War by Karnivool
I have been listening to a lot of Karnivool since the new album comes out this Friday! First album in 13 years!!!
# Feb 5, 2026
Had a horrendous sleep last night, let's see what I can reasonably accomplish today. If I could, I'd pick some lower brainpower tasks, but I've got too much to do...
I worked a bit on `distrogen` documentation. I have a large wave of new features to add, and I'm excited to be able to share the tool with new OTel community members whilst actually having usage documentation to go with it.
I don't have a lot else to say journal-wise today because of just how hard I worked on the Go Binary Size blog post. It was my hardest post yet to write, but I'm really proud of it.
**Song of the day**: Break Those Bones Whose Sinews Gave It Motion by Meshuggah
I focus well to music that is incredibly aggressive, catchy, and groovy. This song is all 3 to me. Maybe not melodically catchy, but man that rhythm hooks me...
# Feb 6, 2026
I slept a little better last night at least.
Today I have a lot of writing to do. I have to write a handoff for the GBOC packaging project, I have to finish the distrogen docs, and I'm writing a small piece about Collector plugin loading at runtime. So it will be a sparse journal entry once again likely.
Everyone is all excited today about the fact that Claude agents were able to write a C compiler. Cool I guess? It feels like pure hype-bait to me, except worse than the web browser because this time it was something that "worked"; that web browser was an obvious piece of garbage, but this compiles C code wow! But I'm really not shocked at all that an AI could write a C compiler tbh. The language is very heavily specified in text, and there are lots of C compilers that the AI is likely already trained on. Building something that can successfully compile C code isn't exactly rocket science. Building something that can compile C code with all of the immense optimization work that goes into popular compilers like gcc is where the rocket science really happens.
Yesterday we talked about issues with double-writing old and new schemas when receivers are transitioning to semconv. The reconciliation is quite difficult if the two schemas have a metric with the same name but other details of the metric change. We came up with a decent reconciliation pattern here, but it might be confusing for new users in multiple ways. https://github.com/open-telemetry/opentelemetry-collector/pull/14538#discussion_r2775609768
**Song of the day**: The entire IN VERSES album by Karnivool
The new album came out, and dare I say it was worth the 13 year wait. This album closer is transcendental.
# Feb 7, 2026
I'm doing a bit of writing today, but will be busy with a family event for the rest of the day so not much to say!
**Song of the day**: My Pantheon (Forevermore) by Kamelot
They are kind of a guilty pleasure band for me. Common perception appears to be that they are corny and formulaic. But for me that corn is buttered and salted, and it's a damn good formula.