btrfs
- [PATCH v4 00/46] btrfs: add fscrypt support
The fscrypt work continues to steadily plod along. Really hoping that there won't need to be many more versions of the patchset, especially seeing as a bunch of the non-BTRFS-specific work has already landed.
- Release v0.14 · markfasheh/duperemove
Notable changes: Batching has been reimplemented on top of the dedupe_seq. The "scan" phase has been reimplemented (see 8264336 for details). Filesystem locking has been implemented. See f3947e9 f...
- Today I discovered Garuda's BTRFS assistant and it's a total game changer.
cross-posted from: https://feddit.uk/post/4577666
> Was looking at how to set up snapper on Fedora 39 and came across the ever-knowledgeable Stephens tech talks video. It does balance, setting up snapper, and sub-volume management in a really cool GUI tool.
>
> edit: updated the link as the GitHub page was apparently out of date, but it is in most repos
- [PATCH v2 00/36] btrfs: add fscrypt support
Looks like it's v2 time.
The btrfs-progs-side patch is here.
- Is it possible to encrypt my harddisk when using btrfs?
Update:
With the native Manjaro installer I succeeded in making my disk encrypted. But it's below the btrfs layer (btrfs sits inside the encryption)
- btrfs appreciation post
Just wanted to share some love for this filesystem.
I’ve been running a btrfs raid1 continuously for over ten years, on a motley assortment of near-garbage hard drives of all different shapes and sizes. None of the original drives are still in it, and that server is now on its fourth motherboard. The data has survived it all!
It’s grown to 6 drives now, and most recently survived the runtime failure of a SATA controller card that four of them were attached to. After replacing it, I was stunned to discover that the volume was uncorrupted and didn’t even require repair.
So knock on wood, I’m not trying to tempt fate here. I just want to say thank you to all the devs for their hard work, and add some positive feedback to the heap since btrfs gets way more than its fair share of flak, which I personally find to be undeserved. Cheers!
- Btrfs progs release 6.3.3
Hi,
btrfs-progs version 6.3.3 has been released. This is a bugfix release.
There are two bug fixes, the rest is CI work, documentation updates and some preparatory work. Due to no other significant changes queued, the release 6.4 will be most likely skipped.
Changelog:
- add btrfs-find-root to btrfs.box
- replace: properly enqueue if there's another replace running
- other:
- CI updates, more tests enabled, code coverage, badges
- documentation updates
- build warning fixes
- [PATCH v1 00/17] btrfs: add encryption feature
```
This is a changeset adding encryption to btrfs. It is not complete; it does not support inline data or verity or authenticated encryption. It is primarily intended as a proof that the fscrypt extent encryption changeset it builds on works.
As per the design doc refined in the fall of last year [1], btrfs encryption has several steps: first, adding extent encryption to fscrypt and then btrfs; second, adding authenticated encryption support to the block layer, fscrypt, and then btrfs; and later adding potentially the ability to change the key used by a directory (either for all data or just newly written data) and/or allowing use of inline extents and verity items in combination with encryption and/or enabling send/receive of encrypted volumes. As such, this change is only the first step and is unsafe.
This change does not pass a couple of encryption xfstests, because of different properties of extent encryption. It hasn't been tested with direct IO or RAID. Because currently extent encryption always uses inline encryption (i.e. IO-block-only) for data encryption, it does not support encryption of inline extents; similarly, since btrfs stores verity items in the tree instead of in inline encryptable blocks on disk as other filesystems do, btrfs cannot currently encrypt verity items. Finally, this is insecure; the checksums are calculated on the unencrypted data and stored unencrypted, which is a potential information leak. (This will be addressed by authenticated encryption).
This changeset is built on two prior changesets to fscrypt: [2] and [3] and should have no effect on unencrypted usage.
[1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
[2] https://lore.kernel.org/linux-fscrypt/cover.1687988119.git.sweettea-kernel@dorminy.me/
[3] https://lore.kernel.org/linux-fscrypt/cover.1687988246.git.sweettea-kernel@dorminy.me
```
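The information leak mentioned in the cover letter (checksums computed over the plaintext and stored unencrypted) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `toy_encrypt` is a throwaway XOR keystream, not fscrypt's cipher, and `zlib.crc32` stands in for whatever checksum the filesystem is configured with. The point is only that two extents holding the same data have the same plaintext checksum even when their ciphertexts differ.

```python
import hashlib
import zlib


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher. NOT real cryptography; illustration only."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "little"))
        stream += block.digest()
        counter += 1
    # zip() stops at len(data), so any extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def store_extent(key: bytes, nonce: bytes, plaintext: bytes):
    """Return (ciphertext, checksum) with the checksum taken over the
    plaintext, which is the behaviour the cover letter calls insecure."""
    return toy_encrypt(key, nonce, plaintext), zlib.crc32(plaintext)


# Hypothetical data: the same contents written under two different keys.
block = b"secret" * 100
ct1, sum1 = store_extent(b"key-one", b"nonce-1", block)
ct2, sum2 = store_extent(b"key-two", b"nonce-2", block)

print("ciphertexts differ:", ct1 != ct2)
print("stored checksums match:", sum1 == sum2)
```

An observer who can read the stored checksums but holds neither key can still tell that the two extents contain the same data, which is the kind of leak the cover letter expects authenticated encryption to address.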
- [PATCH v1 00/12] fscrypt: add extent encryption
```
This changeset adds extent-based data encryption to fscrypt. Some filesystems need to encrypt data based on extents, rather than on inodes, due to features incompatible with inode-based encryption. For instance, btrfs can have multiple inodes referencing a single block of data, and moves logical data blocks to different physical locations on disk in the background.
As per discussion last year in [1] and later in [2], we would like to allow the use of fscrypt with btrfs, with authenticated encryption. This is the first step of that work, adding extent-based encryption to fscrypt; authenticated encryption is the next step. Extent-based encryption should be usable by other filesystems which wish to support snapshotting or background data rearrangement also, but btrfs is the first user.
This changeset requires extent encryption to use inlinecrypt, as discussed previously. There are two questionable parts: the forget_extent_info hook is not yet in use by btrfs, as I haven't yet written a test exercising a race where it would be relevant; and saving the session key credentials just to enable v1 session-based policies is perhaps less good than
This applies atop [3], which itself is based on kdave/misc-next. It passes most encryption fstests with suitable changes to btrfs-progs, but not generic/580 or generic/595 due to different timing involved in extent encryption. Tests and btrfs progs updates to follow.
[1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
[2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t
[3] https://lore.kernel.org/linux-fscrypt/cover.1687988119.git.sweettea-kernel@dorminy.me/
```
- [PATCH 00/18] btrfs: simple quotas
```
btrfs quota groups (qgroups) are a compelling feature of btrfs that allow flexible control for limiting subvolume data and metadata usage. However, due to btrfs's high-level decision to trade off snapshot performance against ref-counting performance, qgroups suffer from non-trivial performance issues that make them unattractive in certain workloads. Particularly, frequent backref walking during writes and during commits can make operations increasingly expensive as the number of snapshots scales up. For that reason, we have never been able to commit to using qgroups in production at Meta, despite significant interest from people running container workloads, where we would benefit from protecting the rest of the host from a buggy application in a container running away with disk usage.
This patch series introduces a simplified version of qgroups called simple quotas (squotas) which never computes global reference counts for extents, and thus has similar performance characteristics to normal, quotas disabled, btrfs. The "trick" is that in simple quotas mode, we account all extents permanently to the subvolume in which they were originally created. That allows us to make all accounting 1:1 with extent item lifetime, removing the need to walk backrefs. However, this sacrifices the ability to compute shared vs. exclusive usage. It also results in counter-intuitive, though still predictable and simple, accounting in the cases where an original extent is removed while a shared copy still exists. Qgroups is able to detect that case and count the remaining copy as an exclusive owner, while squotas is not. As a result, squotas works best when the original extent is immutable and outlives any clones.
==Format Change==
In order to track the original creating subvolume of a data extent in the face of reflinks, it is necessary to add additional accounting to the extent item. To save space, this is done with a new inline ref item. However, the downside of this approach is that it makes enabling squota an incompat change, denoted by the new incompat bit SIMPLE_QUOTA. When this bit is set and quotas are enabled, new extent items get the extra accounting, and freed extent items check for the accounting to find their creating subvolume. In addition, 1:1 with this incompat bit, the quota status item now tracks a "quota enablement generation" needed for properly handling deleting extents which predate enablement.
==API==
Squotas reuses the API of qgroups. The only difference is that when you enable quotas via 'btrfs quota enable', you pass the '--simple' flag. Squotas will always report exclusive == shared for each qgroup. Squotas deal with extent_item/metadata_item sizes and thus do not do anything special with compression. Squotas also introduce auto inheritance for nested subvols. The API is documented more fully in the documentation patches in btrfs-progs.
==Testing methodology==
Using updated btrfs-progs and fstests (relevant matching patch sets to be sent ASAP)
btrfs-progs: https://github.com/boryas/btrfs-progs/tree/squota-progs
fstests: https://github.com/boryas/fstests/tree/squota-test
I ran '-g auto' on fstests on the following configurations:
1a) baseline kernel/progs/fstests.
1b) squota kernel baseline progs/fstests.
2a) baseline kernel/progs/fstests. fstests configured to mkfs with quota
2b) squota kernel/progs/fstests. fstests configured to mkfs with squota
I compared 1a against 1b and 2a against 2b and detected no regressions. 2a/2b both exhibit regressions against 1a/1b which are largely issues with quota reservations in various complicated cases. I intend to run those down in the future, but they are not simple quota specific, as they are already broken with plain qgroups.
==Performance Testing==
I measured the performance of the change using fsperf. I ran with 3 configurations using the squota kernel:
- plain mkfs
- qgroup mkfs
- squota mkfs
And added a new performance test which creates 1000 files in a subvol, creates 100 snapshots of that subvol, then unshares extents in files in the snapshots. I measured write performance with fio and btrfs commit critical section performance side effects with bpftrace on 'wait_current_trans'.
The results for the test which measures unshare perf (unshare.py) with qgroup and squota compared to the baseline:
qgroup test results
unshare results
metric                      baseline      current       stdev         diff
========================================================================================
avg_commit_ms               162.13        285.75        3.14          76.24%
bg_count                    16            16            0             0.00%
commits                     378.20        379           1.92          0.21%
elapsed                     201.40        270.40        1.34          34.26%
end_state_mount_ns          26036211.60   26004593.60   2281065.40    -0.12%
end_state_umount_ns         2.45e+09      2.55e+09      20740154.41   3.93%
max_commit_ms               425.80        594           53.34         39.50%
sys_cpu                     0.10          0.06          0.06          -42.15%
wait_current_trans_calls    2945.60       3405.20       47.08         15.60%
wait_current_trans_ns_max   1.56e+08      3.43e+08      32659393.25   120.07%
wait_current_trans_ns_mean  1974875.35    28588482.55   1557588.84    1347.61%
wait_current_trans_ns_min   232           232           25.88         0.00%
wait_current_trans_ns_p50   718           740           22.80         3.06%
wait_current_trans_ns_p95   7711770.20    2.21e+08      17241032.09   2761.19%
wait_current_trans_ns_p99   67744932.29   2.68e+08      41275815.87   295.16%
write_bw_bytes              653008.80     486344.40     4209.91       -25.52%
write_clat_ns_mean          6251404.78    8406837.89    39779.15      34.48%
write_clat_ns_p50           1656422.40    1643315.20    27415.68      -0.79%
write_clat_ns_p99           1.90e+08      3.20e+08      2097152       68.62%
write_io_kbytes             128000        128000        0             0.00%
write_iops                  159.43        118.74        1.03          -25.52%
write_lat_ns_max            7.06e+08      9.80e+08      47324816.61   38.88%
write_lat_ns_mean           6251503.06    8406936.06    39780.83      34.48%
write_lat_ns_min            3354          4648          616.06        38.58%
squota test results
unshare results
metric                      baseline      current       stdev         diff
========================================================================================
avg_commit_ms               162.13        164.16        3.14          1.25%
bg_count                    16            0             0             -100.00%
commits                     378.20        380.80        1.92          0.69%
elapsed                     201.40        208.20        1.34          3.38%
end_state_mount_ns          26036211.60   25840729.60   2281065.40    -0.75%
end_state_umount_ns         2.45e+09      3.01e+09      20740154.41   22.80%
max_commit_ms               425.80        415.80        53.34         -2.35%
sys_cpu                     0.10          0.08          0.06          -23.36%
wait_current_trans_calls    2945.60       2981.60       47.08         1.22%
wait_current_trans_ns_max   1.56e+08      1.12e+08      32659393.25   -27.86%
wait_current_trans_ns_mean  1974875.35    1064734.76    1557588.84    -46.09%
wait_current_trans_ns_min   232           238           25.88         2.59%
wait_current_trans_ns_p50   718           746           22.80         3.90%
wait_current_trans_ns_p95   7711770.20    1567.60       17241032.09   -99.98%
wait_current_trans_ns_p99   67744932.29   49880514.27   41275815.87   -26.37%
write_bw_bytes              653008.80     631256        4209.91       -3.33%
write_clat_ns_mean          6251404.78    6476816.06    39779.15      3.61%
write_clat_ns_p50           1656422.40    1581056       27415.68      -4.55%
write_clat_ns_p99           1.90e+08      1.94e+08      2097152       2.21%
write_io_kbytes             128000        128000        0             0.00%
write_iops                  159.43        154.12        1.03          -3.33%
write_lat_ns_max            7.06e+08      7.65e+08      47324816.61   8.38%
write_lat_ns_mean           6251503.06    6476912.76    39780.83      3.61%
write_lat_ns_min            3354          4062          616.06        21.11%
And the same, but only showing results where the deviation was outside of a 95% confidence interval for the mean (default significance highlighting in fsperf):
qgroup test results
unshare results
metric                      baseline      current       stdev         diff
========================================================================================
avg_commit_ms               162.13        285.75        3.14          76.24%
elapsed                     201.40        270.40        1.34          34.26%
end_state_umount_ns         2.45e+09      2.55e+09      20740154.41   3.93%
max_commit_ms               425.80        594           53.34         39.50%
wait_current_trans_calls    2945.60       3405.20       47.08         15.60%
wait_current_trans_ns_max   1.56e+08      3.43e+08      32659393.25   120.07%
wait_current_trans_ns_mean  1974875.35    28588482.55   1557588.84    1347.61%
wait_current_trans_ns_p95   7711770.20    2.21e+08      17241032.09   2761.19%
wait_current_trans_ns_p99   67744932.29   2.68e+08      41275815.87   295.16%
write_bw_bytes              653008.80     486344.40     4209.91       -25.52%
write_clat_ns_mean          6251404.78    8406837.89    39779.15      34.48%
write_clat_ns_p99           1.90e+08      3.20e+08      2097152       68.62%
write_iops                  159.43        118.74        1.03          -25.52%
write_lat_ns_max            7.06e+08      9.80e+08      47324816.61   38.88%
write_lat_ns_mean           6251503.06    8406936.06    39780.83      34.48%
write_lat_ns_min            3354          4648          616.06        38.58%
squota test results
unshare results
metric                      baseline      current       stdev         diff
========================================================================================
elapsed                     201.40        208.20        1.34          3.38%
end_state_umount_ns         2.45e+09      3.01e+09      20740154.41   22.80%
write_bw_bytes              653008.80     631256        4209.91       -3.33%
write_clat_ns_mean          6251404.78    6476816.06    39779.15      3.61%
write_clat_ns_p50           1656422.40    1581056       27415.68      -4.55%
write_clat_ns_p99           1.90e+08      1.94e+08      2097152       2.21%
write_iops                  159.43        154.12        1.03          -3.33%
write_lat_ns_mean           6251503.06    6476912.76    39780.83      3.61%
Particularly noteworthy are the massive regressions to wait_current_trans in qgroup mode as well as the solid regressions to bandwidth, iops and write latency. The regressions/improvements in squotas are modest in comparison, in line with expectations. I am still investigating the squota umount regression, particularly whether it is in the umount's final commit and represents a real performance problem with squotas.
Link: https://github.com/boryas/btrfs-progs/tree/squota-progs
Link: https://github.com/boryas/fstests/tree/squota-test
Link: https://github.com/boryas/fsperf/tree/unshare-victim
```
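The accounting difference described in the cover letter can be sketched as a toy Python model. This is only an illustration of the stated rule (squotas charge an extent permanently to the subvolume that created it, while qgroups can recognise a sole remaining owner); the `Extent` class and both functions are hypothetical names and do not correspond to kernel structures or to the btrfs-progs API.

```python
from collections import defaultdict


class Extent:
    """Toy extent: remembers its creating subvolume and its current referrers."""

    def __init__(self, size, owner):
        self.size = size
        self.owner = owner   # subvolume that originally created the extent
        self.refs = {owner}  # subvolumes that currently reference it


def squota_usage(extents):
    """Simple quotas: every live extent is charged to its creator, always."""
    usage = defaultdict(int)
    for e in extents:
        if e.refs:  # the extent item still exists
            usage[e.owner] += e.size
    return dict(usage)


def qgroup_exclusive_usage(extents):
    """qgroup-style exclusive usage: charged only to a sole remaining referrer."""
    usage = defaultdict(int)
    for e in extents:
        if len(e.refs) == 1:
            (only,) = e.refs
            usage[only] += e.size
    return dict(usage)


# Subvolume "A" writes an extent, snapshot "B" shares it, then "A" drops it.
extent = Extent(4096, "A")
extent.refs.add("B")
extent.refs.discard("A")

print("squota:", squota_usage([extent]))
print("qgroup exclusive:", qgroup_exclusive_usage([extent]))
```

In this model the squota total stays with "A" even though only "B" still references the data, which is the counter-intuitive but predictable case the cover letter describes. Because the charge follows the extent's lifetime, no backref walk is needed to compute it.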
- [GIT PULL] Btrfs updates for 6.5
Hi,
there are mainly core changes, refactoring and optimizations. Performance is improved in some areas, overall there may be a cumulative improvement due to refactoring that removed lookups in the IO path or simplified IO submission tracking.
No merge conflicts. Please pull, thanks.
Core:
- submit IO synchronously for fast checksums (crc32c and xxhash), remove high priority worker kthread
- read extent buffer in one go, simplify IO tracking, bio submission and locking
- remove additional tracking of redirtied extent buffers, originally added for zoned mode but actually not needed
- track ordered extent pointer in bio to avoid rbtree lookups during IO
- scrub, use recovered data stripes as cache to avoid unnecessary read
- in zoned mode, optimize logical to physical mappings of extents
- remove PageError handling, not set by VFS nor writeback
- cleanups, refactoring, better structure packing
- lots of error handling improvements
- more assertions, lockdep annotations
- print assertion failure with the exact line where it happens
- tracepoint updates
- more debugging prints
Performance:
- speedup in fsync(), better tracking of inode logged status can avoid transaction commit
- IO path structures track logical offsets in data structures and do not need to look them up
User visible changes:
- don't commit transaction for every created subvolume, this can reduce time when many subvolumes are created in a batch
- print affected files when relocation fails
- trigger orphan file cleanup during START_SYNC ioctl
Notable fixes:
- fix crash when disabling quota and relocation
- fix crashes when removing roots from dirty list
- fix transaction abort during relocation when converting from newer profiles not covered by fallback
- in zoned mode, stop reclaiming block groups if filesystem becomes read-only
- fix rare race condition in tree mod log rewind that can miss some btree node slots
- with enabled fsverity, drop up-to-date page bit in case the verification fails
- Btrfs progs release 6.3.2
Changelog:
- build: fix mkfs on big endian hosts
- mkfs: don't print changed defaults notice with --quiet
- scrub: fix wrong stats of processed bytes in background and foreground mode
- convert: actually create free-space-tree instead of v1 space cache
- print-tree: recognize and print CHANGING_FSID_V2 flag (for the metadata_uuid change in progress)
- other:
- documentation updates