segment

package
v1.2.0
Warning

This package is not in the latest version of its module.

Published: Oct 2, 2025 License: BSD-3-Clause Imports: 25 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func ComputeFileCRC

func ComputeFileCRC(data []byte, crcOffset int) uint64

ComputeFileCRC computes the CRC of a file's bytes with the CRC field at crcOffset treated as zero.

func ValidateHeader

func ValidateHeader(h *CommonHeader, expectedMagic uint32, expectedVersion uint16) error

ValidateHeader checks a common header against the expected magic number and version.

func VerifyFileCRC

func VerifyFileCRC(data []byte, crcOffset int, expected uint64) bool

VerifyFileCRC reports whether the CRC computed over data, with the CRC field at crcOffset zeroed, matches expected.
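The zeroed-field idea behind ComputeFileCRC and VerifyFileCRC can be sketched with the standard library. This is an illustration under assumptions, not the package's implementation: it assumes CRC-32C (Castagnoli), suggested by the FileCRC32C field name, and an 8-byte CRC field, suggested by the uint64 type. The helpers crcZeroed and verifyZeroed are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// crcFieldSize is an assumption: the headers declare FileCRC32C as uint64.
const crcFieldSize = 8

// castagnoli is the CRC-32C table; the polynomial choice is an assumption.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// crcZeroed (hypothetical) hashes data as if the CRC field at crcOffset
// held only zero bytes. It does not modify data.
func crcZeroed(data []byte, crcOffset int) uint64 {
	if crcOffset < 0 || crcOffset+crcFieldSize > len(data) {
		// No room for a CRC field: hash the bytes as they are.
		return uint64(crc32.Checksum(data, castagnoli))
	}
	var zeros [crcFieldSize]byte
	sum := crc32.Update(0, castagnoli, data[:crcOffset])
	sum = crc32.Update(sum, castagnoli, zeros[:])
	sum = crc32.Update(sum, castagnoli, data[crcOffset+crcFieldSize:])
	return uint64(sum)
}

// verifyZeroed (hypothetical) recomputes the checksum and compares it.
func verifyZeroed(data []byte, crcOffset int, expected uint64) bool {
	return crcZeroed(data, crcOffset) == expected
}

func main() {
	file := make([]byte, 32)
	copy(file, "HDR!")
	const off = 8

	// Compute the checksum, then store it in the field it skips over.
	sum := crcZeroed(file, off)
	binary.LittleEndian.PutUint64(file[off:], sum)

	fmt.Println(verifyZeroed(file, off, sum)) // true
}
```

Because the field is excluded from the hash, the checksum can be written into the file after it is computed without invalidating it.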

func WriteCommonHeader

func WriteCommonHeader(w io.Writer, magic uint32, version uint16) error

WriteCommonHeader writes a common header to a writer.

func WriteLoudsHeader

func WriteLoudsHeader(w io.Writer, h *LoudsHeader) error

WriteLoudsHeader writes a LOUDS header.

Types

type AcceptHeader

type AcceptHeader struct {
	CommonHeader
	NodesCount uint64
	Encoding   uint32 // RRR = 1
	OffBits    uint64
	OffCounts  uint64
	FileCRC32C uint64
}

AcceptHeader is the header for accept state files.

type BloomFilter

type BloomFilter struct {
	Bits uint64  `json:"bits"`
	K    uint32  `json:"k"`
	FPR  float64 `json:"fpr"`
}

BloomFilter configuration.
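For orientation, the three fields relate through the textbook Bloom filter sizing formulas. The sketch below uses those standard formulas; whether the package sizes its filters this way is not stated here, and bloomParams is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// bloomParams (hypothetical) returns the textbook bit count and hash count
// for n keys at a target false-positive rate fpr:
//   bits = ceil(-n * ln(fpr) / (ln 2)^2)
//   k    = round(bits / n * ln 2)
func bloomParams(n uint64, fpr float64) (bits uint64, k uint32) {
	if n == 0 || fpr <= 0 || fpr >= 1 {
		return 0, 0
	}
	m := math.Ceil(-float64(n) * math.Log(fpr) / (math.Ln2 * math.Ln2))
	hashes := math.Round(m / float64(n) * math.Ln2)
	if hashes < 1 {
		hashes = 1
	}
	return uint64(m), uint32(hashes)
}

func main() {
	bits, k := bloomParams(1000, 0.01)
	fmt.Printf("bits=%d k=%d fpr=%.2f\n", bits, k, 0.01)
}
```

At a 1% target rate this works out to a little under 10 bits per key.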

type BloomHeader

type BloomHeader struct {
	CommonHeader
	Bits       uint64
	K          uint32
	Hash       uint32 // BLAKE3 = 1
	OffBitset  uint64
	FileCRC32C uint64
}

BloomHeader is the header for bloom filter files.

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

Builder creates immutable segment files from sorted key-value pairs.

func NewBuilder

func NewBuilder(segmentID uint64, level int, dir string, logger common.Logger) *Builder

NewBuilder creates a new segment builder.

func (*Builder) Add

func (b *Builder) Add(key, value []byte, tombstone bool) error

Add adds a key-value pair to the builder.

func (*Builder) AddFromMemtable

func (b *Builder) AddFromMemtable(snapshot *memtable.Snapshot) error

AddFromMemtable adds all entries from a memtable snapshot.

func (*Builder) AddFromPairs added in v1.0.7

func (b *Builder) AddFromPairs(keys [][]byte, tombs []bool) error

AddFromPairs adds entries from the provided keys and tombstone flags, which are already in memory. It is optimized for partitioned flush.

func (*Builder) AddTombstone

func (b *Builder) AddTombstone(key []byte)

AddTombstone records a deletion for the given key (used during compaction/merge).

func (*Builder) AddWithExpiry added in v1.0.7

func (b *Builder) AddWithExpiry(key, value []byte, expiresAtNanos int64, tombstone bool) error

AddWithExpiry adds a key-value pair with an absolute expiry timestamp (unix nanos; 0 = none).
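A caller that thinks in TTLs needs to convert them to the absolute timestamp this method expects. The sketch below shows one way to do that; expiresAtNanos and isExpired are hypothetical helpers, and the expiry check in isExpired is an assumption about how a reader might interpret the value, not documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

// expiresAtNanos (hypothetical) turns a TTL into an absolute unix-nanosecond
// timestamp. A non-positive TTL maps to 0, which the documentation defines
// as "no expiry".
func expiresAtNanos(nowNanos, ttlNanos int64) int64 {
	if ttlNanos <= 0 {
		return 0
	}
	return nowNanos + ttlNanos
}

// isExpired (hypothetical, assumed semantics) treats 0 as "never expires".
func isExpired(expiresAt, nowNanos int64) bool {
	return expiresAt != 0 && nowNanos >= expiresAt
}

func main() {
	now := time.Now().UnixNano()
	exp := expiresAtNanos(now, int64(time.Hour))

	fmt.Println(isExpired(exp, now))                        // false
	fmt.Println(isExpired(exp, now+int64(2*time.Hour)))     // true
	fmt.Println(isExpired(expiresAtNanos(now, 0), now+exp)) // false
}
```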

func (*Builder) Build

func (b *Builder) Build() (*Metadata, error)

Build creates the segment files.

func (*Builder) BuildBloomOnly added in v1.0.7

func (b *Builder) BuildBloomOnly(dir string) error

BuildBloomOnly builds only the bloom filter into dir.

func (*Builder) BuildTrigramOnly added in v1.0.7

func (b *Builder) BuildTrigramOnly(dir string) error

BuildTrigramOnly builds only the trigram filter into dir.

func (*Builder) BuildWithContext added in v1.1.5

func (b *Builder) BuildWithContext(ctx context.Context) (*Metadata, error)

BuildWithContext creates the segment files with context support.

func (*Builder) ConfigureBuild added in v1.0.6

func (b *Builder) ConfigureBuild(maxShards, shardMinKeys, bloomAdaptMinKeys int)

ConfigureBuild sets parallelization and adaptive thresholds.

func (*Builder) ConfigureFilters added in v1.0.6

func (b *Builder) ConfigureFilters(bloomFPR float64, prefixLen int, enableTrigram bool)

ConfigureFilters sets filter-related options for this builder.

func (*Builder) DropAllTombstones

func (b *Builder) DropAllTombstones()

DropAllTombstones instructs the builder to skip writing tombstones.dat and not to derive the metadata key range from tombstone keys.

func (*Builder) MarkAlreadySorted added in v1.1.0

func (b *Builder) MarkAlreadySorted()

MarkAlreadySorted marks input as sorted to enable fast-path builders.

func (*Builder) SetAutoDisableLOUDSThreshold added in v1.1.0

func (b *Builder) SetAutoDisableLOUDSThreshold(n int)

SetAutoDisableLOUDSThreshold sets the minimal key count for automatically skipping LOUDS generation.

func (*Builder) SetDisableLOUDS added in v1.1.0

func (b *Builder) SetDisableLOUDS(skip bool)

SetDisableLOUDS configures whether to skip LOUDS generation during Build.

func (*Builder) SetForceTrie added in v1.1.0

func (b *Builder) SetForceTrie(force bool)

SetForceTrie forces building the trie even for sorted inputs (for benchmarking and debugging).

func (*Builder) SetSkipFilters added in v1.0.8

func (b *Builder) SetSkipFilters(skip bool)

SetSkipFilters controls whether Build() should skip generating filter files. When set to true, filters can be generated later via BuildBloomOnly/BuildTrigramOnly.

func (*Builder) SetTrieGCPercent added in v1.1.0

func (b *Builder) SetTrieGCPercent(p int)

SetTrieGCPercent configures a temporary GC percent used only during trie build (0 = unchanged).
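A temporary GC percent can be applied around a unit of work with runtime/debug.SetGCPercent, which returns the previous setting. The sketch below shows that pattern; it is not the package's code, and withGCPercent and currentGCPercent are hypothetical helpers. It mirrors the documented convention that 0 leaves the setting unchanged.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withGCPercent (hypothetical) runs fn under a temporary GC percent and
// restores the previous value afterwards. p == 0 leaves the setting alone.
func withGCPercent(p int, fn func()) {
	if p != 0 {
		prev := debug.SetGCPercent(p)
		defer debug.SetGCPercent(prev)
	}
	fn()
}

// currentGCPercent (hypothetical) reads the setting by setting it and
// immediately restoring the value that was returned.
func currentGCPercent() int {
	prev := debug.SetGCPercent(100)
	debug.SetGCPercent(prev)
	return prev
}

func main() {
	fmt.Println("before:", currentGCPercent())
	withGCPercent(400, func() {
		// Allocation-heavy work would go here.
		fmt.Println("during:", currentGCPercent())
	})
	fmt.Println("after: ", currentGCPercent())
}
```

Note that the GC percent is process-wide, so a temporary change affects every goroutine while it is in effect.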

type CommonHeader

type CommonHeader struct {
	Magic   uint32
	Version uint16
}

CommonHeader is the common header for all segment files.

func ReadCommonHeader

func ReadCommonHeader(r io.Reader) (*CommonHeader, error)

ReadCommonHeader reads a common header from a reader.
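The write, read, and validate steps for a fixed-size header can be sketched with encoding/binary. This is a minimal illustration under assumptions: little-endian byte order is a guess, the magic and version values are placeholders, and commonHeader, writeHeader, readHeader, validateHeader, and roundTrip are local stand-ins rather than the package's functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// commonHeader mirrors the documented CommonHeader fields.
type commonHeader struct {
	Magic   uint32
	Version uint16
}

// writeHeader encodes the header as 6 bytes. Little-endian is an assumption.
func writeHeader(w io.Writer, magic uint32, version uint16) error {
	return binary.Write(w, binary.LittleEndian, commonHeader{Magic: magic, Version: version})
}

// readHeader decodes a header written by writeHeader.
func readHeader(r io.Reader) (*commonHeader, error) {
	var h commonHeader
	if err := binary.Read(r, binary.LittleEndian, &h); err != nil {
		return nil, err
	}
	return &h, nil
}

// validateHeader rejects a header whose magic or version is unexpected.
func validateHeader(h *commonHeader, magic uint32, version uint16) error {
	if h.Magic != magic {
		return fmt.Errorf("bad magic: got %#x, want %#x", h.Magic, magic)
	}
	if h.Version != version {
		return fmt.Errorf("unsupported version: got %d, want %d", h.Version, version)
	}
	return nil
}

// roundTrip writes a header to memory, reads it back, and also returns
// the encoded size in bytes.
func roundTrip(magic uint32, version uint16) (*commonHeader, int, error) {
	var buf bytes.Buffer
	if err := writeHeader(&buf, magic, version); err != nil {
		return nil, 0, err
	}
	n := buf.Len()
	h, err := readHeader(&buf)
	return h, n, err
}

func main() {
	const magic, version = 0x53454731, 1 // placeholder values
	h, n, err := roundTrip(magic, version)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, validateHeader(h, magic, version)) // 6 <nil>
}
```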

type Counts

type Counts struct {
	Nodes    uint64 `json:"nodes"`
	Edges    uint64 `json:"edges"`
	Tails    uint64 `json:"tails"`
	Accepted uint64 `json:"accepted"`
}

Counts contains element counts in the segment.

type EdgesHeader

type EdgesHeader struct {
	CommonHeader
	TotalEdges          uint64
	LabelsBytes         uint64
	CutsCount           uint64
	TargetsCount        uint64
	NodesCount          uint64
	OffLabelsBlob       uint64
	OffLabelsCutsEF     uint64
	OffTargets          uint64
	OffFirstEdgeEF      uint64
	OffEdgeCnt          uint64
	OffTargetsOffsetsEF uint64
	FileCRC32C          uint64
}

EdgesHeader is the header for edge index files.

type Encodings

type Encodings struct {
	Labels     string `json:"labels"`
	Targets    string `json:"targets"`
	AcceptBits string `json:"acceptBits"`
	IDBits     int    `json:"idBits,omitempty"`
}

Encodings describes encoding methods used.

type Files

type Files struct {
	Louds  string `json:"louds"`
	Edges  string `json:"edges,omitempty"`
	Accept string `json:"accept,omitempty"`
	TMap   string `json:"tmap,omitempty"`
	Tails  string `json:"tails,omitempty"`
	Expiry string `json:"expiry,omitempty"`
}

Files lists all segment files.

type Filters

type Filters struct {
	PrefixBloom *BloomFilter   `json:"prefixBloom,omitempty"`
	Trigram     *TrigramFilter `json:"trigram,omitempty"`
}

Filters describes filter configurations.

type LoudsHeader

type LoudsHeader struct {
	CommonHeader
	TotalNodes          uint64
	BitsLen             uint64
	RankSuperStrideBits uint32
	RankBlockStrideBits uint32
	Select1Sample       uint32
	Select0Sample       uint32
	OffBV               uint64
	OffRank             uint64
	OffSelect1          uint64
	OffSelect0          uint64
	FileCRC32C          uint64
}

LoudsHeader is the header for LOUDS index files.

func ReadLoudsHeader

func ReadLoudsHeader(r io.Reader) (*LoudsHeader, error)

ReadLoudsHeader reads a LOUDS header.

type Metadata

type Metadata struct {
	Format        string            `json:"format"`
	Version       string            `json:"version"`
	SegmentID     uint64            `json:"segmentID"`
	Level         int               `json:"level"`
	MinKeyHex     string            `json:"minKeyHex"`
	MaxKeyHex     string            `json:"maxKeyHex"`
	Counts        Counts            `json:"counts"`
	Encodings     Encodings         `json:"encodings"`
	Filters       Filters           `json:"filters"`
	Files         Files             `json:"files"`
	CreatedAtUnix int64             `json:"createdAtUnix"`
	Parents       []uint64          `json:"parents,omitempty"`
	Blake3        map[string]string `json:"blake3,omitempty"`
}

Metadata represents segment metadata stored in segment.json.

func LoadFromFile

func LoadFromFile(path string) (*Metadata, error)

LoadFromFile loads metadata from a JSON file.

func NewMetadata

func NewMetadata(segmentID uint64, level int) *Metadata

NewMetadata creates a new segment metadata.

func (*Metadata) GetMaxKey

func (m *Metadata) GetMaxKey() ([]byte, error)

GetMaxKey returns the maximum key as bytes.

func (*Metadata) GetMinKey

func (m *Metadata) GetMinKey() ([]byte, error)

GetMinKey returns the minimum key as bytes.

func (*Metadata) SaveToFile

func (m *Metadata) SaveToFile(path string) error

SaveToFile saves metadata to a JSON file.
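A save and load round trip for JSON metadata can be sketched with encoding/json and os. The struct below copies only a subset of the documented Metadata fields and their JSON tags; metaSubset and the helper functions are local stand-ins, the "example" format string is a placeholder, and the indentation and file mode are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// metaSubset copies a few documented Metadata fields and their JSON tags.
type metaSubset struct {
	Format    string   `json:"format"`
	SegmentID uint64   `json:"segmentID"`
	Level     int      `json:"level"`
	MinKeyHex string   `json:"minKeyHex"`
	MaxKeyHex string   `json:"maxKeyHex"`
	Parents   []uint64 `json:"parents,omitempty"`
}

// saveToFile writes m as indented JSON.
func saveToFile(path string, m *metaSubset) error {
	data, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// loadFromFile reads and decodes a file written by saveToFile.
func loadFromFile(path string) (*metaSubset, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m metaSubset
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

// roundTripInTempDir saves m to segment.json in a temporary directory,
// loads it back, and removes the directory.
func roundTripInTempDir(m *metaSubset) (*metaSubset, error) {
	dir, err := os.MkdirTemp("", "segment-meta")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "segment.json")
	if err := saveToFile(path, m); err != nil {
		return nil, err
	}
	return loadFromFile(path)
}

func main() {
	in := &metaSubset{Format: "example", SegmentID: 42, Level: 1, MinKeyHex: "61", MaxKeyHex: "7a"}
	out, err := roundTripInTempDir(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *out)
}
```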

func (*Metadata) SetKeyRange

func (m *Metadata) SetKeyRange(minKey, maxKey []byte)

SetKeyRange sets the min and max keys.
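The MinKeyHex and MaxKeyHex fields suggest that keys are stored hex-encoded so that arbitrary bytes survive JSON. A minimal sketch of that encoding follows, assuming lowercase hex from encoding/hex; keyRange and its methods are local stand-ins for the documented SetKeyRange, GetMinKey, and GetMaxKey.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// keyRange holds the two hex fields documented on Metadata.
type keyRange struct {
	MinKeyHex string
	MaxKeyHex string
}

// set stores both keys as hex strings.
func (k *keyRange) set(minKey, maxKey []byte) {
	k.MinKeyHex = hex.EncodeToString(minKey)
	k.MaxKeyHex = hex.EncodeToString(maxKey)
}

// getMin decodes the minimum key; it fails if the field is not valid hex.
func (k *keyRange) getMin() ([]byte, error) { return hex.DecodeString(k.MinKeyHex) }

// getMax decodes the maximum key; it fails if the field is not valid hex.
func (k *keyRange) getMax() ([]byte, error) { return hex.DecodeString(k.MaxKeyHex) }

func main() {
	var kr keyRange
	kr.set([]byte("apple"), []byte{0x00, 0xff})
	fmt.Println(kr.MinKeyHex, kr.MaxKeyHex) // 6170706c65 00ff

	b, err := kr.getMin()
	fmt.Printf("%q %v\n", b, err) // "apple" <nil>
}
```

The error return on the getters matters because a hand-edited or corrupted segment.json may contain a string that is not valid hex.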

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader provides read access to an immutable segment.

func NewReader

func NewReader(segmentID uint64, dir string, logger common.Logger, verifyChecksums bool) (*Reader, error)

NewReader creates a new segment reader.

func (*Reader) AllKeys

func (r *Reader) AllKeys() [][]byte

AllKeys returns the list of keys stored in this segment (from keys.dat).

func (*Reader) Bloom added in v1.0.7

func (r *Reader) Bloom() *filters.BloomFilter

Bloom returns the loaded bloom filter (may be nil if not present).

func (*Reader) Close

func (r *Reader) Close() error

Close closes the segment reader.

func (*Reader) Contains

func (r *Reader) Contains(key []byte) bool

Contains checks if a key exists in the segment.

func (*Reader) DecRef added in v1.0.7

func (r *Reader) DecRef()

DecRef decrements the reference count.

func (*Reader) Get

func (r *Reader) Get(key []byte) ([]byte, bool)

Get retrieves a value by key. Note: it currently uses binary search on the keys array instead of LOUDS, for reliability.
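A binary-search lookup over sorted keys can be sketched with sort.Search and bytes.Compare. The sketch assumes keys sorted in byte order with a parallel values slice; the package's in-memory layout is not documented here, and lookup is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// lookup (hypothetical) finds key in keys, which must be sorted in byte
// order, and returns the value at the same index.
func lookup(keys, values [][]byte, key []byte) ([]byte, bool) {
	// sort.Search returns the first index whose key is >= the target.
	i := sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], key) >= 0
	})
	if i < len(keys) && bytes.Equal(keys[i], key) {
		return values[i], true
	}
	return nil, false
}

func main() {
	keys := [][]byte{[]byte("apple"), []byte("banana"), []byte("cherry")}
	values := [][]byte{[]byte("1"), []byte("2"), []byte("3")}

	v, ok := lookup(keys, values, []byte("banana"))
	fmt.Printf("%s %v\n", v, ok) // 2 true

	_, ok = lookup(keys, values, []byte("blueberry"))
	fmt.Println(ok) // false
}
```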

func (*Reader) GetMetadata

func (r *Reader) GetMetadata() *Metadata

GetMetadata returns the segment metadata.

func (*Reader) GetSegmentID

func (r *Reader) GetSegmentID() uint64

GetSegmentID returns the segment ID.

func (*Reader) HasTombstone

func (r *Reader) HasTombstone(key []byte) bool

HasTombstone reports if key is tombstoned in this segment.

func (*Reader) HasTombstoneExact added in v1.1.0

func (r *Reader) HasTombstoneExact(ctx context.Context, key []byte) bool

HasTombstoneExact checks if the exact key is tombstoned, without preloading the full set. It streams tombstones and stops early on match or cancellation.

func (*Reader) IncRef added in v1.0.7

func (r *Reader) IncRef() bool

IncRef increments the reader's reference count. Returns false if the reader is already closed.

func (*Reader) IterateKeys

func (r *Reader) IterateKeys(fn func(k []byte) bool) (int, error)

IterateKeys streams keys from keys.dat sequentially without loading all into memory. It returns the number of keys iterated and the first error encountered.

func (*Reader) IterateTombstones

func (r *Reader) IterateTombstones(fn func(k []byte) bool) (int, error)

IterateTombstones streams tombstone keys from tombstones.dat without building the full in-memory set. It returns the number of tombstones iterated and the first error encountered.

func (*Reader) Iterator

func (r *Reader) Iterator() *SegmentIterator

Iterator returns an iterator for the segment.

func (*Reader) MayContain

func (r *Reader) MayContain(pattern *regexp.Regexp) bool

MayContain returns true if the segment may contain matches for the regex. Uses Bloom filter for literal prefix pruning, then trigram filter; otherwise returns true.
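The trigram stage of this kind of pruning can be sketched as follows. This is an illustration under assumptions, not the package's algorithm: it uses an exact set of byte trigrams as a stand-in for the real trigram filter, takes the literal from regexp.Regexp.LiteralPrefix, and omits the Bloom prefix stage. All helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// trigrams returns every 3-byte window of s.
func trigrams(s string) []string {
	var out []string
	for i := 0; i+3 <= len(s); i++ {
		out = append(out, s[i:i+3])
	}
	return out
}

// buildTrigramSet collects the trigrams of all keys. An exact set never
// gives a false negative, the same guarantee a probabilistic filter offers.
func buildTrigramSet(keys []string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, k := range keys {
		for _, t := range trigrams(k) {
			set[t] = struct{}{}
		}
	}
	return set
}

// mayContain returns false only when some trigram of the pattern's required
// literal is missing from the set, so no key can contain that literal.
func mayContain(set map[string]struct{}, re *regexp.Regexp) bool {
	lit, _ := re.LiteralPrefix()
	for _, t := range trigrams(lit) {
		if _, ok := set[t]; !ok {
			return false
		}
	}
	return true // short or empty literal: nothing to prune on
}

// mayContainPattern compiles pattern and checks it against keys.
func mayContainPattern(keys []string, pattern string) bool {
	return mayContain(buildTrigramSet(keys), regexp.MustCompile(pattern))
}

func main() {
	keys := []string{"user:alice", "user:bob"}
	fmt.Println(mayContainPattern(keys, `user:al[a-z]+`)) // true
	fmt.Println(mayContainPattern(keys, `order:[0-9]+`))  // false
	fmt.Println(mayContainPattern(keys, `[a-z]+`))        // true
}
```

A true result only means the segment cannot be ruled out; the keys still have to be scanned to find actual matches.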

func (*Reader) MinKey added in v1.0.7

func (r *Reader) MinKey() ([]byte, bool)

MinKey returns a copy of the minimum key from metadata if available.

func (*Reader) RegexIterator

func (r *Reader) RegexIterator(pattern *regexp.Regexp) *SegmentIterator

RegexIterator returns an iterator for regex matching.

func (*Reader) Release added in v1.0.7

func (r *Reader) Release()

Release decrements the reference count and closes resources when it reaches zero. It uses an atomic compare-and-swap to prevent race conditions.
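The IncRef and Release contract can be sketched as a compare-and-swap loop over an atomic counter. This is a generic illustration, not the package's code: refCounted is a hypothetical type, and starting the count at 1 for the owner is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounted (hypothetical) closes its resource when the count hits zero.
type refCounted struct {
	refs    int64 // starts at 1 for the owner (assumption)
	closeFn func()
}

func newRefCounted(closeFn func()) *refCounted {
	return &refCounted{refs: 1, closeFn: closeFn}
}

// incRef adds a reference unless the count already reached zero, in which
// case it returns false and the caller must not use the resource.
func (r *refCounted) incRef() bool {
	for {
		n := atomic.LoadInt64(&r.refs)
		if n <= 0 {
			return false
		}
		if atomic.CompareAndSwapInt64(&r.refs, n, n+1) {
			return true
		}
	}
}

// release drops a reference. Only the caller whose swap takes the count
// from 1 to 0 runs closeFn, so it runs at most once.
func (r *refCounted) release() {
	for {
		n := atomic.LoadInt64(&r.refs)
		if n <= 0 {
			return // already closed; ignore the extra release
		}
		if atomic.CompareAndSwapInt64(&r.refs, n, n-1) {
			if n == 1 {
				r.closeFn()
			}
			return
		}
	}
}

func main() {
	r := newRefCounted(func() { fmt.Println("closed") })
	fmt.Println(r.incRef()) // true
	r.release()             // a reference remains, nothing printed
	r.release()             // prints "closed"
	fmt.Println(r.incRef()) // false
}
```

The compare-and-swap in incRef is what prevents a reader from being revived after another goroutine has already dropped the last reference.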

func (*Reader) StreamKeys

func (r *Reader) StreamKeys() (advance func() ([]byte, bool), closeFn func())

StreamKeys provides a streaming closure over keys.dat for k-way merge. Returns an advance function and a close function to release resources early. IMPORTANT: Caller MUST call closeFn when done, preferably with defer.

func (*Reader) StreamTombstones

func (r *Reader) StreamTombstones() (advance func() ([]byte, bool), closeFn func())

StreamTombstones provides a streaming closure over tombstones.dat. Returns an advance function and a close function to release resources early. IMPORTANT: Caller MUST call closeFn when done, preferably with defer.
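The advance and close pair returned by StreamKeys and StreamTombstones lends itself to merge loops. The sketch below merges two sorted streams of that shape. It is an illustration only: sliceStream is a hypothetical in-memory stand-in for a file-backed stream, and mergeTwo shows a two-way merge rather than the package's k-way merge.

```go
package main

import (
	"bytes"
	"fmt"
)

// sliceStream (hypothetical) returns closures with the same shape as the
// documented StreamKeys result, backed by an in-memory slice.
func sliceStream(keys [][]byte) (advance func() ([]byte, bool), closeFn func()) {
	i := 0
	closed := false
	advance = func() ([]byte, bool) {
		if closed || i >= len(keys) {
			return nil, false
		}
		k := keys[i]
		i++
		return k, true
	}
	closeFn = func() { closed = true }
	return advance, closeFn
}

// mergeTwo merges two sorted streams into one sorted slice and emits keys
// present in both streams only once.
func mergeTwo(a, b func() ([]byte, bool)) [][]byte {
	var out [][]byte
	ka, okA := a()
	kb, okB := b()
	for okA || okB {
		switch {
		case !okB || (okA && bytes.Compare(ka, kb) < 0):
			out = append(out, ka)
			ka, okA = a()
		case !okA || bytes.Compare(kb, ka) < 0:
			out = append(out, kb)
			kb, okB = b()
		default: // equal keys: emit once and advance both streams
			out = append(out, ka)
			ka, okA = a()
			kb, okB = b()
		}
	}
	return out
}

func main() {
	advA, closeA := sliceStream([][]byte{[]byte("a"), []byte("c")})
	defer closeA() // release the stream even on early return
	advB, closeB := sliceStream([][]byte{[]byte("b"), []byte("c"), []byte("d")})
	defer closeB()

	for _, k := range mergeTwo(advA, advB) {
		fmt.Printf("%s ", k)
	}
	fmt.Println() // a b c d
}
```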

func (*Reader) Tombstones

func (r *Reader) Tombstones() [][]byte

Tombstones returns the list of tombstoned keys recorded in this segment.

func (*Reader) Trigram added in v1.0.7

func (r *Reader) Trigram() *filters.TrigramFilter

Trigram returns the loaded trigram filter (may be nil if not present).

type SegmentIterator

type SegmentIterator struct {
	// contains filtered or unexported fields
}

SegmentIterator iterates over keys in a segment.

func (*SegmentIterator) Close

func (it *SegmentIterator) Close() error

Close closes the iterator and any streaming resources.

func (*SegmentIterator) Error

func (it *SegmentIterator) Error() error

Error returns any error encountered during iteration.

func (*SegmentIterator) Key

func (it *SegmentIterator) Key() []byte

Key returns the current key.

func (*SegmentIterator) Next

func (it *SegmentIterator) Next() bool

Next advances to the next matching key.

func (*SegmentIterator) Value

func (it *SegmentIterator) Value() []byte

Value returns the current value.
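The five methods above follow the familiar scanner-style loop. The sketch below shows that loop against a local interface copying the documented signatures. sliceIterator is a hypothetical in-memory stand-in, and checking Error after the loop ends is the usual convention for this shape, assumed here rather than documented.

```go
package main

import (
	"fmt"
	"strings"
)

// iterator copies the documented SegmentIterator method set.
type iterator interface {
	Next() bool
	Key() []byte
	Value() []byte
	Error() error
	Close() error
}

// sliceIterator (hypothetical) iterates over in-memory pairs.
type sliceIterator struct {
	keys, values []string
	pos          int // index of the current entry; -1 before the first Next
}

func newSliceIterator(keys, values []string) *sliceIterator {
	return &sliceIterator{keys: keys, values: values, pos: -1}
}

func (it *sliceIterator) Next() bool {
	if it.pos+1 >= len(it.keys) {
		return false
	}
	it.pos++
	return true
}

func (it *sliceIterator) Key() []byte   { return []byte(it.keys[it.pos]) }
func (it *sliceIterator) Value() []byte { return []byte(it.values[it.pos]) }
func (it *sliceIterator) Error() error  { return nil }
func (it *sliceIterator) Close() error  { return nil }

// drain walks the iterator, joins the pairs as "k=v", and reports any
// iteration error. The iterator is closed before drain returns.
func drain(it iterator) (string, error) {
	defer it.Close()
	var parts []string
	for it.Next() {
		parts = append(parts, string(it.Key())+"="+string(it.Value()))
	}
	return strings.Join(parts, ","), it.Error()
}

func main() {
	s, err := drain(newSliceIterator([]string{"a", "b"}, []string{"1", "2"}))
	fmt.Println(s, err) // a=1,b=2 <nil>
}
```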

type TMapHeader

type TMapHeader struct {
	CommonHeader
	TailNodesCount uint64
	OffTailNodesEF uint64
	OffTailIDs     uint64
	FileCRC32C     uint64
}

TMapHeader is the header for tail mapping files.

type TailsHeader

type TailsHeader struct {
	CommonHeader
	BlockSize      uint32
	TailsCount     uint64
	BlocksCount    uint64
	OffBlocksIndex uint64
	OffMap         uint64
	OffFrames      uint64
	FileCRC32C     uint64
}

TailsHeader is the header for tails data files.

type TrigramFilter

type TrigramFilter struct {
	Bits   uint64 `json:"bits"`
	Scheme string `json:"scheme"`
}

TrigramFilter configuration.

type TrigramHeader

type TrigramHeader struct {
	CommonHeader
	Scheme     uint32
	Bits       uint64
	OffBitset  uint64
	FileCRC32C uint64
}

TrigramHeader is the header for trigram filter files.
