ingest

package
v0.6.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 28, 2026 License: MIT Imports: 17 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DetectFormat

func DetectFormat(filename string) (format string, warn bool)

DetectFormat returns the ingestion format based on file extension. Returns "markdown" for .md files, "text" for .txt, and "text" with a warning flag for unknown extensions.

func ExtractDateFromHeader

func ExtractDateFromHeader(header string) string

func FindTyposInMessages

func FindTyposInMessages(messages []db.TextMessage) map[string]string

func NormalizeText

func NormalizeText(text string) string

func RunIngestCLI

func RunIngestCLI(args []string, kisekiDB, ollamaHost, embedModel string)

RunIngestCLI handles the "kiseki ingest" CLI command.

func UpdateTyposFile

func UpdateTyposFile(newTypos map[string]string) error

Types

type ChunkData

type ChunkData struct {
	Text            string
	SourceFile      string
	SectionTitle    string
	HeaderLevel     int
	ParentTitle     string
	SectionSequence int
	ChunkSequence   int
	ChunkTotal      int
	ValidAt         string
}

func ChunkSection

func ChunkSection(section Section, maxWords int) []ChunkData

type IngestResult

type IngestResult struct {
	SectionsFound    int
	ChunksCreated    int
	SubChunksCreated int
	DeletedChunks    int64
}

func IngestContent

func IngestContent(db *sql.DB, embedder ollama.Embedder, content string, sourceName string, validAt string, format string) (IngestResult, error)

IngestContent parses content in the given format and ingests it into the database. sourceName is stored as the source_file for each chunk (typically a file path or label).

func IngestFile

func IngestFile(db *sql.DB, embedder ollama.Embedder, filePath string, validAt string, formats ...string) (IngestResult, error)

type Section

type Section struct {
	Title       string
	HeaderLevel int
	ParentTitle string
	Content     string
	Sequence    int
	ValidAt     string
}

func ParseContent

func ParseContent(content string, sourceName string, format string) []Section

ParseContent dispatches to the appropriate parser based on format. Valid formats: "markdown", "text". Defaults to "markdown" for unknown formats.

func ParseMarkdown

func ParseMarkdown(content string) []Section

func ParsePlainText

func ParsePlainText(content string, sourceName string) []Section

ParsePlainText treats the entire content as a single section. The sourceName (typically the filename) is used as the section title.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL