dksplit

package module
v0.0.0-...-65066b5
Published: Feb 11, 2026 | License: Apache-2.0 | Imports: 9 | Imported by: 0

README

DKSplit-go

⚠️ Security Notice: The only official repositories for this project are ABTdomain/dksplit (Python) and ABTdomain/dksplit-go (Go). We are aware of unauthorized clones that may distribute suspicious files. Please only download from our official repositories.

Go implementation of DKSplit - fast word segmentation for text without spaces.

Built with a BiLSTM-CRF model and ONNX Runtime.
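
A sequence tagger such as a BiLSTM-CRF is commonly used for segmentation by assigning one tag per character and then cutting the text wherever a word-start tag appears. The sketch below shows only that final decoding step. The B/I tag scheme and the `wordsFromTags` helper are assumptions for illustration; this README does not state which scheme DKSplit uses internally.

```go
package main

import "fmt"

// wordsFromTags rebuilds words from per-character tags, where 'B' marks the
// first character of a word and any other tag continues the current word.
// The tag scheme is an assumption, not taken from the DKSplit model.
func wordsFromTags(text string, tags []byte) ([]string, error) {
	runes := []rune(text)
	if len(runes) != len(tags) {
		return nil, fmt.Errorf("got %d tags for %d characters", len(tags), len(runes))
	}
	var words []string
	start := 0
	for i := 1; i < len(runes); i++ {
		if tags[i] == 'B' {
			// A new word starts here, so close the previous one.
			words = append(words, string(runes[start:i]))
			start = i
		}
	}
	if len(runes) > 0 {
		words = append(words, string(runes[start:]))
	}
	return words, nil
}

func main() {
	words, err := wordsFromTags("gutenmorgen", []byte("BIIIIBIIIII"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(words) // [guten morgen]
}
```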

Performance

CPU                    Mode     QPS
Intel Core i9-14900K   Single   ~1,700/s
Intel Core i9-14900K   Batch    ~7,000/s
Intel Core i9-9900K    Single   ~1,000/s
Intel Core i9-9900K    Batch    ~3,000/s

Batch mode is about 3x to 4x faster than single mode, going by the figures above (roughly 4.1x on the i9-14900K and 3x on the i9-9900K).

Compared to the Python version:

  • Single: 2.7x faster
  • Batch: 5.6x faster
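
QPS in the figures above presumably means strings segmented per second. A minimal sketch of how such a number could be measured is shown below. `measureQPS` is a hypothetical helper, and `strings.ToUpper` stands in for a real call such as `splitter.Split`, so the value it prints says nothing about DKSplit itself.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// measureQPS calls fn once per input and returns inputs processed per second.
func measureQPS(inputs []string, fn func(string)) float64 {
	start := time.Now()
	for _, in := range inputs {
		fn(in)
	}
	elapsed := time.Since(start)
	if elapsed <= 0 {
		// Guard against coarse clocks reporting zero elapsed time.
		elapsed = time.Nanosecond
	}
	return float64(len(inputs)) / elapsed.Seconds()
}

func main() {
	inputs := []string{"chatgptlogin", "openaikey", "microsoftoffice"}
	// strings.ToUpper is a placeholder workload, not a segmenter.
	qps := measureQPS(inputs, func(s string) { _ = strings.ToUpper(s) })
	fmt.Printf("%.0f strings/s\n", qps)
}
```

For stable numbers, a much larger input set and several warm-up runs would be needed.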

Install

go get github.com/ABTdomain/dksplit-go

Usage

package main

import (
    "fmt"
    "log"

    dksplit "github.com/ABTdomain/dksplit-go"
)

func main() {
    splitter, err := dksplit.New("models")
    if err != nil {
        log.Fatal(err)
    }
    defer splitter.Close()

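    // Single: segment one string.
    result, err := splitter.Split("chatgptlogin")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result)
    // Output: [chatgpt login]

    // Batch: segment several strings; the second argument is the batch size.
    results, err := splitter.SplitBatch([]string{"openaikey", "microsoftoffice"}, 256)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(results)
    // Output: [[openai key] [microsoft office]]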
}
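
Code that depends on the splitter can be easier to test if it takes a small interface instead of the concrete type. The sketch below assumes only the `Split` signature documented for this package. The `segmenter` interface, `fakeSplitter`, and `spaced` are hypothetical names introduced for illustration and are not part of dksplit.

```go
package main

import (
	"fmt"
	"strings"
)

// segmenter mirrors the Split method documented for *dksplit.Splitter.
// The interface itself is hypothetical; it is not part of the package.
type segmenter interface {
	Split(text string) ([]string, error)
}

// fakeSplitter is a stand-in for illustration only. It looks words up in a
// fixed table instead of running the model.
type fakeSplitter map[string][]string

func (f fakeSplitter) Split(text string) ([]string, error) {
	if words, ok := f[text]; ok {
		return words, nil
	}
	// Unknown input is returned as a single unsplit word.
	return []string{text}, nil
}

// spaced returns the segmented form of text as one space-separated string,
// propagating any error from the splitter instead of discarding it.
func spaced(s segmenter, text string) (string, error) {
	words, err := s.Split(text)
	if err != nil {
		return "", fmt.Errorf("split %q: %w", text, err)
	}
	return strings.Join(words, " "), nil
}

func main() {
	s := fakeSplitter{"chatgptlogin": {"chatgpt", "login"}}
	out, err := spaced(s, "chatgptlogin")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // chatgpt login
}
```

In real use, a value returned by `dksplit.New` should satisfy the same interface, since its `Split` method has this signature.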

Examples

Input               Output
chatgptlogin        chatgpt login
kubernetescluster   kubernetes cluster
microsoftoffice     microsoft office
mercibeaucoup       merci beaucoup
gutenmorgen         guten morgen

Real World Benchmark

Tested on Majestic Million domains:

Input                       Output
amitriptylineinfo           amitriptyline info
autoriteprotectiondonnees   autorite protection donnees
mountaingoatsoftware        mountain goat software
psychologytoday             psychology today
affordablecollegesonline    affordable colleges online
stephenwolfram              stephen wolfram
ralphlauren                 ralphlauren
m12ivermectin               m12i vermectin

Run the benchmark yourself:

wget https://downloads.majestic.com/majestic_million.csv -O top-1m.csv
go test -v -run TestRealWorldBenchmark

Accuracy Benchmark

For detailed accuracy benchmarks on 1,000 real newly registered domains (DKSplit vs WordSegment vs WordNinja vs GPT-5.2), see the Python version benchmark.

The Go and Python versions use the same model and produce identical results.

Results on Intel Core i9-9900K:

  • Dataset: 10,000 unique domains (length > 10, no hyphens)
  • QPS: 3,175/s
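
The dataset criteria above (unique, length > 10, no hyphens) could be applied with a filter along these lines. This is a sketch, not the repository's benchmark code: `filterDomains` is a hypothetical helper, and it assumes the string to segment is the part of the domain before the first dot.

```go
package main

import (
	"fmt"
	"strings"
)

// filterDomains keeps up to limit unique names that are longer than 10
// characters and contain no hyphens, preserving input order. A limit of
// zero or less means no limit.
// Assumption: the name to segment is the part before the first dot; the
// benchmark in the repository may derive it differently.
func filterDomains(domains []string, limit int) []string {
	seen := make(map[string]bool)
	var out []string
	for _, d := range domains {
		name := strings.ToLower(strings.TrimSpace(d))
		if i := strings.IndexByte(name, '.'); i >= 0 {
			name = name[:i]
		}
		if len(name) <= 10 || strings.Contains(name, "-") || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
		if len(out) == limit {
			break
		}
	}
	return out
}

func main() {
	domains := []string{
		"psychologytoday.com",
		"google.com",            // too short
		"my-example-domain.org", // contains hyphens
		"psychologytoday.net",   // duplicate name
		"stephenwolfram.com",
	}
	fmt.Println(filterDomains(domains, 10000)) // [psychologytoday stephenwolfram]
}
```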

Requirements

  • Go 1.21+
  • Linux x64

License

This project is licensed under the Apache License 2.0.

Please attribute as: DKsplit by ABTdomain

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Splitter

type Splitter struct {
	// contains filtered or unexported fields
}

Splitter is the main word segmentation engine.

func New

func New(modelDir string) (*Splitter, error)

New creates a new Splitter instance.

func (*Splitter) Close

func (s *Splitter) Close() error

Close releases resources.

func (*Splitter) Split

func (s *Splitter) Split(text string) ([]string, error)

Split segments a single string into words.

func (*Splitter) SplitBatch

func (s *Splitter) SplitBatch(texts []string, batchSize int) ([][]string, error)

SplitBatch segments multiple strings, grouping them by length for efficiency.
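
Length grouping generally means batching inputs of similar length together so that little padding is needed per batch. The sketch below illustrates the idea under that assumption; it is not the package's actual implementation, and `groupByLength` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// groupByLength returns batches of indices into texts. Indices are ordered by
// string length so that each batch holds strings of similar length, which
// keeps padding small when a batch is fed to a sequence model.
func groupByLength(texts []string, batchSize int) [][]int {
	if batchSize <= 0 {
		batchSize = 1
	}
	idx := make([]int, len(texts))
	for i := range idx {
		idx[i] = i
	}
	// Stable sort keeps equal-length inputs in their original order.
	sort.SliceStable(idx, func(a, b int) bool {
		return len(texts[idx[a]]) < len(texts[idx[b]])
	})
	var batches [][]int
	for start := 0; start < len(idx); start += batchSize {
		end := start + batchSize
		if end > len(idx) {
			end = len(idx)
		}
		batches = append(batches, idx[start:end])
	}
	return batches
}

func main() {
	texts := []string{"microsoftoffice", "openaikey", "chatgptlogin", "gutenmorgen"}
	for _, batch := range groupByLength(texts, 2) {
		for _, i := range batch {
			fmt.Printf("%d:%s ", i, texts[i])
		}
		fmt.Println()
	}
}
```

Because each batch carries original indices, results can be written back to `results[i]` so the output order matches the input order.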
