DKSplit-go
⚠️ Security Notice: The only official repositories for this project are ABTdomain/dksplit (Python) and ABTdomain/dksplit-go (Go). We are aware of unauthorized clones that may distribute suspicious files. Please only download from our official repositories.
Go implementation of DKSplit - fast word segmentation for text without spaces.
Built with a BiLSTM-CRF model and ONNX Runtime.
| CPU | Mode | QPS |
| --- | --- | --- |
| Intel Core i9-14900K | Single | ~1,700/s |
| Intel Core i9-14900K | Batch | ~7,000/s |
| Intel Core i9-9900K | Single | ~1,000/s |
| Intel Core i9-9900K | Batch | ~3,000/s |
Batch mode is 4.6x faster than single mode.
Compared to the Python version:
- Single: 2.7x faster
- Batch: 5.6x faster
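To put the throughput figures above in practical terms, the following sketch converts a QPS value into an estimated wall-clock time for a fixed workload. It is plain arithmetic (inputs divided by rate), not part of the DKSplit API; the helper name `EstimateDuration` and the one-million workload size are illustrative assumptions, and the rates are the approximate values quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// EstimateDuration returns how long n inputs would take at a steady
// rate of qps inputs per second. Hypothetical helper, not part of dksplit-go.
func EstimateDuration(n int, qps float64) time.Duration {
	return time.Duration(float64(n) / qps * float64(time.Second))
}

func main() {
	const n = 1_000_000 // illustrative workload size (assumption)

	// Approximate QPS figures taken from the table above.
	rates := []struct {
		label string
		qps   float64
	}{
		{"i9-14900K single", 1700},
		{"i9-14900K batch", 7000},
		{"i9-9900K single", 1000},
		{"i9-9900K batch", 3000},
	}

	for _, r := range rates {
		d := EstimateDuration(n, r.qps).Round(time.Second)
		fmt.Printf("%-18s %v\n", r.label, d)
	}
}
```

Because the published rates are rounded, the resulting durations should be read as rough estimates only.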
Install
go get github.com/ABTdomain/dksplit-go
Usage
package main

import (
	"fmt"
	"log"

	dksplit "github.com/ABTdomain/dksplit-go"
)

func main() {
	splitter, err := dksplit.New("models")
	if err != nil {
		log.Fatal(err)
	}
	defer splitter.Close()

	// Single
	result, _ := splitter.Split("chatgptlogin")
	fmt.Println(result)
	// Output: [chatgpt login]

	// Batch
	results, _ := splitter.SplitBatch([]string{"openaikey", "microsoftoffice"}, 256)
	fmt.Println(results)
	// Output: [[openai key] [microsoft office]]
}
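The usage example discards the error returned by `Split`. In production code you will probably want to check it. The sketch below shows one possible pattern: fall back to the raw input when segmentation fails. It assumes `Split` has the signature `Split(string) ([]string, error)`, which is inferred from the usage example rather than taken from the library documentation. The `Segmenter` interface, `stubSegmenter`, `newStub`, and `SplitOrFallback` are hypothetical names introduced here so the example runs without model files.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Segmenter mirrors the Split method used in the usage example.
// The signature is an assumption inferred from that example.
type Segmenter interface {
	Split(text string) ([]string, error)
}

// stubSegmenter is a hypothetical stand-in for the real splitter.
// It answers from a fixed lookup table and errors on anything else.
type stubSegmenter struct {
	known map[string][]string
}

func (s stubSegmenter) Split(text string) ([]string, error) {
	words, ok := s.known[text]
	if !ok {
		return nil, errors.New("stub: no segmentation for input")
	}
	return words, nil
}

// newStub builds a stub preloaded with one pair from the Examples table.
func newStub() stubSegmenter {
	return stubSegmenter{known: map[string][]string{
		"chatgptlogin": {"chatgpt", "login"},
	}}
}

// SplitOrFallback returns the space-joined segments, or the raw
// input unchanged when Split reports an error.
func SplitOrFallback(s Segmenter, text string) string {
	words, err := s.Split(text)
	if err != nil {
		return text
	}
	return strings.Join(words, " ")
}

func main() {
	s := newStub()
	fmt.Println(SplitOrFallback(s, "chatgptlogin"))
	fmt.Println(SplitOrFallback(s, "unknowninput"))
}
```

If the real splitter does satisfy this interface, it could be passed to `SplitOrFallback` in place of the stub. Whether a silent fallback or a logged error is the better choice depends on the application.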
Examples
| Input | Output |
| --- | --- |
| chatgptlogin | chatgpt login |
| kubernetescluster | kubernetes cluster |
| microsoftoffice | microsoft office |
| mercibeaucoup | merci beaucoup |
| gutenmorgen | guten morgen |
Real World Benchmark
Tested on Majestic Million domains:
| Input | Output |
| --- | --- |
| amitriptylineinfo | amitriptyline info |
| autoriteprotectiondonnees | autorite protection donnees |
| mountaingoatsoftware | mountain goat software |
| psychologytoday | psychology today |
| affordablecollegesonline | affordable colleges online |
| stephenwolfram | stephen wolfram |
| ralphlauren | ralphlauren |
| m12ivermectin | m12i vermectin |
Run the benchmark yourself:
wget https://downloads.majestic.com/majestic_million.csv -O top-1m.csv
go test -v -run TestRealWorldBenchmark
Accuracy Benchmark
For detailed accuracy benchmarks on 1,000 real newly registered domains (DKSplit vs WordSegment vs WordNinja vs GPT-5.2), see the Python version benchmark.
The Go and Python versions use the same model and produce identical results.
Results on Intel Core i9-9900K:
- Dataset: 10,000 unique domains (length > 10, no hyphens)
- QPS: 3,175/s
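The dataset criteria above (unique, length > 10, no hyphens) can be sketched as a small filter. This is an illustration of those criteria, not the project's actual test code. The function name `FilterLabels` is hypothetical, and it assumes the length check applies to the label you pass in (for example, the domain name with the TLD already removed), which the README does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterLabels keeps unique labels that are longer than minLen bytes
// and contain no hyphen, preserving first-seen order.
// Hypothetical helper; the real benchmark may filter differently.
func FilterLabels(labels []string, minLen int) []string {
	seen := make(map[string]struct{}, len(labels))
	out := make([]string, 0, len(labels))
	for _, l := range labels {
		l = strings.ToLower(strings.TrimSpace(l))
		if len(l) <= minLen || strings.Contains(l, "-") {
			continue // too short or hyphenated
		}
		if _, dup := seen[l]; dup {
			continue // already kept once
		}
		seen[l] = struct{}{}
		out = append(out, l)
	}
	return out
}

func main() {
	in := []string{
		"psychologytoday",
		"google",          // dropped: too short
		"my-long-domain",  // dropped: contains a hyphen
		"psychologytoday", // dropped: duplicate
		"stephenwolfram",
	}
	fmt.Println(FilterLabels(in, 10))
}
```

The surviving labels could then be fed to the splitter in batches to reproduce a throughput measurement of this kind.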
Requirements
Links
License
This project is licensed under the Apache License 2.0.
Please attribute as: DKsplit by ABTdomain