mapreduce

package module
v1.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 25, 2026 License: Apache-2.0 Imports: 11 Imported by: 0

README

mrkit-go

github build status Go Reference

mrkit-go is a lightweight MapReduce framework implemented in Go, ready to use out of the box, with config-driven batch flows for common big-data databases.

Compared with Hadoop-style MapReduce stacks, mrkit-go emphasizes simpler deployment, lower operational overhead on Go service hosts, and faster iteration for small-to-medium batch pipelines. The project is still actively evolving.

Primary path for new users:

  • define source / transform / sink in JSON
  • run ./cmd/batch with that config
  • support MySQL/Redis source and sink combinations

This path deploys MySQL + Redis automatically (Docker), creates required DBs, prepares synthetic data, runs all core flows (seed/m2m/m2r/r2m/r2r), and validates results.

Environment Check (one line):

command -v go docker jq bash awk python3 >/dev/null && echo "env ok" || echo "missing deps"
chmod +x scripts/quickstart.sh
./scripts/quickstart.sh

Expected final output includes:

  • all checks passed
  • MySQL tables row counts and Redis key counts equal to KEY_MOD (default 100)

Default quickstart service ports:

  • MySQL: 127.0.0.1:13306
  • Redis: 127.0.0.1:16379

Useful overrides:

KEY_MOD=200 ROWS=10000 ./scripts/quickstart.sh
MYSQL_PORT=23306 REDIS_PORT=26379 ./scripts/quickstart.sh
GO_BIN=/path/to/go ./scripts/quickstart.sh

Stop quickstart services:

docker rm -f mrkit-quickstart-mysql mrkit-quickstart-redis

Quickstart (Manual, Config-Driven)

If you already have MySQL/Redis and want to run commands manually, follow:

Quickstart (Docker)

Build local image:

docker build -t mrkit-go-batch:local .

Validate config in container:

docker run --rm \
  -v "$(pwd)/example/batch-minimal/flows:/app/flows:ro" \
  mrkit-go-batch:local \
  -check -config /app/flows/smoke/flow.mysql.count.json

Run flow in container (use host.docker.internal for local DB access):

mkdir -p /tmp/mrkit-docker-flow
jq '.source.db.host="host.docker.internal" |
    .source.db.port=13306 |
    .source.db.database="mr_source" |
    .sink.db.host="host.docker.internal" |
    .sink.db.port=13306 |
    .transform.port=22111' \
  example/batch-minimal/flows/smoke/flow.mysql.count.json > /tmp/mrkit-docker-flow/m2m.json

docker run --rm \
  -v "/tmp/mrkit-docker-flow/m2m.json:/app/flow.json:ro" \
  mrkit-go-batch:local \
  -config /app/flow.json

Note: -config uses values from JSON, not MYSQL_* env vars. For full Docker reproducible flow (seed + m2m + m2r + r2m + r2r), use docs/repro-checklist.md.

Performance note: use go run for development checks, and prebuilt binaries (go build then run) for production/performance benchmarking.

Documentation Map

Runtime Observability (M3)

master now exposes a Prometheus endpoint at:

  • default: http://127.0.0.1:<master_port+1000>/metrics (example: master :10000 -> metrics :11000)
  • override: set MR_METRICS_ADDR (example :2112 or 0.0.0.0:2112)

Exposed core metrics:

  • task_total
  • task_retry_total
  • worker_alive
  • stage_duration_seconds

Minimal dashboard JSON:

Example charts:

Throughput chart Worker alive chart

Contributions

Pull requests are always welcome.

Created and improved by Yi-fei Gao. All code is licensed under the Apache License 2.0.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var MasterIP string = ":10000"
View Source
var WorkerAdvertiseHost string

Functions

func ParseArg

func ParseArg() ([]string, string, int, int, bool)

func StartMaster

func StartMaster(input []string, plugin string, nReducer int, nWorker int, inRAM bool)

func StartMasterWithAddr added in v1.0.4

func StartMasterWithAddr(input []string, plugin string, nReducer int, nWorker int, inRAM bool, masterAddr string) error

func StartSingleMachineJob

func StartSingleMachineJob(input []string, plugin string, nReducer int, nWorker int, inRAM bool)

func StartSingleMachineJobWithAddr added in v1.0.4

func StartSingleMachineJobWithAddr(input []string, plugin string, nReducer int, nWorker int, inRAM bool, masterAddr string) error

func StartWorker

func StartWorker(input []string, plugin string, nReducer int, nWorker int, storeInRAM bool)

func StartWorkerWithAddr added in v1.0.4

func StartWorkerWithAddr(input []string, plugin string, nReducer int, nWorker int, storeInRAM bool, masterAddr string) error

Types

This section is empty.

Directories

Path Synopsis
cmd
batch command
legacy/main command
legacy/master command
legacy/worker command
mrapps
agg command
count command
crash command
merge command
minmax command
topn command
wc command
runtime

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL