Rework continuous-improve: always enrich, add deep scan, strengthen web research

daniel
2026-02-22 22:14:18 +00:00
parent 1c8a67349f
commit a8454cf9c4
7 changed files with 50 additions and 28 deletions


@@ -1,16 +1,15 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# continuous-improve.sh — Entity-by-entity continuous improvement loop # continuous-improve.sh — Continuous enrichment and quality loop
# #
# Iterates through every factbase entity, one at a time. Does mechanical # Each cycle: processes every entity (resolve reviews, enrich from outside
# cleanup in bash (fast), then only invokes an agent for entities that # sources), then runs a deep cross-document validation scan.
# actually need review resolution or enrichment.
# #
# Usage: .automate/continuous-improve.sh [options] # Usage: .automate/continuous-improve.sh [options]
# --priority reviews|stale|random Queue ordering (default: reviews) # --priority reviews|stale|random Queue ordering (default: reviews)
# --cycle-delay N Seconds between entities (default: 5) # --cycle-delay N Seconds between entities (default: 5)
# --model MODEL LLM model (default: claude-sonnet-4.6) # --model MODEL LLM model (default: claude-sonnet-4.6)
# --start N Skip first N entities in queue (resume) # --start N Skip first N entities in queue (resume)
# --no-skip Don't skip clean entities (force agent on all) # --skip-unchanged Skip entities unchanged since last pass
set -euo pipefail set -euo pipefail
@@ -19,15 +18,15 @@ PRIORITY="reviews"
CYCLE_DELAY=5 CYCLE_DELAY=5
MODEL="claude-sonnet-4.6" MODEL="claude-sonnet-4.6"
START_AT=0 START_AT=0
SKIP_CLEAN=true SKIP_UNCHANGED=false
while [[ $# -gt 0 ]]; do while [[ $# -gt 0 ]]; do
case "$1" in case "$1" in
--priority) PRIORITY="$2"; shift 2 ;; --priority) PRIORITY="$2"; shift 2 ;;
--cycle-delay) CYCLE_DELAY="$2"; shift 2 ;; --cycle-delay) CYCLE_DELAY="$2"; shift 2 ;;
--model) MODEL="$2"; shift 2 ;; --model) MODEL="$2"; shift 2 ;;
--start) START_AT="$2"; shift 2 ;; --start) START_AT="$2"; shift 2 ;;
--no-skip) SKIP_CLEAN=false; shift ;; --skip-unchanged) SKIP_UNCHANGED=true; shift ;;
*) echo "Usage: $0 [--priority reviews|stale|random] [--cycle-delay N] [--model MODEL] [--start N] [--no-skip]"; exit 1 ;; *) echo "Usage: $0 [--priority reviews|stale|random] [--cycle-delay N] [--model MODEL] [--start N] [--skip-unchanged]"; exit 1 ;;
esac esac
done done
@@ -90,7 +89,7 @@ build_queue() {
local garbage_count local garbage_count
garbage_count=$(grep -ciP '^\[\^.*\b(not a conflict|sequential|boundary overlap|not simultaneous|malformed tag|garbled|artifact|remove)\b' "$file" 2>/dev/null) || true garbage_count=$(grep -ciP '^\[\^.*\b(not a conflict|sequential|boundary overlap|not simultaneous|malformed tag|garbled|artifact|remove)\b' "$file" 2>/dev/null) || true
# Flag person docs with incomplete names (single word, alias, no space) # Flag ruler docs with incomplete names (single word, alias, no space)
local incomplete_name=0 local incomplete_name=0
local parent_dir local parent_dir
parent_dir=$(echo "$file" | sed 's|^\./||' | rev | cut -d/ -f2 | rev) parent_dir=$(echo "$file" | sed 's|^\./||' | rev | cut -d/ -f2 | rev)
@@ -260,11 +259,15 @@ STEPS — work through in order, skip any that do not apply:
tool to rename/move it. Use organize(action='"'"'move'"'"', doc_id=..., to=...) to relocate tool to rename/move it. Use organize(action='"'"'move'"'"', doc_id=..., to=...) to relocate
or update_document(id=..., title=...) to fix the title. or update_document(id=..., title=...) to fix the title.
3. ENRICH: 3. ENRICH FROM OUTSIDE SOURCES:
Search ALL your available tools for new information about this entity — factbase search, This is the most important step. Use web_search to find high-quality information about
web search, whatever you have. Use the entity name, aliases, this entity from scholarly and encyclopedic sources. Search for:
and known associations as search terms. Add any new facts not already present, following - The entity name + "archaeology" or "ancient history"
factbase authoring conventions. - Key events, dates, or relationships mentioned in the document
- Recent archaeological discoveries or revised scholarly consensus
Prefer peer-reviewed sources, university publications, museum databases, and established
encyclopedias. Add any new facts not already present, with source citations, following
factbase authoring conventions. Do NOT add speculative or poorly-sourced claims.
4. IMPROVEMENT IDEAS: 4. IMPROVEMENT IDEAS:
If you notice friction or gaps in factbase tools, file a Vikunja feature request: If you notice friction or gaps in factbase tools, file a Vikunja feature request:
@@ -329,18 +332,16 @@ process_entity() {
fi fi
# Phase 2: Decide if agent is needed # Phase 2: Decide if agent is needed
local needs_agent=false local needs_agent=true
if [[ "${incomplete_name:-0}" -eq 1 ]]; then if [[ "${incomplete_name:-0}" -eq 1 ]]; then
needs_agent=true
log " 👤 Incomplete name (ruler doc) → agent needed to resolve identity" log " 👤 Incomplete name (ruler doc) → agent needed to resolve identity"
elif [[ "$review_count" -gt 0 ]]; then elif [[ "$review_count" -gt 0 ]]; then
needs_agent=true
log " 📋 $review_count review questions → agent needed" log " 📋 $review_count review questions → agent needed"
elif [[ "$SKIP_CLEAN" == true && "$last_processed" -gt 0 && "$mtime" -le "$last_processed" ]]; then elif [[ "$SKIP_UNCHANGED" == true && "$last_processed" -gt 0 && "$mtime" -le "$last_processed" ]]; then
log " ⏭️ No questions, not modified since last pass → skipping agent" needs_agent=false
log " ⏭️ No questions, not modified since last pass → skipping (--skip-unchanged)"
else else
needs_agent=true log " 🔍 Enrichment + review pass"
log " 🔍 Enrichment pass → agent needed"
fi fi
if [[ "$needs_agent" == true ]]; then if [[ "$needs_agent" == true ]]; then
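The Phase 2 change above flips the default: the agent now runs for every entity unless `--skip-unchanged` is set and the file has not been modified since the last pass. A minimal standalone sketch of that decision follows. The function name `needs_agent` and its positional arguments are hypothetical, introduced only for illustration; the real script keeps this logic inline in `process_entity`.

```shell
#!/usr/bin/env bash
# Sketch of the post-commit Phase 2 decision.
# The function name and argument order are hypothetical, not from the script.
set -euo pipefail

# Prints "true" if the agent should run, "false" if the entity is skipped.
needs_agent() {
    local incomplete_name="$1" review_count="$2" skip_unchanged="$3"
    local last_processed="$4" mtime="$5"

    if [[ "$incomplete_name" -eq 1 ]]; then
        echo true    # incomplete ruler name: agent resolves identity
    elif [[ "$review_count" -gt 0 ]]; then
        echo true    # open review questions: agent answers them
    elif [[ "$skip_unchanged" == true && "$last_processed" -gt 0 && "$mtime" -le "$last_processed" ]]; then
        echo false   # opt-in skip for entities untouched since the last pass
    else
        echo true    # default: enrichment + review pass
    fi
}

needs_agent 0 0 false 200 100   # default flags, unchanged entity
needs_agent 0 0 true  200 100   # --skip-unchanged, unchanged entity
needs_agent 0 3 true  200 100   # --skip-unchanged, but review questions are open
```

Under these assumptions the three calls print `true`, `false`, and `true`: skipping only applies when it is requested and nothing else demands the agent.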
@@ -378,10 +379,26 @@ process_entity() {
[[ "$status" == "UPDATED" ]] && return 0 || return 1 [[ "$status" == "UPDATED" ]] && return 0 || return 1
} }
# ═══════════════════════════════════════════
# DEEP CROSS-DOCUMENT SCAN (once per cycle)
# ═══════════════════════════════════════════
run_deep_scan() {
log "🔬 Running deep cross-document validation scan..."
local output
output=$(kiro-cli chat --trust-all-tools --no-interactive --model "$MODEL" \
"Run check_repository with deep_check=true. Review any new issues found — answer what you can, defer what you cannot. Then commit." 2>&1) || {
log "❌ Deep scan agent failed, continuing..."
return 1
}
echo "$output"
do_commit "deep scan: cross-document validation"
log "✅ Deep scan complete"
}
# ═══════════════════════════════════════════ # ═══════════════════════════════════════════
# MAIN LOOP # MAIN LOOP
# ═══════════════════════════════════════════ # ═══════════════════════════════════════════
log "🚀 Starting continuous improvement loop (priority=$PRIORITY, model=$MODEL, start=$START_AT, skip_clean=$SKIP_CLEAN)" log "🚀 Starting continuous improvement loop (priority=$PRIORITY, model=$MODEL, start=$START_AT, skip_unchanged=$SKIP_UNCHANGED)"
log "Docs dir: $DOCS_DIR" log "Docs dir: $DOCS_DIR"
log "State file: $STATE_FILE" log "State file: $STATE_FILE"
log "Press Ctrl+C to stop" log "Press Ctrl+C to stop"
@@ -424,6 +441,8 @@ while true; do
log "" log ""
log "═══ Pass $PASS complete: $PROCESSED processed, $UPDATED updated ═══" log "═══ Pass $PASS complete: $PROCESSED processed, $UPDATED updated ═══"
run_deep_scan
START_AT=0 START_AT=0
log "Looping back to start..." log "Looping back to start..."
sleep "$CYCLE_DELAY" sleep "$CYCLE_DELAY"
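One interaction worth checking: the script runs under `set -euo pipefail`, and `run_deep_scan` returns 1 on agent failure while logging "continuing...". In bash, an unguarded call to a function that returns non-zero normally terminates the script under `errexit`, so the main loop may need a guard at the call site. The sketch below shows the guarded form with a hypothetical stand-in function, `failing_scan`; it is an illustration of the pattern, not the commit's code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for run_deep_scan that always fails.
failing_scan() {
    echo "scan failed"
    return 1
}

# Guarded call: the non-zero status is absorbed by `||`,
# so errexit does not end the loop.
passes=0
for _ in 1 2 3; do
    failing_scan || echo "deep scan returned non-zero, continuing"
    passes=$((passes + 1))
done
echo "completed $passes passes"
```

Applied to the main loop, the same shape would be `run_deep_scan || true` (or an `||` branch that logs), which keeps the "continuing..." message accurate.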

Binary file not shown.

Binary file not shown.

Binary file not shown.


@@ -26,12 +26,13 @@ The Battle of Adrianople (378 CE) was a catastrophic Roman defeat in which the V
--- ---
[^1]: Ammianus Marcellinus, *Res Gestae* 31.12–13 [^1]: Ammianus Marcellinus, *Res Gestae* 31.12–13
[^2]: Burns, T.S. *Barbarians Within the Gates of Rome* (1994) [^2]: Burns, T.S. *Barbarians Within the Gates of Rome* (1994)---
---
## Review Queue ## Review Queue
<!-- factbase:review --> <!-- factbase:review -->
- [x] `@q[missing]` Line 10: "Date: 9 August 378 CE @t[=0378]" - what is the source?
> Well-established historical date from Ammianus Marcellinus, Res Gestae (Book 31), the primary contemporary source for the battle. Also corroborated by later sources including Orosius and Zosimus.
- [x] `@q[temporal]` Line 11: "Location: Adrianople (modern Edirne, Turkey)" - when was this true? - [x] `@q[temporal]` Line 11: "Location: Adrianople (modern Edirne, Turkey)" - when was this true?
> Static historical fact. Battle occurred 9 August 378 CE. No temporal tag needed. > Static historical fact. Battle occurred 9 August 378 CE. No temporal tag needed.
- [x] `@q[temporal]` Line 12: "Belligerents: Eastern Roman Empire vs. Visigoths" - when was this true? - [x] `@q[temporal]` Line 12: "Belligerents: Eastern Roman Empire vs. Visigoths" - when was this true?


@@ -27,12 +27,13 @@ Pompeii was a Roman city near modern Naples, Italy, buried by the eruption of Mo
--- ---
[^1]: Beard, M. *Pompeii: The Life of a Roman Town* (2008) [^1]: Beard, M. *Pompeii: The Life of a Roman Town* (2008)
[^2]: Wallace-Hadrill, A. *Houses and Society in Pompeii and Herculaneum* (1994) [^2]: Wallace-Hadrill, A. *Houses and Society in Pompeii and Herculaneum* (1994)---
---
## Review Queue ## Review Queue
<!-- factbase:review --> <!-- factbase:review -->
- [x] `@q[missing]` Line 11: "Destroyed: 24 August 79 CE (eruption of Vesuvius) @t[=0079]" - what is the source?
> Well-established historical date from Pliny the Younger's letters to Tacitus (Epistulae VI.16 and VI.20), the primary eyewitness account. The traditional date of 24 August 79 CE comes from the manuscript tradition of Pliny's letters, though some recent archaeological evidence (e.g. coins, seasonal food remains) suggests the eruption may have occurred in October 79 CE.
- [x] `@q[temporal]` Line 10: "Location: Near modern Naples, Campania, Italy" - when was this true? - [x] `@q[temporal]` Line 10: "Location: Near modern Naples, Campania, Italy" - when was this true?
> Static historical fact. No temporal tag needed. > Static historical fact. No temporal tag needed.
- [x] `@q[temporal]` Line 12: "Population at destruction: ~11,000–20,000" - when was this true? - [x] `@q[temporal]` Line 12: "Population at destruction: ~11,000–20,000" - when was this true?


@@ -30,12 +30,13 @@ The Roman road network was one of the greatest engineering achievements of the a
--- ---
[^1]: Laurence, R. *The Roads of Roman Italy* (Routledge, 1999) [^1]: Laurence, R. *The Roads of Roman Italy* (Routledge, 1999)
[^2]: Chevallier, R. *Roman Roads* (University of California Press, 1976) [^2]: Chevallier, R. *Roman Roads* (University of California Press, 1976)---
---
## Review Queue ## Review Queue
<!-- factbase:review --> <!-- factbase:review -->
- [x] `@q[conflict]` Line 22: Cross-check with Roman Roads: Via Appia (312 BCE): "Queen of Roads," Rome to Brindisi — Fact 1 states Via Appia went from Rome to Capua, but Fact 4 states it went from Rome to Brindisi. These are contradictory endpoints for the same road.
> Not a conflict. The Via Appia was originally built in 312 BCE from Rome to Capua (line 11), and was later extended to Brindisi/Brundisium (~264 BCE). Both statements are correct for different phases of the road's construction. Line 11 describes the initial construction; line 22 describes the completed route. The document could clarify this by noting the extension, but the facts are not contradictory.
- [x] `@q[temporal]` Line 10: "Total network: ~400,000 km (80,000 km paved)" - when was this true? - [x] `@q[temporal]` Line 10: "Total network: ~400,000 km (80,000 km paved)" - when was this true?
> Static historical fact. No temporal tag needed. > Static historical fact. No temporal tag needed.
- [x] `@q[temporal]` Line 11: "First major road: Via Appia (312 BCE), Rome to Capua" - when was this true? - [x] `@q[temporal]` Line 11: "First major road: Via Appia (312 BCE), Rome to Capua" - when was this true?