ADR-07: Source Abstraction Interface
🇰🇷 한국어 버전
| Date | Author | Repos |
|---|---|---|
| 2025-12-23 | @KubrickCode | core |
Status: Accepted
Context
Problem Statement
The Scanner needs to parse test files from multiple data sources:
- Local filesystem: Direct access for CLI tools and local development
- Remote Git repositories: Clone and scan for web service integration
- Future sources: S3 buckets, GitLab API, or other storage backends
Original Design Issue
Scanner was tightly coupled to filesystem paths:
// Before: Scanner directly accessed filesystem
func (s *Scanner) Scan(rootPath string) (*ScanResult, error) {
filepath.WalkDir(rootPath, func(path string, ...) {
content, _ := os.ReadFile(path) // Direct filesystem dependency
// ...
})
}Problems:
- Adding remote repository support required modifying Scanner internals
- No clean resource cleanup mechanism for temporary clones
- Difficult to unit test without real filesystem
Requirements
- Source Independence: Scanner should work with any data source without modification
- Resource Management: Clean up temporary resources (cloned repos) reliably
- Security: Prevent path traversal attacks
- Testability: Enable mock sources for unit testing
Decision
Define a Source interface to abstract data source access.
The Scanner receives a Source implementation and reads files through it, regardless of where files physically reside.
type Source interface {
Root() string
Open(ctx context.Context, path string) (io.ReadCloser, error)
Stat(ctx context.Context, path string) (fs.FileInfo, error)
Close() error
}Options Considered
Option A: Source Interface Abstraction (Selected)
Introduce an interface that abstracts file access, with implementations for different source types.
Pros:
- Source-agnostic scanning: Add new sources without Scanner changes
- Clear resource lifecycle:
Close()ensures cleanup (temp directories, connections) - Testability: Mock sources for unit tests
- Extensibility: Future sources (S3, GitLab) fit the same interface
Cons:
- Additional abstraction layer
- GitSource requires full clone (no partial file access)
Option B: Embed Source Logic in Scanner
Scanner handles local and remote access internally.
Pros:
- Simple initial implementation
- No abstraction overhead
Cons:
- SRP violation: Scanner handles both parsing and source access
- Limited extensibility: New source types require Scanner modification
- Testing difficulty: Requires real filesystem or network
Option C: Direct Remote API Access
Use GitHub/GitLab APIs to fetch files on-demand without cloning.
Pros:
- No local clone needed
- Efficient for single-file access
Cons:
- API rate limits: Large repos hit limits quickly
- No directory traversal: APIs don't support efficient tree walking
- Complex authentication: Each provider needs custom implementation
- Inconsistent state: Files may change between requests
Interface Design
Core Interface
type Source interface {
// Root returns the base path for the source.
Root() string
// Open opens a file relative to Root for reading.
// Caller must close the returned ReadCloser.
Open(ctx context.Context, path string) (io.ReadCloser, error)
// Stat returns file metadata relative to Root.
Stat(ctx context.Context, path string) (fs.FileInfo, error)
// Close releases resources held by the source.
// Idempotent: safe to call multiple times.
Close() error
}Design Principles
- Standard Go interfaces: Uses
io.ReadCloserandfs.FileInfofor compatibility - Context support: All I/O operations support cancellation
- Relative paths: All operations use paths relative to
Root() - Caller responsibility: Caller must close both file handles and the source itself
Sentinel Errors
var (
ErrInvalidPath = errors.New("source: invalid path")
ErrGitCloneFailed = errors.New("source: git clone failed")
ErrRepositoryNotFound = errors.New("source: repository not found")
)Implementations
LocalSource
Direct filesystem access with security measures.
type LocalSource struct {
root string
}Features:
- Path validation: Rejects paths escaping root directory
- Symlink safety: Resolves symlinks and validates final path
- No-op Close: No resources to release
Security: resolvePath() prevents directory traversal:
func (s *LocalSource) resolvePath(path string) (string, error) {
fullPath := filepath.Join(s.root, filepath.Clean(path))
// Block path escape via ".."
if !strings.HasPrefix(fullPath, s.root+string(filepath.Separator)) {
return "", ErrInvalidPath
}
// Block symlink escape
resolvedPath, _ := filepath.EvalSymlinks(fullPath)
resolvedRoot, _ := filepath.EvalSymlinks(s.root)
if !strings.HasPrefix(resolvedPath, resolvedRoot+string(filepath.Separator)) {
return "", ErrInvalidPath
}
return fullPath, nil
}GitSource
Remote repository access via shallow clone.
type GitSource struct {
local *LocalSource // Delegates file access
tempDir string
commitSHA string
branch string
}Features:
- Shallow clone:
--depth 1 --single-branchfor minimal transfer - Temporary directory: Auto-cleaned on
Close() - Credential safety: Strips credentials from error messages
- Metadata access: Provides
CommitSHA()andBranch()
Clone Options:
type GitOptions struct {
Branch string // Target branch (default: default branch)
Depth int // Clone depth (default: 1)
Credentials *GitCredentials // Optional authentication
}Delegation Pattern: After cloning, GitSource delegates all file operations to an internal LocalSource:
func (s *GitSource) Open(ctx context.Context, path string) (io.ReadCloser, error) {
return s.local.Open(ctx, path)
}Usage in Scanner
func (s *Scanner) Scan(ctx context.Context, src source.Source) (*ScanResult, error) {
rootPath := src.Root()
// Discovery still walks local filesystem
// (GitSource clones to temp directory first)
testFiles := s.discoverTestFiles(ctx, src)
for _, file := range testFiles {
content, _ := readFileFromSource(ctx, src, file)
// Parse content...
}
return result, nil
}
// Caller is responsible for cleanup
func main() {
src, _ := source.NewGitSource(ctx, repoURL, nil)
defer src.Close() // Removes temp directory
result, _ := scanner.Scan(ctx, src)
}Consequences
Positive
Source Independence
- Scanner works with LocalSource, GitSource, or future implementations
- Adding S3Source or GitLabSource requires no Scanner changes
Clear Resource Management
defer src.Close()pattern ensures cleanup- GitSource removes cloned temp directory
- Prevents resource leaks in long-running services
Improved Testability
- Unit tests can use mock Source implementations
- No real filesystem or network required for Scanner tests
Security by Design
- Path traversal prevention built into LocalSource
- Credential sanitization in GitSource error messages
Negative
Full Clone Required
- GitSource must clone entire repository (shallow, but still full tree)
- Cannot access single file without cloning
- Mitigation: Shallow clone (
--depth 1) minimizes transfer
Directory Discovery Still Local
discoverTestFilesusesfilepath.WalkDiron local filesystem- Works because GitSource clones to temp directory
- Mitigation: Acceptable trade-off; alternatives have worse limitations
Trade-off Summary
| Aspect | Source Interface | Direct Filesystem | Remote API |
|---|---|---|---|
| Extensibility | Excellent | Poor | Moderate |
| Resource safety | Excellent | Poor | Good |
| Testability | Excellent | Poor | Good |
| Performance | Good | Best | Variable |
| Implementation | Moderate | Simple | Complex |
