Merged
26 commits
d7b27bf
docs: clarify Fory codec naming guidance
chaokunyang Jun 11, 2026
375750d
docs: add deserialization security model
chaokunyang Jun 11, 2026
241400e
docs: clarify deserialization security classification
chaokunyang Jun 11, 2026
d2f651c
docs: allow bounded skip materialization
chaokunyang Jun 11, 2026
9fb97ff
docs: specify stream byte read policy
chaokunyang Jun 11, 2026
f32bb51
docs: refine stream result allocation policy
chaokunyang Jun 11, 2026
d9d8adc
docs: refine stream byte allocation policy
chaokunyang Jun 11, 2026
8b67fc6
docs: generalize primitive wire array reads
chaokunyang Jun 11, 2026
2be1054
docs: use readability checks for stream arrays
chaokunyang Jun 11, 2026
e654c2e
docs: simplify readability check policy
chaokunyang Jun 11, 2026
8907cc8
docs: align container preallocation checks
chaokunyang Jun 11, 2026
9d817ab
fix: validate deserialization sizes before allocation
chaokunyang Jun 11, 2026
28cd44c
fix: validate readable bytes before allocation
chaokunyang Jun 11, 2026
b106eb9
fix: validate readable bytes before preallocation
chaokunyang Jun 11, 2026
8af8956
style: fix security docs and js lint
chaokunyang Jun 11, 2026
da1d794
perf: keep java read checks on fast paths
chaokunyang Jun 12, 2026
3ed325b
fix: bound stream fill buffer growth
chaokunyang Jun 12, 2026
1a301e7
fix: validate utf16 byte strings
chaokunyang Jun 12, 2026
a2f937c
fix: avoid reserving after pending read errors
chaokunyang Jun 12, 2026
0ad210d
fix: keep Go slice byte checks local
chaokunyang Jun 12, 2026
417ae62
fix: bound Java stream preallocation
chaokunyang Jun 12, 2026
b0bebbd
update .agents/languages/java.md
chaokunyang Jun 12, 2026
6720b79
fix: double grow stream fill buffers
chaokunyang Jun 12, 2026
45caebf
fix: remove stream-backed buffer type check
chaokunyang Jun 12, 2026
3e1b192
fix(java): align stream buffer fast paths
chaokunyang Jun 12, 2026
7cac03d
docs: link deserialization security guidance
chaokunyang Jun 12, 2026
4 changes: 4 additions & 0 deletions .agents/languages/java.md
@@ -22,6 +22,10 @@ Load this file when changing anything under `java/` or when Java drives a cross-
- Do not add normal-JVM process-global caches keyed by user classes, generated classes, serializer classes, classloaders, or class-bound method handles. Prefer per-runtime state, immutable shared metadata, or build-time-only template data.
- Concrete serializers may opt into sharing only after auditing retained fields. Treat serializers retaining `TypeResolver`, `RefResolver`, mutable scratch buffers, runtime state, or classloader-sensitive state as non-shareable unless that state is externalized.
- Resolver and serializer hot paths should keep the fast-path/null-slow-path shape obvious. Hoist repeated buffer or cache-state access into locals for multi-step operations and keep rebuild/restoration logic cold.
+ - Do not use `instanceof` in Java hot paths, including per-value, per-field, per-element,
+ read/write/copy, resolver, serializer, codec, and buffer paths. Choose concrete
+ implementations during cold setup or code generation, cache final/static-final shape decisions,
+ or move type checks behind cold one-time dispatch instead.
- Hot-path feature gates that are runtime constants must be `static final` fields read directly in
the branch. Do not hide them behind helper methods such as `jdkInternalFieldAccess()`, because
that obscures branch folding and can leave avoidable call/inlining work in hot serializers.
4 changes: 3 additions & 1 deletion AGENTS.md
@@ -9,6 +9,7 @@ This is the entry point for AI guidance in Apache Fory. Read this file first, th
- `.agents/docs-and-formatting.md`: documentation, specification, and markdown rules.
- `.agents/ci-and-pr.md`: CI triage, PR expectations, and commit conventions.
- `.agents/testing/integration-tests.md`: `integration_tests/` prerequisites, regeneration rules, and commands.
+ - `docs/security/deserialization.md`: security boundaries for untrusted deserialization classification.
- `.agents/languages/java.md`
- `.agents/languages/csharp.md`
- `.agents/languages/cpp.md`
@@ -27,10 +28,11 @@ This is the entry point for AI guidance in Apache Fory. Read this file first, th
- Preserve architecture. Do not introduce new layers, parallel flows, or public APIs unless explicitly requested; prefer local repair in the existing owner over shared-infra expansion, and stop if a fix conflicts with an ADR, spec, or invariant.
- Respect ownership. Keep logic, state, and helpers in their natural owner, and do not move serializer-local, context-local, runtime-type-local, or protocol-local problems into global utilities.
- Check the spec before implementation. For wire behavior and xlang mapping, use the specs as the source of truth and never copy one runtime's bug into another runtime just to make tests pass.
+ - For untrusted deserialization, read `docs/security/deserialization.md` before changing allocation, stream filling, skip, reference, metadata, or policy validation behavior. Variable-length deserialization must not allocate or reserve from attacker-declared lengths or counts before the byte owner has proven proportional readable bytes with `checkReadableBytes` or the runtime equivalent.
- Reject semantic hacks. Do not bypass broken semantics by deleting cases, simplifying callers, adding coercion hooks, or using workaround fallbacks; fix the underlying bug and prove it with focused tests.
- Protect hot paths. Avoid per-call allocations, callback objects, result tuples or records, unnecessary runtime branches, and wrapper-class substitutions in hot codec/runtime paths; prefer conditional imports and allocation-free concrete implementations where they fit the language.
- Keep public APIs minimal. Public APIs must match user ownership and mental model, not internal implementation details; generated flows stay type-owned, while manual serializer registration stays explicit.
- - Use semantic naming only. Name things after protocol or domain concepts, not history, runtime origin, or workaround style; avoid vague names such as `Internal`, `java_style_*`, `Runtime`, `Session`, `Plan`, or `Binding` when they do not name the real concept. Keep class, method, function, and variable names concise; do not encode the whole scenario or implementation history into one identifier. Never name a class or method with a `Plan` suffix; use the real domain concept instead.
+ - Use semantic naming only. Name things after protocol or domain concepts, not history, runtime origin, or workaround style; avoid vague names such as `Internal`, `java_style_*`, `Runtime`, `Session`, `Plan`, `Payload`, or `Binding` when they do not name the real concept. Keep class, method, function, and variable names concise; do not encode the whole scenario or implementation history into one identifier. Never name a class or method with a `Plan` suffix; use the real domain concept instead. For Fory codec/read APIs, do not use generic `payload` naming; name the exact owner and data shape, such as bytes, body, frame, field, string, list, map, compressed bytes, or primitive-array encoding.
- Keep one implementation path. Do not keep parallel helpers, serializers, harnesses, wrappers, or registration flows for the same concept; extend the existing owner path instead of inventing another one.
- Follow current scope exactly. The latest explicit user instruction overrides earlier plans, and when scope narrows, remove leaked out-of-scope edits immediately.
- Preserve user corrections. When a user corrects code behavior, ownership, invariants, or review feedback in a way that should prevent repeat mistakes, encode the corrected rule where future agents will see it: prefer the nearest source comment for non-obvious code invariants, or the owning docs/spec for user-visible or protocol behavior. If the correction changes API usage, defaults, generated output, tests, or cross-runtime behavior, update the matching docs, examples, or source comments in the same task so future agents do not repeat the violation. Keep the note concise, English-only, and avoid comments that merely restate obvious code.
8 changes: 7 additions & 1 deletion cpp/fory/meta/meta_string.cc
@@ -273,8 +273,11 @@ MetaStringTable::read_string(Buffer &buffer, const MetaStringDecoder &decoder) {
return Unexpected(std::move(error));
}
(void)hash_code; // hash_code is only used for Java-side caching.
- bytes.resize(len);
if (len > 0) {
+ if (FORY_PREDICT_FALSE(!buffer.ensure_readable(len, error))) {
+ return Unexpected(std::move(error));
+ }
+ bytes.resize(len);
buffer.read_bytes(bytes.data(), len, error);
if (FORY_PREDICT_FALSE(!error.ok())) {
return Unexpected(std::move(error));
@@ -294,6 +297,9 @@ MetaStringTable::read_string(Buffer &buffer, const MetaStringDecoder &decoder) {
uint8_t enc_byte = static_cast<uint8_t>(enc_byte_res);
FORY_TRY(enc, to_meta_encoding(enc_byte));
encoding = enc;
+ if (FORY_PREDICT_FALSE(!buffer.ensure_readable(len, error))) {
+ return Unexpected(std::move(error));
+ }
bytes.resize(len);
buffer.read_bytes(bytes.data(), len, error);
if (FORY_PREDICT_FALSE(!error.ok())) {
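Reviewer note: both `read_string` hunks apply the AGENTS.md rule that a variable-length read must prove readable bytes before allocating from a declared length. A minimal, self-contained sketch of that ordering follows. `ByteReader` and `read_declared_bytes` are hypothetical names used only for illustration and are not Fory's `Buffer` API; only the sequence (check, then resize, then copy) mirrors the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a read buffer; NOT Fory's Buffer class.
struct ByteReader {
  const std::uint8_t *data;
  std::size_t size;
  std::size_t pos;

  ByteReader(const std::uint8_t *d, std::size_t n) : data(d), size(n), pos(0) {}

  // True when at least `n` unread bytes remain.
  bool ensure_readable(std::size_t n) const {
    assert(pos <= size);
    return n <= size - pos;
  }
};

// Reads `declared_len` bytes into `out`. The allocation happens only after
// the reader has proven that many bytes are actually available, so a forged
// length cannot trigger a large resize.
inline bool read_declared_bytes(ByteReader &reader, std::uint32_t declared_len,
                                std::vector<std::uint8_t> &out) {
  if (!reader.ensure_readable(declared_len)) {
    return false; // rejected before any allocation; `out` is left untouched
  }
  out.resize(declared_len);
  if (declared_len > 0) {
    std::memcpy(out.data(), reader.data + reader.pos, declared_len);
  }
  reader.pos += declared_len;
  return true;
}
```

With a 4-byte input, a declared length of `0xFFFFFFFF` is rejected without resizing `out`, while a declared length of 3 succeeds and advances `pos` to 3.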
33 changes: 18 additions & 15 deletions cpp/fory/serialization/array_serializer.h
@@ -145,11 +145,12 @@ struct Serializer<
return std::array<T, N>();
}

- uint32_t length = size_bytes / sizeof(T);
- if (length != N) {
- ctx.set_error(Error::invalid_data("Array size mismatch: expected " +
- std::to_string(N) + " but got " +
- std::to_string(length)));
+ constexpr size_t expected_bytes = N * sizeof(T);
+ if (static_cast<size_t>(size_bytes) != expected_bytes) {
+ ctx.set_error(Error::invalid_data("Array byte size mismatch: expected " +
+ std::to_string(expected_bytes) +
+ " but got " +
+ std::to_string(size_bytes)));
return std::array<T, N>();
}

@@ -368,11 +369,12 @@ template <size_t N> struct Serializer<std::array<float16_t, N>> {
if (FORY_PREDICT_FALSE(ctx.has_error())) {
return std::array<float16_t, N>();
}
- uint32_t length = size_bytes / sizeof(float16_t);
- if (length != N) {
- ctx.set_error(Error::invalid_data("Array size mismatch: expected " +
- std::to_string(N) + " but got " +
- std::to_string(length)));
+ constexpr size_t expected_bytes = N * sizeof(float16_t);
+ if (static_cast<size_t>(size_bytes) != expected_bytes) {
+ ctx.set_error(Error::invalid_data("Array byte size mismatch: expected " +
+ std::to_string(expected_bytes) +
+ " but got " +
+ std::to_string(size_bytes)));
return std::array<float16_t, N>();
}
std::array<float16_t, N> arr;
@@ -480,11 +482,12 @@ template <size_t N> struct Serializer<std::array<bfloat16_t, N>> {
if (FORY_PREDICT_FALSE(ctx.has_error())) {
return std::array<bfloat16_t, N>();
}
- uint32_t length = size_bytes / sizeof(bfloat16_t);
- if (length != N) {
- ctx.set_error(Error::invalid_data("Array size mismatch: expected " +
- std::to_string(N) + " but got " +
- std::to_string(length)));
+ constexpr size_t expected_bytes = N * sizeof(bfloat16_t);
+ if (static_cast<size_t>(size_bytes) != expected_bytes) {
+ ctx.set_error(Error::invalid_data("Array byte size mismatch: expected " +
+ std::to_string(expected_bytes) +
+ " but got " +
+ std::to_string(size_bytes)));
return std::array<bfloat16_t, N>();
}
std::array<bfloat16_t, N> arr;
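Reviewer note: one likely reason for comparing byte sizes instead of a derived element count is that integer division truncates. A `size_bytes` value with a few stray trailing bytes still yields `length == N` under the old check. The sketch below contrasts the two checks. `matches_by_element_count` and `matches_by_byte_size` are hypothetical helpers written for this note and do not exist in Fory.

```cpp
#include <cstddef>
#include <cstdint>

// Old-style check: derive an element count by integer division, then compare.
// Division truncates, so trailing bytes that do not form a whole element are
// silently ignored.
template <typename T, std::size_t N>
constexpr bool matches_by_element_count(std::uint32_t size_bytes) {
  return size_bytes / sizeof(T) == N;
}

// New-style check: compare the declared byte size against the exact expected
// byte size, so any remainder is rejected.
template <typename T, std::size_t N>
constexpr bool matches_by_byte_size(std::uint32_t size_bytes) {
  return static_cast<std::size_t>(size_bytes) == N * sizeof(T);
}

// Four 4-byte elements occupy exactly 16 bytes; both checks accept that.
static_assert(matches_by_element_count<std::uint32_t, 4>(16), "exact size");
static_assert(matches_by_byte_size<std::uint32_t, 4>(16), "exact size");

// 18 bytes is malformed: 18 / 4 truncates to 4, so the old check accepts it,
// while the byte-size check rejects it.
static_assert(matches_by_element_count<std::uint32_t, 4>(18), "truncation");
static_assert(!matches_by_byte_size<std::uint32_t, 4>(18), "remainder caught");
```

Because `N` and `sizeof(T)` are compile-time constants, the expected byte size folds to a constant, which matches the `constexpr size_t expected_bytes` used in the diff.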