No description
  • C++ 97.6%
  • Python 1.5%
  • CMake 0.8%
Find a file
MohamedElashri a6c232d5aa
Add sanad.lock.json and update GitHub Actions to use specific action versions
Authored-by: Mohamed Elashri <mail@elashri.com>
2026-06-20 11:12:55 +02:00
.github Add sanad.lock.json and update GitHub Actions to use specific action versions 2026-06-20 11:12:55 +02:00
cmake Complete re-write with Awkward compatibility WIP. 2026-06-20 10:50:24 +02:00
docs Update README and compatibility documentation 2026-06-20 11:00:23 +02:00
examples Complete re-write with Awkward compatibility WIP. 2026-06-20 10:50:24 +02:00
include Complete re-write with Awkward compatibility WIP. 2026-06-20 10:50:24 +02:00
scripts Add single header file usage and update docs. 2026-06-20 11:12:26 +02:00
tests Complete re-write with Awkward compatibility WIP. 2026-06-20 10:50:24 +02:00
.gitignore Complete re-write with Awkward compatibility WIP. 2026-06-20 10:50:24 +02:00
awkward.h Add single header file usage and update docs. 2026-06-20 11:12:26 +02:00
CMakeLists.txt Add single header file usage and update docs. 2026-06-20 11:12:26 +02:00
LICENSE Create LICENSE 2023-09-07 09:36:31 +00:00
Makefile Complete re-write with Awkward compatibility WIP. 2026-06-20 10:50:24 +02:00
README.md Add single header file usage and update docs. 2026-06-20 11:12:26 +02:00

jaggedcpp

jaggedcpp is a small header-only C++20 library for jagged arrays: arrays whose rows can have different lengths. Its primary interface is the Awkward Array-inspired ak::* API, built around immutable layouts, validated buffers, checked access, and simple CMake packaging.

Requirements

  • C++20 compiler.
  • CMake 3.16 or newer.
  • No third-party runtime or test dependencies.

Build And Test

cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
ctest --test-dir build --output-on-failure

Enable AddressSanitizer and UndefinedBehaviorSanitizer with GCC or Clang:

cmake -S . -B build-sanitized -DCMAKE_BUILD_TYPE=Debug -DJAGGEDCPP_ENABLE_SANITIZERS=ON
cmake --build build-sanitized
ctest --test-dir build-sanitized --output-on-failure

Awkward compatibility work is tracked in docs/awkward-compatibility.md. Optional Python golden fixture generation is gated behind Awkward 2.9.1:

python scripts/generate_awkward_goldens.py --output-dir build/awkward-goldens
cmake -S . -B build-golden -DJAGGEDCPP_AWKWARD_GOLDEN_TESTS=ON
ctest --test-dir build-golden --output-on-failure

The Makefile is a thin wrapper around the same workflow:

make
make test

Existing jagged::Array<T> users can follow the migration guide to move to the primary ak::* API.

Installation & Usage

Because jaggedcpp is a header-only library, we distribute a single-header version that is incredibly easy to use.

1. Single Header (Easiest)

The absolute simplest way to use jaggedcpp is to download the awkward.h file from the root of this repository and drop it directly into your project's source tree.

You do not need to configure CMake or install anything system-wide.

#include "awkward.h"

int main() {
    auto array = ak::from_iter({1, 2, 3});
    // ...
}

Note for developers: If you modify the library's internal headers, you can regenerate this file by running python scripts/amalgamate.py.

2. FetchContent (Modern CMake)

If you prefer to manage dependencies via CMake, FetchContent will automatically download the library at configure time. Examples and tests are automatically disabled.

include(FetchContent)

FetchContent_Declare(
    jaggedcpp
    GIT_REPOSITORY https://github.com/scikit-hep/jaggedcpp.git
    GIT_TAG        main # or a specific version tag
)
FetchContent_MakeAvailable(jaggedcpp)

# Link to your target
target_link_libraries(my_target PRIVATE awkward::awkward)

3. Git Submodule & add_subdirectory

If you vendor your dependencies or use git submodules:

git submodule add https://github.com/scikit-hep/jaggedcpp.git third_party/jaggedcpp
add_subdirectory(third_party/jaggedcpp)

# Link to your target
target_link_libraries(my_target PRIVATE awkward::awkward)

4. System Install

You can build and install the package system-wide (or to a specific prefix) and find it using find_package:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build --prefix /path/to/my/prefix

Then in your CMakeLists.txt:

find_package(jaggedcpp CONFIG REQUIRED)
target_link_libraries(my_target PRIVATE awkward::awkward)

Note

: For backward compatibility, the library name remains jaggedcpp and provides a legacy jagged::jagged alias pointing to the same headers.

Basic Example

#include <awkward/awkward.hpp>

#include <iostream>
#include <vector>

int main() {
    const auto array = ak::from_iter({{1, 2, 3}, {}, {4, 5}, {6, 7}});

    std::cout << array.length() << '\n';
    std::cout << ak::to_list(array) << '\n';

    const auto values = ak::flatten(array);
    std::cout << ak::to_list(values) << '\n';

    const auto rebuilt = ak::unflatten(values, std::vector<std::size_t>{3, 0, 2, 2});
    std::cout << ak::to_list(rebuilt) << '\n';
}

Missing Values

The compatibility API supports missing primitive values and recursively nested missing values. Static from_iter construction covers the common flat and one-level forms; use ArrayBuilder for deeper mixed input:

#include <awkward/awkward.hpp>

auto array = ak::from_iter<int>({{1, ak::none}, {}, {ak::none, 4}});
auto mask = ak::is_none(array, 1);
auto dense = ak::fill_none(array, 0);
auto dropped = ak::drop_none(array);
auto padded = ak::pad_none(array, 3);

auto row_masked = ak::mask(
    ak::from_iter({{1, 2}, {3}, {4, 5}}),
    ak::from_iter({true, false, true}));

Records

Records preserve field order and can be projected by field name:

#include <awkward/awkward.hpp>

auto points = ak::zip({
    {"x", ak::from_iter({1, 2, 3})},
    {"y", ak::from_iter({10, 20, 30})},
});

auto xs = ak::field(points, "x");
auto with_labels = ak::with_field(points, ak::from_iter({"a", "b", "c"}), "label");
auto fields = ak::fields(with_labels); // {"x", "y", "label"}

When all zipped fields have matching one-level list structure, ak::zip creates records inside those lists. Use ak::ZipOptions{.depth_limit = 1} or ak::zip_no_broadcast to create top-level records that contain list fields.

Broadcasting And Elementwise Operations

The compatibility API supports scalar and matching ragged broadcasting for the implemented layouts:

#include <awkward/awkward.hpp>

auto array = ak::from_iter({{1, 2}, {}, {3}});
auto shifted = ak::add(array, 10);               // [[11, 12], [], [13]]
auto scaled = ak::multiply(array, ak::from_iter({10, 20, 30}));
auto selected = ak::where(ak::greater(array, 1), array, 0);
auto close = ak::isclose(ak::from_iter({1.0, 2.0}), ak::from_iter({1.0, 2.1}));
auto zeros = ak::zeros_like(array);

Missing values propagate through elementwise operations. Record field broadcasting is available when record fields match, including reordering fields with ak::broadcast_fields.

Buffers And Forms

Implemented layouts can be serialized to deterministic C++ form metadata plus typed buffers and reconstructed without Python dependencies:

#include <awkward/awkward.hpp>

auto array = ak::from_iter({{1, 2}, {}, {3}});
auto buffers = ak::to_buffers(array);
auto roundtrip = ak::from_buffers(buffers);

auto binary = ak::to_binary(buffers.buffers);
auto restored_buffers = ak::buffers_from_binary(binary);

Buffer keys are stable for a packed layout, such as node0-offsets and node1-data. ak::to_json(form) and ak::form_from_json(json) provide a deterministic dependency-free representation of the C++ form metadata. Form JSON uses schema version 1 and rejects missing, duplicate, and unknown members. The binary buffer format is also versioned and rejects malformed or trailing input.

Types And Metadata

array.type() and ak::type(array) return concrete recursive type objects; array.typestr() remains the compact string form. array.ndim() reports the layout depth, and checked scalar access can be wrapped as ak::Scalar through array.scalar_at(index) or ak::scalar(value).

Attrs, behavior placeholders, and named axes are immutable array metadata:

auto array = ak::from_iter({{1, 2}, {}, {3}});
array = ak::with_attrs(array, {{"source", "example"}});
array = ak::with_named_axis(array, "rows", 0);

Metadata is propagated by free array operations. Multi-array operations merge metadata and reject conflicting values instead of silently selecting one input's metadata.

ak::Value::is<T>() and get<T>() provide checked access to its exact variant alternative. A flat primitive layout can be read without copying through an exact physical-type view:

auto flat = ak::from_iter(std::vector<int>{1, 2, 3});
auto view = flat.view<int>();
for (int value : view) {
    // read-only access
}

ak::ReducerResult result = ak::sum_result(array);
if (const auto* scalar = std::get_if<ak::Scalar>(&result)) {
    // scalar reduction
} else {
    const auto& reduced = std::get<ak::Array>(result);
}

ak::sum and the existing reducers continue to return ak::Value for source compatibility; ak::sum_result and ak::mean_result expose the explicit std::variant<ak::Scalar, ak::Array> form.

Incremental Building

ak::ArrayBuilder incrementally constructs values when a compile-time C++ container type cannot describe the input. Lists, tuples, and records must be explicitly opened and closed:

#include <awkward/awkward.hpp>

ak::ArrayBuilder builder;
builder.begin_record();
builder.field("name");
builder.string("point");
builder.field("coordinates");
builder.begin_list();
builder.real(1.5);
builder.real(2.5);
builder.end_list();
builder.end_record();

auto array = builder.snapshot();

Each snapshot owns an immutable layout and is unaffected by later appends to the builder. A builder is mutable and requires external synchronization.

Strings

Strings use Awkward-compatible offset and UTF-8 byte buffers with string and char form parameters. String operations recurse through nested lists and preserve missing values:

#include <awkward/awkward.hpp>

auto words = ak::from_iter<std::string>({"One", "two words", "123"});
auto upper = ak::str::upper(words);
auto alphabetic = ak::str::is_alpha(words);
auto pieces = ak::str::split_pattern(words, " ");
auto rebuilt = ak::str::join(pieces, "-");

String classification, casing, slicing, and reversal are deterministic byte-level operations: ASCII follows the corresponding Python string behavior, while non-ASCII UTF-8 bytes are preserved by case transforms and rejected by classification predicates other than is_ascii. Slicing and reversal can split or reorder UTF-8 code units; Unicode code-point semantics remain future work.

Legacy API

jagged::Array<T> remains available for existing code, but new development should use ak::Array and the ak::* operations above. The compatibility API supports immutable layout sharing, records, options, unions, broadcasting, buffer interop, builders, and strings that are not represented by the legacy container. See Migrating from jagged::Array<T> for API mappings and behavioral differences.

Safety Contract

ak::Array shares an immutable ak::Content layout. Initial layout constructors validate offsets, regular sizes, and primitive content lengths. ak::is_valid and ak::validity_error expose the checked layout state.

jagged::Array<T> owns its data with standard containers. The public flat-storage constructor validates that offsets are non-empty, start at zero, are monotonic, and end at values.size().

All const methods are reentrant and safe for concurrent reads of the same object. Mutating methods, including ArrayBuilder append methods and legacy append_row, clear, reserve_values, and reserve_rows, require external synchronization.

values() and row() return non-owning std::span views for non-bool element types. These views are invalidated by later mutation of the same array. Array<bool> supports value-copy APIs such as to_rows, flatten, and at, but does not expose spans because std::vector<bool> is not a contiguous bool buffer.

Bad construction input throws std::invalid_argument. Bad row or column indexing throws std::out_of_range.

License

This project is licensed under the BSD 3-Clause License. See LICENSE for details.