This is an attempt to refactor the environments to be more performant at runtime. The idea is, to shift the dynamic hashmap environment lookups from runtime to compile time.
Currently the environments hold hashmaps that contain binding identifiers, values and additional information that is needed to identify some errors. Because bindings in outer environments are accessible from inner environments, this can lead to a traversal through all environments (in the worst case to the global environment).
This change to the environment structure pushes most of the work that is needed to access bindings to the compile time. At compile time, environments and bindings in the environments are being assigned indices. These indices are then stored instead of the `Sym` that is currently used to access bindings. At runtime, the indices are used to access bindings in a fixed size `Vec` per environment. This brings multiple benefits:
- No hashmap access needed at runtime
- The number of bindings per environment is known at compile time. Environments only need a single allocation, as their size is constant.
- Potential for optimizations with `unsafe` https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked
Additionally, this changes the global object to have it's bindings directly stored on the `Realm`. This should reduce some overhead from access trough gc objects and makes some optimizations for the global object possible.
The benchmarks look not that great on the first sight. But if you look closer, I think it is apparent, that this is a positive change. The difference is most apparent on Mini and Clean as they are longer (still not near any real life js but less specific that most other benchmarks):
| Test | Base | PR | % |
|------|--------------|------------------|---|
| Clean js (Compiler) | **1929.1±5.37ns** | 4.1±0.02µs | **+112.53%** |
| Clean js (Execution) | 1487.4±7.50µs | **987.3±3.78µs** | **-33.62%** |
The compile time is up in all benchmarks, as expected. The percentage is huge, but if we look at the real numbers, we can see that this is an issue of orders of magnitude. While compile is up `112.53%`, the real change is `~+2µs`. Execution is only down `33.62%`, but the real time changed by `~-500µs`.
Co-authored-by: Iban Eguia <razican@protonmail.ch>
This builds on top of #1758 to try to bring #1763 to life.
Something that should probably be done here would be to convert `JsString` to a `Sym` internally. Then, further optimizations could be done adding common strings to a custom interner type (those that we know statically).
This is definitely work in progress, but I would like to have feedback on the API, and feel free to contribute.
Co-authored-by: raskad <32105367+raskad@users.noreply.github.com>
It changes the following:
- Adjust the `context` methods `compile` and `execute` to avoid clones on `StatementList` and `CodeBlock`
Co-authored-by: raskad <32105367+raskad@users.noreply.github.com>
This Pull Request is part of #279.
It adds a string interner to Boa, which allows many types to not contain heap-allocated strings, and just contain a `NonZeroUsize` instead. This can move types to the stack (hopefully I'll be able to move `Token`, for example, maybe some `Node` types too.
Note that the internet is for now only available in the lexer. Next steps (in this PR or future ones) would include also using interning in the parser, and finally in execution. The idea is that strings should be represented with a `Sym` until they are displayed.
Talking about display. I have changed the `ParseError` type in order to not contain anything that could contain a `Sym` (basically tokens), which might be a bit faster, but what is important is that we don't depend on the interner when displaying errors.
The issue I have now is in order to display tokens. This requires the interner if we want to know identifiers, for example. The issue here is that Rust doesn't allow using a `fmt::Formatter` (only in nightly), which is making my head hurt. Maybe someone of you can find a better way of doing this.
Then, about `cursor.expect()`, this is the only place where we don't have the expected token type as a static string, so it's failing to compile. We have the option of changing the type definition of `ParseError` to contain an owned string, but maybe we can avoid this by having a `&'static str` come from a `TokenKind` with the default values, such as "identifier" for an identifier. I wanted for you to think about it and maybe we can just add that and avoid allocations there.
Oh, and this depends on the VM-only branch, so that has to be merged before :)
Another thing to check: should the interner be in its own module?
* Added CLEAN_JS and MINI_JS benches
These benches are arbitrary code which is subject to change,
functionally the programs are identical.
* Forgot semicolon
* Adding parsing benchmarks for CleanJS & MiniJS
Added identical benchmark cases for parser.
* Migrating Clean and Mini benchmarks to new format
Adding js scripts in bench_scripts & moving exec, full and parser
benchmarks to new include_str!() macro.