This PR implements `Hidden Classes`, I named them as `Shapes` (like Spidermonkey does), calling them maps like v8 seems confusing because there already is a JS builtin, likewise with `Hidden classes` since there are already classes in JS.
There are two types of shapes: `shared` shapes that create the transition tree, and are shared between objects, this is mainly intended for user defined objects this makes more sense because shapes can create transitions trees, doing that for the builtins seems wasteful (unless users wanted to creating an object with the same property names and the same property attributes in the same order... which seems unlikely). That's why I added `unique` shapes, only one object has it. This is similar to previous solution, but this architecture enables us to use inline caching.
There will probably be a performance hit until we implement inline caching.
There still a lot of work that needs to be done, on this:
- [x] Move Property Attributes to shape
- [x] Move Prototype to shape
- [x] ~~Move extensible flag to shape~~, On further evaluation this doesn't give any benefit (at least right now), since it isn't used by inline caching also adding one more transition.
- [x] Implement delete for unique shapes.
- [x] If the chain is too long we should probably convert it into a `unique` shape
- [x] Figure out threshold ~~(maybe more that 256 properties ?)~~ curently set to an arbitrary number (`1024`)
- [x] Implement shared property table between shared shapes
- [x] Add code Document
- [x] Varying size storage for properties (get+set = 2, data = 1)
- [x] Add shapes to more object:
- [x] ordinary object
- [x] Arrays
- [x] Functions
- [x] Other builtins
- [x] Add `shapes.md` doc explaining shapes in depth with mermaid diagrams :)
- [x] Add `$boa.shape` module
- [x] `$boa.shape.id(o)`
- [x] `$boa.shape.type(o)`
- [x] `$boa.shape.same(o1, o2)`
- [x] add doc to `boa_object.md`
This PR implements an optimizer, It currently implements the [constant folding optimization][cfo]. this optimization is responsible for "folding"/evaluating constant expressions.
For example:
```js
let x = ((1 + 2 + -4) * 8) << 4
```
Generates the following instruction(s) (`cargo run -- -t`):
```
000000 0000 PushOne
000001 0001 PushInt8 2
000003 0002 Add
000004 0003 PushInt8 4
000006 0004 Neg
000007 0005 Add
000008 0006 PushInt8 8
000010 0007 Mul
000011 0008 PushInt8 4
000013 0009 ShiftLeft
000014 0010 DefInitLet 0000: 'x'
```
With constant folding it generates the following instruction(s) (`cargo run -- -t -O`):
```
000000 0000 PushInt8 -128
000002 0001 DefInitLet 0000: 'x'
```
It changes the following:
- Implement ~~WIP~~ constant folding optimization, ~~only works with integers for now~~
- Add `--optimize, -O` flag to boa_cli
- Add `--optimizer-statistics` flag to boa_cli for optimizer statistics
- Add `--optimize, -O` flag to boa_tester
After I finish with this, will try to implement other optimizations :)
[cfo]: https://en.wikipedia.org/wiki/Constant_folding
I'm creating this draft PR, since I wanted to have some early feedback, and because I though I would have time to finish it last week, but I got caught up with other stuff. Feel free to contribute :)
The main thing here is that I have divided `eval()`, `parse()` and similar functions so that they can decide if they are parsing scripts or modules. Let me know your thoughts.
Then, I was checking the import & export parsing, and I noticed we are using `TokenKind::Identifier` for `IdentifierName`, so I changed that name. An `Identifier` is an `IdentifierName` that isn't a `ReservedWord`. This means we should probably also adapt all `IdentifierReference`, `BindingIdentifier` and so on parsing. I already created an `Identifier` parser.
Something interesting there is that `await` is not a valid `Identifier` if the goal symbol is `Module`, as you can see in the [spec](https://tc39.es/ecma262/#prod-LabelIdentifier), but currently we don't have that information in the `InputElement` enumeration, we only have `Div`, `RegExp` and `TemplateTail`. How could we approach this?
Co-authored-by: jedel1043 <jedel0124@gmail.com>
Small (ish?) step towards having proper realm records
This PR changes the following:
- Moves `Intrinsics` to `Realm`.
- Cleans up the initialization logic of our intrinsics to not depend on `Context`, unblocking things like #2314.
- Adds hooks to initialize the global object and the global this per the corresponding [`InitializeHostDefinedRealm ( )`](https://tc39.es/ecma262/#sec-initializehostdefinedrealm) hook. Though, this is currently broken because the vm uses `GlobalPropertyMap` instead of the `JsObject` API to initialize global properties.
Slightly related to #2411 since we need an API to pass module files, but more useful for #1760, #1313 and other error reporting issues.
It changes the following:
- Introduces a new `Source` API to store the path of a provided file or `None` if the source is a plain string.
- Improves the display of `boa_tester` to show the path of the tests being run. This also enables hyperlinks to directly jump to the tested file from the VS terminal.
- Adjusts the repo to this change.
Hopefully, this will improve our error display in the future.
This Pull Request fixes/closes #1987.
It changes the following:
- Add a prototype to the global object.
- Implement `__get_prototype_of__`, `__set_prototype_of__`, `__get_own_property__`, and `__own_property_keys__` for the global object
This is an attempt to refactor the environments to be more performant at runtime. The idea is, to shift the dynamic hashmap environment lookups from runtime to compile time.
Currently the environments hold hashmaps that contain binding identifiers, values and additional information that is needed to identify some errors. Because bindings in outer environments are accessible from inner environments, this can lead to a traversal through all environments (in the worst case to the global environment).
This change to the environment structure pushes most of the work that is needed to access bindings to the compile time. At compile time, environments and bindings in the environments are being assigned indices. These indices are then stored instead of the `Sym` that is currently used to access bindings. At runtime, the indices are used to access bindings in a fixed size `Vec` per environment. This brings multiple benefits:
- No hashmap access needed at runtime
- The number of bindings per environment is known at compile time. Environments only need a single allocation, as their size is constant.
- Potential for optimizations with `unsafe` https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get_unchecked
Additionally, this changes the global object to have it's bindings directly stored on the `Realm`. This should reduce some overhead from access trough gc objects and makes some optimizations for the global object possible.
The benchmarks look not that great on the first sight. But if you look closer, I think it is apparent, that this is a positive change. The difference is most apparent on Mini and Clean as they are longer (still not near any real life js but less specific that most other benchmarks):
| Test | Base | PR | % |
|------|--------------|------------------|---|
| Clean js (Compiler) | **1929.1±5.37ns** | 4.1±0.02µs | **+112.53%** |
| Clean js (Execution) | 1487.4±7.50µs | **987.3±3.78µs** | **-33.62%** |
The compile time is up in all benchmarks, as expected. The percentage is huge, but if we look at the real numbers, we can see that this is an issue of orders of magnitude. While compile is up `112.53%`, the real change is `~+2µs`. Execution is only down `33.62%`, but the real time changed by `~-500µs`.
Co-authored-by: Iban Eguia <razican@protonmail.ch>
This builds on top of #1758 to try to bring #1763 to life.
Something that should probably be done here would be to convert `JsString` to a `Sym` internally. Then, further optimizations could be done adding common strings to a custom interner type (those that we know statically).
This is definitely work in progress, but I would like to have feedback on the API, and feel free to contribute.
Co-authored-by: raskad <32105367+raskad@users.noreply.github.com>
It changes the following:
- Adjust the `context` methods `compile` and `execute` to avoid clones on `StatementList` and `CodeBlock`
Co-authored-by: raskad <32105367+raskad@users.noreply.github.com>
This Pull Request is part of #279.
It adds a string interner to Boa, which allows many types to not contain heap-allocated strings, and just contain a `NonZeroUsize` instead. This can move types to the stack (hopefully I'll be able to move `Token`, for example, maybe some `Node` types too.
Note that the internet is for now only available in the lexer. Next steps (in this PR or future ones) would include also using interning in the parser, and finally in execution. The idea is that strings should be represented with a `Sym` until they are displayed.
Talking about display. I have changed the `ParseError` type in order to not contain anything that could contain a `Sym` (basically tokens), which might be a bit faster, but what is important is that we don't depend on the interner when displaying errors.
The issue I have now is in order to display tokens. This requires the interner if we want to know identifiers, for example. The issue here is that Rust doesn't allow using a `fmt::Formatter` (only in nightly), which is making my head hurt. Maybe someone of you can find a better way of doing this.
Then, about `cursor.expect()`, this is the only place where we don't have the expected token type as a static string, so it's failing to compile. We have the option of changing the type definition of `ParseError` to contain an owned string, but maybe we can avoid this by having a `&'static str` come from a `TokenKind` with the default values, such as "identifier" for an identifier. I wanted for you to think about it and maybe we can just add that and avoid allocations there.
Oh, and this depends on the VM-only branch, so that has to be merged before :)
Another thing to check: should the interner be in its own module?
* Added CLEAN_JS and MINI_JS benches
These benches are arbitrary code which is subject to change,
functionally the programs are identical.
* Forgot semicolon
* Adding parsing benchmarks for CleanJS & MiniJS
Added identical benchmark cases for parser.
* Migrating Clean and Mini benchmarks to new format
Adding js scripts in bench_scripts & moving exec, full and parser
benchmarks to new include_str!() macro.