I've been enjoying Zig myself quite a bit, and I'm fairly confident I could do some larger projects in it (excluding comptime, since it's missing features I sorely need for some of my current projects). I like it a bit more than C/C++ in a lot of cases, but it needs to be pushed just a tiny bit further before I can really dedicate effort to large projects in it. I was even curious whether I could implement the features I need myself (there is even a proposal already), but got the answer "Just don't." (I don't blame Andrew; he already has a lot on his plate and I'm a stranger to him.) So I'm at the point of either waiting for the feature or forking, and I haven't decided what I'm going to do yet.
More on topic for the project, though: did you have any project ideas for this? I think I could use it for my OpenCV node editor; I just did the naive thing of marking all outputs dirty as their input nodes got reprocessed. I assume this would fix the potential problem of recomputing a node twice? I also see you mention "Cycle detection and cycle reporting," but what specifically happens if I do have a cycle? Does it just give up, or is it something like best effort?
jitl
For something like task scheduling in a build system, if you use an up-front partition of the tasks like this, then because tasks may have different durations, you can end up with a task whose dependencies are all fulfilled but which still remains unscheduled.
For example, with this input:
$ cat data.file
root: parentA parentB
parentA: C D
parentB: E F
Asking the package tool for the sorted sets gives me:
$ ./zig-out/bin/toposort-cli --data data.file
Processing succeeded.
Topologically sorted sets: [ { C D E F } { parentA parentB } { root } ]
Topologically sorted list: [ C D E F parentA parentB root ]
Nodes: [ root parentA parentB C D E F ]
Dependency tree:
[ C D E F ]
C -> [ parentA ]
parentA -> [ root ]
root ->
D -> [ parentA ]
parentA -> [ root ]
root ->
E -> [ parentB ]
parentB -> [ root ]
root ->
F -> [ parentB ]
parentB -> [ root ]
root ->
If we follow the README's advice to process the sets in parallel, set by set, we can easily have starvation: once `C` and `D` are finished, we are ready to start `parentA`, even if `E` and `F` are still work in progress.
How would I use the API of this package to detect and start the task `parentA` as soon as `C` and `D` are finished? I guess when a task is finished, I can ask for the list of dependent tasks, and for each of them check whether all of their dependencies are finished, and if so start the task. But this real-world need feels unmet; it feels odd to me to focus on the sorted sets instead of more practical scheduling.
That is kind of doing Kahn's algorithm iteratively during the build. It would be cool to try and optimize that for maximum performance.
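To make that concrete, here is a minimal Python sketch of the counter-based, Kahn-style scheduling I have in mind. It does not use this package's API at all; `deps` just mirrors data.file above, and `run` is a placeholder for actually executing a task:

from collections import deque

# Dependency map mirroring data.file: task -> tasks it depends on.
deps = {
    "root": {"parentA", "parentB"},
    "parentA": {"C", "D"},
    "parentB": {"E", "F"},
    "C": set(), "D": set(), "E": set(), "F": set(),
}

# Invert the edges and count unfinished dependencies per task.
dependents = {t: set() for t in deps}
remaining = {t: len(d) for t, d in deps.items()}
for task, ds in deps.items():
    for d in ds:
        dependents[d].add(task)

def run(task):
    print("running", task)  # stand-in for real work

ready = deque(t for t, n in remaining.items() if n == 0)  # C, D, E, F
while ready:
    task = ready.popleft()
    run(task)
    # The moment a task finishes, any dependent whose counter hits zero is
    # runnable. With a real worker pool, parentA would be handed out as soon
    # as C and D complete, even while E and F are still running.
    for dep in dependents[task]:
        remaining[dep] -= 1
        if remaining[dep] == 0:
            ready.append(dep)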
emmanueloga_
For those who have not implemented toposort or don't remember it: 1) only directed graphs without cycles (DAGs) can be topologically sorted, 2) there can be more than one topological order, and 3) the reverse postorder of a depth-first traversal from every unvisited node yields a topological order.
In JavaScript:
function toposort(graph = { a: ['b', 'c'], b: ['d'] }) {
  const order = [];
  const visited = new Map();

  function dfs(node) {
    const status = visited.get(node);
    if (status === 1) throw new Error("Cycle found.");
    if (status === 2) return;
    visited.set(node, 1); // In progress.
    const adjacent = graph[node] ?? [];
    for (const neighbor of adjacent) dfs(neighbor);
    visited.set(node, 2); // Done.
    order.unshift(node); // Reverse post order.
  }

  for (const node in graph) {
    if (!visited.has(node)) dfs(node);
  }
  return order;
}
chubot
Hm, this reminds me that the Python stdlib has grown a module for topological sorting of a graph:
https://docs.python.org/3/library/graphlib.html
I haven't used it yet; I'd be curious if anyone has.
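For reference, a minimal sketch of how that module (graphlib.TopologicalSorter) is used; the graph here is made up:

from graphlib import TopologicalSorter

# node -> set of its predecessors (dependencies); made-up example graph.
graph = {"d": {"b", "c"}, "b": {"a"}, "c": {"a"}}

ts = TopologicalSorter(graph)
print(list(ts.static_order()))  # e.g. ['a', 'b', 'c', 'd']

# It also supports incremental scheduling: take whatever is ready,
# do the work, then mark it done so new nodes become ready.
ts = TopologicalSorter(graph)
ts.prepare()                    # raises CycleError if the graph has a cycle
while ts.is_active():
    for node in ts.get_ready():
        print("processing", node)
        ts.done(node)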
tediousgraffit1
I really like this; it's the perfect-size project for exploring a new piece of tech. I especially like that you implemented an actual CLI and not just tests.
duped
> Generating dependence-free subsets for parallel processing.
Unsure how this is defined (*), but graph-cutting approaches to concurrent task scheduling are both pessimistic (poor utilization of available resources) and (IIRC) NP-hard, so you pay a big cost up front.
On the other hand, if you know the indegree/outdegree of each node at the time it is visited (meaning the graph is static), you can run Kahn's algorithm concurrently and put each ready node into a shared work queue. You can optimize that further with per-thread work queues and work stealing between them. Depending on what the nodes represent there are even more optimizations and heuristics; concurrent task scheduling is a hard problem. (A rough sketch of the shared-queue idea is at the end of this comment.)
* imagine the graph
(a, b)
(a, c)
(b, d)
(c, d)
Is it possible to get nodes b and c in parallel "subsets" in your library?
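Here is the promised rough Python sketch of the shared-queue variant (one shared ready queue, no work stealing). The graph and do_work() are made up, not this package's API; with this scheme, b and c from the example above can indeed run on different workers at the same time:

import queue, threading

# Made-up graph: node -> set of dependencies (a before b and c, both before d).
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
dependents = {n: {m for m, ds in deps.items() if n in ds} for n in deps}
remaining = {n: len(ds) for n, ds in deps.items()}

NUM_WORKERS = 4
ready = queue.Queue()
for n, count in remaining.items():
    if count == 0:
        ready.put(n)

lock = threading.Lock()
finished_count = 0

def do_work(node):            # placeholder for the real task
    print("processing", node)

def worker():
    global finished_count
    while True:
        node = ready.get()
        if node is None:      # shutdown signal
            return
        do_work(node)
        with lock:
            finished_count += 1
            for dep in dependents[node]:
                remaining[dep] -= 1
                if remaining[dep] == 0:
                    ready.put(dep)   # b and c become ready together -> run in parallel
            all_done = finished_count == len(deps)
        if all_done:
            for _ in range(NUM_WORKERS):
                ready.put(None)      # wake every worker so it can exit

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()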
hbbio
Congrats! I like Zig a lot even though I never implemented a full project with it.
FWIW we had to build a related lib in TypeScript: https://github.com/okcontract/graph as part of the runtime of https://github.com/okcontract/cells
Curious: have you benchmarked it against any similar tools in other languages (like Go or Rust)?
Good job! This is really cool!